Marketing has come a long way in the last 20-odd years since the advent of the World Wide Web. Prior to the web, marketers were considered just this side of used car salesmen, with breathless pitches in full carnival barker-style ads. While marketers might not have become the winners of the most trusted profession surveys, they are no longer universally reviled as dirty little schemers who are just trying to fool us into buying their crappy products.

Now, it’s not schemers but schemas that we are all talking about. The real sexy marketers these days are the data scientists, whose feature analysis drives machine learning, and whose algorithms unearth insights that make sense to humans, but can be discovered only by a machine.

For the uninitiated, a schema is the layout of the database–the model for which fields will be stored, and in what form. Schemas have always been the go-to technique for databases. Think about the classic customer database schema, where you store the first and last names, address, gender, age, and plenty of other information. The original goal behind customer schemas was to accurately fill in as many of the fields for as many customers as possible.

But modern marketing databases are becoming sparse: some of the most interesting fields are only available for a small number of customers. For example, if you sell scissors, most of your sales go for typical uses (every home boasts several pairs of scissors, and every child uses a pair in school), but you’d want to note in your customer database whether this customer is a crafter or a scrapbooker, because not only do they buy more scissors, they buy pricier scissors.

But this kind of data was expensive to store in the old schema days, because you had to devote fields to crafter (yes or no) and scrapbooker (yes or no), even though hardly any folks have the field marked as yes. It was a colossal waste of space, which also meant that your database ran slower. Today, a new technique has emerged, as I discussed in a recent interview: the schemaless database.

The most common type of schemaless database is one you are looking at right now: the web page. All web pages are composed of HTML, which uses tag-value pairs to define the content on each page. HTML is not the usual format for schemaless databases, however. The earliest ones stored XML documents, but today’s more often store JSON objects. Whatever the form, dumping the schema allows efficient support of sparse data structures, where you have a lot of data spread across many different tags (or fields), and each record carries only the tags that apply to it.
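To make the contrast concrete, here is a minimal Python sketch. The customer records and names are made up for illustration; only the `crafter` and `scrapbooker` flags come from the scissors example above. It stores the same three customers twice: once as fixed-schema rows, where every row carries every field, and once as JSON-style objects that keep only the tags that apply.

```python
import json

# Hypothetical fixed schema: every row must carry every field,
# even when almost nobody is a crafter or a scrapbooker.
fixed_rows = [
    {"first_name": "Ann", "last_name": "Lee", "crafter": False, "scrapbooker": False},
    {"first_name": "Bob", "last_name": "Ruiz", "crafter": True, "scrapbooker": False},
    {"first_name": "Cy", "last_name": "Park", "crafter": False, "scrapbooker": False},
]


def to_sparse(row):
    """Keep every value except yes/no flags that are False."""
    return {key: value for key, value in row.items() if value is not False}


# Schemaless version: each object lists only the tags that are true for it.
sparse_docs = [to_sparse(row) for row in fixed_rows]

# Count how many field values each layout has to store.
fixed_cells = sum(len(row) for row in fixed_rows)
sparse_cells = sum(len(doc) for doc in sparse_docs)

print(json.dumps(sparse_docs[1]))
print(fixed_cells, sparse_cells)  # 12 7
```

With only three customers the saving is small, but the gap widens as you add more rarely used flags: the fixed layout grows by one value per customer per flag, while the sparse layout grows only for the customers the flag applies to.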

What does this mean for marketers? Increasingly obscure data about your customers can be stored and retrieved without gumming up your database performance. Whatever you know can be stored, even if you weren’t aware you wanted to store it when you designed the database. You can always add another tag.
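As a rough illustration of "you can always add another tag," the sketch below uses plain Python dictionaries to stand in for JSON customer objects. The `left_handed` tag, the `with_tag` helper, and the sample customers are all hypothetical; a real schemaless database would offer its own query interface.

```python
# Customer objects as they looked when the database was first designed.
customers = [
    {"id": 1, "first_name": "Ann"},
    {"id": 2, "first_name": "Bob", "crafter": True},
]

# Later, marketing learns something nobody planned for.
# No redesign is needed: the new tag is attached to the one customer it fits.
customers[0]["left_handed"] = True  # hypothetical new tag


def with_tag(docs, tag):
    """Return the objects that carry a truthy value for the given tag."""
    return [doc for doc in docs if doc.get(tag)]


left_handed = with_tag(customers, "left_handed")
crafters = with_tag(customers, "crafter")

print([doc["id"] for doc in left_handed])  # [1]
print([doc["id"] for doc in crafters])     # [2]
```

Customers without a tag are simply skipped by the lookup, so the older objects never need to be touched when a new tag appears.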

Are you using schemaless databases? If not, why not?