Monday, January 04, 2010

Are Marketing Data Expanding Faster than the Universe?

I'll start by making a statement that I think I can back up: the amount of data available to marketers is growing geometrically, not linearly. Linear growth of marketing data would mean something like the following: an individual starts a new company on January 1, 2010, and buys a list of 1,000 names to market to. A year later, he has added 100 names to the list. The year after that, he adds another 100, and so on, and so on. The list grows by the same fixed amount each year: 1,000, then 1,100, then 1,200.

Exponential growth is different. When things grow exponentially, they grow in proportion to their current size, so each year multiplies the list rather than adding to it. (Even a steady 10% a year, from 1,000 to 1,100 to 1,210, is exponential growth; it just doubles slowly.) For example, if our marketer's list doubled every year, his 1,000 names would become 2,000, then 4,000, then 8,000, and after ten years he'd have more than a million. This can't keep up forever, because in a little over two decades we'd simply run out of people in the world.
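The difference between the two growth patterns is easy to see with a few lines of arithmetic. The Python sketch below is purely illustrative: the starting list of 1,000 names, the fixed 100 names added per year, the yearly doubling, and the world-population figure of roughly 6.9 billion are all assumptions chosen for the example, not measurements.

```python
# Sketch: linear vs. exponential growth of a hypothetical marketing list.
# Assumptions (illustrative only): the list starts at 1,000 names, linear
# growth adds 100 names per year, exponential growth doubles the list every
# year, and world population is taken as roughly 6.9 billion (circa 2010).

WORLD_POPULATION = 6_900_000_000


def linear_size(start: int, added_per_year: int, years: int) -> int:
    """List size after `years` when a fixed number of names is added each year."""
    return start + added_per_year * years


def exponential_size(start: int, factor: float, years: int) -> float:
    """List size after `years` when the list is multiplied by `factor` each year."""
    return start * factor ** years


def years_until_exceeds(start: int, factor: float, limit: int) -> int:
    """First whole year in which the exponentially growing list passes `limit`."""
    years = 0
    size = float(start)
    while size <= limit:
        size *= factor
        years += 1
    return years


if __name__ == "__main__":
    for year in (0, 1, 2, 5, 10):
        print(year, linear_size(1_000, 100, year), int(exponential_size(1_000, 2, year)))
    print("Doubling list passes world population in year",
          years_until_exceeds(1_000, 2, WORLD_POPULATION))
```

Under these assumptions, the linear list has only 2,000 names after ten years while the doubling list has 1,024,000, and the doubling list passes the assumed world population in year 23.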

The issue is more complicated than this, though, because it's not just the number of names that is growing exponentially; what we know about those names is growing, too. And, to make it even more complicated, every year we know more about how these names interact with one another (the network), thanks to Web 2.0. So to recap, we have three sources of information growth for marketers:

1. Increased size of the known universe of names, companies, etc.
2. Increase in what we know about these names
3. Increase in connections between these names, companies, etc.

As far as (2) goes, this is where most of the action is today. That's not because people are doing more relevant things today than they were 20 years ago; it's because they're doing them in a browser, or on a mobile device, or on a game console, where they've been cookied or had their IP address matched on the back end or... you get the picture. And this digitization of behavior is only going to become more extreme, barring a zombie attack or a second Luddite revolution.

The question is, what is the doubling time for marketing (or social science) data today? For microprocessors, Moore's law holds that the number of transistors on a chip (and, loosely, processing power) doubles roughly every 18 to 24 months at constant cost. This has held up fairly consistently over the past 20 years, and in my mind it is striking empirical evidence that Teilhard de Chardin's theories on the Omega Point might be true. It would be a great academic study to estimate the doubling time for marketing data and add it to the table in this amazingly cool article. If you doubt that the amount of information available about people is growing exponentially, take a look at a Facebook event stream: your life, time-stamped.
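Nobody (this post included) has measured the doubling time for marketing data, but the arithmetic linking a growth rate to a doubling time is standard: a quantity growing by a fraction r per period doubles after T = ln 2 / ln(1 + r) periods, and doubling every T months corresponds to an annual growth rate of 2^(12/T) - 1. A minimal Python sketch, with illustrative function names:

```python
import math


def doubling_time(growth_rate: float) -> float:
    """Periods needed to double at a constant growth rate per period (0.10 = 10%)."""
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive")
    # log1p(r) computes ln(1 + r) accurately even for small r.
    return math.log(2) / math.log1p(growth_rate)


def annual_growth_from_doubling(months: float) -> float:
    """Annual growth rate implied by doubling every `months` months."""
    return 2 ** (12 / months) - 1


if __name__ == "__main__":
    print(f"10%/year doubles in {doubling_time(0.10):.1f} years")
    for months in (18, 36):
        rate = annual_growth_from_doubling(months)
        print(f"doubling every {months} months = {rate:.0%} growth per year")
```

For example, steady 10% annual growth doubles in about 7.3 years, while doubling every 18 months works out to roughly 59% growth per year (about 26% per year for a 36-month doubling time).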

There are two constraining factors worth noting here, though. The first is information (not data) capacity: a company cannot possibly keep up with an exponential doubling of marketing data, whether the doubling time is 18 months or 36 months. It's not a question of storage cost; it's a question of the ability to make sense of the data. The marketing talent at a company, no matter how big, simply cannot absorb a doubling of information every 18 or 36 months. Trying to do it all in-house is a bit like central planning: the data has to be federated and put into a competitive marketplace to reach its full potential. So there has to be some kind of consortium to deal with this complexity, or intermediary vendors distilling the data into bite-sized chunks for specific industries, roles, and so on. And, I'd argue, this is exactly what we've seen happen over the past few decades, starting with retail scanner data in the early 1980s.
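To put rough numbers on the capacity problem, the sketch below assumes (hypothetically) that a marketing team could fully handle its data today and that its capacity stays flat, then computes what share of the data it can still handle as the data doubles every 18 or 36 months.

```python
def data_multiple(years: float, doubling_months: float) -> float:
    """How many times larger the data is after `years`, given a doubling time in months."""
    return 2 ** (years * 12 / doubling_months)


def share_handled(years: float, doubling_months: float) -> float:
    """Share of the data a fixed-capacity team can handle after `years`.

    Assumption: the team could handle 100% of the data at year 0 and its
    capacity does not grow at all.
    """
    return 1 / data_multiple(years, doubling_months)


if __name__ == "__main__":
    for months in (18, 36):
        shares = [round(share_handled(y, months), 3) for y in (0, 3, 6, 9)]
        print(f"doubling every {months} months, years 0/3/6/9:", shares)
```

With an 18-month doubling time, the data is 16 times larger after six years, so a flat-capacity team handles about 6% of it; with a 36-month doubling time, it handles a quarter.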

The second constraining factor is the question of information ownership and “walls.” What share of “all you could know” about a customer do you think was proprietary information, defined as information owned by the company and no one else? I'd guess that in 1975 it was 75%. I wonder what it is now. 20%? And where will it finally settle? Lower than it is now. The point is simple: more than ever, what a company can know about a customer is sitting out in the public domain. The challenge is what to do with it. This thought experiment, in my view, makes a strong argument for moving towards cloud computing when it comes to marketing applications.

I'm not sure there's a conclusion here, but I do think it's worth noting for all of us marketers that we're in the middle of our own Moore's law moment, and we had better keep thinking about how to capitalize on it.


Tim Furey said...

So here is a simple trade-off question: when is data insight more valuable than just raw contact volume?

We do indeed have exponential data growth, from what is likely 10X the number of captured customer interactions, each with 10X the depth of data. But while storage and processing power might both enable us to generate insightful behavioral segmentation and targeting based on these richer data sets, it is also easier and cheaper just to blast everyone in the universe -- over and over!!!

The issue, it seems to me, is not processing all the data but rather IDing the most predictive data. Most research at MarketBridge is showing that traditional B2B segmentation variables such as SIC code and employee size are increasingly non-predictive and show very low correlations with behavior.
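One simple way to start IDing the most predictive data is to screen candidate variables by how strongly each correlates with an observed behavior, such as whether an account responded. The sketch below does that with a plain Pearson correlation against a 0/1 outcome (equivalent to the point-biserial correlation). It is a hypothetical first-pass screen, not the method behind the MarketBridge research mentioned above, and the variable names and toy data are invented for illustration. Univariate correlation misses interactions and nonlinear effects, so treat it only as a starting point.

```python
from __future__ import annotations

import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def rank_predictors(features: dict[str, list[float]],
                    outcome: list[float]) -> list[tuple[str, float]]:
    """Sort candidate variables by |correlation| with the outcome, strongest first."""
    scores = [(name, pearson(values, outcome)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data for six accounts; 1 = responded, 0 = did not.
    responded = [1, 0, 1, 0, 1, 0]
    candidate_features = {
        "employee_count": [50, 40, 500, 520, 60, 55],
        "web_visits_30d": [12, 1, 9, 2, 15, 0],
    }
    for name, r in rank_predictors(candidate_features, responded):
        print(f"{name}: r = {r:+.2f}")
```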

Until a company finds the right data on which to focus, blast marketing might actually be more cost-effective. LOL. The fact is, the right, highly predictive data IS out there if and when companies find it, through either post facto analysis or, better yet, real a priori experimental design.
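Tim's trade-off can be framed as a cost-per-response comparison. In the sketch below, a targeted campaign contacts fewer people but carries a fixed cost for data and analysis, and `break_even_rate` returns the response rate it would need just to match the blast's cost per response. The class, function names, and all figures in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    contacts: int            # number of people contacted
    cost_per_contact: float  # e.g. printing and postage per piece
    response_rate: float     # expected share who respond (0.01 = 1%)
    fixed_cost: float = 0.0  # data, analytics, consulting, etc.

    def total_cost(self) -> float:
        return self.fixed_cost + self.contacts * self.cost_per_contact

    def responses(self) -> float:
        return self.contacts * self.response_rate

    def cost_per_response(self) -> float:
        return self.total_cost() / self.responses()


def break_even_rate(contacts: int, cost_per_contact: float,
                    fixed_cost: float, target_cost_per_response: float) -> float:
    """Response rate at which a campaign's cost per response equals the target."""
    total = fixed_cost + contacts * cost_per_contact
    return total / (contacts * target_cost_per_response)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    blast = Campaign(contacts=100_000, cost_per_contact=0.50, response_rate=0.01)
    targeted = Campaign(contacts=20_000, cost_per_contact=0.50,
                        response_rate=0.025, fixed_cost=20_000)
    print(f"blast:    ${blast.cost_per_response():.2f} per response")
    print(f"targeted: ${targeted.cost_per_response():.2f} per response")
    needed = break_even_rate(20_000, 0.50, 20_000, blast.cost_per_response())
    print(f"targeted needs a {needed:.1%} response rate to break even")
```

With these made-up figures the blast costs $50 per response and the targeted campaign $60, so targeting only pays off if the better data lifts its response rate above 3%.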

Adam Gierisch said...

I agree with Andy that data is growing at a much more rapid pace than anyone can reasonably keep up with. Tim is also correct: the problem with all of this data is that it is decidedly NOT actionable information.

For this data to be actionable (and therefore of any serious value), it must be able to identify groups that will behave in predictable ways when faced with certain stimuli. Only then have you taken a whole mess of data and distilled it into something that will actually help you optimize your marketing efforts.

So we're back to Tim's original question: should we just fling the kitchen sink at the whole universe to see what sticks or should we run some algorithms on the data to try and isolate some distinct groups for more specialized treatment? In a perfect world it would be some combination of the two whereby the blasting is a data acquisition and hypothesis testing exercise. Of course, the consultants to run such an exercise will also jack up the cost :)
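Adam's idea of treating the blast as a hypothesis-testing exercise can be made concrete: randomly split the universe, send the same offer to every cell, and check whether a hypothesized segment responds at a different rate from everyone else. The sketch below uses a standard two-sided two-proportion z-test with a pooled standard error; the function name and the counts in the example are hypothetical, and the normal approximation assumes reasonably large cells.

```python
import math


def two_proportion_z(successes_a: int, n_a: int,
                     successes_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test using the pooled standard error.

    Returns (z, p_value). Assumes independent random samples and counts
    large enough for the normal approximation to hold.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical results: a hypothesized segment vs. the rest of the mailing.
    z, p = two_proportion_z(120, 5_000, 300, 20_000)
    print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

A small p-value suggests the segment's response rate really differs; a large one suggests the hypothesized grouping may not be worth a specialized treatment.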

What I'd like to know is what are the meaningful data points? Which variables would have strong correlations to actual behavior? I think Tim's right that the traditional segments based on geography/size/industry etc. are increasingly outdated. So what's important now? I would posit that the age of the founder and their broad industry could be important. For instance, I would expect a 45-year-old founder of a brand new flooring company to behave in a demonstrably different way from a 26-year-old who recently started a social networking optimization firm.

What else?