The answer to the question in the title is almost certainly no. But that’s not what the data says. If you look at the records from 1789, when George Washington’s presidential term began, to 2009, when Barack Obama became the 44th president of the United States, the data would suggest otherwise.
If you examined the data in 2008, with a sample size of n=43, and a test on gender and ethnicity, the patterns in the data would show a correlation that would predict that all future presidents would be white males. We know of course that the prediction would have been wrong. Of course there are other factors at play, and I use this example only to call out my concern about the dangers of the hype around big data and analytics as a panacea for all (or many) ills.
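To make the point concrete, here is a minimal sketch of the kind of naive, pattern-only prediction I am warning about. The records list is an illustrative stand-in for the 43 historical observations, and naive_prediction is a hypothetical helper, not anything taken from a real analytics product.

```python
from collections import Counter

# Illustrative stand-in for the 43 historical records described above
# (assumption: each record is just an (ethnicity, gender) pair).
records = [("white", "male")] * 43


def naive_prediction(history):
    """Predict the next observation as the most frequent past one.

    Returns the predicted profile and the share of past records that
    match it. This is pure extrapolation: no domain knowledge, no context.
    """
    counts = Counter(history)
    profile, seen = counts.most_common(1)[0]
    return profile, seen / len(history)


profile, confidence = naive_prediction(records)
print(profile, confidence)
```

With every past record identical, the model has no way to express that the future could differ. It can only repeat the pattern it was given.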
Over the past nine years my company has recorded and analyzed millions of sales cycles through our Dealmaker Smart Sales Application. We have looked at the effectiveness of certain actions in the sales cycle and their impact on the outcome of the sale. There are certain signals that appear to be reliable indicators of the sales outcome (win or loss) but only when the context is considered.
For example: Budget is a key indicator when the level of organizational change for the buying organization is small, but far less so when, alongside the actual purchase of the solution, the buying organization has to undertake a material change management exercise. This is where the data can be misleading. You need the domain knowledge to know what questions to ask.
One of the problems we have observed is that many proponents of big data don’t take enough care to distinguish between correlation and causality. The former extrapolates past data as an indicator of the future. The latter analyzes the real reasons behind past outcomes as a guide towards a future, heuristically determined, prognosis. It is more important to understand why certain patterns emerge than it is to just uncover the patterns.
In most cases, the missing ingredients are domain knowledge, context and applied reasoning. Without domain knowledge, it is not possible to know what questions to ask of the data, and when you don’t have any context it is hard to know when to ask those questions. Without applied reasoning (emphasis on applied) you don’t know what to do with the patterns and uninformed algorithms will lead you astray.
As with any other smart system, effective big data/analytics happen only when there is a combination of knowledge, context, data and applied reasoning. All four components are essential. Big questions are more valuable than big data.
Two Examples: One Bad, One Good.
Let me give you two examples from Google, one bad, one good, to highlight the difference between a prediction that is simply powered by big data and one that uses knowledge and context as the core of its prediction.
Bad: Google Flu Trends
Take Google’s data-aggregating tool, Google Flu Trends (GFT). The program is designed to provide real-time monitoring of flu cases around the world based on Google searches. It seems like a perfect use of the 500 million Google searches made each day. But there’s just one problem: as a new article in Science shows, when you compare its results to the real world, GFT doesn’t really work.
GFT overestimated the prevalence of flu in the 2012-2013 and 2011-2012 seasons by more than 50%. From August 2011 to September 2013, GFT over-predicted the prevalence of the flu in 100 out of 108 weeks. During the peak flu season last winter, GFT would have had us believe that 11% of the U.S. had influenza, nearly double the CDC (Centers for Disease Control and Prevention) number of 6%. If you wanted to project current flu prevalence, you would have done much better basing your models on 3-week-old CDC case data than you would have using GFT’s sophisticated big data methods.
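The arithmetic behind those figures is simple enough to check. This short sketch uses only the numbers quoted above; the variable names are my own.

```python
# Figures quoted above for Google Flu Trends (GFT) versus the CDC.
weeks_overpredicted = 100
weeks_total = 108
gft_peak_pct = 11.0  # GFT's peak estimate of U.S. flu prevalence
cdc_peak_pct = 6.0   # CDC's figure for the same period

# Share of weeks in which GFT's estimate was too high.
overprediction_rate = weeks_overpredicted / weeks_total

# How many times larger GFT's peak estimate was than the CDC's.
peak_ratio = gft_peak_pct / cdc_peak_pct

print(f"Over-predicted in {overprediction_rate:.1%} of weeks")
print(f"Peak estimate was {peak_ratio:.2f}x the CDC figure")
```

In other words, GFT was too high in roughly 93% of the weeks measured, and its peak estimate was about 1.8 times the CDC figure.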
Google had data (copious amounts of it) but CDC had knowledge.
Good: Google Maps
Most everyone has used Google Maps at some point. If you are like me, then you didn’t really think a lot about the smarts underlying that app on your phone.
Google Maps is a great example of a big data / analytics application that works well. It uses a combination of knowledge, context, data and smart applied reasoning to direct you to your chosen destination.
Here’s a little insight into how it works:
The maps you see in Google Maps are compiled by a private company with whom Google has a partnership. This company is called Tele Atlas and they are a world leader in navigation and location-based services. The maps are highly accurate and have been hailed for recording extremely rural areas and mapping the terrain correctly. [Knowledge / Data]
Google Maps also offers real-time views of how congested the roads are. But how does it know what traffic is like on the roads? Google realized that as more and more people continued to switch to smartphones, they had a miniature army of traffic monitors. Other phones that are on the same journey inform the traffic flow. Like it or not, you are part of the solution. Of course, Google uses its own algorithms to exclude anomalies, like a postman who chooses to stop much more frequently than the average driver. [Data/Knowledge/Context]
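Google has not published how it excludes anomalies like the postman, so the following is only a sketch of one common approach: drop any reading that sits too far from the median of the group. The function name, the threshold and the sample speeds are all my own assumptions.

```python
from statistics import median


def filter_outlier_speeds(speeds, max_deviations=3.0):
    """Drop speed readings that sit far from the median of the group.

    Uses the median absolute deviation (MAD) as the yardstick. This is
    an assumed technique for illustration, not Google's actual method.
    """
    if not speeds:
        return []
    mid = median(speeds)
    mad = median(abs(s - mid) for s in speeds)
    if mad == 0:
        # No spread to judge against, so keep every reading.
        return list(speeds)
    return [s for s in speeds if abs(s - mid) / mad <= max_deviations]


# Hypothetical readings in km/h from phones on the same stretch of road.
# The last one is the postman, stopping at every house.
readings = [52, 55, 50, 53, 54, 12]
print(filter_outlier_speeds(readings))
```

The median is used rather than the mean because a single slow vehicle pulls the mean down but barely moves the median. That is domain knowledge shaping how the data is read.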
To calculate your ETA, Google uses things like official speed limits and recommended speeds, likely speeds derived from road types, historical average speed data over certain time periods (sometimes just averages, sometimes at particular times of day), actual travel times from previous users, and real-time traffic information. That’s a lot of data. They mix data from whichever sources they have, and come up with the best prediction they can make. [Data/Applied Reasoning]
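One way to picture "mixing data from whichever sources they have" is a weighted average of speed estimates, where each source carries a confidence weight. This is a minimal sketch under that assumption. The SpeedEstimate type, the weights and the sample values are hypothetical, and Google's real model is certainly more sophisticated.

```python
from dataclasses import dataclass


@dataclass
class SpeedEstimate:
    source: str       # e.g. "speed limit", "historical", "live traffic"
    speed_kmh: float  # expected speed according to this source
    weight: float     # hypothetical confidence in this source


def blended_speed(estimates):
    """Weighted average speed across whichever sources are available."""
    usable = [e for e in estimates if e.speed_kmh > 0 and e.weight > 0]
    if not usable:
        raise ValueError("no usable speed estimates")
    total_weight = sum(e.weight for e in usable)
    return sum(e.speed_kmh * e.weight for e in usable) / total_weight


def eta_minutes(distance_km, estimates):
    """Estimated travel time in minutes for a single road segment."""
    return distance_km / blended_speed(estimates) * 60.0


# Hypothetical inputs: the posted limit says 60 km/h, but live traffic
# says 40 km/h and is trusted three times as much.
estimates = [
    SpeedEstimate("speed limit", 60.0, 1.0),
    SpeedEstimate("live traffic", 40.0, 3.0),
]
print(eta_minutes(15.0, estimates))
```

The weights are where the applied reasoning lives: deciding how much to trust live traffic over the posted limit is a judgment the data alone cannot make.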
Google Maps just needs you to tell it where you want to go (small data), and because it knows where you are (context) it can use all its own knowledge, context, data and applied reasoning to get you there on time.
A Final Thought
If Google can provide me with the directions to my customer meeting, shouldn’t there be a smart app that can help me navigate the twists and turns in the sales meeting with the customer? Well there is, and I will be talking about that in another session at Dreamforce this year.