From small chains to massive corporations, in both the public and private sectors, there seems to be a tacit race to space when it comes to data discovery and big data. The old adage “You can’t take it with you when you go” doesn’t apply here: companies that merge, go bankrupt, or otherwise vaporize have still gathered big data that will become useful for the next guy on deck. But having a warehouse of big data just to be able to say, “Come check out how much data I have” isn’t the objective.
If you’re in the practice of collecting big data, here’s some food for thought: even the highest-level data analysts and scientists have not come to one succinct conclusion on the definition of the phrase “big data.” So maybe the thing to do is to stop postulating and gathering at lightning speed. Maybe we should be organizing and communicating in the best possible ways to determine a few things. Does the data we’re collecting tell us anything substantial about:
- Consumer behavior
- Geo-specific point-of-sale (POS) volume
- ROI on marketing efforts
- Employee scheduling, benefits, behaviors
When we are collecting data about these important subsets, what are we doing to ensure that there is no overlap in information that could lead to inaccurate conclusions? Take the employee schedule, for example. Say that across a 10-store franchise there are 100 employees. On the whole, each employee works at one location, but one employee, we’ll call him Joe, works at three different locations. Without taking this into account, the analysis may conclude that Joe is working too few hours to continue to qualify for his healthcare benefits through the company. In another erroneous scenario, the data could suggest that there are three Joes, causing payroll to cut Joe three smaller checks, which will ultimately cost him more in taxes, generally make his life harder, and create bookkeeping headaches if the taxman ever comes calling for an audit.
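To make the Joe scenario concrete, here is a minimal sketch of the aggregation mistake: if each (employee, store) row is judged on its own, Joe never appears to clear a benefits threshold, but summing hours by a unique employee ID first gets the right answer. The IDs, store names, hours, and the 30-hour cutoff below are all invented for illustration.

```python
from collections import defaultdict

# (employee_id, store, hours_this_week) — Joe (E042) works at three stores
timesheets = [
    ("E042", "Store 1", 12),
    ("E042", "Store 4", 10),
    ("E042", "Store 7", 14),
    ("E017", "Store 2", 38),
]

BENEFITS_THRESHOLD = 30  # hypothetical weekly-hours cutoff for benefits

# Wrong: each (employee, store) row is treated as a separate worker —
# no single row reaches 30 hours, so Joe looks ineligible at every store.
per_store_eligible = {
    (emp, store): hours >= BENEFITS_THRESHOLD
    for emp, store, hours in timesheets
}

# Right: sum hours across stores per employee ID, then check the threshold.
total_hours = defaultdict(int)
for emp, _store, hours in timesheets:
    total_hours[emp] += hours

eligible = {emp: h >= BENEFITS_THRESHOLD for emp, h in total_hours.items()}
print(eligible["E042"])  # True — Joe's 36 combined hours clear the cutoff
```

The fix is not more data; it is keying the analysis on the right identifier before aggregating.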
For the 10-store chain and for Joe, the outcome is not good, no matter how much data—big or small—is collected in an ineffective manner about Joe’s hours, his benefits, his sick days, paid vacation days, and so on.
In another scenario, let’s imagine there are two Joes who happen to share the same last name. The first Joe Smith is a few months from retiring, and the second Joe is fresh out of college in his first management position. When data about the Joe Smiths is collected without anything to tell them apart, the analysis is compromised because it can’t catch the difference between Joe one and Joe two. This could lead to Joe one not receiving his pension on time, or Joe two accidentally being paid Joe one’s salary—far more than he’s making as a green manager in his salad days.
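The two-Joe-Smiths problem boils down to using a name as a key. A sketch of that failure, with invented records: a lookup keyed by name silently merges the two Joes (the second record overwrites the first), while a unique employee ID keeps them apart.

```python
# Hypothetical payroll records — two distinct employees, one shared name.
records = [
    {"id": "E001", "name": "Joe Smith", "role": "senior", "salary": 95000},
    {"id": "E088", "name": "Joe Smith", "role": "manager", "salary": 52000},
]

# Wrong: a dict keyed by name — duplicate keys collide, so the second
# Joe's record silently overwrites the first Joe's.
by_name = {r["name"]: r for r in records}
print(len(by_name))  # 1 — one Joe has vanished from the books

# Right: key by the unique employee ID; both Joes survive intact.
by_id = {r["id"]: r for r in records}
print(len(by_id))  # 2
```

Nothing here errors out, which is exactly the danger: the collision is silent until someone’s pension or paycheck goes wrong.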
These are just two examples of big data going wrong on a relatively small scale, but when you take the scenario of the Joes and extrapolate the inefficiencies it represents, it becomes easier to see how things on a much larger scale could cause catastrophic error. The best way to avoid complications like the ones listed above is to have in place systems of coding that can tell the difference between Joe one and Joe two—or, on a larger scale, between a corporate car-parts warehouse in Portland, Maine and one in Portland, Oregon.
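The warehouse mix-up is the same keying mistake at a different scale, and one hedged sketch with invented inventory numbers shows the shape of the fix: indexing by city alone makes the two Portlands collide, while a composite (city, state) key disambiguates them.

```python
# Hypothetical inventory rows for two warehouses that share a city name.
shipments = [
    {"city": "Portland", "state": "ME", "part": "brake pads", "qty": 200},
    {"city": "Portland", "state": "OR", "part": "brake pads", "qty": 5000},
]

# Wrong: indexed by city alone — the second Portland overwrites the first.
by_city = {}
for s in shipments:
    by_city[s["city"]] = s["qty"]

# Right: a composite (city, state) key keeps the warehouses distinct.
by_city_state = {}
for s in shipments:
    by_city_state[(s["city"], s["state"])] = s["qty"]

print(by_city["Portland"])                # 5000 — Maine's stock has vanished
print(by_city_state[("Portland", "ME")])  # 200
print(by_city_state[("Portland", "OR")])  # 5000
```

In practice the “system of coding” would be a unique warehouse or location ID rather than a city/state pair, but the principle is the same: the key must uniquely identify the thing being counted.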
There is no one-size-fits-all solution for how big data should be gathered and managed across organizations. The main component of the solution will ultimately be driven by how seriously company executives and higher-ups take accuracy in big data, and who they put in place to manage it. Pretending it’s not necessary to hire big data analysts is likely a mistake, and not having a system in place for catching mistakes made by any data analysis team is an even bigger error. The answer? Care about collecting relevant data, care about who analyzes it, and care about communicating between analysts and executives. Otherwise, all you have is a chaotic library of information that might cause more harm than good.