An article by “Dean of Big Data” Bill Schmarzo, The Mid-market Big Data Call to Action, provides a helpful quick take on the state of big data uptake, contrasting perceived experiences at smaller and larger organizations.

https://upload.wikimedia.org/wikipedia/commons/7/74/Big_Bang_Data_exhibit_at_CCCB_17.JPG

Big Bang Data exhibit at CCCB. Photo by Kippelboy.

Bill presents certain truths that are independent of big data — “It is easier for smaller organizations to drive cross-organizational collaboration and sharing” and “Smaller organizations have a better focus on delivering business results,” for example — yet also illuminating are certain implicit assumptions, points we should be challenging.

In the spirit of friendly discussion, I’ll offer a few data truths I see that are contrary to the article’s premises:

  1. It’s no longer acceptable to equate big data and Hadoop. (Not that I think it ever was.)

    Bill conflates the two in the conclusion he draws from responses to a webinar poll question. He asks, “Where are you in the process of integrating big data with your existing data warehouse environment?” — nothing about Hadoop there — and then concludes, citing poll results, “Over 80% of the attendees still do NOT have any meaningful Hadoop plans.”

    Nowadays, translated into particular technologies, big data means Hadoop, Spark, and Kafka — plus other technologies in their ecosystems of course — or non-Apache software with similar volume, velocity, and variety handling capabilities.

    And yes, I’m with Bill: There are still only 3 defining Vs for big data.

  2. Hadoop and other big data technologies can and do exist outside the data warehouse environment.

    I note that Bill concludes that the 80% who answered the above question with either “in early discussions” or “no plans/don’t know” are not using Hadoop, as if they couldn’t be using it separately from the existing data warehouse. But I also wonder about those “don’t know” responses. Business users and managers focus on the user interface, whether graphical or a query language. Particularly if you’re using SQL on Hadoop (via Apache Hive or numerous other options), SQL being the traditional data warehousing query language, you may be unaware of your use of Hadoop in your DW environment.

  3. Smaller organizations don’t necessarily face less-significant agility obstacles. It’s not absolute quantity or size that matters, it’s the number and seriousness of obstacles relative to an organization’s size.

    Take the statement, “Smaller organizations have a smaller number of HIPPOs with which to deal.” HIPPO = Highest Paid Person’s Opinion. Not a scientific sampling, but I can tell you that among my consulting clients, in companies with an employee count ranging from a handful to a few hundred, what the CEO says holds absolute sway regardless of the opinion’s technical soundness. The smaller an organization, the greater the immediate impact of any one individual’s opinion, good or bad.

    And “it is easier for small organizations to institute the organizational and cultural change necessary to actually act on the analytic insights.” Not at all. When small organizations institute significant organizational and cultural changes, they may be remaking the whole company. They’re all-in.

    Regardless, I find that small organizations are actually LESS likely than larger organizations to act on analytical insights. That’s both because they have less data to work with, so there’s less to fuel analyses, and because they’re much closer to the market and more reliant on insights derived from qualitative observation and direct market interactions.

  4. Data silos are not a big data killer. Done right, they’re simply a waypoint.

    Data silos can be an efficiency booster! When you have an operational task to accomplish, you purpose-design a data store that’s optimal for the task. Certainly, design according to standards that facilitate data integration or, at least, data exchange. Design foresight will ensure data can flow from silos into whatever integrated big data environment you implement. But if you elevate secondary data-use possibilities to the first tier — and especially if you do that in the name of a faddish concept like big data — you create risk and potential delay and performance compromise.

Finally, beware of selection bias, of drawing broad conclusions from a narrow sampling of a target population. People who attend a vendor-produced webinar on data management and analysis, if not actively shopping for capabilities they don’t have, likely feel their organizations fall short of the state of the art. So the good wisdom about organizational dynamics, data warehousing, and big data that emerged in Bill’s article is surely a testament to the discernment he has gained via long and deep exposure to the topics, not just insights that jumped out of the numbers. Data is a springboard for insight and not a replacement for informed judgment.
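
For readers who like to see the arithmetic, here is a minimal Python sketch of how selection bias can distort a poll result. Every number in it is invented for illustration; none comes from Bill’s poll or from any real survey. The `Segment` class and the attendance counts are hypothetical, and the only assumption that matters is that organizations without big data plans are more likely to show up at a vendor webinar than those that already have them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A slice of a hypothetical population of organizations."""
    label: str
    has_plans: bool      # does this segment have big data plans?
    organizations: int   # made-up count in the whole population
    attendees: int       # made-up count that attend the webinar


# Invented numbers: organizations without plans are assumed to be
# four times as likely to attend a vendor webinar.
SEGMENTS = [
    Segment("has plans", True, organizations=500, attendees=10),
    Segment("no plans", False, organizations=500, attendees=40),
]


def share_without_plans(segments, count_field):
    """Fraction of organizations lacking plans, weighted by count_field."""
    total = sum(getattr(s, count_field) for s in segments)
    without = sum(getattr(s, count_field) for s in segments if not s.has_plans)
    return without / total


population_share = share_without_plans(SEGMENTS, "organizations")
poll_share = share_without_plans(SEGMENTS, "attendees")

print(f"Without plans, whole population: {population_share:.0%}")
print(f"Without plans, webinar attendees: {poll_share:.0%}")
```

With these assumed inputs, half of the made-up population lacks plans, yet the webinar poll reports 80%. I chose the counts to echo the figure in the article, not to claim this is what actually happened; the point is only that who shows up can move the headline number a long way.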