Make no mistake: the Big Data movement has done more than just introduce a tidal wave of aggressive marketing campaigns. While it’s true that the “Big” in Big Data is an increasingly ambiguous term – as others have recently noted, someday we’ll just call it “data” again – there’s no denying that it has fundamentally changed the manner by which organizations today make decisions. Beyond fundamentals, this movement has significantly increased the demand for two relatively new and critical organizational roles: Data Scientists and Data Explorers.
Defining the Roles
The Data Scientist role is key because the characteristics of Big Data (volume, variety and velocity) require professional data management, data mining and modeling expertise, with a special emphasis on the richest analysis types (including statistical and predictive). The Data Scientist also needs new skills gained from experience working with the modern, multi-structured data types common in a Big Data solution. These new skills can be acquired most easily by attending vendor training (an example is IBM’s Big Data University, which offers a wide variety of online courses to learn all about managing, processing and analyzing Big Data) and by just rolling up the sleeves and building a pilot project that uses some of the newest Big Data technologies.
The next most important role is called – at least at Jaspersoft – the Data Explorer. This term is broad and spans several analytic skill levels, but is meant to describe anyone in a business function who needs to put Big Data to work to make better business decisions. The Data Explorer brings critical business domain knowledge to the table, knowledge without which gaining new insight from Big Data simply isn’t possible. The combination of the Data Scientist and the Data Explorer creates a new level of unity between business analysis and IT. In this sense, Big Data is the driver of this newfound unity.
Bridging the IT-Business Gap
The general division of labor between the Data Scientist and the Data Explorer is becoming better defined. The Data Scientist models and analyzes the data in its richest and most timely forms (admittedly in a wide variety of ways). In this sense, the Data Scientist may not even know the relevant questions to ask of the data before the analysis begins; rather, some of the most valuable discoveries may be those questions themselves. The Data Explorer is most interested in iterative discovery, probably on more constrained data sets, which are better suited to making specific data-driven business decisions. Data exploration is, therefore, typically the better fit for answering pre-defined business questions.
Once the data is captured from its origin (e.g., a live set of web click-stream data) and managed into a Big Data source (such as Apache HBase, which runs on Hadoop, or Apache Cassandra), the Data Scientist and Data Explorer can get to work. The Data Scientist may help prepare the data for use where possible (a process that commonly requires some use of Hadoop MapReduce or a traditional ETL tool). When Big Data is being directly accessed and used natively, as is often the case with Jaspersoft, the Data Scientist would probably validate that a robust and useful data connection has been established. At this point, the Data Explorer is in a good position to use an analytic or reporting tool to access, probe and analyze the data.
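To make the preparation step a little more concrete, here is a minimal sketch of the map-and-reduce pattern applied to click-stream data. It is plain Python rather than an actual Hadoop job, and the sample records, field layout and function names (`map_click`, `shuffle`, `reduce_page`, `prepare`) are all hypothetical, chosen only for illustration. The idea is simply that raw clicks are turned into a smaller, per-page summary that a Data Explorer could then probe with a reporting tool.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw click-stream records: (user_id, page, seconds_on_page).
RAW_CLICKS = [
    ("u1", "/home", 12),
    ("u2", "/home", 5),
    ("u1", "/pricing", 40),
    ("u3", "/pricing", 25),
    ("u2", "/home", 7),
]


def map_click(record):
    """Map step: emit one (page, (view_count, seconds)) pair per click."""
    _user, page, seconds = record
    yield page, (1, seconds)


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the page)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_page(page, values):
    """Reduce step: collapse one page's values into summary metrics."""
    views = sum(count for count, _ in values)
    total_seconds = sum(seconds for _, seconds in values)
    return page, {"views": views, "avg_seconds": total_seconds / views}


def prepare(records):
    """Run map, shuffle and reduce in sequence over the raw records."""
    mapped = chain.from_iterable(map_click(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_page(page, vals) for page, vals in grouped.items())


summary = prepare(RAW_CLICKS)
for page, metrics in sorted(summary.items()):
    print(page, metrics)
```

In a real deployment the same three steps would be distributed across a cluster and the input would come from the Big Data source itself; the sketch only shows the shape of the work that turns raw events into a constrained data set suited to exploration.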
The Big Data Skills Shortage
Modern analytic and reporting tools designed for working with Big Data are quickly becoming quite powerful and easy to use, even for the Data Explorer. While most articles and discussions focus on the skills shortage among Data Scientists (and that shortage is real, in my estimation), what isn’t talked about enough is the skills shortage among Data Explorers. By this I mean that EVERY business person MUST possess sound analytic skills in order to thrive in this new, information-driven economy.
Many of those in business functions today do not possess an adequate analytic skill set. And so I think this “volume skills shortage” will soon be seen as the bigger overall problem to solve. Ideally, colleges and universities would more commonly offer degrees and certificates in “Analytics” or “Information-based Decision Making” – or something along these lines – so that a much larger number of graduates with a reasonable level of analytic acumen become available.
The Big Data Change Agent: Open Source Software
Lastly, I am proud that open source software has become such an important change agent during this past decade. It has provided an unbelievably affordable, powerful, secure and modern foundation for a completely new IT infrastructure (cloud-based, scale-out, mobile-connected) and has enabled affordable access to, and use of, these new Big Data types. In each major area and layer of software, we find open source leading in features, functions, and breadth of use. In fact, the continuing maturation of open source cloud and Big Data software systems is transforming the modern computing landscape right before our eyes. No wonder that nearly all of the most important Big Data projects have come from the open source community and the democratized skills it has nourished.
Ultimately, it’s this democratization that will allow more people in more organizations to thrive in an increasingly competitive, information-driven economy. It is clear that we are all now competing on the basis of time and information. Big Data and open source technologies are allowing nearly any organization, regardless of size, to compete and succeed on this new battleground. Developing more Data Scientists and Data Explorers is now required to sustain the growth and success of this new Big Data era.