“Data scientist” is a popular role these days. Everyone seems to have one — or claims to be one.
But how do executives know whether they have the “real thing” and how that talent is best employed?
I met with Kaiser Fung, leader of the Applied Analytics program at Columbia University and co-founder of a new school for data scientists called RSquare Edge, to discuss this question. Here, I offer our perspective on how corporations can capture true value from data science talent.
Define What a Data Scientist Means to Your Organization
To start, the term “data scientist” has been diluted, and companies don’t always get what they paid for, so to speak. There’s a difference between operators of software who appear to find real data relationships and the people who can truly validate those relationships.
“Data science” can be defined as using data and quantitative or scientific methods to help solve business problems. But even the use of the word “science” is tricky, because many problems are quantitative and require data without being scientific. A good example is Google’s PageRank algorithm, which treats a web page’s rank as a measure of its authority, a quality that cannot be scientifically determined.
Know What “Data Scientists” Call Themselves
Real data scientists (i.e., the most capable, but also most expensive, resources) are professionals who sometimes have data or computer science degrees and related work experience and who apply machine-learning methodologies, data mining, and statistical models. They typically work in technology companies, especially startups, that build web or mobile applications and must consider how to incorporate data into them.
Data analysts are somewhat less technical but have much broader training and are closer, and more accountable, to the business unit. They report on, track, and explain business metrics. This may include measuring marketing campaigns and understanding the effectiveness and efficiency of operational expenses. Their educational background is usually in economics, statistics, engineering, or another quantitative discipline.
Data infrastructure engineers build and manage databases. This includes monitoring data quality and possible issues of privacy and security. They make sure the data are available, encoded properly, and of the right quality (that is, validated) so that the team manipulates and analyzes accurate data.
Recognize Advanced Qualifications in a Data Scientist
An experienced and well-educated data scientist typically has a holistic point of view on data, centered on figuring out what kinds of data the business should be collecting, understanding the quality of the data, and reporting on business metrics from a backward-looking, or descriptive, view.
A data scientist not only analyzes the trends, but also takes the time to discover the causes. Data scientists conduct “what if” scenarios with the end goal of defining actions that influence the company’s objectives. For example, by running experiments aimed at improving business metrics, informed by observations about the past, a qualified data scientist can predict an outcome.
Ideally, a strong data scientist leads the smart design of these “what if” scenarios, which can then be predictively tested. This type of advanced predictive analysis is more complex than data quality and integrity know-how.
Someone who has experience with data mining and regression models, application of statistical methodologies, and machine learning can use past outcomes to define the actions an organization should take. In a sense, the data scientist builds a framework on the past, present, and future of the data they are responsible for.
Software development skills are also paramount, as they are especially helpful in the predictive stage. Beyond that, deep experience working with code and a general willingness to use whatever code solves the problem most effectively are vital attributes.
Top-notch data scientists are comfortable working with continuously updated algorithms and large amounts of data that drive action in near real time.
Create an Integrative Environment for Data Science
A company should be willing to foster an environment where the data scientist can integrate into the organization to help influence and drive business decisions. In Fung’s view, “The biggest thing about attracting analytics talent is really the cultivation of a ‘moneyball’ culture within the company.”
One of the major frustrations with attracting the right talent relates to whether the organization is willing to use the results derived from data as important decision-making factors. There is no question that most data scientists are very smart, motivated, and driven to do good work, but their counterparts need to appreciate the value they bring to the organization. Often, a data scientist’s biggest frustration is the inability to make a real impact on the business.
It’s a classic “moneyball” scenario: the contest between baseball scouts who rely on gut feelings and instincts and data analysts who rely on facts and scrutiny. When analysts find that the “scouts” are really in power and that they themselves are just generating reports that collect dust in the corner, it’s discouraging. The key question is, how well is the data team integrated into the organization?
Companies that spend time agonizing over the decision of whether they are getting a true “data scientist” to help improve performance should understand the ideal makeup of an integrative team and consider whether the culture is supportive enough to retain it.
It takes leadership from individuals already inside the firm to bring on the right data team and make this happen.