Dangers of Self-Service Predictive Scoring Models

As predictive analytics comes of age, we’re hearing a lot about data science methodologies like machine learning and data modeling. Until recently, these complex techniques were only employed by a relatively small group of data scientists. But new cloud services for machine learning from the likes of Amazon, Google and Microsoft claim to finally make it easy for any business to take advantage of the predictive revolution. As these types of solutions become common in the market, more do-it-yourself (DIY) tools will emerge for industry-specific flavors of predictive analytics in data-rich sectors like financial services, healthcare or retail, as well as in certain functional areas like predictive sales or marketing.

As a marketer and aspiring data geek myself, I find this idea energizing. However, while self-service modeling solutions represent an exciting new frontier for these markets, businesses should understand the tradeoffs before jumping in feet first. Companies might initially save some time and money by shortcutting the heavy lifting of data science, but they would be remiss to ignore the risks. Here are three key questions teams should ask themselves before putting data modeling tasks into the hands of the everyday marketer or another business user:

Do we understand all of the nuances of the data we want to model?

When I talk with data scientists, they all tell me that the hard part of their jobs is not running machine learning algorithms (that’s trivial, actually). The tricky part is curating and pruning data sets, and determining what type of model to build. Even the most enthusiastic, data-driven marketer can’t produce a reliable model without a deep understanding of the data’s definitions, how to extract, filter and match records, which fields to input and why, and how to properly weight signals or define positive outcomes. Yet these skills are second nature for any good data scientist.

How will we gain trust for the model?

Trust is more important to data science than people acknowledge. When it comes to sales and marketing teams in particular, trust is all too elusive in most companies. The common problem of misalignment has been exacerbated over the past decade by the failed promises of marketing automation. Far from uniting marketers with the sales organizations they’re hired to support, in many cases these systems wreak havoc by introducing a separate vocabulary for marketing teams that widens the divide.

Today, marketing is being disrupted as more sales development technologies emerge and sales creeps up the funnel. This is forcing marketers to reimagine their profession and find other ways to add value, but modeling isn’t the answer. Even with the predictive lead scoring that’s emerged in the past few years, it’s clear that no matter how good the data science behind the scenes might be, it is rendered useless if the sales team doesn’t trust the scores coming from marketing. To engender trust, all of the stakeholders need to understand the score, not just the model’s creator.

Where will we operationalize the model into our daily workflows?

Assuming a business ops person can learn how to wrangle their company’s data and gain trust for the model, the final piece of the puzzle is interpreting the results and using them in the business. This can be challenging no matter how accurate the resulting predictions are. The team will need to figure out how to integrate their predictive scores with the company’s technology stack and day-to-day workflows. Later, the data owner will have other important decisions to make. For example, if the business enters a new market or launches a new product line, should they build a new model to account for that, or recalibrate the existing model? It will also be crucial to incorporate feedback from across the business, and to monitor model drift (the gradual mismatch between the data a model was built on and the data it now scores) in order to know when the model needs to be retrained or debugged.
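
For readers who want to see what monitoring drift can look like in practice, here is a minimal Python sketch of one common heuristic, the population stability index (PSI), which compares how scores were distributed when the model was built with how they are distributed now. The score lists, the bucket edges and the function names are all hypothetical, and the 0.1 and 0.25 cutoffs mentioned in the comments are widely quoted rules of thumb rather than hard limits.

```python
import math

def bucket_shares(scores, edges):
    """Share of scores falling in each bucket defined by `edges`."""
    counts = [0] * (len(edges) - 1)
    for s in scores:
        # Count how many inner edges the score has reached; values outside
        # the range are clamped into the first or last bucket.
        i = sum(1 for e in edges[1:-1] if s >= e)
        counts[i] += 1
    return [c / len(scores) for c in counts]

def psi(baseline, recent, edges, floor=1e-6):
    """Population stability index between two sets of scores."""
    total = 0.0
    for b, r in zip(bucket_shares(baseline, edges), bucket_shares(recent, edges)):
        # A small floor avoids log(0) when a bucket is empty.
        b, r = max(b, floor), max(r, floor)
        total += (r - b) * math.log(r / b)
    return total

# Hypothetical lead scores on a 0-100 scale.
EDGES = [0, 25, 50, 75, 100]
scores_at_launch = [10, 20, 30, 40, 60, 70, 80, 90]
scores_this_month = [20, 30, 60, 65, 70, 80, 85, 90]

drift = psi(scores_at_launch, scores_this_month, EDGES)
# Rule of thumb (an assumption, not a standard): below 0.1 is stable,
# above 0.25 suggests the model deserves a closer look.
print(f"PSI = {drift:.3f}")
```

A check like this only says that the inputs or scores have shifted. Deciding whether to recalibrate or rebuild is still a judgment call for whoever owns the model.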

Whether a business’ use case for predictive analytics is straightforward or an edge case (as most are), it can be very challenging to design a statistically sound model that’s tailored to its needs. A common and often overlooked modeling pitfall is over-fitting, where a model learns the noise in its historical data along with the real pattern, so its predictions look great on past records but can point users in the wrong direction on new ones. I like this analogy from William Chen, a data scientist at Quora, about listening to a symphony with a super sensitive hearing aid: “You hear your neighbors shuffling in their seats, the musicians turning their pages, and even the swishing of the conductor’s coat jacket… Fitting a perfect model is only listening to the symphony. Over-fitting is when you hear more noise than you need to, or worse, letting the noise drown out the symphony.” Business teams have to ask themselves whether it’s worth taking that kind of risk with the data they use to run their business.
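
To make the over-fitting risk concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the lead records are made up by hand, and the two "models" (one that memorizes every past lead, one that learns a single cutoff) are deliberately simplistic stand-ins, not anyone's production method.

```python
# Hypothetical leads: (engagement_score, converted). The real pattern is that
# higher scores tend to convert; the leads at 30 and 70 are noisy exceptions.
TRAIN = [(10, 0), (20, 0), (30, 1), (40, 0), (45, 0),
         (55, 1), (60, 1), (70, 0), (80, 1), (90, 1)]
# New leads the models have never seen.
TEST = [(15, 0), (24, 0), (32, 0), (48, 0),
        (52, 1), (66, 1), (72, 1), (85, 1)]

def fit_memorizer(train):
    """Over-fit model: copy the outcome of the closest past lead."""
    def predict(x):
        nearest = min(train, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict

def fit_threshold(train):
    """Simple model: pick the single score cutoff that best fits the past."""
    def hits(t):
        return sum((x >= t) == bool(y) for x, y in train)
    best = max((x for x, _ in train), key=hits)
    return lambda x: int(x >= best)

def accuracy(predict, rows):
    """Fraction of rows where the prediction matches the outcome."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

memorizer = fit_memorizer(TRAIN)
threshold = fit_threshold(TRAIN)

for name, model in [("memorizer", memorizer), ("threshold", threshold)]:
    print(name, "past:", accuracy(model, TRAIN), "new:", accuracy(model, TEST))
```

On this toy data the memorizer is perfect on past leads (1.0) but drops to 0.625 on new ones, because it faithfully reproduces the two noisy records. The single-cutoff model scores only 0.8 on the past yet reaches 0.875 on the new leads. That gap between past and new performance is the thing to watch for.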

Many predictive vendors today are able to get new models up and running in a couple of days or weeks, and the value of investing in their expertise and platform can be huge. While the concept of self-service modeling sounds promising, at the end of the day I don’t think people really want (or are ready for) it. I think we all just want more control. And the good news is that there are more and more new sales and marketing technologies that deliver that coveted control to the data-driven marketer, through features such as advanced profiling and smart lists. I’d argue that it’s more important for business folks to gain access, manageability, insight and control over all of their internal and external data signals than it is to enjoy bragging rights from tinkering around with predictive analytics.