
We chant “quality content” as if it is some kind of digital marketing mantra, yet what does it really mean? How do we advise authors on exactly what makes content better for your specific customers?

Recently, we took an initial look at this dilemma by asking whether you should measure content quality with intrinsic characteristics (lack of grammar errors, appropriate grade level, etc.) or extrinsic characteristics (clicks, conversions, etc.). Most of you would probably like to say, “both,” and today we’ll explore how you can pull that off using machine learning platforms, such as WEKA or MALLET. The secret isn’t the software; it’s the trained data scientist who can identify the right features to test. With the right features selected, and enough data available, the machine can detect patterns that can be used to improve future results.

Machine learning starts by amassing training data that contains the selected features and is labeled with a known set of outcomes. For the problem that we are discussing, the outcome might be whether a digital session resulted in a conversion or not.

So, let’s look at an example of what could be done. Perhaps we could amass training data around sessions on your website that either do or do not result in a conversion. For a particular product you sell, we could grab all the session data that results in a trial of the software. From those sessions, we could extract all of the URLs visited. From there, we could look at all the sessions that contained those URLs, resulting in a training set that covers a subset of the URLs on your website, along with the outcome of each session: conversion or no conversion.
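
To make those steps concrete, here is a minimal Python sketch. The `Session` record, the `build_training_set` helper, and the sample URLs are all hypothetical names invented for illustration; your analytics export will look different.

```python
from typing import List, NamedTuple, Set, Tuple


class Session(NamedTuple):
    """Hypothetical session record; real analytics exports will differ."""
    session_id: str
    urls: Tuple[str, ...]
    converted: bool  # True if the session ended in a software trial


def build_training_set(sessions: List[Session]):
    # Step 1: collect every URL seen in at least one converting session.
    candidate_urls: Set[str] = {
        url for s in sessions if s.converted for url in s.urls
    }
    # Step 2: keep every session (converting or not) that touched one of
    # those URLs, restricted to the candidate URLs, with its outcome.
    rows = []
    for s in sessions:
        visited = [u for u in s.urls if u in candidate_urls]
        if visited:
            rows.append((s.session_id, visited, s.converted))
    return candidate_urls, rows


# Invented sample data, for illustration only.
sessions = [
    Session("s1", ("/home", "/product", "/trial"), True),
    Session("s2", ("/home", "/blog"), False),
    Session("s3", ("/careers",), False),
    Session("s4", ("/product", "/pricing"), False),
]
candidate_urls, training_rows = build_training_set(sessions)
for row in training_rows:
    print(row)
```

Session `s3` drops out because it never touched a page that appears in a converting session, which is the "subset of URLs" narrowing described above.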

Using conventional analysis, we could identify which pages are more associated with conversions than others. We could calculate the chance of a conversion for each page, based on the number of page views of each page in conversion vs. non-conversion sessions. We could also see which pages resulted in the most conversions vs. which pages resulted in the most non-conversions.
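
As a rough sketch of that conventional analysis, assuming each session has been reduced to a list of viewed URLs plus a converted flag, the per-page numbers could be tallied like this. The function name, the dictionary keys, and the sample data are assumptions made for the example.

```python
from collections import Counter


def page_conversion_stats(rows):
    """rows: iterable of (urls, converted) pairs, one pair per session."""
    views_converting = Counter()
    views_other = Counter()
    for urls, converted in rows:
        # Count each page view under the outcome of its session.
        (views_converting if converted else views_other).update(urls)

    stats = {}
    for url in set(views_converting) | set(views_other):
        conv = views_converting[url]
        other = views_other[url]
        stats[url] = {
            "views_in_converting": conv,
            "views_in_other": other,
            # Share of this page's views that came from converting sessions.
            "rate": conv / (conv + other),
        }
    return stats


# Invented sample data, for illustration only.
rows = [
    (["/home", "/product", "/trial"], True),
    (["/home", "/product"], True),
    (["/home"], False),
    (["/product", "/pricing"], False),
    (["/home", "/pricing"], False),
]
stats = page_conversion_stats(rows)
by_rate = sorted(stats, key=lambda u: stats[u]["rate"], reverse=True)
print(by_rate)
```

Sorting by `rate` gives the view by rate, while sorting by `views_in_converting` would give the view in aggregate.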

So, that can help us identify the most valuable pages both in terms of rate and in aggregate. But it doesn’t tell us what it is about those pages that makes them so valuable, which means that we are just guessing if we try to make more pages like those. Like those in which ways?

That’s where machine learning comes in. Data scientists can extract features out of those pages, such as:

  • Number of words
  • Length of the URL
  • Reading grade level of the content
  • Percentage of product name mentions in text
  • Percentage of page that is in bullets
  • Number of pictures
  • Number of videos
  • Percentage of grammatical and spelling errors
  • Number of broken links or images
  • Voice (first, second, or third person)
  • Percent of passive voice sentences
  • Length of sentences
  • Percent of sentences with prepositional phrases
  • … you get the idea
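
A few of the simpler features in that list can be pulled from plain page text with nothing more than regular expressions. The sketch below is one possible approach, not a production extractor: `extract_features` and its dictionary keys are made-up names, it assumes a single-word product name, and its word and sentence splitting is deliberately crude. Features such as reading grade level or passive voice would need real language-processing tools.

```python
import re


def extract_features(url, text, product_name):
    """Crude extraction of a few page features from plain text."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Treat runs of ., ! or ? as sentence boundaries (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lines = [ln for ln in text.splitlines() if ln.strip()]
    bullets = [ln for ln in lines if ln.lstrip().startswith(("\u2022", "-", "*"))]
    # Assumes the product name is a single word.
    mentions = sum(1 for w in words if w.lower() == product_name.lower())
    return {
        "word_count": len(words),
        "url_length": len(url),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "product_mention_pct": 100 * mentions / len(words) if words else 0.0,
        "bullet_line_pct": 100 * len(bullets) / len(lines) if lines else 0.0,
    }


# Invented page text, for illustration only.
page_text = "Acme saves time. Try Acme today!\n\u2022 Fast setup\n\u2022 Free trial"
features = extract_features("/product/acme", page_text, "Acme")
print(features)
```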

Just about anything you can think of can be extracted from the page (with enough time and money), and you can examine the relationship between pages that have that feature and conversions. That can guide your page creation standards, so that you apply these better-performing practices when you create new pages and update old ones.
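
One very simple way to examine that relationship is to correlate each feature with each page's conversion rate. To be clear, this is a stand-in for illustration: a plain Pearson correlation is not what WEKA or MALLET would do, and a data scientist would more likely train a classifier. The feature names and numbers below are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Invented per-page data, for illustration only.
pages = [
    {"word_count": 200, "image_count": 1, "video_count": 0, "conversion_rate": 0.08},
    {"word_count": 400, "image_count": 1, "video_count": 1, "conversion_rate": 0.06},
    {"word_count": 600, "image_count": 1, "video_count": 0, "conversion_rate": 0.04},
    {"word_count": 800, "image_count": 1, "video_count": 1, "conversion_rate": 0.02},
]
rates = [p["conversion_rate"] for p in pages]
correlations = {
    name: pearson([p[name] for p in pages], rates)
    for name in ("word_count", "image_count", "video_count")
}
# Rank features by strength of association, regardless of direction.
ranked = sorted(correlations, key=lambda f: abs(correlations[f]), reverse=True)
print(ranked)
```

In this toy data, longer pages convert worse, so `word_count` ranks first with a strongly negative correlation, while `image_count` never varies and so tells us nothing.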

This example just used conversions, but if you have a buyer journey that allows you to detect progress toward conversions, you might be able to conduct even better machine learning experiments. The real purpose of each page is to move the visitor to the next step of the journey, so measuring that progress might reveal different insights than conversions alone.

You can also segment your visitors by various personas, and see if different personas are persuaded by different page features. You can segment by step of the buyer journey to see if different features are more persuasive at different steps.
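
Segmenting can be as simple as grouping sessions before computing the rate. The sketch below assumes each session has already been tagged with a persona and with whether the pages viewed had some feature (here, a video); the persona labels, the function name, and the data are all hypothetical. Swapping the persona for a buyer journey step would give the second kind of segmentation.

```python
from collections import defaultdict


def conversion_rate_by_segment(rows):
    """rows: iterable of (segment, has_feature, converted) triples."""
    totals = defaultdict(lambda: [0, 0])  # key -> [conversions, sessions]
    for segment, has_feature, converted in rows:
        bucket = totals[(segment, has_feature)]
        bucket[0] += int(converted)
        bucket[1] += 1
    return {key: conv / n for key, (conv, n) in totals.items()}


# Invented sessions: (persona, page had a video, session converted).
rows = [
    ("developer", True, False),
    ("developer", True, False),
    ("developer", False, True),
    ("developer", False, False),
    ("executive", True, True),
    ("executive", True, True),
    ("executive", False, False),
    ("executive", False, True),
]
segment_rates = conversion_rate_by_segment(rows)
for key in sorted(segment_rates):
    print(key, segment_rates[key])
```

In this made-up data, video helps with one persona and hurts with the other, which is exactly the kind of difference that a single site-wide number would hide.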

Understand that the most important thing is to end the argument about intrinsic vs. extrinsic measurements. Machine learning can bridge the gap.