Text Analytics: Why Aren’t We Building Understanding?

I spent the week in San Francisco working with some of the biggest companies in the US on social media analytics. My co-author, Mark Eduljee of Microsoft, and I presented a keynote on driving business gain with text analytics. We also attended some of the other sessions dealing with text analytics. When we debriefed over dinner after the first night, we were both astounded by how much effort companies spend measuring what is being said about their brands in social media, and by how little true understanding of these social conversations they come away with.

I have to say, most of the folks attending the conference were BI (Business Intelligence) analysts, so they likely enjoyed hearing about different tools to assess sentiment and the various NLP (Natural Language Processing) algorithms out there. Mark and I come from different backgrounds, however, and we both found efforts to categorize social utterances somewhat anemic in terms of providing the true understanding needed to guide business decisions.

Use all the data

That used to be the rule when trying to generate insights for business, and it’s still true in certain areas. For instance, financial records really do require all the data generated through transactions, and TQM (total quality management) processes still require businesses to monitor each product as it moves through manufacturing to determine when the process exceeds tolerances.

And, arguably, some elements of social media analytics require businesses to assess all the data they can get their hands on. But most analysis is much more effective if businesses collect a representative sample of the data and analyze it more completely, rather than using machine scoring to get fewer insights on ALL the data.
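Here's a minimal sketch of what pulling a sample might look like, assuming your posts are already collected in a Python list. The function name `sample_posts` and the example posts are made up for illustration, and a simple random sample is only one way to aim for representativeness; you may need to stratify by network or time period.

```python
import random

def sample_posts(posts, k, seed=0):
    """Return a simple random sample of k posts (hypothetical helper).

    A fixed seed makes the sample repeatable, so a second analyst
    can review exactly the same posts.
    """
    if k >= len(posts):
        return list(posts)          # nothing to sample, keep everything
    rng = random.Random(seed)       # local generator, does not touch global state
    return rng.sample(posts, k)     # sampling without replacement

# Made-up stand-ins for collected social posts.
all_posts = [f"post {i}" for i in range(10_000)]
to_read_closely = sample_posts(all_posts, 200)
print(len(to_read_closely))
```

The point of the sketch is the trade: 200 posts a person can actually read and interpret, instead of 10,000 posts a machine scores as positive or negative.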

Problems with measuring

Several problems crop up when you try to analyze all the data, which is what I call measuring, as opposed to building understanding.

Data capture

The first problem is data capture. In working on my social media analytics book, I discovered that most tools are only about 65% accurate in collecting data in the first place. Many can’t handle blogs or other macro sites, including news sites. And, almost none can do anything with images. Curalate is an exception, but it ONLY handles image data. Tools also don’t capture everything posted on social networks.


Several presenters talked about the 3 V’s of data — variety, velocity, and volume — which create analysis problems. I like to talk about veracity as a major problem, especially in analyzing text data. IS the data true? And, is your interpretation true?

One problem contributing to veracity issues is the halo effect. If someone is truly unhappy with your company, what they share is totally negative, even if some aspects of your business are actually OK. And, they might not even be customers, but disgruntled employees or suppliers.

Another problem contributing to the veracity issues is the complexity and variety within the language — and that’s just English. Then you have a bunch of other languages to contend with. For instance, just categorizing an utterance as positive or negative is complicated. Take the statement:

That’s the bomb

A computer program might categorize that as a negative statement, but most would agree it’s pretty positive.
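To see how that can happen, here's a toy sketch of word-list scoring. The word lists and the function name `naive_sentiment` are invented for illustration; this isn't how any particular tool works, just the simplest version of the idea.

```python
# Hypothetical toy lexicons; real tools use far larger word lists.
POSITIVE_WORDS = {"great", "love", "awesome", "excellent"}
NEGATIVE_WORDS = {"bomb", "terrible", "awful", "broken"}

def naive_sentiment(utterance: str) -> str:
    """Label an utterance by counting lexicon hits, ignoring slang and context."""
    words = [w.strip(".,!?\"'").lower() for w in utterance.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(naive_sentiment("That's the bomb"))  # prints "negative"
```

The scorer sees "bomb" on its negative list and stops there. It has no idea the phrase is slang for something great.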

Next, you have different vocabularies, even within the same language. My students speak a totally different language than I do, for instance. One called her blog Nothin’ Huff, and I didn’t realize until she explained it in her strategy that huff means something that isn’t good. Sure, you can program in stuff like that, but it changes all the time.


Most of the tools I’ve researched for analyzing textual data simply categorize utterances as positive, negative, or neutral. But, people have a wide range of feelings and emotions about products that range from very positive to very negative. Existing tools miss the valence of utterances when doing text analytics.
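A small sketch makes the gap concrete. The intensity weights below are made up, as are the function names `valence` and `three_class`; the only point is to compare a three-bucket label with a graded score on the same text.

```python
# Hypothetical intensity weights on a -2..+2 scale; words and numbers are invented.
VALENCE = {"love": 2.0, "like": 1.0, "fine": 0.5, "dislike": -1.0, "hate": -2.0}

def valence(utterance: str) -> float:
    """Average the weights of any known words; 0.0 when none are found."""
    words = (w.strip(".,!?").lower() for w in utterance.split())
    hits = [VALENCE[w] for w in words if w in VALENCE]
    return sum(hits) / len(hits) if hits else 0.0

def three_class(utterance: str) -> str:
    """Collapse the graded score into positive / negative / neutral."""
    v = valence(utterance)
    return "positive" if v > 0 else "negative" if v < 0 else "neutral"

for text in ("I love this phone", "I like this phone"):
    print(text, "->", three_class(text), valence(text))
```

Both sentences land in the "positive" bucket, yet one customer loves the product and the other merely likes it. The bucket throws that difference away.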


Text analytics tools tend to evaluate text at the partial sentence or sentence level. I did hear one presentation where they were coding a tool to look at a sentence or slightly more and use different parts of the sentence to score the overall sentiment of the utterance. So, negatives were reversed by positives and maybe reversed back again by negatives.

Very cool, but does that really build the understanding you need to create better market performance?

What you really need is an understanding of meaning based on the ENTIRE conversation.
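To show what sentence-level scoring with reversals looks like, and where it stops, here's a rough sketch. It is not the presenter's tool, which I can't reproduce; it's a simple negation-flip scorer with invented word lists and a made-up function name, `score_sentence`.

```python
# Hypothetical word lists for illustration only.
POLARITY = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}
NEGATORS = {"not", "never", "no", "isn't", "don't"}

def score_sentence(sentence: str) -> int:
    """Sum word polarities, flipping the sign of a word that follows a negator.

    Two negators in a row cancel out, so the polarity is reversed back again.
    """
    total = 0
    flip = False
    for raw in sentence.split():
        word = raw.strip(".,!?").lower()
        if word in NEGATORS:
            flip = not flip           # each negator toggles the pending reversal
            continue
        if word in POLARITY:
            value = POLARITY[word]
            total += -value if flip else value
            flip = False              # the reversal applies to one sentiment word
    return total

print(score_sentence("This is not bad"))   # prints 1
print(score_sentence("This is not good"))  # prints -1
```

Each sentence is scored on its own. Nothing here knows what was said earlier in the thread, who is replying to whom, or what the complaint was actually about.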

In sum, existing tools are really good at answering how much, how many, how often, when, and where kinds of questions hidden within your textual data.

How many folks “like” your brand?

How do they move through your conversion funnel?

Where did they come from?

But, they’re not very good at addressing why, how, and what questions.

Why aren’t consumers buying my brand?

What problems do they face in using my brand?

What features do they like the best? The least?

How do they use my brand in their daily lives?

Building understanding

Solving all these problems still doesn’t build understanding. Building understanding requires a more holistic interpretation of what consumers think and feel about your brand. But, I’ll leave this topic for another day. Stay tuned.
