One of the most difficult challenges in public relations reporting and analytics is sentiment analysis. Machines attempt textual analysis of sentiment all the time; more often than not, it goes horribly wrong.

How does it go wrong? Machines are incapable of understanding context.

Here’s why. Machines are typically programmed to look for certain keywords as proxies for sentiment. Words flagged as positive include “like,” “love,” and “enjoy”; words flagged as negative include “hate,” “angry,” and “upset.”

The problem is keyword-based sentiment detection can’t understand situations like this:

“Oh, yeah, Fast Food Restaurant. I just LOVE the 30 minute wait for my food.”

We humans understand sarcasm. We understand the sentiment of this comment is clearly negative. Yet a machine would flag it as positive, possibly even very positive because of the all-caps LOVE. How terribly wrong.
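To make the failure concrete, here is a minimal sketch of a keyword-based scorer. It is not any vendor's actual algorithm: the word lists come from the examples above, and the function name and the rule that all-caps words count double are assumptions made purely for illustration.

```python
import re

# Hypothetical word lists, borrowed from the examples in this post.
POSITIVE = {"like", "love", "enjoy"}
NEGATIVE = {"hate", "angry", "upset"}


def keyword_sentiment(text):
    """Score text by counting keyword hits.

    Assumption for illustration: an all-caps keyword counts double,
    mimicking tools that treat shouting as stronger sentiment.
    """
    score = 0
    for token in re.findall(r"[A-Za-z']+", text):
        weight = 2 if len(token) > 1 and token.isupper() else 1
        word = token.lower()
        if word in POSITIVE:
            score += weight
        elif word in NEGATIVE:
            score -= weight
    return score


post = "Oh, yeah, Fast Food Restaurant. I just LOVE the 30 minute wait for my food."
print(keyword_sentiment(post))  # prints 2: a sarcastic complaint scored as positive
```

The scorer never sees the sarcasm. It only sees one positive keyword in capital letters.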

Here’s another example. Imagine you were ABC Car Company, and your arch competitor was XYZ Motors. Suppose you saw this on Facebook:

“OMG my dad ****ing HATES XYZ. Worst service in the world. Can’t even fix simple problems. At least I drive an ABC.”

A machine sees that this post mentions ABC, flags several negative words, and classifies it as negative sentiment toward ABC. We as humans understand it’s actually positive toward ABC and negative toward the competitor. What a terrible mix-up, to classify something as negative toward your brand when it’s a positive.
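The same mix-up can be sketched in a few lines. This assumes a simplistic rule (if the post mentions the brand and contains any negative keyword, label it negative for that brand); the function name and the word list, including “hates” and “worst,” are hypothetical.

```python
import re

# Hypothetical negative keyword list for this illustration.
NEGATIVE = {"hate", "hates", "worst"}


def naive_brand_sentiment(text, brand):
    """Label a post for a brand using keywords alone.

    Returns None if the brand is not mentioned, "negative" if any
    negative keyword appears anywhere in the post, else "neutral".
    """
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    if brand.lower() not in tokens:
        return None
    hits = sum(1 for t in tokens if t in NEGATIVE)
    return "negative" if hits else "neutral"


post = ("OMG my dad ****ing HATES XYZ. Worst service in the world. "
        "Can't even fix simple problems. At least I drive an ABC.")

print(naive_brand_sentiment(post, "ABC"))  # prints negative
print(naive_brand_sentiment(post, "XYZ"))  # prints negative
```

Both brands get the same label because the rule has no idea which brand the negative words are aimed at.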

How do you solve the problem of machine-generated sentiment analysis gone awry? If you don’t have time to do an in-depth sentiment analysis, your best bet is not to report sentiment at all. Show your stakeholders the examples above to explain why sentiment is not reliable when judged solely by machines.

If you have hours or even days before reporting is due, then perform a human-driven sampled sentiment analysis. Here’s how. Using the data analysis tool of your choice, determine how much data you’re dealing with, then calculate how many samples you’ll need to examine. The goal is to select a random sample that represents the entire population.

For example, let’s say you’re looking at Twitter data about Starbucks during the last week. There are approximately 5.48 million Tweets containing the search term Starbucks over the past 7 days. Using statistical methods to calculate the appropriate sample size at a 95% confidence level, with a +/- 5% margin of error, you’d need to examine 385 randomly sampled Tweets to accurately represent the whole population. (SurveyMonkey has a handy calculator if you forgot your Intro to Stats coursework in college.)
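If you would rather check the arithmetic yourself, the 385 figure can be reproduced with the standard sample size formula for a proportion (often called Cochran's formula) plus a finite population correction. The sketch below assumes that formula, a z-score of 1.96 for 95% confidence, and the most conservative proportion of 0.5; I have not verified which formula the calculator mentioned above uses. The function names and the stand-in list of Tweet IDs are hypothetical.

```python
import math
import random


def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion.

    z: z-score for the confidence level (1.96 is roughly 95%).
    margin: margin of error as a fraction (0.05 means +/- 5%).
    p: assumed proportion; 0.5 gives the largest (safest) sample.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return math.ceil(n)


def draw_sample(items, k, seed=None):
    """Draw k items at random without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(items, k)


n = sample_size(5_480_000)
print(n)  # prints 385

# Stand-in for the real data: a list of hypothetical Tweet IDs.
tweet_ids = list(range(1, 10_001))
to_review = draw_sample(tweet_ids, n, seed=42)
print(len(to_review))  # prints 385
```

Note how little the population size matters once it is large: the required sample for millions of Tweets is essentially the same as for an unlimited population.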

Once you’ve got your randomly sampled data, start going through it and coding your sentiment manually, one Tweet at a time. A simple scheme is to assign a score of +1 to any positive Tweet, 0 to something neutral or irrelevant, and -1 to something negative.

When you’re done, you can tally up the positives, negatives, and neutrals, report on overall sentiment, and state that your sentiment analysis carries a 95% confidence level with a +/- 5% margin of error. Be sure to disclose what, in your opinion (or the opinion of whoever did the scoring), constitutes positive, neutral, and negative sentiment.
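The tally itself is simple enough to do in a spreadsheet, but here is a minimal sketch in code, assuming the +1 / 0 / -1 scoring scheme described above. The function name is hypothetical, and the scores in the example are made up purely to show the mechanics; they are not real results.

```python
from collections import Counter

LABELS = {1: "positive", 0: "neutral", -1: "negative"}


def tally(scores):
    """Summarize manually coded scores of +1, 0, and -1."""
    if not scores:
        raise ValueError("no scores to tally")
    counts = Counter(scores)
    total = len(scores)
    summary = {
        label: {"count": counts[value], "share": counts[value] / total}
        for value, label in LABELS.items()
    }
    # Net sentiment: share of positives minus share of negatives.
    summary["net"] = (counts[1] - counts[-1]) / total
    return summary


# Made-up scores for illustration only.
example_scores = [1, 0, -1, 1, 0, 0, -1, 1]
for key, value in tally(example_scores).items():
    print(key, value)
```

Reporting the three shares alongside the net figure keeps a large neutral or irrelevant group from being hidden behind a single number.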

This sort of sentiment analysis requires humans. It takes significant investments of time, people, and effort, but if you want truly accurate sentiment analysis in your gathered public opinion data, it’s the only way to go for now.