The Internet is among the most transformative technologies of our lifetime. We can now connect instantly with others across the globe, send and receive money in seconds, find information about anything under the sun, and conduct business in ways that weren’t possible before. With all its glory, however, also comes a darker side. With near-infinite information sharing, minimal regulation, and unlimited access, the internet can also serve those with malicious intent.

With governing bodies often behind the times, keeping the internet a safe place often falls on the shoulders of the businesses that operate there. With reputations, and in some cases the bottom line, at stake, it’s not an option to hope bad actors won’t attack or will simply disappear. Fortunately, technology has evolved to help keep tabs on what’s being said through the internet megaphone that could be detrimental to a business, or worse. It’s a matter of finding the right tools for the job.

As you can imagine, with so much data and so many information sources, detecting intent and emotion in free text is no easy feat. Fortunately, natural language processing (NLP) is built for this. Sentiment analysis, also called opinion mining, uses NLP and machine learning to interpret and classify emotions in subjective data. Businesses often use it to detect sentiment in social media, gauge brand reputation, and understand customers, from polarity (positive, negative, neutral) to emotion detection (angry, happy, sad, or afraid).
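To make the polarity idea concrete, here’s a deliberately simple sketch: a toy word-list approach, not any particular library’s method. Real systems rely on trained models rather than hand-written lexicons, but the input/output shape is the same: text in, polarity label out.

```python
# Toy polarity classifier: a hypothetical lexicon-based sketch for
# illustration only. Production sentiment analysis uses trained models,
# not hand-curated word lists.
POSITIVE = {"great", "love", "excellent", "happy", "good"}
NEGATIVE = {"terrible", "hate", "awful", "sad", "bad"}

def polarity(text: str) -> str:
    """Classify text as positive, negative, or neutral by lexicon hits."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this, the food was great!"))   # positive
print(polarity("Terrible service, I hate waiting."))  # negative
```

A real model would also handle negation (“not great”), context, and misspellings, which is exactly where the lexicon approach breaks down and machine learning takes over.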

NLP has the power to help businesses sort through an otherwise impossible amount of data and, in most cases, identify the motive behind the words. This matters all the more given the complexities of human language: context, formal and informal registers, and misspellings (sometimes intentional, to mean something different from the original word) all make accuracy critical, especially when there are real consequences. Social media has been used to incite violence, discriminate, bully, shame, and harass.

Even in less extreme cases, many NLP tools available today can automatically detect the tone of text and tackle more difficult tasks, like aspect-based sentiment analysis of whole documents (for example, what someone thought about a restaurant’s food versus its service or price) and flagging text that seems sarcastic, fake, or includes toxic language such as threats, insults, obscenity, or hate speech.

Take cyberbullying, for example. More than 40% of adults have personally experienced some form of online harassment, and 75% have seen cyberbullying occur. That number jumps to 85% among young people, making it clear why it’s important to keep tabs on this. John Snow Labs’ Spark NLP library includes a trainable multi-class text classification model that uses state-of-the-art universal sentence embeddings as input for text classification. The document classifier uses a deep learning model and supports up to 100 classes. Pre-trained models freely available with the open-source library include detectors for cyberbullying, racism, sexism, and threats in tweets.
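Assuming Spark NLP is installed alongside PySpark, a pipeline along these lines wires the universal sentence embeddings into the document classifier. The specific model names (`tfhub_use`, `classifierdl_use_cyberbullying`) are taken from John Snow Labs’ public Models Hub and may change between releases, and the sample tweet is made up; treat this as a sketch rather than a definitive recipe.

```python
# Sketch of a Spark NLP pipeline for the pretrained cyberbullying
# classifier. Assumes Spark NLP and PySpark are installed; pretrained
# models are downloaded on first use.
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLModel
from pyspark.ml import Pipeline

spark = sparknlp.start()

document = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Universal sentence embeddings feed the deep-learning document classifier.
use = UniversalSentenceEncoder.pretrained("tfhub_use", "en") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")

classifier = ClassifierDLModel.pretrained("classifierdl_use_cyberbullying", "en") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("class")

pipeline = Pipeline(stages=[document, use, classifier])

# Hypothetical input tweet, for illustration only.
df = spark.createDataFrame([["@user you are pathetic and everyone knows it"]], ["text"])
result = pipeline.fit(df).transform(df)
result.select("text", "class.result").show(truncate=False)
```

The same three-stage shape (assemble document, embed, classify) applies to the other pretrained detectors; only the classifier’s model name changes.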

Let’s look at how this works in real life. Take the following tweet: “@AMohedin Okay, we have women being physically inferior and either emotionally or mentally inferior in some way.” The model classified this as sexist with 100% confidence. On the other hand, take this example: “@LynnMagic people think that implying association via follow is a bad thing. but it’s shockingly accurate.” This was classified as neutral. As you can imagine, the degrees of sexism and racism, and even the implications of nominally neutral tweets, vary, but this is a good jumping-off point for keeping an eye on those who cross the line regularly. It’s also vital to keep in mind that the better your training data, the better your results will be, so NLP models need to be continually improved and fine-tuned as you go.

Similar to cyberbullying, toxic content is another area where NLP can help uncover harmful discussions. Toxic content can be defined as language that conveys hate, insults, obscenity, or threats. In this case, we’ll look at social media comments using Spark NLP’s MultiClassifierDL, another pretrained model. This is a real comment that was analyzed as toxic, and more specifically as a threat, using NLP: “I’m also a sock puppet of this account…SUPRISE!! -sincerely, The man that will track you down from the Internet and kill you.” While not every disturbing social media post is credible, this type of comment can hurt your business, frighten people, or in some cases cause real harm, and it’s worth taking note of.

The same goes for ‘fake news,’ which has been an especially popular topic of late. While it’s arguably not as detrimental as cyberbullying or toxic content, it does have the power to incite unhealthy debate, which can lead to those other behaviors. NLP can also help comb through article contents and the social media posts promoting them to identify what’s real and what’s not. A headline like “White House Makes Trade Pitch, With Focus on Moderates” would be classified as real news, while “Morning Joe Destroys Corrupt Clinton Foundation (Laughable) ‘Total Corruption’” would be classified as fake.

From cyberbullying and toxic content to fake news, it’s clear why we need to stay ahead of potentially dangerous dialogue online. Another survey by Pew Research drives home why detecting this matters to how we conduct business, and ourselves, on the internet. When asked whether public discourse online would become more or less shaped by bad actors, harassment, trolls, and an overall tone of griping, distrust, and disgust, nearly 40% of respondents said they expect the online future to be “more shaped” by negative activities. That’s a grim outlook, but probably not far from reality, especially considering the events of the past year.

For businesses, it’s important to keep websites, platforms, services, and social spaces safe in order to attract and retain customers. But the implications go far beyond the enterprise. The Internet should be a venue where people can voice their opinions freely, but also one where people aren’t victimized, abused, misled, or slandered by others hiding behind a screen. While there’s rightful concern that regulation could hurt the open exchange of ideas, opinions, and disparate views, NLP is a step in the right direction: it helps monitor what’s going on online without infringing on individual rights or allowing toxic content to persist.