Facebook has announced that they’re testing a new Messenger feature that will translate audio into text, to appear alongside sent audio files. The functionality’s only been made available to a small number of users at this stage, with no word on when it might be rolled out to everyone, but the fact that they’ve even made this option available at all is significant. This is the latest move in Facebook’s push into speech-to-text translation functionality, an advance which would have significant implications for the social giant. While some may see this as a novelty, an add-on feature of sorts, the introduction of this capability is closely aligned with Facebook’s overall strategic goals.
A Sharp Wit(.ai)
In early January, Facebook announced that they had acquired Wit.ai, a company that’s been building an API for voice-activated interfaces. There’s been industry speculation for some time about Facebook’s intentions with speech-to-text, with the main focus on the possibilities of communicating via Messenger hands-free. Star Trek-like hands-free messaging certainly does sound interesting, but another element that may be under construction at Facebook HQ could be the automatic translation of other languages within the Facebook feed.
This is more in line with Facebook’s overall vision of ‘connecting the world’ – what if Facebook were able to perfect a translation system that could enable seamless communication between different languages? That’d be a huge boost, both for Zuckerberg’s over-arching ambitions and the platform’s growth strategy in general. While it is already possible to do this using something like Google Translate (and note: Google Translate also got a major upgrade last week), the results can be patchy. Facebook may look to build a system that can translate in real-time, enabling immediate interaction, all within the Messenger app. Such a move could revolutionize connectivity on the platform. Around 100 languages are currently used on Facebook, and Messenger is one of the most popular messaging apps in the world (WhatsApp, also owned by Facebook, is the most popular). Group geography, personal connectivity, knowledge sharing – everything could change if such capability were enabled.
Always Listening
The other aspect that Facebook is no doubt interested in is data. More data means more power, as evidenced by the market strength of both Google and Facebook – once you control the flow of information, you can dictate terms to advertisers and groups who’d want to use it. In May last year, Facebook put a cat amongst the privacy pigeons when they announced a new, optional feature that would ‘listen’ to what was happening as you posted updates via the Facebook app. The system can tune into your surroundings, identify what TV show or song you’re listening to, and add that detail to your post. Predictably, people freaked out – this effectively meant Facebook was listening in on your life, could hear what was happening in your home, your bedroom. What’s more, critics of the functionality also theorized that while Facebook had highlighted the term ‘optional’ several times in their announcement, the feature might be active without the user even knowing it.
With a bigger trove of personal data than any company has ever had in history, Facebook walks a fine line on user privacy. And while the company cops more than its fair share of criticism over its handling of such sensitive info, on balance, you’d have to say they’ve managed that conflict fairly well. They’re in uncharted waters for the most part, and they’ve gone to significant efforts to assure users and raise awareness of privacy issues to keep people’s data protected. But at the end of the day the fact is that Facebook’s business model is built on your personal data and the value of that data to other parties. Facebook is storing data on everything, even down to the status updates you never submit. Their databanks are their most valuable assets, and in order to maintain their market position, they need to keep that data flowing, keep seeking new ways to build upon their overflowing data lakes.
So, what if Facebook could devise a process to translate all conversations to text? You’re carrying your phone around with you all the time; it’s sitting on the table as you have coffee with friends, resting beside you as you drive. What if Facebook could track what people were saying and add that data to their stocks? Suddenly they’d have a whole new stream of insight to provide to marketers, a vast expanse of keyword mentions and conversational queries that could be collated, logged and passed on to third parties to target marketing messages and geographically focus specific advertisements.
Of course, Facebook would need user permission to do this, and storing an unending amount of speech-to-text data would put a huge burden on their data capacity. But privacy concerns are lessening each day – Facebook announces a new measure and people are up in arms, but then it dies down, the new data they gather is not mis-used wholesale, and people go about their daily, Facebook-aligned lives. Data storage options, too, are always improving. It’s not hard to imagine that in a few years’ time Facebook could announce a new process where they’re translating specific segments of spoken conversation to text and noting those mentions for data gathering purposes – never to be shared in detail, of course, and never to be linked to any specific user; such data would only be used internally. Would people stop using Facebook if they did?
Lost in Translation
Speech-to-text has long been seen as the next progression in communications – working in media monitoring, we looked into this for years, as being able to detect mentions within TV and radio broadcasts would have revolutionized how we did business. The problem is that speech-to-text technology has never been up to the required standard to make a significant impact. It’s improved a lot, it’s getting better all the time, but we were never able to rely on its accuracy. Where speech-to-text has significantly improved over time is in learning a single speaker’s voice – some of the top speech-to-text tools on the market actually have a very high level of accuracy when they are trained to translate a single voice; the user speaks to it, corrects mis-spellings and mis-interpretations as they go, and over time the system learns that person’s nuances, which enables it to produce very accurate results. It’s when other distractors come in that the systems have trouble – background noises, different intonations and flourishes, accents.
Where Facebook may have an advantage here is if they can incentivise users to ‘train’ the system to their individual voices. If speech-to-text for Messenger proves popular, they’ll be able to build better systems based on user examples, narrowed down to regional dialects and colloquialisms. Every time a user translates from speech to text, for example, they might make a correction here or there – Facebook could track those corrections and find common mis-interpretation patterns, narrowed down to specific regions. Starting off small enables Facebook to build that accuracy and increase the usability, making it more popular when it’s eventually rolled out to everyone. And as that accuracy improves, so too does the breadth of Facebook’s data gathering capabilities. It’s still some way off being anything close to a reality, but real-life conversational data tracking may be the next frontier in the big data journey.
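To make the idea of tracking corrections by region a little more concrete, here is a minimal sketch in Python. It is purely illustrative and not a description of anything Facebook has built: the function name `common_corrections`, the event format (region, what the system heard, what the user corrected it to) and the sample data are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def common_corrections(events, top_n=3):
    """Count (heard, corrected) pairs per region and return the most frequent.

    `events` is a hypothetical log of (region, heard, corrected) tuples.
    Entries where the user changed nothing are ignored.
    """
    by_region = defaultdict(Counter)
    for region, heard, corrected in events:
        if heard != corrected:
            by_region[region][(heard, corrected)] += 1
    return {
        region: counts.most_common(top_n)
        for region, counts in by_region.items()
    }


# Made-up sample data: two Australian users fix the same mis-hearing.
events = [
    ("AU", "our vo", "arvo"),
    ("AU", "our vo", "arvo"),
    ("AU", "bar bee", "barbie"),
    ("UK", "cheers", "cheers"),  # no correction, so not counted
    ("UK", "tele", "telly"),
]

patterns = common_corrections(events)
for region, pairs in patterns.items():
    for (heard, corrected), count in pairs:
        print(f"{region}: '{heard}' -> '{corrected}' ({count}x)")
```

A real system would need far more than a frequency count (acoustic context, speaker identity, consent), but the grouping step shows why a small pool of users making corrections could still surface useful regional patterns.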