OpenAI is facing yet another controversy after a group of artists leaked Sora, its impressive video generation tool, online, prompting the company to pause access for now. The much-anticipated AI tool was supposed to be yet another futuristic offering from the ChatGPT parent and a step toward its ultimate goal of achieving artificial general intelligence. However, some artists accuse OpenAI of using them as “PR puppets” and exploiting them for “unpaid R&D and PR.”
Here’s everything we know about the incident and its implications for OpenAI and the broader AI industry.
What Is Sora?
Sora is a text-to-video generation tool that can generate videos up to one minute long. OpenAI gave hundreds of artists free access to test the tool. However, some of these artists leaked the tool online, accusing the ChatGPT parent of exploiting them financially.
In their statement posted on Hugging Face, the group said, “Hundreds of artists provide unpaid labor through bug testing, feedback and experimental work for the program for a $150B valued company. While hundreds contribute for free, a select few will be chosen through a competition to have their Sora-created films screened — offering minimal compensation which pales in comparison to the substantial PR and marketing value OpenAI receives.”
OpenAI is this true???
There is some news about an OpenAI Sora leak, and now it’s supposedly available on Huggingface. OpenAI Sora early testers and the artists involved are angry because they feel exploited. They claim they were invited as "testers" but ended up doing free… pic.twitter.com/nK4F6vSqX6
— AshutoshShrivastava (@ai_for_success) November 26, 2024
OpenAI Was Valued at $157 Billion In the Recent Funding Round
The artists are taking a swipe at OpenAI’s mammoth $157 billion valuation, which it achieved in the most recent funding round. Companies like Thrive Capital, Microsoft, and Tiger Global poured a combined $6.6 billion into OpenAI in that round alone, even as the AI startup continues to burn an unfathomable amount of money every quarter.
According to The Information, OpenAI is spending almost $4 billion a year to run ChatGPT, and its annual training costs alone are around $3 billion. OpenAI also spends a good chunk of money on employee salaries and office rent and is expected to lose a mammoth $5 billion this year. The company is in no way short of money: it spent an estimated $20 million just to buy the domain chat.com. No wonder artists feel they have been shortchanged by one of the most consequential and cash-rich AI startups.
OpenAI Hasn’t Officially Confirmed That the Sora Leak Was Authentic
Meanwhile, the artists who leaked Sora categorically stated that they are not against using AI tools for art. However, they added, “What we don’t agree with is how this artist program has been rolled out and how the tool is shaping up ahead of a possible public release. We are sharing this to the world in the hopes that OpenAI becomes more open, more artist friendly and supports the arts beyond PR stunts.”
They fault OpenAI’s content approval process and say, “every output needs to be approved by the OpenAI team before sharing.”
OpenAI hasn’t formally confirmed that the leak was real, though it has paused access to Sora for now. In a statement, OpenAI spokesperson Niko Felix said, “Sora is still in research preview, and we’re working to balance creativity with robust safety measures for broader use.”
Felix added, “Hundreds of artists in our alpha have shaped Sora’s development, helping prioritize new features and safeguards. Participation is voluntary, with no obligation to provide feedback or use the tool. We’ve been excited to offer these artists free access and will continue supporting them through grants, events, and other programs.”
Outside Testing Is Common in the AI Industry
To be sure, such outside testing is quite common in the AI industry and tech in general. However, it’s very uncommon for the technology to be leaked, as access to early flagship products of this kind is tightly controlled. The incident raises questions about OpenAI’s safety mechanisms at a time when the debate over safe AI deployment is gaining traction.
It also raises concerns about whether AI companies like OpenAI are paying their due share to original content creators. Many media organizations have sued AI companies, alleging that they illegally scraped copyrighted work to train their AI models. These companies and individuals have good reason to raise these concerns, as OpenAI and some of its rivals never obtained permission to train their AI products on their IP.
Journalists and workers in any number of industries also worry that the technology could eventually eliminate many newsroom jobs.
OpenAI Is Facing Many Lawsuits Over Copyright Issues
Last year, The New York Times filed a lawsuit accusing AI giants OpenAI and Microsoft of brazenly copying millions of articles from its website without permission to train their AI systems.
Earlier this year, journalists Andrea Bartz, Kirk Wallace Johnson, and Charles Graeber filed a class action suit against Anthropic in a California court accusing it of using their work without permission to train the company’s Claude chatbot.
Last month, the New York Post and Wall Street Journal’s parent company sued Jeff Bezos-backed Perplexity, accusing the AI startup of illegally using their copyrighted news.
OpenAI has also been accused of using YouTube videos to train its models. The company is facing a lawsuit over the issue, and its former chief technology officer, Mira Murati, avoided questions about Sora’s training data sources in a WSJ interview.
It’s difficult to tell how OpenAI could possibly escape these copyright issues as it clearly never got permission for at least the vast majority of the IP that its AI products are trained on.
It Is Time for AI Companies To Pay Up
Tech companies have now been looking to license the work of media organizations to train their LLMs. Earlier this year, OpenAI signed a $250 million deal with News Corp, which gives it access to current and archived content from leading outlets like The Wall Street Journal, the New York Post, and The Daily Telegraph. OpenAI has also signed a deal with Time that gives the Microsoft-backed company access to its archived content dating back a century. This may seem like a lot of data, but licensed content alone is nowhere near enough to train a model as powerful as GPT-4 (at least with current technology).
Companies like Meta Platforms have been scraping publicly posted user data unless regulations in some regions specifically bar them from doing so. Last year, Google also quietly updated its privacy policy to state that it can use scraped web data to train its AI models.
Most of the time, this data is used for profit without users having the option to opt out. The reason tech companies have been able to scrape so much public data without retribution (so far) is lax privacy laws.
The artists’ allegations in the Sora leak case underscore that AI giants are not paying their due share to creators and artists even as their valuations skyrocket.