AI developers OpenAI, maker of ChatGPT, and Meta are in hot water after a group of writers, including popular comedian Sarah Silverman, brought class-action complaints against both companies for copyright infringement.
The complaints claimed that OpenAI and Meta remixed “the copyrighted works of thousands of book authors – and many others – without consent, compensation, or credit.”
Ever since artificial intelligence (AI) tools exploded in popularity with the general public following the launch of OpenAI’s DALL-E image-generation model, the technology has been criticized for using copyrighted works without their owners’ permission.
What Does the Lawsuit Allege Against OpenAI and Meta?
The plaintiffs (authors) filed exhibits showing prompts and responses from ChatGPT that they believe are evidence of copyright infringement. For example, when asked for the plot of Paul Tremblay’s book The Cabin at the End of the World, ChatGPT returned a 420-word response explaining early plot points. Follow-up prompts asking for more of the plot got ChatGPT to return the rest, including the twist ending.
Sarah Silverman joined the authors’ fight Friday when she filed her suit, about two weeks after the initial filing from a large group of writers. She claims that her book The Bedwetter, along with the works of many other writers, was obtained through what is often called a ‘shadow library’: a huge (and illegal) database full of copyrighted books, articles, and more.
Are the Authors Right?
Without direct knowledge of everything OpenAI used to train its models, we can’t confirm without a doubt whether these books were used. However, we can gain insight the same way some of the authors did by asking ChatGPT itself.
Of course, when asked to return the first chapter of Silverman’s book, ChatGPT claims that it can’t access copyrighted books or materials.
Ask the question in a slightly different way and it returns a slightly different answer, this time saying that the chapter isn’t recorded in the dataset it was trained on. Even changing approach and asking it which book a paragraph from Silverman’s The Bedwetter comes from doesn’t do the trick.
So does this mean that ChatGPT wasn’t actually trained on these books? Not necessarily. It turns out that it can easily recognize books by some authors who aren’t suing OpenAI. For example, it identified a section of Brandon Sanderson’s Mistborn: The Final Empire, which is copyrighted.
This doesn’t necessarily mean that GPT-4 was trained on this book, though. The section in the prompt was quite long and contained the rather unique names of characters, which could have let the model identify the book from other content on the web, such as reviews.
And yet, GPT-4 was able to identify a different copyrighted book published by Penguin Books, The Master and Margarita, from a much shorter section that doesn’t even include character names.
The problem with using this approach to figure out whether ChatGPT is breaking copyright law is that we have no idea where the information is coming from. Nevertheless, it seems that the model may have access to some text from copyrighted books.
Does the Fair Use Doctrine Apply?
While it’s certainly important to determine whether OpenAI and Meta used copyrighted texts to build their models, it may not matter in a legal sense. Companies that produce AI models argue that even if the models are trained on copyrighted material, this use constitutes fair use under US copyright law.
The fair use doctrine allows some uses of copyrighted works without the owner’s permission. Courts weigh four factors: the purpose and character of the use (including whether it is transformative and whether it is commercial), the nature of the copyrighted work, how much of the work is used, and the effect of the use on the market for the original. Commercial use weighs against a finding of fair use, which could hurt AI model makers such as OpenAI that sell access to their models for profit.
The US Supreme Court recently underscored the weight of commercial purpose in the case Andy Warhol Foundation for the Visual Arts v. Goldsmith.
The court held that the first fair use factor did not favor the Warhol Foundation’s licensing of Orange Prince, a work based on photographer Lynn Goldsmith’s portrait of Prince, to Condé Nast to illustrate a magazine story about the musician. Even if Warhol’s work added new meaning to the original, the license served substantially the same commercial purpose as Goldsmith’s photo.
However, how courts weigh the fair use factors varies from case to case, and some argue that training AI models still qualifies as fair use despite the intended commercial gain.
Will the Lawsuits Go Anywhere?
While the exact merits of the complaints are unclear so far, they do seem to offer some evidence of OpenAI using copyrighted materials for commercial gain. However, the cases likely won’t be simple, as they deal with cutting-edge technology that may not fit past molds of copyright law. Even if Meta and OpenAI win the lawsuits, the cases may finally give us a peek behind the curtain at how these models are trained.