Google has been ordered to pay a €250 million (around $270 million) fine by France’s competition watchdog for violating commitments it made in a previous antitrust settlement over how it negotiates compensation with news publishers for the use of their content, including its use of that content to train its generative artificial intelligence models.

The landmark sanction by the Autorité de la Concurrence focuses on Google’s failure to adhere to several commitments it made in June 2022 to negotiate transparently and without discrimination with French news outlets over payments for displaying snippets of their articles on services like Google Search and News.

Lack of Transparency in Revenue Data and Payment Methodology

According to the regulator’s press release, Google breached its commitments by not providing complete data to news publishers about the revenues generated from displaying their protected content, making it impossible to ensure that the pre-negotiated payments were fair and accurate.

The Autorité also took issue with the opaque methodology used by Google to calculate its payment offers to publishers. The regulator found that Google’s approach lacked objectivity, did not account for key factors like indirect revenue sources, and unduly discriminated against some outlets by setting arbitrary remuneration thresholds below which they received no payment at all.

“The Authority notes, in this regard, that Google confined these indirect revenues to a marginal part in the determination of its financial proposals, while it appears from the aforementioned decisions that they constitute the largest share of revenues resulting from the display of protected content on its services,” the regulator stated in a translated press release published yesterday.

Google Trained its AI System on News Content Without Informing Media Companies

Perhaps most significantly, the Autorité’s sanctions directly call out Google’s practices around training its AI systems like the Gemini chatbot on the copyrighted content of news publishers without first notifying or obtaining permission from those publishers.

The agency’s inquiry revealed that Google used content from news sites and agencies to train the foundation model that powers Gemini when it was first launched in France as Bard in July 2023, in violation of its commitment to engage in negotiations transparently.

“The question of whether the use of press publications as part of an artificial intelligence service falls within protection under the regulations of neighboring rights has not been decided at this stage,” the press release noted.

“The Authority considers, at the very least, that by failing to inform publishers of the use of their content for their Bard software, Google has breached commitment no. 1.”

Artificial intelligence systems like chatbots rely on massive data sets, most of them publicly available on the internet, to train their large language models (LLMs). These materials supply the model with the information it will later process, summarize, and draw on when responding to user prompts.

The issue is that much of this information is copyrighted material, as is the case with news articles produced by media companies and their journalists, such as investigative pieces and exclusive reports.

Companies like OpenAI and Alphabet (GOOG), both of which are actively developing AI technologies, have been forced to negotiate with news publishers and other content producers to purchase the rights to legally use their materials for training AI models.

Google claims that it is the only company to have signed agreements with 280 French news publishers that result in payments of “tens of millions of euros per year.” The scope and terms of these agreements generally remain confidential.

Meanwhile, OpenAI recently signed a deal with Le Monde, one of the largest news publishers in the country, to train its AI models with its extensive database of articles and content.

Google’s Opt-Out Tool Results in Disastrous Marketing and Financial Consequences

The Autorité also found that Google initially failed to provide publishers with a straightforward technical option to opt out of having their content ingested to train the AI system without being automatically delisted from Google’s primary Search service as a result.

“Google failed to provide, until at least September 28, 2023, a technical solution to allow publishers and press agencies to opt out of their content being used to train Bard without such a decision affecting the display of their content on other Google services,” the document reads.
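For readers curious what such an opt-out can look like in practice, the sketch below assumes the publisher relies on Google's publicly documented `Google-Extended` robots.txt token, which is not named in the press release quoted here. The robots.txt content and the `news.example` URL are hypothetical, and the check uses Python's standard `urllib.robotparser` purely as an illustration of how the rules are read, not as a description of how Google's crawlers evaluate them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site: block the Google-Extended token
# (assumed here to be the AI-training opt-out) while leaving every other
# crawler, including the regular search crawler, allowed.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical article URL used only for this illustration.
ARTICLE_URL = "https://news.example/articles/some-story"


def can_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules in ROBOTS_TXT let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


search_allowed = can_fetch("Googlebot", ARTICLE_URL)
training_allowed = can_fetch("Google-Extended", ARTICLE_URL)

print(f"Search crawling allowed: {search_allowed}")
print(f"AI training use allowed: {training_allowed}")
```

The point of the sketch is the separation the regulator asked for: the rule that opts out of AI training applies to one token only, so the same article remains available to the search crawler.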

Delisting a news company from Google’s search tool would almost certainly have a disastrous impact, likely putting it out of business as it would be much harder to attract readers.

The regulator warned that it will closely monitor the effectiveness of Google’s opt-out mechanisms going forward to ensure that publishers can freely choose whether to allow their content to be used for AI training.

Google Rushes to Turn the Page to Avoid Stirring Up the Hornet’s Nest

Figure: Findings on AI training from an Oxford University survey. Source: Oxford University Survey

For its part, Google acknowledged the issues identified by the French authority regarding the lack of transparency in negotiations and the AI copyright question, though it described the €250 million fine as “disproportionate.”

“We have compromised because it is time to turn the page and, as our numerous agreements with publishers prove, we want to focus on sustainable approaches in order to connect Internet users with quality content and work constructively with publishers,” a blog post published by Google reads.

The company agreed not to contest the facts in the regulatory case in exchange for a streamlined settlement process and a monetary penalty within the range proposed by investigators.

However, Google maintained that the French authority “does not challenge the way web content is used to improve newer products like generative AI,” which the company claims is covered by copyright exceptions.

Google appears willing to move on for now, possibly to avoid stirring up the hornet’s nest. Given the controversy surrounding the use of copyrighted data for AI training, the sooner AI labs clear this hurdle, the easier it will be for them to keep scaling their models and improving the quality of their output.

Lawmakers are still struggling to make sense of how artificial intelligence is transforming society and what kind of safeguards they need to put in place to protect the public and organizations from having their data unlawfully exploited by AI labs.

Earlier this month, the European Union approved the most comprehensive piece of legislation drafted to date to oversee the safe and transparent development of AI technologies – a new law called the AI Act.

The legislation is considered the world’s first comprehensive legal framework for regulating the technology.

Pending Cases Could Set Precedent for Upcoming Regulatory and Judicial Actions

While the French decision represents one of the most aggressive regulatory actions taken thus far over AI copyright issues, it leaves several key legal questions unresolved regarding the scope of intellectual property protections and limitations around ingesting copyrighted works for large-scale training of commercial AI products.

Because this decision is based on a previous agreement between Google and publishers, it doesn’t really set a major precedent for the use of copyrighted materials in AI training sets. However, it shows that regulators are serious and ready to call out AI firms that don’t follow the law.

Companies, regulators, and publishers continue to clash amid a rapidly growing wave of lawsuits challenging the legality of AI companies’ data practices. This sets the stage for precedent-setting court rulings that could shape the future development of generative AI.

Flagship cases like The New York Times lawsuit against OpenAI are being closely watched by industry professionals, tech companies, and media businesses to see how courts respond to these challenging and pressing matters in the absence of specific laws that provide a robust framework for judging one way or the other.

If The New York Times wins the lawsuit, the AI industry could be forced to change dramatically. AI giants would no longer be able to indiscriminately collect data from across the internet to train their models without exposing themselves to massive lawsuits.