Google has been found using online content to train its search-based AI tools without the knowledge or consent of the publishers who produced it, according to internal documents disclosed in court testimony.
The documents, which were released during the US antitrust trial on Google’s online search dominance, also reveal that the company’s executives rejected alternative approaches that would have allowed publishers a say in the use of their content by the tech major.
The revelations come amid heightened global scrutiny of such practices, which have prompted numerous court cases and government clampdowns on tech giants.
The trial stems from a lawsuit alleging that the company’s search engine holds an illegal monopoly in online search, shutting out rivals such as Perplexity and OpenAI.
Chetna Bindra, a Google Search product management executive, explicitly stated that the company had drawn a “hard red line”, requiring all publishers who wanted their content to appear in search results to allow the content to feed Google's AI features.
The documents suggest that Google described a proposed alternative approach as “likely unstable” and decided against offering publishers additional controls. Those who were “not satisfied” could choose to remove themselves entirely from search indexing.
Rather than offering publishers alternatives, the tech giant deliberately chose the more restrictive path and planned to implement the changes through “silent updates” and “no public announcement” about how it was using publishers’ data.
“Do what we say, say what we do, but carefully,” Bindra said in the document.
160 billion content snippets
Google's internal deliberations, presented in federal court during testimony in May, show that the company considered multiple approaches to handling publisher content for AI training.
Among the options discussed was “SGE (search generative experience) only opt-outs”, which would have let publishers stay out of AI-generated summaries without disappearing from the search engine.
Google had to remove around half of what it was using — 80 billion snippets of content out of 160 billion — from its AI training material to comply with opt-out requests.
Under Google's approach, publishers who used a control called Google-Extended to block AI training on their content, while choosing to remain in the search engine, still found their content feeding the company's most visible AI products.
The internal documents lend weight to what publishers have long suspected: that Google's professed respect for content creators was performative, designed to provide legal cover while the company systematically harvested their work.
Paul Bannister, chief strategy officer at Raptive, which represents online creators, called the revelation “a little bit damning”.
He noted that the documents “pretty clearly show that they knew there was a range of options and they pretty much chose the most conservative, most protective of them — the option that didn't give publishers any controls at all”.
Online search monopoly
Last August, a US federal judge ruled that Google violated antitrust law by maintaining an illegal monopoly in the online search market. The court found that the search giant controlled approximately 90 percent of the US search engine market and used exclusionary agreements with device manufacturers and browser developers to suppress competition.
The company's internal presentations acknowledged this dominance while recommending how to present the policy changes and what to leave unsaid.
“If aligned, as a next step, we will work on actual language and get this out,” said Bindra’s document, which was written in April 2024.
One month later, at Google's annual developer conference, the company launched its “fully revamped” AI-infused search experience.
When someone searches for information on Google now, instead of clicking through to a news website or blog, they often get their answer directly from Google's AI summary at the top of the search results.
This means publishers lose the website visits they depend on to show ads and sell products to readers, resulting in immediate financial damage.
Lost revenue
Industry executives report that traffic to their sites has plummeted since Google launched these AI-powered answer boxes, cutting off a crucial revenue stream that many publishers need to survive.
“Publishers and some governments around the world are trying to figure out how to get fair payments for original content from journalists, writers and other creators,” says Schiffrin.
“The French competition authority has fined Google. The New York Times is suing OpenAI. Other outlets feel it is not worth it to sue, and so they are striking deals on their own,” she added.
The lawsuit by The New York Times claims that OpenAI and Microsoft used millions of its articles without permission to train AI systems, violating copyright law.
France's competition authority fined Google €250 million for breaching licensing commitments to French publishers, finding that the company had trained its AI chatbot, Bard (now Gemini), on their news content without informing them, in violation of EU intellectual property rules.
Google’s explanation for rejecting more granular controls appears aimed at preserving its own flexibility while limiting publishers’ leverage.
Google's head of search Liz Reid testified that creating multiple opt-outs would be “challenging” because it would require separate models for different features, adding “enormous complexity” and significant hardware costs.
“That would mean if Search has multiple GenAI features on the page, which it can easily do, each of those would be required to have a separate model powering it. But we don’t build separate models for those,” Reid said.
However, publisher rights defenders argue this explanation is disingenuous.
“This is a strategy to ensure that Google has full market power, and the publishers lose one of their key chips in the negotiation,” says Brooke Hartley Moy, CEO of AI start-up Infactory, which works with publishers.
Yet the consequences reach further than corporate balance sheets. If Google's approach succeeds in weakening journalism and professional content creation, the foundation of credible information itself will begin to erode.