Google adds a switch for publishers to opt out of becoming AI training data

Now the Google-Extended flag in robots.txt can tell Google’s crawlers to include a site in search without using it to train new AI models like the ones powering Bard.

By Emma Roth, a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO.

Sep 28, 2023, 7:31 PM UTC

Google just announced it’s giving website publishers a way to opt out of having their data used to train the company’s AI models while remaining accessible through Google Search. The new tool, called Google-Extended, lets sites continue to be crawled and indexed by crawlers like Googlebot without having their data used to train Google’s AI models as those models develop over time.

The company says Google-Extended will let publishers “manage whether their sites help improve Bard and Vertex AI generative APIs,” adding that web publishers can use the toggle to “control access to content on a site.” Google confirmed in July that it’s training its AI chatbot, Bard, on publicly available data scraped from the web.

Google-Extended is available through robots.txt, the text file that tells web crawlers which parts of a site they may access. Google notes that “as AI applications expand,” it will continue to explore “additional machine-readable approaches to choice and control for web publishers” and that it will have more to share soon.
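For publishers wondering what that looks like in practice, here is a minimal sketch rather than Google's own tooling. It assumes a hypothetical robots.txt that disallows the Google-Extended token and allows everything else, and it uses the robots.txt parser from Python's standard library to check how a standards-following parser would read those rules. The file contents and the example.com URL are placeholders made up for illustration, and the script only models how the rules are read, not how Google applies them internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption for this sketch):
# opt out of AI training via the Google-Extended token, leave all else open.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/news/some-article"  # placeholder URL

# Check each user-agent token against the same page.
for token in ("Google-Extended", "Googlebot"):
    verdict = "allowed" if parser.can_fetch(token, url) else "blocked"
    print(f"{token}: {verdict}")
```

Run as written, the script should report Google-Extended as blocked and Googlebot as allowed, which matches the behavior Google describes: the site stays in search while opting out of AI training.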

Many sites, including The New York Times, CNN, Reuters, and Medium, have already moved to block the web crawler that OpenAI uses to scrape data and train ChatGPT. Blocking Google has been a harder problem, though. After all, websites can’t close off Google’s crawlers completely, or else they won’t get indexed in search. This has led some sites, such as The New York Times, to take a legal route instead, updating their terms of service to ban companies from using their content to train AI.
