Google just announced It offers website publishers a way to opt out of having their data used to train the company’s AI models while still remaining accessible through Google Search. The new tool, called Google extendedallows sites to continue to be crawled and indexed by crawlers such as Googlebot while preventing your data from being used to train the company’s current and future AI models.
The company says Google-Extended will allow publishers to “manage whether their sites help improve.” Bard and Vertex AI Generative APIs,” adding that web publishers can use the toggle to “control access to a site’s content.” Google confirmed in July that it is training its AI chatbot, Bard, with publicly available data scraped from the web.
Google Extended is available through robots.txt, also known as the text file that tells web crawlers whether they can access certain sites. Google notes that “as AI applications expand,” it will continue to explore “additional machine-readable approaches to web editor choice and control” and will have more to share soon.
Many sites have already taken steps to block the web crawler that OpenAI uses to mine data and train ChatGPT, including The New York Times, cnn, Reutersand Half. However, there have been concerns about blocking Google. After all, websites can’t completely shut down Google’s crawlers, or else they won’t be indexed in searches. This has led to some sites, such as He New York Timesto legally Instead, block Google by updating its terms of service to prohibit companies from using its content to train AI.