July 31st, 2024

Websites Are Blocking the Wrong AI Scrapers

Outdated robots.txt instructions are causing confusion, blocking Anthropic's retired scrapers while leaving its current crawler, CLAUDEBOT, free to scrape. Many sites haven't updated their blocklists, and keeping up with the shifting roster of bots is a growing burden for website owners.


Many websites are mistakenly blocking Anthropic's retired AI scrapers while leaving its current crawler, CLAUDEBOT, unblocked. The confusion stems from website owners copying and pasting old robots.txt instructions that no longer reflect which bots are actually in use. Anthropic has confirmed that its older agents, ANTHROPIC-AI and CLAUDE-WEB, have been retired and that CLAUDEBOT is now its active crawler. Even so, many popular sites, including Reuters and Condé Nast, have not updated their blocklists, leaving CLAUDEBOT free to scrape their content.

The operator of Dark Visitors, a site that tracks web crawlers, noted that the rapid churn of AI scrapers makes it difficult for website owners to keep blocklists current. Some sites respond by blocking all crawlers or allowing only a handful, which can inadvertently shut out legitimate services such as search engines and academic research tools. The Data Provenance Initiative likewise highlighted the burden of tracking these evolving agents: many owners simply do not know which bots are active or who operates them.

The situation has prompted calls for AI companies to be more respectful of website owners' preferences and for creators to consider paywalls to protect their content from unregulated scraping. Overall, the landscape of AI scrapers is complex and constantly changing, leaving content creators and website operators with significant confusion.
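To make the failure mode concrete, here is a sketch of the kind of stale robots.txt the article describes, followed by the entry many sites are missing. The user-agent tokens are the ones named in the article; exact casing in a live file may vary, and this is an illustration rather than Anthropic's published guidance.

    # Stale entries: these agents are retired, so the rules are dead weight
    User-agent: ANTHROPIC-AI
    Disallow: /

    User-agent: CLAUDE-WEB
    Disallow: /

    # Missing from many blocklists: the crawler Anthropic says is active
    User-agent: CLAUDEBOT
    Disallow: /

A robots.txt rule only matters if the named agent still exists and chooses to honor it; blocking a retired name has no effect on the bot that replaced it.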

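For owners who want to verify what their current robots.txt actually permits, a minimal sketch using Python's standard-library urllib.robotparser can check a given user agent against a live file. The example.com URL is a placeholder to swap for your own domain; the agent names are the ones from the article.

    from urllib.robotparser import RobotFileParser

    # Hypothetical target site; replace with your own domain.
    ROBOTS_URL = "https://example.com/robots.txt"

    # Agent names from the article: two retired, one active.
    AGENTS = ["ANTHROPIC-AI", "CLAUDE-WEB", "CLAUDEBOT"]

    rp = RobotFileParser()
    rp.set_url(ROBOTS_URL)
    rp.read()  # fetch and parse the live robots.txt

    for agent in AGENTS:
        allowed = rp.can_fetch(agent, "https://example.com/")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")

If the first two names come back blocked while CLAUDEBOT comes back allowed, the site has exactly the stale blocklist the article describes.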