December 17th, 2024

The Rise of the AI Crawler

AI crawlers such as OpenAI's GPTBot and Anthropic's Claude are generating significant web traffic but do not execute JavaScript, leaving client-side rendered content invisible to them. Recommendations include server-side rendering critical content and efficient URL management for better crawler accessibility.

AI crawlers have emerged as a significant force on the web, with OpenAI's GPTBot and Anthropic's Claude generating substantial traffic across Vercel's network. In the past month, GPTBot made 569 million requests and Claude 370 million; together that amounts to about 28% of Googlebot's total request volume.

Despite their growing presence, AI crawlers face challenges, particularly with JavaScript. None of the major AI crawlers, including ChatGPT and Claude, currently executes JavaScript, which cuts them off from client-side rendered content; Googlebot, by contrast, renders JavaScript effectively. The analysis also revealed inefficiencies, such as high rates of 404 errors and redirects, pointing to a need for better URL management. The crawlers also prioritize different content types: ChatGPT focuses on HTML, while Claude favors images.

For web developers, the recommendations are to server-side render critical content and to keep URLs well managed so crawlers can reach that content efficiently. Those wishing to restrict crawler access can use robots.txt or Vercel's firewall options. Overall, while AI crawlers are scaling rapidly, they still need optimization before they can navigate and index modern web applications effectively.
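Because these crawlers only read the initial HTML response, the simplest mitigation is to render critical content on the server. Here is a minimal sketch assuming a Next.js app-router project; the route, API URL, and Product type are hypothetical:

```tsx
// app/products/page.tsx — a React Server Component (hypothetical route).
// The data is fetched during the server render, so the HTML delivered to a
// non-JavaScript crawler already contains the product list.
type Product = { id: string; name: string }

export default async function ProductsPage() {
  // fetch runs on the server; the crawler never has to execute this code.
  const res = await fetch('https://api.example.com/products', {
    next: { revalidate: 60 }, // Next.js fetch extension: revalidate cache every 60s
  })
  const products: Product[] = await res.json()

  return (
    <ul>
      {products.map((p) => (
        <li key={p.id}>{p.name}</li>
      ))}
    </ul>
  )
}
```

The same content served via client-side fetching in a useEffect would be invisible to these crawlers, since the HTML they receive would contain only an empty shell.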

- AI crawlers are generating significant web traffic, with GPTBot and Claude leading in requests.

- Major AI crawlers do not execute JavaScript, limiting their access to dynamic content.

- High rates of 404 errors and redirects indicate inefficiencies in AI crawler behavior.

- Recommendations include server-side rendering for critical content (sketched above) and efficient URL management; a redirect sketch follows this list.

- Web developers can use robots.txt and firewalls to control crawler access (see the robots.txt sketch below).
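Stale links are one driver of the 404s and redirect chains noted above. On Next.js, permanent redirects for moved paths can be declared centrally instead of letting crawlers hit dead URLs. A sketch, assuming Next.js 15+ (which accepts a TypeScript config) and a hypothetical legacy blog path:

```ts
// next.config.ts — centralized redirects for moved content.
import type { NextConfig } from 'next'

const nextConfig: NextConfig = {
  async redirects() {
    return [
      {
        // Hypothetical legacy path: send crawlers (and users) straight to
        // the canonical URL with a permanent redirect instead of a 404.
        source: '/old-blog/:slug',
        destination: '/blog/:slug',
        permanent: true, // emits a 308 status code
      },
    ]
  },
}

export default nextConfig
```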
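For restricting access, robots.txt is the standard mechanism. In a Next.js app-router project it can be generated from app/robots.ts; a sketch below. The GPTBot and ClaudeBot user-agent strings match what OpenAI and Anthropic have published, but verify against their current documentation before relying on them:

```ts
// app/robots.ts — Next.js app-router convention that serves /robots.txt.
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Published crawler user agents (verify with vendor docs).
      { userAgent: 'GPTBot', disallow: '/' },
      { userAgent: 'ClaudeBot', disallow: '/' },
      // Everyone else, including Googlebot, remains allowed.
      { userAgent: '*', allow: '/' },
    ],
  }
}
```

Note that robots.txt is advisory; crawlers that ignore it can only be blocked at the network layer, for example with Vercel's firewall rules.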

2 comments
By @arlattimore - about 1 month
The inefficiencies in the crawling for these AI products are surely going to get better in a hurry; they'll be burning through resources/money as it stands.