July 4th, 2024

Cloudflare rolls out feature for blocking AI companies' web scrapers

Cloudflare introduces a new feature to block AI web scrapers, available in free and paid tiers. It detects and combats automated extraction attempts, enhancing website security against unauthorized scraping by AI companies.


Cloudflare has introduced a new feature that blocks web scrapers used by artificial intelligence (AI) companies from extracting website content. The feature is part of Cloudflare's content delivery network (CDN) and is available in both free and paid tiers. Many AI companies rely on web content to train their large language models (LLMs), and Cloudflare's tool addresses the fact that some LLM developers do not give website operators a way to opt out.

The feature uses AI to detect automated content-extraction attempts, including those that try to evade detection by mimicking real browsers. Cloudflare will continuously update the feature as AI scraping bots change, and it is also launching a tool that lets website operators report new bots they encounter. The company's system assigns each website visit a score indicating how likely it is to come from a bot; requests from a bot collecting content for Perplexity AI consistently receive low scores. Cloudflare's initiative aims to strengthen website security and prevent unauthorized scraping by AI companies.

4 comments
By @unyttigfjelltol - 5 months
I use Perplexity, and it says its process is to break a request down into a series of web searches that it runs on central servers in real time, essentially at my request. It then reviews the resulting pages for relevant information and provides a summary of sorts.

I can imagine this might make Perplexity more visible to website owners, because it constantly runs bursts of searches from a central server that aren't labeled as a web scraper. But that's exactly how it responds in real time to my inputs and requests.

Cloudflare's service appears to target actual web scraping instead of real time searches, so I'm not sure they actually hit the nail on the head in mentioning Perplexity in relation to their service.

By @iruoy - 5 months
See https://news.ycombinator.com/item?id=40865627 for Cloudflare's blog post.

By @bob1029 - 5 months
What does the end game look like?

Browser automation tools mean this battle can go on forever, much like with copyright and other forms of DRM/anti-cheating technology.

I don't believe there is any purpose in chasing this rabbit beyond making customers and investors believe they have a chance of actually catching it.

By @zx8080 - 5 months
It's probably going to be like the firewall, antivirus, or "endpoint security" markets: protection from AI, as a service.