June 21st, 2024

OpenAI and Anthropic are ignoring robots.txt

Two AI startups, OpenAI and Anthropic, are reported to be disregarding robots.txt rules and scraping web content despite publicly claiming to respect them. Analytics from TollBit revealed the behavior, raising concerns about data misuse.

OpenAI and Anthropic, two leading AI startups, are reported to be disregarding robots.txt, the long-standing convention that tells bots which parts of a website they should not crawl. The file cannot block a scraper on its own; it only works when crawlers choose to honor it. Despite publicly stating that they respect such rules, both companies are either ignoring or bypassing them, according to analytics from TollBit, a startup that facilitates licensing deals between publishers and AI firms. TollBit informed major publishers about the issue. The behavior has raised concerns about the misuse of web data for training AI models, and both OpenAI, known for ChatGPT, and Anthropic, the company behind the chatbot Claude, have faced scrutiny for their data scraping practices. The situation highlights the ongoing debate around copyright and AI training data, with the US Copyright Office expected to provide updated guidance on the matter.
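To make the mechanism concrete, here is a minimal sketch of how a compliant crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, the `ExampleBot` agent, and the example.com URLs are all hypothetical; the user-agent tokens `GPTBot` and `ClaudeBot` are used purely for illustration and are not taken from the article. The check runs entirely on the crawler's side, so a bot that skips it is not stopped by anything in the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a publisher might serve.
# The user-agent tokens below are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "ExampleBot"):
        for url in ("https://example.com/article", "https://example.com/private/x"):
            print(agent, url, is_allowed(agent, url))
```

A well-behaved crawler would call something like `is_allowed` before each request and skip any URL for which it returns `False`.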

Public servants uneasy as government 'spy' robot prowls federal offices

Public servants in Gatineau are uneasy as a robot from the VirBrix platform optimizes workspaces by collecting data on air quality and light levels. Despite assurances, the Government Services Union expresses privacy concerns.

Internet Archive forced to remove 500k books after publishers' court win

The Internet Archive removed 500,000 books due to a court ruling favoring publishers. The organization is appealing, arguing for fair use. Supporters stress the impact on education and access to information.

Lessons About the Human Mind from Artificial Intelligence

In 2022, a Google engineer claimed AI chatbot LaMDA was self-aware, but further scrutiny revealed it mimicked human-like responses without true understanding. This incident underscores AI limitations in comprehension and originality.

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

The video discusses limitations of large language models in AI, emphasizing genuine understanding and problem-solving skills. A prize incentivizes AI systems showcasing these abilities. Adaptability and knowledge acquisition are highlighted as crucial for true intelligence.

The Encyclopedia Project, or How to Know in the Age of AI

Artificial intelligence challenges information reliability online, blurring real and fake content. An anecdote underscores the necessity of trustworthy sources like encyclopedias. The piece advocates for critical thinking amid AI-driven misinformation.

3 comments

By @joshstrange - 10 months

http://archive.today/bVgFO

By @arthurcolle - 10 months

robots.txt is a suggestion not a rule

By @Handy-Man - 10 months

Title editorialized due to being too long

OpenAI and Anthropic are ignoring robots.txt

Related

Public servants uneasy as government 'spy' robot prowls federal offices

Internet Archive forced to remove 500k books after publishers' court win

Lessons About the Human Mind from Artificial Intelligence

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

The Encyclopedia Project, or How to Know in the Age of AI