February 13th, 2025

AI summaries turn real news into nonsense, BBC finds

The BBC's research revealed that 51% of AI-generated news summaries contained significant inaccuracies, with Gemini performing the worst. The study emphasizes the need for responsible AI use to maintain public trust.

The BBC has conducted research into the accuracy of AI-generated news summaries following a significant error by Apple's AI service, which misrepresented a news story about a murder case. The study evaluated four AI assistants, OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity, assessing their ability to provide accurate responses based on BBC news articles. The findings revealed that 51% of AI-generated answers contained significant issues, with 19% introducing factual errors and 13% misquoting or altering information from the original articles. Gemini performed the worst, with 34% of its responses judged problematic, while ChatGPT had the fewest, at 15%. The BBC's Programme Director for Generative AI emphasized the potential value of AI when used responsibly but warned of the challenges it poses to the information ecosystem. The research highlighted the risks of misinformation and the potential erosion of public trust in factual reporting, raising concerns about the implications of AI for professional communication and critical thinking.

- The BBC's research found that over half of AI-generated news summaries contained significant inaccuracies.

- Apple's AI service was criticized for a major error in summarizing a news story, prompting the BBC's investigation.

- Gemini was identified as the least accurate AI assistant, with a high percentage of problematic responses.

- The study underscores the importance of responsible AI use to maintain public trust in news and information.

- Concerns were raised about the broader implications of AI on critical thinking and professional communication.

4 comments
By @amai - 3 months
The link to the BBC research document is https://www.bbc.co.uk/aboutthebbc/documents/bbc-research-int...

"In December 2024, the BBC carried out research into the accuracy of four prominent AI assistants that can search the internet – OpenAI’s ChatGPT; Microsoft’s Copilot; Google’s Gemini; and Perplexity. We did this by reviewing responses from the AI assistants to 100 questions about the news, asking AI assistants to use BBC News sources where possible. Ordinarily the BBC ‘blocks’ these AI assistants from accessing the BBC’s websites. These blocks were lifted for the duration of the research and have since been reinstated. AI answers were reviewed by BBC journalists, all experts in the question topics. Journalists rated each AI answer against seven criteria – (i) accuracy; (ii) attribution of sources; (iii) impartiality; (iv) distinguishing opinions from facts; (v) editorialisation (inserting comments and descriptions not backed by the facts presented in the source); (vi) context; (vii) the representation of BBC content in the response. For each of these criteria, journalists could rate each response as having no issues; some issues; significant issues or don’t know."

By @ChrisArchitect - 3 months
By @kolinko - 3 months
I checked the BBC's original research, but the underlying data was not published - there is no way to replicate the study or check its validity.

They didn't provide the answers the models gave, so we can't even say whether the reviews of those answers were correct. And since the experiment relied on the BBC granting the models timed access to its archive, we can't even type the same prompts into the models to see their outputs.

By @esbranson - 3 months
Ah the AI naysayers. Down with all kings but King Ludd!

Pretty good though, for consumer-grade, off-the-shelf AI.