AI worse than humans in every way at summarising information, trial finds
A trial by ASIC found AI less effective than humans in summarizing documents, with human summaries scoring 81% compared to AI's 47%. AI often missed context and included irrelevant information.
A recent trial conducted by Amazon for the Australian Securities and Investments Commission (ASIC), the country's corporate regulator, found that artificial intelligence (AI) is less effective than humans at summarizing documents. The trial compared summaries generated by the Llama2-70B model with those produced by ten ASIC staff members. Reviewers assessed the summaries against criteria such as coherence, length, and the ability to identify relevant references. Human summaries scored 81% on the evaluation rubric, while AI summaries achieved only 47%.

Reviewers noted that the AI often failed to capture nuance and context, and sometimes included incorrect or irrelevant information. They raised concerns that AI-generated summaries could create additional work, since outputs would need to be fact-checked against the original documents. Although the report acknowledged that advances in AI technology could improve summarization in the future, it emphasized that human critical analysis remains superior. The findings suggest that AI should be viewed as a tool to assist, rather than replace, human effort in summarization tasks.
- A government trial found AI to be less effective than humans in summarizing documents.
- Human summaries scored 81% on the evaluation rubric, well ahead of the AI's 47%.
- Reviewers noted that AI often missed context and included irrelevant information.
- The trial highlighted the potential for AI to create additional work due to fact-checking needs.
- Future advancements in AI may improve summarization, but human analysis is currently unmatched.
Related
Mozilla.ai did what? When silliness goes dangerous
Mozilla.ai, a Mozilla Foundation project, faced criticism for using biased statistical models to summarize qualitative data, leading to doubts about its scientific rigor and competence in AI. The approach was deemed ineffective and compromised credibility.
Everyone Is Judging AI by These Tests. Experts Say They're Close to Meaningless
Benchmarks used to assess AI models may mislead, lacking crucial insights. Google and Meta's AI boasts are criticized for outdated, unreliable tests. Experts urge more rigorous evaluation methods amid concerns about AI's implications.
When ChatGPT summarises, it does nothing of the kind
The article critiques ChatGPT's summarization limitations, citing a failed attempt to summarize a 50-page paper accurately. It questions the reliability of large language models for business applications due to inaccuracies.
There's Just One Problem: AI Isn't Intelligent, and That's a Systemic Risk
AI mimics human intelligence but lacks true understanding, posing systemic risks. Over-reliance may lead to failures, diminish critical thinking, and fail to create enough jobs, challenging economic stability.
There's Just One Problem: AI Isn't Intelligent
AI mimics human intelligence without true understanding, posing systemic risks and undermining critical thinking. Economic benefits may lead to job quality reduction and increased inequality, failing to address global challenges.
There's a whole category of issues around this that I don't see the current LLM-based formulation of AI being able to solve.
This is an old model that wasn't dominant even when it was released. Either this study is fairly old, or I question the qualifications of the group running it.
Better than an average human? Absolutely.
So not cheaper, faster, or more consistent?
Sounds more like "worse than humans in the few ways measured in this limited trial"