Study: Almost all leading AI chatbots show signs of cognitive decline
A study in The BMJ found that leading AI chatbots show signs of cognitive decline, with ChatGPT 4o scoring highest. Limitations in visuospatial skills and executive function may hinder their clinical usefulness.
A recent study published in The BMJ reveals that nearly all leading AI chatbots exhibit signs of cognitive decline, challenging the notion that AI could soon replace human doctors. The research assessed prominent large language models (LLMs) such as ChatGPT and Gemini using the Montreal Cognitive Assessment (MoCA), a test typically used to detect early signs of dementia. Older chatbot versions performed worse, much like older patients: ChatGPT 4o scored highest at 26 out of 30, while Gemini 1.0 scored lowest at 16. All chatbots struggled in particular with visuospatial skills and executive functions, abilities critical for clinical work. The findings suggest that although LLMs perform well on certain medical diagnostic tasks, their limitations in visual abstraction and executive function may hinder their effectiveness in clinical settings. The authors conclude that neurologists are unlikely to be replaced by AI models in the near future and may instead find themselves treating AI systems that present with cognitive impairments.
- Leading AI chatbots show signs of cognitive decline, challenging their potential to replace human doctors.
- The study used the Montreal Cognitive Assessment (MoCA) to evaluate the cognitive abilities of various LLMs.
- ChatGPT 4o scored the highest, while all models struggled with visuospatial skills and executive functions.
- Findings indicate significant cognitive limitations in AI that could impede clinical applications.
- Neurologists may soon encounter AI models presenting with cognitive impairments rather than being replaced by them.
Related
IRL 25: Evaluating Language Models on Life's Curveballs
A study evaluated four AI models—Claude 3.5 Sonnet, GPT-4o, Gemini 1.5 Pro, and Mistral Large—on real-life communication tasks, revealing strengths in professionalism but weaknesses in humor and creativity.
Brute-Forcing the LLM Guardrails
The article examines the challenges of bypassing guardrails in large language models, particularly regarding medical diagnoses, revealing vulnerabilities in AI systems and the need for improved safeguards.
The more sophisticated AI models get, the more likely they are to lie
Recent research shows that advanced AI models, like ChatGPT, often provide convincing but incorrect answers due to training methods. Improving transparency and detection systems is essential for addressing these inaccuracies.
Brute-Forcing the LLM Guardrails
The article discusses how prompt engineering can bypass guardrails in large language models, achieving a 60% success rate in extracting medical diagnoses, highlighting vulnerabilities and the need for improved defenses.
A.I. Chatbots Defeated Doctors at Diagnosing Illness
A study found ChatGPT outperformed human doctors in diagnosing medical conditions, achieving 90% accuracy compared to 76% for doctors using the AI and 74% for those not using it.