February 19th, 2025

Older AI models show signs of cognitive decline, study shows

A study in the BMJ reveals older AI models show cognitive decline, raising concerns for medical diagnostics. Critics argue human cognitive tests are unsuitable for evaluating AI performance.


A recent study published in the BMJ indicates that older AI models, particularly large language models (LLMs) and chatbots, exhibit signs of cognitive decline similar to humans. The research involved testing various LLMs, including OpenAI's ChatGPT and Alphabet's Gemini, using the Montreal Cognitive Assessment (MoCA), a tool typically used to evaluate cognitive impairment in humans. While newer models like ChatGPT version 4 scored relatively well, older models like Gemini 1.0 performed poorly, raising concerns about their reliability in medical diagnostics.

Critics of the study argue that applying human cognitive tests to AI is inappropriate, as the MoCA was designed for human cognition and does not align with the operational framework of LLMs. They suggest that the study's methodology and framing anthropomorphize AI, leading to misleading conclusions.

The study's authors acknowledge the limitations of their findings and emphasize the need for a critical examination of AI's role in clinical settings, particularly in tasks requiring visual and executive functions. The debate continues, with some experts calling for more rigorous testing of AI models over time to better understand their cognitive capabilities.

- Older AI models show signs of cognitive decline, raising concerns for medical diagnostics.

- The study used the Montreal Cognitive Assessment (MoCA) to evaluate AI performance.

- Critics argue that human cognitive tests are not suitable for AI evaluation.

- Newer models performed better than older ones, highlighting potential reliability issues.

- The study emphasizes the need for critical assessment of AI's role in healthcare.

3 comments
By @snypher - about 2 months
So did the previous generation models test higher when released, and now don't test as high? I may have misread the article but it seemed like the older models had a worse score all along, which can't really be called 'decline'?
By @xg15 - about 2 months
...by which they mean "earlier-generation models have lower scores in a test measuring cognitive function than newer models".

The entire study was an exercise in academic clickbait, something that even other scientists complained about:

> Other scientists have been left unconvinced about the study and its findings, going so far as to criticize the methods and the framing — in which the study's authors are accused of anthropomorphizing AI by projecting human conditions onto it. There is also criticism of the use of MoCA. This was a test designed purely for use in humans, it is suggested, and would not render meaningful results if applied to other forms of intelligence.

The study authors defend with the classic "It's just a joke, bro" card:

> Responding to the discussion, lead author of the study Roy Dayan, a doctor of medicine at the Hadassah Medical Center in Jerusalem, commented that many of the responses to the study have taken the framing too literally. Because the study was published in the Christmas edition of the BMJ, they used humor to present the findings of the study — including the pun "Age Against the Machine" — but intended the study to be considered seriously.

By @jcz_nz - about 2 months
This “study” is… total BS. I’d be super suspicious of anything else Roy Dayan ever puts his name on. Clickbait is the most generous interpretation.