A.I. Chatbots Defeated Doctors at Diagnosing Illness
A study found ChatGPT outperformed human doctors in diagnosing medical conditions, achieving 90% accuracy compared to 76% for doctors using the AI and 74% for those not using it.
A recent study found that ChatGPT outperformed human doctors at diagnosing medical conditions from case histories. Conducted by Dr. Adam Rodman and his team, the study assessed 50 physicians on six complex case histories. ChatGPT achieved an average score of 90%, while doctors using the chatbot scored 76% and those without it scored 74%. The findings highlighted a concerning trend: many doctors did not fully utilize the chatbot's capabilities, often sticking to their initial diagnoses despite contrary suggestions from the AI. This points to overconfidence in their own diagnostic judgment and a lack of familiarity with using AI tools effectively. The study underscores the need for better integration of AI into medical practice, along with training that helps physicians improve their diagnostic process. The results suggest that while AI can serve as a valuable diagnostic aid, there is still a significant gap in how doctors interact with these technologies.
- ChatGPT outperformed doctors in diagnosing illnesses in a recent study.
- Doctors using the chatbot showed only marginal improvement over those who did not.
- Many physicians did not fully utilize the chatbot's capabilities, often ignoring its suggestions.
- The study highlights the need for better training in AI tools for medical professionals.
- There is a significant gap in the integration of AI in clinical practice.
Related
Can ChatGPT do data science?
A study led by Bhavya Chopra at Microsoft, with contributions from Ananya Singha and Sumit Gulwani, explored ChatGPT's challenges in data science tasks. Strategies included prompting techniques and leveraging domain expertise for better interactions.
Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine
Recent research shows GPT-4V outperforms physicians in medical imaging accuracy but has flawed rationales. Its potential in decision support requires further evaluation before clinical use, highlighting AI's limitations.
Kids who use ChatGPT as a study assistant do worse on tests
A University of Pennsylvania study found that high school students using ChatGPT scored worse on tests, while a specialized AI tutor improved problem-solving but not test scores, highlighting potential learning inhibition.
The more sophisticated AI models get, the more likely they are to lie
Recent research shows that advanced AI models, like ChatGPT, often provide convincing but incorrect answers due to training methods. Improving transparency and detection systems is essential for addressing these inaccuracies.
I was a big fan of the show House as a kid, and I remember being blown away when I learned that the “Department of Diagnostic Medicine” was made up for the show and not a standard department in every large hospital.
Replace AI with patient, and it's a far too familiar experience.
> Dr. Chen said he noticed that when he peered into the doctors’ chat logs, “they were treating it like a search engine for directed questions: ‘Is cirrhosis a risk factor for cancer? What are possible diagnoses for eye pain?’”

> “It was only a fraction of the doctors who realized they could literally copy-paste in the entire case history into the chatbot and just ask it to give a comprehensive answer to the entire question,” Dr. Chen added. “Only a fraction of doctors actually saw the surprisingly smart and comprehensive answers the chatbot was capable of producing.”
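To make the contrast concrete, here is a minimal sketch of the two usage patterns, assuming the current OpenAI Python client; the model name, prompt wording, and case text are illustrative placeholders, not what the study actually used.

```python
# Sketch of the two usage patterns described above (assumptions: the `openai`
# package is installed, OPENAI_API_KEY is set; model and prompts are placeholders).
from openai import OpenAI

client = OpenAI()

case_history = """62-year-old with fatigue, weight loss, and abnormal liver enzymes.
(The full case history would be pasted here verbatim.)"""

# Pattern 1: treating the model like a search engine with a narrow, directed question.
narrow = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Is cirrhosis a risk factor for cancer?"}],
)

# Pattern 2: pasting the entire case history and asking for a comprehensive answer.
broad = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are assisting with a diagnostic exercise."},
        {
            "role": "user",
            "content": (
                f"Here is the complete case history:\n\n{case_history}\n\n"
                "Give a ranked differential diagnosis and explain your reasoning."
            ),
        },
    ],
)

print(narrow.choices[0].message.content)
print(broad.choices[0].message.content)
```

The second pattern is the one the quoted researcher says only a fraction of doctors tried: hand the model the whole problem rather than piecemeal lookups.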
But I've also had medical professionals, particularly the non-doctors (nurse practitioners, physician assistants, etc.), who are much less receptive and more fixated on their first guess, which has sometimes cost me precious time and repeated visits. The linked research finding is interesting, and I think it highlights the pitfall of professionals who trust their own expertise or gut feeling too much, even when they haven't really examined the case carefully:
> The chatbot, from the company OpenAI, scored an average of 90 percent when diagnosing a medical condition from a case report and explaining its reasoning. Doctors randomly assigned to use the chatbot got an average score of 76 percent. Those randomly assigned not to use it had an average score of 74 percent.
> The study showed more than just the chatbot’s superior performance.
> It unveiled doctors’ sometimes unwavering belief in a diagnosis they made, even when a chatbot potentially suggests a better one.
> And the study illustrated that while doctors are being exposed to the tools of artificial intelligence for their work, few know how to exploit the abilities of chatbots. As a result, they failed to take advantage of A.I. systems’ ability to solve complex diagnostic problems and offer explanations for their diagnoses.
He was a history major before he went on to study medicine, and he now does a podcast on the history of medicine called Bedside Rounds. He gets really excited when talking about something he finds interesting, and it makes you want to follow him down the rabbit hole. Highly recommend listening at half speed: http://bedside-rounds.org/
First: which chatbot can correctly pick the labs, imaging and other methods of investigation without wasting tons of $ and going off the rails with rabbit holes?
Second: get a chatbot to understand the clinical impression and correlate it to the history, labs and imaging.
Then: get a chatbot to understand that despite X being a standard antibiotic regimen for the infection, given the person's age, lab findings, and severity of the disease, Y for Z many days is actually a better strategy for these specific instances.
————————
Strange answer.
Clearly not. These results suggest that most of the time doctors should be A.I. extenders, offering valuable second opinions on the diagnoses.
It tells you everything that he's "shocked," and that despite that shock, he still maintains the above, in keeping with the cognitive dissonance. Many of us with enough experience of modern healthcare could see this coming from miles away, and would have found the opposite result (doctors beating GPT on average) shocking.