A.I. Chatbots Defeated Doctors at Diagnosing Illness
A study found ChatGPT outperformed human doctors in diagnosing medical conditions, achieving 90% accuracy compared to 76% for doctors using the AI and 74% for those not using it.
A recent study found that ChatGPT outperformed human doctors at diagnosing medical conditions from case histories. Conducted by Dr. Adam Rodman and his team, the study assessed 50 physicians on six complex case histories. ChatGPT achieved an average score of 90%, while doctors using the chatbot scored 76% and those without it scored 74%. The findings highlighted a concerning trend: many doctors did not fully utilize the chatbot's capabilities, often sticking to their initial diagnoses despite contrary suggestions from the AI. This points to overconfidence in their own diagnostic judgment and a lack of familiarity with using AI tools effectively. The study underscores the need for better integration of AI into medical practice, along with training that helps physicians improve their diagnostic process. The results suggest that while AI can serve as a valuable diagnostic aid, there is still a significant gap in how doctors interact with these technologies.
- ChatGPT outperformed doctors in diagnosing illnesses in a recent study.
- Doctors using the chatbot showed only marginal improvement over those who did not.
- Many physicians did not fully utilize the chatbot's capabilities, often ignoring its suggestions.
- The study highlights the need for better training in AI tools for medical professionals.
- There is a significant gap in the integration of AI in clinical practice.
Related
Can ChatGPT do data science?
A study led by Bhavya Chopra at Microsoft, with contributions from Ananya Singha and Sumit Gulwani, explored ChatGPT's challenges in data science tasks. Strategies included prompting techniques and leveraging domain expertise for better interactions.
Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine
Recent research shows GPT-4V outperforms physicians in medical imaging accuracy but has flawed rationales. Its potential in decision support requires further evaluation before clinical use, highlighting AI's limitations.
Kids who use ChatGPT as a study assistant do worse on tests
A University of Pennsylvania study found that high school students using ChatGPT scored worse on tests, while a specialized AI tutor improved problem-solving but not test scores, highlighting potential learning inhibition.
The more sophisticated AI models get, the more likely they are to lie
Recent research shows that advanced AI models, like ChatGPT, often provide convincing but incorrect answers due to training methods. Improving transparency and detection systems is essential for addressing these inaccuracies.
I was a big fan of the show House as a kid, and I remember being blown away when I learned that the “Department of Diagnostic Medicine” was made up for the show and not a standard department in every large hospital.
Replace AI with patient, and it's a far too familiar experience.
> Dr. Chen said he noticed that when he peered into the doctors’ chat logs, “they were treating it like a search engine for directed questions: ‘Is cirrhosis a risk factor for cancer? What are possible diagnoses for eye pain?’”

> “It was only a fraction of the doctors who realized they could literally copy-paste in the entire case history into the chatbot and just ask it to give a comprehensive answer to the entire question,” Dr. Chen added. “Only a fraction of doctors actually saw the surprisingly smart and comprehensive answers the chatbot was capable of producing.”
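To make the contrast concrete, here is a minimal sketch of the two usage patterns, assuming the current OpenAI Python client; the model name, prompt wording, and case text are illustrative placeholders, not what the study actually used.

```python
# Sketch of the two usage patterns described above (assumptions: the `openai`
# package is installed, OPENAI_API_KEY is set; model and prompts are placeholders).
from openai import OpenAI

client = OpenAI()

case_history = """62-year-old with fatigue, weight loss, and abnormal liver enzymes.
(The full case history would be pasted here verbatim.)"""

# Pattern 1: treating the model like a search engine with a narrow, directed question.
narrow = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Is cirrhosis a risk factor for cancer?"}],
)

# Pattern 2: pasting the entire case history and asking for a comprehensive answer.
broad = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are assisting with a diagnostic exercise."},
        {
            "role": "user",
            "content": (
                f"Here is the complete case history:\n\n{case_history}\n\n"
                "Give a ranked differential diagnosis and explain your reasoning."
            ),
        },
    ],
)

print(narrow.choices[0].message.content)
print(broad.choices[0].message.content)
```

The second pattern is the one the quoted researcher says only a fraction of doctors tried: hand the model the whole problem rather than piecemeal lookups.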
But I've also had medical professionals, particularly the non-doctors (nurse practitioners, physician assistants, etc.), who are much less receptive and more fixated on their first guess, which has sometimes cost me precious time and repeated visits. The linked research finding is interesting, and I think it highlights the pitfall of professionals who trust their own expertise or gut feeling too much, even when they haven't really examined the case carefully:
> The chatbot, from the company OpenAI, scored an average of 90 percent when diagnosing a medical condition from a case report and explaining its reasoning. Doctors randomly assigned to use the chatbot got an average score of 76 percent. Those randomly assigned not to use it had an average score of 74 percent.
> The study showed more than just the chatbot’s superior performance.
> It unveiled doctors’ sometimes unwavering belief in a diagnosis they made, even when a chatbot potentially suggests a better one.
> And the study illustrated that while doctors are being exposed to the tools of artificial intelligence for their work, few know how to exploit the abilities of chatbots. As a result, they failed to take advantage of A.I. systems’ ability to solve complex diagnostic problems and offer explanations for their diagnoses.
He was a history major before he went on to study medicine, and he now does a podcast on the history of medicine called Bedside Rounds. He gets really excited when talking about something he finds interesting, and it makes you want to follow him down the rabbit hole. Highly recommend listening at half speed: http://bedside-rounds.org/
First: which chatbot can correctly pick the labs, imaging and other methods of investigation without wasting tons of $ and going off the rails with rabbit holes?
Second: get a chatbot to understand the clinical impression and correlate it to the history, labs and imaging.
Then: get a chatbot to understand that despite X being a standard antibiotic regimen for the infection, given the person's age, lab findings, and severity of the disease, Y for Z many days is actually a better strategy for these specific instances.
————————
Strange answer.
Clearly not. These results suggest that most of the time doctors should be A.I. extenders, offering valuable second opinions on the diagnoses.
It tells you everything that he's "shocked," and that despite that shock, he still maintains the above, in keeping with the cognitive dissonance. Many of us with enough experience of modern healthcare could see this coming from miles away, and would have found the opposite result (doctors beating GPT on average) shocking.