October 26th, 2024

AI-powered transcription tool used in hospitals invents things no one ever said

Researchers found that OpenAI's Whisper transcription tool often fabricates text that was never spoken, with one study finding inaccuracies in 80% of analyzed public-meeting transcriptions, raising serious concerns, especially in healthcare settings.

Researchers have identified significant issues with Whisper, an AI-powered transcription tool developed by OpenAI, which is increasingly used in various sectors, including healthcare. The tool is prone to generating false information, known as hallucinations, which can include fabricated statements, racial commentary, and even non-existent medical treatments. Interviews with software engineers and researchers revealed that hallucinations were found in a substantial number of transcriptions, with one study indicating that 80% of analyzed public meeting transcriptions contained inaccuracies. The prevalence of these errors raises concerns, particularly in medical settings where accurate transcriptions are critical for patient care. Despite OpenAI's warnings against using Whisper in high-risk domains, many healthcare providers have adopted it to streamline documentation processes. Critics argue that the tool's inaccuracies could lead to misdiagnoses and other serious consequences. Additionally, the erasure of original audio recordings by some applications complicates the verification of transcriptions, further heightening the risk of errors going unnoticed. Experts are calling for regulatory oversight and improvements to the technology to mitigate these issues, emphasizing the need for a higher standard of accuracy in AI transcription tools.

- Whisper, an AI transcription tool, frequently generates false information, raising concerns in healthcare.

- One study found hallucinations in 80% of the public-meeting transcriptions it analyzed, including harmful fabricated content.

- Many medical centers are using Whisper despite warnings against its use in high-risk areas.

- The erasure of original audio recordings complicates error verification in transcriptions.

- Experts are advocating for regulatory oversight and improvements to AI transcription technologies.

10 comments
By @jqpabc123 - 6 months
Expecting intelligence and accuracy to "emerge" from a statistical process is absurd.

In other words, LLMs are only clearly useful if the results don't really matter or they can and will be externally verified.

LLMs negate a fundamental argument for computing --- instead of accurate results at low cost, we now have inaccurate results at high cost.

There is undoubtedly some utility to be had here but it is not at all clear or obvious that this will be widely transformative.

By @logn - 6 months
There will always be a need for both human oversight and accountability, and this is a good example. I think the net result will be, eventually, more and better jobs. It's a better job to validate the transcriptions than to actually transcribe.

Another example in medicine: radiologists will start handling orders of magnitude more cases. But the number of scans done might also increase exponentially as costs likewise drop.

By @rahimnathwani - 6 months

  A machine learning engineer said he initially discovered hallucinations in about half of the over 100 hours of Whisper transcriptions he analyzed.
The '100 hours' is almost useless information. 'About half' is meaningless without knowing the sample size. Perhaps he had 5 transcripts averaging 20 hours each, and 2 of the 5 had issues. Or perhaps there were hundreds of short transcripts, where the 'about half' would imply significance.
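A rough back-of-the-envelope sketch makes the point concrete (the splits below are hypothetical, not from the article): a 95% Wilson score interval around "about half" is enormous with 5 transcripts and tight with 500.

  # Rough illustration (hypothetical splits): a 95% Wilson score interval
  # for "about half" of the transcripts containing hallucinations,
  # at very different sample sizes.
  import math

  def wilson_interval(successes, n, z=1.96):
      """95% Wilson score confidence interval for a proportion."""
      p = successes / n
      denom = 1 + z**2 / n
      centre = (p + z**2 / (2 * n)) / denom
      half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
      return centre - half, centre + half

  for hallucinated, total in [(2, 5), (50, 100), (250, 500)]:
      lo, hi = wilson_interval(hallucinated, total)
      print(f"{hallucinated}/{total} transcripts: 95% CI ({lo:.2f}, {hi:.2f})")

Five transcripts leave the true hallucination rate anywhere from roughly 12% to 77%; five hundred pin it near one half.
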
By @freilanzer - 6 months
> In an example they uncovered, a speaker said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.”

> But the transcription software added: “He took a big piece of a cross, a teeny, small piece ... I’m sure he didn’t have a terror knife so he killed a number of people.”

> A speaker in another recording described “two other girls and one lady.” Whisper invented extra commentary on race, adding “two other girls and one lady, um, which were Black.”

> In a third transcription, Whisper invented a non-existent medication called “hyperactivated antibiotics.”

I didn't expect it to be this bad.

By @AStonesThrow - 6 months
This clinical service is not something that you, the patient, should want or allow.

I use a digital recorder app to record audio from my clinical consultations. It's important for me, as a patient, to have a record, because I'm alone in there, and I frequently misremember or misunderstand things that were said.

My current recorder app has a transcription feature. It's fairly good at picking out words. It's supposed to recognize and label speakers as well, but that requires a lot of manual editing after the fact.

Still, it's fantastic having my own durable record of what was said to me, and by me. There are usually a few surprises in there!

Now, I've stopped asking for permission to record, because usually they become hostile to it. Nevertheless, it's legal, and it's my right to have a record.

By @notjulianjaynes - 6 months
I used whisper to create an srt file from some voice memos I made while I was driving and it 'hallucinated' "subtitles by the amara.org community" at the very end. Re-ran as txt and what do you know, that line disappeared.
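A minimal way to reproduce this with the open-source openai-whisper Python package is to scan the transcript segment by segment for the boilerplate phrases the model likes to invent during silence (the file name and phrase list below are just illustrative examples):

  # Minimal sketch (assumes the open-source openai-whisper package and ffmpeg
  # are installed; the file name and phrase list are illustrative only).
  import whisper

  SUSPECT_PHRASES = [
      "subtitles by the amara.org community",
      "thanks for watching",
  ]

  model = whisper.load_model("base")            # small model, quick pass
  result = model.transcribe("voice_memo.m4a")   # hypothetical recording

  # Hallucinated boilerplate tends to appear in silent stretches, often at
  # the very end, so scan each timestamped segment rather than the full text.
  for seg in result["segments"]:
      text = seg["text"].strip()
      if any(p in text.lower() for p in SUSPECT_PHRASES):
          print(f"possible hallucination at {seg['start']:.1f}s: {text!r}")

The CLI can emit the .srt directly (whisper voice_memo.m4a --output_format srt); since the txt and srt outputs come from the same segments, the line vanishing on a re-run is plausibly just decoding randomness rather than anything about the format.
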
By @add-sub-mul-div - 6 months
Oh, you don't say.

(Literally)

By @sirolimus - 6 months
Well, no shit, AI isn't meant for anything as serious as medical logging
By @mensetmanusman - 6 months
The whisper of death is a risk.