October 26th, 2024

AI-powered transcription tool used in hospitals invents things no one ever said

Researchers found that OpenAI's Whisper transcription tool often fabricates text that was never spoken, with one study finding inaccuracies in 80% of analyzed public-meeting transcriptions, raising serious concerns, especially in healthcare settings.

Researchers have identified significant issues with Whisper, an AI-powered transcription tool developed by OpenAI, which is increasingly used in various sectors, including healthcare. The tool is prone to generating false information, known as hallucinations, which can include fabricated statements, racial commentary, and even non-existent medical treatments. Interviews with software engineers and researchers revealed that hallucinations were found in a substantial number of transcriptions, with one study indicating that 80% of analyzed public meeting transcriptions contained inaccuracies. The prevalence of these errors raises concerns, particularly in medical settings where accurate transcriptions are critical for patient care. Despite OpenAI's warnings against using Whisper in high-risk domains, many healthcare providers have adopted it to streamline documentation processes. Critics argue that the tool's inaccuracies could lead to misdiagnoses and other serious consequences. Additionally, the erasure of original audio recordings by some applications complicates the verification of transcriptions, further heightening the risk of errors going unnoticed. Experts are calling for regulatory oversight and improvements to the technology to mitigate these issues, emphasizing the need for a higher standard of accuracy in AI transcription tools.

- Whisper, an AI transcription tool, frequently generates false information, raising concerns in healthcare.

- One study found hallucinations in 80% of the public-meeting transcriptions it analyzed, including harmful fabricated content.

- Many medical centers are using Whisper despite warnings against its use in high-risk areas.

- The erasure of original audio recordings complicates error verification in transcriptions.

- Experts are advocating for regulatory oversight and improvements to AI transcription technologies.

10 comments
By @jqpabc123 - 6 months
Expecting intelligence and accuracy to "emerge" from a statistical process is absurd.

In other words, LLMs are only clearly useful if the results don't really matter or they can and will be externally verified.

LLMs negate a fundamental argument for computing --- instead of accurate results at low cost, we now have inaccurate results at high cost.

There is undoubtedly some utility to be had here but it is not at all clear or obvious that this will be widely transformative.

By @logn - 6 months
There will always be a need for both human oversight and accountability, and this is a good example. I think the net result will be, eventually, more and better jobs. It's a better job to validate the transcriptions than to actually transcribe.

Another example in medicine: radiologists will start handling orders of magnitude more cases. But the number of scans done might also increase exponentially as costs likewise drop.

By @rahimnathwani - 6 months

  A machine learning engineer said he initially discovered hallucinations in about half of the over 100 hours of Whisper transcriptions he analyzed.
The '100 hours' is almost useless information. 'About half' is meaningless without knowing the sample size. Perhaps he had 5 transcripts averaging 20 hours each, and 2 of the 5 had issues. Or perhaps there were hundreds of short transcripts, where the 'about half' would imply significance.
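A rough back-of-the-envelope sketch makes the point concrete (the splits below are hypothetical, not from the article): a 95% Wilson score interval around "about half" is enormous with 5 transcripts and tight with 500.

  # Rough illustration (hypothetical splits): a 95% Wilson score interval
  # for "about half" of the transcripts containing hallucinations,
  # at very different sample sizes.
  import math

  def wilson_interval(successes, n, z=1.96):
      """95% Wilson score confidence interval for a proportion."""
      p = successes / n
      denom = 1 + z**2 / n
      centre = (p + z**2 / (2 * n)) / denom
      half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
      return centre - half, centre + half

  for hallucinated, total in [(2, 5), (50, 100), (250, 500)]:
      lo, hi = wilson_interval(hallucinated, total)
      print(f"{hallucinated}/{total} transcripts: 95% CI ({lo:.2f}, {hi:.2f})")

Five transcripts leave the true hallucination rate anywhere from roughly 12% to 77%; five hundred pin it near one half.
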
By @freilanzer - 6 months
> In an example they uncovered, a speaker said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.”

> But the transcription software added: “He took a big piece of a cross, a teeny, small piece ... I’m sure he didn’t have a terror knife so he killed a number of people.”

> A speaker in another recording described “two other girls and one lady.” Whisper invented extra commentary on race, adding “two other girls and one lady, um, which were Black.”

> In a third transcription, Whisper invented a non-existent medication called “hyperactivated antibiotics.”

I didn't expect it to be this bad.

By @AStonesThrow - 6 months
This clinical service is not something that you, the patient, should want or allow.

I use a digital recorder app to record audio from my clinical consultations. It's important for me, as a patient, to have a record, because I'm alone in there, and I frequently misremember or misunderstand things that were said.

My current recorder app has a transcription feature. It's fairly good at picking out words. It's supposed to recognize and label speakers as well, but that requires a lot of manual editing after the fact.

Still, it's fantastic having my own durable record of what was said to me, and by me. There are usually a few surprises in there!

Now, I've stopped asking for permission to record, because usually they become hostile to it. Nevertheless, it's legal, and it's my right to have a record.

By @notjulianjaynes - 6 months
I used whisper to create an srt file from some voice memos I made while I was driving and it 'hallucinated' "subtitles by the amara.org community" at the very end. Re-ran as txt and what do you know, that line disappeared.
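A minimal way to reproduce this with the open-source openai-whisper Python package is to scan the transcript segment by segment for the boilerplate phrases the model likes to invent during silence (the file name and phrase list below are just illustrative examples):

  # Minimal sketch (assumes the open-source openai-whisper package and ffmpeg
  # are installed; the file name and phrase list are illustrative only).
  import whisper

  SUSPECT_PHRASES = [
      "subtitles by the amara.org community",
      "thanks for watching",
  ]

  model = whisper.load_model("base")            # small model, quick pass
  result = model.transcribe("voice_memo.m4a")   # hypothetical recording

  # Hallucinated boilerplate tends to appear in silent stretches, often at
  # the very end, so scan each timestamped segment rather than the full text.
  for seg in result["segments"]:
      text = seg["text"].strip()
      if any(p in text.lower() for p in SUSPECT_PHRASES):
          print(f"possible hallucination at {seg['start']:.1f}s: {text!r}")

The CLI can emit the .srt directly (whisper voice_memo.m4a --output_format srt); since the txt and srt outputs come from the same segments, the line vanishing on a re-run is plausibly just decoding randomness rather than anything about the format.
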
By @add-sub-mul-div - 6 months
Oh, you don't say.

(Literally)

By @sirolimus - 6 months
Well, no shit, AI isn't meant for anything as serious as medical logging
By @mensetmanusman - 6 months
The whisper of death is a risk.