Hospitals use a transcription tool powered by a hallucination-prone OpenAI model

A few months ago, my doctor showed off an AI transcription tool he used to record and summarize his patient conversations. In my case the summary was fine, but researchers cited by ABC News have found that isn't always true of OpenAI's Whisper, which powers a tool many hospitals use – sometimes it just makes things up entirely.

Whisper powers a medical transcription tool from a company called Nabla, which ABC News estimates has been used to transcribe 7 million medical conversations. More than 30,000 clinicians and 40 health systems use it, the outlet writes. Nabla is reportedly aware that Whisper can hallucinate and is “addressing the problem.”

A group of researchers from Cornell University, the University of Washington, and others found in a study that Whisper hallucinated in about 1 percent of transcriptions, making up entire sentences with sometimes violent sentiments or nonsensical phrases during pauses in the recordings. The researchers, who gathered audio samples from TalkBank's AphasiaBank as part of the study, note that silence is particularly common when someone with a language disorder called aphasia is speaking.

One of the researchers, Allison Koenecke of Cornell University, posted examples like the one below in a thread about the study.

The researchers found that the hallucinations also included invented medical conditions and phrases you might expect from a YouTube video, such as “Thanks for watching!” (OpenAI reportedly transcribed over a million hours of YouTube videos to train GPT-4.)

The study was presented in June at the Association for Computing Machinery's FAccT conference in Brazil. It is not clear whether it has been peer-reviewed.

OpenAI spokesperson Taya Christianson emailed a statement to The Verge:

We take this issue seriously and are continually working to improve, including by reducing hallucinations. For Whisper use on our API platform, our usage policies prohibit use in certain high-stakes decision-making contexts, and our model card for open-source use includes recommendations against use in high-risk domains. We thank the researchers for sharing their findings.
