September 5th, 2024

AI solution to the cocktail party problem used in court

AI technology has made progress on the "cocktail party problem," improving audio clarity in legal contexts. Wave Sciences' algorithm isolates individual voices, and the company plans broader applications in military and consumer markets.

AI technology has made significant strides in addressing the "cocktail party problem," which involves isolating a specific voice from a cacophony of background noise. This advancement is particularly relevant in legal contexts, where audio evidence can be compromised by overlapping voices. Keith McElveen, founder of Wave Sciences, developed an AI solution that analyzes sound reflections in a room to distinguish between competing voices. After a decade of research, the technology was successfully applied in a US murder case, transforming previously inadmissible audio into crucial evidence. The algorithm can now operate effectively with just two microphones, achieving results comparable to human hearing. Wave Sciences aims to expand its applications beyond forensic use, targeting military, smart speaker, and hearing aid markets. Additionally, AI is being utilized in other forensic areas, such as voice pattern analysis and audio integrity verification. The technology's ability to mimic human auditory processing suggests potential insights into how the human brain resolves similar auditory challenges.

- AI has made significant progress on the "cocktail party problem," improving audio clarity in noisy environments.

- The technology has been used in legal cases, turning inadmissible audio into pivotal evidence.

- Wave Sciences' algorithm works with as few as two microphones, with results comparable to human hearing.

- The company plans to expand its technology for use in various consumer and military applications.

- AI is increasingly being integrated into forensic science for voice authentication and audio integrity checks.
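
To give a feel for why two microphones can help at all, here is a minimal sketch of a classical technique, delay-and-sum beamforming. This is not Wave Sciences' algorithm, which according to the summary above analyzes sound reflections in a room; it only illustrates the general idea that a voice reaching two microphones at slightly different times can be aligned and reinforced while a competing sound is partly averaged out. All names (`shift`, `estimate_delay`, `delay_and_sum`, `power`), the synthetic noise signals, and the chosen delays are assumptions made for the example.

```python
import random

def shift(signal, lag):
    """Circularly advance a signal by `lag` samples (a negative lag delays it)."""
    n = len(signal)
    return [signal[(i + lag) % n] for i in range(n)]

def estimate_delay(mic_a, mic_b, max_lag):
    """Return the lag, in samples, at which mic_b best lines up with mic_a."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation score for this candidate alignment.
        score = sum(a * b for a, b in zip(mic_a, shift(mic_b, lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_and_sum(mic_a, mic_b, lag):
    """Align mic_b to mic_a using `lag`, then average the two channels."""
    return [0.5 * (a + b) for a, b in zip(mic_a, shift(mic_b, lag))]

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

# Synthetic scene (assumed for illustration): a louder target voice and a
# quieter interfering voice, both modeled as random noise.
random.seed(0)
n = 4000
target = [random.gauss(0, 1) for _ in range(n)]
interferer = [random.gauss(0, 0.5) for _ in range(n)]

# The target reaches mic B 3 samples later than mic A; the interferer
# reaches mic B 5 samples earlier, as if it came from another direction.
mic_a = [t + v for t, v in zip(target, interferer)]
mic_b = [t + v for t, v in zip(shift(target, -3), shift(interferer, 5))]

lag = estimate_delay(mic_a, mic_b, max_lag=10)
enhanced = delay_and_sum(mic_a, mic_b, lag)

# Residual interference relative to the clean target, before and after.
noise_before = power([m - t for m, t in zip(mic_a, target)])
noise_after = power([e - t for e, t in zip(enhanced, target)])
print(lag, noise_before, noise_after)
```

The sketch uses circular shifts and assumes the target is the loudest source, which keeps the code short but would not hold for real recordings. Practical systems deal with reverberation, moving speakers, and sources of similar loudness, which is where more sophisticated methods come in.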

Related

Generating audio for video

Google DeepMind introduces V2A technology for video soundtracks, enhancing silent videos with synchronized audio. The system allows users to guide sound creation, aligning audio closely with visuals for realistic outputs. Ongoing research addresses challenges like maintaining audio quality and improving lip synchronization. DeepMind prioritizes responsible AI development, incorporating diverse perspectives and planning safety assessments before wider public access.

Introducing Voice Isolator and Background Noise Remover

The website offers a Voice Isolator tool for extracting clear speech by removing background noise from audio. Users can try a sample, access FAQs, and explore other AI audio solutions by ElevenLabs.

AI speech generator 'reaches human parity' – but it's too dangerous to release

Microsoft's VALL-E 2 AI speech generator replicates human voices accurately using minimal audio input. Despite its potential in various fields, Microsoft refrains from public release due to misuse concerns.

Cops are using AI chatbots to write crime reports. Will they hold up in court?

Police departments are adopting AI technology to quickly draft crime reports from body camera audio, improving efficiency but raising concerns about accuracy, bias, and the need for ethical oversight.

CMG pitch deck on listening to your conversations to target ads

Cox Media Group is developing "Active Listening," a targeted advertising tool using audio from smart devices, raising privacy concerns and legal questions, while major tech companies distance themselves from the program.

3 comments
By @sigmoid10 - 8 months
>After two hitmen were arrested for killing a man, the FBI wanted to prove that they'd been hired by a family going through a child custody dispute. The FBI arranged to trick the family into believing that they were being blackmailed for their involvement - and then sat back to see the reaction. While texts and phone calls were reasonably easy for the FBI to access, in-person meetings in two restaurants were a different matter. But the court authorised the use of Wave Sciences’ algorithm, meaning that the audio went from being inadmissible to a pivotal piece of evidence.

While the technology and the way it is marketed now are cool and everything, it feels like the most interesting part of the article is this FBI investigation. Too bad it's only mentioned in these few sentences with no further details.

By @costco - 8 months
Software that does this is running on millions of Android phones. After you do OK Google voice enrollment, they calculate a speaker representation embedding for you. When the voice activity detector detects voice, it passes the audio through VoiceFilter-Lite https://google.github.io/speaker-id/publications/VoiceFilter... along with your speaker embedding, and the model is trained to return only your speech. The original, more computationally expensive model is called VoiceFilter and I was so amazed when I saw the examples on this page: https://google.github.io/speaker-id/publications/VoiceFilter...
By @jwlit - 8 months
You can see some information on how the technology works in this demo / paper: https://news.ycombinator.com/item?id=41455176