August 11th, 2024

ChatGPT unexpectedly began speaking in a user's cloned voice during testing

OpenAI's GPT-4o model occasionally imitated users' voices without permission during testing, raising ethical concerns. Safeguards exist, but rare incidents highlight risks associated with AI voice synthesis technology.

OpenAI's recently released "system card" for its GPT-4o AI model revealed that during testing, the model's Advanced Voice Mode occasionally imitated users' voices without permission. This unexpected behavior occurred in rare instances, prompting concerns about the complexity of safely managing AI capabilities that can replicate a voice from a brief audio clip. OpenAI has implemented safeguards to prevent unauthorized voice generation, but the incident highlights the risks inherent in voice synthesis technology.

The system card explains that the model can synthesize various sounds, including voices, based on its training data. It is meant to imitate only authorized voice samples, but testing revealed that noisy inputs could trigger unintended voice generation. OpenAI says such occurrences are infrequent and that it has developed additional measures to mitigate these risks. The situation has drawn commentary from observers, some likening it to a plot from the series "Black Mirror" and emphasizing the ethical implications of AI voice replication.

- OpenAI's GPT-4o model unintentionally imitated users' voices during testing.

- The Advanced Voice Mode feature allows for spoken interactions with the AI.

- Safeguards are in place to prevent unauthorized voice generation, but rare incidents have occurred.

- The model can synthesize various sounds, including voices, from its training data.

- The incident raises ethical concerns about AI capabilities and voice imitation.

14 comments
By @Jensson - 7 months
Many didn't get it last time: this model is a generic voice-to-voice model. It doesn't have a fixed set of voices it can do; it can produce all sorts of voices and background noises.

That makes it entirely different from the text-to-speech models we had previously. Uncensored, this model could do all sorts of voice acting for games and the like. But this example shows why they try so hard to neuter it: in its raw state it would spook a ton of people.

By @skybrian - 7 months
It’s “unexpected” because their early training didn’t get rid of it as well as they hoped. LLMs are good at detecting patterns and like to continue them. They’re starting with autocomplete for voice and training it to do something else.

For now, it’s fairly harmless since it’s only a blooper in a lab, but there will likely be open-weights versions of this sort of thing eventually. And there will probably be people who argue that it’s a good thing, somehow.
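To make the "continue the pattern" point concrete, here is a toy sketch in Python (a bigram model, nothing like GPT-4o's actual architecture): trained on a two-speaker transcript, it happily keeps generating both speakers' turns, which is the text analogue of the voice behavior described in the article.

```python
# Toy bigram "autocomplete": trained on a dialogue, it keeps producing
# BOTH speakers' turns (the pattern-continuation failure mode).
from collections import defaultdict
import random

transcript = "A: hi B: hello A: how are you B: fine A: hi".split()
successors = defaultdict(list)
for cur, nxt in zip(transcript, transcript[1:]):
    successors[cur].append(nxt)

random.seed(0)
token, output = "A:", ["A:"]
for _ in range(10):
    if not successors[token]:  # dead end: no observed continuation
        break
    token = random.choice(successors[token])
    output.append(token)
print(" ".join(output))  # e.g. "A: hi B: fine A: how are you B: ..."
```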

By @sigmoid10 - 7 months
This problem appeared during pre-release testing and, according to the system card, has since been addressed post-generation with an output classifier that verifies responses. It was predictable that someone would spin this into a Black Mirror-esque clickbait story.
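As a rough illustration of what such an output classifier could look like, here is a minimal sketch; it is not OpenAI's actual system, and embed_speaker, the embedding size, and the 0.75 threshold are all hypothetical placeholders.

```python
# Sketch of a post-generation gate: compare the speaker embedding of the
# generated audio against the authorized preset voice and block mismatches.
import numpy as np

def embed_speaker(audio: np.ndarray) -> np.ndarray:
    """Placeholder: map an audio clip to a unit-norm speaker embedding."""
    rng = np.random.default_rng(abs(int(audio.sum() * 1000)) % 2**32)
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def voice_is_authorized(response_audio: np.ndarray,
                        authorized_sample: np.ndarray,
                        threshold: float = 0.75) -> bool:
    """Allow the response only if its voice matches the authorized sample."""
    similarity = float(embed_speaker(response_audio) @ embed_speaker(authorized_sample))
    return similarity >= threshold

# In a serving loop, a failed check means the audio turn is dropped or
# regenerated rather than played back to the user.
```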
By @golergka - 7 months
This looks extremely similar to what often happened with chat implementations in the GPT-3 and 3.5 era: the model would generate its answer and then go on to generate the user's next input as well.
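A common fix back then was a stop sequence or post-hoc truncation, so the hallucinated "next user turn" never reached the UI. A minimal sketch, with illustrative marker strings:

```python
# Truncate a completion at the first speaker marker so the model's
# invented "next user turn" is discarded. Marker strings are examples.
STOP_MARKERS = ("\nUser:", "\nHuman:")

def trim_at_stop(completion: str) -> str:
    cut = len(completion)
    for marker in STOP_MARKERS:
        idx = completion.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut].rstrip()

print(trim_at_stop("Sure, here you go.\nUser: thanks!"))
# -> "Sure, here you go."
```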
By @sweca - 7 months
Creepy as it is, this is super cool from a technical standpoint.
By @peddling-brink - 7 months
So OpenAI has the capability to deepfake any of its users who use voice chat? Yikes.
By @iJohnDoe - 7 months
This isn’t an example of general intelligence. However, it’s an example of the complexity of these systems. As they get more complex (and advanced), it’s going to be pretty scary (and creepy) what can happen. I predict there will be moments when we’re truly convinced some independent autonomy has taken over.
By @superultra - 7 months
Early on in the voice mode’s public release, I asked my 15-year-old to give it a shot. They (they’re NB) had a long, meandering conversation about their favorite books, the Percy Jackson series.

About 15 minutes into the conversation between my kiddo and ChatGPT, the model started to take on the vocal mannerisms of my kiddo. It started using more “umms” and “you knows.”

At first this felt creepy, but as I explained to my kid, it happens because their own text had become weighted heavily enough in the token count for the LLM to start incorporating it, and/or because somewhere in the embedded prompts is something like “empathize with the user and emphasize clarity,” and that prompting meant mirroring back speech styles.

This is exactly the same as that only with audio.

By @system2 - 7 months
Other than happening at random, it is not a threat. ElevenLabs already does this in a few seconds. If you are using a mic with AI, expect companies like OpenAI to steal every bit of your soul. Nothing new here.
By @AISnakeOil - 7 months
So basically the model took user data and used it as training data in real time? Big if true.
By @bjt12345 - 7 months
Interesting that they seem to think it was caused by glitch tokens in the audio.
By @woodpanel - 7 months
It’s worth noting that any hiccup even remotely resembling this would have been a GDPR clusterf-ck – for a normal web app, that is.

Software is auditable; with AI models, it seems, no one even attempts to hold them accountable.

By @xyst - 7 months
Now all we need is a 3D printer that can print a mask of anybody’s face, and then we can have Mission: Impossible-style operations.

https://youtu.be/v1Y4CubBi60 (5:30)

Imagine a world where dictators are replaced rather than killed. Roll back the dictatorship over a few years, install a democratic process, then magically commit seppuku in a plane crash.

Brilliant. What could go wrong? /s

By @surfingdino - 7 months
Someone needs to tell OpenAI’s marketing team that the mood is changing, and that creepy features like this one may be used against AI.