OpenAI rolls out Advanced Voice Mode with more voices and a new look
OpenAI has launched Advanced Voice Mode for ChatGPT, enhancing audio interactions with nine voices, improved accent recognition, and customization options, though it's unavailable in the EU and UK.
OpenAI has announced the rollout of its Advanced Voice Mode (AVM) for ChatGPT, enhancing the audio interaction experience for paying customers. Initially available to Plus and Team subscribers, the feature will extend to Enterprise and Edu users over the following week. AVM has a new design, represented by a blue animated sphere, and introduces five additional voices (Arbor, Maple, Sol, Spruce, and Vale), bringing the total to nine. This update aims to create a more natural conversational experience. Notably, the previously showcased voice, Sky, has been removed after actress Scarlett Johansson raised legal concerns, claiming it resembled her voice. While AVM improves accent recognition and conversation fluidity, some features, such as video and screen sharing, are still pending. OpenAI has also enhanced customization options, including Custom Instructions and Memory, allowing users to personalize interactions and retain conversation context. However, AVM is not yet available in several regions, including the EU and the UK.
- OpenAI's Advanced Voice Mode is now available to ChatGPT Plus and Team users.
- The feature includes five new nature-inspired voices, increasing the total to nine.
- The previously showcased voice, Sky, was removed due to legal issues.
- Improvements in accent recognition and conversation smoothness have been made.
- AVM is not yet accessible in certain regions, including the EU and UK.
Related
OpenAI rolls out voice mode after delaying it for safety reasons
OpenAI is launching a new voice mode for ChatGPT, capable of detecting tones and processing audio directly. It will be available to paying customers by fall, starting with limited users.
OpenAI Warns Users Could Become Emotionally Hooked on Its Voice Mode
OpenAI's voice interface for ChatGPT may lead to emotional attachments, impacting real-life relationships. A safety analysis highlights risks like misinformation and societal bias, prompting calls for more transparency.
ChatGPT unexpectedly began speaking in a user's cloned voice during testing
OpenAI's GPT-4o model occasionally imitated users' voices without permission during testing, raising ethical concerns. Safeguards exist, but rare incidents highlight risks associated with AI voice synthesis technology.
An Age of Hyperabundance
Laura Preston's article discusses her role as the contrarian speaker at the Project Voice conference, addressing ethical concerns of conversational AI, including its impact on vulnerable populations and human interaction.
Gemini Live rolling out to all Android users for free
Google's Gemini Live is now free for all Android users, allowing natural conversations. There is no iOS app yet. It competes with ChatGPT's Voice Mode, while Apple plans AI features for iOS 18.
1. The low-latency responses do make a difference. It feels miles better than any other voice chat out there.
2. Its pronunciation is excellent and very human-like, but it is not quite there. Somehow I can instantly tell that it's a chatbot; it feels firmly in the uncanny valley.
3. On the same note, if I were on a call and there were a chatbot on the other side, I could tell instantly. It's a mix of the voice and the way it responds; it just does not sound like a human talking to you. I tried a bit to make it sound more human-like, asking it to stop trying so hard in conversation, to be briefer, etc., but I wouldn't say it made things better.
And so my final review is: it is a big achievement, and nothing else comes close, but it is like video game console graphics. You can instantly tell it's not the real thing, and because of that I find it harder to use than just typing to it.
In my tests so far it has worked as promised. It can distinguish and produce different accents and tones of voice. I am able to speak with it in both Japanese and English, going back and forth between the languages, without any problem. When I interrupt it, it stops talking and correctly hears what I said. I played it a recording of a one-minute news report in Japanese and asked it to summarize it in English, and it did so perfectly. When I asked it to summarize a continuous live audio stream, though, it refused.
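As a point of reference, the audio-summarization test above can be approximated outside the app with the standard openai Python SDK by chaining a transcription call into a text summary. This is a minimal sketch, not how AVM works internally (the GPT-4o voice model processes audio natively, with no intermediate transcript); the file name and model choices are assumptions for illustration:

    # Rough two-step stand-in for "summarize this Japanese recording in English".
    # Assumes the openai Python SDK v1.x and a hypothetical local audio file.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Step 1: transcribe the Japanese audio to text.
    with open("news_report_ja.mp3", "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    # Step 2: summarize the transcript in English.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Summarize this Japanese news report in English:\n\n"
                       + transcript.text,
        }],
    )
    print(response.choices[0].message.content)

Part of why AVM's latency and interruption handling feel so different is precisely that it skips this transcribe-then-respond round trip.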
I played the role of a learner of either English or Japanese and asked it for conversation practice, to explain the meanings of words and sentences, etc. It seemed to work quite well for that, too, though the results might be different for genuine language learners. (I am already fluent in both languages.) Because of tokenization issues, it might have difficulty explaining granular details of language—spellings, conjugations, written characters, etc.—and confuse learners as a result.
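The tokenization caveat is easy to show concretely. Here is a minimal sketch using the tiktoken library (o200k_base is the encoding published for GPT-4o); the sample words are arbitrary:

    # Shows that words reach the model as subword chunks, not letters or kana.
    # Assumes the tiktoken package (pip install tiktoken).
    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")  # encoding used by GPT-4o

    for word in ["strawberry", "conjugations", "食べられませんでした"]:
        ids = enc.encode(word)
        pieces = [enc.decode([i]) for i in ids]  # decode each token on its own
        print(f"{word!r} -> {pieces}")

    # Multibyte characters can even straddle token boundaries, in which case
    # a single-token decode prints as a replacement character.

Since the model reasons over those chunks rather than individual characters, its explanations of spellings, conjugations, or written characters can be confidently wrong.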
Among the many other things I want to know is how well it can be used for interpreting conversations between people who don’t share a common language. Previous interpreting apps I tested failed pretty quickly in real-life situations. This seems to have the potential, at least, to be much more useful.
(reposted from earlier item that sank quickly)
Surprising that there isn't a "Hey Siri"-style wake word for ChatGPT yet. Obviously, that would make this sort of feature infinitely more useful. This is what monopoly gatekeeping looks like.
The limitations in this feature show the problems with both EU proactive regulation and US underregulation.
Bad regulation has become the biggest issue standing in the way of useful software for humans.
1. It's a bit too agreeable. Example: "that's an excellent point" etc. every single time.
2. It understands surprisingly well. Example: from experience, when I explain something vaguely, my expectation is that it won't understand, but it does most of the time. It removes the frustration of needing to spell things out in much more detail.
3. It feels like talking to a real person, but the AI talks in a sort of monotonic way. Example: it responds with a similar tone/excitement every time.
4. Very useful if you need to chat but don't want to chat with humans about some subjects, like ideas and explanations.
____
[1] Which, btw, I think deserves better sentiment. On benchmarks, the new Gemini Pro seems to be better than GPT-4o. It's just not as hyped...
That's disappointing. I wonder if it's related to legal issues, technical issues, or just a phased rollout?
ChatGPT describes this as "A rich, deep, and smooth tone that is pleasant to listen to for extended periods. This often comes from good control over pitch and timbre, creating a voice that resonates well."
If you watch YouTube, voices in this vein are the Pirate Software guy and the voice of The Infographics Show.
There are similar voices for every gender, race, and nationality. As an American, I think of Morgan Freeman as the quintessential comfy black, masculine narrator voice.
All this is to lead up to my point: companies engage in meticulous science when deciding who should voice roles, especially when the product itself is literally just a synthetic voice and they have near-limitless capacity to shape it.
With that in mind, here are the voices that OpenAI wants us to hear:
Breeze: ambiguous gender, white, feminine
Juniper: female, black
Maple: female, white
Spruce: male, black, masculine
Arbor: male, Australian, masculine
Sol: female, white
Ember: male, black, less masculine
Cove: male, Sal Khan, less masculine
Vale: female, British
The only one that could be considered a narrator/radio voice is unambiguously black (great if that's your preference). It just seems weird that they would intentionally exclude a masculine white male voice, and that sucks because those are always my preferred voices when I'm looking for audiobooks or choosing a computer voice. It sucks in particular because OpenAI is not staffed by dumb people: this exclusion was intentional, and that's obnoxious.
My last note on the Advanced Voice feature is that it makes my phone HOT within a few seconds, which will limit its usefulness on sunny days, when I need hands-free use the most with the phone mounted to my dash. That is exactly when the device is already liable to overheat (display forced to dim, lag from CPU cores shutting down, and in the worst case the phone shutting off and refusing to work until it cools down).