AI speech generator 'reaches human parity' – but it's too dangerous to release
Microsoft's VALL-E 2 AI speech generator replicates human voices accurately using minimal audio input. Despite its potential in various fields, Microsoft refrains from public release due to misuse concerns.
Microsoft has developed an AI speech generator named VALL-E 2 that can replicate human voices with high accuracy using just a few seconds of audio. The AI engine, based on neural codec language models, achieves human parity in zero-shot text-to-speech synthesis, producing speech comparable to human performance. VALL-E 2 incorporates features like Repetition Aware Sampling and Grouped Code Modeling to enhance speech quality and efficiency. Despite its capabilities, Microsoft has decided not to release VALL-E 2 to the public due to concerns about potential misuse, such as voice cloning and deepfake technology. The researchers behind VALL-E 2 suggest that the technology could have practical applications in education, entertainment, journalism, accessibility features, and more, but emphasize the need for protocols to ensure ethical use, especially when dealing with unseen speakers. The AI speech generator remains a research project with no immediate plans for public access.
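The paper only sketches how Repetition Aware Sampling works, but the reported idea is to decode codec tokens with nucleus sampling and fall back to plain random sampling when one token starts dominating the recent decoding history, which is what otherwise produces looping artifacts. A minimal sketch of that decoding rule is below; the function name, the window/threshold/top-p values, and the NumPy framing are all illustrative assumptions, not Microsoft's implementation.

```python
import numpy as np

def repetition_aware_sample(probs, history, window=10, threshold=0.5, top_p=0.9, rng=None):
    """Sample the next codec token, backing off from nucleus sampling to
    sampling over the full distribution when one token dominates the recent
    decoding history (the failure mode that causes repetition loops).

    probs:   NumPy array, model's distribution over the codec codebook.
    history: list of codec tokens decoded so far for this utterance.
    All parameter values here are illustrative, not taken from the paper.
    """
    rng = rng or np.random.default_rng()

    # Nucleus (top-p) sampling: keep the smallest set of tokens whose
    # probability mass reaches top_p, then sample within that set.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    kept = order[:cutoff]
    token = int(rng.choice(kept, p=probs[kept] / probs[kept].sum()))

    # If that token already makes up too much of the recent history, the
    # decoder is likely stuck in a loop: re-sample from the full distribution.
    recent = history[-window:]
    if recent and recent.count(token) / len(recent) >= threshold:
        token = int(rng.choice(len(probs), p=probs))

    return token
```

In use, the caller would append each returned token to `history` and feed it back to the codec language model for the next step; the back-off only changes behavior when the nucleus-sampled token is repeating heavily.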
Related
Generating audio for video
Google DeepMind introduces V2A technology for video soundtracks, enhancing silent videos with synchronized audio. The system allows users to guide sound creation, aligning audio closely with visuals for realistic outputs. Ongoing research addresses challenges like maintaining audio quality and improving lip synchronization. DeepMind prioritizes responsible AI development, incorporating diverse perspectives and planning safety assessments before wider public access.
All web "content" is freeware
Microsoft's CEO of AI discusses open web content as freeware since the 90s, raising concerns about AI-generated content quality and sustainability. Generative AI vendors defend practices amid transparency and accountability issues. Experts warn of a potential tech industry bubble.
'Skeleton Key' attack unlocks the worst of AI, says Microsoft
Microsoft warns of "Skeleton Key" attack exploiting AI models to generate harmful content. Mark Russinovich stresses the need for model-makers to address vulnerabilities. Advanced attacks like BEAST pose significant risks. Microsoft introduces AI security tools.
The model in question is Microsoft's VALL-E 2, without the clickbait headline.
> Our experiments on the LibriSpeech and VCTK datasets show that VALL-E 2 surpasses previous systems in speech robustness, naturalness, and speaker similarity. It is the first of its kind to reach human parity on these benchmarks. Moreover, VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases.
https://www.microsoft.com/en-us/research/project/vall-e-x/va...
> This page is for research demonstration purposes only. Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public.
If you go back and look at older cities they almost all have the same pattern: walls and gates.
I figure now that the Internet is a badlands roamed by robots pretending to be people as they attempt to rob you for their masters, we'll see the formation of cryptographically secured enclaves. Maybe? Who knows?
At this point I'm pretty much going to restrict online communication to encrypted authenticated channels. (Heck, I should sign this comment, eh? If only as a performance?) Hopefully it remains difficult to build an AI that can guess large numbers. ;P
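Signing a comment is cheap to do today; a minimal sketch of what that could look like, assuming an Ed25519 keypair and the Python `cryptography` package (the key handling and message here are placeholders, not a real identity scheme):

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Generate a keypair; in practice the private key would be stored securely
# and the public key published alongside one's online identity.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

comment = b"At this point I'm pretty much going to restrict online communication ..."
signature = private_key.sign(comment)

# Anyone holding the public key can check that the comment is authentic
# and unmodified; verify() raises InvalidSignature otherwise.
try:
    public_key.verify(signature, comment)
    print("signature valid")
except InvalidSignature:
    print("signature invalid")
```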
> so 2024 will be the last human election, and what we mean by that is not that it's just going to be an AI running as president in 2028, but that, while there may well be humans as figureheads, it'll be whoever has the greater compute power that will win
We already saw AI voices influencing elections in India: https://restofworld.org/2023/ai-voice-modi-singing-politics/
> AI-generated songs, like the ones featuring Prime Minister Narendra Modi, are gaining traction ahead of India’s upcoming elections. [...] Earlier this month, an Instagram video of Modi “singing” a Telugu love song had over 2 million views, while a similar Tamil-language song had more than 2.7 million. A Punjabi song racked up more than 17 million views.
1. “… but if you and your big rich company were to acquihire us you'd get access…” — though as this is MS it probably isn't that!
2. That it only works as well as claimed in specific circumstances, or has significant flaws, so they don't want people looking at it too closely just yet. The wording “in benchmarks used by Microsoft” might point to this.
3. That a competitor is getting close to releasing something similar, or has just done so, and they don't want to look like they were in second place or too far behind.
Social solutions take too long to deploy against the tech, but tech solutions are fallible. To be defeatist about it: there's going to be a golden window of time here where some really nasty scams face no real impediment.
> "This page is for research demonstration purposes only. Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public."
A speech generator can help rob 1000 banks.
Truly irresponsible.