July 3rd, 2024

Introducing Voice Isolator and Background Noise Remover

The website offers a Voice Isolator tool for extracting clear speech by removing background noise from audio. Users can try a sample, access FAQs, and explore other AI audio solutions by ElevenLabs.

The website offers a Voice Isolator tool that can extract clear speech from any audio by removing background noise. Users can try a sample by enabling microphone access to record themselves or upload audio files for voice isolation. The tool is suitable for film, podcast, and interview post-production needs. The site also provides answers to frequently asked questions about the Voice Isolator, such as pricing, file size limitations, compatibility with music vocals, and the possibility of streaming through an API for real-time applications. ElevenLabs, the company behind the tool, emphasizes creating high-quality AI audio solutions and offers various products and solutions for different industries and use cases. Users can access the tool for free and explore other AI audio products and services provided by ElevenLabs.

Generating audio for video

Google DeepMind introduces V2A technology for video soundtracks, enhancing silent videos with synchronized audio. The system allows users to guide sound creation, aligning audio closely with visuals for realistic outputs. Ongoing research addresses challenges like maintaining audio quality and improving lip synchronization. DeepMind prioritizes responsible AI development, incorporating diverse perspectives and planning safety assessments before wider public access.

24 comments

By @IncreasePosts - 10 months

What is the current SOTA for voice->text?

I have a recording I've been sitting on for 2 years (a guest lecture which a friend recorded) that contains very heavy background noise; you can just barely make out what the lecturer is saying. I wonder if there is any hope I will ever be able to read a transcript of it.

I can figure out what the lecturer is saying (maybe only because I have some context about what he is talking about), but it is too painful to sit through 2 hours of it and try to transcribe it.

I tried uploading the audio file to this service, but basically get nothing useful returned to me.

By @Murky3515 - 10 months

Please think twice before sharing your personal voice samples with a random online website just because they offer a cool demo.

By @greypowerOz - 10 months

"Voice Isolator costs 1000 characters for every minute of audio." - can someone expand on this currency for someone out of the loop?

By @tomaskafka - 10 months

Are there actual before/after samples? I’m sure as hell not sending samples of my voice to an AI voice cloning company.

By @almog - 10 months

I'd like to have something else but for live calls: a process that takes two audio inputs and "subtracts" the noise from one input from the other. My use case would be to have two dynamic microphones, one directed at the window and one that I'm using for a conference call. I'm assuming having two inputs should make the process easier for real time (20ms?) processing and might require less compute.

If such a process can output a clear sound, I could chain it with Blackhole and use the processed clear signal as the input for the call.
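The two-microphone idea described above resembles classic adaptive noise cancellation. What follows is only a minimal sketch under stated assumptions, not a tested real-time pipeline: it uses a normalized LMS (NLMS) filter, which is my choice of technique rather than anything the commenter or ElevenLabs specified. The function name `nlms_cancel`, the tap count, the step size, and the made-up "room path" in the demo are all hypothetical. It assumes the noise-facing microphone picks up little or no speech; if speech leaks into it, the filter will partly cancel the speech as well.

```python
import math
import random


def nlms_cancel(primary, reference, taps=8, mu=0.1, eps=1e-8):
    """Remove the part of `primary` that is predictable from `reference`.

    primary:   samples from the call microphone (speech + noise)
    reference: samples from the noise-facing microphone (assumed mostly noise)
    Returns the error signal, which serves as the cleaned speech estimate.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - noise_estimate  # what is left after subtracting the noise
        power = sum(h * h for h in history) + eps
        step = mu * e / power  # normalized step keeps adaptation stable
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(e)
    return cleaned


def demo(n=4000, seed=0):
    """Synthetic check: returns (error before, error after) cancellation."""
    rng = random.Random(seed)
    room_path = [0.6, 0.3, 0.1]  # hypothetical path from window to call mic
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    speech = [0.5 * math.sin(2 * math.pi * 220 * t / 8000) for t in range(n)]
    leaked = [
        sum(c * noise[t - k] for k, c in enumerate(room_path) if t - k >= 0)
        for t in range(n)
    ]
    primary = [s + v for s, v in zip(speech, leaked)]
    cleaned = nlms_cancel(primary, noise)

    def mse(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

    half = n // 2  # skip the first half so the filter has time to adapt
    return mse(primary[half:], speech[half:]), mse(cleaned[half:], speech[half:])


if __name__ == "__main__":
    before, after = demo()
    print(f"mean squared error vs. clean speech: before={before:.4f} after={after:.4f}")
```

The filter updates once per sample, so the algorithm itself adds no block delay; in a real setup the latency would come mostly from audio driver buffering, which this sketch does not model.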

By @throwup238 - 10 months

They also just announced licensed celebrity voices in their Reader app this past week.

Judy Garland, Burt Reynolds, Laurence Olivier, and James Dean are the first ones.

By @chmars - 10 months

How is this different from Auphonic?

https://auphonic.com/features

By @taraparo - 10 months

I prefer https://product.supertone.ai/clear which is one time payment and not subscription based.

By @IvyMike - 10 months

The one thing I hate about this: there are so-called "first amendment auditors" who professionally annoy people on the street, trying to provoke a reaction. They monetize the resulting video on YouTube.

You used to be able to pull out your phone and play Disney soundtracks or Taylor Swift music which would result in the video being non-monetizable. But improvements in audio isolation techniques have now defeated this countermeasure. Being a professional annoyance is once again a career choice.

Edit: this is one instance I've personally seen: https://www.instagram.com/p/C7IEFxQSJQw/?hl=en&img_index=1

By @andrewstuart - 10 months

Tried it with several files.

It didn't seem to do much better than audio filters for ffmpeg that have been tuned for removing background noise and enhancing voice. Maybe I'm missing something or using the wrong source data.

By @simshay - 10 months

I have used ai|coustics previously and I think their output quality is way better than Eleven Labs or Auphonic. They really do a good job there.

By @hnsdmpf - 10 months

The video is impressive, but for the files I uploaded it hallucinated or changed the voices quite a bit. IMO ai-coustics.com or Auphonic is a better alternative. ai-coustics even offers video upload and lets you choose the enhancement level.

By @terrycody - 10 months

Sorry, noob here: is there something that can strip the talking voice from YouTube videos but leave everything else 100% untouched? I don't know what this is called. Or is this exactly the thing I was looking for?

By @gtvwill - 10 months

Or I could just download virtual dj and run it for free on a computer and just do this locally, right now, with zero fancy hardware and arguably some of the best stems algorithms on the market.

By @ec109685 - 10 months

I had very loud background music playing, and while it could completely eliminate that (impressive!), the voice was much more garbled than when there wasn’t any background noise playing.

By @dayjah - 10 months

My test sample, me talking with my baby babbling in the background, returned a silent audio track. I guess neither I nor the baby are considered signal ~_~

By @ukuina - 10 months

Are there any open source STT solutions that also handle speaker diarization? MacWhisper has promised this for a long time without delivering.

By @jdprgm - 10 months

ElevenLabs has some pretty cool stuff but I really despise how it's all cloud based. Wish there was an audio AI company following a path similar to what Topaz has been doing for video/photo AI with desktop software. Open source has been lagging more than I expected in this area too.

By @Animats - 10 months

What's it going to take to do this locally?

By @dc3k - 10 months

i think i’ll stick to nvidia broadcast for this

By @dougdonohoe - 10 months

From the FAQ, under "How much does it cost": Voice Isolator costs 1000 characters for every minute of audio.

Since when are characters a currency?
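For readers puzzling over the same quote, the rate can be read as a conversion from audio minutes to the character quota that ElevenLabs plans are measured in. A rough sketch of that arithmetic follows, with loud assumptions: the only figure taken from the FAQ quote is 1000 characters per minute; the function names are hypothetical, the proportional handling of partial minutes is a guess, and the dollar price per million characters is a caller-supplied parameter (the example value is the figure another commenter quotes in this thread, not a verified price).

```python
import math

CHARS_PER_MINUTE = 1000  # from the FAQ line quoted in the thread


def isolator_characters(audio_minutes: float) -> int:
    """Characters consumed for a clip, assuming proportional billing."""
    # Assumption: partial minutes are billed proportionally and rounded up
    # to a whole character; the real rounding rule is not stated in the thread.
    return math.ceil(audio_minutes * CHARS_PER_MINUTE)


def isolator_cost_usd(audio_minutes: float, usd_per_million_chars: float) -> float:
    """Dollar cost given a (hypothetical) price per one million characters."""
    return isolator_characters(audio_minutes) * usd_per_million_chars / 1_000_000


if __name__ == "__main__":
    lecture_minutes = 120  # e.g. a two-hour recording
    print(isolator_characters(lecture_minutes), "characters")
    print(isolator_cost_usd(lecture_minutes, usd_per_million_chars=200.0), "USD")
```

Under these assumptions a two-hour recording would use 120,000 characters of quota; what that means in dollars depends entirely on the plan's actual per-character price.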

By @localfirst - 10 months

just tried it and it's not that great

many many ppl are complaining that they have to spend quite a bit of credit to get the desired effect

so likely this is just another "pay-to-fine-tune" not unlike "pay-to-play" schemes in online games--the hook is to get you in to buy credits which you will use to chase the desired quality.

besides, there are local TTS models now that rival ElevenLabs. Their pricing is ridiculous: $200/1M is way too expensive.
