July 22nd, 2024

Audapolis: Edit audio files by word, not waveform

The Audapolis project on GitHub offers a tailored editor for spoken-word media with audio-to-text transcription. It supports various media types, works on Windows, Linux, and macOS, and stores data locally. Funding sources include governmental and foundation support.

The audapolis project on GitHub offers an editor tailored for spoken-word media, featuring a word-processor-like interface and automatic audio-to-text transcription. It supports editing video, audio, and mixed media, and runs on Windows, Linux, and macOS. All data is stored locally rather than in the cloud. Users can download the latest version from the provided link and report bugs or give feedback through the GitHub repository. A survey is also available for users to share their needs and expectations. The project was funded from September 2021 to February 2022 by the "Bundesministerium für Bildung und Forschung" (the German Federal Ministry of Education and Research), the Prototype Fund, and Open Knowledge Foundation Deutschland.

Related

Groqnotes: Generate structured notes from audio using Groq, Whisper, and Llama3

The GitHub project "Groqnotes" is a Streamlit app that uses Groq, Whisper, and Llama3 to create structured notes from audio content. It offers rapid transcription, markdown styling, and download options, and can be used online or set up locally.

That Editor

The GitHub repository hosts a DOS-like editor created for video production, not ideal for general use. It reflects historical hardware and software limitations, tailored for specific vintage computing requirements.

Show HN: AI assisted image editing with audio instructions

The GitHub repository hosts "AAIELA: AI Assisted Image Editing with Language and Audio," a project enabling image editing via audio commands and AI models. It integrates various technologies for object detection, language processing, and image inpainting. Future plans involve model enhancements and feature integrations.

Transcribro: On-device Accurate Speech-to-text

The GitHub repository for "Transcribro" offers project details, downloads, community links, contribution guidelines, donations, branding guidelines, and keyboard UI screenshots. Contact for project-specific support or inquiries.

Audacity 3.6

Audacity 3.6 brings master effects, a new compressor, and limiter with gain reduction history. It offers factory presets, dark and light themes, improved performance, and custom theme installation. Users can switch themes in Preferences. Audacity is a free, open-source audio editor for various operating systems.

AI: What people are saying
The Audapolis project on GitHub garners mixed reactions from users.
  • Some users appreciate it as a free alternative to Descript and praise its open-source nature.
  • Several users suggest improvements, such as adding a demo video and supporting modern speech recognition models like Whisper.
  • There are comparisons to other tools like Adobe's demo, Hindenburg, and iOS voice memos, highlighting similar functionalities.
  • Some users express skepticism about the practicality of text-based audio editing for serious audio work.
  • Comments also touch on the project's funding, with some noting the support from the German government.
23 comments
By @vunderba - 5 months
I remember when Adobe demoed this idea of being able to edit waveforms by the recognized text back in 2016 and it was pretty mind blowing for the time.

https://youtu.be/I3l4XLZ59iw

EDIT: I could also definitely see Audapolis being useful if you could integrate it into a podcast's post processing flow (volume normalization, de-essing) by recognizing certain verbal tics and automatically removing them from the audio such as "ummmm...", etc.

By @bluelightning2k - 5 months
A genuinely free alternative to Descript sounds very useful.

I've always liked the idea of Descript and was considering building something similar before it came out. The problem is that my use case is a couple of videos a year, so it doesn't fit with an expensive monthly subscription.

By @hammeiam - 5 months
I've spent some of my free time over the past couple of months working on something similar. It's in a decent state, but I need help from somebody who understands the .fcpxml format so you can export your edits to DaVinci and FCP.

Take a look at https://matcha.video

By @petarb - 5 months
This is awesome to see as an open source project.

This functionality is one of my favorites when editing videos in Descript. It’s so much easier than chopping up waveforms in Audacity.

By @corn13read2 - 4 months
This is pretty dated and doesn't support Whisper, which is currently the de facto speech recognition model.
By @raymond_goo - 5 months
By @Machado117 - 4 months
The other day I was using the Voice Memos app on iOS 18 and was surprised to find that it also supports editing the recording by transcript.
By @alsetmusic - 5 months
One of the hosts of a podcast that I listen to has had positive things to say about Descript.[0] Just mentioning it because he's been talking about it for a few years, so I expect it's had a good amount of feature development over time.

[0] descript.com/

By @pryelluw - 5 months
If the maintainer is reading, having a demo video would be nice.
By @leetrout - 5 months
Hindenburg also added this capability.

> Hindenburg’s manuscript feature gives you a complete overview of your audio. You can select the text just as you would in a text document and watch as your edits are made in real-time. If you need to export your text in a specific format, no problem. Hindenburg supports the most common text and transcription export formats.

https://hindenburg.com/

By @emadda - 5 months
Nice, are there plans to notarize the mac app?

I built something similar here: https://bigwav.app

By @geekodour - 5 months
This looks great! Will try it out. I built a similar but very scrappy tool for the same use case last year; I probably wouldn't have built it if I had found this.

[0] https://github.com/geekodour/wscribe-editor

By @jdprgm - 5 months
This really needs a video demo or at least a more in-depth text description of the features. Will download later to try, but I'm curious: does this just do simple hard cuts on audio text, or is there any AI magic for blending sentence timing, if that makes sense?

A number of comments turned me onto Descript. I made a similar comment on another audio thread recently: it drives me absolutely insane how all audio tools with any AI are web-based monthly SaaS instead of an offline, private, GPU-based upfront purchase.

By @generalizations - 5 months
Combine this with the tech to generate new audio matching the speaker's voice profile, and you've really got something cool.
By @jiehong - 5 months
That’s awesome!

Is 1 emoji for each commit title a new trend?

By @j45 - 4 months
This is exciting to see, though it seems the last release was a year ago.

Can anyone clarify if this project is active?

By @StarterPro - 4 months
Call me a jerk, but anyone who is editing audio seriously, probably wants the waveform, no?
By @frakkingcylons - 5 months
Somewhat off-topic: I saw the funding note at the bottom. It’s pretty cool that the German government is giving some funding to projects like this. I wonder how much the US is doing in that regard, like if there’s a list of projects that tax dollars go towards.
By @iainctduncan - 5 months
IMHO you should really change the headline on this. I'm an audio person, and my first thought was "that's stupid, words are awful at describing sound". But then I looked, and editing transcriptions of voice recordings by word is actually a great idea. That was not the impression the headline gave me, FWIW!
By @MForster - 5 months
And here I was expecting that I could edit the text and the app would change the audio file to say what I had typed...