oTranscribe: A free and open tool for transcribing audio interviews
oTranscribe is a free web application for transcribing audio recordings, allowing seamless navigation, automatic saving, and local file storage. It supports multiple export formats and is open source under the MIT license.
oTranscribe is a free web application designed to simplify the transcription of recorded interviews. It operates exclusively on desktop computers, allowing users to pause, rewind, and fast-forward audio without switching between separate applications such as QuickTime and Word. The tool features interactive timestamps for easy navigation through transcripts and automatically saves progress to the browser's storage every second. It protects privacy by keeping audio files and transcripts on the user's computer. Users can export their work in several formats, including Markdown, plain text, and Google Docs, and an integrated player supports video files. The application is open source under the MIT license; it was created by Elliot Bentley and is a project of the MuckRock Foundation.
- oTranscribe is a free web app for transcribing audio recordings.
- It allows seamless navigation and automatic saving of transcripts.
- Users can export transcripts in multiple formats.
- The app prioritizes user privacy by keeping files local.
- It is open source and developed under the MIT license.
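The interactive timestamps mentioned in the summary can be illustrated with a minimal Python sketch. This is not oTranscribe's actual code (the app runs as JavaScript in the browser); the function names and the `M:SS` / `H:MM:SS` text format are assumptions chosen for illustration. It shows how a playback position could be rendered as a timestamp and how timestamps found in a transcript could be turned back into seek positions.

```python
import re

# Hypothetical timestamp format: M:SS, or H:MM:SS past one hour.
TIMESTAMP_RE = re.compile(r"\b\d+:\d{2}(?::\d{2})?\b")


def format_timestamp(seconds: float) -> str:
    """Render a playback position (in seconds) as M:SS or H:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def parse_timestamp(stamp: str) -> int:
    """Convert M:SS or H:MM:SS back into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def find_timestamps(transcript: str) -> list[int]:
    """Return the seek position (in seconds) of every timestamp in the text."""
    return [parse_timestamp(m.group()) for m in TIMESTAMP_RE.finditer(transcript)]
```

In a real editor, each match would be wired to the media player so that clicking it seeks to that position; here the positions are simply returned as a list.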
Related
Groqnotes: Generate structured notes from audio using Groq, Whisper, and Llama3
The GitHub project "Groqnotes" is a streamlit app utilizing Groq, Whisper, and Llama3 to create structured notes from audio content efficiently. It offers rapid transcription, markdown styling, and download options. Access online or set up locally.
Transcribro: On-device Accurate Speech-to-text
The GitHub repository for "Transcribro" offers project details, downloads, community links, contribution guidelines, donations, branding guidelines, and keyboard UI screenshots. Contact for project-specific support or inquiries.
Audapolis: Edit audio files by word, not waveform
The Audapolis project on GitHub offers a tailored editor for spoken-word media with audio-to-text transcription. It supports various media types, works on Windows, Linux, and macOS, and stores data locally. Funding sources include governmental and foundation support.
Show HN: Voice Out – Text-to-speech to read any webpage, Google Doc, or PDF
Voice Out is a free Chrome extension that reads text aloud from various sources, supports over 60 languages, and enhances productivity with features like background listening and text highlighting.
Reduct: Transcript-Based Video Editing
Reduct is a collaborative platform for managing video and audio content, offering transcription, translation, and editing features. It supports various sectors and formats, enhancing collaboration and content accessibility.
- Many users appreciate oTranscribe for its simplicity and manual transcription assistance, though some find it too basic.
- There is a demand for more advanced features, such as real-time transcription and AI integration, which oTranscribe currently lacks.
- Users share alternative tools and services, including Whisper-based applications and other AI-driven transcription solutions.
- Concerns about language support and accuracy, especially for non-English languages, are frequently mentioned.
- Several users express interest in tools that combine transcription with translation capabilities.
Worked excellently.
It generates both a file that just contains a line per uninterrupted speaker speech prefixed with the speaker number, as well as a file with timestamps which I believe would be used as subtitles.
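The two outputs described above can be sketched in Python. The commenter does not name the tool or its data format, so the `Segment` structure, the `Speaker N:` prefix, and all function names here are assumptions for illustration only. The sketch merges consecutive segments from the same speaker into one line per uninterrupted turn, and separately writes the segments as SubRip (SRT) subtitle blocks.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized stretch of speech (hypothetical structure)."""
    speaker: int
    start: float  # seconds from the start of the recording
    end: float
    text: str


def merge_turns(segments):
    """Join consecutive segments by the same speaker into single turns."""
    turns = []
    for seg in segments:
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            turns[-1] = Segment(last.speaker, last.start, seg.end,
                                f"{last.text} {seg.text}")
        else:
            turns.append(seg)
    return turns


def speaker_lines(segments):
    """One line per uninterrupted turn, prefixed with the speaker number."""
    return "\n".join(f"Speaker {t.speaker}: {t.text}"
                     for t in merge_turns(segments))


def srt_time(seconds):
    """Format seconds as the HH:MM:SS,mmm timecode used by SRT files."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render each segment as a numbered SRT subtitle block."""
    blocks = [
        f"{i}\n{srt_time(s.start)} --> {srt_time(s.end)}\n{s.text}"
        for i, s in enumerate(segments, start=1)
    ]
    return "\n\n".join(blocks) + "\n"
```

Keeping the subtitle file at segment granularity while merging only the speaker file is a design choice: subtitles read better in short blocks, whereas a reading transcript is easier to follow as one line per turn.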
So no AI here, folks.
- Transcribe word-by-word in real time as audio is recorded
- Work entirely locally
- Use relatively recent open-source local models?
I've been using otter.ai for real-time meeting transcriptions - letting me multitask and instantly catch up if I'm asked a question by skimming the most recent few seconds worth of the transcript - but it's far from perfect and occasionally their real-time service has significant transcription delays, not to mention it requires internet connectivity.
Most of the Whisper-based apps out there, though, as well as (when I last checked) the whisper.cpp demo code, require an entire recording to be ingested at once. There are others that rely on e.g. Apple's dictation frameworks, which are a bit dated in capability at the moment.
Anything folks are using out there?
You do still need to proof and QA even AI results, if you want a publication quality result, and do things like attribute who is speaking when (at least Whisper can't do that), and correct "unusual" last names and things. So I feel like people using AI still need good tools for the correcting/finishing/proofing too, that would be similar to the tools for non-assisted transcription.
Does oTranscribe automatically convert audio into text?
Sorry! It doesn’t. oTranscribe makes the manual task of transcribing audio a lot less painful. But you still have to do the transcription.
https://www.gally.net/temp/20240809geminitranscription/index...
Aside from some minor punctuation and capitalization issues, Gemini’s transcription looks nearly perfect to me. There were only one or two words that I think it misheard. If I had transcribed the audio myself, I would have made more mistakes than that.
One passage struck me in particular:
And then he comes up with "weird," which becomes viral and the rest, and here he is.
How did Gemini know to put “weird” in quotation marks, to indicate, correctly, that the speaker was referring to Walz’s use of the word as a word? According to Politico, Walz first used the word in that context in the media on July 23.
https://www.politico.com/news/2024/07/26/trump-vance-weird-0...
https://video2srt.ccextractor.org/
Disclaimer: Working on this project.
It is functional but a bit slow. I think using whisper directly instead of swift bindings will help a lot.
Really interested in adding diarisation but having a lot of trouble converting Pyannote to CoreML. Pyannote runs so slowly with torch on CPU. Haven’t gotten around to putting my latest work for that on GitHub yet.
Happy to accept contributions.
Some priorities right now:
* Fixing signing for local builds
* Replacing swift whisper with whisper.cpp
* Allowing users to provide their own models
Current features:
1. Download from YT
2. Transcribe using Vosk (output has time codes included)
3. Speaker diarization using pyannote (this isn't perfect and needs a bit more ironing out)
What needs to be done:
4. Store the transcription in a search engine (can include vectors)
5. Implement a web app
If anyone here is interested in joining forces, let me know.
Not developing it actively after I created tables of contents for the several videos I needed, years ago. If I ever need it again, I will probably work on mobile UI (aka responsive)
Nowadays, I use libretranslate/libretranslate and pluja/whishper to do this, but not in real time.