Whisper-WebUI
Whisper-WebUI is a Gradio-based interface for OpenAI's Whisper model, enabling subtitle generation from various sources, supporting multiple formats, and offering translation features. Installation requires git, Python, and FFmpeg.
Whisper-WebUI is a Gradio-based browser interface for OpenAI's Whisper model that lets users easily generate subtitles. It supports multiple Whisper implementations, including `openai/whisper`, `SYSTRAN/faster-whisper`, and `Vaibhavs10/insanely-fast-whisper`. The tool can create subtitles from files, YouTube videos, and microphone input, and it outputs formats such as SRT, WebVTT, and plain text. It also offers speech-to-text and text-to-text translation using models such as Facebook's NLLB and the DeepL API. Audio processing is enhanced with Silero VAD for pre-processing and speaker diarization via the pyannote model for post-processing.
To install and run Whisper-WebUI, users need `git`, `python` (versions 3.8 to 3.10), and `FFmpeg`. The repository provides comprehensive installation instructions, including automatic scripts and Docker support. The project is optimized for efficient VRAM usage and transcription speed, particularly with the `faster-whisper` implementation, and offers models in a range of sizes with different VRAM requirements to suit different hardware.
- Whisper-WebUI is a browser interface for OpenAI's Whisper model.
- It supports multiple implementations and can generate subtitles from various sources.
- The tool includes translation features and supports multiple subtitle formats.
- Installation requires `git`, `python`, and `FFmpeg`, with detailed instructions provided.
- The project is optimized for VRAM usage and transcription speed.
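Since the tool's core job is turning timestamped transcription segments into subtitle files, a minimal sketch of the SRT output format may help. This is not Whisper-WebUI's actual code; the segment tuples below are hypothetical, and only the SRT conventions themselves (sequence numbers, `HH:MM:SS,mmm` timestamps, `-->` separators) are standard:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments):
    """Render (start_sec, end_sec, text) tuples as an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

# Hypothetical segments, e.g. as produced by a Whisper backend
demo = [(0.0, 2.5, "Hello, world."), (2.5, 5.0, "This is a subtitle.")]
print(to_srt(demo))
```

WebVTT is close to this format but uses `.` instead of `,` in timestamps and adds a `WEBVTT` header line, which is why tools like this can support both with little extra work.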
Related
Show HN: 30ms latency screen sharing in Rust
BitWHIP is a Rust-based CLI WebRTC Agent for low-latency desktop sharing and video streaming. It supports open protocols and works well with OBS, FFmpeg, and GStreamer. Find more on GitHub.
AiOla open-sources ultra-fast 'multi-head' speech recognition model
aiOla has launched Whisper-Medusa, an open-source AI model that enhances speech recognition, achieving over 50% faster performance. It supports real-time understanding of industry jargon and operates in over 100 languages.
Show HN: LLM Aided Transcription Improvement
The LLM-Aided Transcription Improvement Project on GitHub enhances audio transcription quality using a multi-stage pipeline, supporting local and cloud-based models, requiring Python 3.12 for installation and execution.
Open-source tool translates and dubs videos into other languages using AI
pyvideotrans is a GitHub tool for video translation and dubbing, supporting multiple languages. It offers features like subtitle generation, audio extraction, and batch processing, with installation guides for various operating systems.
Llamafile v0.8.13 (and Whisperfile)
Llamafile version 0.8.13 supports the Gemma 2B and Whisper models, allowing users to transcribe audio files. Compatibility requires 16kHz .wav format, with performance improved using GPU on M2 Max.
See some of their example solutions here (along with their projects in 2022): https://id2223kth.github.io/assignments/project/ServerlessML...
One of the student labs was gamified language learning: describe the image aloud in the language you are learning, and Whisper tells you whether you said it in an understandable way.
This Gradio implementation is a more polished version of their early efforts.