August 21st, 2024

Whisper-WebUI

Whisper-WebUI is a Gradio-based interface for OpenAI's Whisper model, enabling subtitle generation from various sources, supporting multiple formats, and offering translation features. Installation requires git, Python, and FFmpeg.


Whisper-WebUI is a Gradio-based browser interface designed for OpenAI's Whisper model, enabling users to easily generate subtitles. It supports multiple Whisper implementations, including `openai/whisper`, `SYSTRAN/faster-whisper`, and `Vaibhavs10/insanely-fast-whisper`.

The tool can create subtitles from various sources such as files, YouTube videos, and microphone input, and it supports formats like SRT, WebVTT, and plain text. Additionally, it features translation capabilities for speech-to-text and text-to-text using models like Facebook's NLLB and the DeepL API. Audio processing is enhanced with Silero VAD for pre-processing and speaker diarization through the pyannote model for post-processing.

To install and run Whisper-WebUI, users need `git`, `python` (versions 3.8 to 3.10), and `FFmpeg`. The repository provides comprehensive installation instructions, including automatic scripts and Docker support. The project is optimized for efficient VRAM usage and transcription speed, particularly with the `faster-whisper` implementation, and offers various models with different sizes and VRAM requirements to accommodate different hardware capabilities.
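A manual setup might look roughly like the following sketch. The repository URL and entry-point script name are assumptions based on typical Gradio project layouts; the repo's own README and automatic install scripts are the authoritative instructions:

```shell
# Hypothetical manual install sketch (check the repo's README for the
# actual script names and any extra steps such as installing FFmpeg).
git clone https://github.com/jhj0517/Whisper-WebUI.git
cd Whisper-WebUI

# The docs call for Python 3.8-3.10; isolate dependencies in a venv.
python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

# Launch the Gradio interface; it serves a local URL in the browser.
python app.py
```

FFmpeg must also be installed system-wide (e.g. via a package manager) since Whisper relies on it for decoding audio and video files.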

- Whisper-WebUI is a browser interface for OpenAI's Whisper model.

- It supports multiple implementations and can generate subtitles from various sources.

- The tool includes translation features and supports multiple subtitle formats.

- Installation requires `git`, `python`, and `FFmpeg`, with detailed instructions provided.

- The project is optimized for VRAM usage and transcription speed.

3 comments
By @jamesblonde - 8 months
This was a lab assignment I gave my students at KTH in Nov 2022: https://github.com/ID2223KTH/id2223kth.github.io/blob/master...

See some of their example solutions here (along with their projects in 2022): https://id2223kth.github.io/assignments/project/ServerlessML...

One of the student labs was gamified language learning - say the image in the language you are learning, and Whisper tells you whether you said it in an understandable way.

This Gradio implementation is a more polished version of their early efforts.

By @yjftsjthsd-h - 8 months
This says it can generate subtitle files, which is something I've wanted from whisper for a while. But does anyone know of a way to do that with just a cli tool that I can run locally? Like, ideally, just `whisper-make-subtitles ./*.mp4` to loop over every .mp4 in a directory and create matching subtitle files.
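For what the comment above asks, the stock `whisper` CLI (from `pip install openai-whisper`) can already write `.srt` files directly via its `--output_format` flag, so a small loop gets close to the wished-for one-liner. The model choice and output directory here are illustrative:

```shell
# Sketch: generate an .srt next to every .mp4 in the current directory
# using the openai-whisper CLI. Requires FFmpeg on the PATH.
for f in ./*.mp4; do
  whisper "$f" --model small --output_format srt --output_dir .
done
```

Each run produces a subtitle file named after the input (e.g. `video.mp4` yields `video.srt`), which most players pick up automatically when it sits beside the video.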
By @vlugorilla - 8 months
Also check out https://whishper.net