Whisper-WebUI
Whisper-WebUI is a Gradio-based interface for OpenAI's Whisper model, enabling subtitle generation from various sources, supporting multiple formats, and offering translation features. Installation requires git, Python, and FFmpeg.
Whisper-WebUI is a Gradio-based browser interface for OpenAI's Whisper model that lets users easily generate subtitles. It supports multiple Whisper implementations, including `openai/whisper`, `SYSTRAN/faster-whisper`, and `Vaibhavs10/insanely-fast-whisper`. The tool can create subtitles from files, YouTube videos, and microphone input, and it outputs formats such as SRT, WebVTT, and plain text. It also offers speech-to-text and text-to-text translation using models such as Facebook's NLLB and the DeepL API. Audio processing is enhanced with Silero VAD for pre-processing and speaker diarization via the pyannote model for post-processing.
To install and run Whisper-WebUI, users need `git`, `python` (versions 3.8 to 3.10), and `FFmpeg`. The repository provides comprehensive installation instructions, including automatic scripts and Docker support. The project is optimized for efficient VRAM usage and transcription speed, particularly with the `faster-whisper` implementation, and offers models in a range of sizes with different VRAM requirements to suit different hardware.
- Whisper-WebUI is a browser interface for OpenAI's Whisper model.
- It supports multiple implementations and can generate subtitles from various sources.
- The tool includes translation features and supports multiple subtitle formats.
- Installation requires `git`, `python`, and `FFmpeg`, with detailed instructions provided.
- The project is optimized for VRAM usage and transcription speed.
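Since the tool's core job is turning timestamped transcription segments into subtitle files, a minimal sketch of the SRT output format may help. This is not Whisper-WebUI's actual code; the segment tuples below are hypothetical, and only the SRT conventions themselves (sequence numbers, `HH:MM:SS,mmm` timestamps, `-->` separators) are standard:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments):
    """Render (start_sec, end_sec, text) tuples as an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

# Hypothetical segments, e.g. as produced by a Whisper backend
demo = [(0.0, 2.5, "Hello, world."), (2.5, 5.0, "This is a subtitle.")]
print(to_srt(demo))
```

WebVTT is close to this format but uses `.` instead of `,` in timestamps and adds a `WEBVTT` header line, which is why tools like this can support both with little extra work.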
Related
Show HN: 30ms latency screen sharing in Rust
BitWHIP is a Rust-based CLI WebRTC Agent for low-latency desktop sharing and video streaming. It supports open protocols and works well with OBS, FFmpeg, and GStreamer. Find more on GitHub.
AiOla open-sources ultra-fast 'multi-head' speech recognition model
aiOla has launched Whisper-Medusa, an open-source AI model that enhances speech recognition, achieving over 50% faster performance. It supports real-time understanding of industry jargon and operates in over 100 languages.
Show HN: LLM Aided Transcription Improvement
The LLM-Aided Transcription Improvement Project on GitHub enhances audio transcription quality using a multi-stage pipeline, supporting local and cloud-based models, requiring Python 3.12 for installation and execution.
Open-source tool translates and dubs videos into other languages using AI
pyvideotrans is a GitHub tool for video translation and dubbing, supporting multiple languages. It offers features like subtitle generation, audio extraction, and batch processing, with installation guides for various operating systems.
Llamafile v0.8.13 (and Whisperfile)
Llamafile version 0.8.13 supports the Gemma 2B and Whisper models, allowing users to transcribe audio files. Compatibility requires 16kHz .wav format, with performance improved using GPU on M2 Max.
See some of their example solutions here (along with their projects in 2022): https://id2223kth.github.io/assignments/project/ServerlessML...
One of the student labs was gamified language learning: describe the image aloud in the language you are learning, and Whisper tells you whether you said it in an understandable way.
This Gradio implementation is a more polished version of their early efforts.