September 4th, 2024

Ask HN: How to transcribe a couple thousand calls per day?

The user finds Microsoft Speech Service complicated and the Azure OpenAI Whisper deployment limited by a low quota. They seek a way to batch transcribe short calls on a server with two RTX 4090 cards.

The user has evaluated several speech transcription services, finding Microsoft Speech Service overly complicated and the Azure OpenAI Whisper deployment limited by a low quota. They have had some success running ggerganov's whisper on a Mac, but that setup is not on their corporate network. They are behind schedule and need a way to batch transcribe calls. They have access to a server with two RTX 4090 graphics cards and the Nvidia drivers already installed. The calls are short, averaging 90 seconds each.

- Microsoft Speech Service is considered too complicated for the user's needs.

- The Azure OpenAI Whisper deployment has a low quota, limiting its usability.

- ggerganov's whisper works well on a Mac but is not part of the corporate network.

- The user needs to batch transcribe calls and is currently behind schedule.

- The available server has powerful hardware (2x RTX 4090) ready for transcription tasks.
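
For a sense of scale, the workload can be sized with a little arithmetic. The sketch below assumes 2,000 calls per day (one reading of "a couple thousand", not a figure from the post) and uses the stated 90-second average. The function names are illustrative only.

```python
def daily_audio_hours(calls_per_day: int, avg_call_seconds: float) -> float:
    """Total hours of audio that must be transcribed each day."""
    return calls_per_day * avg_call_seconds / 3600


def required_speedup(audio_hours: float, gpus: int, wall_clock_hours: float) -> float:
    """Minimum speed, relative to real time, that each GPU must sustain."""
    return audio_hours / (gpus * wall_clock_hours)


# Assumption: "a couple thousand" is taken as 2,000 calls per day.
hours = daily_audio_hours(2000, 90)
print(f"{hours:.1f} hours of audio per day")
print(f"{required_speedup(hours, gpus=2, wall_clock_hours=24):.2f}x real time per GPU")
```

Under these assumptions the backlog is about 50 hours of audio per day, so two GPUs running around the clock would each need to transcribe only slightly faster than real time to keep up.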

OpenAI is set to lose $5B this year

OpenAI's projected costs for 2024 are $7 billion, with a potential $5 billion loss. Revenue from ChatGPT is about $2 billion annually, indicating a significant financial shortfall.

OpenAI rolls out voice mode after delaying it for safety reasons

OpenAI is launching a new voice mode for ChatGPT, capable of detecting tones and processing audio directly. It will be available to paying customers by fall, starting with limited users.

OTranscribe: A free and open tool for transcribing audio interviews

oTranscribe is a free web application for transcribing audio recordings, allowing seamless navigation, automatic saving, and local file storage. It supports multiple export formats and is open source under the MIT license.

Show HN: LLM Aided Transcription Improvement

The LLM-Aided Transcription Improvement Project on GitHub improves audio transcription quality with a multi-stage pipeline. It supports local and cloud-based models and requires Python 3.12 to install and run.

Benchmarks show even an old Nvidia RTX 3090 is enough to serve LLMs to thousands

An analysis by Backprop shows the Nvidia RTX 3090 can effectively serve large language models to thousands of users, achieving 12.88 tokens per second for 100 concurrent requests.

6 comments

By @dodysw - 7 months

I transcribed between 3,000 and 4,000 short videos of 10s-30s every day for almost 2 years, for fun. A cheap desktop Linux box with second-hand ex-mining RTX 3060 and 3080 Ti cards, connected over my home network using basic Gradio and faster-whisper, so they can be exposed as a public API and called from the corporate network. Relatively easy and much cheaper than the commercial offerings at the time. These GPUs are overpowered for the task: each day they only spent 1 to 2 hours on actual encoding, it's that quick, and that's with the biggest Whisper model plus audio preprocessing and VAD to improve the success rate.
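
A setup along these lines could be sketched as follows. This is not the commenter's code: it assumes the faster-whisper package is installed, that "the biggest whisper model" means `large-v3`, and that one model instance per GPU is acceptable. The `shard` and `transcribe_shard` helpers are hypothetical names.

```python
from pathlib import Path


def shard(paths: list[Path], workers: int) -> list[list[Path]]:
    """Split the file list round-robin, one shard per GPU."""
    return [paths[i::workers] for i in range(workers)]


def transcribe_shard(paths: list[Path], gpu_index: int) -> dict[str, str]:
    """Transcribe one shard on one GPU and return {file name: text}."""
    # Imported lazily so the sharding helper works without the package installed.
    from faster_whisper import WhisperModel

    # Assumption: "large-v3" stands in for "the biggest whisper model".
    model = WhisperModel(
        "large-v3",
        device="cuda",
        device_index=gpu_index,
        compute_type="float16",
    )
    results = {}
    for path in paths:
        # vad_filter=True applies voice activity detection to skip silence.
        segments, _info = model.transcribe(str(path), vad_filter=True)
        results[path.name] = " ".join(seg.text.strip() for seg in segments)
    return results
```

A driver script would list the audio files, call `shard(files, workers=2)`, and run `transcribe_shard(batch, gpu_index)` in one process per GPU so both cards stay busy.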

By @solardev - 8 months

Does it have to use Whisper? If so, can't you just run it on that server instead of the Mac? https://github.com/openai/whisper/discussions/1463

If it doesn't, there are a bunch of other speech recognition APIs. Most of them use older techs but might be good enough: https://www.gladia.io/blog/openai-whisper-vs-google-speech-t...

Personally I found Otter.ai works really well for the transcription part, but they don't have an API: https://otter.ai

You can also just upload them all to YouTube in a private playlist and it'll automatically transcribe them for you.

By @philipkiely - 8 months

This is a complete shameless plug but I just published some documentation on automatically building Whisper inference engines with TensorRT-LLM which has the batch inference that you're looking for: https://docs.baseten.co/performance/examples/whisper-trt

By @arthurdelerue - 8 months

We use Whisper Large on NLP Cloud (https://nlpcloud.com/home/playground/asr). It works very well and it's simple to set up in my opinion. If you have a batch to process you could simply subscribe to their pay-as-you-go plan for a couple of weeks/months maybe?

By @eevmanu - 8 months

Consider "Whisper Large V3" on console.groq.com; imo it's fast, reliable, and cheap ($0.03 per hour transcribed).
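
To put that price in context, here is a rough cost estimate. It uses the $0.03 per audio hour quoted above and the 90-second average call length from the question; the 2,000 calls per day figure is an assumption, and the estimate ignores any minimum billing increments or rate limits the provider may apply.

```python
def daily_cost_usd(calls_per_day: int, avg_call_seconds: float,
                   usd_per_audio_hour: float) -> float:
    """Estimated daily spend for a service billed per hour of audio."""
    audio_hours = calls_per_day * avg_call_seconds / 3600
    return audio_hours * usd_per_audio_hour


# Assumption: 2,000 calls per day; price taken from the comment above.
cost = daily_cost_usd(2000, 90, 0.03)
print(f"${cost:.2f} per day, ${cost * 30:.2f} per 30 days")
```

Under those assumptions the hosted option comes to roughly $1.50 per day, so the choice between it and the local GPUs is more likely to hinge on data handling and network access than on price.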
