August 12th, 2024

Show HN: LLM Aided Transcription Improvement

The LLM-Aided Transcription Improvement Project on GitHub enhances audio transcription quality with a multi-stage LLM pipeline, supports both local and cloud-based models, and requires Python 3.12 or higher.

The LLM-Aided Transcription Improvement Project on GitHub focuses on improving the quality of audio transcriptions generated by models such as OpenAI's Whisper. It runs transcription output through a multi-stage processing pipeline of large language model (LLM) prompts that improve the text's structure, readability, and formatting. Key features include a multi-stage processing approach that corrects errors and formats the text as markdown, parallel processing for efficiency, support for both local and cloud-based LLMs (with OpenAI's GPT-4o-mini as the default), and a quality assessment step that compares the final output against the original transcription.

To use the project, users need Python 3.12 or higher and follow installation steps that include cloning the repository, creating a virtual environment, and configuring environment variables for API keys and model selection. They can then run the script on a transcription JSON file to generate a formatted markdown file. Example outputs illustrate the transformation from raw JSON to structured markdown.
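
The core idea of such a pipeline, feeding raw transcription text through an LLM prompt for cleanup, can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the prompt wording, the `transcription.json` filename, and the assumption that the JSON follows Whisper's segments format are all placeholders.

```python
import json

from openai import OpenAI  # assumes the official OpenAI Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt; the project's real prompts live in its repository.
CLEANUP_PROMPT = (
    "Correct transcription errors in the following text and format it as "
    "markdown. Preserve the speaker's wording wherever possible:\n\n{chunk}"
)


def clean_chunk(chunk: str, model: str = "gpt-4o-mini") -> str:
    """Send one transcription chunk through a single LLM cleanup pass."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": CLEANUP_PROMPT.format(chunk=chunk)}],
    )
    return response.choices[0].message.content


# Whisper-style JSON typically carries a list of segments with a "text" field.
with open("transcription.json") as f:
    segments = json.load(f)["segments"]

raw_text = " ".join(seg["text"] for seg in segments)
print(clean_chunk(raw_text))
```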

- The project enhances audio transcription quality using a multi-stage processing pipeline.

- It supports both local and cloud-based language models.

- Users can process transcription chunks concurrently for improved efficiency (see the sketch after this list).

- Installation requires Python 3.12 or higher plus the project's library dependencies.

- Example outputs demonstrate the project's effectiveness in formatting transcriptions.
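
The concurrency mentioned above is straightforward to picture: split the transcription into chunks and dispatch each through the cleanup pass in parallel. Since LLM API calls are I/O-bound, threads are a natural fit. A rough sketch, reusing the hypothetical `clean_chunk` function from the earlier snippet (the project's own concurrency mechanism is not stated here):

```python
from concurrent.futures import ThreadPoolExecutor


def clean_parallel(chunks: list[str], max_workers: int = 4) -> str:
    """Run the LLM cleanup pass over several chunks concurrently.

    executor.map preserves input order, so the cleaned pieces can
    simply be joined back together in sequence.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        cleaned = list(executor.map(clean_chunk, chunks))
    return "\n\n".join(cleaned)
```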

2 comments
By @gavmor - 6 months
I record long, rambling voice memos in noisy environments which Whisper struggles to parse. Perhaps this can rescue me from the tedium of hand-stitching the fragmented results together. GIGO, of course, but there's an equilibrium here that might be struck.
By @ramonverse - 6 months
I'm curious about the chunk splitting approach you mentioned. How do you determine the optimal chunk size for processing? There seems to be a tradeoff between context preservation and processing efficiency. Have you experimented with different chunk sizes and their impact on the quality of the final output? This could be really important for handling things like long-range dependencies in the text.
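
For readers wondering what that tradeoff looks like in practice, here is one plausible splitting strategy (not necessarily the project's): break on sentence boundaries up to a size cap, and repeat a small number of trailing sentences at the start of the next chunk so the model keeps some local context across each seam. The `max_chars` and `overlap_sentences` parameters are illustrative defaults.

```python
import re


def split_into_chunks(
    text: str, max_chars: int = 2000, overlap_sentences: int = 1
) -> list[str]:
    """Split text on sentence boundaries into chunks of at most max_chars,
    carrying the last `overlap_sentences` sentences into the next chunk
    to preserve local context across chunk boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for sentence in sentences:
        if current and length + len(sentence) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:]  # carry context forward
            length = sum(len(s) for s in current)
        current.append(sentence)
        length += len(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Larger chunks preserve more long-range context per call but are slower and risk degrading model attention; smaller chunks parallelize well but lose cross-chunk dependencies, which is exactly the tension the comment raises.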