aiOla open-sources ultra-fast 'multi-head' speech recognition model
aiOla has launched Whisper-Medusa, an open-source AI model that enhances speech recognition, achieving over 50% faster performance. It supports real-time understanding of industry jargon and operates in over 100 languages.
aiOla has introduced Whisper-Medusa, an open-source AI model that enhances automatic speech recognition by combining OpenAI’s Whisper technology with aiOla’s innovations, achieving over 50% faster performance without sacrificing accuracy. Whisper-Medusa operates by predicting ten tokens simultaneously, compared to Whisper's single-token prediction, which significantly accelerates speech processing, particularly for long-form audio. The model is currently available as a 10-head version, with plans for a 20-head version in the future. The model's weights and code are accessible on platforms like Hugging Face and GitHub.
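To make the "ten tokens at once" idea concrete, here is a minimal toy sketch of multi-head decoding with verification, in the general style of Medusa-type speculative decoding. It is not aiOla's implementation: `base_next_token`, `propose_with_heads`, and the integer "tokens" are hypothetical stand-ins, and the accept-until-first-mismatch rule is an assumption about how such schemes typically keep the output identical to ordinary one-token-at-a-time decoding.

```python
from typing import Callable, List, Tuple

def base_next_token(seq: List[int]) -> int:
    # Hypothetical stand-in for the base decoder: a deterministic toy rule.
    return (seq[-1] * 7 + 3) % 101

def propose_with_heads(seq: List[int], k: int) -> List[int]:
    # Hypothetical stand-in for k extra prediction heads.
    # The guesses are usually right, but deliberately wrong at some positions
    # so that the verification step has something to reject.
    tmp, draft = list(seq), []
    for _ in range(k):
        tok = base_next_token(tmp)
        if len(tmp) % 4 == 3:
            tok = (tok + 1) % 101  # simulated bad guess
        draft.append(tok)
        tmp.append(tok)
    return draft

def greedy_decode(next_token: Callable[[List[int]], int],
                  prompt: List[int], n_new: int) -> Tuple[List[int], int]:
    # Baseline: one token per decoding pass.
    seq, passes = list(prompt), 0
    while len(seq) < len(prompt) + n_new:
        seq.append(next_token(seq))
        passes += 1
    return seq, passes

def multi_head_decode(next_token: Callable[[List[int]], int],
                      propose: Callable[[List[int], int], List[int]],
                      prompt: List[int], n_new: int,
                      k: int = 10) -> Tuple[List[int], int]:
    # Each pass drafts k tokens, then keeps the prefix the base model agrees with.
    seq, passes = list(prompt), 0
    target = len(prompt) + n_new
    while len(seq) < target:
        draft = propose(seq, k)
        # Assumption: a real model checks all k draft positions in one batched
        # forward pass, so the whole verification counts as a single pass here.
        passes += 1
        for tok in draft:
            if len(seq) >= target:
                break
            expected = next_token(seq)
            seq.append(expected)   # always keep the base model's own token
            if tok != expected:    # first wrong guess ends this pass
                break
    return seq, passes

slow_seq, slow_passes = greedy_decode(base_next_token, [1], 20)
fast_seq, fast_passes = multi_head_decode(base_next_token, propose_with_heads, [1], 20)
print(slow_seq == fast_seq, slow_passes, fast_passes)
```

In this toy setup both decoders produce the same sequence, and the multi-head version needs fewer passes because each pass can commit several tokens. How much a real model gains depends on how often the extra heads guess correctly, which this sketch does not measure.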
Whisper-Medusa is designed to support businesses by streamlining operations and improving efficiency through its ability to understand industry-specific jargon in real-time, without the need for prior retraining. The technology allows frontline workers to complete tasks via voice or touch, transforming unstructured speech data into actionable insights. This capability is beneficial across various sectors, including aviation, food manufacturing, logistics, and healthcare, as it can comprehend over 100 languages and adapt to different accents and acoustic environments.
With a reported accuracy of over 95%, Whisper-Medusa empowers businesses to optimize processes, reduce costs, and enhance resource allocation, all while maintaining existing workflows. The introduction of this model marks a significant advancement in speech recognition technology, providing organizations with a powerful tool to improve operational efficiency.
Related
WhatsApp Android beta reveals Llama 3 405B option
WhatsApp is updating to version 2.24.14.7 on Google Play Beta, introducing the Meta AI Llama model for enhanced user interactions. Users can choose between different Llama models for tailored AI experiences.
OpenAI slashes the cost of using its AI with a "mini" model
OpenAI launches GPT-4o mini, a cheaper model enhancing AI accessibility. Meta to release Llama 3. Market sees a mix of small and large models for cost-effective AI solutions.
Large Enough – Mistral AI
Mistral AI released Mistral Large 2, enhancing code generation, reasoning, and multilingual support with 123 billion parameters. It outperforms competitors and is available for research use via various cloud platforms.
Big tech wants to make AI cost nothing
Meta has open-sourced its Llama 3.1 language model for organizations with fewer than 700 million users, aiming to enhance its public image and increase product demand amid rising AI infrastructure costs.
OpenAI rolls out voice mode after delaying it for safety reasons
OpenAI is launching a new voice mode for ChatGPT, capable of detecting tones and processing audio directly. It will be available to paying customers by fall, starting with limited users.
faster-whisper claims a 4x speedup over base Whisper, and I've found WhisperX to be faster still (for longer audio, where it can do batch inference), at least on consumer GPUs.
So with aiOla saying "50% speedup", is that actually noteworthy?
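For a rough sense of scale, the two claims can be put on the same footing. This sketch assumes "50% faster" means 1.5x throughput (the announcement does not define it) and that both figures are measured against the same base Whisper, which may not be the case; the function names are made up for illustration.

```python
def speedup_from_percent_faster(percent: float) -> float:
    # Assumed reading: "50% faster" = 1.5x the baseline throughput.
    return 1.0 + percent / 100.0

def runtime_fraction(speedup: float) -> float:
    # Fraction of the baseline runtime left after the given speedup.
    return 1.0 / speedup

medusa = speedup_from_percent_faster(50)   # assumed 1.5x
faster_whisper = 4.0                       # the 4x claim quoted above

print(f"1.5x speedup: {runtime_fraction(medusa):.0%} of baseline runtime")
print(f"4x speedup:   {runtime_fraction(faster_whisper):.0%} of baseline runtime")
```

Under those assumptions, a 1.5x speedup leaves about two thirds of the baseline runtime, while a 4x speedup leaves a quarter.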
If you're interested, you might as well check out Gladia: at least they have a pricing section and let you use the product as a developer, rather than just asking you to "Request a Demo".
And while a sibling comment links to the GitHub repository, their entire website does not contain such a link.
---
Edit: My bad, for some reason I first checked the website instead of the blog post. Looks much more interesting now.