June 20th, 2024

Show HN: Local voice assistant using Ollama, transformers and Coqui TTS toolkit

The GitHub project "june" combines Ollama, Hugging Face Transformers, and Coqui TTS Toolkit for a private voice chatbot on local machines. It includes setup, usage, customization details, and FAQs. Contact for help.

Read original article

Show HN: Local voice assistant using Ollama, transformers and Coqui TTS toolkit

The GitHub project "june" is a local voice chatbot merging Ollama, Hugging Face Transformers, and the Coqui TTS Toolkit. It offers a privacy-centric voice interaction solution for local machines. The project covers installation guidelines, usage instructions, customization options, and a FAQ section. For additional information or support, reach out for assistance.

Show HN: Pomoglorbo, a TUI Pomodoro timer for your terminal

A Pomodoro Technique timer, Pomoglorbo, enhances productivity with customizable features like audio settings and work intervals. Users can contribute to the project following guidelines for development and testing.

Groqnotes: Generate structured notes from audio using Groq, Whisper, and Llama3

The GitHub project "Groqnotes" is a streamlit app utilizing Groq, Whisper, and Llama3 to create structured notes from audio content efficiently. It offers rapid transcription, markdown styling, and download options. Access online or set up locally.

Show HN: Feedback on Sketch Colourisation

The GitHub repository contains SketchDeco, a project for colorizing black and white sketches without training. It includes setup instructions, usage guidelines, acknowledgments, and future plans. Users can seek support if needed.

LibreChat: Enhanced ChatGPT clone for self-hosting

LibreChat introduces a new Resources Hub, featuring a customizable AI chat platform supporting various providers and services. It aims to streamline AI interactions, offering documentation, blogs, and demos for users.

Gren 0.4: New Foundations

Gren 0.4 updates its functional language with enhanced core packages, a new compiler, revamped FileSystem API, improved functions, and a community shift to Discord. These updates aim to boost usability and community engagement.

14 comments

By @modeless - 11 months

Coqui's XTTSv2 is good for this because it has a streaming mode. I have my own version of this where I got ~500ms end-to-end response latency, which is much faster than any other open source project I've seen. https://github.com/jdarpinian/chirpy

These are easy to make and fun to play with and it's awesome to have everything local. But it will take more to build something truly useable. A truly natural conversational AI needs to understand the nuances of conversation, most importantly when to speak and when to wait. It also needs to know subtleties of the user's voice that no speech recognizer can output, and it needs control over the output voice more precise than any TTS provides. Audio-to-audio models in the style of GPT-4o are clearly the way forward. (And someday soon, video-to-video models for video calling with a virtual avatar. And the step after that is robotics for physical avatars).

There aren't any open source audio-to-audio models yet but there are some promising approaches. https://ultravox.ai has the input half at least. https://tincans.ai/slm has a cool approach too.

By @replete - 11 months

I tried a similar project out last week, which uses Ollama, FastWhisperAPI, and MeloTTS: https://github.com/PromtEngineer/Verbi

Docker is a great option if you want lots of people to try out your project, but not many apps in this space come with a dockerfile

By @sleight42 - 11 months

Ok, I need this but cloning Majel Barrett as the voice of the Enterprise computer.

By @xan_ps007 - 11 months

we have made an open source orchestration which enables you to plug in your own TTS/ASR/LLM for end-to-end voice conversations at -> https://github.com/bolna-ai/bolna.

We are also working on a complete open source stack for ASR+TTS+LLM and will be releasing it shortly.

By @underlines - 11 months

Honestly, there are so many Project on Github doing STT - LLM - TTS that I lost count. The only revolutionary thing that feels like magic is if the STT supports Voice Activity Detection and low latency LLM inference on Groq, so conversations feel natural.

By @aftbit - 11 months

Looks interesting! Is the latency low enough for it to feel natural? How's the Coqui speech quality?

By @wkat4242 - 11 months

I currently use Ollama + Openwebui for this. It also has a really serviceable voice mode. And it has many options like RAG integrations, custom models, memories to know you better, vision, a great web interface etc. But I'll have a look at this thing.

By @xan_ps007 - 11 months

Today we released our full open source end to end ASR+LLM+TTS dockerized stack at ->

https://news.ycombinator.com/item?id=40789200

By @replete - 11 months

How does the STT compare to Fastwhisper?

By @Gryph0n77 - 11 months

How many RAM GB the model requires?

By @skenderbeu - 11 months

My very first Multimodal AI star on Github. Hope we see more of these in the future.

By @m3kw9 - 11 months

How long till a stand alone OS that makes AI usage its first class citizen?

Show HN: Local voice assistant using Ollama, transformers and Coqui TTS toolkit

Related

Show HN: Pomoglorbo, a TUI Pomodoro timer for your terminal

Groqnotes: Generate structured notes from audio using Groq, Whisper, and Llama3

Show HN: Feedback on Sketch Colourisation

LibreChat: Enhanced ChatGPT clone for self-hosting

Gren 0.4: New Foundations

Related

Show HN: Pomoglorbo, a TUI Pomodoro timer for your terminal

Groqnotes: Generate structured notes from audio using Groq, Whisper, and Llama3

Show HN: Feedback on Sketch Colourisation

LibreChat: Enhanced ChatGPT clone for self-hosting

Gren 0.4: New Foundations