August 15th, 2024

Launch HN: Hamming (YC S24) – Automated Testing for Voice Agents

Hamming automates testing for LLM voice agents, using realistic simulated callers to improve accuracy and speed up iteration. The founders, who previously worked at Tesla and Anduril, prioritize data privacy and plan future automation and optimization tools.

Hamming is a platform designed to automate the testing of LLM (Large Language Model) voice agents, allowing users to simulate interactions with difficult end users. The service aims to streamline the iterative process of developing voice agents, which often involves extensive manual testing to ensure accuracy and effectiveness, particularly in contexts like fast-food drive-throughs where order accuracy is critical.

Hamming's testing process includes creating diverse user personas, simulating real-world scenarios, scoring interactions against predefined criteria, and tracking quality metrics in production. The founders, Sumanyu and Marius, draw on their experiences at Tesla and Anduril, emphasizing the importance of realistic simulations in testing autonomous systems.

Currently, Hamming is onboarding users manually but plans to transition to a self-serve model soon. The platform prioritizes user data privacy and is working toward HIPAA compliance. Future developments include automating scenario generation and LLM judge creation, as well as exploring optimization tools for voice agents. Hamming invites feedback from users and developers in the voice technology space.

- Hamming automates testing for LLM voice agents to improve efficiency and accuracy.

- The platform focuses on creating realistic user scenarios and scoring interactions.

- Founders leverage experience from Tesla and Anduril to enhance testing methodologies.

- Hamming is transitioning to a self-serve model while ensuring data privacy.

- Future plans include automation of testing processes and optimization tools for voice agents.
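To make the workflow concrete, here is a minimal sketch of the persona, simulate, score loop described above. Everything in it is an illustrative assumption: the scenario fields, the stubbed telephony and judge calls, and the phone number are placeholders, not Hamming's actual API.

```python
# Minimal sketch of a persona-driven voice-agent test loop.
# simulate_call and llm_judge are stubs marking where real telephony
# and judge calls would go; nothing here is Hamming's actual API.

from dataclasses import dataclass

@dataclass
class Scenario:
    persona: str           # who the simulated caller is
    goal: str              # what they are trying to accomplish
    success_criteria: str  # what the judge checks in the transcript

SCENARIOS = [
    Scenario(
        persona="Indecisive caller who changes their order twice",
        goal="Order two burgers, then swap one for a salad",
        success_criteria="Final order is one burger and one salad",
    ),
    Scenario(
        persona="Rushed caller who interrupts constantly",
        goal="Ask for the store's opening hours",
        success_criteria="Agent states the opening hours before hangup",
    ),
]

def simulate_call(agent_number: str, scenario: Scenario) -> str:
    # Placeholder: an LLM plays the persona over a real call and the
    # conversation is recorded. Here we return a canned transcript.
    return "USER: ...\nAGENT: ..."

def llm_judge(transcript: str, criteria: str) -> bool:
    # Placeholder: an LLM grades the transcript against the criteria.
    return True

def run_suite(agent_number: str) -> None:
    for s in SCENARIOS:
        transcript = simulate_call(agent_number, s)
        passed = llm_judge(transcript, s.success_criteria)
        print(f"[{'PASS' if passed else 'FAIL'}] {s.persona}")

run_suite("+1-555-0100")
```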

AI: What people are saying
The comments reflect a mix of excitement and skepticism regarding the automation of testing for LLM voice agents.
  • Some users express concerns about the efficiency of voice agents compared to non-voice interfaces, questioning the need for voice interactions.
  • Others celebrate the potential of the product, highlighting its usefulness for real-time testing and quality control.
  • There are worries about the impact of LLMs on low-income workers, with calls for AI to create new opportunities rather than just replace existing jobs.
  • Several commenters inquire about open-source options and integration possibilities with existing systems.
  • Some express doubt about the reliability of current voice agents, suggesting that the market may be premature for such testing services.
19 comments
By @themacguffinman - 9 months
AI voice agents are weird to me because voice is already a very inefficient and ambiguous medium; the only reason I would make a voice call is to talk to a human who is equipped to tackle the ambiguous edge cases that the engineers didn't already anticipate.

If you're going to develop AI voice agents to tackle pre-determined cases, why wouldn't you just develop a self-serve non-voice UI that's way more efficient? Why make your users navigate a nebulous conversation tree to fulfill a programmable task?

Personally when I realize I can only talk to a bot, I lose interest and end the call. If I wanted to do something routine, I wouldn't have called.

By @neilk - 9 months
Why “Hamming”? As in Richard Hamming, ex-Bell Labs, “You and Your Research”?
By @pj_mukh - 9 months
My 2.5-year-old yesterday started saying "Hey, this is a test, can you hear me?", parroting me spending hours testing my LLM. Hah.

Will this work with a https://www.pipecat.ai type system? Would love to wrap my bot in a continuous testing system.
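Roughly what I'm imagining (all names made up; the bot is treated as a black box rather than through pipecat's actual API):

```python
# Hypothetical continuous smoke test wrapped around a running voice bot.
# call_bot is a stub for however you reach the bot (phone, websocket, etc.);
# none of these names come from pipecat's actual API.

import time

TEST_PHRASES = [
    ("Hey, this is a test, can you hear me?", "hear"),
    ("What are your opening hours?", "open"),
]

def call_bot(utterance: str) -> str:
    # Placeholder: send the utterance to the running bot, return its reply.
    return "Yes, I can hear you. We're open 9 to 5."

def run_checks() -> None:
    for utterance, expected_keyword in TEST_PHRASES:
        reply = call_bot(utterance)
        status = "PASS" if expected_keyword in reply.lower() else "FAIL"
        print(f"[{status}] {utterance!r} -> {reply!r}")

if __name__ == "__main__":
    while True:           # naive loop; a scheduler or CI job would also work
        run_checks()
        time.sleep(3600)  # hourly smoke test
```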

By @zebomon - 9 months
As someone whose job has been negatively impacted by LLMs already, I'll echo the sentiment here that use cases like this one are sort of depressing, as they will primarily impact people who work long hours for small pay. It certainly seems like there's money to be made in this, so congratulations. The landing page is clear and inviting as well. I think I understand what my workflow inside it would be like based on your text and images.

I'm most excited to see well-done concepts in this space, though, as I hope it means we're fast-forwarding past this era to one in which we use AI to do new things for people and not just do old things more cheaply. There's undeniably value in the latter but I can't shake the feeling that the short-term effects are really going to sting for some low-income people who can only hope that the next wave of innovations will benefit them too.

By @diwank - 9 months
Congratulations on the launch! We had a big QC need at https://kea.ai/, where we had to stress-test our CX agents in real time too. This would be a big lifesaver. Kudos on the product and the brilliant demo!
By @atyro - 9 months
Nice! Great to see the UI looks clean enough that it's accessible to non-engineers. The prompt management and active monitoring combo looks especially useful. Been looking for something with this combo for an expense app we're building.
By @serjester - 9 months
I feel like the better positioning would be evals for voice agents. It seems just as challenging to figure out all the ways your system can go wrong as it is to build the system in the first place. Doing this in a way that actually adds value without any domain expertise seems impossible.

If it did, wouldn't all the companies with production AI text interfaces be using similar techniques? That said, being able to easily replay a conversation that was recorded with a real user seems like a huge value add.
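E.g., replaying recorded user turns against a new build and flagging divergent agent turns. A rough sketch (stub agent, made-up transcript):

```python
# Hypothetical sketch of replaying a recorded conversation against a new
# agent build. new_agent_reply is a stub for whatever produces the agent's
# next turn; the transcript below is invented.

recorded = [
    ("user", "Hi, I'd like two cheeseburgers."),
    ("agent", "Two cheeseburgers. Anything else?"),
    ("user", "Actually, make one of them a salad."),
    ("agent", "One cheeseburger and one salad. Anything else?"),
]

def new_agent_reply(history):
    # Placeholder: call the new agent version with the conversation so far.
    return "One cheeseburger and one salad. Anything else?"

def replay(transcript):
    history = []
    for role, text in transcript:
        if role == "agent":
            candidate = new_agent_reply(history)
            if candidate != text:
                # Divergences get flagged for a human or LLM judge to
                # review; not every change is a regression.
                print(f"DIVERGENCE at turn {len(history)}:")
                print(f"  recorded:  {text}")
                print(f"  candidate: {candidate}")
        history.append((role, text))  # stay anchored to the recording

replay(recorded)
```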

By @euvin - 9 months
The idea of testing an agent with annoying situations, like uncooperative people or vague responses, makes me wonder if, in the future, similar approaches might be tried on humans. People could be (unknowingly) subjected to automated "social benchmarks" built from artificially designed situations, and I'm sure I don't have to explain how dystopian that is.

It would essentially be another form of a behavioral interview. I wonder if this exists already, in some form?

By @telecomhacker - 9 months
I work in the telecom space. I don't think this paradigm will get adopted in the near future. Customers are already building voice bots on top of Google Dialogflow, e.g., via Cognigy. Cognigy does have LLM capabilities, but that capability is not widely adopted. I think voice bots will still have to be manually configured for some time.
By @xan_ps007 - 9 months
Is there an open source variant available? I am building https://github.com/bolna-ai/bolna, which is an open-source voice orchestration framework.

would love to have something like this integrated as part of our open source stack.

By @rstocker99 - 9 months
That drive-through customer… oh my. I have newfound empathy for drive-through operators.
By @bazlan - 9 months
As someone who has worked in TTS for over 4 years now, I can tell you that evaluation is the most difficult aspect of generative audio ML.

How will this really check that the models are performing well, as opposed to someone just listening?
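For what it's worth, the standard automated proxy is an ASR round trip: transcribe the generated audio back and compute word error rate against the intended script. That catches dropped or mangled words, not whether it actually sounds good. A rough sketch (the file name and threshold are made up; assumes the openai-whisper and jiwer packages are installed):

```python
# ASR round-trip check: a proxy for "did the agent say the right words",
# not a measure of prosody or naturalness.
import jiwer
import whisper

INTENDED = "your total comes to twelve dollars and fifty cents"

asr = whisper.load_model("base")
hypothesis = asr.transcribe("agent_utterance.wav")["text"]  # hypothetical file

error_rate = jiwer.wer(INTENDED.lower(), hypothesis.lower().strip())
print(f"WER vs. intended script: {error_rate:.2%}")
assert error_rate < 0.10, "utterance likely wrong or unintelligible"
```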

By @prithvi24 - 9 months
This is great to see. Evals on voice are hard: we only have evals for text-based prompting, and they don't fully capture everything. Excited to give this a try.
By @kinard - 9 months
I'm working on AI voice agents here in the UK for real estate professionals; unfortunately, I couldn't try your service.
By @vizhang92 - 9 months
Awesome work guys! Which industries / jobs do you suspect will be adopting voice agents the fastest?
By @meiraleal - 9 months
There is not even one reliable and proven "voice agent" yet (correct me if I'm wrong, but the best available, elevenlabs, isn't good enough yet to be a voice agent), yet there are already companies selling testing for voice agents?

Selling shovels in a gold rush seems to have become the only mantra here.

By @plurby - 9 months
Wow, gonna test this with my Retell AI agent.