August 15th, 2024

Launch HN: Hamming (YC S24) – Automated Testing for Voice Agents

Hamming automates testing for LLM voice agents, using realistic simulated callers to improve accuracy and speed up iteration. The founders, who previously worked at Tesla and Anduril, prioritize data privacy and plan future automation and optimization tools.

Hamming is a platform designed to automate the testing of LLM (Large Language Model) voice agents, allowing users to simulate interactions with difficult end users. The service aims to streamline the iterative process of developing voice agents, which often involves extensive manual testing to ensure accuracy and effectiveness, particularly in contexts like fast-food drive-throughs where order accuracy is critical.

Hamming's testing process includes creating diverse user personas, simulating real-world scenarios, scoring interactions against predefined criteria, and tracking quality metrics in production. The founders, Sumanyu and Marius, draw on their experiences at Tesla and Anduril, emphasizing the importance of realistic simulations in testing autonomous systems.

Currently, Hamming is onboarding users manually but plans to transition to a self-serve model soon. The platform prioritizes user data privacy and is working toward HIPAA compliance. Future developments include automating scenario generation and LLM judge creation, as well as exploring optimization tools for voice agents. Hamming invites feedback from users and developers in the voice technology space.

- Hamming automates testing for LLM voice agents to improve efficiency and accuracy.

- The platform focuses on creating realistic user scenarios and scoring interactions.

- Founders leverage experience from Tesla and Anduril to enhance testing methodologies.

- Hamming is transitioning to a self-serve model while ensuring data privacy.

- Future plans include automation of testing processes and optimization tools for voice agents.
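To make the workflow concrete, here is a minimal sketch of the persona, simulate, score loop described above. Everything in it is an illustrative assumption: the scenario fields, the stubbed telephony and judge calls, and the phone number are placeholders, not Hamming's actual API.

```python
# Minimal sketch of a persona-driven voice-agent test loop.
# simulate_call and llm_judge are stubs marking where real telephony
# and judge calls would go; nothing here is Hamming's actual API.

from dataclasses import dataclass

@dataclass
class Scenario:
    persona: str           # who the simulated caller is
    goal: str              # what they are trying to accomplish
    success_criteria: str  # what the judge checks in the transcript

SCENARIOS = [
    Scenario(
        persona="Indecisive caller who changes their order twice",
        goal="Order two burgers, then swap one for a salad",
        success_criteria="Final order is one burger and one salad",
    ),
    Scenario(
        persona="Rushed caller who interrupts constantly",
        goal="Ask for the store's opening hours",
        success_criteria="Agent states the opening hours before hangup",
    ),
]

def simulate_call(agent_number: str, scenario: Scenario) -> str:
    # Placeholder: an LLM plays the persona over a real call and the
    # conversation is recorded. Here we return a canned transcript.
    return "USER: ...\nAGENT: ..."

def llm_judge(transcript: str, criteria: str) -> bool:
    # Placeholder: an LLM grades the transcript against the criteria.
    return True

def run_suite(agent_number: str) -> None:
    for s in SCENARIOS:
        transcript = simulate_call(agent_number, s)
        passed = llm_judge(transcript, s.success_criteria)
        print(f"[{'PASS' if passed else 'FAIL'}] {s.persona}")

run_suite("+1-555-0100")
```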

AI: What people are saying
The comments reflect a mix of excitement and skepticism regarding the automation of testing for LLM voice agents.
  • Some users express concerns about the efficiency of voice agents compared to non-voice interfaces, questioning the need for voice interactions.
  • Others celebrate the potential of the product, highlighting its usefulness for real-time testing and quality control.
  • There are worries about the impact of LLMs on low-income workers, with calls for AI to create new opportunities rather than just replace existing jobs.
  • Several commenters inquire about open-source options and integration possibilities with existing systems.
  • Some express doubt about the reliability of current voice agents, suggesting that the market may be premature for such testing services.
19 comments
By @themacguffinman - 9 months
AI voice agents are weird to me because voice is already a very inefficient and ambiguous medium; the only reason I would make a voice call is to talk to a human who is equipped to tackle the ambiguous edge cases that the engineers didn't already anticipate.

If you're going to develop AI voice agents to tackle pre-determined cases, why wouldn't you just develop a self-serve non-voice UI that's way more efficient? Why make your users navigate a nebulous conversation tree to fulfill a programmable task?

Personally when I realize I can only talk to a bot, I lose interest and end the call. If I wanted to do something routine, I wouldn't have called.

By @neilk - 9 months
Why “Hamming”? As in Richard Hamming, ex-Bell Labs, “You and Your Research”?
By @pj_mukh - 9 months
My 2.5-year-old yesterday started saying "Hey, this is a test, can you hear me?", parroting me spending hours testing my LLM. Hah.

Will this work with a https://www.pipecat.ai type system? Would love to wrap my bot in a continuous testing system.
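Roughly what I'm imagining (all names made up; the bot is treated as a black box rather than through pipecat's actual API):

```python
# Hypothetical continuous smoke test wrapped around a running voice bot.
# call_bot is a stub for however you reach the bot (phone, websocket, etc.);
# none of these names come from pipecat's actual API.

import time

TEST_PHRASES = [
    ("Hey, this is a test, can you hear me?", "hear"),
    ("What are your opening hours?", "open"),
]

def call_bot(utterance: str) -> str:
    # Placeholder: send the utterance to the running bot, return its reply.
    return "Yes, I can hear you. We're open 9 to 5."

def run_checks() -> None:
    for utterance, expected_keyword in TEST_PHRASES:
        reply = call_bot(utterance)
        status = "PASS" if expected_keyword in reply.lower() else "FAIL"
        print(f"[{status}] {utterance!r} -> {reply!r}")

if __name__ == "__main__":
    while True:           # naive loop; a scheduler or CI job would also work
        run_checks()
        time.sleep(3600)  # hourly smoke test
```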

By @zebomon - 9 months
As someone whose job has been negatively impacted by LLMs already, I'll echo the sentiment here that use cases like this one are sort of depressing, as they will primarily impact people who work long hours for small pay. It certainly seems like there's money to be made in this, so congratulations. The landing page is clear and inviting as well. I think I understand what my workflow inside it would be like based on your text and images.

I'm most excited to see well-done concepts in this space, though, as I hope it means we're fast-forwarding past this era to one in which we use AI to do new things for people and not just do old things more cheaply. There's undeniably value in the latter but I can't shake the feeling that the short-term effects are really going to sting for some low-income people who can only hope that the next wave of innovations will benefit them too.

By @diwank - 9 months
Congratulations on the launch! We had a big QC need at https://kea.ai/, where we had to stress-test our CX agents in real time too. This would be a big lifesaver. Kudos on the product and the brilliant demo!
By @atyro - 9 months
Nice! Great to see the UI looks clean enough that it's accessible to non-engineers. The prompt management and active monitoring combo looks especially useful. Been looking for something with this combo for an expense app we're building.
By @serjester - 9 months
I feel like the better positioning would be evals for voice agents. It seems just as challenging to figure out all the ways your system can go wrong as it is to build the system in the first place. Doing this in a way that actually adds value without any domain expertise seems impossible.

If it did, wouldn't all the companies with production AI text interfaces be using similar techniques? That said, being able to easily replay a conversation that was recorded with a real user seems like a huge value add.
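E.g., replaying recorded user turns against a new build and flagging divergent agent turns. A rough sketch (stub agent, made-up transcript):

```python
# Hypothetical sketch of replaying a recorded conversation against a new
# agent build. new_agent_reply is a stub for whatever produces the agent's
# next turn; the transcript below is invented.

recorded = [
    ("user", "Hi, I'd like two cheeseburgers."),
    ("agent", "Two cheeseburgers. Anything else?"),
    ("user", "Actually, make one of them a salad."),
    ("agent", "One cheeseburger and one salad. Anything else?"),
]

def new_agent_reply(history):
    # Placeholder: call the new agent version with the conversation so far.
    return "One cheeseburger and one salad. Anything else?"

def replay(transcript):
    history = []
    for role, text in transcript:
        if role == "agent":
            candidate = new_agent_reply(history)
            if candidate != text:
                # Divergences get flagged for a human or LLM judge to
                # review; not every change is a regression.
                print(f"DIVERGENCE at turn {len(history)}:")
                print(f"  recorded:  {text}")
                print(f"  candidate: {candidate}")
        history.append((role, text))  # stay anchored to the recording

replay(recorded)
```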

By @euvin - 9 months
The idea of testing an agent with annoying situations, like uncooperative people or vague responses, makes me wonder if, in the future, similar approaches might be tried on humans. People could be (unknowingly) subjected to automated "social benchmarks" built from artificially designed situations, and I'm sure I don't have to explain how dystopian that is.

It would essentially be another form of a behavioral interview. I wonder if this exists already, in some form?

By @telecomhacker - 9 months
I work in the telecom space. I don't think this paradigm will get adopted in the near future. Customers are already building voice bots on top of Google Dialogflow, e.g., via Cognigy. Cognigy does have LLM capabilities, but that capability is not widely adopted. I think voice bots will still have to be manually configured for some time.
By @xan_ps007 - 9 months
Is there an open source variant available? I am building https://github.com/bolna-ai/bolna, which is an open-source voice orchestration framework.

would love to have something like this integrated as part of our open source stack.

By @rstocker99 - 9 months
That drive-through customer… oh my. I have newfound empathy for drive-through operators.
By @bazlan - 9 months
As someone who has worked in TTS for over 4 years now, I can tell you that evaluation is the most difficult aspect of generative audio ML.

How will this really check that the models are performing well, as opposed to someone just listening?
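For what it's worth, the standard automated proxy is an ASR round trip: transcribe the generated audio back and compute word error rate against the intended script. That catches dropped or mangled words, not whether it actually sounds good. A rough sketch (the file name and threshold are made up; assumes the openai-whisper and jiwer packages are installed):

```python
# ASR round-trip check: a proxy for "did the agent say the right words",
# not a measure of prosody or naturalness.
import jiwer
import whisper

INTENDED = "your total comes to twelve dollars and fifty cents"

asr = whisper.load_model("base")
hypothesis = asr.transcribe("agent_utterance.wav")["text"]  # hypothetical file

error_rate = jiwer.wer(INTENDED.lower(), hypothesis.lower().strip())
print(f"WER vs. intended script: {error_rate:.2%}")
assert error_rate < 0.10, "utterance likely wrong or unintelligible"
```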

By @prithvi24 - 9 months
This is great to see. Evals on voice are hard: we only have evals for text-based prompting, and they don't fully capture everything. Excited to give this a try.
By @kinard - 9 months
I'm working on AI voice agents here in the UK for real estate professionals; unfortunately, I couldn't try your service.
By @vizhang92 - 9 months
Awesome work guys! Which industries / jobs do you suspect will be adopting voice agents the fastest?
By @meiraleal - 9 months
There is not even one reliable and proven "voice agent" yet (correct me if I'm wrong, but the best available, elevenlabs, isn't good enough yet to be a voice agent), yet there are already companies selling testing for voice agents?

Selling shovels in a gold rush seems to have become the only mantra here.

By @plurby - 9 months
Wow, gonna test this with my Retell AI agent.