Michael Seibel

Hamming AI (YC S24) - Automated testing for voice agents

Hamming tests your AI voice agents 100x faster than manual calls. Create Character.ai-style personas and scenarios. Run 100s of simultaneous phone calls to find bugs in your voice agents. Get detailed analytics on where to improve.

Replies

Sumanyu Sharma
👋 Hi y'all - Sumanyu and Marius here from Hamming AI. Hamming lets you automatically test your LLM voice agent. In our interactive demo, you play the role of the voice agent, and our agent will play the role of a difficult end user. We'll then score your performance on the call.
🕵️ Try it here: https://app.hamming.ai/voice-demo (no signup needed). In practice, our agents call your agent!
Marius and I previously ran growth and data teams at companies like Citizen, Tesla, and Anduril. We're excited to launch our automated voice testing feature to help you test your voice agents 100x faster than manual phone calls.
📞 LLM voice agents currently require a LOT of iteration and tuning. For example, one of our customers is building an LLM drive-through voice agent for fast food chains. Their KPI is order accuracy. It's crucial for their system to gracefully handle dietary restrictions like allergies and customers who get distracted or otherwise change their minds mid-order. Mistakes in this context could lead to unhappy customers, potential health risks, and financial losses.
🪄 Our solution involves four steps:
(1) Create diverse but realistic user personas and scenarios covering the expected conversation space. We create these ourselves for each of our customers.
(2) Have our agents call your agent to test its ability to handle things like background noise, long silences, or interruptions. Or have us test just the LLM / logic layer (function calls, etc.) via an API hook.
(3) We score the outputs for each conversation using deterministic checks and LLM judges tailored to the specific problem domain (e.g., order accuracy, tone, friendliness).
(4) Re-use the checks and judges above to score production traffic and track quality metrics in production (i.e., online evals).
We created a Loom recording showing our customers' logged-in experience: Logged-in Video Walkthrough
We think there will be more and more voice companies, and making the experimentation process easier is a problem we are excited about solving.
📩 If you're building voice agents and you're struggling to make them reliable, reach out at sumanyu@hamming.ai!
❤️ Shoutout to @rajiv_ayyangar and @gabe for helping us with the launch!
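To make the "deterministic checks" in step (3) concrete, here is a minimal sketch for the drive-through example: it compares the items an agent confirmed against the items the simulated customer meant to order, and flags any item that breaks a dietary restriction. The function names, the scoring formula (a multiset overlap), and the sample orders are all assumptions made for illustration; none of this is Hamming's API or its actual scoring method.

```python
from collections import Counter


def order_accuracy(expected: Counter, actual: Counter) -> float:
    """Multiset overlap between the intended order and the confirmed order.

    Returns 1.0 for a perfect match. Missing items and extra items both
    lower the score, because both enlarge the union in the denominator.
    """
    matched = sum((expected & actual).values())  # per-item minimum quantity
    total = sum((expected | actual).values())    # per-item maximum quantity
    return 1.0 if total == 0 else matched / total


def contains_banned_item(actual: Counter, banned: set) -> bool:
    """True if the confirmed order includes anything the persona must avoid."""
    return any(item in banned for item in actual)


# Hypothetical scenario: the persona wants one veggie burger and two fries,
# and must avoid dairy. The agent drops one fries and adds a milkshake.
expected = Counter({"veggie burger": 1, "fries": 2})
actual = Counter({"veggie burger": 1, "fries": 1, "milkshake": 1})

print(order_accuracy(expected, actual))             # 0.5
print(contains_banned_item(actual, {"milkshake"}))  # True
```

Checks like these are cheap and repeatable, which is why they pair well with LLM judges for fuzzier criteria such as tone or friendliness.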
Rohan Chaubey
🔌 Plugged in
@sumanyu_sharma Congrats on the launch, Sumanyu! :) This is the best first comment I have seen on PH: it outlines the product details, walkthrough, contact info, etc. Shared it with our community.
Sumanyu Sharma
@rohanrecommends You're the absolute best. Love your support!
Nikhil Pareek
Evals for a specific industry are a great idea. LLM-as-a-judge is great but comes with its own challenges, so it would be interesting to see how it performs across a wide range of use cases. Automated persona generation based on use cases would also be great. I'm sure that's on your roadmap 😀 Congratulations on the launch! 🚀
Sumanyu Sharma
@nikhilpareek Absolutely. So far we've seen 95%+ alignment between LLM and human judgement. Yup, we're already doing persona generation based on use cases :)
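For readers wondering what an alignment figure like this measures, here is a minimal sketch of a raw agreement rate between LLM-judge verdicts and human labels on the same conversations. The function name and the sample labels are made up for illustration; the thread does not say how Hamming computes its 95%+ number.

```python
def agreement_rate(llm_verdicts: list, human_verdicts: list) -> float:
    """Share of conversations where the LLM judge and the human reviewer agree."""
    if len(llm_verdicts) != len(human_verdicts):
        raise ValueError("both judges must label the same conversations")
    if not llm_verdicts:
        raise ValueError("need at least one labelled conversation")
    agreements = sum(l == h for l, h in zip(llm_verdicts, human_verdicts))
    return agreements / len(llm_verdicts)


# Made-up pass/fail labels for five test calls; the judges disagree on one.
llm = [True, True, False, True, False]
human = [True, True, False, False, False]
print(agreement_rate(llm, human))  # 0.8
```

Raw agreement is the simplest such measure; it does not correct for agreement that would happen by chance.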
Sumanyu Sharma
@nikhilpareek Try the demo here: https://app.hamming.ai/voice-demo We created these three to show the range but we can basically create any persona you can imagine :)
Tony Han
1000 parallel simulated calls to the AI voice agent is such a banger line and claim! As a product person, consistency is my #1 concern when I build AI products. Sometimes, you just don't know why something breaks. Testing by hand and evaluating by eye only go so far. It's about time for automated testing for LLM voice agents! Does this work with traditional chatbot usage? For example, we have one AI avatar talking to a human through a web app. Congrats on the launch @sumanyu_sharma and team!
Sumanyu Sharma
@tonyhanded Yup it works for traditional conversational chat as well; voice is just a special case of the former!
Bailey Spell
Couldn't be more excited for these guys!! They have been working incredibly hard and have provided significant value to all of their current customers. Their customers love them, and anyone building in voice can get much better results with Hamming!
Sumanyu Sharma
@baileyg2016 I appreciate your support! 🙏
Illumi Killua
Hi @sumanyu_sharma congrats on the launch 💥
Roop Reddy
@sumanyu_sharma Cheers for the launch!! Are the personas fixed or custom to my use case?
Sumanyu Sharma
@roopreddy Good question! We create bespoke personas for your use case. This way the simulators mimic how real customers interact with your systems!
Vidhi Nagpal
Impressive efficiency boost!
Atharva Bhange
Congrats on the launch, this looks like an interesting product.
Kyrylo Silin
Hey Sumanyu, I’m curious, how adaptable is Hamming to different industries? For instance, could it be used to test voice agents in customer support or healthcare settings? Congrats on the launch!
Sumanyu Sharma
@kyrylosilin We're very adaptable. We can simulate any scenario for any industry. See these demos for examples: https://app.hamming.ai/voice-demo We're working with a team that's building an AI fast-food drive-thru, so we simulated users who order vegan food, have gluten allergies, etc. Healthcare and customer support are big focuses for us!
Julia Turc
🔌 Plugged in
"Best demo at YC Live" -- Garry Tan's words, not mine :)
Sumanyu Sharma
@juliaturc Appreciate your support :) 🙏
Aaron
This sounds like a game changer for voice agents! I’m curious about the scalability of the testing. Can Hamming handle multiple LLMs calling each other simultaneously for larger companies? Would love to see a case study on that!
Sumanyu Sharma
@aarontalk Sure! Could you clarify: `handle multiple LLMs calling each other simultaneously for larger companies`? If you're building a voice agent, your agent can handle 100s of parallel calls right now. So our voice agents can similarly make 100s of parallel calls to test your agent! Happy to clarify here!
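As a rough picture of what "100s of parallel calls" can look like in a test harness, here is a minimal sketch that runs many simulated calls concurrently with a cap on how many are in flight at once. `simulate_call` is a hypothetical stand-in that only sleeps; a real harness would place a call to the agent under test there. This is an illustration of the concurrency pattern, not a description of how Hamming is built.

```python
import asyncio


async def simulate_call(scenario: str) -> str:
    # Hypothetical placeholder for one test call against the agent under test.
    await asyncio.sleep(0.01)
    return f"{scenario}: done"


async def run_all(scenarios: list, max_concurrent: int = 100) -> list:
    # The semaphore caps how many calls are in flight at the same time.
    limit = asyncio.Semaphore(max_concurrent)

    async def bounded(scenario: str) -> str:
        async with limit:
            return await simulate_call(scenario)

    # gather returns results in the same order as the input scenarios.
    return await asyncio.gather(*(bounded(s) for s in scenarios))


results = asyncio.run(run_all([f"scenario-{i}" for i in range(300)]))
print(len(results))  # 300
```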
Charu Gupta
Interesting. Would you also be able to rate the input-output dataset in post-production? @sumanyu_sharma
SEN
Congrats on the launch, 100x faster than manual calls is impressive and will definitely help developers improve their LLM agents. The approach of creating realistic user personas and scenarios is spot on, especially in high-stakes settings like drive-throughs. I can see how this could drastically reduce errors. Excited to see the impact this will have on the voice tech landscape! Upvoted and can't wait to try the demo! Keep pushing the boundaries!
Sumanyu Sharma
@big_tree Love your support. Here's the demo link: https://app.hamming.ai/voice-demo
William Jin
This could save me so much time on testing. How do you handle diverse user scenarios?
Sumanyu Sharma
@william_jin We generate all user scenarios and personas to mimic how real customers interact with your agents! Happy to help if you're currently building a voice agent! sumanyu@hamming.ai
Ema Elisi
This is an impressive way to streamline voice agent testing, Sumanyu! Automating the process and scoring performance is a game changer for anyone in voice tech. The focus
Sumanyu Sharma
@ema_elisi Yesss - building reliable voice agents is hard right now. We're aiming to 1000x the speed of iteration.
Star Boat
This is a game changer, @sumanyu_sharma! Testing LLM voice agents at warp speed 🌟 is something we desperately needed. I love that you’re not just throwing more data at the problem but actually scoring real-world scenarios. The drive-through example really hits home; it’s all about that order accuracy! Can’t wait to see how this evolves and transforms the industry. Upvoted!
Sumanyu Sharma
@star_boat Absolutely! We generate user personas that are similar to real-world customers to speed up time to value! :)
Tony Kam
Hamming team, congrats on the launch!
AuroraW
Impressive. This is sure to drive technological advancements in the industry and deliver better voice experiences for users. Congrats on the launch!
Michael Green
This is an impressive launch, @sumanyu_sharma and @marius! The automation of voice agent testing can truly transform how companies evaluate their systems. The detailed analytics you provide will definitely help teams identify issues, especially in complex scenarios like fast-food drive-throughs. I love how you focus on real user personas, as it adds so much value to the testing process. Can't wait to try out the demo and see how Hamming can elevate the quality of LLM agents in real-time! Keep up the great work, Makers! 🚀
Sumanyu Sharma
@michaelgreen Yup, we'd love your feedback on the demo here: https://app.hamming.ai/voice-demo
zane
Hamming is an impressive tool! It tests your AI voice agents 100 times faster than manual calls, allows you to create Character.ai-style personas and scenarios, runs hundreds of simultaneous phone calls to identify bugs, and provides detailed analytics for improvement. This will greatly enhance testing efficiency and product quality for developers. 📞🤖🚀