Launched this week

Voker
The Agent Analytics Platform for AI Product Teams
321 followers
Voker is the Agent Analytics Platform for AI product teams. It gives you the usage behavior and agent performance insights you need to monitor and optimize your production agents at scale. Install the lightweight, provider-agnostic SDK and Voker handles the rest: automatic intent, correction, and resolution detection on your user-to-agent interactions; conversation reconstructions; queryable timelines; and agent performance tracking, so you can build the best agents possible.

Voker
I’m Tyler, co-founder of Voker, and I’m so tired of being disappointed by AI hype claims. I bet you are too.
I studied physics in college and worked in data science, ML, and analytics until founding Voker. I’m a skeptical person by nature (I think it's the scientist in me), and my gut reaction to any technology hype is to be cautiously optimistic until I see things proven out in data.
I felt this way about LLMs when they first hit the mainstream. I knew they had real potential applications, but I was also worried about the lofty marketing buzz they were getting.
AI as an industry has written checks that individual builders are left to cash, promising full automation, PhD-level intelligence, and perfect results. I'm skeptical of that narrative, but I still believe agents can genuinely deliver, as long as teams are rigorous about measuring performance in production. Nearly every website or product uses Amplitude or PostHog for click and pageview analytics: a standard way to understand who's using it and how. Agents have no equivalent, so we built Voker.
We are the Agent Analytics Platform where you can:
- Monitor your agents
- Measure their performance
- See what users are asking
- Know for certain that your agents are delivering for your users
- Optimize based on real data
You install our SDK, and Voker collects your agent conversation data, automatically detecting:
- User intents (Book me a hotel in Vegas for next Saturday with a poolside view)
- Corrections (No, that room doesn’t have a poolside view!! TRY AGAIN)
- Agent resolutions (Tool Result: Room Booked... Success!)
These automated annotations are the foundation for building a holistic view of agent performance and user behavior in one analytics platform.
We asked 100+ AI founders, product managers, and agent engineers how they monitor their agents in production, and the answer was resounding: they comb through individual traces (with the occasional eval sprinkled in). They all reported depending on customer complaints to tell them when agents are messing up. We feel strongly that a third leg of the agent monitoring stool is missing: Agent Analytics.
You shouldn’t have to wait for users to complain to learn that a recent prompt change is breaking your hotel booking agent, or that the AI finance advisor you built is calling the wrong tool to look up real-time stock prices.
Turns out the antidote to AI hype is simple: measure your agents diligently, then iterate until you get it right.
Your users deserve better AI experiences (we all do)!
Install the Voker SDK on our free tier (up to 2,000 events/mo), and start building better agents today:
https://voker.ai/
@tyler_postle Hey Tyler, congrats on the launch 👋
The "third leg of the agent monitoring stool" framing really resonates. I'm running a few agents in production myself (Telegram + VK Teams bots fronting an OpenClaw agent), and the gap I keep hitting isn't detecting that something went wrong - it's reconstructing why. Logs show the tool calls, but the model's reasoning between turns is gone unless I instrument it manually.
Quick question: does Voker capture the reasoning/thinking blocks, or just the user-facing turns and tool I/O? That's basically the line between "agent monitoring" and "agent debugging" for me.
Either way - good to see someone taking the analytics angle seriously instead of just shipping another eval framework. Will give the SDK a spin this week.
Voker
@artem_fedorovich Thanks Artem, look forward to your feedback.
To clarify: we think observability (Obs) and eval tools are definitely important and serve their purpose, but they aren't enough once you start to truly scale your agent usage volume.
And to answer your question: yes, we collect reasoning and thinking blocks, and they're part of our analytics processing pipelines! We're still in the design phase on how best to display them in the UI, though. Let me know if you have any ideas. Just inline in the session reconstruction? Or is there a specific type of analysis or question you find yourself working on when you need to know how reasoning impacted agent performance?
@tyler_postle Honestly, inline in the session reconstruction is the obvious first move, but the thing I'd kill for is a diff view between reasoning and outcome.
Like - agent decided to call Tool A "because the user implied X", but the actual result was Y. Most of the time when I'm debugging, the failure is in that gap: reasoning sounded fine in isolation, but it was based on a wrong read of the user's intent two turns earlier. Hard to spot scrolling through a linear timeline.
Maybe something like: filter sessions where "stated reasoning ↔ actual outcome" drift is high → those are the ones worth a human look. Could turn into a "reasoning regression" metric over time.
Either way, will play with the SDK this week and report back 👌
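The "stated reasoning ↔ actual outcome" drift filter Artem describes can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: `ReasoningStep`, `drift_score`, and `sessions_for_review` are hypothetical names, not Voker features, and the exact string comparison stands in for whatever semantic check (for example an LLM judge) a real system would need to compare free-text reasoning against outcomes.

```python
from dataclasses import dataclass


@dataclass
class ReasoningStep:
    """One tool call paired with the agent's stated expectation (hypothetical schema)."""
    tool: str
    expected_outcome: str  # what the agent's reasoning said would happen
    actual_outcome: str    # what the tool result or the user's next turn showed


def drift_score(steps: list[ReasoningStep]) -> float:
    """Fraction of steps whose actual outcome differs from the stated expectation.

    Exact string equality is a placeholder for a real semantic comparison.
    """
    if not steps:
        return 0.0
    mismatches = sum(s.expected_outcome != s.actual_outcome for s in steps)
    return mismatches / len(steps)


def sessions_for_review(sessions: dict[str, list[ReasoningStep]], threshold: float = 0.5) -> list[str]:
    """Return ids of sessions at or above the drift threshold, highest drift first."""
    scored = {sid: drift_score(steps) for sid, steps in sessions.items()}
    flagged = [sid for sid, score in scored.items() if score >= threshold]
    return sorted(flagged, key=lambda sid: scored[sid], reverse=True)


# Example: s2 and s3 drift from their stated reasoning; s1 does not.
sessions = {
    "s1": [ReasoningStep("search_rooms", "poolside_room", "poolside_room")],
    "s2": [
        ReasoningStep("search_rooms", "poolside_room", "standard_room"),
        ReasoningStep("book_room", "booked", "booked"),
    ],
    "s3": [ReasoningStep("get_stock_price", "realtime_price", "stale_price")],
}
print(sessions_for_review(sessions))  # ['s3', 's2']
```

Tracking the share of flagged sessions per release would give the "reasoning regression" metric over time.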
Voker
@artem_fedorovich Love that idea! Sending it to our team. I like that it could be useful both for individual conversation deep-dives and as a dataset for future regression training. We'll keep you in the loop if and when we build it!
@tyler_postle Can Voker track performance regressions after a prompt, model, or tool change, and show whether success rates dropped for specific intents?
Voker
@sead_sehovic Yes! As long as you version your agent in our SDK, we allow you to segment your data across the platform by version. If you take a look at our demo video in the launch, you can see where we have resolution rate and correction rate by intent categories by version (that's a mouthful!)
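To make "resolution rate and correction rate by intent category by version" concrete, here is a minimal Python sketch of the arithmetic, assuming each detected intent has been exported as a record tagged with the agent version. `IntentRecord` and `rates_by_version_and_intent` are hypothetical names for illustration, not part of the Voker SDK.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IntentRecord:
    """One detected user intent (hypothetical export schema)."""
    agent_version: str
    intent_category: str
    resolved: bool   # the agent resolved this intent
    corrected: bool  # the user corrected the agent at least once along the way


def rates_by_version_and_intent(records):
    """Map (version, intent category) -> (resolution rate, correction rate)."""
    counts = defaultdict(lambda: [0, 0, 0])  # [total, resolved, corrected]
    for r in records:
        c = counts[(r.agent_version, r.intent_category)]
        c[0] += 1
        c[1] += int(r.resolved)
        c[2] += int(r.corrected)
    return {key: (res / total, cor / total) for key, (total, res, cor) in counts.items()}


records = [
    IntentRecord("v1", "book_hotel", resolved=True, corrected=False),
    IntentRecord("v1", "book_hotel", resolved=True, corrected=True),
    IntentRecord("v2", "book_hotel", resolved=False, corrected=True),
    IntentRecord("v2", "book_hotel", resolved=True, corrected=True),
]
for key, (res, cor) in sorted(rates_by_version_and_intent(records).items()):
    print(key, f"resolution={res:.0%}", f"correction={cor:.0%}")
```

In this toy data, moving from v1 to v2 halves the hotel-booking resolution rate (100% to 50%) and doubles the correction rate (50% to 100%), which is the kind of per-intent regression the question asks about.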
Congrats on the launch!
How does Voker handle intent attribution when the agent proactively redirects the user, say, a billing agent that detects the user is actually in the wrong product area and routes them elsewhere? The intent the user arrived with and the intent the agent resolved can diverge legitimately, and in those cases it's not clear whether that should register as a correction event or a successful resolution. Curious how the analytics model handles that distinction, since getting it wrong would skew correction rates significantly for agents designed to reroute.
Voker
@binu_george Love this question. Today we don't have an explicit way to tie two agents together. We know this is critical because most scaled agent products have multi-agent handoff systems like you mentioned.
What we have customers do today is treat the handoff as a successful resolution.
Of course, sometimes this is truly the resolution (in the case of an orchestrator agent, for example), but sometimes it's actually just passing the buck. We don't have a good way to differentiate these situations today, other than decoding the name and description of the agent it's handed off to, in addition to any other information you send to us through our events SDK.
We absolutely intend to build direct features to support this pattern better, because it's very common.
Thanks for the question!
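The workaround described above, counting a handoff as a successful resolution, stays easier to interpret if handoffs are flagged so both numbers can be reported side by side. A minimal Python sketch follows; `IntentOutcome` and `resolution_rates` are hypothetical names, not part of Voker's SDK, and deciding whether a given handoff was legitimate routing or passing the buck is deliberately left out, since the reply notes that part is still unsolved.

```python
from dataclasses import dataclass


@dataclass
class IntentOutcome:
    """Outcome of one user intent (hypothetical schema)."""
    resolved_directly: bool  # the agent itself resolved the intent
    handed_off: bool         # the agent routed the user to another agent


def resolution_rates(outcomes):
    """Return (rate with handoffs counted as resolved, rate of direct resolutions only)."""
    if not outcomes:
        return 0.0, 0.0
    n = len(outcomes)
    with_handoffs = sum(o.resolved_directly or o.handed_off for o in outcomes) / n
    direct_only = sum(o.resolved_directly for o in outcomes) / n
    return with_handoffs, direct_only


outcomes = [
    IntentOutcome(resolved_directly=True, handed_off=False),
    IntentOutcome(resolved_directly=False, handed_off=True),
    IntentOutcome(resolved_directly=False, handed_off=True),
    IntentOutcome(resolved_directly=False, handed_off=False),
]
print(resolution_rates(outcomes))  # (0.75, 0.25)
```

The gap between the two rates shows how much of the headline "resolution" is really routing, which matters most for agents designed to reroute.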
Most observability tools treat agent calls as black boxes, logging tokens but missing the decision loop entirely. Building RetainSure's AI workflows, we struggled to attribute downstream outcomes back to specific agent choices. Our logging was ad hoc and we ended up rebuilding it multiple times. Does Voker capture branching decisions when an agent picks between tool calls, or is it focused on input/output tracing?
Voker
@retain_dev Yes! Voker automatically tracks all the information your agent is provided to make its decisions, so you can see both the tools available and the tools used. This has helped our customers notice that their tools may need new descriptions when the agent has what it needs but isn't calling the right tool.
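The "tools available vs. tools used" comparison can be sketched as a per-tool count of turns where the right tool was offered to the agent but not called. This assumes each turn records the tool list sent to the model, the tools it actually invoked, and an `expected_tool` label from a reviewer or classifier; `Turn` and `missed_tool_counts` are hypothetical names, not Voker's API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Turn:
    """One agent turn (hypothetical schema)."""
    available_tools: set[str]  # tools offered to the model on this turn
    called_tools: set[str]     # tools the model actually invoked
    expected_tool: str         # tool a reviewer or classifier judged correct here


def missed_tool_counts(turns):
    """Count, per tool, turns where it was available and expected but never called."""
    missed = Counter()
    for t in turns:
        if t.expected_tool in t.available_tools and t.expected_tool not in t.called_tools:
            missed[t.expected_tool] += 1
    return missed


turns = [
    Turn({"get_stock_price", "search_web"}, {"search_web"}, "get_stock_price"),
    Turn({"get_stock_price", "search_web"}, {"get_stock_price"}, "get_stock_price"),
    Turn({"get_stock_price", "search_web"}, {"search_web"}, "get_stock_price"),
]
print(missed_tool_counts(turns).most_common())  # [('get_stock_price', 2)]
```

A tool that keeps appearing in this count is a candidate for a rewritten description, since the agent had it available and still reached for something else.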
Automatic intent and resolution detection is the right abstraction. Most agent monitoring tools just log tokens or latency, but you actually need to know if the user got what they came for. We're building AI-driven customer success at RetainSure and agent quality drift between deployments is a real headache. How does Voker handle cases where the user's intent shifts mid-conversation?
Voker
@dhiraj_patel5 We're actually purpose-built for complex, long-running, multi-intent conversations! When our SDK detects multiple intents within a conversation, they get categorized into "Session Paths" that show up in our session timeline. This way you can easily navigate to different parts of the conversation without scrolling through the whole session. You can also analyze the accuracy of the agent on these separate intents across other surfaces in our product.
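One simple way to picture "Session Paths" is to split a conversation's messages into contiguous segments at each newly detected intent. The Python below is a minimal sketch that assumes intent detection has already labeled the messages; `Message` and `split_into_paths` are hypothetical names, and a real implementation would also need to handle a user returning to an earlier intent, which this contiguous split does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    role: str                         # "user" or "agent"
    text: str
    new_intent: Optional[str] = None  # set when a new user intent is detected here


def split_into_paths(messages):
    """Group message indices into (intent, [indices]) segments in conversation order.

    A new segment starts at each message carrying a newly detected intent;
    anything before the first detected intent lands in an "unassigned" segment.
    """
    paths = []
    for i, m in enumerate(messages):
        if m.new_intent is not None or not paths:
            paths.append((m.new_intent or "unassigned", []))
        paths[-1][1].append(i)
    return paths


messages = [
    Message("user", "Book me a hotel in Vegas for next Saturday", "book_hotel"),
    Message("agent", "Found a room with a poolside view."),
    Message("user", "Also, what's the weather forecast?", "check_weather"),
    Message("agent", "Sunny and hot."),
]
print(split_into_paths(messages))
# [('book_hotel', [0, 1]), ('check_weather', [2, 3])]
```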
Love the brutal honesty here. AI has definitely written checks that devs are stuck cashing in production. Quick question on the SDK: how does it handle semantic variations for corrections? Will it catch things like "actually, scratch that" versus "no, that's wrong" out of the box, or do we need to train it on our own domain vocabulary?
Voker
@vikramp7470 Good question! Voker will detect those kinds of phrases, even with semantic variation. That said, if you have very specific domain vocabulary, where two words might mean the same thing to a layperson but not to you as a domain expert, then you will need to pass Voker some context in the form of either knowledge docs or feedback on our annotations (APIs for these are in the works!).
Thanks, Vikram!
@tyler_postle Makes sense; semantic variation handling is honestly the hard part in production. Cool that Voker already catches most of that out of the box 👌
Voker
@vikramp7470 Thanks! I'll pass your positive feedback to the team. Our founding engineer Zach spent a lot of time working on that detection system because it's foundational to all the other analytics our platform provides.
@tyler_postle Really cool to hear that 👏
Thanks
How do you determine the quality of answers? I have an AI service with its own vector database. For almost any user question we know the answer and can suggest tourist attractions, and we have more of them than ChatGPT. Will you be able to tell whether these are top-tier attractions or not?
Voker
@natalia_iankovych When you send the information from your vector DB to your agent, Voker will also track that context. We'll use the information from your own RAG data to make our assessment on the quality of the response to the user! Essentially any information that your agent has to make its decision - Voker will also track and assess.
Hey Tyler, I went through Voker's site, and the "Amplitude for agents" framing is honestly the cleanest take I've read on this gap. One thing I wanted to ask: how do you detect a "correction" automatically? Is it a sentiment delta on the next user message, or something pattern-based? That label seems to do a lot of work in the product.
Voker
@axlerodd Good to know that "Amplitude for agents" resonated.
We detect corrections by processing user messages across multiple turns and evaluating them within the context of the conversation and the original user intents that were detected. We use LLMs for language processing, and then a hierarchical classification technique to categorize atomic annotations like intents and corrections into more general and insightful categories (you don't want to read a list of 1000s of corrections; you want a theme like "the agent is too happy" or "the agent claims it has tools it doesn't").
Does that help? Maybe we should add better examples on our homepage?
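The roll-up from thousands of atomic corrections to a handful of themes can be illustrated with a fixed two-level taxonomy. This is a simplified stand-in: it assumes each correction has already been given a fine-grained label (Voker describes using LLMs for that step), and `TAXONOMY` and `roll_up` are hypothetical names, not Voker's actual hierarchical classification technique.

```python
from collections import Counter

# Hypothetical two-level taxonomy: fine-grained correction label -> broader theme.
TAXONOMY = {
    "wrong_room_view": "agent ignores stated preferences",
    "wrong_booking_date": "agent ignores stated preferences",
    "claims_refund_tool": "agent claims tools it doesn't have",
    "claims_calendar_tool": "agent claims tools it doesn't have",
    "overly_cheerful_reply": "agent is too happy",
}


def roll_up(fine_labels, taxonomy=TAXONOMY):
    """Count corrections per theme; labels not in the taxonomy fall into 'other'."""
    return Counter(taxonomy.get(label, "other") for label in fine_labels)


labels = [
    "wrong_room_view",
    "wrong_booking_date",
    "claims_refund_tool",
    "wrong_room_view",
    "brand_new_label",
]
for theme, count in roll_up(labels).most_common():
    print(f"{count:>3}  {theme}")
```

Labels the taxonomy doesn't recognize land in `other`; a growing `other` bucket is a signal that the hierarchy needs a new theme.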