Voker

The Agent Analytics Platform for AI Product Teams

371 followers

Data analysis tools • AI Metrics and Evaluation • AI Agents

Voker is the Agent Analytics Platform for AI product teams. It gives you the usage behavior and agent performance insights you need to monitor and optimize your production agents at scale. Install the lightweight, provider-agnostic SDK and Voker handles the rest: automatic intent, correction, and resolution detection on your user-to-agent interactions, conversation reconstructions, queryable timelines, and agent performance tracking, so you can build the best agents possible.

Free

Launch tags: Analytics • Developer Tools • Artificial Intelligence

Voker

Maker

📌

I’m Tyler - Co-Founder of Voker, and I’m so tired of being disappointed by AI hype claims. I bet you are too.

I studied physics in college, and worked in data science, ML, and analytics until founding Voker. I’m a skeptical person by nature (I think it's the scientist in me) and my gut reaction to any technology hype is to be cautiously optimistic until I see things proven out in data.

I felt this way about LLMs when they first hit mainstream. I knew they had real potential applications, but was also worried about the lofty marketing buzz they were getting.

AI as an industry has written checks that individual builders are left to cash, promising full automation, PhD-level intelligence, and perfect results. As someone who's skeptical of that narrative, I still believe agents can genuinely deliver, but only if teams are rigorous about measuring performance in production. Every website or product has Amplitude or PostHog for click and pageview analytics: a standard way to understand who's using it and how. Agents have no equivalent, so we built Voker.

We are the Agent Analytics Platform where you can:

- Monitor your agents
- Measure their performance
- See what users are asking
- Know for certain agents are delivering for your users
- Optimize based on real data

You install our SDK, and Voker collects your agent conversation data, automatically detecting:

- User intents (Book me a hotel in Vegas for next Saturday with a poolside view)
- Corrections (No, that room doesn’t have a poolside view!! TRY AGAIN)
- Agent resolutions (Tool Result: Room Booked... Success!)

These automated annotations are the foundation for building a holistic view of agent performance and user behavior in one analytics platform.
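To make the three annotation types concrete, here is a minimal Python sketch of how the hotel-booking example might look once annotated. The `Annotation` and `Turn` types, their fields, and the `summarize` helper are hypothetical names chosen for illustration; they are not Voker's SDK or data schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Annotation(Enum):
    """Hypothetical labels mirroring the three detections described above."""
    INTENT = "intent"
    CORRECTION = "correction"
    RESOLUTION = "resolution"


@dataclass
class Turn:
    role: str                                # "user", "agent", or "tool"
    text: str
    annotation: Optional[Annotation] = None  # None for unannotated turns


def summarize(turns):
    """Count how many turns carry each annotation type."""
    counts = Counter(t.annotation for t in turns if t.annotation is not None)
    return {a.value: counts.get(a, 0) for a in Annotation}


session = [
    Turn("user", "Book me a hotel in Vegas for next Saturday with a poolside view", Annotation.INTENT),
    Turn("agent", "I booked a room for next Saturday."),
    Turn("user", "No, that room doesn't have a poolside view!! TRY AGAIN", Annotation.CORRECTION),
    Turn("tool", "Tool Result: Room Booked... Success!", Annotation.RESOLUTION),
]

summary = summarize(session)
print(summary)
```

Counts like these, aggregated over many sessions, are the kind of raw material an analytics view could be built on.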

We asked 100+ AI founders, product managers, and agent engineers how they monitor their agents in production and the answer was resounding: by combing through individual traces (with the occasional evals sprinkled in). They all reported that they depend on customer complaints to tell them when agents are messing up. We feel strongly that there is a third leg of the agent monitoring stool missing - Agent Analytics.

You shouldn’t have to wait for users to complain to learn that a recent prompt change is breaking your hotel booking agent, or that the AI finance advisor you built is calling the wrong tool to look up realtime stock prices.

Turns out the antidote to AI hype is simple: measure your agents diligently, then iterate until you get it right.

Your users deserve better AI experiences (we all do)!

Install the Voker SDK on our free tier (up to 2,000 events/mo), and start building better agents today:
https://voker.ai/

2mo ago

@tyler_postle Hey Tyler — congrats on the launch 👋

The "third leg of the agent monitoring stool" framing really resonates. I'm running a few agents in production myself (Telegram + VK Teams bots fronting an OpenClaw agent), and the gap I keep hitting isn't detecting that something went wrong - it's reconstructing why. Logs show the tool calls, but the model's reasoning between turns is gone unless I instrument it manually.

Quick question: does Voker capture the reasoning/thinking blocks, or just the user-facing turns and tool I/O? That's basically the line between "agent monitoring" and "agent debugging" for me.

Either way - good to see someone taking the analytics angle seriously instead of just shipping another eval framework. Will give the SDK a spin this week.

2mo ago

Voker

Maker

@artem_fedorovich Thanks Artem, look forward to your feedback.
I will clarify - we think Obs and Evals tools are definitely important and serve their purpose, but they aren't enough when you start to truly scale your agent usage volume.

And to answer your question: yes, we collect reasoning and thinking blocks, and they're part of our analytics processing pipelines! We're still in the design phase on how best to display them in the UI, though. Let me know if you have any ideas. Just inline in the session reconstruction? Or is there a specific type of analysis or question you find yourself working on when you need to know how reasoning impacted agent performance?

2mo ago

💎 Pixel perfection

@tyler_postle Honestly, inline in the session reconstruction is the obvious first move, but the thing I'd kill for is a diff view between reasoning and outcome.

Like - agent decided to call Tool A "because the user implied X", but the actual result was Y. Most of the time when I'm debugging, the failure is in that gap: reasoning sounded fine in isolation, but it was based on a wrong read of the user's intent two turns earlier. Hard to spot scrolling through a linear timeline.

Maybe something like: filter sessions where "stated reasoning ↔ actual outcome" drift is high → those are the ones worth a human look. Could turn into a "reasoning regression" metric over time.

Either way, will play with the SDK this week and report back 👌

2mo ago

Voker

Maker

@artem_fedorovich love that idea! sending it to our team. I like that it can be useful both for individual conversation deep-dives and as a dataset for future regression training. we'll keep you in the loop when/if we build it!

2mo ago

@tyler_postle Can Voker track performance regressions after a prompt, model, or tool change, and show whether success rates dropped for specific intents?

2mo ago

Voker

Maker

@sead_sehovic Yes! As long as you version your agent in our SDK, we allow you to segment your data across the platform by version. If you take a look at our demo video in the launch, you can see where we have resolution rate and correction rate by intent categories by version (that's a mouthful!)
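As a rough illustration of what "resolution rate and correction rate by intent category by version" means arithmetically, here is a small Python sketch. The record layout, the version and intent names, and the choice to define correction rate as the share of intents with at least one correction are all assumptions made for this example, not Voker's definitions or API.

```python
from collections import defaultdict

# Hypothetical records, one per detected intent:
# (agent_version, intent_category, resolved, number_of_corrections)
records = [
    ("v1", "book_hotel", True, 0),
    ("v1", "book_hotel", True, 1),
    ("v2", "book_hotel", False, 2),
    ("v2", "book_hotel", True, 0),
    ("v2", "cancel_booking", True, 0),
]


def rates_by_version_and_intent(records):
    """Group records by (version, intent) and compute both rates per group."""
    totals = defaultdict(lambda: {"n": 0, "resolved": 0, "corrected": 0})
    for version, intent, resolved, corrections in records:
        bucket = totals[(version, intent)]
        bucket["n"] += 1
        bucket["resolved"] += int(resolved)
        # Assumption: an intent counts as "corrected" if it saw at least one correction.
        bucket["corrected"] += int(corrections > 0)
    return {
        key: {
            "resolution_rate": b["resolved"] / b["n"],
            "correction_rate": b["corrected"] / b["n"],
        }
        for key, b in totals.items()
    }


rates = rates_by_version_and_intent(records)
for (version, intent), r in sorted(rates.items()):
    print(version, intent, r)
```

In this toy data, the `book_hotel` resolution rate falls from 1.0 on `v1` to 0.5 on `v2`, which is the kind of per-intent regression the question asks about.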

2mo ago

💡 Bright idea

Congrats on the launch!
How does Voker handle intent attribution when the agent proactively redirects the user, say, a billing agent that detects the user is actually in the wrong product area and routes them elsewhere? The intent the user arrived with and the intent the agent resolved can diverge legitimately, and in those cases it's not clear whether that should register as a correction event or a successful resolution. Curious how the analytics model handles that distinction, since getting it wrong would skew correction rates significantly for agents designed to reroute.

2mo ago

Voker

Maker

@binu_george Love this question. Today we don't have an explicit way to tie two agents together. We know this is critical because most scaled agent products have multi-agent handoff systems like you mentioned.

What we have customers do today is treat the handoff as a successful resolution.
Of course, sometimes this is truly the resolution (in the case of an orchestrator agent, for example), but sometimes it's actually just passing the buck. We don't have a good way to differentiate these situations today, other than decoding the name and description of the agent it's handed off to, in addition to any other information you send to us through our events SDK.

We absolutely intend to build direct features to support this pattern better because it's very common.

Thanks for the question!

2mo ago

Most observability tools treat agent calls as black boxes, logging tokens but missing the decision loop entirely. Building RetainSure's AI workflows, we struggled to attribute downstream outcomes back to specific agent choices. Our logging was ad hoc and we ended up rebuilding it multiple times. Does Voker capture branching decisions when an agent picks between tool calls, or is it focused on input/output tracing?

2mo ago

Voker

Maker

@retain_dev Yes! Voker automatically tracks all the information your agent is provided to make its decisions, so you can see both the tools available and the tools used. This has helped our customers notice that their tools may need new descriptions when the agent has what it needs but isn't calling the right tool.
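One way to use "tools available versus tools used" data, sketched in Python under stated assumptions: for each tool, compare how often it was offered to the agent with how often it was actually called. The record shape, the tool names, and the `tool_selection_rates` helper are hypothetical and do not reflect Voker's event format.

```python
from collections import Counter

# Hypothetical decision records: which tools the agent was given, and which it called.
decisions = [
    {"available": ["get_stock_price", "search_news"], "called": "search_news"},
    {"available": ["get_stock_price", "search_news"], "called": "search_news"},
    {"available": ["get_stock_price", "search_news"], "called": "get_stock_price"},
    {"available": ["get_stock_price", "search_news"], "called": None},  # no tool call
]


def tool_selection_rates(decisions):
    """For each offered tool, return calls / times offered."""
    offered = Counter()
    chosen = Counter()
    for d in decisions:
        offered.update(set(d["available"]))
        if d["called"] is not None:
            chosen[d["called"]] += 1
    return {tool: chosen[tool] / offered[tool] for tool in offered}


selection = tool_selection_rates(decisions)
print(selection)
```

A tool that is almost always available but rarely selected is a candidate for a clearer description, though a low rate alone does not prove the agent chose wrongly.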

2mo ago

The 'Amplitude for agents' framing is exactly right. I added LangSmith tracing to my own agentic project and it's useful for debugging individual runs, but it tells you nothing about patterns across hundreds of conversations, like which intent category has the worst resolution rate, or whether a prompt change two weeks ago quietly broke a specific tool call path. That's a product analytics problem, not a tracing problem, and nobody was treating it that way.

One thing I'm genuinely curious about: the automatic correction detection sounds powerful but also tricky. If a user rephrases their request because the agent misunderstood vs. because they just changed their mind, those look identical in the conversation. How are you distinguishing genuine agent failures from natural user intent shifts? That line matters a lot if you're using corrections to drive prompt improvements.

2mo ago

Voker

Maker

@akshaypal_bishnoi Thanks for the validation! And great question - you're absolutely right. We called them "corrections" because they aren't necessarily ALWAYS a failure. Maybe the user is refining their intent - they didn't provide enough info at first, or they forgot something. At its core, it's a correction to either their own intent or to the agent's replies. Like you said, distinguishing these is important, and that's where our hierarchical text classification comes in (technical info on our blog)!

TLDR: we sort atomic corrections into classification categories, and that's how we distinguish between natural user intent shifts and genuine agent failures.

2mo ago

Automatic intent and resolution detection is the right abstraction. Most agent monitoring tools just log tokens or latency, but you actually need to know if the user got what they came for. We're building AI-driven customer success at RetainSure and agent quality drift between deployments is a real headache. How does Voker handle cases where the user's intent shifts mid-conversation?

2mo ago

Voker

Maker

@dhiraj_patel5 We're actually purpose-built for complex, long-running, multi-intent conversations! When our SDK detects multiple intents within a conversation, they get categorized into "Session Paths" that show up in our session timeline. This way you can easily navigate to different parts of the conversation without scrolling through the whole session. You can also analyze the accuracy of the agent on these separate intents across other surfaces in our product.
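For readers wondering what splitting a multi-intent conversation into navigable segments could look like, here is a minimal Python sketch. It assumes each turn already carries an intent label and simply groups contiguous turns that share one; the tuple layout and the `session_paths` helper are hypothetical, and this is not a description of how Voker computes its Session Paths.

```python
from itertools import groupby

# Hypothetical turns: (turn_index, detected_intent, user_text)
turns = [
    (0, "book_hotel", "Book me a hotel in Vegas"),
    (1, "book_hotel", "Poolside view please"),
    (2, "check_stock", "What's AAPL trading at?"),
    (3, "book_hotel", "Actually make it two nights"),
]


def session_paths(turns):
    """Split a conversation into contiguous runs of turns sharing an intent label."""
    paths = []
    for intent, group in groupby(turns, key=lambda t: t[1]):
        indices = [t[0] for t in group]
        paths.append({"intent": intent, "start": indices[0], "end": indices[-1]})
    return paths


paths = session_paths(turns)
for p in paths:
    print(p)
```

Because `groupby` only merges adjacent items, a user who returns to an earlier intent (turn 3 here) produces a new segment rather than extending the first one, which keeps each segment a contiguous jump target in a timeline.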

2mo ago

Love the brutal honesty here. AI has definitely written checks that devs are stuck cashing in production. Quick question on the SDK: how does it handle semantic variations for corrections? Will it catch things like "actually, scratch that" versus "no, that's wrong" out of the box, or do we need to train it on our own domain vocabulary?

2mo ago

How do you determine the quality of answers? I have an AI service with its own vector database. For almost any user question, we know the answer: we provide tourist attractions, and we have more of them than ChatGPT. Will you be able to understand whether these are top-tier attractions or not?

2mo ago

Voker

Maker

@natalia_iankovych When you send the information from your vector DB to your agent, Voker will also track that context. We'll use the information from your own RAG data to make our assessment on the quality of the response to the user! Essentially any information that your agent has to make its decision - Voker will also track and assess.

2mo ago

Forum Threads

p/voker

•

7d ago

Agents can now learn from their own mistakes - with Smart Skills

Your agents don't need more expensive models. They need to learn from their own mistakes automatically. Introducing Smart Skills: living, self-improving instructions that get generated, tested, and refined automatically based on how your agents are performing with your users in production.

How it works:

- Generated from real signals. Voker watches for user corrections, frustration, and failed outcomes, then writes a Smart Skill to fix the issue. Proven to perform better than "Claude, use these logs to fix."
- Self-correcting. Every Smart Skill is versioned and measured against your agent's performance. If a Smart Skill is underperforming or regressing behavior, Voker flags it and auto-optimizes it.
- Personalized. Enable Smart Skills globally, for specific companies, or for individual users.
- Tied to cost and resolution tracking. See how Smart Skills are saving tokens. Fewer complaints, faster resolutions, and less agent "thinking" usually mean lower spend, and now you can prove it.
- Fully auditable. Every Smart Skill addition or removal shows up as a milestone directly on your performance charts, so you can see what changed and when.