I've been building ClawMetry for the past 5 weeks. It's at 90k+ installs across 100+ countries.
The observability features I built first were the ones I personally needed: a live execution graph (Flow tab), full decision transcripts (Brain tab), token cost tracking per session, and visibility into sub-agent spawns.
But I keep hearing variations of the same thing: "I don't really know what my agents are doing." And everyone means something slightly different by that.
For some it's costs. For some it's timing (why did this take 4 minutes?). For some it's trust (did the agent actually do what I think it did?). For some it's failures (where exactly did it break?).
So I want to ask you directly:
If you're running AI agents today -- what's the one thing missing from your observability setup? What would make you feel like you actually understand what's happening inside your agents?
Options I'm thinking about next:
- Alerting (get notified when an agent fails or goes over budget)
- Cost per task breakdown (not just per session)
- Agent run comparisons (before/after a prompt change)
- Memory snapshots (what did the agent "know" at each decision point)
Drop your answer below. The next feature I build will be heavily influenced by this thread.
(ClawMetry is free to try locally: pip install clawmetry. Cloud: app.clawmetry.com, $5/node/month, 7-day free trial.)
ClawMetry
Hey Product Hunt! 👋
I'm Vivek, and I built ClawMetry because I got tired of not knowing what my AI agents were doing.
I run several OpenClaw agents. They handle code, research, deployment, scheduling. But every time one took 10 minutes on a task, I had no idea: is it stuck? Did it hallucinate? Is it burning through tokens?
NemoClaw (NVIDIA's AI agent sandbox) made running agents safer. But the built-in TUI is ephemeral and terminal-only. You can't see what happened yesterday. You can't watch 10 sandboxes from your phone. You can't track costs across your fleet.
So I built ClawMetry for NemoClaw. One command on the host, and every sandbox gets full observability:
🧠 Brain tab: every thought, tool call, and decision in real time
📊 Token tracking: per call, per session, no surprises
🔐 E2E encrypted: keys never leave your machine
🌐 Cloud dashboard: monitor everything from any browser
It's open source (MIT), free for local use, and took about two months of obsessive building.
With your love and support, ClawMetry has been downloaded 95,000+ times across 100+ countries. This NemoClaw integration is the next step.
What's coming next:
• Policy drift detection (get alerted when sandbox policies change)
• Remote egress approvals from your phone
• Fleet-wide policy management
Cloud sync is $5/sandbox/month. Local dashboard is free forever.
Would love your feedback. Happy to answer any questions!
🔗 https://clawmetry.com/nemoclaw
Token tracking per session is exactly what I've been wanting. Running multiple sandboxes and having zero visibility into which ones are burning through credits is so frustrating.
The E2E encryption part is a nice touch too. Most monitoring tools want you to ship all your data to their cloud which is a nonstarter for anything sensitive. Open source MIT makes it easy to just try it without committing.
ClawMetry
@mihir_kanzariya Spot on, Mihir. The "which sandbox is burning credits" problem was exactly what pushed me to build the token tracking. When you're running 4-5 sandboxes and the bill spikes, you need to know which one did it.
And yeah, the E2E encryption was non-negotiable from day one. Your agent's thoughts, prompts, API keys should never touch someone else's server in plaintext. The encryption key is generated on your machine and stays there.
Thanks for the kind words. Let me know how it goes once you try it out!
@vivek_chand I've been running a few AI agents myself, and seriously, most of the time I have no idea what's happening when one hangs or takes too long 😅.
The token tracking + history feature is a lifesaver... I didn't realize how unpredictable costs could get until I started monitoring them.
One idea I'd love to see: a quick health alert when an agent behaves differently than usual. Even a small notification would save a lot of time checking manually.
ClawMetry
@evelyn_white That "why is it hanging" feeling is exactly why the Brain tab exists 😄 You can literally watch the agent think, see which tool call it's stuck on, and figure out if it's waiting on a timeout or looping.
Love the health alert idea. Actually, we already have basic alerting (Telegram, Slack, webhooks) and this fits perfectly. Imagine: "Agent X has been idle for 5 minutes after calling exec" or "Token spend exceeded $10 in the last hour." Adding this to the roadmap.
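To make the two example alert conditions concrete, here's a minimal sketch of how such checks could look. All names, fields, and thresholds (`check_alerts`, `last_event_ts`, the 5-minute and $10 limits) are illustrative assumptions, not ClawMetry's actual API:

```python
import time

def check_alerts(agent, now=None):
    """Return alert messages for one agent's recent activity (hypothetical sketch)."""
    now = now or time.time()
    alerts = []

    # "Agent X has been idle for 5 minutes after calling exec"
    if agent["last_event_ts"] is not None and now - agent["last_event_ts"] > 300:
        idle_min = (now - agent["last_event_ts"]) / 60
        alerts.append(
            f"Agent {agent['name']} idle for {idle_min:.1f} min after {agent['last_tool']}"
        )

    # "Token spend exceeded $10 in the last hour"
    # agent["costs"] is assumed to be a list of (timestamp, dollar_cost) pairs
    hour_spend = sum(cost for ts, cost in agent["costs"] if now - ts < 3600)
    if hour_spend > 10.0:
        alerts.append(f"Agent {agent['name']} spent ${hour_spend:.2f} in the last hour")

    return alerts
```

A rule engine like this would run on the host on each poll cycle and fan out matches to Telegram, Slack, or webhooks.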
Thanks for trying it out, Evelyn. This kind of feedback is gold.
@vivek_chand Many congratulations on yet another exciting launch, Vivek.
The one-command install with zero config is brilliant, reminds me of the early Docker days when simplicity actually mattered :)
Excited to see everything on the roadmap you’d shared with me.
Documentation.AI
This reminds me of debugging microservices, but for AI agents. Makes me curious how the real-time flow visualization handles really chatty agents that make tons of tool calls.
ClawMetry
@roopreddy Roop, thank you! And that's one of the best edge cases to stress test with.
The Brain tab caps at 500 events client-side, with the oldest events dropping off first. Two independent filter axes let you cut through burst noise: filter by agent source AND by event type simultaneously (AND logic, all purely client-side, no server round-trips). So if a sub-agent fires 40 exec calls, you can isolate that agent's web_search events in two clicks.
The server batches on a 0.5-second poll cycle, so a burst of 50 tool calls in 10 seconds arrives as roughly 5 batches rather than 50 individual updates. And the initial Brain history load pre-clips to the 300 most recent events, so the page stays fast after long agent runs.
The honest limitation: no DOM virtualization yet, so very high burst rates can cause choppy re-renders. That's on the improvement list.
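The cap-plus-filter behavior described above can be sketched in a few lines. This is an illustrative model, not ClawMetry's actual client code, and the event field names (`source`, `type`) are assumptions:

```python
from collections import deque

MAX_EVENTS = 500  # client-side cap; oldest events drop off first

# deque with maxlen silently discards the oldest entries on overflow
events = deque(maxlen=MAX_EVENTS)

def ingest_batch(batch):
    """Append one poll-cycle batch of events; the deque enforces the 500 cap."""
    events.extend(batch)

def filter_events(source=None, event_type=None):
    """Two independent filter axes combined with AND logic, purely client-side."""
    return [
        e for e in events
        if (source is None or e["source"] == source)
        and (event_type is None or e["type"] == event_type)
    ]
```

A ring buffer like this keeps memory bounded no matter how chatty the agent is; filtering stays cheap because it only ever scans at most 500 events.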
Nas.io
Congrats on shipping @vivek_chand! I'm curious about the policy drift detection feature you mentioned: how would that work in practice when sandboxes update their own policies?
ClawMetry
@nuseir_yassin1 Nuseir, really appreciate the support!
ClawMetry runs on both the host and inside each sandbox. The host side can observe policy changes as they happen; the sandbox side sees what the agent does within those boundaries. Policy drift detection on the roadmap will use exactly that to compare the active runtime policy against the static baseline and alert when they diverge.
GrowMeOrganic
The real-time flow visualization sounds like exactly what I needed when debugging agent chains that would mysteriously stall for minutes at a time. :D
ConnectMachine
Quick question about the token tracking per session feature: does it break down costs by specific tool calls, or just aggregate session totals?
Bababot
Congrats on the launch.
@vivek_chand The E2E encryption approach is smart, since most monitoring tools force you to ship sensitive agent data to their servers. Quick question though: how does the real-time visualization perform when you're monitoring multiple sandboxes simultaneously?