Launched this week

PandaProbe Cloud
agent engineering, fully managed.
730 followers
PandaProbe Cloud gives your team full-stack tracing, evals, and monitoring for agents with zero infrastructure to manage. Ship better agents without the ops overhead.

PandaProbe
👋 Hey Product Hunt!
I'm Sina, founder of PandaProbe.
A while back we launched the open-source version here — the response was incredible. Today we're back with what many of you asked for: PandaProbe Cloud — full-stack tracing, evals, and monitoring for agents, with zero infrastructure to manage.
Here's a pattern every agent builder knows: you ship, it looks fine in testing, then quietly misbehaves in production — nobody knows why. Once agents start chaining LLMs, tools, APIs, MCPs, and sub-agents, debugging becomes archaeology. Logs tell you something happened — not why, not whether quality regressed, not how the session held together. And solving that shouldn't mean building your own agent engineering stack.
That's PandaProbe Cloud: ship better agents without the ops overhead.
What you get
🔎 Tracing — full agent executions captured as sessions, traces, and spans.
📊 Evaluation — score traces and sessions using SOTA agent-specific metrics.
⏱️ Monitoring — schedule recurring evals to track your agent's health in production.
☁️ Fully managed — we handle the infra. You just connect, ship, and improve.
Who it's for
🧑‍💻 AI engineers debugging agent behavior across LLMs, tools, and workflows.
🏗️ Platform teams monitoring quality and reliability without owning more infra.
🔬 Builders experimenting with agents who want to iterate faster.
🚀 Startups who want production-grade observability from day one.
Quickstart:
☁️ Cloud signup: https://app.pandaprobe.com/
🤖 Run: npx skills add chirpz-ai/pandaprobe-skills --skill '*' --yes
💥 Then ask your coding agent to "set up PandaProbe".
Free to start — generous usage credits. Up and running in minutes.
Quick links
📖 Docs: https://docs.pandaprobe.com
⭐ Open source: https://github.com/chirpz-ai/pandaprobe
I'll be here all day — drop your questions and feedback below.
Thanks for checking it out 🙏
— Sina
Bundling tracing, evals, and monitoring into a single managed layer is a sharp call. Teams building agent pipelines typically end up with fragile homegrown span logging that breaks the moment they chain subagents. The hardest part of multi-agent workflows isn't inference: it's reconstructing what happened across tool calls when something fails silently. How does trace collection handle async fan-out across spawned subagents?
PandaProbe
@anand_thakkar1 You've nailed exactly why homegrown span logging breaks down — the moment you introduce subagents, the causal chain fragments and silent failures become invisible.
PandaProbe handles this through the session model. Rather than treating each trace in isolation, PandaProbe groups all traces — including those spawned across subagents — into a single session that represents the full agent lifecycle. This means even when execution fans out across parallel or async subagent calls, the session becomes your reconstruction layer: you can see the full execution tree, where control passed, and where things broke down across the chain.
The goal is exactly what you described — making "what actually happened" answerable without having to manually piece together fragmented logs after the fact.
What kind of fan-out patterns are you dealing with? Happy to dig into specifics. 🙏
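To make the "session as reconstruction layer" idea concrete, here is a minimal sketch in plain Python. It is not PandaProbe's SDK or data model: the `Span` fields (`span_id`, `parent_id`, `session_id`, `ok`) and the `render_session` helper are hypothetical names chosen for illustration. It only shows how flat span records that share a session ID can be turned back into an execution tree.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical span record; field names are assumptions, not PandaProbe's schema.
    span_id: str
    parent_id: Optional[str]
    session_id: str
    name: str
    ok: bool = True


def render_session(spans, session_id):
    """Return indented lines showing the execution tree of one session."""
    children = defaultdict(list)
    for span in spans:
        if span.session_id == session_id:
            children[span.parent_id].append(span)

    lines = []

    def walk(parent_id, depth):
        # Visit children in recorded order, indenting by depth.
        for span in children.get(parent_id, []):
            mark = "" if span.ok else "  <-- failed"
            lines.append("  " * depth + span.name + mark)
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


SPANS = [
    Span("a", None, "s1", "planner"),
    Span("b", "a", "s1", "subagent:research"),
    Span("c", "a", "s1", "subagent:writer"),
    Span("d", "b", "s1", "tool:search", ok=False),
    Span("x", None, "s2", "other-session"),
]

if __name__ == "__main__":
    print("\n".join(render_session(SPANS, "s1")))
```

Rendering session `s1` places the failed `tool:search` call under the research subagent, which is the kind of question a flat log cannot answer directly.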
Congrats on the Cloud launch. The session-as-the-unit framing feels right for agents, because individual traces are not enough once tools, MCP calls, and subagents start branching.
One edge case I’d love to understand: if an agent fans out into parallel subagents and some tool calls continue after the parent task has already moved on, how does PandaProbe decide what still belongs to the same session timeline? Is it based on explicit session IDs, automatic propagation, or both?
PandaProbe
@studentzuo Great question, and you've picked out exactly the edge case that breaks most session grouping approaches.
PandaProbe's session grouping is built on session_id propagation through the instrumentation wrapper — as long as the session_id is carried through, spans from parallel subagents and async tool calls get attached to the correct session timeline automatically, even if they resolve after the parent task has moved on.
The framing you used is exactly right — individual traces aren't the unit that matters once you're dealing with fan-out. The session is what gives you the full picture. Happy to dig into specifics if you want to share more about your setup 🙏
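For readers who want to see what automatic session_id propagation can look like, here is a small sketch using only the Python standard library. It is an assumption about one way to do it, not a description of PandaProbe's instrumentation wrapper: `session_id_var`, `record_span`, and the task names are hypothetical. It relies on the fact that `asyncio.create_task` copies the current `contextvars` context, so a tool call that finishes after its parent has moved on still reports the parent's session ID.

```python
import asyncio
import contextvars

# Hypothetical carrier for the current session; not a PandaProbe API.
session_id_var = contextvars.ContextVar("session_id", default=None)
RECORDED = []


def record_span(name):
    # Attach whatever session_id is visible in the current context.
    RECORDED.append((session_id_var.get(), name))


async def slow_tool(name, delay):
    await asyncio.sleep(delay)
    record_span(name)


async def parent_task(session_id, background):
    session_id_var.set(session_id)
    record_span("parent:start")
    # create_task snapshots the current context, so the child keeps this session_id
    # even though the parent does not wait for it.
    background.append(asyncio.create_task(slow_tool("tool:late", 0.02)))
    record_span("parent:done")


async def main():
    RECORDED.clear()
    background = []
    # Each coroutine runs as its own task with its own copy of the context.
    await asyncio.gather(
        parent_task("session-A", background),
        parent_task("session-B", background),
    )
    await asyncio.gather(*background)  # let the late tool calls finish
    return list(RECORDED)


if __name__ == "__main__":
    for session_id, name in asyncio.run(main()):
        print(session_id, name)
```

Both `tool:late` spans are recorded after their parent's `parent:done`, yet each carries the right session ID. Code that hops to a thread pool or a separate process would need the ID passed explicitly, which is where explicit session IDs still matter.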
For MCP-heavy agents, I’m curious how session grouping works in Cloud. Do MCP tool calls get attached to one session timeline automatically, or is that only reliable if I pass the same session_id through each layer?
PandaProbe
@novamaker01 Great question, and an important one for MCP-heavy setups.
MCP tool calls get attached to the session timeline automatically — as long as the instrumentation wrapper carries the session_id, our integration intercepts and captures all MCP calls and their context without any extra wiring on your end.
One thing worth knowing: if your MCP calls involve multi-layer data access, there may be some context loss at those deeper layers. It's something we're actively working on, but for the vast majority of MCP setups you'll get full, automatic session correlation out of the box.
Happy to dig into specifics if you want to share more about your setup. 🙏
FuseBase
Congrats on the Cloud launch, @sina_tayebati!
The open-source version saved me during a nasty multi-agent debug a few months back, so this is exciting. Does the eval scoring work for custom agent frameworks or is it tied to specific SDKs?
PandaProbe
@kate_ramakaieva Thank you, really glad PandaProbe helped you get through that debug — multi-agent issues are the worst to untangle!
And yes, it absolutely works with custom frameworks. The SDK provides decorators you can wrap around any custom agent orchestration or logic, so you're not locked into specific frameworks. Whether you've built something fully custom or are using a framework we don't have a native integration for yet, you can still get full tracing coverage with minimal instrumentation.
Would love to hear what stack you're running — always useful to know what people are building with! 🙏
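As a rough illustration of the decorator approach mentioned above, here is a generic sketch in standard-library Python. The `traced` decorator, the in-memory `SPANS` list, and `route_request` are hypothetical stand-ins, not the PandaProbe SDK's actual names or signatures; check the docs for the real API. The sketch only shows the pattern: wrap any function, record a span with a status and duration, and re-raise errors unchanged.

```python
import functools
import time

SPANS = []  # stand-in for an exporter; a real SDK would send these somewhere


def traced(name):
    """Hypothetical decorator that records one span per call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # never swallow the caller's exception
            finally:
                SPANS.append({
                    "name": name,
                    "status": status,
                    "duration_s": time.perf_counter() - start,
                })
        return wrapper
    return decorator


@traced("route_request")
def route_request(text):
    # Custom orchestration logic from any framework can go here.
    if not text:
        raise ValueError("empty request")
    return "billing" if "invoice" in text else "general"


if __name__ == "__main__":
    print(route_request("where is my invoice"))
    print(SPANS[-1]["name"], SPANS[-1]["status"])
```

Because the wrapper is framework-agnostic, the same pattern applies to a hand-rolled planner loop or a single tool function.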
Hey @sina_tayebati,
Quick question: if we start on Cloud but later need to migrate back to the self-hosted open-source version because of data residency laws, is the data schema 100% compatible?
PandaProbe
@vikramp7470 Great question! Schema-wise, yes — fully compatible between Cloud and open-source.
Honest caveat though: a dedicated migration export feature isn't live yet. It's on the roadmap, but worth knowing upfront if data residency is a hard requirement for you.
Happy to chat through your situation if it helps 🙏
Agent observability gets messy once the same user task spans tools + retries, so the session/traces/spans framing makes sense. One thing I'd check before wiring this into prod: can evals be versioned against both prompt/model changes and tool schema changes? Otherwise a regression dashboard can get a little misleading after an MCP/API update.
PandaProbe
@xiaosong001 Sharp observation — and yes, PandaProbe handles this. Traces and sessions can be versioned and tagged, so you can anchor your eval runs to specific prompt or model versions and compare apples to apples after a change.
You can also filter across dev and prod environments separately, which helps a lot with exactly the scenario you're describing — making sure a regression signal is a real behavioral shift, not just noise from a schema or API update.
It's the kind of thing that matters a lot in prod and not at all in a demo, so glad you raised it 🙏
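One way to picture the "compare apples to apples" point is to group eval scores by their version tags before comparing them. The sketch below is plain Python over made-up records: the tag keys (`prompt`, `tool_schema`), the scores, and `mean_score_by` are all hypothetical and do not reflect PandaProbe's tagging API or any real results.

```python
from collections import defaultdict
from statistics import mean

# Made-up eval runs for illustration only.
EVAL_RUNS = [
    {"score": 0.9, "tags": {"prompt": "v1", "tool_schema": "A"}},
    {"score": 0.8, "tags": {"prompt": "v1", "tool_schema": "A"}},
    {"score": 0.6, "tags": {"prompt": "v1", "tool_schema": "B"}},
    {"score": 0.7, "tags": {"prompt": "v1", "tool_schema": "B"}},
    {"score": 0.8, "tags": {"prompt": "v2", "tool_schema": "B"}},
]


def mean_score_by(runs, *tag_keys):
    """Average scores per unique combination of the given tag keys."""
    groups = defaultdict(list)
    for run in runs:
        key = tuple(run["tags"].get(k) for k in tag_keys)
        groups[key].append(run["score"])
    return {key: round(mean(scores), 3) for key, scores in groups.items()}


if __name__ == "__main__":
    print(mean_score_by(EVAL_RUNS, "prompt"))
    print(mean_score_by(EVAL_RUNS, "prompt", "tool_schema"))
```

Grouping by prompt alone blends the two tool schemas together for `v1`, while grouping by both tags shows that the drop coincides with the schema change rather than with the prompt.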