PandaProbe Cloud

Agent Engineering, Fully Managed.

743 followers

Observability tools • LLM Developer Tools

PandaProbe Cloud gives your team full-stack tracing, evals, and monitoring for agents with zero infrastructure to manage. Ship better agents without the ops overhead.

Free Options

Launch tags: Open Source • Developer Tools • Artificial Intelligence

PandaProbe

Maker

👋 Hey Product Hunt!

I'm Sina, founder of PandaProbe.

A while back we launched the open-source version here — the response was incredible. Today we're back with what many of you asked for: PandaProbe Cloud — full-stack tracing, evals, and monitoring for agents, with zero infrastructure to manage.

Here's a pattern every agent builder knows: everything looks fine in testing, you ship, and then the agent quietly misbehaves in production and nobody knows why. Once agents start chaining LLMs, tools, APIs, MCPs, and sub-agents, debugging becomes archaeology. Logs tell you that something happened, but not why, not whether quality regressed, and not how the session held together. Solving that shouldn't mean building your own agent engineering stack.

That's PandaProbe Cloud: ship better agents without the ops overhead.

What you get
🔎 Tracing — full agent executions captured as sessions, traces, and spans.
📊 Evaluation — score traces and sessions using SOTA agent-specific metrics.
⏱️ Monitoring — schedule recurring evals to track your agent's health in production.
☁️ Fully managed — we handle the infra. You just connect, ship, and improve.

Who it's for
🧑‍💻 AI engineers debugging agent behavior across LLMs, tools, and workflows.
🏗️ Platform teams monitoring quality and reliability without owning more infra.
🔬 Builders experimenting with agents who want to iterate faster.
🚀 Startups who want production-grade observability from day one.

Quickstart:

☁️ Cloud signup: https://app.pandaprobe.com/

🤖 Run: npx skills add chirpz-ai/pandaprobe-skills --skill '*' --yes

💥 Then ask your coding agent to "set up PandaProbe".

Free to start — generous usage credits. Up and running in minutes.

Quick links
📖 Docs: https://docs.pandaprobe.com
⭐ Open source: https://github.com/chirpz-ai/pandaprobe

I'll be here all day — drop your questions and feedback below.

Thanks for checking it out 🙏
— Sina

2mo ago

Bundling tracing, evals, and monitoring into a single managed layer is a sharp call. Teams building agent pipelines typically end up with fragile homegrown span logging that breaks the moment they chain subagents. The hardest part of multi-agent workflows isn't inference: it's reconstructing what happened across tool calls when something fails silently. How does trace collection handle async fan-out across spawned subagents?

2mo ago

PandaProbe

Maker

@anand_thakkar1 You've nailed exactly why homegrown span logging breaks down — the moment you introduce subagents, the causal chain fragments and silent failures become invisible.

PandaProbe handles this through the session model. Rather than treating each trace in isolation, PandaProbe groups all traces — including those spawned across subagents — into a single session that represents the full agent lifecycle. This means even when execution fans out across parallel or async subagent calls, the session becomes your reconstruction layer: you can see the full execution tree, where control passed, and where things broke down across the chain.
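To make the "session as reconstruction layer" idea concrete, here is a minimal Python sketch. It assumes each span carries a `session_id` and a `parent_id`; the `Span` class, its field names, and the helper functions are hypothetical illustrations, not PandaProbe's actual schema or SDK. Filtering by session and then indexing children by parent is enough to rebuild the execution tree, including the branches from fanned-out subagents.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


# Hypothetical span record; the field names are illustrative, not PandaProbe's schema.
@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks a root span, e.g. the top-level agent run
    session_id: str
    name: str


def build_session_tree(spans: list[Span], session_id: str) -> dict[Optional[str], list[Span]]:
    """Keep only spans from one session and index them by parent span ID."""
    children: dict[Optional[str], list[Span]] = defaultdict(list)
    for span in spans:
        if span.session_id == session_id:
            children[span.parent_id].append(span)
    return children


def render(children: dict[Optional[str], list[Span]], parent_id: Optional[str] = None, depth: int = 0) -> list[str]:
    """Depth-first walk returning one indented line per span."""
    lines: list[str] = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + span.name)
        lines.extend(render(children, span.span_id, depth + 1))
    return lines


# Made-up example: one orchestrator fans out to two subagents; session "s2" is ignored.
spans = [
    Span("a", None, "s1", "orchestrator"),
    Span("b", "a", "s1", "subagent:research"),
    Span("c", "a", "s1", "subagent:write"),
    Span("d", "b", "s1", "tool:web_search"),
    Span("x", None, "s2", "unrelated run"),
]
print("\n".join(render(build_session_tree(spans, "s1"))))
```

The design choice here is that grouping (by session) and structure (by parent link) are separate concerns, so spans that arrive out of order from parallel subagents still land in the right place in the tree.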

The goal is exactly what you described — making "what actually happened" answerable without having to manually piece together fragmented logs after the fact.

What kind of fan-out patterns are you dealing with? Happy to dig into specifics. 🙏

2mo ago

Congrats on the Cloud launch. The session-as-the-unit framing feels right for agents, because individual traces are not enough once tools, MCP calls, and subagents start branching.

One edge case I’d love to understand: if an agent fans out into parallel subagents and some tool calls continue after the parent task has already moved on, how does PandaProbe decide what still belongs to the same session timeline? Is it based on explicit session IDs, automatic propagation, or both?

2mo ago

PandaProbe

Maker

@studentzuo Great question, and you've picked out exactly the edge case that breaks most session grouping approaches.

PandaProbe's session grouping is built on session_id propagation through the instrumentation wrapper — as long as the session_id is carried through, spans from parallel subagents and async tool calls get attached to the correct session timeline automatically, even if they resolve after the parent task has moved on.
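As a rough illustration of why propagation works for late-finishing work, here is a generic Python sketch using the standard `contextvars` module. `asyncio.create_task` copies the current context when a task is created, so a subagent task that finishes after its parent has moved on still sees the parent's session ID. This is an assumption about one way an instrumentation wrapper could carry `session_id`, not PandaProbe's actual implementation; `current_session` and `record_span` are hypothetical names.

```python
import asyncio
import contextvars

# Hypothetical: the active session lives in a context variable that an
# instrumentation wrapper reads whenever it records a span.
current_session: contextvars.ContextVar[str] = contextvars.ContextVar("current_session")
recorded: list[tuple[str, str]] = []  # (session_id, span name) pairs


def record_span(name: str) -> None:
    recorded.append((current_session.get("<missing>"), name))


async def slow_tool_call(name: str, delay: float) -> None:
    await asyncio.sleep(delay)
    record_span(name)  # runs after the parent has already moved on


async def agent_run(session_id: str) -> list:
    current_session.set(session_id)
    # create_task() snapshots the current context, so each child inherits session_id.
    tasks = [
        asyncio.create_task(slow_tool_call("subagent-a:search", 0.02)),
        asyncio.create_task(slow_tool_call("subagent-b:fetch", 0.01)),
    ]
    record_span("parent:moved-on")  # the parent continues without awaiting its children
    return tasks


async def main() -> None:
    tasks = await agent_run("session-123")
    await asyncio.gather(*tasks)  # the children finish later but keep the session tag


asyncio.run(main())
print(recorded)
```

Every recorded entry carries `session-123`, including the two subagent spans that complete after the parent span was recorded, with no session ID passed explicitly to the tool calls.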

The framing you used is exactly right — individual traces aren't the unit that matters once you're dealing with fan-out. The session is what gives you the full picture. Happy to dig into specifics if you want to share more about your setup 🙏

2mo ago

For MCP-heavy agents, I’m curious how session grouping works in Cloud. Do MCP tool calls get attached to one session timeline automatically, or is that only reliable if I pass the same session_id through each layer?

2mo ago

PandaProbe

Maker

@novamaker01 Great question, and an important one for MCP-heavy setups.

MCP tool calls get attached to the session timeline automatically — as long as the instrumentation wrapper carries the session_id, our integration intercepts and captures all MCP calls and their context without any extra wiring on your end.

One thing worth knowing: if your MCP calls involve multi-layer data access, there may be some context loss at those deeper layers. It's something we're actively working on, but for the vast majority of MCP setups you'll get full, automatic session correlation out of the box.
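The reply doesn't say what causes context loss at deeper layers. As one generic example (an assumption, not a description of PandaProbe's internals): in Python, context-based propagation can silently drop a session ID when work is handed to a thread pool, because `ThreadPoolExecutor.submit` does not carry the caller's `contextvars` context into the worker thread on a standard CPython build. Running the call inside `contextvars.copy_context()` restores it. The `current_session` variable and `deep_data_access` function are hypothetical.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical session variable of the kind an instrumentation wrapper might use.
current_session: contextvars.ContextVar[str] = contextvars.ContextVar("current_session")


def deep_data_access() -> str:
    # Stand-in for a nested layer (e.g. a data loader behind an MCP tool) that reads the session.
    return current_session.get("<missing>")


current_session.set("session-123")

with ThreadPoolExecutor(max_workers=1) as pool:
    # Plain submit(): on standard CPython builds the worker thread does not see the caller's context.
    lost = pool.submit(deep_data_access).result()
    # Running the call inside a copy of the caller's context keeps the session ID.
    kept = pool.submit(contextvars.copy_context().run, deep_data_access).result()

print(lost, kept)
```

On a standard CPython build, `lost` comes back as `"<missing>"` while `kept` is `"session-123"`, which is the kind of gap that shows up as a span detached from its session.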

Happy to dig into specifics if you want to share more about your setup. 🙏

2mo ago

FuseBase

Congrats on the cloud launch @sina_tayebati

The open-source version saved me during a nasty multi-agent debug a few months back, so this is exciting. Does the eval scoring work for custom agent frameworks or is it tied to specific SDKs?

2mo ago

PandaProbe

Maker

@kate_ramakaieva Thank you, really glad PandaProbe helped you get through that debug — multi-agent issues are the worst to untangle!

And yes, absolutely works with custom frameworks. The SDK provides decorators you can wrap around any custom agent orchestration or logic, so you're not locked into specific frameworks. Whether you've built something fully custom or are using a framework we don't have a native integration for yet, you can still get full tracing coverage with minimal instrumentation.

Would love to hear what stack you're running — always useful to know what people are building with! 🙏

2mo ago

Hey @sina_tayebati, quick question: if we start on Cloud but later need to migrate back to self-hosted open source because of data residency laws, is the data schema 100% compatible?

2mo ago

PandaProbe

Maker

@vikramp7470 Great question! Schema-wise, yes — fully compatible between Cloud and open-source.

Honest caveat though: a dedicated migration export feature isn't live yet. It's on the roadmap, but worth knowing upfront if data residency is a hard requirement for you.

Happy to chat through your situation if it helps 🙏

2mo ago

Agent observability gets messy once the same user task spans tools + retries, so the session/traces/spans framing makes sense. One thing I'd check before wiring this into prod: can evals be versioned against both prompt/model changes and tool schema changes? Otherwise a regression dashboard can get a little misleading after an MCP/API update.

2mo ago

PandaProbe

Maker

@xiaosong001 Sharp observation — and yes, PandaProbe handles this. Traces and sessions can be versioned and tagged, so you can anchor your eval runs to specific prompt or model versions and compare apples to apples after a change.

You can also filter across dev and prod environments separately, which helps a lot with exactly the scenario you're describing — making sure a regression signal is a real behavioral shift, not just noise from a schema or API update.
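Here is a small sketch of the comparison being described, under stated assumptions: eval results are available as rows carrying version tags and an environment label. The tag keys, the `scores_by_version` helper, and the sample scores are all made up for illustration, and tagging a tool-schema version is an assumption the reply doesn't confirm. Grouping by the full version tuple within a single environment keeps runs from after a tool-schema change from being averaged together with runs from before it.

```python
from collections import defaultdict
from statistics import mean

# Made-up eval results; the tag keys are illustrative, not PandaProbe's schema.
results = [
    {"env": "prod", "prompt": "v3", "model": "m-1", "tool_schema": "2024-05", "score": 0.82},
    {"env": "prod", "prompt": "v3", "model": "m-1", "tool_schema": "2024-05", "score": 0.78},
    {"env": "prod", "prompt": "v3", "model": "m-1", "tool_schema": "2024-06", "score": 0.61},
    {"env": "dev", "prompt": "v4", "model": "m-1", "tool_schema": "2024-06", "score": 0.90},
]


def scores_by_version(rows, env, keys=("prompt", "model", "tool_schema")):
    """Average scores per version tuple, using rows from one environment only."""
    groups = defaultdict(list)
    for row in rows:
        if row["env"] == env:
            groups[tuple(row[k] for k in keys)].append(row["score"])
    return {version: mean(scores) for version, scores in groups.items()}


for version, avg in scores_by_version(results, "prod").items():
    print(version, round(avg, 2))
```

In this made-up prod data, the two runs on the older tool schema average 0.80 and the single run after the schema change scores 0.61, so the drop appears as its own version instead of being blended into one number.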

It's the kind of thing that matters a lot in prod and not at all in a demo, so glad you raised it 🙏

2mo ago

Forum Threads

p/pandaprobe-cloud • 2mo ago

Agent engineering is where software engineering was before unit tests existed

We're launching PandaProbe Cloud, but I keep coming back to this thought:

Software engineering spent decades building the infrastructure to trust code in production: unit tests, CI/CD, canary deployments, logging pipelines. Nobody questions that investment anymore.
