
PandaProbe
open source agent engineering platform
PandaProbe is an open-source agent engineering platform that gives you deep observability into AI agent applications. Use it to trace, evaluate, monitor and debug your AI agents in development and production.
👋 Hey Product Hunt!
I’m Sina, founder of PandaProbe.
Building AI agents is getting easier, but understanding and trusting them in production is still way too hard.
Once agents call LLMs, tools, APIs, MCP servers, and sub-agents, logs alone aren't enough. You need to know what happened, why it failed, whether quality regressed, and whether the agent stayed reliable across a full session.
PandaProbe is my attempt to solve that: an open-source agent engineering platform for tracing, evaluation, monitoring, and debugging AI agent applications.
The goal is simple: help developers move from “the agent runs on my laptop” to “I understand what happened in production, I can measure quality, and I can improve it continuously.”
What PandaProbe provides
🔎 Tracing — capture agent executions as traces and spans across LLM calls, tool calls, agents, and custom logic.
🧵 Sessions — group related traces to understand the full lifecycle of an agent.
📊 Evaluations — score traces and sessions with built-in agent-focused metrics.
⏱️ Monitoring — schedule recurring evaluations for new traces and sessions.
🛠️ Open source + cloud — build from source on GitHub and self-host, or use PandaProbe Cloud.
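
To make the trace/span/session model above concrete, here's a minimal, self-contained Python sketch of the idea — spans (LLM calls, tool calls) nested inside a trace, traces grouped under a session. The `Tracer` class and its API are illustrative assumptions for this post, not PandaProbe's actual SDK (see the docs for the real interface):

```python
import time
import uuid
from contextlib import contextmanager

# Illustrative sketch only — class and method names here are hypothetical,
# not PandaProbe's real SDK. Shows the trace/span/session data model.

class Tracer:
    def __init__(self, session_id):
        self.session_id = session_id  # groups related traces into one session
        self.traces = []              # each trace holds an ordered list of spans

    @contextmanager
    def trace(self, name):
        spans = []
        self.traces.append({"id": str(uuid.uuid4()), "name": name, "spans": spans})
        yield spans

    @contextmanager
    def span(self, spans, name, kind):
        start = time.time()
        record = {"name": name, "kind": kind}  # kind: "llm", "tool", "agent", ...
        try:
            yield record
        finally:
            record["duration_s"] = time.time() - start
            spans.append(record)

tracer = Tracer(session_id="demo-session")
with tracer.trace("answer_question") as spans:
    with tracer.span(spans, "plan", kind="llm") as s:
        s["output"] = "call the weather tool"
    with tracer.span(spans, "get_weather", kind="tool") as s:
        s["output"] = "22°C, sunny"
```

Each span records what ran, what kind of step it was, and how long it took; evaluations then score whole traces or sessions rather than individual log lines.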
Who it’s for
🧑💻 AI engineers — debug agent behavior across LLMs, tools, and workflows.
🏗️ Agent platform teams — monitor quality, regressions, and reliability in production.
🔬 Teams experimenting with agents — understand failures faster and compare iterations.
🚀 Startups building AI products — add observability and evaluation early before agents become impossible to reason about.
Quick links
GitHub: https://github.com/chirpz-ai/pandaprobe
Docs: https://docs.pandaprobe.com
Cloud: https://www.pandaprobe.com/
I’ll be here all day answering questions and collecting feedback.
If you’re building agents today, what’s the hardest part to debug or evaluate?
Thanks for checking it out 🙏
— Sina