Sina Tayebati

PandaProbe - open source agent engineering platform

PandaProbe is an open-source agent engineering platform that gives you deep observability into AI agent applications. Use it to trace, evaluate, monitor and debug your AI agents in development and production.

Boyuan Deng

Quick q, how does PandaProbe’s tracing handle multi-step agent loops where the failure is caused by an earlier decision that only becomes obvious later?

Sina Tayebati
@boyuan_deng1 Great question — that’s exactly the kind of failure mode we care about. PandaProbe traces the full execution as a structured trajectory (sessions → traces → spans), so you can follow multi-step loops end-to-end, not just isolated steps. More importantly, we don’t just log steps — we evaluate across the trajectory. That means when a failure shows up later, you can trace it back to earlier decisions and see where things started to drift (e.g., looping, bad tool use, misalignment). So instead of “something broke at step 20,” you can actually pinpoint “the breakdown started at step 5.”
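
To make the "broke at step 20, started at step 5" idea concrete, here is a minimal sketch. It is illustrative only and is not PandaProbe's actual API or scoring method: `Step`, `score`, and `drift_onset` are hypothetical names, and the per-step scores are assumed to come from some evaluator. It walks backward from the end of a trajectory to find where the trailing run of degraded steps began.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    index: int    # 1-based position in the trajectory
    action: str   # e.g. "tool_call", "llm_call" (hypothetical labels)
    score: float  # hypothetical per-step quality score in [0, 1]


def drift_onset(steps: List[Step], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the first step in the trailing run of
    below-threshold steps, or None if the trajectory ends healthy."""
    onset = None
    for step in reversed(steps):
        if step.score < threshold:
            onset = step.index  # keep moving the onset earlier
        else:
            break  # reached the last healthy step
    return onset


# Assumed example data: steps 1-4 look fine, steps 5-20 are degraded.
steps = [Step(i, "tool_call", 0.9 if i < 5 else 0.4) for i in range(1, 21)]
print(drift_onset(steps))
```

A real trajectory evaluator would likely use richer signals than a single threshold; the point is only that the failure is attributed to the start of the degraded run rather than to the step where it finally surfaced.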
Olia Nemirovski

Congrats on launching! How does PandaProbe handle sub-agent calls? Like if agent A spins up agent B, do both get traced under the same session tree?

Sina Tayebati
@olia_nemirovski Thank you! Yes — they’re all captured within the same session. If a supervisor agent (A) calls a sub-agent (B), it’s treated as part of the same execution thread. The sub-agent call appears as a span within the parent trace, and that span can expand into its own nested chain of steps. So you get a unified, hierarchical view of the full interaction — making it easy to see how parent and sub-agents relate and where issues emerge.
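
As a rough picture of that hierarchy, here is a small sketch of a span tree where a sub-agent call is a span inside the parent trace with its own nested steps. The `Span` class, `render` helper, and span names are assumptions made up for illustration and are not PandaProbe's SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    name: str
    children: List["Span"] = field(default_factory=list)

    def child(self, name: str) -> "Span":
        """Create a nested span under this one and return it."""
        span = Span(name)
        self.children.append(span)
        return span


def render(span: Span, depth: int = 0) -> List[str]:
    """Flatten the span tree into indented lines, parent before children."""
    lines = ["  " * depth + span.name]
    for c in span.children:
        lines.extend(render(c, depth + 1))
    return lines


# Supervisor agent A calls sub-agent B in the middle of its run.
root = Span("agent_A.run")
root.child("llm.plan")
sub = root.child("agent_B.run")  # sub-agent call is a span in the parent trace
sub.child("tool.search")         # ...that expands into its own nested steps
sub.child("llm.summarize")
root.child("llm.answer")

print("\n".join(render(root)))
```

Printing the tree shows agent B's steps indented under its call span, which sits alongside agent A's own steps in the same trace.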
Matthew

We've been running Langfuse for our agent stack for about six months and the trace UI is decent, but session-level evals across multi-agent runs are still where things get messy. Curious how PandaProbe handles that. If a sub-agent fails three turns deep, do you surface root cause at the session level or do I still have to walk the span tree manually? Also, what's the storage model look like for self-hosted? Postgres only, or something columnar for the trace volume? One more thing: any plans for OpenTelemetry-native ingestion so I don't have to swap out my existing tracing SDK across services?

Sina Tayebati
@brainystudy Great questions, you’re hitting exactly the pain points we’ve been focusing on.

On evaluation: this is actually the primary focus of PandaProbe. Instead of just surfacing spans, we evaluate at the session level using trajectory-based metrics designed for multi-step, multi-agent workflows. So if a sub-agent fails a few steps deep, you don’t have to manually walk the tree: the system surfaces degradation and helps point you to where things started going wrong.

On storage: the current self-hosted setup is Postgres + Redis.

On OpenTelemetry: our schema is largely OTEL-compatible. We apply some normalization on top, and if your schema differs, we surface warnings with guidance, but in most cases (~90%) it works without needing to swap out your existing tracing setup.
Shlok Mestry

Honestly, the open source + self-hostable combo is what makes this worth a proper look. Most observability tools want you locked into their cloud and paying per seat by the time you actually need them. Been burned by that before with Datadog at a startup. One instrument() call to trace the whole run is nice DX too, gonna try this on a side project this week.

Sina Tayebati

@shlokmestry Really appreciate that — and yeah, that exact lock-in/pricing pain is something we wanted to avoid from day one.

That’s why we made it open source + self-hostable, so teams can keep full control as they scale instead of getting boxed into per-seat or per-trace pricing later.

And glad you called out the DX — we’ve been trying to make instrumentation as lightweight as possible.

Would love to hear how it works for your side project 🙌

Alfa Baye

This looks neat and seems great, especially for complex AI workflows, but just thinking about it, how would the revenue side work?

Lokesh Pawar

What does the integration ecosystem look like for native orchestration frameworks like LangGraph or CrewAI?