Launching today

Tracea
Datadog for AI agents with traces, RCA, and team memory
Agents fail silently. You fire one off, it runs, nothing comes back - no trace, no cost data, no idea which call broke. Tracea captures every tool call, LLM response, and cost spike. Automatic RCA (root cause analysis) tells you exactly why it failed. YAML detection rules catch loops, spikes, and silent errors before they hit production. Self-hosted. One Docker command. No data leaves your network. Company Brain turns every session into team memory - agents start smarter each run.
Tracea
Hey Product Hunt! 👋 Launching today as part of the @gustaf x @Y Combinator builder challenge.
I'm Darshan, the maker of @Tracea.
I built this because I kept running AI agent sessions that would fail silently - no trace of what broke, no cost visibility, no way to debug after the fact. I open-sourced an early version called @Observagent and 300 developers found it with zero marketing. That told me the problem was real.
@Tracea is the production-grade version:
🔍 Full event timeline: every tool call, LLM response, cost and latency spike in order
🧠 Automatic RCA: understand exactly why an agent failed
🚨 Detection + alerting: catch cost spikes, loops, and silent errors before they hurt (rough sketch after this list)
💡 Company Brain: sessions synthesized into team knowledge so agents start smarter each run
🔒 Self-hosted: one Docker command, nothing leaves your network
Works with every framework out of the box. No SDK lock-in, no integration work.
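To give a feel for what the detection layer is looking at, here's a rough Python sketch of the kind of checks a rule encodes. The real rules are plain YAML config; the event fields, function names, and thresholds below are made up for illustration and aren't our actual schema.

# Illustrative only: the kind of checks a detection rule encodes.
# Event fields and thresholds here are made up, not Tracea's actual schema.
from collections import Counter

def cost_spike(events, max_session_cost_usd=2.0):
    """Flag a session whose total LLM spend blows past a budget."""
    total = sum(e.get("cost_usd", 0.0) for e in events)
    return total > max_session_cost_usd

def tool_loop(events, max_repeats=5):
    """Flag an agent stuck re-calling the same tool with the same input."""
    calls = Counter(
        (e["tool"], e.get("input_hash"))
        for e in events
        if e.get("type") == "tool_call"
    )
    return any(n > max_repeats for n in calls.values())

In Tracea itself checks like these are expressed as YAML rules rather than code, so they can be tuned without redeploying anything.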
Would love to hear how you're currently handling visibility into your agent runs. Happy to answer anything!
Tracea
@curiouskitty On signal vs noise: the synthesizer doesn't try to remember everything. It looks for patterns that repeat across sessions: workflows that succeed consistently, error/fix pairs that appear more than once, file clusters that get touched together for the same reason. A one-off trace that doesn't recur never makes it into memory. The confidence threshold requires multiple observations before anything becomes durable.
On staying accurate: entries use an upsert model with a confidence score that decays when they stop appearing in sessions. If your codebase changes and an old workflow no longer applies, new sessions will stop confirming it and it fades. If something actively conflicts with existing memory, it gets flagged for re-evaluation rather than silently overwriting. The goal is that stale knowledge becomes low-confidence and stops surfacing, rather than sitting there misleading agents.
It's not perfect: if a team goes three months without touching a particular workflow, the decay is time-based, not change-based, so you can still get outdated entries. That's the next problem to solve. But it's meaningfully better than agents starting cold every run with no context at all.
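For anyone curious about the mechanics, here's a rough sketch of the upsert/decay behavior described above: repeated observations promote an entry, unconfirmed entries decay each cycle, conflicts get flagged instead of overwritten, and only high-confidence entries surface to agents. All the names, rates, and thresholds are illustrative, not our literal implementation.

# Simplified sketch of the upsert + confidence-decay model described above.
# Names, rates, and thresholds are illustrative assumptions, not Tracea's code.
from dataclasses import dataclass, field

PROMOTE_AT = 3         # observations needed before an entry becomes durable
DECAY_PER_CYCLE = 0.8  # confidence multiplier when an entry isn't re-confirmed
SURFACE_MIN = 0.3      # below this, the entry stops being injected into runs

@dataclass
class MemoryEntry:
    pattern: str               # e.g. "integration tests need the seed script first"
    observations: int = 0
    confidence: float = 0.0
    conflicts: list = field(default_factory=list)

def upsert(memory, pattern, conflicts_with=None):
    entry = memory.setdefault(pattern, MemoryEntry(pattern))
    entry.observations += 1
    if conflicts_with and conflicts_with in memory:
        # conflicting evidence is flagged for re-evaluation, not silently overwritten
        memory[conflicts_with].conflicts.append(pattern)
    if entry.observations >= PROMOTE_AT:
        entry.confidence = 1.0  # durable once it repeats across sessions

def decay_unconfirmed(memory, confirmed_this_cycle):
    for pattern, entry in memory.items():
        if pattern not in confirmed_this_cycle:
            entry.confidence *= DECAY_PER_CYCLE

def surfaced(memory):
    # only high-confidence entries get handed to agents at the start of a run
    return [p for p, e in memory.items() if e.confidence >= SURFACE_MIN]

Change-aware invalidation would slot in as another signal feeding decay; that's the part that doesn't exist yet.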