Launched this week

TruLayer
Tracing, evals, and a control loop for production LLMs
8 followers
TruLayer is an AI reliability platform for teams shipping LLMs to production.
Tracing: OTLP-native, plus SDKs for OpenAI, Anthropic, LangChain, Vercel AI SDK, CrewAI, and 11 others.
Evals: 25 LLM-judge evaluators run inline: hallucination, faithfulness, tool-call correctness, PII, citation density.
Control loop (new in v0.1): eval fires → cluster → prompt diff → A/B → auto-ship → auto-rollback on regression. HITL gate at any step.
Free tier: 1M spans/month, no card.
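The tracing-plus-inline-evals flow can be sketched in plain Python. Everything below (`Span`, `record_span`, the toy `faithfulness` judge) is a hypothetical stand-in for illustration, not TruLayer's actual SDK:

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration only -- not TruLayer's real SDK.

@dataclass
class Span:
    name: str
    input: str
    output: str
    evals: dict = field(default_factory=dict)  # evaluator name -> score

def faithfulness(inp: str, out: str) -> float:
    # Toy judge: a real LLM-judge evaluator would call a model here.
    return 1.0 if "refund" in out.lower() else 0.0

EVALUATORS = {"faithfulness": faithfulness}

def record_span(name: str, inp: str, out: str) -> Span:
    """Score the output inline as the span is recorded, not in a nightly batch."""
    span = Span(name, inp, out)
    for eval_name, fn in EVALUATORS.items():
        span.evals[eval_name] = fn(inp, out)
    return span

span = record_span("agent.respond",
                   "Where is my refund?",
                   "Your refund of $500 is on its way.")
```

The point of the sketch is the shape: every span carries its eval scores when it lands, so rules can fire on the span itself rather than on a later batch job.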
If you ship an AI customer support agent that handles refunds, here is what can go wrong on a single $500 request:
→ They get $500. (working as intended)
→ They get $100. (under-refund — angry customer, support ticket)
→ They get $1,000. (over-refund — your finance team calling)
→ The agent says "let me redirect you to our coupon department." (a department that does not exist)
When any of the last three happens, you want three things: which step in the agent chain misfired, what the model was reasoning about when it produced the wrong amount, and a rule that stops the same class of failure from repeating on the next call.
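A rule for the refund case might be a tool-call correctness check comparing the amount the tool was asked for against the amount it issued. The function name and shape here are illustrative, not TruLayer's API:

```python
def refund_amount_correct(requested: float, issued: float) -> bool:
    """Hypothetical tool-call correctness rule: flag any refund
    where the issued amount diverges from the requested one."""
    return issued == requested

# The failure modes above all trip the rule:
ok        = refund_amount_correct(500, 500)    # working as intended
under     = refund_amount_correct(500, 100)    # under-refund
over      = refund_amount_correct(500, 1000)   # over-refund
```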
Most observability tools give you the broken trace. That is the first thing.
TruLayer gives you all three. Its 25 evaluators score every output inline as each span arrives (tool-call correctness, faithfulness, hallucination, and more), not in a nightly batch. When an eval rule fires, the control loop acts before the next call: retry with a fallback model, patch the prompt, or route to a human review queue so the next user never hits the same failure path.
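The remediation step can be sketched as a simple dispatch from fired eval to action. The eval names, threshold, and fallback-model name below are all assumptions for illustration, not TruLayer's actual rule syntax:

```python
# Hypothetical control-loop sketch: which remediation runs once an eval fires.
FALLBACK_MODEL = "smaller-but-steadier"  # illustrative model name

def remediate(eval_name: str, score: float, threshold: float = 0.7):
    """Pick an action for the next call; scores below threshold fire the rule."""
    if score >= threshold:
        return ("pass", None)
    if eval_name == "tool_call_correctness":
        return ("human_review", None)     # money moved: gate on a human (HITL)
    if eval_name == "hallucination":
        return ("retry", FALLBACK_MODEL)  # retry with a fallback model
    return ("prompt_patch", None)         # otherwise, modify the prompt
```

The design point is that remediation is chosen per eval class, so an over-refund routes to a human while a hallucinated "coupon department" just triggers a retry.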
Observe → eval → remediate, in one closed loop.