Launched this week

TruLayer
Tracing, evals, and a control loop for production LLMs
8 followers
TruLayer is an AI reliability platform for teams shipping LLMs to production.
Tracing: OTLP-native, plus SDKs for OpenAI, Anthropic, LangChain, Vercel AI SDK, CrewAI, and 11 others.
Evals: 25 LLM-judge evaluators run inline: hallucination, faithfulness, tool-call correctness, PII, citation density.
Control loop (new in v0.1): eval fires → cluster → prompt diff → A/B → auto-ship → auto-rollback on regression. HITL gate at any step.
Free tier: 1M spans/month, no card.
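The tracing-plus-inline-evals flow can be sketched in plain Python. Everything below (`Span`, `record_span`, the toy `faithfulness` judge) is a hypothetical stand-in for illustration, not TruLayer's actual SDK:

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration only -- not TruLayer's real SDK.

@dataclass
class Span:
    name: str
    input: str
    output: str
    evals: dict = field(default_factory=dict)  # evaluator name -> score

def faithfulness(inp: str, out: str) -> float:
    # Toy judge: a real LLM-judge evaluator would call a model here.
    return 1.0 if "refund" in out.lower() else 0.0

EVALUATORS = {"faithfulness": faithfulness}

def record_span(name: str, inp: str, out: str) -> Span:
    """Score the output inline as the span is recorded, not in a nightly batch."""
    span = Span(name, inp, out)
    for eval_name, fn in EVALUATORS.items():
        span.evals[eval_name] = fn(inp, out)
    return span

span = record_span("agent.respond",
                   "Where is my refund?",
                   "Your refund of $500 is on its way.")
```

The point of the sketch is the shape: every span carries its eval scores when it lands, so rules can fire on the span itself rather than on a later batch job.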
If you ship an AI customer support agent that handles refunds, here is what can go wrong on a single $500 request:
→ They get $500. (working as intended)
→ They get $100. (under-refund — angry customer, support ticket)
→ They get $1,000. (over-refund — your finance team calling)
→ The agent says "let me redirect you to our coupon department." (a department that does not exist)
When any of the last three happens, you want three things: which step in the agent chain misfired, what the model was reasoning about when it produced the wrong amount, and a rule that stops the same class of failure from repeating on the next call.
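A rule for the refund case might be a tool-call correctness check comparing the amount the tool was asked for against the amount it issued. The function name and shape here are illustrative, not TruLayer's API:

```python
def refund_amount_correct(requested: float, issued: float) -> bool:
    """Hypothetical tool-call correctness rule: flag any refund
    where the issued amount diverges from the requested one."""
    return issued == requested

# The failure modes above all trip the rule:
ok        = refund_amount_correct(500, 500)    # working as intended
under     = refund_amount_correct(500, 100)    # under-refund
over      = refund_amount_correct(500, 1000)   # over-refund
```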
Most observability tools give you the broken trace. That is the first thing.
TruLayer gives you all three. Its 25 evaluators score every output inline as each span arrives (tool-call correctness, faithfulness, hallucination, and more), not in a nightly batch. When an eval rule fires, the control loop acts before the next call: retry with a fallback model, patch the prompt, or route to a human review queue so the next user never hits the same failure path.
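The remediation step can be sketched as a simple dispatch from fired eval to action. The eval names, threshold, and fallback-model name below are all assumptions for illustration, not TruLayer's actual rule syntax:

```python
# Hypothetical control-loop sketch: which remediation runs once an eval fires.
FALLBACK_MODEL = "smaller-but-steadier"  # illustrative model name

def remediate(eval_name: str, score: float, threshold: float = 0.7):
    """Pick an action for the next call; scores below threshold fire the rule."""
    if score >= threshold:
        return ("pass", None)
    if eval_name == "tool_call_correctness":
        return ("human_review", None)     # money moved: gate on a human (HITL)
    if eval_name == "hallucination":
        return ("retry", FALLBACK_MODEL)  # retry with a fallback model
    return ("prompt_patch", None)         # otherwise, modify the prompt
```

The design point is that remediation is chosen per eval class, so an over-refund routes to a human while a hallucinated "coupon department" just triggers a retry.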
Observe → eval → remediate, in one closed loop.