Launched this week

Kaval
Verify what your AI agent believes before it takes action
76 followers
Verify what your AI agent believes before it takes action
76 followers
Your agent's worst mistakes won't look like mistakes. They'll be confident actions on cached facts, stored fields, and RAG chunks that quietly went stale. Kaval re-derives the truth the instant before your agent acts and returns a verdict to branch on: act, or don't. One MCP call, a typed pass/block with the proof.









How does Kaval handle the latency cost of re-deriving truth on every single agent action, especially for chains with hundreds of steps?
@kucuksahal10675
Good question, this is the core design constraint. We don't re-derive truth on every action:
1. Kaval gates beliefs, not steps. A 300-step chain usually rests on maybe a dozen beliefs (invoice is unpaid, this user is an admin, price is current). Reads and reversible ops pass through untouched. Only consequential actions hit the gate, and those check beliefs that are shared across most of the chain.
2. Freshness SLAs short-circuit. Every check carries an SLA (14d for an org chart, 60s for a price). If the belief was confirmed inside the window, verify returns from cache.
3. Monitors do the expensive work off the hot path. Register your belief store once and we sweep it in the background on the SLA, then webhook you only the newly-risky beliefs. At action time the gate is a cache hit against the last sweep. Full multi-source re-derivation only fires when something got flagged.
There are also per-call speed tiers (instant is cache/prior only with no network, fast is a cheap model, deep is full multi-source with citations), and if you pass the content hash from read time we catch changed-since-read with a hash compare before anything expensive runs.
Net: the hot path is a handful of cache-hit gates per chain. Re-derivation happens async where latency doesn't matter.
the re-derive step is the right instinct but what's the added latency per action in practice, and does it scale down for agents that fire dozens of tool calls a minute? feels like there's a tradeoff between catching stale facts and slowing the whole loop down, curious how you tuned that
spent a weekend stress-testing kaval against our agent's worst habit, hallucinating from stale data, and the typed pass/block with the proof feels like exactly what we needed to catch silent failures before they hit users.