Launching today

Orqen
Optimize, route, and safeguard your LLM agent context.
5 followers
Optimize, route, and safeguard your LLM agent context.
5 followers
Stop paying the "input token tax." Orqen is a native AI proxy that sits between your app and LLM providers. It dynamically compresses chat histories, filters tool schemas, trims large tool outputs, and acts as a circuit breaker for agent loops.



Hi Product Hunt! 👋
I'm the maker behind Orqen.
If you've ever shipped an autonomous AI agent or a multi-step workflow into production, you've probably lived through the same nightmare I did: the features work, but the provider invoices don't.
The harsh reality of agentic workflows is that the real budget killer is input tokens, not output. Every loop iteration resends the full chat history, every tool schema, and bloated JSON from previous tool steps. Your context window balloons, latency spikes, and margins vanish.
I got tired of bolting custom trimming and routing logic onto every project. I wanted token efficiency without rewriting my agent architecture.
That's why I built Orqen a developer-native, drop-in proxy between your SDK and providers like OpenAI, Anthropic, or Bedrock. Swap your base URL and API key; your agent code stays the same.
🛠️ What Orqen does out of the box
Payload optimisation — Compresses schemas and manages conversation history intelligently: recent turns stay verbatim; older turns are compressed or summarised only when the window gets heavy (no blind summarisation on every call).
Smart tool routing — Instead of sending 20+ tool definitions every turn, Orqen forwards a smaller set matched to the current intent, with fail-open behaviour when routing is uncertain.
Tool output trimming — Strips useless metadata and noise from large JSON tool results before they go back into the next LLM call.
Session spend caps (optional) — Per-session USD limits so a runaway agent loop hits a clean hard stop instead of an open-ended bill.
⚡ Built for production
Orqen is fail-open on optimisation: if the pipeline hits its time budget, the original payload is forwarded unchanged so production doesn't break. Typical overhead is on the order of ~300ms on top of the provider call.
🎁 Free tier & quick start
Quick start is under two minutes for major Python/JS SDKs. There's a permanent free tier. If you use your monthly optimisation allowance, Orqen falls back to passthrough requests still reach your provider; they just aren't optimised until the monthly reset.
How are you managing LLM token costs today especially long agent loops or MCP-style tool sprawl? Writing logic in-app, or mostly swapping models and hoping for the best?
Drop your stack below (frameworks, providers, MCP or not). I'll be in the comments all day for technical questions, feature ideas, and billing horror stories.
Thank you for the support! 🚀