Launching today

OpenInfer
Keep your OpenClaw agents running. Free beta, no code changes
Inference engines were built for conversational AI. Same compute, same cost for every request. Agentic AI is different: always-on, background workloads, massive context sizes. OpenInfer disaggregates model execution across heterogeneous compute nodes, unlocking hardware conventional stacks cannot use. No high-end GPU dependency. A fundamentally different cost structure. OpenInfer Beta is FREE for background workloads. The inference stack built for agentic AI.

Hey Product Hunt 👋 — Behnam here, founder of OpenInfer.
The inference stack was designed for chat. Every AI request gets the same treatment: same compute, same cost, regardless of what the workload actually needs. That works fine when a human is waiting on a response. It's the wrong approach entirely for a background agent running for hours with nobody watching.
Here's what that means in practice: 90% of your agent workloads are latency-tolerant, routine, and always-on — but you're paying premium GPU prices for all of them. Every session. Every time.
We built OpenInfer to fix this. The idea is simple: route each session to the compute it actually needs. High-SLA sessions — human in the loop, real-time — get premium hardware. Background agents, async tasks, always-on workloads get routed to lower-cost GPUs and leaner infrastructure at a fraction of the price. The routing is automatic. Your model doesn't change. Your code doesn't change.
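To make the routing idea concrete, here is a minimal sketch of SLA-based session routing. All names, pool labels, and numbers are hypothetical illustrations, not OpenInfer's actual implementation:

```python
from dataclasses import dataclass

# Hypothetical compute pools; costs and latencies are made up for illustration.
POOLS = {
    "premium-gpu": {"cost_per_1k_tokens": 0.010, "p99_ms": 200},
    "low-cost":    {"cost_per_1k_tokens": 0.002, "p99_ms": 5000},
}

@dataclass
class Session:
    session_id: str
    human_in_loop: bool      # real-time, someone is waiting on the response
    latency_budget_ms: int   # SLA the caller declared for this session

def route(session: Session) -> str:
    """Pick a pool by SLA: premium when a human is waiting,
    otherwise the cheapest pool whose P99 fits the latency budget."""
    if session.human_in_loop:
        return "premium-gpu"
    fitting = [(p["cost_per_1k_tokens"], name)
               for name, p in POOLS.items()
               if p["p99_ms"] <= session.latency_budget_ms]
    return min(fitting)[1] if fitting else "premium-gpu"

# Background agent: latency-tolerant, routed to cheap compute.
print(route(Session("bg-1", human_in_loop=False, latency_budget_ms=60_000)))  # low-cost
# Interactive chat: human in the loop, gets premium hardware.
print(route(Session("chat-1", human_in_loop=True, latency_budget_ms=500)))    # premium-gpu
```

The point is that the SLA, not the model, decides where a session runs, so the model and the calling code stay unchanged.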
Starting today, background task inference is free.
If you're running OpenClaw and getting hit by Anthropic's restrictions — we're a drop-in replacement. Zero code changes. Running on AWS to start.
openinfer.io/beta
We're a small team moving fast, and honest feedback is very valuable to us. Happy to answer anything in the comments. You can also join our community to ask questions, request features, or share feedback; we love to hear from you: https://discord.gg/sBQSSXue
Isn't this essentially just a smarter load balancer with model routing on top? What's the fundamental difference between what OpenInfer does and running two vLLM pools with a proxy in front?
@kam_eshghi Great question, Kam. The routing is less about load balancing and more about enabling SLAs, and about being aware of more complex topologies (across CPUs, between CPUs and GPUs, large-context vs. fast conversational inference). That's where we see the value for agentic inference.
What does the setup actually look like? Do I point my OpenClaw config at an OpenInfer endpoint, or is there an SDK involved? Trying to understand how 'zero code changes' works in practice.
@alexlawson304 You just edit your existing openclaw.config file; there's a walkthrough video on our website: https://openinfer.io/beta
Let us know on Discord or GitHub if you still have issues.
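For readers wondering what that edit might look like: a purely hypothetical sketch of pointing an OpenAI-compatible base URL at an OpenInfer endpoint. The key names and endpoint here are invented for illustration; the video at openinfer.io/beta shows the real fields:

```json
{
  "provider": {
    "base_url": "https://api.openinfer.io/v1",
    "api_key_env": "OPENINFER_API_KEY"
  }
}
```

Since only the endpoint changes, the agent code itself stays untouched, which is what "zero code changes" means in practice.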
We've been running OpenClaw agents in production for 3 months and the Anthropic rate limiting this week genuinely broke two of our workflows. Signed up for the beta — is there a Discord or Slack where we can report issues during the trial?
@snorkel_whisper Here is our Discord: https://discord.gg/bNt4C8pw
We also keep an eye on the Discussions tab of our GitHub repo: https://github.com/open-infer/openinfer-openclaw-beta/discussions
CPU decode for background agents sounds good in theory, but I'd want to see latency numbers. What's the P99 for a 32K context on the CPU ring topology? And what happens if a 'background' session suddenly needs to respond to a human check-in — how fast is the topology switch?
@ranveer_mehra Great question. It's not just CPU we use: we balance across lower-end GPUs and CPUs, balance prefill against decode, and mix topologies.
The free beta is one topology for one type of SLA. If you're interested in others, let us know at hello@openinfer.io or on our Discord: https://discord.gg/sBQSSXue
What's the data plane look like? When OpenInfer routes a session to a CPU ring topology, is the KV cache staying on the same nodes throughout the session lifetime, or does it get redistributed if topology changes mid-session?
@aman_bajetha As of today, the redistribution is not real-time when the topology changes. But yes, KV-cache redistribution is a feature we support.
Just connected our OpenClaw setup to the beta — took about 8 minutes, not 2, but still impressively fast for an infra change of this kind. Agents are running. Will report back on cost numbers after 24 hours.
@harshit_sunal Setup time depends on the query and prompt token sizes. Glad you're trying the beta and will share useful feedback. Thank you.