Launching today

OpenInfer
Keep your OpenClaw agents running. Free beta, no code changes
Inference engines were built for conversational AI. Same compute, same cost for every request. Agentic AI is different: always-on, background workloads, massive context sizes. OpenInfer disaggregates model execution across heterogeneous compute nodes, unlocking hardware conventional stacks cannot use. No high-end GPU dependency. A fundamentally different cost structure. OpenInfer Beta is FREE for background workloads. The inference stack built for agentic AI.

Hey Product Hunt 👋 — Behnam here, founder of OpenInfer.
The inference stack was designed for chat. Every AI request gets the same treatment: same compute, same cost, regardless of what the workload actually needs. That works fine when a human is waiting on a response. It's the wrong approach entirely for a background agent running for hours with nobody watching.
Here's what that means in practice: 90% of your agent workloads are latency-tolerant, routine, and always-on — but you're paying premium GPU prices for all of them. Every session. Every time.
We built OpenInfer to fix this. The idea is simple: route each session to the compute it actually needs. High-SLA sessions — human in the loop, real-time — get premium hardware. Background agents, async tasks, always-on workloads get routed to lower-cost GPUs and leaner infrastructure at a fraction of the price. The routing is automatic. Your model doesn't change. Your code doesn't change.
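To make the routing idea concrete, here is a minimal sketch of SLA-based session routing. All names, pool labels, and numbers are hypothetical illustrations, not OpenInfer's actual implementation:

```python
from dataclasses import dataclass

# Hypothetical compute pools; costs and latencies are made up for illustration.
POOLS = {
    "premium-gpu": {"cost_per_1k_tokens": 0.010, "p99_ms": 200},
    "low-cost":    {"cost_per_1k_tokens": 0.002, "p99_ms": 5000},
}

@dataclass
class Session:
    session_id: str
    human_in_loop: bool      # real-time, someone is waiting on the response
    latency_budget_ms: int   # SLA the caller declared for this session

def route(session: Session) -> str:
    """Pick a pool by SLA: premium when a human is waiting,
    otherwise the cheapest pool whose P99 fits the latency budget."""
    if session.human_in_loop:
        return "premium-gpu"
    fitting = [(p["cost_per_1k_tokens"], name)
               for name, p in POOLS.items()
               if p["p99_ms"] <= session.latency_budget_ms]
    return min(fitting)[1] if fitting else "premium-gpu"

# Background agent: latency-tolerant, routed to cheap compute.
print(route(Session("bg-1", human_in_loop=False, latency_budget_ms=60_000)))  # low-cost
# Interactive chat: human in the loop, gets premium hardware.
print(route(Session("chat-1", human_in_loop=True, latency_budget_ms=500)))    # premium-gpu
```

The point is that the SLA, not the model, decides where a session runs, so the model and the calling code stay unchanged.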
Starting today, background task inference is free.
If you're running OpenClaw and getting hit by Anthropic's restrictions — we're a drop-in replacement. Zero code changes. Running on AWS to start.
openinfer.io/beta
We're a small team moving fast, and honest feedback is very valuable to us. Happy to answer anything in the comments. You can also join our community to ask questions, request features, or share feedback; we love to hear from you: https://discord.gg/sBQSSXue
Isn't this essentially just a smarter load balancer with model routing on top? What's the fundamental difference between what OpenInfer does and running two vLLM pools with a proxy in front?
@kam_eshghi Great question, Kam. The routing is less about load balancing and more about enabling SLAs, and about being aware of more complex topologies (across CPUs, between CPUs and GPUs, large-context vs. fast conversational inference). That's where we see the value for agentic inference.
What does the setup actually look like? Do I point my OpenClaw config at an OpenInfer endpoint, or is there an SDK involved? Trying to understand how 'zero code changes' works in practice.
@alexlawson304 You just edit your existing openclaw.config file; there's a walkthrough video on our website: https://openinfer.io/beta
Let us know on Discord or GitHub if you still have issues.
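For readers wondering what that edit might look like: a purely hypothetical sketch of pointing an OpenAI-compatible base URL at an OpenInfer endpoint. The key names and endpoint here are invented for illustration; the video at openinfer.io/beta shows the real fields:

```json
{
  "provider": {
    "base_url": "https://api.openinfer.io/v1",
    "api_key_env": "OPENINFER_API_KEY"
  }
}
```

Since only the endpoint changes, the agent code itself stays untouched, which is what "zero code changes" means in practice.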
We've been running OpenClaw agents in production for 3 months and the Anthropic rate limiting this week genuinely broke two of our workflows. Signed up for the beta — is there a Discord or Slack where we can report issues during the trial?
@snorkel_whisper Here is our Discord: https://discord.gg/bNt4C8pw
We also keep an eye on the Discussions tab of our GitHub repo: https://github.com/open-infer/openinfer-openclaw-beta/discussions
CPU decode for background agents sounds good in theory, but I'd want to see latency numbers. What's the P99 for a 32K context on the CPU ring topology? And what happens if a 'background' session suddenly needs to respond to a human check-in — how fast is the topology switch?
@ranveer_mehra Great question. It's not just CPU we use: we balance across lower-end GPUs and CPUs, balance prefill against decode, and mix topologies.
The free beta is one topology for one type of SLA. If you're interested in others, let us know at hello@openinfer.io or on our Discord: https://discord.gg/sBQSSXue
What's the data plane look like? When OpenInfer routes a session to a CPU ring topology, is the KV cache staying on the same nodes throughout the session lifetime, or does it get redistributed if topology changes mid-session?
@aman_bajetha As of today, the redistribution is not real-time when the topology changes. But yes, KV-cache redistribution is a feature we support.
Just connected our OpenClaw setup to the beta — took about 8 minutes, not 2, but still impressively fast for an infra change of this kind. Agents are running. Will report back on cost numbers after 24 hours.
@harshit_sunal Setup time depends on the query and prompt token sizes. Glad you're trying the beta and will share useful feedback. Thank you.