oneinfer.ai Forums on Product Hunt

p/oneinfer-ai-2 Unified Inference Stack with multi cloud GPU orchestration

•0 reviews•73 followers

Start new thread

trending

•

19d ago

It is coming.

The most requested feature since we open sourced oneinfer-edge.
We are adding a "Quantization Playbook" as a standalone feature inside the oneinfer-edge open source repo.
Right now, quantizing a model for local deployment means researching the right format for your hardware, running separate tools, debugging format incompatibilities with your serving library, and then starting the deployment process from scratch. A multi-hour process that sits completely outside your inference control plane.
With the Quantization Playbook inside oneinfer-edge - your hardware is already scanned. Your serving library is already detected. Your model is already known. Quantization becomes the natural step before deployment - not a separate workflow you manage outside the app.
Pick your model. oneinfer-edge handles the quantization. Deploy directly from the same control plane.
And before you deploy - you see exactly what you are trading. Perplexity. Token accuracy. Quality delta across quantization levels. So the decision is not a guess. It is a tradeoff you make with full visibility.
This is part of our commitment to making oneinfer-edge the only open source tool a developer needs to go from raw model to running inference - local, cloud, or both.
More details dropping soon.
Repo in the comments below.
Star the repo. You will not want to miss this one.

•

18d ago

If AI tokens were free tomorrow, what would you build?

No token limits. No API bills. No cost optimization.

What's the AI product, agent, or workflow you've always wanted to build but couldn't justify because of inference costs?

Curious to see what everyone would create in a world where AI usage was essentially unlimited.

•

20d ago

Feature Updates for oneinfer-edge

Hardware checks. Compatibility scans. Model deployment. Copilot routing. Local hosting. Multi-cloud instances. Cloud failover. Used to take a day. Now under 10 minutes.
AI moves fast. Deployment doesn't. 40% of teams take more than a week to get a single model into production. Data scientists spend over a quarter of their working day on setup, not science.
That's not an AI problem. That's an infrastructure problem.
oneinfer-edge fixes it. Not by reinventing the stack. By orchestrating what already exists into one open source control plane.
- Multiple serving libraries. One scan.
Ollama, llama.cpp, vLLM, SGLang, TensorRT-LLM, PyTorch, Dynamo. Instead of manually testing each one against your model and hardware, oneinfer-edge evaluates all five simultaneously and tells you exactly which one to use, for local, cloud, or both. Hours of trial and error eliminated before a single deployment.
- Traffic control panel for agentic harnesses. Zero code changes.
You can now leverage locally deployed models through the existing agentic copilots like codex, kilocode, opencode and openclaw and more upcoming.
-Model, serving library and hardware compatibility. Before you deploy.
Wrong serving library for your hardware. Wrong runtime for your model. These failures usually show up mid-deployment. oneinfer-edge runs a full compatibility scan across your model, your serving libraries, and your local hardware upfront. Complete picture. No surprises.
- Model and hardware resource checks. Local and cloud.
Paste any HuggingFace model ID. oneinfer-edge computes model weights, KV cache, and serving library overhead together and tells you whether it fits your machine or which cloud instance makes sense when local is not enough. No wasted downloads. No failed runs.
- Cloud instances marketplace. One API for everything.
Spin up instances across any cloud provider from the same control plane using a single OpenAI-compatible API. No switching between platforms. No managing separate configurations per provider. One place to create, manage, and monitor, regardless of which cloud you choose.
- Hybrid routing. Local, cloud, or both. Optimised automatically.
Local handles volume. Cloud handles complexity. When local capacity is exceeded, traffic fails over automatically. Routine tasks stay local. Complex reasoning goes to the cloud only when needed. Inference already accounts for 80 to 90% of the lifetime cost of a production AI system. Intelligent routing alone cuts that by 30 to 60%. Local-first hybrid orchestration pushes further.
We are just getting started. More coming in the next few days. Stay tuned!!!
Repo: https://github.com/oneinfer/onei...
Star it. Fork it. Consider contributing to the community.

•

29d ago

New Feature: Connect your locally hosted AI models to coding copilots with a click in oneinfer-edge

We shipped a new feature in oneinfer-edge (fully open source) to connect your locally deployed model to coding copilots like codex, OpenClaw, OpenCode and kilo code etc....

No plugin. No config file. No IDE restart. You click ONEINFER, a local proxy intercepts your copilot's requests, translates the format, routes to your self hosted model, and returns the response.
Your IDE doesn't know anything changed.
The proxy handles the ugly parts, model name rewriting, response format translation, streaming, so you don't have to spend an afternoon debugging why Codex expects an OpenAI messages format and your local model returns something else.
Switch back to original models in one click, mid-session, no restart. For when you actually need it.
This is just the start. Support for more agentic harnesses and copilots is already in the works, we're expanding the list based on what the community actually uses. So please voice out what you need in the github issues.
oneinfer-edge is the proxy, the hardware compatibility scanner, the inference routing, it's all in the repo. We'd rather you read the code than take our word for it.

•

7mo ago

oneinfer.ai - Unified Inference Stack with multi cloud GPU orchestration

OneInfer is a unified inference layer for multi-cloud GPU infrastructure. One API to access 100+ AI models across multiple providers. We automatically route requests based on cost, latency, and availability. Scale to zero when idle, autoscale to thousands when busy. Switch providers anytime without changing your code. One API key. 100+ models. Zero vendor lock-in.