Launching today

AIVory Smart Inference
Cheaper inference. One URL. No code changes.
4 followers
Cheaper inference. One URL. No code changes.
4 followers
Providers cut prices all the time — your bill pays like they don't. Smart Inference reroutes every LLM call to the cheapest provider serving the same model, in real time. Drop-in OpenAI-compatible: swap one URL, keep your SDK, prompts, and streaming. 50+ models, pay-as-you-go from $10, credits never expire, no subscription. Median ~30% savings, up to 89% on open-weight models. Or self-host — spin up spot GPUs in one click and route to them through the same endpoint.






AIVory Guard
Hey Product Hunt 👋
I'm Jenni, one of the makers of AIVory Smart Inference.
Here's the thing that bugged us into building this: providers cut their prices all the time. You're paying like they don't. They all have massive overcapacity they unload for cheaper prices if you know when and where to look. But often not via the directest route - so you do not get to take advantage of this - and your bill reflects that.
Smart Inference fixes that with one change:
That's it. Keep your OpenAI SDK, your prompts, your retry logic, your streaming handlers, your tool-calling schemas. Every request gets routed to the cheapest healthy provider serving the model you asked for, in real time. Median savings land around 30%; on open-weight models with high price spread we've seen up to 89%.
What's in the launch:
* 50+ models through one endpoint - GPT, Claude, Gemini, Llama, Mistral, DeepSeek, Qwen, and more
* Live spot pricing - refreshed continuously, sorted cheapest first
* Savings dashboard - shows what you paid vs. what OpenAI/Azure/AWS/Anthropic would have charged
* Pay-as-you-go - $10 to start, credits never expire, no subscription, no seats
And because sometimes routing isn't enough - when you want to run an open-weight model on your own hardware — Smart Inference also aggregates spot GPU capacity across several providers in the same dashboard. Deploy an H100 in one click, and route to it through the same OpenAI-compatible endpoint. One backend mode flips to another without your code noticing.
Smart Inference is one piece of the AIVory platform - Guard (compliance scanning for your code) and Architect (visual designer for your infrastructure) share the same backend. But Smart Inference stands on its own, and that's what we're launching today.
Early access is live. We're around in the comments all day - would love your feedback.
-- Jenni & the AIVory team