Launched this week

RunInfra
Describe the AI model you need and get an optimized AI
178 followers
Tell RunInfra what you need and it builds the production API. No dashboards. No config. Describe any open source model or full app in plain language. We optimize it for real: benchmark GPUs, quantize the model, generate custom CUDA kernels with our Forge agent. It runs faster and cheaper than standard hosting. Build voice (speech → AI → speech), doc search, vision, or model routing, all in one chat. Pay per million tokens. Scale to zero. Run managed or on your own GPUs.

Generating custom CUDA kernels automatically via the Forge agent is a step further than most optimize-your-model tools that stop at quantization. For less common architectures where kernel patterns aren't well-trodden, does it fall back to a safer generic path, or is manual tuning still needed there?
RightNow AI
Osama, the part that lands for me is not having to become an infrastructure expert just to get a model running properly. That barrier has quietly killed plenty of good ideas, so seeing it lowered is refreshing.
Tried spinning up a custom voice pipeline in the chat and it actually worked without me touching a config file. The CUDA kernel generation for a smaller Llama variant was way faster than I expected, and it ran cooler on my GPU too.
Tried spinning up a vision model just by describing it and it actually returned a working endpoint, no dashboard digging required. The custom CUDA kernel generation is a wild flex for a chat interface.
Spent a few minutes describing a doc search use case, and the generated API was already responding faster than my usual setup. The per-token pricing is a nice touch too. Curious how the custom CUDA kernels hold up on weirder workloads.
Tried it with a vision pipeline and the custom CUDA kernels actually beat the hosted version I was using. The plain-language setup is refreshing, no YAML rabbit holes.