Launched this week

RunInfra
Describe the AI model you need and get an optimized AI
174 followers
Tell RunInfra what you need and it builds the production API. No dashboards. No config. Describe any open source model or full app in plain language. We optimize it for real: benchmark GPUs, quantize the model, generate custom CUDA kernels with our Forge agent. It runs faster and cheaper than standard hosting. Build voice (speech → AI → speech), doc search, vision, or model routing, all in one chat. Pay per million tokens. Scale to zero. Run managed or on your own GPUs.

RightNow AI
This looks impressive. I'm curious: if I deploy with one open source model today and decide to switch to another later, does RunInfra automatically re-optimize everything, or is that something I need to trigger manually?
RightNow AI
Auto-generating custom CUDA kernels is the part that would make me nervous to trust blindly. A kernel can be fast and still be subtly wrong on edge cases, like a numerically unstable softmax or a padding bug that only shows up on odd sequence lengths. What's the testing story before a generated kernel goes into a production API? Do you diff outputs against the reference implementation across a range of inputs first?
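To make the question concrete, here is a minimal sketch of the kind of differential test being asked about. It is plain Python with no GPU, and it is not RunInfra's actual process. The names `softmax_reference`, `softmax_candidate`, and `diff_against_reference` are hypothetical, and the "candidate" is a deliberately naive softmax standing in for a generated kernel. It illustrates the numerical-instability case only, not a padding bug.

```python
import math
import random


def softmax_reference(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_candidate(xs):
    """Hypothetical stand-in for a generated kernel: naive, no max subtraction."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def diff_against_reference(candidate, reference, lengths, scales, tol=1e-9, seed=0):
    """Compare candidate and reference on random inputs.

    Returns a list of (length, scale, reason) tuples, one per failing case.
    """
    rng = random.Random(seed)
    failures = []
    for n in lengths:
        for scale in scales:
            xs = [rng.uniform(-scale, scale) for _ in range(n)]
            expected = reference(xs)
            try:
                got = candidate(xs)
            except (OverflowError, ZeroDivisionError) as exc:
                failures.append((n, scale, type(exc).__name__))
                continue
            if len(got) != n:
                failures.append((n, scale, "wrong output length"))
                continue
            worst = max(abs(g - e) for g, e in zip(got, expected))
            if not worst <= tol:  # written this way so NaN also counts as a failure
                failures.append((n, scale, f"max abs error {worst:.3g}"))
    return failures


# Sweep odd and off-by-one sequence lengths and several input magnitudes.
failures = diff_against_reference(
    softmax_candidate,
    softmax_reference,
    lengths=[1, 3, 7, 127, 129, 1023],
    scales=[1.0, 50.0, 1000.0],
)
for n, scale, reason in failures:
    print(f"length={n} scale={scale}: {reason}")
```

The naive version agrees with the reference on small inputs and only breaks once the magnitudes get large, which is the "fast but subtly wrong" failure mode the question is about. A real harness would compare GPU outputs against a framework reference with tolerances chosen per dtype.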
One thing I like is that you're optimizing for production instead of making demos easier. Lots of AI tools get you to "Hello, world", but far fewer help with latency, cost, and scaling once people actually start using the product.
RightNow AI
Building production APIs from plain English and auto kernel optimization feels like the direction a lot of us need. Especially for voice/vision stuff where every ms counts.
How's the Forge agent doing on more complex full-app descriptions so far?
How does the Forge agent actually decide when to write a custom CUDA kernel versus just relying on quantization, and does that choice change the price I pay per million tokens?