
RightNow AI
#1 GPU AI Code Editor
4.7•10 reviews•1.7K followers
RightNow AI is the #1 GPU AI code editor. It combines GPU profiling, benchmarking, AI optimization, GPU virtualization, and a full GPU emulator in one environment to help developers analyze and optimize CUDA code faster.
This is the 10th launch from RightNow AI. View more

Forge Agent
Launched this week
Forge turns PyTorch models into optimized CUDA and Triton kernels automatically. 32 AI agents run in parallel, each trying different optimization strategies like tensor cores, memory coalescing, and kernel fusion. A judge validates every kernel for correctness before benchmarking. We got 5x faster inference than torch.compile on Llama 3.1 8B and 4x on Qwen 2.5 7B. Works on any PyTorch model. Free trial on one kernel. Full credit refund if we don't beat torch.compile.
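The "many parallel candidates, judge validates before benchmarking" loop described above can be sketched in plain Python. Everything here is hypothetical illustration, not Forge's actual code: `candidate_*` functions stand in for agent-generated kernel variants, `judge` stands in for the correctness check against a reference, and only validated candidates get timed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Reference implementation: the ground truth the judge compares against.
def reference(xs):
    return [x * x + 1.0 for x in xs]

# Hypothetical "candidate kernels" -- stand-ins for agent-generated variants.
def candidate_correct(xs):
    return [x * x + 1.0 for x in xs]

def candidate_fast_but_wrong(xs):
    return [x * x for x in xs]  # drops the +1.0, so it must fail validation

CANDIDATES = [candidate_correct, candidate_fast_but_wrong]

def judge(candidate, inputs, tol=1e-9):
    """Validate a candidate's output against the reference before benchmarking."""
    want = reference(inputs)
    got = candidate(inputs)
    return len(want) == len(got) and all(
        abs(a - b) <= tol for a, b in zip(want, got)
    )

def benchmark(candidate, inputs, reps=100):
    """Time repeated runs of a candidate; lower is better."""
    start = time.perf_counter()
    for _ in range(reps):
        candidate(inputs)
    return time.perf_counter() - start

def select_best(inputs):
    # Judge all candidates in parallel, mirroring the parallel agent setup.
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(lambda c: judge(c, inputs), CANDIDATES))
    valid = [c for c, ok in zip(CANDIDATES, verdicts) if ok]
    # Only validated candidates are benchmarked; the fastest valid one wins.
    return min(valid, key=lambda c: benchmark(c, inputs))

best = select_best([0.5 * i for i in range(1000)])
```

The key design point the sketch captures is ordering: correctness gating happens before any timing, so a fast-but-wrong variant can never win the benchmark.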
Cloudthread
Congrats! Can you dictate rules that the judge uses?
32 parallel coder+judge pairs is a smart setup. The judge comparison logic is the interesting part... wondering if it just checks against torch.compile baseline or if you can define custom metrics like memory footprint or specific tensor core utilization targets.
Turning "PyTorch in, tuned CUDA/Triton out" into a product like this is a very ambitious swing, especially with 32 agents coordinating on the same kernel. In my experience, the hardest part of these systems is not finding a faster variant once, but keeping the optimized kernels robust across driver changes, new GPUs, and slightly different input shapes without a constant babysitting loop.
How are you handling that stability vs. raw speed tradeoff in the UX: do you bias toward conservative, portable kernels by default, or lean into aggressive, hardware-specific wins and let power users manage the risk?