Forge CLI - Swarm agents optimize CUDA/Triton for any HF/PyTorch model

RightNow AI

•5d ago

Forge generates optimized GPU kernels from any PyTorch or HuggingFace model. It runs 32 parallel Coder+Judge agents that compete to find the fastest CUDA/Triton implementation. Up to 5× faster than torch.compile(mode='max-autotune') with 97.6% correctness. Enter a HuggingFace model ID and get optimized kernels for every layer. Powered by an optimized NVIDIA Nemotron 3 Nano 30B running at 250k tokens/sec. "Full refund if we don't beat torch.compile."
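To make the "agents compete, a judge picks the fastest correct result" idea concrete, here is a minimal sketch in plain Python. It is not Forge's implementation or API: the candidate functions, `judge`, and `pick_fastest` are hypothetical names, and ordinary Python functions stand in for generated CUDA/Triton kernels. Each candidate is first checked against a reference output, and only correct candidates are timed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def reference(xs):
    # Baseline implementation that defines the correct output.
    return [x * x for x in xs]

# Hypothetical candidates; a real system would have agents generate kernels.
def cand_loop(xs):
    out = []
    for x in xs:
        out.append(x * x)
    return out

def cand_map(xs):
    return list(map(lambda x: x * x, xs))

def cand_wrong(xs):
    return [x + x for x in xs]  # deliberately incorrect, should be rejected

def judge(candidate, xs, repeats=5):
    """Return (is_correct, best_seconds) for one candidate."""
    if candidate(xs) != reference(xs):
        return False, float("inf")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        candidate(xs)
        best = min(best, time.perf_counter() - start)
    return True, best

def pick_fastest(candidates, xs, workers=4):
    """Judge all candidates concurrently; return the fastest correct one or None."""
    # Threads keep the sketch short; timings taken this way are noisy.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: judge(c, xs), candidates))
    scored = [(t, c) for c, (ok, t) in zip(candidates, results) if ok]
    if not scored:
        return None
    return min(scored, key=lambda pair: pair[0])[1]

winner = pick_fastest([cand_loop, cand_map, cand_wrong], list(range(1000)))
print(winner.__name__)
```

Which candidate wins depends on the machine, so no expected output is given. The point of the design is that correctness gates the timing step: an incorrect candidate can never win, however fast it is.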

Replies

RightNow AI

Maker

📌

2025 was the year of AI agents. 2026 will be the year of swarm agents. Forge is how we're starting it: 32 agents competing in parallel to optimize your GPU kernels. Enter a HuggingFace model ID, get optimized CUDA/Triton for every layer. "Full refund if we don't beat torch.compile." Would love your feedback :D

5d ago

RightNow AI

Maker

What a beautiful swarm of agents 🤗

5d ago
