Launched this week

General Compute
AI models that run on an inference cloud optimized for speed
529 followers
AI models that run on an inference cloud optimized for speed
529 followers
GPUs are built for training, not inference. General Compute is an inference cloud running on ASICs — purpose-built alternatives to Nvidia silicon designed specifically for inference. We deliver 5x faster responses and higher per-user throughput for latency-sensitive workloads like coding and voice agents. Our OpenAI-compatible API means you swap your base URL, keep your existing workflows, and run real-time AI on infrastructure built for the job.






Kipps AI
Big congrats on the launch!
How about the time to set up? Is it able to run on CPU?
Neutron
@nishit_chittora It's just an API! So you can run an HTTP request from a CPU :)
Congrats on the launch. Your onboarding workflow is great. I missed clear models and pricing information upfront, and when I got onboarded I saw that you offer three models at somewhat premium pricing.
This leads me to my question: what is your value prop beyond latency? Because if you're competing on price, OpenRouter is still going to get you.
Model
Context
Input / 1M
Output / 1M
DeepSeek V3.2
Reasoning
deepseek-v3.2
32k
$3.00
$4.50
DeepSeek V3.1
Reasoning
deepseek-v3.1
128k
$3.00
$4.50
MiniMax M2.7
minimax-m2.7
160k
$0.40
$2.34
*latency and speed are apparently the value props?
Neutron
@robert_douglass These are what they are calling "premium tokens" - faster and more expensive. You don't need to compete on price if you can be much faster (proven by Cerebras). But now that you mention it we are lowering prices to be in line with open router. Should be updated later today!
Same price just 5x faster!
Congrats on the launch! Efficiently stashing heterogenous ASICs behind a homogeneous API is a challenging and exciting endeavor :) Especially curious about the technologies powering elastic scaling with request volume and bursts. Would love to see a characterization of that in maybe a future blog post as it would certainly be useful to many service designers!
Neutron
@tejas_harith We'll do a blog post about it at some point, and we're always hiring smart people that find these kinds of endeavors interesting!
The sequential call problem for agents is real. Latency adds up quickly when you chain together 50 or more LLM calls. I'm curious how the ASIC stack deals with variable prompt lengths, as that's often where GPU inference becomes unpredictable as well.
OpenClaw can sign itself up? That's wild. Finally someone building for a world where agents run themselves. 👏
Neutron
@timothy_oluwatipin2 Thank you!
PS Agent sign up is brilliant! I'm going to study that.
Neutron
@robert_douglass Appreciate it :)
ZeroTrusted.ai
Studied full stack development but never really got deep into the infrastructure side of things. Always assumed GPUs handled everything AI related. The idea that inference needs its own optimised hardware makes sense when you think about it. Congratulations on the launch.
Neutron
@sidraarifali Thank you!