crunr - Launch and run any compute job on AWS with 1 command
crunr — run it, ghost it.
GPU compute is $1.5/hr.
But your real bill looks like this:
- Idle time sitting there: $800/mo
- Infra team to manage it: $3,000/mo
- Failed setups and debugging: days lost
- 3am emergency fixes: priceless
crunr fixes all of it.
$ crunr run train.py --gpu
Spins up → runs → terminates.
You pay for compute only. Nothing else.
No idle bills. No DevOps. No lingering servers. Built for ML researchers, indie AI builders, and startup teams who just want their job to run.


Replies
crunr
@sandeep_01 The "hiring a full-time delivery driver because you order food three times a week" line nails why per-day GPU rental is insane for bursty workloads. The 65% idle number is the whole pitch in one stat. Question on the ephemeral model, since "instance gone, every time" is the feature and the risk: what happens to a long training run that dies at hour 6 of 9? On spot especially, preemption isn't an edge case, it's Tuesday. Is checkpointing on the user to wire up, or does crunr snapshot to S3 on interruption so a killed run resumes instead of restarting from zero? For a 3-hour run that's a shrug, for a multi-day fine-tune it's the difference between the tool being usable and not. Upvoted.
crunr
@artem_fedorovich
glad that line landed: it's exactly how it felt watching the bill.
honest answer:
instance dies: it's gone. always. no idle after a crash.
with the --s3 flag on, everything in outputs/ is synced to S3 before the instance terminates. crash at hour 6: your last saved checkpoint is safe.
checkpointing logic is on you for now. save to outputs/ periodically. crunr handles the rest.
for spot specifically: on-demand is the right call for multi-day runs. spot makes sense when your job is resumable or short enough that a restart is fine.
one week away: automatic mid-job checkpointing ships. crash anywhere, resume from exactly there. no wiring needed.
and thank you for the upvote. 🙏
It's a cool problem to solve. We have a similar problem on our backend: we're just paying for idle server time on AWS. One question: where are you keeping the instance output, and is it also available for CPU?
crunr
@shuvam_mandal2
thank you! and yes — exactly that problem.
two things on your questions:
outputs — by default they rsync straight back to your laptop when the job finishes. configure S3 once with crunr s3 setup and outputs go to your own S3 bucket automatically. can even skip local download entirely with --s3-no-local and pull from S3 whenever you need.
CPU — fully supported. crunr run script.py without --gpu picks a CPU instance. need specific RAM? --memory 32 gets you 32GB+. add --spot if you want spot pricing. same flow — spins up, runs, terminates.
idle billing on backend servers is a real one. would love to hear more about your setup. 🙏
@sandeep_01 cool
The ephemeral spin-up-run-terminate model is the right abstraction for batch ML jobs. We've burned significant budget on GPU instances idling after failed training runs, especially when a job crashes at epoch 40 and the instance just sits there. How do you handle mid-job failures and artifact persistence? Does the runner automatically sync outputs to S3 before terminating?
crunr
@retain_dev yes, the instance terminates the moment the job crashes. always.
artifacts: run with --s3 and everything in outputs/ syncs to your S3 bucket before the instance is gone. crash at epoch 40 — your last checkpoint is already in S3.
one honest answer: mid-run snapshotting is on you to wire in your training script for now. save checkpoints to outputs/ periodically. crunr handles the rest.
that said, automatic mid-job checkpointing is shipping next week. crash anywhere, resume from exactly where you left off. no wiring needed.
the idle-instance-after-a-crash problem: already solved. the resume problem: one week away. 🙏
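
For anyone wiring this up by hand in the meantime, here is a minimal sketch of what "save checkpoints to outputs/ periodically" could look like in a plain Python training loop. Only the outputs/ directory comes from the replies above. The file name checkpoint.json, the JSON state format, the save_every interval, and the placeholder loss are all illustrative assumptions, not part of crunr.

```python
import json
import os
import tempfile

OUTPUT_DIR = "outputs"          # directory the replies say is synced with --s3
CKPT_NAME = "checkpoint.json"   # hypothetical file name, chosen for this sketch


def save_checkpoint(state, output_dir=OUTPUT_DIR):
    """Write the state to a temp file, then rename it into place.

    The rename means a crash mid-write never leaves a half-written
    checkpoint under the final name.
    """
    os.makedirs(output_dir, exist_ok=True)
    final_path = os.path.join(output_dir, CKPT_NAME)
    fd, tmp_path = tempfile.mkstemp(dir=output_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, final_path)
    return final_path


def load_checkpoint(output_dir=OUTPUT_DIR):
    """Return the last saved state, or None if no checkpoint exists."""
    path = os.path.join(output_dir, CKPT_NAME)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def train(total_epochs, save_every=1, output_dir=OUTPUT_DIR):
    """Resume from the last checkpoint if present, otherwise start at epoch 0."""
    state = load_checkpoint(output_dir) or {"epoch": 0, "loss": None}
    for epoch in range(state["epoch"], total_epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        state = {"epoch": epoch + 1, "loss": loss}
        if (epoch + 1) % save_every == 0:
            save_checkpoint(state, output_dir)
    return state


if __name__ == "__main__":
    print(train(total_epochs=5))
```

Resuming this way assumes the previous checkpoint file is back in outputs/ before the rerun starts; how it gets there is outside this sketch. A real training script would store model and optimizer state rather than a JSON dict, but the write-then-rename pattern stays the same.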