We finished #5 Product of the Day. Here's what three days of 60+ comments actually taught us.
We launched Cloud World Model on Product Hunt last week. The pitch: simulate AWS, GCP, Azure, OCI, and DigitalOcean infrastructure without provisioning real resources. You describe an architecture (compute, databases, load balancers, serverless functions) and the simulator models latency curves, CPU saturation, autoscaling behavior, failure propagation, and cost. No cloud bill.
We expected interest from learners. The comments told a different story.
Cloud World Model
Hi everyone! I'm Kevin Brown, one of the makers of Cloud World Model.
Cloud World Model lets you model AWS, GCP, Azure, OCI, and DigitalOcean architectures and instantly see how they behave (CPU, error rates, throughput, autoscaling, failure recovery, and cost) without provisioning a single real resource.
A few things we're proud of:
A capacity-aware engine that models real per-provider performance profiles
Chaos engineering: inject zone outages, DB crashes, and network partitions, then get a resilience score
A multi-cloud explorer that compares provider combos on cost, latency, and vendor lock-in
A full RL training API so AI agents can learn cloud optimization in a safe, cost-free environment
Beginner mode with plain-English AI explanations and an interactive tutorial
Whether you're learning cloud skills or training agents to optimize infrastructure, I'd like to hear about any of the following in the comments:
How do you typically test cloud architecture changes before putting them in production or any other environment?
Do you think a mechanism to simulate a cloud architecture change would be useful?
Any experiences with cloud cost comparisons?
Cloud World Model
BTW, you can try us headless using any of the popular tools like @Claude Code, @Codex, or @Grok Build. The API is fully documented.
Here is even a starter prompt you can try:
@mathsociety Once you've simulated an architecture and you're happy with it, can you export that to Terraform or Pulumi? Right now it sounds like the sim and the actual deploy are two separate worlds. That's where I'd get stuck.
Cloud World Model
@whetlan Thank you. You are the second person to ask about Terraform/Pulumi export. The way I initially viewed it, the simulator is just a pure simulator. The Agent knows more about the customer's actual environment than we do. The Agent gives us the architecture, we simulate it, and we give the Agent a response; the Agent can keep simulating with us until it thinks it has the right answer. The Agent would then create the Terraform/Pulumi code for the architecture. However, if enough people think we should provide that feature, then it's something worth looking into.
@mathsociety Answering Q1+Q2 from the small-team side: I deploy bots to Fly and honestly my "test" is deploy-and-pray, no real staging. It bit me last week, a .env got baked into the Docker image and silently overrode my prod secrets, no error thrown, just wrong behavior. A simulator that flagged "this config will shadow your prod env" before deploy would've saved me hours. And the crash-injection prompt is the right instinct, the failures that hurt are the silent ones, not the loud ones.
Cloud World Model
@david_marko Ouch. Yes, it happens. At present, we simulate pure architecture without the application code. But your instinct about silent failures is spot on and directly maps to what we do: once your service is running, Cloud World Model lets you inject those quiet failure modes. Would love to know if that's useful for your Fly.io setup. We've tried to support many cloud providers, not just the big ones.
@mathsociety Great job! Chaos engineering with resilience scoring before touching real infra is a great fit for teams that want to test failure modes without blowing a staging budget. How closely is the per-provider performance model calibrated against real-world AWS/GCP latency and throughput under load — actual benchmark data, or more of a directional approximation aimed at learning?
Cloud World Model
@xichiwoo Thank you. Published vendor specs and documented performance references, not directional approximations. We validate accuracy continuously: AWS ~97%, GCP ~98%, with a hard CI gate that blocks any code change dropping a provider below 95%.
Cloud World Model
@xichiwoo The caveat: it's benchmark-sourced, not live telemetry. So it reflects documented behavior, not moment-to-moment real-world variance. For learning and architecture validation it's solid; for predicting exact p99 latency on a specific day it's directional.
That CI gate on the accuracy benchmark is the reassuring part - vendor specs alone drift fast, so gating on measured accuracy is the right call. The thing I'd still watch is that the failure modes an RL agent loves to exploit are the non-nominal ones: throttling, noisy-neighbor contention, cold starts, regional capacity limits - exactly what published specs don't capture. Does the benchmark exercise those degraded regimes, or is it mostly steady-state cost/latency accuracy?
Cloud World Model
@hi_i_am_mimo Good news, it goes beyond normal conditions. The benchmark tests what happens under heavy traffic too, when things start slowing down, errors spike, and services get overwhelmed. It also simulates servers restarting cold and things like entire zones going down or databases falling over.
Where it doesn't go yet: it doesn't explicitly model one tenant's workload disrupting another's, or some of the other noisy-neighbor effects you mentioned. That's a fair gap to flag.
@mathsociety Makes sense - if it already covers traffic spikes, cold restarts and zone failures, that is most of what an agent would try to game. The noisy-neighbor gap is the honest one to flag; it is genuinely hard without live multi-tenant telemetry. One thing on the cold-restart case: does it model the warm-up curve (cache, connection-pool and JIT ramp after restart), or just the restart latency hit? That ramp is usually where an RL policy finds a shortcut.
Cloud World Model
@hi_i_am_mimo More than the latency hit, but not fully. Cache warm-up is modeled (hit rate ramps with traffic, so it's cold right after restart and falls through to the DB until RPS climbs), and serverless DBs come back at floor capacity with a tiny connection pool that ramps over several steps, so an agent hammering it immediately hits pool saturation. The honest gap is JIT/CPU: there's no "code hasn't warmed up yet" ramp, so a policy hunting for that specific shortcut won't find it. The cache and connection-pool ramps are the warm-up levers it can actually exploit today.
We'll look at closing this gap. The task on my side is "Model a JIT/CPU warm-up ramp after compute restart". Your feedback is really helpful. Thank you!!
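To make the two warm-up levers in that answer concrete, here is a minimal Python sketch. It is not the simulator's code: the function names, the saturating hit-rate curve, and every constant (maximum hit rate, pool floor and ceiling, queries per connection) are assumptions chosen only to show the shape of the behavior described above.

```python
def cache_hit_rate(rps: float, max_hit_rate: float = 0.9,
                   half_saturation_rps: float = 200.0) -> float:
    """Hypothetical curve: hit rate is 0 with no traffic and rises toward max_hit_rate."""
    if rps <= 0:
        return 0.0
    return max_hit_rate * rps / (rps + half_saturation_rps)


def pool_size(steps_since_restart: int, floor: int = 2,
              ceiling: int = 20, growth_per_step: int = 3) -> int:
    """Hypothetical ramp: the pool restarts at its floor and grows a little each step."""
    return min(ceiling, floor + growth_per_step * max(0, steps_since_restart))


def db_step(rps: float, steps_since_restart: int,
            queries_per_connection: float = 50.0) -> dict:
    """Requests that miss the cache fall through to the DB; compare them to pool capacity."""
    db_load = rps * (1.0 - cache_hit_rate(rps))
    capacity = pool_size(steps_since_restart) * queries_per_connection
    return {"db_load": db_load, "capacity": capacity, "saturated": db_load > capacity}


# An agent that sends full traffic right after a restart saturates the pool;
# the same traffic a few steps later fits.
right_after_restart = db_step(rps=1000, steps_since_restart=0)
after_ramp = db_step(rps=1000, steps_since_restart=6)
```

In this sketch there is deliberately no CPU or JIT term, which mirrors the gap acknowledged in the reply.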
Strong launch. The RL training API is the interesting edge. If an agent learns an infra optimization in simulation, I’d want the handoff receipt before deploy: resources changed, env/secret assumptions, failure case tested, and rollback path.
Do you expect agents to export a plan into Terraform/Pulumi, or stay inside the simulator?
Cloud World Model
@blah_mad My hope is actually that more people use the API than the UI. Today, the RL agent stays inside the simulator. It learns optimization policies (scaling thresholds, resource sizing, failure response) and you get the episode history, reward trajectory, and the final recommended configuration as structured output. What you don't get yet is an auto-generated Terraform/Pulumi diff you can apply directly. At present the Agent is responsible for taking what it's learned from our simulation engine and creating the IaC code. It's also possibly a natural next step for us to do it. Thanks for the comment!!
That makes sense. The episode history + final config is probably the receipt I’d start with. If the agent writes IaC today, I’d keep the review around the diff: source sim run, changed resources, blast radius, rollback.
Do you plan to make that structured output stable enough for other tools to consume?
Cloud World Model
@blah_mad Yes, the API is OpenAPI-specified with a generated TypeScript SDK, so the output is a documented contract, not ad-hoc JSON. Episode history and final config are stable today. Your diff shape (source run, changed resources, blast radius, rollback) maps well onto what's already there. Blast radius isn't a named field yet but the failure propagation data exists. Happy to share the spec if you want to build on top of it.
Yes, worth sharing. The part I’d look for is the run object other tools can trust: input architecture, sim id, failure data, recommendation, and what changed since the last run.
Is that exposed as one resource today, or stitched from a few endpoints?
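The thread does not show the product's actual schema, so the following is only an illustration of the "run object" being asked for here. Every name in it (`SimRunReceipt`, its fields, `changed_resources`) is hypothetical and is not taken from the Cloud World Model API or its OpenAPI spec.

```python
from dataclasses import dataclass, field


@dataclass
class SimRunReceipt:
    """Hypothetical single resource bundling what a reviewer would want before deploy."""
    sim_id: str
    input_architecture: dict
    failure_data: list
    recommendation: dict
    previous_recommendation: dict = field(default_factory=dict)

    def changed_resources(self) -> list:
        """Names of resources whose settings differ from the previous run."""
        names = set(self.previous_recommendation) | set(self.recommendation)
        return sorted(
            name for name in names
            if self.previous_recommendation.get(name) != self.recommendation.get(name)
        )


receipt = SimRunReceipt(
    sim_id="run-002",  # illustrative identifier
    input_architecture={"web": {"replicas": 2}, "db": {"size": "small"}},
    failure_data=[{"scenario": "zone_outage", "recovered": True}],
    recommendation={"web": {"replicas": 4}, "db": {"size": "small"}},
    previous_recommendation={"web": {"replicas": 2}, "db": {"size": "small"}},
)
```

Whether this lives in one resource or is stitched from several endpoints is exactly the open question in the comment above.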
The RL training API is the part that grabs me - an agent is only as good as the sim it learns in. The capacity-aware engine modeling "real per-provider performance profiles" is where that lives or dies: are those profiles grounded in published benchmarks and vendor specs, or in measured telemetry, and how often do you refresh them? If the sim cost/latency drifts from the actual providers, an agent will happily optimize for the model instead of the cloud, so how do you validate fidelity against a real deployment?
Cloud World Model
@hi_i_am_mimo Performance profiles are grounded in published vendor specs and documented benchmarks, not measured telemetry. Fidelity is validated via an accuracy benchmark that runs against all five providers (AWS ~97%, GCP ~98%, Azure ~98%, OCI ~96%, DigitalOcean ~98%) with a hard CI gate. With more resources, we could also supplement the data with our own ML testing and push the accuracy even higher than what we are reporting. We also check pricing weekly.
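As a rough picture of what a hard CI gate on per-provider accuracy can look like, here is a small Python sketch. It assumes the benchmark produces one accuracy score per provider; the function names and the exit-code convention are illustrative, and the numbers are just the approximate figures quoted above with the 95% floor mentioned earlier in the thread.

```python
ACCURACY_FLOOR = 0.95  # the 95% threshold mentioned in the thread


def failing_providers(accuracy: dict, floor: float = ACCURACY_FLOOR) -> list:
    """Return the providers whose benchmark accuracy is below the floor."""
    return sorted(name for name, score in accuracy.items() if score < floor)


def gate(accuracy: dict) -> int:
    """Return a CI-style exit code: 0 if every provider passes, 1 otherwise."""
    failures = failing_providers(accuracy)
    for name in failures:
        print(f"FAIL {name}: {accuracy[name]:.1%} is below {ACCURACY_FLOOR:.0%}")
    return 1 if failures else 0


# Approximate accuracies as reported in the reply above.
reported = {"aws": 0.97, "gcp": 0.98, "azure": 0.98, "oci": 0.96, "digitalocean": 0.98}
exit_code = gate(reported)
```

A CI job would pass the return value of `gate` to `sys.exit`, so a change that drops any provider under the floor fails the build.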
This is the one I keep coming back to, cost is the question that never really leaves the room, and almost always the hardest thing to pin down before you commit.
My real question is fidelity. The headline compute numbers are easy, every calculator gets those right. The bills that actually blow up are the hidden line items: egress and cross-AZ traffic, managed-service markups, spot vs committed pricing. Does the engine reach down to those, or just the sticker compute price? And the case that would really earn its keep: migration, where the egress to leave a provider ambushes everyone and never shows up in a "provider A vs B monthly" comparison until the invoice lands. Can the explorer model the transition cost, not just the steady-state side-by-side?
Genuinely love the concept, the chaos-engineering resilience score is a great touch too. Congrats on the launch! :)
Cloud World Model
@keirodev Appreciate the depth of this question. The engine goes beyond sticker compute: it models managed service rates, Kubernetes control plane fees, and spot vs on-demand pricing (that one's shipping very soon). So the "hidden" managed-service and spot deltas are in scope.
Egress and cross-AZ traffic costs aren't modeled yet. Those are roadmap items. Migration/transition egress is an even sharper version of the same problem, and honestly one of the most underappreciated real costs of a provider switch. That alone is a good reason for us to build it.
Thanks for the thoughtful questions, and for the kind words on the chaos resilience score!
@mathsociety Clean answer, and having managed-service and spot deltas already in scope is the part most calculators never reach, so you're ahead where it counts. The honesty on egress being roadmap rather than "sort of handled" is its own trust signal :)
One thumb on the scale for prioritization: egress is the rare hidden cost that changes the decision, not just the final number. Compute and managed rates tend to move every option up or down together, but egress is asymmetric, it punishes multi-region and migration paths specifically, which are exactly the architectures someone opens a simulator to stress-test. The day it can say "this design looks cheaper until you price the cross-AZ chatter, then it isn't" is the day it tells people something they couldn't already guess.
Rooting for it! ^^
Cloud World Model
@keirodev We already have work queued to model egress and cross-AZ traffic costs. Your framing on prioritization is going in as context for how we sequence it. Thanks for pushing on this; it sharpens the roadmap more than you'd think. Thank you!!
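The point that egress changes the decision, not just the total, is easy to see with a little arithmetic. The Python sketch below uses entirely made-up numbers (compute bills, traffic volume, and the per-GB rate are all hypothetical, not real provider prices) and a made-up helper, `cheapest`, to show a ranking flipping once cross-AZ traffic is priced.

```python
def monthly_cost(compute_usd: float, cross_az_gb: float, usd_per_gb: float) -> float:
    """Compute bill plus cross-AZ traffic charged at a flat per-GB rate."""
    return compute_usd + cross_az_gb * usd_per_gb


# Hypothetical designs: the multi-AZ one has cheaper compute but chatty cross-AZ traffic.
designs = {
    "single_az": {"compute_usd": 1000.0, "cross_az_gb": 0.0},
    "multi_az": {"compute_usd": 900.0, "cross_az_gb": 20000.0},
}


def cheapest(usd_per_gb: float) -> str:
    """Name of the design with the lowest monthly cost at the given traffic rate."""
    return min(designs, key=lambda name: monthly_cost(usd_per_gb=usd_per_gb, **designs[name]))


winner_when_traffic_is_free = cheapest(0.0)     # what a model without egress sees
winner_when_traffic_is_priced = cheapest(0.01)  # hypothetical per-GB rate
```

With traffic treated as free the multi-AZ design wins on compute alone; at the assumed rate its traffic charge outweighs the compute saving and the single-AZ design comes out cheaper.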
When simulating highly stateful infrastructure setups (like managed DB clusters with strict VPC networking rules or IAM permission chains), how deeply does your local mocking layer mirror the cloud providers' internal API state validation? Does it execute structural validation against a custom internal schema engine, or parse translated Terraform/CloudFormation configurations directly?
Cloud World Model
@juno_dost We don't parse Terraform/CloudFormation or mirror provider API state validation. The simulator is a behavioral engine that models performance, cost, and failure outcomes using provider-specific capacity profiles and coefficients, not config compliance. VPC rules and IAM chains aren't enforced structurally; the focus is on what happens at runtime (latency, error rates, autoscaling, cost) rather than pre-flight policy validation.
The unmodeled dimensions are where an RL agent quietly cheats. You said egress and cross-AZ are not in the cost model yet, so an optimizer trained on that sim will not just ignore them, it will learn to exploit them: chatty cross-AZ topologies look free, so the policy you deploy ends up biased along exactly the axes the sim cannot see. 97% is reassuring for one config, but the agent searches for the 3%. Do you flag or penalize decisions that lean on unmodeled resources, or bound how far the agent can wander from validated regions?
Cloud World Model
@dipankar_sarkar You're right, unmodeled dimensions don't stay neutral, they become free reward. An agent trained on this sim will drift toward chatty cross-AZ topologies precisely because egress looks free.
Honestly, we surface the gaps in the accuracy breakdown, but we don't yet penalize the agent for leaning on them. Based on your comment, we'll work on a per-step warning when a policy wanders into unmodeled territory, plus a soft penalty to keep the agent from exploiting blind spots. The sim should be upfront about its own uncertainty, not just accurate where it's confident.
Thank you!!
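One way the per-step warning and soft penalty could fit together is sketched below in Python. This is an assumption about a possible design, not the product's implementation: the `UNMODELED` set, the idea that each step reports which dimensions it relied on, and the penalty size are all hypothetical.

```python
# Dimensions the thread says the cost model does not cover yet.
UNMODELED = {"egress", "cross_az_traffic"}


def shaped_reward(base_reward: float, dimensions_used, penalty_per_dimension: float = 0.1):
    """Subtract a small penalty for each unmodeled dimension a step relies on.

    Returns the adjusted reward and a list of human-readable warnings.
    """
    hits = sorted(set(dimensions_used) & UNMODELED)
    warnings = [f"step relies on unmodeled dimension: {name}" for name in hits]
    return base_reward - penalty_per_dimension * len(hits), warnings


clean_reward, clean_warnings = shaped_reward(1.0, ["compute", "managed_db"])
risky_reward, risky_warnings = shaped_reward(1.0, ["compute", "egress", "cross_az_traffic"])
```

The penalty is soft on purpose: it nudges the policy away from blind spots without forbidding architectures that legitimately need cross-AZ traffic.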
The RL training API is the sharp end of this — and the part I'd push on. An agent is only as honest as its reward. Cost, latency, and resilience are in tension: minimize cost hard enough and the agent learns to ship something cheap and brittle that looks great right up until the zone outage you didn't simulate.
So, two things I'd want before trusting an agent's infra recommendation enough to act on it:
Is the reward multi-objective and user-weightable (I decide cost vs resilience vs latency)?
Does a run surface the tradeoff the agent chose ("cut 30% cost but dropped your resilience score from 8 to 5") instead of just handing back one "optimal" config?
The tradeoff being visible and mine to set is the whole ballgame.
The chaos-injection + resilience score framing is a great call too. Congrats on shipping, Kevin
Cloud World Model
@syed_noor4 Thanks, Syed. The reward design critique makes sense. Cost, latency, and resilience are all tracked as separate metrics in the simulation, but the reward weighting isn't exposed as a user-tunable parameter today. Actionable feedback is why we love being on Product Hunt.
The data to compute the two things you mentioned is there; it's a presentation and API design problem, not a simulation problem. Thank you very much for the feedback!
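Since the separate metrics already exist, a user-weighted reward plus a visible tradeoff could be as small as the Python sketch below. It assumes each metric has been normalized to a 0-to-1 score where higher is better; the function names, weights, and example scores are hypothetical and only illustrate the request in the comment above.

```python
def score(metrics: dict, weights: dict) -> float:
    """Weighted average of normalized metrics (each in [0, 1], higher is better)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weight * metrics[name] for name, weight in weights.items()) / total


def tradeoff(before: dict, after: dict) -> dict:
    """Per-metric change, so the user sees what was given up and not only the final score."""
    return {name: round(after[name] - before[name], 6) for name in before}


# Hypothetical normalized scores for a baseline config and a cheaper, more brittle one.
baseline = {"cost": 0.5, "resilience": 0.8, "latency": 0.7}
cheap = {"cost": 0.8, "resilience": 0.5, "latency": 0.7}

cost_first = {"cost": 3, "resilience": 1, "latency": 1}
resilience_first = {"cost": 1, "resilience": 3, "latency": 1}

delta = tradeoff(baseline, cheap)
```

Under `cost_first` the cheap config scores higher, under `resilience_first` the baseline does, and `delta` makes the resilience drop explicit either way.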