We finished #5 Product of the Day. Here's what three days of 60+ comments actually taught us.
We launched Cloud World Model on Product Hunt last week. The pitch: simulate AWS, GCP, Azure, OCI, and DigitalOcean infrastructure without provisioning real resources. You describe an architecture (compute, databases, load balancers, serverless functions), and the simulator models latency curves, CPU saturation, autoscaling behavior, failure propagation, and cost. No cloud bill.
We expected interest from learners. The comments told a different story.
The per-step warning will help a human reading the run, but the agent only ever optimizes the scalar reward, so a warning buried in a report won't bend the policy. What worked for us was keeping a separate eval env with the 'free' dimensions like egress and cross-AZ switched back on, and scoring the trained policy only there. If its reward collapses on that env, you've caught a policy that overfit to the sim's blind spot before it ever ships. Same idea as a train/test split, just applied to the reward.
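The held-out eval env idea can be sketched roughly like this. It is a minimal illustration under stated assumptions: the reward is just negative cost minus a latency penalty, the cost dimensions (`compute`, `egress`, `cross_az`), the `free_dims` switch and the 20% relative-drop threshold are all hypothetical, and the same trajectory is re-scored in both envs (valid only if the eval env changes the reward and nothing else).

```python
from dataclasses import dataclass


@dataclass
class StepCosts:
    """Per-step cost components (hypothetical dimensions, arbitrary units)."""
    compute: float
    egress: float
    cross_az: float


DIMENSIONS = ("compute", "egress", "cross_az")


def step_reward(costs: StepCosts, latency_penalty: float, free_dims: frozenset) -> float:
    """Scalar reward: negative total cost minus a latency penalty.

    Dimensions listed in free_dims cost nothing, which is what a
    simulator blind spot looks like from the agent's side.
    """
    total = sum(getattr(costs, d) for d in DIMENSIONS if d not in free_dims)
    return -(total + latency_penalty)


def mean_reward(episode, free_dims: frozenset) -> float:
    return sum(step_reward(c, lat, free_dims) for c, lat in episode) / len(episode)


TRAIN_FREE = frozenset({"egress", "cross_az"})  # blind spots during training
EVAL_FREE = frozenset()                         # eval env switches them back on


def overfit_to_blind_spot(episode, max_relative_drop: float = 0.2):
    """Score the rollout under both reward definitions and flag a large relative drop."""
    train = mean_reward(episode, TRAIN_FREE)
    held_out = mean_reward(episode, EVAL_FREE)
    # Rewards are negative costs, so normalize the drop by |train|.
    drop = (train - held_out) / max(abs(train), 1e-9)
    return drop > max_relative_drop, train, held_out
```

A policy that routes heavy traffic across AZs looks fine under the training reward and collapses under the eval reward, which is exactly the train/test-style signal described above.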
Cloud World Model
@dipankar_sarkar Really appreciate this. We plan to apply a soft penalty to the scalar reward (not just a surface warning), so the agent does feel the cost of leaning on blind spots. But you're right that a fixed penalty can be absorbed if the gain is large enough. The train/test reward split is the proper solution.
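A toy calculation makes the "absorbed penalty" point concrete. The numbers and function names are invented for illustration; the only takeaway is that a fixed penalty deters exploitation only when it exceeds what the exploit gains.

```python
def shaped_reward(task_reward: float, exploit_gain: float, exploits: bool, penalty: float) -> float:
    """Scalar reward with a fixed soft penalty whenever the policy leans on a blind spot."""
    if not exploits:
        return task_reward
    return task_reward + exploit_gain - penalty


def exploit_still_pays(task_reward: float, exploit_gain: float, penalty: float) -> bool:
    honest = shaped_reward(task_reward, exploit_gain, exploits=False, penalty=penalty)
    exploiting = shaped_reward(task_reward, exploit_gain, exploits=True, penalty=penalty)
    return exploiting > honest
```

With a gain of 3.0 and a penalty of 2.0 the exploit still nets +1.0, so an optimizing agent keeps doing it; that is why a held-out eval env is the sturdier check.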
Would love to get your input on what you'd want to configure in that eval env: which dimensions to expose, whether you'd want the costs to match real provider rates or be parameterizable, and whether pass/fail should be a hard threshold or a relative drop from the training reward.
This hits a real pain point—our staging AWS bill quietly hit $400/month last quarter because someone left a NAT Gateway running. Which AWS services are fully simulated vs mocked? Specifically curious about Lambda cold starts, DynamoDB Streams, and S3 event notifications. If those three work accurately, this becomes a no-brainer for our CI pipeline.
Cloud World Model
@jimmy_benhsu Great question, and that NAT Gateway story is exactly why we built this.
Here's the honest breakdown for your three:
Lambda cold starts - fully simulated. The engine injects cold-start latency on ~15% of requests per step, adds it directly to your P50/P95/P99 metrics, and can cascade into connection pool pressure if your downstream can't absorb the spikes. You can set coldStartLatency per resource to match what you observe in production.
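A rough sketch of per-step cold-start injection, to show how ~15% of requests paying a cold start moves the tail percentiles while P50 stays flat. This is not the engine's code: the flat base latency, the per-request coin flip, and the seeded RNG are assumptions, and `cold_start_ms` only plays the role of the per-resource coldStartLatency setting. The connection-pool cascade is not modeled here.

```python
import random
import statistics


def step_latencies(n_requests: int, base_ms: float, cold_start_ms: float,
                   cold_start_rate: float = 0.15, seed: int = 0) -> list:
    """Latencies for one simulation step; roughly cold_start_rate of requests pay a cold start."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(n_requests):
        latency = base_ms
        if rng.random() < cold_start_rate:
            latency += cold_start_ms  # one-off init delay on a cold container
        latencies.append(latency)
    return latencies


def tail_percentiles(latencies: list) -> dict:
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

With a 20 ms base and a 400 ms cold start, P50 stays at 20 ms while P99 jumps to 420 ms, which is why cold starts show up in the tail metrics first.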
DynamoDB Streams and S3 event notifications - modeled, not deeply simulated. DynamoDB itself is simulated (request-based billing, base latency, saturation behavior), but Streams-specific things like shard throughput, iterator age lag, or event delivery delays aren't broken out as independent simulation axes yet. Same for S3 notifications - S3 contributes storage cost and latency to the request path, but we don't currently model notification fan-out failure or delivery timing.
For your CI use case: where we're most useful today is catching resource saturation + cost surprises (the NAT Gateway scenario, overprovisioned RDS, Lambda fleet cold-start cascades) before they hit staging. The event-driven plumbing between services (Streams → Lambda triggers, S3 → SQS → worker) is on the roadmap; happy to share more detail on where that lands if it's a blocker for you.
I also plan to create a "what's modeled thus far" page, with a mechanism to keep it up to date.
Cloud World Model
@jimmy_benhsu FYI, the "what we model thus far" page is up. Let me know if you think it needs more detail. Thanks!! https://cloudworldmodel.ai/provider-coverage
@mathsociety Thanks for the honest breakdown — that distinction between "fully simulated" and "modeled" is really useful for planning.
For our CI pipeline, the sweet spot for Cloud World Model would probably be resource saturation + cost surprise detection (exactly what you highlighted), while we keep a lightweight real-AWS smoke test path for anything that depends on event delivery timing or shard behavior.
Quick question on the hybrid boundary: in your experience, do teams typically run CWM for the bulk of integration tests and then gate merges with a small real-AWS canary — or do they run both in parallel and diff the results? Curious if you've seen a pattern that minimizes the "false confidence from simulation" risk.
Also checked the provider coverage page — clean reference. Would love to see an "accuracy matrix" column there showing which metrics are measured vs extrapolated, so teams can self-select where to trust simulation vs where to fall back to real infra.
Cloud World Model
@jimmy_benhsu On the hybrid boundary: the more common pattern is CWM for bulk + a real canary at the gate, not parallel diffing. The diff approach sounds appealing but adds a lot of noise. Simulation and real infra rarely produce byte-identical metrics even when both are "correct," so you end up chasing variance instead of actual regressions. The canary works better as a smoke fence: "does the thing actually boot and serve traffic" rather than "do the numbers match."
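To illustrate the difference between the two gating styles, here is a small sketch. The function names, the metric values, the 2xx/3xx success criterion, and the 99% threshold are all assumptions chosen for the example.

```python
def exact_diff(sim: dict, real: dict) -> list:
    """Naive parallel diff: every metric whose values aren't identical gets flagged."""
    return sorted(k for k in sim if sim[k] != real.get(k))


def smoke_fence(status_codes: list, min_success_ratio: float = 0.99) -> bool:
    """Canary check: did the deployment boot and actually serve traffic?"""
    if not status_codes:
        return False  # served nothing, so treat it as not booted
    ok = sum(1 for code in status_codes if 200 <= code < 400)
    return ok / len(status_codes) >= min_success_ratio


def merge_gate(sim_checks_passed: bool, canary_status_codes: list) -> bool:
    # Simulation carries the bulk of coverage; the real canary only proves it serves traffic.
    return sim_checks_passed and smoke_fence(canary_status_codes)
```

Feeding `exact_diff` a simulated P50/P99 of 41.0/180.0 ms and a real 42.3/176.8 ms flags both metrics even though neither is a regression, while `merge_gate` only asks whether the simulated checks passed and the canary served traffic.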
The false-confidence risk is real, and the mitigation we've seen work is being explicit about what CWM covers vs. what it doesn’t. Your accuracy matrix idea makes sense. The provider coverage page today shows what's simulated, but not how confidently. An "accuracy type" column, measured vs. extrapolated, would let teams self-select where to trust CWM and where to keep real infra in the loop. I’ll work towards getting it added.
Thanks!!
The chaos engineering part caught my eye: injecting a DB crash and getting a resilience score back seems really useful for catching weak spots before prod. Curious how close the cost estimates land to a real AWS bill in practice. Congrats on shipping!
Cloud World Model
@i_sanjay_gautam Thank you, Sanjay. Yes, chaos engineering goes a long way towards determining what happens when something crashes. We believe our cost estimates are 95 to 98 percent accurate against a real AWS bill. Our accuracy benchmark describes it here: https://www.cloudworldmodel.ai/accuracy
the cost simulation is the part i need most. i blew $400 on an RDS instance i spun up for "testing" and forgot about for 11 days. nobody warned me.
how granular does the cost projection go? if i model a 3-tier app does it tell me i'm about to pay for an over-provisioned NAT gateway, or just give me a total bill estimate?
the value of cost tools breaks for me at the line item level. that's where i actually make decisions.
Cloud World Model
@thenameisarian Hi, this is exactly the scenario we built for, so the $400 RDS instance story hits home. The engine prices every resource in your architecture independently, so a 3-tier app isn't one blended number; it's the sum of its parts, and you can see which part is bleeding money. We're also a simulation engine, meant to be run, most of the time, before you even deploy the architecture. If you change the architecture, simulate again. Thanks for the question.
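A minimal sketch of per-resource line items for a 3-tier app, assuming each resource is priced as hourly rate × count × 730 hours/month. The hourly rates below are placeholders, not real provider prices, and the `Resource` type and helper names are hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common billing convention (365 * 24 / 12 = 730)


@dataclass
class Resource:
    name: str
    hourly_rate: float  # USD per hour per unit; placeholder, not a real price
    count: int = 1


def line_items(resources: list) -> dict:
    """Price each resource independently: one monthly line item per resource."""
    return {r.name: round(r.hourly_rate * r.count * HOURS_PER_MONTH, 2) for r in resources}


def cost_report(resources: list):
    items = line_items(resources)
    total = round(sum(items.values()), 2)
    biggest = max(items, key=items.get)  # the line item bleeding the most money
    return items, total, biggest


three_tier = [
    Resource("load-balancer", 0.03),
    Resource("web-tier", 0.05, count=3),
    Resource("nat-gateway", 0.05),
    Resource("rds-primary", 0.50),
]
items, total, biggest = cost_report(three_tier)
```

Because the total is just the sum of the line items, the report can point at the single resource (here the placeholder RDS instance) that dominates the bill instead of returning one blended number.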
@mathsociety this is the answer. the "simulate again after every architecture change" workflow is what i don't have anywhere else. terraform plan tells me what's changing, not what it'll cost.
added to my list to try this week. one more quick one: does the simulation cover spot instance pricing or just on-demand? the cost-prediction wins i actually need are usually in the gap between "what i provisioned" and "what i'm paying for at 2am during a scale event."
cheers for the thoughtful answer.
Cloud World Model
@thenameisarian pricing is on-demand today; spot/preemptible isn't modeled yet, so I won't pretend it is.
But the second half of what you said is the part we actually nail: the "what am I paying at 2am during a scale event" gap. Cost isn't a static number off your provisioned config; it's recomputed every simulation step as autoscaling adds and removes instances. So when traffic spikes and the fleet scales from 2 to 9 nodes, you watch the cost/hour climb in real time, and Aurora Serverless v2 cost tracks live ACU rather than a flat rate. That's exactly the gap between "what I provisioned" and "what the scale event actually costs me," modeled step by step, just at on-demand rates.
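Here is a simplified picture of per-step cost recomputation under autoscaling. It is an assumption-laden sketch, not the engine: the autoscaler reacts instantly (no cooldowns or scale-in lag), and the load units, per-node capacity, hourly rate, and step length are placeholders. Only the 2-to-9 node range comes from the example above; ACU-based Aurora pricing is not modeled.

```python
import math


def simulate_cost(load_per_step, capacity_per_node: float, node_hourly_rate: float,
                  min_nodes: int = 2, max_nodes: int = 9, step_hours: float = 1.0):
    """Recompute cost/hour every step from whatever node count autoscaling picked."""
    timeline = []  # (nodes, cost_per_hour) for each step
    total = 0.0
    for load in load_per_step:
        # Simplified reactive autoscaler: enough nodes for this step's load, clamped.
        wanted = math.ceil(load / capacity_per_node)
        nodes = max(min_nodes, min(max_nodes, wanted))
        cost_per_hour = nodes * node_hourly_rate  # on-demand rate only
        total += cost_per_hour * step_hours
        timeline.append((nodes, cost_per_hour))
    return timeline, total
```

A load series of 100, 250, 900, 1200, 300 with 100 units per node scales the fleet 2 → 3 → 9 → 9 → 3, and the cost/hour follows the node count step by step instead of staying at the provisioned baseline.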
Spot pricing is a fair ask though, and it's the natural next layer. Comments like these are the reason we launched on Product Hunt: to get a sense of what customers want. We'll add it to the roadmap.
Cloud World Model
@daniel_adsuar_prieto Thanks Daniel, great question. Appreciate the kind words about the launch.
For multi-cloud networking constraints (latency between regions, cross-cloud egress costs, routing behavior), the simulation is quite accurate: we model provider-specific network topologies, zone-aware placement, and inter-cloud traffic costs. That's core to what we do.
IAM policies are a different story: we don't simulate them, because we focus on the core infrastructure. IAM is a gap worth thinking about, since simulating it would let teams test least-privilege architectures before deploying, and it's something we'll consider if there is demand for it. Thanks!!
Congrats on the launch! 🚀
Simulating cloud architecture before provisioning real resources is a very useful idea, especially for cost-heavy experiments and failure testing.
I'm curious: how close are the cost and performance predictions to real-world cloud bills after deployment? Do you provide any confidence score or comparison against actual usage data over time?
Cloud World Model
@prashant_patil14 Thanks Prashant, appreciate the kind words. Cost predictions are grounded in published provider pricing (updated when drift is detected against live pricing pages) and validated against benchmarks; our accuracy scores sit between 96–98% across AWS, GCP, Azure, OCI, and DigitalOcean.
We don't currently ingest actual usage data post-deployment for comparison, so there's no feedback loop that tightens predictions over time from real bills. With more resources, we could also run our own ML cloud tests and feed that data back in to push the accuracy scores even higher. We don't do that today.
Here are our current accuracy numbers. I think it would take external testing to disprove them. https://cloudworldmodel.ai/accuracy
Useful angle for teams that want to teach cloud tradeoffs without handing out real cloud accounts. I’d be interested in how close the cost/perf model stays to provider changes over time, since drift is usually where these simulators get hard to trust.
Cloud World Model
@jimmy_lee12 Drift is an important concern, and we guard against it with a few different CI checks.
Continuous validation: Every code change runs a pricing check in CI that fails the build if our cost constants drift from the reference rates we've sourced from each provider's pricing pages (AWS, GCP, Azure, OCI, DigitalOcean).
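A minimal sketch of that kind of CI pricing check, assuming the simulator keeps its cost constants in a dict keyed by priced item. The keys and values are placeholders, and how reference rates get sourced from pricing pages isn't shown. A CI step would call something like `sys.exit(ci_exit_code(COST_CONSTANTS, REFERENCE_RATES))` so that a non-zero exit fails the build.

```python
# Hypothetical pricing constants and reference rates; keys and values are placeholders.
COST_CONSTANTS = {
    "provider_a.vm.small.hourly": 0.10,
    "provider_a.nat_gateway.hourly": 0.05,
}
REFERENCE_RATES = {
    "provider_a.vm.small.hourly": 0.10,
    "provider_a.nat_gateway.hourly": 0.05,
}


def find_drift(constants: dict, reference: dict, rel_tol: float = 1e-9) -> list:
    """Return every priced item whose constant is missing or differs from the reference."""
    drifted = []
    for key, ref in sorted(reference.items()):
        ours = constants.get(key)
        if ours is None or abs(ours - ref) > rel_tol * abs(ref):
            drifted.append(key)
    return drifted


def ci_exit_code(constants: dict, reference: dict) -> int:
    """0 when constants match the reference rates; 1 (fail the build) otherwise."""
    drifted = find_drift(constants, reference)
    for key in drifted:
        print(f"pricing drift: {key} ours={constants.get(key)} reference={reference[key]}")
    return 1 if drifted else 0
```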
Weekly accuracy floor: A scheduled job benchmarks all five providers against real-world reference data every week and fires an alert (via PostHog; shoutout to @PostHog) if any provider's overall simulation accuracy drops below 95%. All five are currently well above that: AWS ~97%, GCP ~98%, Azure ~98%, OCI ~97%, DigitalOcean ~98%.
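The floor check itself can be sketched as below. How the accuracy scores are computed isn't described, so the function takes them as given; `send_alert` is a hypothetical callback standing in for whatever alerting integration is used (the PostHog wiring is not shown). The figures are the approximate ones quoted above.

```python
ACCURACY_FLOOR = 0.95

# Approximate current figures quoted above.
CURRENT_ACCURACY = {"AWS": 0.97, "GCP": 0.98, "Azure": 0.98, "OCI": 0.97, "DigitalOcean": 0.98}


def below_floor(accuracy_by_provider: dict, floor: float = ACCURACY_FLOOR) -> list:
    return sorted(p for p, acc in accuracy_by_provider.items() if acc < floor)


def weekly_accuracy_check(accuracy_by_provider: dict, send_alert) -> list:
    """Run after the weekly benchmark; send_alert is a hypothetical alerting callback."""
    failing = below_floor(accuracy_by_provider)
    if failing:
        send_alert(f"Simulation accuracy below {ACCURACY_FLOOR:.0%}: {', '.join(failing)}")
    return failing
```

With today's numbers no alert fires; if one provider slipped to 93%, only that provider would be named in the alert.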
It's a combination of CI tricks and checks that keep things accurate.