What broke first when you tried to scale a vibe-coded project?

by•4d ago

Vibe coding is fantastic for getting something working fast - but there's always a point where it starts to struggle.

For me it was when I tried to add proper auth to a project that didn't have it baked in from the start. The AI kept suggesting patches instead of a real solution, and eventually I was chasing one fix into the next.

Curious if others have hit a similar wall. What was the first thing in your vibe-coded project that the AI couldn't cleanly fix, and how did you actually handle it?

96 views

Replies

Best

The first thing that broke was the project’s ideation itself.

In a lot of cases, it is not that the human designer change their mind, but the original requirements had drifted out of the model’s head. Features started turning into approximations of themselves: edge cases disappeared, “unused” lower-level pieces got deleted, and small refactors accidentally broke behavior that had never been anchored in tests, docs, or acceptance criteria.

Scaling a vibe-coded project exposed a memory problem before it first expose a coding problem: sometimes neither human nor model could reliably tell which behavior was essential and which was accidental.

Report

2d ago

@rosetta_zidian_guo the memory problem framing is the best explanation i've read on this thread. it matches something i noticed too - the model doesn't know what it forgot, so it just confidently fills the gap with something plausible instead of flagging the uncertainty

Report

2d ago

For me, the first thing that usually breaks is project structure.

Vibe coding is great for getting features working fast, but after a few iterations you start seeing duplicated logic, inconsistent naming, random utility functions, and unclear data flow.

The AI can keep patching symptoms, but it struggles when the foundation needs to be redesigned.

What helped me was stopping feature work for a bit, documenting the intended architecture, then asking the AI to refactor around that plan instead of letting it keep adding fixes.

Report

3d ago

Mine broke at database structure. The AI kept adding new tables instead of simplifying the model. Starting over with a clean schema took less time than fixing the growing mess.

Report

3d ago

For me it wasn't the code, it was the data. I'm building an AI research agent that pulls company info from a bunch of different sources, and the model was happy to grab whatever it saw first even when two sources clearly disagreed. Vibe coding got me to 'it returns something' fast, but the real work started when I needed it to return the right thing. Ended up writing deterministic reconciliation and verification on top of the LLM instead of trusting it to stay consistent. That part you can't really vibe.

Report

3d ago

Ran into something similar building a voice AI agent — the

trickiest part wasn't the initial build, it was catching

when the AI said it did something successfully but

actually hadn't.

Specifically: the agent was confidently saying "you're all

booked!" after a phone call, but the tool wasn't actually

connected to the assistant — so it was just generating the

right-sounding confirmation without anything real happening

behind it. Nothing in the conversation transcript looked

wrong. Only caught it by checking the actual calendar

afterward and finding nothing there.

What helped: stop trusting the conversation/transcript as

proof of anything. Always verify against the real system of

record. The AI will happily narrate success even when the

underlying connection is silently broken — it has no way of

knowing the tool call never fired unless you build in checks

for that specifically.

Auth feels like the same category — the AI patches what's

visible (the error message) without understanding the

missing structural piece underneath.

Report

3d ago

@redist that's a sharper version of what i ran into honestly. i was checking the code diff to see if the fix worked, you're checking the actual calendar - same mistake, checking the thing that narrates instead of the thing that's supposed to have happened. feels like a general rule for agent work now: never trust the summary, always check the system of record directly

Report

2d ago

@omri_ben_shoham1 Exactly — "never trust the summary, always check

the system of record" is probably the most

underrated rule in agent development right now.

The transcript is the agent's story about what

happened. The calendar/database/CRM is what

actually happened. Those two things can diverge

silently and confidently at the same time.

Building that verification layer in from the start

changes how you architect the whole thing.

Report

1d ago

that's a good way to put it, the transcript is a story not a receipt. i think a lot of the early vibe coding hype skipped this because demos never hit the failure path, it only shows up once real users start doing things the happy path wasn't built for

Report

19h ago

For me it wasn't the code quality.

It was the cost.

At one point I accidentally burned over $170 in API usage building what should've been a pretty simple reporting script. 😅

That was the moment I realized scaling isn't just about whether the AI can solve the problem—it's also about whether the workflow stays efficient as the project grows.

Report

2d ago

@erika_chen$170 for a reporting script is a brutal way to learn that lesson. did you ever figure out what was actually burning the tokens - was it re-sending full context on every iteration, or more like an agent looping on its own output

Report

2d ago

@omri_ben_shoham1 I never figured it out for sure.

My suspicion is that prompt caching wasn't working somewhere in the stack.

I was using Claude Code with an API provider (not Anthropic's Max plan), so everything was billed per token.

It definitely felt like the same context was being sent over and over again, but I don't have proof.

Report

24h ago

@erika_chen makes sense, third party providers rarely pass through the same caching discounts. worth checking if switching to a provider with explicit prompt caching support would've cut that bill down, might save someone else here from the same $170 lesson

Report

20h ago