Imed Radhouani

We let Claude write 100% of our code for 7 days. Here's what broke first.

by

Last week we did something stupid.


We paused all human coding. Gave Claude (Anthropic) access to our GitHub repo. Told it to build new features, fix bugs, and ship.

No human review. No guardrails. Just Claude and our codebase.

For 7 days, it ran the engineering team.

Here's what happened.

Day 1: Confidence was high.

Claude (Sonnet 4.6 then Opus 4.5) fixed a small CSS bug in 30 seconds. Then refactored a messy function into something readable. We felt like geniuses.

By end of day, it had shipped 3 minor improvements. We started talking about cutting engineering costs.

Day 2: The first crack.

We asked Claude to add a new filter to our dashboard. It wrote the code. It worked locally. We merged.

That night, something else broke. A completely unrelated chart stopped loading. No error logs. No obvious cause.

We spent 2 hours tracing it back to Claude's change. The filter logic was fine. But it had refactored a shared utility function that 5 other features relied on. It didn't check dependencies. It just assumed.

We rolled back. Lesson one: AI doesn't think about side effects.

Day 3: The false confidence trap.

We asked Claude to build a new feature from scratch. It generated 800 lines of code. Beautiful structure. Clean comments. Tests included.

We reviewed it quickly. Looked perfect.

Pushed to staging. The feature worked. We celebrated.

Then we noticed something strange. Our API costs had spiked. Claude was making 3x more calls than necessary — not because the code was wrong, but because it didn't understand pricing implications. It called external APIs in loops where a batch request would have been fine.

No error. Just expensive.

Day 4: The silent failure.

We asked Claude to optimize our database queries. It wrote better SQL. Things ran faster.

Then user emails started coming in. "Where did my old data go?"

Claude had dropped a table. Not a critical one. But a table with 3 months of user activity logs. Not backed up. Not in our retention policy.

It didn't ask permission. It didn't warn us. It just did what we asked: "clean up old data."

We spent the next 2 hours backing-up and rolling-back a DB snapshot.

Day 5: The paradox.

We asked Claude to fix the backup issue. It wrote a beautiful automated backup script. Scheduled. Logged. Perfect.

We asked it to add a new feature. It worked flawlessly.

We asked it to review its own code from day 3. It found 2 potential bugs and fixed them.

We started feeling safe again.

Then at 3am, our site went down. Claude had updated a core dependency to the latest version. It worked in test. But the new version had a breaking change our production environment didn't support. No human would have made that mistake.

Day 6: The blame game.

We spent the morning restoring the site. Asked Claude what happened. It explained the dependency logic perfectly. It acknowledged the mistake. Then it suggested 3 ways to prevent it in the future.

One of the suggestions was to implement a dependency review process before merging.

It was telling us to put humans back in the loop.

The hardcoded amateur sh*t came the day before. We asked Claude to add a simple feature — a discount code field on checkout. It worked. Beautifully. Until we realized it had hardcoded the discount logic. Not configurable. Not in settings. Just raw numbers and conditions buried in the code. If we wanted to change the discount amount, a developer had to dig in and rewrite it. It didn't ask. It just assumed. And that's when we realised that we needed to rethink the whole AI visibility engine !!

Day 7: The verdict.

We ended the experiment. Total tally:

  • Features shipped: 12

  • Features that worked without issues: 4

  • New bugs introduced: 27

  • Hours spent fixing things Claude broke: 40

  • User emails explaining lost data: 73

  • API cost increase: 38%

What we learned.

Claude is incredible at writing code. It's terrible at understanding context, dependencies, business logic, and consequences.

It doesn't know what you didn't tell it. It doesn't ask questions when something is ambiguous. It assumes it's right.

The best work we got wasn't when Claude coded alone. It was when Claude wrote the first draft and a human reviewed it, caught the assumptions, and fixed the blind spots.

The hype is real. So is the mess.

What I'm curious about.

Has anyone else tried this? What broke first for you?

Imed Radhouani
Founder & CTO – Rankfender
Code that ships. Chaos that teaches.

5.1K views

Add a comment

Replies

Best
Abdul Basit Ahmad

I like the enthusiasm, but its misplaced. Without review even 10x 7 level famanga wozards will mess up. I think you will do much better with multishot even if you dont read the generated code beyond compiler errors. I have done some experiments in my github. I would really love feedback on my approach

Imed Radhouani

@darthcoder Yeah you're right. The enthusiasm is dangerous. Without review, even the cleanest code is a liability. The 3x API calls. The dropped table. The hardcoded discount logic. All of it looked perfect until it wasn't.

The multishot approach makes sense. Run it multiple times, compare outputs, let the patterns emerge. I'd love to see what you've been experimenting with. Link your GitHub?

Ryan W. McClellan, MS

AI dependency syndrome. I'm glad you shared this, and that you learned from the causation. AI, believe it or not, is still in beta mode; we just do not realize it because we think it is smart enough.

I've spent a lot of time with no-code editors that function on Claude's API and they do the same thing, and it seems almost like it's trying to find ways to gain tokens on things it already knows.

Is this an AI conspiracy?

Imed Radhouani

@ryanwmcc Hahaha an AI conspiracy would explain a lot. Maybe they're all in on it, burning tokens on purpose, nudging us toward subscription tiers we didn't know we needed (waaw, who knows, it can be a part of their business model O__o ) Claude playing 4D chess while we think it's just bad at loops.

But yeah, the "beta mode" thing is real. We treat these models like they're finished because they sound confident. But confidence isn't competence. It's just good at sounding like it knows what it's doing.

Hardik Gohil

This hit a bit too close.

I’ve noticed that things rarely “break” suddenly — it’s more like you slowly lose clarity over time.

At some point:
– the product still works
– features keep shipping
– users are there

But internally, it gets harder to answer simple questions like:
👉 what actually matters right now?

That’s usually when everything starts feeling heavier than it should.

Curious — did you notice that moment clearly in hindsight, or only after things started going off track?

Imed Radhouani

@hardikgohil79 You described it perfectly. Nothing breaks. It just gets heavier. You're still shipping. Users are still there. But you can't answer "what actually matters right now?" without a long pause.

For us, the moment was when we couldn't explain why we were building the next feature. Not to customers. To ourselves. We had a roadmap. We had deadlines. But the "why" had gone quiet.

We noticed it in hindsight. At the time, we just felt tired. We thought it was burnout. It wasn't. It was drift. We were building things because they were on the list, not because they mattered.

The clarity came back when we stopped and asked: what would happen if we didn't ship anything new for two months? The answer was: nothing bad. That's when we knew we were off track.

Did you catch your moment in real time or only after?

Landon Reid

We run Claude on every feature at ReadyPermit. The trick isn't letting it code alone — it's giving it tight constraints. Scope the task. Lock the schema. Review the diff. AI writes 10x faster but breaks 10x quieter. Guardrails > autonomy every time.

Denys Skrypnyk

I personally use AI in coding very selectively.

Mostly for routine stuff — generating models from a data structure, writing basic tests, or small boilerplate. But I don’t rely on it to generate large chunks of code.

At the end of the day, it’s still just a model. If AI writes the code, I have to spend time reviewing it carefully, going through every detail to make sure there are no wrong assumptions or hidden issues.

With Copilot and fast “muscle-memory” typing, I can often write the same code faster than reviewing AI-generated output.

So for me, AI is a helper for repetitive tasks — not a replacement for actual engineering.