Developer Farm - AI coding pipeline that can't cheat the tests

When AI agents see your test suite and acceptance criteria, they don't solve the problem; they solve the tests. Developer Farm is an open-source AI coding pipeline that makes metric gaming physically impossible through strict 4-layer isolation:

🔒 Planning never sees execution results
🔒 Execution never sees the test suite or rubric
🔒 Verification doesn't know who wrote the code
🔒 Retry feedback never leaks the scoring criteria

Built on LangGraph + local Ollama + Qwen models.
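To make the four isolation rules above concrete, here is a minimal sketch of one way such isolation could be expressed: every layer gets a filtered view of the shared state and nothing else. This is not Developer Farm's actual code. The `PipelineState` fields, the layer names, the `ALLOWED_FIELDS` table, and the `view_for` helper are all hypothetical, and the sketch uses plain Python rather than LangGraph.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PipelineState:
    """Hypothetical shared state; field names are assumptions for illustration."""
    task: str
    plan: str = ""
    code: str = ""
    author: str = ""
    test_suite: str = ""
    rubric: str = ""
    execution_results: str = ""
    feedback: str = ""


# Allow-list of fields per layer (assumed layout). Anything not listed is withheld.
ALLOWED_FIELDS = {
    "planning": {"task"},                               # no execution results
    "execution": {"task", "plan", "feedback"},          # no test suite, no rubric
    "verification": {"task", "code", "test_suite", "rubric"},  # no author
    "retry_feedback": {"execution_results"},            # no rubric, no test suite
}


def view_for(layer: str, state: PipelineState) -> dict:
    """Return only the fields the given layer is allowed to read."""
    allowed = ALLOWED_FIELDS[layer]
    return {name: value for name, value in asdict(state).items() if name in allowed}


if __name__ == "__main__":
    state = PipelineState(
        task="Add a /health endpoint",
        plan="1. add route 2. return status",
        code="def health(): ...",
        author="agent-1",
        test_suite="test_health.py",
        rubric="must return 200 with JSON body",
        execution_results="1 of 3 checks failed",
        feedback="response body is not valid JSON",
    )
    for layer in ALLOWED_FIELDS:
        print(layer, sorted(view_for(layer, state)))
```

An allow-list is used here instead of a deny-list so that any field added to the state later stays hidden from every layer until someone explicitly grants access.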

Hey Product Hunt! 👋

I'm Ilya, a software engineer who got tired of AI coding tools that technically pass tests but miss the point.

Six months ago I was debugging a "working" feature built by an AI agent. The tests were green. The code was clean. But it solved a slightly different problem than what I asked for. That's when Goodhart's Law clicked: when you give an agent the test suite, it optimizes for the test suite, not the problem.

So I built Developer Farm as a side project. The core idea is simple: split the pipeline into isolated layers where each layer is physically prevented from seeing information it shouldn't. The executor never sees the tests. The verifier doesn't know who wrote the code. Feedback never leaks the rubric.

I expected this to be a weekend hack. Turns out it's a genuinely useful way to build software:

• $0.03 per feature (runs on my old GTX 1050 Ti)
• 26 seconds end-to-end
• Real git branches per attempt
• Full audit trail of every decision

Open-sourced it under MIT because I think the industry needs more honest AI tools, not just smarter ones.

Would love your feedback:

- What isolation patterns have you found useful in AI workflows?
- What features would make this production-ready for your team?
- Anyone else fighting Goodhart's Law in their AI systems?

Happy to answer anything in the comments!

