Forget benchmarks: Code Canary reports real-time data on the quality of Claude Code, Codex, and Gemini CLI by asking regular users how their AI coding agent is behaving.
Hi Product Hunt! I've been working on an app and am excited to tell you about it.
Here's the backstory: if you’ve got any kind of engineering background, you’re probably spending most of your waking hours enmeshed in a possibly unhealthy relationship with a coding agent like Claude Code. I know I am. This also means you know models like Claude have good days and bad days. Sometimes the bad days stem from service interruptions: an API gets overloaded, or some backend issue prevents the agent from working at all.
Other times it just seems like the model is being dumb and it’s hard to discern whether it’s the task you’ve given it or something deeper at play.
The most maddening part is that it’s very hard to tell whether the difficulties come from you and your hastily vibe-coded codebase, or whether something is actually wrong with the model. Without data, it’s impossible to know.
Enter Code Canary, my humble attempt at creating a distributed data collection platform for analyzing the quality of coding agents in real time.
Code Canary is a lightweight, open feedback system that lets developers rate their AI coding sessions and publishes the results as a public, continuously updated comparison dashboard.
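To make that concrete, here is a minimal sketch of how session ratings could be rolled up into per-agent numbers for a dashboard. It is an illustration only, not Code Canary's actual implementation: the `SessionRating` type, the 1-to-5 score scale, the `summarize` function, and the sample ratings are all assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SessionRating:
    """One developer's rating of one coding session (hypothetical schema)."""
    agent: str  # e.g. "Claude Code", "Codex", "Gemini CLI"
    score: int  # assumed scale: 1 (terrible) to 5 (great)


def summarize(ratings):
    """Group ratings by agent and return the mean score and rating count for each."""
    by_agent = defaultdict(list)
    for rating in ratings:
        by_agent[rating.agent].append(rating.score)
    return {
        agent: {"mean": round(mean(scores), 2), "count": len(scores)}
        for agent, scores in by_agent.items()
    }


# Made-up sample data, purely for illustration.
sample = [
    SessionRating("Claude Code", 4),
    SessionRating("Claude Code", 2),
    SessionRating("Codex", 5),
    SessionRating("Gemini CLI", 3),
]

for agent, stats in summarize(sample).items():
    print(agent, stats)
```

A real-time version would presumably compute the same summary over a sliding time window, so that a bad afternoon shows up as a dip instead of being averaged away.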