Garry Tan

The New Waydev - Measure the full AI SDLC. From token to production.

AI agents write code. Most teams cannot tell you what percentage actually ships. Waydev tracks agent-generated code from IDE to production with AI Checkpoints: which agent, tokens consumed, cost per PR, acceptance rate, deployment status. Per team, per repo, per vendor. Compare Copilot, Cursor, and Claude Code on what reaches your customers. Measure cost per shipped PR and AI ROI. Ask the Waydev Agent anything.

Alex Circei

Hey Product Hunt 👋

I am Alex, founder of @Waydev. Nine years of building engineering intelligence, and I have never seen a shift like this one.

AI agents are writing your code. Nobody audits the output.

4% of public GitHub commits are already authored by Claude Code. Companies are spending up to $195 per developer per month on AI coding tools. Almost none of them can prove the spend is working.

That is the gap we rebuilt Waydev to close. The new platform measures the full AI SDLC:

  • AI Adoption — which tools your teams use, what you spend per vendor, per team, per repo

  • AI Impact — follow AI code from IDE to production. See where it ships and where it dies

  • AI ROI — cost per PR, cost per shipped line, tokens consumed vs code shipped

  • AI Checkpoints — commit-level attribution. Which agent, how many tokens, what percentage was AI

  • Waydev Agent — ask anything. Closes the loop by feeding insights back to your AI through MCP
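
As a rough illustration of the AI ROI idea in the list above, cost per shipped PR can be read as total spend divided by the number of PRs that reached production. The sketch below is not Waydev's implementation; the `PullRequest` record, its field names, and the `cost_per_shipped_pr` helper are hypothetical and exist only to make the arithmetic concrete.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # All fields are illustrative assumptions, not a real Waydev schema.
    agent: str        # vendor or agent label, e.g. "agent-a"
    tokens: int       # tokens consumed while producing the PR
    cost_usd: float   # spend attributed to the PR
    shipped: bool     # True if the PR reached production


def cost_per_shipped_pr(prs):
    """Return, per agent, total spend divided by the count of shipped PRs.

    Spend on PRs that never shipped still counts toward the numerator,
    which is what makes the metric stricter than plain cost per PR.
    An agent with no shipped PRs maps to None instead of dividing by zero.
    """
    spend, shipped = {}, {}
    for pr in prs:
        spend[pr.agent] = spend.get(pr.agent, 0.0) + pr.cost_usd
        shipped[pr.agent] = shipped.get(pr.agent, 0) + (1 if pr.shipped else 0)
    return {
        agent: (spend[agent] / shipped[agent] if shipped[agent] else None)
        for agent in spend
    }
```

Dividing by shipped PRs rather than opened PRs is a design choice: abandoned work raises the number, so wasted spend stays visible.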

AI adoption was the easy part. Proving what AI actually changed in production is the hard part. That is what we built.

In the comments all day. Ask me anything.

— Alex

Savian Boroanca

@alex_circei, congrats to you and the team! This cuts to something most teams aren’t ready to admit yet: we’ve dramatically improved code generation, but not accountability. Measuring AI adoption is easy, but measuring whether that code actually survives in production is the hard and far more important problem. The focus on commit-level attribution and metrics like cost per PR or shipped code is directionally right, even if imperfect. Without that layer, AI spend is just a growing line item with no clear tie to outcomes.

What’s especially interesting is closing the loop, feeding these insights back into the agents themselves. That’s where this shifts from analytics to a self-improving system. The challenge will be balancing useful visibility with developer trust. This has to feel like system optimization, not surveillance. If you get that right, this starts to look like the observability layer for AI-generated code. That’s a category worth defining early. Godspeed :-)

Alex Circei

@savian_boroanca Thanks Savian, really appreciate this.

That’s exactly the bet we’re making. AI adoption is easy to report, but the real question is whether that code survives review, ships to production, and actually improves outcomes.

We also believe the next step is closing the loop, turning those signals into feedback for both teams and agents, without making it feel like surveillance. It has to help engineering organizations optimize the system, not police developers.

Still early, but we think this is a missing layer in the market, and a category worth building.

DAYAL PUNJABI

@alex_circei For teams blending human and AI code (say, junior devs iterating on agent output), how does Waydev surface the "human lift" versus pure AI PRs? Does it flag where devs spend the most time reviewing or rejecting AI suggestions, the subtle quality gates that spend alone misses?

Alex Circei

@dayal_punjabi That’s exactly the gap we’re trying to solve.

Waydev separates AI-assisted, agent-driven, and human work, then shows what happened after the code was written: review time, rework, cycle time, deploy frequency, and incident correlation. That makes the human lift visible, not just the volume of AI-generated code.

So if a junior dev or agent opens a PR, you can see whether humans had to heavily review, rewrite, slow down, or stabilize it before it actually shipped.

For suggestion rejection at the IDE level, it depends on the source integration, but our main point is this: AI usage alone is not the truth; production outcomes are.

Nathan Latka

@alex_circei congrats on the launch! much needed.

Caleb Bennett

Most teams track usage, but not what actually makes it to production. This kind of visibility could really help cut wasted spend. Curious whether it also highlights why some AI-generated PRs don't get shipped?

Alex Circei

@caleb_bennett1 Exactly. Most AI dashboards stop at usage. The real question is what gets merged, shipped, and creates value. And yes, this kind of visibility should also show where AI-generated PRs get stuck, in review, rework, or abandonment, which is where a lot of wasted spend hides.

Liam Bailey

@caleb_bennett1 Really good point

Curious Kitty

A lot of engineering analytics tools get dismissed as “commit/LoC dashboards.” What product decisions did you make to avoid Goodhart’s-law behavior (PR splitting, metric gaming), and how do you recommend companies operationalize Waydev without turning it into an individual performance scorecard?

Alex Circei

@curiouskitty Great question. We made a few deliberate product choices to avoid turning Waydev into a commit/LoC scoreboard.

First, we do not optimize around raw activity metrics. Commits, lines of code, PR count, and similar signals can be useful as context, but they are easy to game and dangerous when treated as outcomes. We focus much more on system-level flow, quality, and delivery signals like cycle time, review time, deployment frequency, change failure rate, rework, incidents, and what actually ships to production.

Second, we push measurement up from the individual to the team, repo, and org level. The goal is to understand how the system performs, where work gets stuck, and whether tooling, process, or AI adoption is improving outcomes. Not to rank engineers.

Third, we connect metrics instead of showing them in isolation. A spike in PR volume alone tells you very little. But PR volume plus longer review time, higher rework, and more incidents tells a very different story. That is how you reduce metric gaming, by making tradeoffs visible.
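
To make the "connect metrics" point concrete, here is a minimal sketch of reading PR volume alongside its guardrail metrics across two periods. It is an illustration only, not how Waydev computes anything: the metric names (`pr_count`, `review_hours`, `rework_rate`, `incidents`), the `bundle_signal` helper, and the 10% tolerance are all assumptions.

```python
def bundle_signal(before, after, tolerance=0.10):
    """Compare two periods of team-level metrics as a bundle.

    `before` and `after` are dicts with the hypothetical keys
    pr_count, review_hours, rework_rate and incidents.
    A metric counts as "up" when it grew by more than `tolerance`.
    """
    guardrails = ("review_hours", "rework_rate", "incidents")
    volume_up = after["pr_count"] > before["pr_count"] * (1 + tolerance)
    # Guardrails that got worse over the same window.
    regressed = [m for m in guardrails if after[m] > before[m] * (1 + tolerance)]
    if volume_up and regressed:
        return "volume up, quality cost rising: " + ", ".join(regressed)
    if volume_up:
        return "volume up, guardrails stable"
    return "no volume change worth flagging"
```

The function never reports PR volume on its own, so splitting PRs to inflate the count only surfaces whatever it did to review time, rework, and incidents.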

Fourth, we recommend companies use Waydev for coaching and operating rhythms, not performance management. The best rollouts position it as a tool for engineering leaders, not as a scorecard for individual compensation discussions. Use it to ask: where are the bottlenecks, which teams need support, what changed after adopting AI tools, what is improving, what is getting worse?

My simple rule is this: if a metric can be easily gamed, it should never be the goal. It can be a signal, but never the target.

So the operational model we recommend is:

  • measure teams and systems, not individuals

  • look at outcome bundles, not single vanity metrics

  • use trends and before/after analysis, not snapshots

  • combine quantitative signals with qualitative context like DevEx feedback

  • never use one metric as a proxy for engineer quality

That is how you get value from engineering intelligence without creating Goodhart-law behavior.

Chip BORODESCU

Finally something that looks at actually measuring productivity beyond just lines of code. With AI agents, generating code is becoming the easy part, but the more important question is what actually makes it through review, ships to production, and creates durable value. Otherwise we risk confusing velocity of spitting code with actual progress.

This feels like the right lens for understanding AI’s real contribution to engineering teams. The one question I'm still trying to figure out and I'd love your perspective: how do you connect these engineering metrics (output) with the business KPIs (actual business outcome)?

Alex Circei

@cborodescu Chip, exactly. That is the trap: AI can increase code volume far faster than it increases delivered value.

The way we think about it is by treating engineering metrics as leading indicators, then tying them to business outcomes at the team, initiative, and product level. For example:

  • cycle time, review time, deployment frequency, and rework rate show how efficiently value moves through the system

  • incidents, rollback rate, and change failure rate show the quality cost of that speed

  • then you connect those signals to business KPIs like feature adoption, customer retention, revenue impact, SLA performance, and cost to deliver

So the real question is not “did AI generate more code?” but “did AI help this team ship the right work faster, with less risk, and with better business results?”

That is the layer we think is still missing in most of the market.

Abhishek Mishra

This feels super relevant right now. A lot of teams are thinking about this problem. Will give it a shot.

Alex Circei

@abhi_shek1994 Thanks!

Dragos Bulugean

nice way of looking at your team's output, now together with visibility for generated code. will try it out soon.

Alex Circei

@dragos_bulugean Thanks! We're waiting for your feedback!

Piotr Pasierbek

this is exactly what we've been missing. we use Cursor and Claude Code daily but have zero visibility into which suggestions actually make it to prod. the cost per shipped PR metric is brilliant - finally a way to measure actual AI ROI instead of just "feels faster." curious how the agent tracking works across different IDEs?

Alex Circei

@piotr_pasierbek Appreciate this, Piotr. That is exactly the gap we kept hearing from teams.

Most AI tools show activity. We wanted to show what actually ships.

Waydev measures at the source-of-truth layer, code, PRs, and production, so teams can see which AI-assisted changes make it into shipped PRs and what ROI they actually create. That is also how we approach agent tracking across IDEs: not just by looking at reported usage inside one tool, but by connecting the contribution back to the work that reached production.

Happy to show you how it works with Cursor, Claude Code, and other agent workflows.

CHRISTIAN ONOCHIE

This hits a real blind spot. Everyone is adopting AI coding tools, but almost no one can tie usage to actual shipped value.

Alex Circei

@christian_onochie Exactly. Adoption was the easy part. The hard part is proving what changed in production, for speed, quality, and ROI.

Piotr Sędzik

love that you're tracking acceptance rates by vendor. we've been debating Copilot vs Cursor internally and it's all gut feeling right now. being able to see "Cursor had 73% acceptance but Copilot code shipped 2x faster" would end those arguments quickly. does it handle when devs modify AI suggestions before committing?

Alex Circei

@piotreksedzik Exactly. Most teams are still arguing about AI vendors based on screenshots, gut feeling, or who shouts the loudest internally.

The goal here is to compare vendors on what actually matters: acceptance, shipped output, rework, and downstream delivery impact.

And yes, that is an important part of it, not just whether a suggestion was accepted, but how much of it survived after edits and made it through commit, PR, and into production.

Otherwise the metric is incomplete.
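
One simple way to picture "how much of it survived after edits" is a line-level diff between the suggested text and what was finally committed. The sketch below uses Python's standard `difflib`; it is an assumption about how such a ratio could be approximated, not a description of Waydev's attribution logic, and `surviving_fraction` is a hypothetical helper name.

```python
import difflib


def surviving_fraction(suggested: str, committed: str) -> float:
    """Fraction of suggested lines still present, in order, in the committed text.

    Uses a line-level SequenceMatcher, so a line that was edited at all
    counts as not surviving. Returns 0.0 for an empty suggestion.
    """
    suggested_lines = suggested.splitlines()
    committed_lines = committed.splitlines()
    if not suggested_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(
        a=suggested_lines, b=committed_lines, autojunk=False
    )
    # Sum the sizes of all matching blocks (the final sentinel block has size 0).
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(suggested_lines)
```

For example, a four-line suggestion in which the developer rewrote one line before committing gives a ratio of 0.75, even though the suggestion itself would be logged as "accepted".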

Tyrone Robb

Looks really cool.

How do you compare against https://macroscope.com/ ? I like 1) their GitHub integration and the code suggestions, 2) the sprint analysis.

Alex Circei

@ty_robb Really good product.

From what I’ve seen, Macroscope is strongest as a GitHub-native AI layer: very fast GitHub setup, PR and commit summaries, code review, fix suggestions, and lightweight sprint/status reporting. Waydev is broader. We connect engineering data across GitHub, GitLab, Bitbucket, Azure DevOps and Jira, then go beyond suggestions into DORA, sprint risk/capacity, DX, AI adoption, AI impact, AI ROI, and resource planning.

So I’d frame it simply:

  • if you want an AI reviewer living inside GitHub, Macroscope looks strong

  • if you want to understand whether engineering, including AI tools, is actually improving delivery, planning, quality and ROI across the org, that’s where Waydev is much deeper

On the sprint side specifically, Waydev is very explicit there: velocity/sprint reporting, scope creep, capacity issues, forecasted sprint risk, plus Jira-based sprint visibility.
