
Haystack
Review the pull requests that actually need human attention
375 followers
Haystack helps engineering teams manage the growing volume of AI-generated pull requests.
It sits on top of GitHub, analyzes each PR’s diff, codebase context, agent trace, intent, and verification evidence, then routes it: safe to move forward, needs fixes, or needs human review.
Teams use Haystack to keep code moving without rubber-stamping, focusing human attention where judgment actually matters.
This is the 5th launch from Haystack.
Haystack
Launched this week
Haystack
Hey PH! We’re building Haystack to help teams deal with the explosion in the number of pull requests that need to be reviewed due to the rise of coding agents.
Haystack replaces the GitHub PR review system with a queue that triages each PR before a human has to read any diffs. It looks at the diffs, the codebase, and the coding-agent conversation that produced the PR. Haystack then routes it into one of three buckets:
1. Safe to merge. This means the PR has enough evidence behind it that the team can merge it without another human’s review.
Some examples:
A small UI copy change that includes a screenshot showing the final state
A backend change where the author clearly tested the important paths and ran the changes in a real environment
2. Needs fixes. This means that the PR has bugs or violates a rule in your codebase and therefore the PR needs to be fixed by the author.
Some examples:
The agent was asked to make loading a large table faster by adding pagination, but the PR still loads every result at once and “implements” pagination in the UI
The PR silently catches an error instead of logging, surfacing, or handling it. This violates the team’s “no silent error swallowing” rule
3. Needs human review. This means that the PR could not be sufficiently verified by the author or is touching a sensitive part of the codebase (determined by user-input guidelines) and thus requires human review.
Some examples:
The PR changes a significant amount of logic in billing
The PR changes an important user flow like onboarding, but the author only ran unit tests and never opened the app to check the flow end-to-end. That violates the team’s rule that high-impact user-facing changes need manual verification.
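The three buckets above can be sketched as a small decision function. This is only an illustration, not Haystack's implementation: the `TriageSignals` fields, the `route` function, and the order in which the checks run are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Route(Enum):
    SAFE_TO_MERGE = "safe to merge"
    NEEDS_FIXES = "needs fixes"
    NEEDS_HUMAN_REVIEW = "needs human review"


@dataclass
class TriageSignals:
    # Hypothetical stand-ins for whatever the analysis of the diff,
    # codebase, and coding-agent conversation actually produces.
    bugs_or_rule_violations: list[str] = field(default_factory=list)
    touches_sensitive_area: bool = False
    sufficiently_verified: bool = False


def route(signals: TriageSignals) -> Route:
    # Assumed precedence: anything the author can fix goes back to the author first.
    if signals.bugs_or_rule_violations:
        return Route.NEEDS_FIXES
    # Sensitive code or thin verification evidence goes to a human.
    if signals.touches_sensitive_area or not signals.sufficiently_verified:
        return Route.NEEDS_HUMAN_REVIEW
    # Otherwise there is enough evidence to merge without another review.
    return Route.SAFE_TO_MERGE


# Example: a well-tested change that touches billing still goes to a human.
billing_change = TriageSignals(touches_sensitive_area=True, sufficiently_verified=True)
print(route(billing_change).value)
```

Note that the default `TriageSignals()` routes to human review, so a PR with no evidence at all is never treated as safe.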
Instead of starting with line-by-line diffs, Haystack immediately tells the reviewer the goal behind the PR, what design decisions the author made (informed by their coding-agent conversation), and how much the author did to verify that the pull request works (e.g. ran scripts, checked the frontend).
In this way, review shifts from “what changed?” to “is this the right behavior and is there evidence that it works?”.
Here’s a quick demo: https://www.tella.tv/video/strea...
We previously launched Haystack as a tool for understanding large PRs (https://news.ycombinator.com/ite...). As many of you can probably relate to, the release of Opus 4.5 completely shattered our conception of how fast an engineer could craft a PR.
And as coding agents kept improving after 4.5, we realized that pull request review did not scale with our coding velocity. With each member of our team able to pump out more than 20 pull requests a day, code review quickly became cognitively exhausting and less helpful.
After talking with other folks, we learned that many feel the same way and currently face a binary choice: skip review entirely or try to keep up with a fire hose of pull requests.
Haystack is our attempt at a third path. We still believe in code review, but as coding agents produce more code, human reviewer attention becomes more valuable and more expensive.
Haystack helps teams spend that attention on the PRs where a human can meaningfully change the outcome of that PR. And for such PRs, Haystack shows the reviewer what the PR intended to do, whether the author showed that it works, and what design decisions need a second pair of eyes.
We’re still quite early and are figuring out whether Haystack truly makes code review better. We would love any and all feedback!
@akshay_subramaniam Congrats on the launch, Akshay. Very cool, but to my thinking this requires either very good judgment (not an AI strong point) or very explicit rulesets. How do you deal with that?
Haystack
@zolani_matebese We allow the user to have a pretty in-depth ruleset that explicitly covers sensitive directories (e.g. /auth) or can be quite expansive and trigger based on the modification of any sensitive logic at all.
We also give the AI access to the coding conversation the author had with their agent, so it can err on the side of being conservative, e.g. if the author only ran tests but did not actually run the code, the PR can always be routed to human review.
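As a rough sketch of what such a rule could look like in code (not Haystack's actual rule format): `SENSITIVE_DIRS`, `touches_sensitive_dir`, and `always_needs_review` are hypothetical names, and paths are assumed to be repo-relative, so the `/auth` example appears here as `auth`.

```python
from pathlib import PurePosixPath

# Hypothetical rule data: directories the team has marked as sensitive.
SENSITIVE_DIRS = [PurePosixPath("auth")]


def touches_sensitive_dir(changed_files: list[str]) -> bool:
    """Return True if any changed file lives under a sensitive directory."""
    for name in changed_files:
        path = PurePosixPath(name)
        # is_relative_to compares whole path components (Python 3.9+),
        # so "authz/x.py" does not match the "auth" rule.
        if any(path.is_relative_to(d) for d in SENSITIVE_DIRS):
            return True
    return False


def always_needs_review(changed_files: list[str], ran_code: bool) -> bool:
    """Conservative check: sensitive paths or unexercised code go to a human."""
    if touches_sensitive_dir(changed_files):
        return True
    # Covers the "only ran tests, never ran the code" case from the reply above.
    return not ran_code
```

Matching on path components rather than string prefixes avoids accidentally flagging look-alike directories.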
@akshay_subramaniam Would appreciate your outlook on when the following would be supported by Haystack - they're blockers for my organization:
- Self-hosted GHES support
- SSO Auth
Hi. This is a real problem now - AI makes it easy to create PRs, but review time is still human.
I like the idea of routing PRs instead of treating every diff the same. What signals matter most for deciding “safe to move forward” vs “needs human review”?
Haystack
@ihorperkovskyi That's configurable by users! But, to give some examples:
1. The PR changes a significant amount of logic in billing. This might need a senior engineer who's familiar with the billing stack to get things right.
2. The PR changes an important user flow like onboarding, but the author only ran unit tests and never opened the app to check the flow end-to-end. That violates the team’s rule that high-impact user-facing changes need manual verification.
3. The author changes an agentic workflow, but does not do any A/B testing (e.g. replaying a situation or running an eval) to prove that their changes improve the product
@akshay_subramaniam Makes sense. I like that teams can configure this themselves. The billing/onboarding examples are good, those are exactly the kinds of PRs where “needs human review” should be explicit, not guessed from diff size.
Does Haystack handle monorepo setups where different services need different review rules — e.g. stricter verification requirements for a payments service vs. internal tooling? Or is the ruleset currently global across all repos in a workspace?
Haystack
@sunnyallan The ruleset is per repo currently. If you have a monorepo with very different levels of verification needed, you can always specify "for payments, I want the author to prove that they did X" and Haystack will not apply this bar to internal tooling.
Alternatively, you can always ask for a human review for payments, and the reviewer can easily see "oh, the author did X, so I'm more confident this can merge, although I might want to take a closer look at exactly what they did".
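One way to picture a per-repo ruleset that sets a different bar for different parts of a monorepo is a mapping from path prefix to required evidence. Everything in this sketch is an assumption for illustration: the directory names, the evidence labels, and the `missing_evidence` helper are hypothetical and not part of Haystack.

```python
# Hypothetical per-repo ruleset: path prefix -> evidence the author must show.
REQUIRED_EVIDENCE: dict[str, set[str]] = {
    "services/payments/": {"ran_in_staging", "tested_refund_path"},
    "tools/internal/": set(),  # no extra bar for internal tooling
}


def missing_evidence(changed_files: list[str], provided: set[str]) -> set[str]:
    """Return the evidence still owed for the paths this PR touches."""
    missing: set[str] = set()
    for name in changed_files:
        for prefix, required in REQUIRED_EVIDENCE.items():
            if name.startswith(prefix):
                # Only the requirements for matching prefixes apply.
                missing |= required - provided
    return missing


# A payments change with partial evidence still owes one item.
print(missing_evidence(["services/payments/charge.py"], {"ran_in_staging"}))
```

An empty result means the PR met the bar for every path it touched; a non-empty one could either block the PR or be shown to the human reviewer as context.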
mailX by mailwarm
Any reason as to your inspiration for naming it Haystack?
Haystack
@othman_katim The original product was meant to help users understand how different parts of their codebase relate to each other, e.g. how different functions work together to allow for user authentication. The name came from the fact that, before LLMs, finding code related to a semantic concept was like finding a needle in a haystack.
I guess you could retrofit the current product with the name by saying teams' PR queues look like haystacks and the needles are the PRs they need to review!
The signal-to-noise problem with PRs is real. Does it work across multiple repos or is it scoped to one at a time?
Haystack
@rich_nashawaty This works across multiple repos!
@akshay_subramaniam That's great to hear — exactly what makes it so valuable for bigger orgs. Congrats on the launch!
The configurable ruleset approach is great, but I am curious about the failure mode. If Haystack wrongly approves a PR that should have been flagged, and it breaks something in production, what does recovery look like? Is there a way to trace back what signal it missed, or does the team have to debug that themselves?