Leading quant VC: Turning art into science through AI
This is the 3rd launch from Vela Partners.

PHBench
Launching today
PHBench: the first public benchmark predicting Series A funding from Product Hunt launch signals.
We analyzed 67,292 featured launches over 7 years, linked to 528 verified Series A rounds via Crunchbase. Champion model: 4.7x lift over random. Team size × community engagement is the strongest signal; B2B (API, Payments, Fintech) converts at 3x baseline; Rank #1 raises at 2.2x unranked.
Dataset, code, and baselines open. Submit at phbench.com and subscribe for weekly high-probability launches.

Vela Partners
@rajiv_ayyangar, thank you so much for hunting us!
Hey PH Community 👋
We're Yagiz, a Senior Technical Product Manager at Amazon and independent researcher, and Yigit, co-founder and GP of Vela Partners. Today, we're launching PHBench in collaboration with the University of Oxford (Ben Griffin and Rick Chen) and Vela Partners, the leading quant VC.
And yes, the irony of launching a Product Hunt benchmark on Product Hunt is completely intentional 🙂
Here's the origin story. We kept asking a question nobody had answered: Can you predict which Product Hunt launches will raise Series A funding, based solely on what you see on launch day (votes, rank, team size, category, timing)?
So we built PHBench. We collected 67,292 featured PH launches going back to 2019, matched them to Crunchbase funding records, and identified 528 verified Series A raises within 18 months. Seven years of data. Every featured launch.
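(For anyone who wants to reproduce the labels: conceptually it's a windowed join between launches and funding rounds. A minimal sketch is below; the column names are illustrative rather than the actual PHBench schema, and the real pipeline also handles the PH-to-Crunchbase entity matching that this glosses over.)
```python
import pandas as pd

# Toy rows purely for illustration; column names are NOT the actual PHBench schema,
# and the real pipeline also handles PH <-> Crunchbase entity matching.
launches = pd.DataFrame({
    "company_id": ["a", "b", "c"],
    "launch_date": pd.to_datetime(["2021-03-02", "2022-06-14", "2023-01-10"]),
})
rounds = pd.DataFrame({
    "company_id": ["a", "c"],
    "series_a_date": pd.to_datetime(["2022-01-15", "2025-04-01"]),
})

# Positive label = a verified Series A that closed within 18 months of launch day.
merged = launches.merge(rounds, on="company_id", how="left")
within_window = (
    merged["series_a_date"].notna()
    & (merged["series_a_date"] > merged["launch_date"])
    & (merged["series_a_date"] <= merged["launch_date"] + pd.DateOffset(months=18))
)
merged["raised_series_a"] = within_window.astype(int)
print(merged[["company_id", "raised_series_a"]])
```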
Three findings I think this community will find interesting:
→ The signals work. Our model is 4.7x better than random. Statistically significant.
→ The strongest predictor isn't votes alone. It's team size × community engagement together. A large coordinated team achieving high traction is more predictive than either signal alone.
→ B2B categories convert at 3x the baseline rate. API, Payments, Fintech. If you launch a developer tool on a Tuesday with a big team and high engagement, that's a strong signal.
We also tested three frontier Gemini models on the same task. The most capable model performed the worst. Better reasoning doesn't help with pure numbers.
The dataset is available on HuggingFace. The leaderboard is live. The code is public. Can you beat our baseline?
The paper is on arXiv and has been submitted to the NeurIPS 2026 Evaluations & Datasets Track.
Would love your feedback — especially from anyone who's launched on PH and gone on to raise Series A. You're in our dataset :)
Vela Partners
@curiouskitty On operationalizing the score: We'd recommend treating it as a relative ranking signal rather than a calibrated probability. The model ranks well (13x lift over random in the top 50), but the absolute probabilities shift across market regimes.
For anyone deploying this in practice, we'd suggest re-ranking the current cohort weekly rather than relying on absolute thresholds. Periodic retraining (quarterly, as new Crunchbase labels resolve) would help, and calibrating by sector makes sense given that Fintech/API categories convert at 3x the baseline while consumer categories are well below.
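To make that concrete, the weekly step is just a relative sort within the current cohort, capped by analyst capacity, with no absolute probability cutoff. A minimal sketch with made-up scores:
```python
import pandas as pd

# Toy cohort: every featured launch from the current week, scored by whatever
# model you deploy. The scores here are made up; the point is the relative
# ranking, not an absolute threshold.
cohort = pd.DataFrame({
    "product": ["tool_a", "tool_b", "tool_c", "tool_d", "tool_e"],
    "score":   [0.31, 0.07, 0.55, 0.22, 0.48],
})

top_k = 3  # analyst capacity for the week
shortlist = cohort.sort_values("score", ascending=False).head(top_k)
print(shortlist)
```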
On F₀.₅ as primary metric: In VC deal-flow screening, false positives are more expensive than false negatives. A false positive means an analyst spends time on a company that won't raise (scarce capacity wasted). A false negative means missing a deal, but that's recoverable through other sourcing channels. F₀.₅ weights precision twice as heavily as recall, which matches that asymmetry. AP is reported as a threshold-free complement, but F₀.₅ at an optimized threshold is what we'd actually use in a weekly screening workflow.
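For anyone reproducing the evaluation, both metrics are standard scikit-learn calls; the arrays below are toy values, not benchmark results.
```python
from sklearn.metrics import fbeta_score, average_precision_score

# Toy labels and predictions, just to show the metric calls.
y_true  = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred  = [0, 0, 1, 0, 0, 0, 1, 1]                    # hard predictions at a chosen threshold
y_score = [0.1, 0.2, 0.9, 0.3, 0.4, 0.1, 0.7, 0.8]    # ranking scores from the model

# beta < 1 emphasizes precision over recall; beta = 0.5 matches the asymmetry above.
print("F0.5:", fbeta_score(y_true, y_pred, beta=0.5))
# Threshold-free complement.
print("AP:  ", average_precision_score(y_true, y_score))
```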
Interesting. Most people assume raw upvotes are the proxy for quality, so the finding that team size × community engagement is a stronger signal than votes alone is genuinely counterintuitive. Have you looked at whether solo founders who hit high engagement are penalized by this model? Do they show up as a distinct cluster? Would love to see how the signal degrades for truly first-time founders vs. repeat ones. Incredible dataset, congrats on getting years of data cleaned!
Vela Partners
@artstavenka1 Thanks! And yes, the votes finding surprises everyone! Raw upvote count is actually one of our four "noise" features: it has high model importance but near-zero conditional lift. The reason: viral launches with 500+ votes are often consumer products riding a wave of hype that doesn't translate to institutional funding. The strongest signal is votes combined with daily rank. A #1 launch with high engagement raises Series A at 3.5x the baseline, but votes without a strong rank are noise. Maker team size is #2 in importance, and maker follower count is #6 but carries a higher lift (2.4x vs. 1.2x for team size alone), suggesting that who's on the team matters more than how many people are on it.
On solo founders: we haven't done the cluster analysis you're describing, but the data is suggestive. Solo founders (maker_count = 1) underperform teams of 2-3, with a modest 1.2x lift for teams vs. baseline. But the bigger signal is follower count: a solo founder with a large following performs fine; a solo founder with no following is where the model gets skeptical.
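(For clarity, "lift" in these replies is just the Series A rate within a slice divided by the overall base rate. A minimal sketch, with illustrative column names that may not match the released schema:)
```python
import pandas as pd

def conditional_lift(df: pd.DataFrame, mask: pd.Series, label: str = "raised_series_a") -> float:
    """Series A rate inside the slice defined by `mask`, divided by the base rate."""
    base_rate = df[label].mean()
    slice_rate = df.loc[mask, label].mean()
    return slice_rate / base_rate

# Example slice (column names are illustrative, not the actual PHBench schema):
# solo founders with a large following.
# lift = conditional_lift(df, (df["maker_count"] == 1) & (df["maker_followers"] > 1000))
```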
We don't currently distinguish first-time vs repeat founders. That's a great feature idea, but maker IDs are redacted in the PH API for privacy, so it's not something participants can compute today. It would require a partnership with Product Hunt to access that signal. @rajiv_ayyangar what do you think :)?
If you're curious about digging in, we'd love to see you submit a model :) You can get the full dataset here: https://huggingface.co/datasets/ihlamury/phbench
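If you want a quick start, the dataset should load with the standard `datasets` library; check the dataset card for the exact configs and split names.
```python
from datasets import load_dataset

# Pulls PHBench from the Hugging Face Hub; configs and split names may differ,
# so check the dataset card (a "train" split is assumed below).
ds = load_dataset("ihlamury/phbench")
print(ds)              # available splits and features
print(ds["train"][0])  # first example, assuming a "train" split exists
```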
Are you using only launch-day signals, or do you include post-launch traction like follows and comments over the first week?
Vela Partners
@karimbenkeroum The core signals are captured on launch day (votes, comments, daily rank, maker profiles, topic tags).
One caveat: maker follower counts were scraped in 2026, not at launch time, so for older launches, they reflect post-funding growth. It's a limitation we document in the paper.
Adding richer post-launch features like 7-day comment growth or follow-on engagement would be a great extension. We think there's a lot of untapped signal there.
Full details on the feature set are in Section 5 of the paper: arxiv.org/abs/2605.02974
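To make the launch-day boundary concrete, a single example looks roughly like the dict below; the field names are placeholders (the real schema is in Section 5 of the paper), and the follower count carries the 2026 scrape-time caveat above.
```python
# Illustrative launch-day feature row; field names and the full feature list
# are in Section 5 of the paper, so treat these as placeholders.
example_launch = {
    "votes": 412,               # upvotes on launch day
    "comments": 57,             # comment count on launch day
    "daily_rank": 1,            # rank among that day's featured launches
    "maker_count": 4,           # size of the maker team
    "maker_followers": 1850,    # scraped in 2026, not at launch time (known caveat)
    "topics": ["API", "Developer Tools"],
}
print(example_launch)
```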
Vela Partners
Been quietly working on this with Yagiz, Yigit and Rick for a while.
While I mostly focus on using founder profiles to predict raises, PHBench tackles the same question from the product side.
Have a go at the leaderboard if you fancy; the data's on HuggingFace.
Vela Partners
@bengriffin3 thank you for your valuable contributions. Excited to incorporate Product Hunt signals into the founder prediction pipeline.
Vela Partners
Really excited to bring PHBench to you all! By extending short-term launch signals on Product Hunt to predict long-term funding outcomes, we help identify outlier products that are truly valuable in the VC environment. We think it will be greatly beneficial to the Product Hunt community.
Come beat our baseline and get to the top of the leaderboard!
Vela Partners
@rick_chen5 excited to see more predictors from the Product Hunt community join us! :)
Vela Partners
So excited to see this live! This has been a labor of love: collecting data, running 100+ experiments, and testing LLMs against good old gradient boosting.
The leaderboard is open. If you can beat us, you're the new champion. Who's in?
Vela Partners
@ihlamury looking forward to seeing some competition soon!!!