PHBench - Predict the next Series A from a ProductHunt launch
PHBench: the first public benchmark predicting Series A funding from Product Hunt launch signals.
We analyzed 67,292 featured launches over 7 years, linked to 528 verified Series A rounds via Crunchbase. Champion model: 4.7x lift over random. Team size × community engagement is the strongest signal; B2B (API, Payments, Fintech) converts at 3x the baseline; Rank #1 launches raise at 2.2x the rate of unranked ones.
Dataset, code, and baselines are open. Submit at phbench.com and subscribe for weekly high-probability launches.


Replies
Vela Partners
@cyrus_burns We'll know by the end of the launch, but it looks positive: it's a dev tool and it has 4 hunters! These are positive signals :)
Raycast
No analysis on hunter impact? 🥴🥴🥴
Vela Partners
@chrismessina Valid request! Not yet. Hunter impact is actually a signal we'd love to explore. The current model uses maker profiles but not who hunted the product. There's probably real signal there: a hunt from a top hunter likely correlates with stronger launches. Adding it to the wishlist for v2! 🫡
Vela Partners
@chrismessina @ihlamury We should create a dedicated ML feature named chris_messina :) I'm sure it has a positive impact!
@yigit, congrats on the launch. I'm curious to see how the https://www.producthunt.com/products/vela-terminal launch turns out once it ends!
My best Product Hunt launches were driven by, and correlated with, public curiosity. I used those metrics for A/B/C testing, and they made far more sense as a way to test yourself as a founder, or as an early predictor for an idea, when the same amount of effort is spent.
I will definitely try to benchmark using my past launches and give feedback!
Vela Partners
@ozgur_ozkan4 Thank you! My interpretation is that a Product Hunt launch signals that the founder is hustling, and whether she has distribution and a network. So it's one positive signal within a larger feature set.
Decktopus AI
Congrats on the launch! This seems very interesting and exciting.
Vela Partners
@alara_akcasiz Thank you so much!
Vela Partners
@alara_akcasiz thank you!
Now that this model is public, founders will start optimizing for the signals it tracks: bigger teams on paper, coordinated engagement, category shopping. Does publishing the feature set risk corrupting the signal over time?
Also curious where EdTech lands in the category rankings. Congrats on the launch!
Vela Partners
@jared_salois The short answer: some signals are gameable and some aren't.
Vote count on its own turned out to be one of our "noise" features: a high vote count without a strong daily rank doesn't predict much. We think this is because votes reflect marketing effort as much as product quality, and the model learned to look past that. Daily rank is a stronger signal because it's relative: you're competing against every other launch that day, so it's harder to influence. Maker team size and follower count are structural signals that reflect genuine team quality.
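A minimal sketch of why daily rank is relative: the function below ranks each launch only against the other launches from the same day, so the same vote count can land at very different ranks. The (launch_id, day, votes) tuple layout and the sample launches are hypothetical, not the PHBench schema, and ties are simply broken by input order.

```python
from collections import defaultdict


def daily_ranks(launches):
    """Rank launches within each day by votes (1 = most votes that day).

    `launches` is a list of (launch_id, day, votes) tuples; this layout is a
    hypothetical illustration, not the PHBench schema.
    """
    by_day = defaultdict(list)
    for launch_id, day, votes in launches:
        by_day[day].append((launch_id, votes))

    ranks = {}
    for entries in by_day.values():
        # Sort by votes, highest first; the sort is stable, so ties keep input order.
        entries.sort(key=lambda entry: entry[1], reverse=True)
        for position, (launch_id, _) in enumerate(entries, start=1):
            ranks[launch_id] = position
    return ranks


launches = [
    ("a", "2024-03-05", 500),  # 500 votes on a crowded day
    ("b", "2024-03-05", 900),
    ("c", "2024-03-05", 700),
    ("d", "2024-03-09", 300),  # fewer votes on a quiet day
    ("e", "2024-03-09", 120),
]
print(daily_ranks(launches))  # {'b': 1, 'c': 2, 'a': 3, 'd': 1, 'e': 2}
```

Launch a has more votes than d, yet it ranks #3 while d ranks #1, because rank depends on who else launched that day.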
The more surface-level metrics are naturally the ones most susceptible to optimization over time. The durable signals tend to be the ones rooted in team and product fundamentals. That said, this is a real limitation we discuss in the paper, and it's one reason we keep test labels private and plan periodic model refreshes.
On EdTech: it falls under our Education/Productivity bucket. Its conversion rate is close to the baseline: not as strong as Fintech/API (3x) or Developer Tools (1.8x), but solidly above consumer categories.
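For readers wondering what "3x the baseline" means concretely: one straightforward way to compute it is each category's conversion rate divided by the overall base rate. The sketch below assumes a plain list of (category, raised) pairs; the toy numbers are made up and are not PHBench results.

```python
from collections import Counter


def category_lift(records):
    """Return each category's Series A conversion rate divided by the overall base rate.

    `records` is a list of (category, raised_series_a) pairs; the schema is an
    assumption for this sketch. Assumes at least one positive overall.
    """
    base_rate = sum(raised for _, raised in records) / len(records)
    launches = Counter(category for category, _ in records)
    raises = Counter(category for category, raised in records if raised)
    # Counter returns 0 for categories with no raises.
    return {cat: (raises[cat] / n) / base_rate for cat, n in launches.items()}


# Toy data: 3 raises in 200 launches overall, so the base rate is 1.5%.
records = (
    [("Fintech", True)] * 2 + [("Fintech", False)] * 48
    + [("Education", True)] * 1 + [("Education", False)] * 49
    + [("Consumer", False)] * 100
)
lifts = category_lift(records)
print({cat: round(lift, 2) for cat, lift in lifts.items()})
```

Here Fintech converts at 4% against a 1.5% base rate (lift ≈ 2.67), Education at 2% (≈ 1.33), and Consumer at 0.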
Since the dataset is open, you could train an EdTech-specific model and see which signals matter most for that vertical. Would love to see it on the leaderboard: https://huggingface.co/datasets/ihlamury/phbench
Curious what signals it uses: upvote velocity, comment quality, founder background? Most PH launches that blew up felt impossible to predict beforehand. Does it backtest against past launches?
Vela Partners
@imad_elkhafi The model uses 61 features built entirely from public Product Hunt signals. The main categories are i) engagement, ii) ranking, iii) maker profile, iv) timing, and v) category.
We don't use comment text or founder background in this version. We only use structured numerical signals, fully anonymized.
On backtesting: the dataset covers 67,292 featured Product Hunt launches from 2019-2025, with verified Series A outcomes from Crunchbase. We use a strict temporal train/validation/test split so the model never sees future data.
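A strict temporal split can be sketched as two date cutoffs, so every training launch precedes every validation launch, which in turn precedes every test launch. The cutoff dates and the dict-based record layout below are hypothetical; the paper's actual split boundaries aren't given in this thread.

```python
from datetime import date


def temporal_split(launches, val_start, test_start):
    """Split launches by date so no training example comes after validation or test data.

    `launches` is a list of dicts with a `launch_date` key; `val_start` and
    `test_start` are hypothetical cutoffs, not PHBench's actual boundaries.
    """
    train, val, test = [], [], []
    for launch in launches:
        launch_date = launch["launch_date"]
        if launch_date < val_start:
            train.append(launch)
        elif launch_date < test_start:
            val.append(launch)
        else:
            test.append(launch)
    return train, val, test


launches = [
    {"id": 1, "launch_date": date(2019, 6, 1)},
    {"id": 2, "launch_date": date(2021, 2, 14)},
    {"id": 3, "launch_date": date(2023, 9, 30)},
    {"id": 4, "launch_date": date(2024, 11, 5)},
]
train, val, test = temporal_split(launches, date(2023, 1, 1), date(2024, 1, 1))
print([l["id"] for l in train], [l["id"] for l in val], [l["id"] for l in test])
# [1, 2] [3] [4]
```

One wrinkle: with an 18-month outcome window, the most recent launches may not have resolved labels yet, so a real split has to account for label availability as well as launch date.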
The champion ensemble achieves a 4.7x lift over random on the held-out test set. Full methodology is in the paper (arXiv:2605.02974).
You're right that individual outcomes feel unpredictable. The base rate is only 0.78%. The model doesn't claim certainty, but it reliably concentrates true positives in its top predictions.
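Lift over random is commonly computed as precision among the top-k ranked launches divided by the base rate. Here is a minimal sketch under that assumption; the paper may define the top slice differently (for example, as a percentile rather than a fixed k), and the scores and labels are toy values.

```python
def lift_at_k(scores, labels, k):
    """Precision among the k highest-scored launches, divided by the overall base rate.

    `labels` are 1 (raised a Series A) or 0. Assumes at least one positive label.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    precision_at_k = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return precision_at_k / base_rate


# Toy example: 10 launches, 2 raisers (base rate 20%).
scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3, 0.4, 0.05, 0.15, 0.6]
labels = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(lift_at_k(scores, labels, k=2))  # 1 raiser in the top 2: 50% / 20%
```

Under this definition, the "13x lift over random in the top 50" quoted later in the thread would presumably correspond to k = 50.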
@ihlamury 67k launches backtested with a temporal split: that's serious methodology. 4.7x lift over random is a strong result given the 0.78% base rate. Reading the paper now.
Product Hunt
Vela Partners
@curiouskitty On operationalizing the score: We'd recommend treating it as a relative ranking signal rather than a calibrated probability. The model ranks well (13x lift over random in the top 50), but the absolute probabilities shift across market regimes.
For anyone deploying this in practice, we'd suggest re-ranking the current cohort weekly rather than relying on absolute thresholds. Periodic retraining (quarterly, as new Crunchbase labels resolve) would help, and calibrating by sector makes sense given that Fintech/API categories convert at 3x the baseline while consumer categories are well below.
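Re-ranking each week's cohort instead of applying a fixed probability cutoff could look like the minimal sketch below. It assumes "weekly" means ISO calendar weeks and that the output is a top-k shortlist per week; the (launch_id, launch_date, score) layout and the scores are hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_shortlist(predictions, k):
    """Keep the k highest-scoring launches from each ISO week.

    Scores are compared only within a week's cohort, so a shift in absolute
    probabilities between weeks does not empty the shortlist the way a fixed
    cutoff would. `predictions` is a list of (launch_id, launch_date, score).
    """
    cohorts = defaultdict(list)
    for launch_id, launch_date, score in predictions:
        year, week, _ = launch_date.isocalendar()
        cohorts[(year, week)].append((score, launch_id))

    shortlist = {}
    for week_key, cohort in sorted(cohorts.items()):
        cohort.sort(reverse=True)  # highest score first
        shortlist[week_key] = [launch_id for _, launch_id in cohort[:k]]
    return shortlist


predictions = [
    ("a", date(2024, 3, 4), 0.012),   # ISO week 10
    ("b", date(2024, 3, 6), 0.030),
    ("c", date(2024, 3, 8), 0.008),
    ("d", date(2024, 3, 12), 0.004),  # ISO week 11, lower-probability regime
    ("e", date(2024, 3, 13), 0.006),
]
print(weekly_shortlist(predictions, k=2))
# {(2024, 10): ['b', 'a'], (2024, 11): ['e', 'd']}
```

A fixed cutoff of 0.01 would select nothing in week 11, while the weekly ranking still surfaces that cohort's strongest launches. Keying the cohorts on (week, sector) instead of week alone is one way to fold in per-sector calibration.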
On F₀.₅ as primary metric: In VC deal-flow screening, false positives are more expensive than false negatives. A false positive means an analyst spends time on a company that won't raise (scarce capacity wasted). A false negative means missing a deal, but that's recoverable through other sourcing channels. F₀.₅ weights precision twice as heavily as recall, which matches that asymmetry. AP is reported as a threshold-free complement, but F₀.₅ at an optimized threshold is what we'd actually use in a weekly screening workflow.
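For reference, the standard F-beta score is F_β = (1 + β²)·P·R / (β²·P + R), and β = 0.5 tilts it toward precision. The sketch below computes it from confusion counts and picks a cutoff by trying every distinct score, which is one simple way to get an "optimized threshold". The search procedure is an assumption, not necessarily the paper's, and it should run on validation data rather than the test set.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta from confusion counts; beta < 1 weights precision above recall."""
    if tp == 0:
        return 0.0  # also avoids dividing by zero when nothing is flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(scores, labels, beta=0.5):
    """Try each distinct score as a cutoff (flag if score >= cutoff) and
    return the (cutoff, F-beta) pair with the highest F-beta."""
    best = (None, -1.0)
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y)
        value = f_beta(tp, fp, fn, beta)
        if value > best[1]:
            best = (cutoff, value)
    return best


print(f"{f_beta(4, 1, 6):.3f}")            # 0.667 (precision 0.8, recall 0.4)
print(f"{f_beta(4, 1, 6, beta=1.0):.3f}")  # 0.533, plain F1 for the same counts
print(f"{f_beta(5, 995, 0):.3f}")          # 0.006: flag everything, recall = 1
print(best_threshold([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 1]))
```

The third example mirrors the "invest in everything" strategy: recall is perfect, but F₀.₅ stays near zero because precision collapses.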
Vela Partners
@curiouskitty @ihlamury
I'll chime in for F₀.₅.
It's a conscious decision to optimize for true positives among the companies we flag. If a VC fund's picks contain more true positives, there could be more 100x'es.
It's also easy to increase recall by investing in every company, but VC funds have limited capital and receive deals sequentially.
So being right more often when we flag a company is closer to the real world.
Launching soon, and this is exactly the tool I didn't know existed. Benchmarking before going live feels like the thing most founders skip and regret.
How does the relationship mapping work in practice? Does it pull from public data only, or can you connect your own deal flow and CRM? Curious where the edges are.
Vela Partners
@hirogure PHBench works entirely from Product Hunt launch signals (votes, rank, maker profiles, topics, timing) combined with Crunchbase funding outcomes. There's no CRM integration or private deal flow.
The model uses 61 engineered features from these public signals to predict which launches will raise a Series A within 18 months. On our held-out test set, the champion model achieves 4.7x lift over random.
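The "Series A within 18 months" label can be sketched as below. Whether the window is inclusive, how month arithmetic handles short months, and whether rounds announced before the launch are excluded are all assumptions here, not details confirmed in the thread.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def series_a_label(launch_date, series_a_dates, window_months=18):
    """Return 1 if any Series A date falls on or after launch and within the window."""
    deadline = add_months(launch_date, window_months)
    return int(any(launch_date <= d <= deadline for d in series_a_dates))


launch = date(2022, 8, 31)
print(add_months(launch, 18))                       # 2024-02-29
print(series_a_label(launch, [date(2024, 2, 1)]))   # 1: inside the window
print(series_a_label(launch, [date(2024, 3, 15)]))  # 0: raised too late
print(series_a_label(launch, [date(2021, 5, 1)]))   # 0: raised before launch
```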
Connecting private deal flow or CRM data is an interesting direction for future work. It could significantly improve predictions by incorporating signals that aren't publicly visible. For now, though, the benchmark intentionally uses only public data, accessible through Product Hunt APIs and Crunchbase, so results are reproducible and comparable across teams.
@ihlamury The 4.7x lift on held-out data is a strong signal on its own. Makes sense to keep it public-data-only for reproducibility; private CRM integration would make benchmarking much harder to standardize anyway.
Curious how you're weighting PH traction vs. founder signals. We launched a product on PH; day-of momentum was real, but the business outcome turned on team history, market timing, and relationships more than on launch rank. PH is a distribution channel, not a proxy for business durability. Would love to see the false positive rate: how many 500+ vote launches failed to raise, and why. That's the interesting dataset.
Vela Partners
@thekrew You're making an important point. PH traction alone isn't a proxy for business durability. The model captures this too: vote count is one of 61 features, but rank, maker following, category, timing, and their interactions all contribute. A 500-vote consumer app on a Saturday scores very differently from a 500-vote B2B tool that hit #1 on a Tuesday with a well-followed maker team.
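To make that contrast concrete, here is a minimal sketch that turns a launch record into a few features, including a team size × engagement interaction like the one highlighted in the launch post. Every field name, the B2B topic grouping, and each feature definition are hypothetical; these are not the 61 PHBench features.

```python
import math
from datetime import date

# Hypothetical topic grouping for illustration; not PHBench's category mapping.
B2B_TOPICS = {"API", "Payments", "Fintech", "Developer Tools"}


def launch_features(launch):
    """Build a few illustrative features from one launch record (hypothetical schema)."""
    engagement = launch["votes"] + launch["comments"]
    return {
        "log_votes": math.log1p(launch["votes"]),
        "is_rank_1": int(launch["daily_rank"] == 1),
        "is_weekend": int(launch["launch_date"].weekday() >= 5),  # Sat = 5, Sun = 6
        "is_b2b": int(bool(B2B_TOPICS & set(launch["topics"]))),
        "log_maker_followers": math.log1p(launch["maker_followers"]),
        # Interaction term: team size multiplied by log-scaled community engagement.
        "team_x_engagement": launch["team_size"] * math.log1p(engagement),
    }


consumer = {"votes": 500, "comments": 40, "daily_rank": 7,
            "launch_date": date(2024, 3, 9),  # a Saturday
            "topics": ["Social Media"], "maker_followers": 150, "team_size": 1}
b2b = {"votes": 500, "comments": 40, "daily_rank": 1,
       "launch_date": date(2024, 3, 5),  # a Tuesday
       "topics": ["API", "Developer Tools"], "maker_followers": 4000, "team_size": 4}

print(launch_features(consumer))
print(launch_features(b2b))
```

Both launches share the same `log_votes`, so every difference the model sees comes from rank, timing, category, maker following, and the interaction term.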
On false positives: in our dataset, the vast majority of high-vote launches did not raise a Series A. Of the 67,292 featured launches from 2019-2025, only 528 (0.78%) went on to raise one within 18 months. Most 500+ vote posts are in the "didn't raise" bucket. The model's job isn't to eliminate false positives (at this base rate, that's impossible) but to meaningfully concentrate the true positives. The champion model surfaces them at 4.7x the base rate on held-out data.
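The arithmetic behind "impossible to eliminate false positives" is worth spelling out with the figures quoted in this thread. The shortlist size is an illustrative assumption, and applying the 4.7x lift uniformly across a shortlist is a simplification.

```python
# Figures quoted in this thread.
featured_launches = 67_292
series_a_raises = 528
lift = 4.7

base_rate = series_a_raises / featured_launches  # about 0.78%
precision_at_lift = base_rate * lift             # about 3.7%

# Illustrative shortlist size (an assumption, not a PHBench setting).
shortlist = 1_000
expected_hits = shortlist * precision_at_lift
expected_misses = shortlist - expected_hits

print(f"base rate {base_rate:.2%}, precision at 4.7x lift {precision_at_lift:.1%}")
# base rate 0.78%, precision at 4.7x lift 3.7%
print(f"~{expected_hits:.0f} raisers and ~{expected_misses:.0f} non-raisers per {shortlist} picks")
# ~37 raisers and ~963 non-raisers per 1000 picks
```

Even a strong ranking leaves the shortlist dominated by non-raisers; what improves is how many raisers an analyst finds per hour of review.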
The factors you mention (team history, market timing, relationships) are the kind of private signals PHBench intentionally excludes. The benchmark uses only public, structured PH signals so results are reproducible. We think of it as a floor: how far can you get without privileged information? The answer is surprisingly far, but there's clearly a ceiling that only private data can break through.
Full dataset is available on HuggingFace: https://huggingface.co/datasets/ihlamury/phbench