Why I Built a YC Database at 2AM

I was prepping my YC application. I wanted to know: has anyone already done this idea?

YC's search gave me nothing useful. So I spent a weekend scraping their Algolia API, built a local database, ran embeddings against my pitch... and found 3 companies that had tried it.

Two failed. One pivoted and got acquired.

That search saved me from wasting a year. So I shipped it for everyone:

What would you do with over 6k YC pitches in a searchable vector database?
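For anyone curious what the "ran embeddings against my pitch" step looks like in principle, here is a minimal sketch. It is not the actual ExploreYC code. It assumes a toy bag-of-words vector in place of a real embedding model, and the company names and descriptions are made up for illustration. The idea is the same either way: turn the pitch and each company description into vectors, then rank by cosine similarity.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(pitch, companies, top_k=3):
    """Return the top_k (name, score) pairs most similar to the pitch."""
    query = embed(pitch)
    scored = [(cosine(query, embed(desc)), name) for name, desc in companies.items()]
    scored.sort(reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:top_k]]


# Hypothetical records; a real index would hold the scraped descriptions.
COMPANIES = {
    "ExampleCo A": "link in bio page builder for creators",
    "ExampleCo B": "payroll software for small restaurants",
    "ExampleCo C": "tools for creators to sell digital products",
}

if __name__ == "__main__":
    for name, score in search("a link in bio tool for creators", COMPANIES):
        print(name, score)
```

With a real embedding model you would swap out `embed` and keep the ranking logic; word counts only match exact terms, which is why they are a placeholder here.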

Replies

Could relying heavily on similarity signals unintentionally discourage genuinely differentiated ideas?

What made you decide to build this instead of using existing YC search tools?

Search on the YC page is very limited.

When did you realize YC search wasn't enough?

When I started validating my ideas!

YC has probably accumulated one of the largest datasets of startup ambition ever assembled. Searching it properly feels surprisingly underexplored.

That's what I think as well. ExploreYC is open source, so feel free to contribute!

This is a very practical founder problem.

Before spending months building, it’s useful to know who already tried a similar idea, what direction they took, and whether they failed, pivoted, or found a niche.

I’d use a searchable YC database to compare positioning, find adjacent markets, and avoid building something that sounds new only because I haven’t searched deeply enough yet.

the top use case has to be reverse founder therapy. paste in your pitch, get back a list of the 4 people who already tried it plus how they failed. saves you 2 years and 200k. also free trauma.

Tried it with our space (link in bio) and noticed something that might help improve the search.

I intentionally wrote a description very similar to an existing YC profile, but the search didn't return Beacons (YC W19), even though it's probably one of the most relevant matches. Instead, I got a few less relevant results.

Might be worth looking into the embedding or retrieval pipeline, because I'd expect Beacons to show up near the top for that query.

Really cool idea though, I can definitely see this being useful for founders. (Attaching a screenshot from the YC Startup Directory showing Beacons.)

This would have saved me hours of random searching. Having everything searchable in one place makes research far less painful.

ExploreYC is open-sourced, so you can take advantage of the data for your own tools/services: