@mubelotix given that it takes days to train this, how do you handle freshness or avoid staleness? GitHub is such a dynamic place that many repos with 150+ stars might also be years old and less maintained…
Nice work! Using stargazer patterns instead of just tags/topics is a smart approach.
Quick one: why the 150-star minimum? Is that where the signal gets reliable enough, or more of a performance thing?
Congrats on the launch! My questions:
1. How are you handling freshness?
2. Is it a SaaS tool? How do you plan to earn from this?
SimRepo
Hey everyone 👋
I’m the creator of SimRepo.
SimRepo adds a “Similar Repositories” section to GitHub, helping you discover related projects without leaving the page.
Technically, it works by embedding repositories into a vector space trained on over 300M GitHub stars. Repositories with similar stargazer patterns end up close to each other, so recommendations are made by finding the nearest neighbors in that space.
SimRepo uses Qdrant, a vector database that handles nearest-neighbor lookups efficiently. This lets SimRepo return relevant results fast, without overloading your machine.
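To make the nearest-neighbor idea above concrete, here is a minimal sketch in plain Python. It is not SimRepo's code: the repository names and 3-dimensional vectors are made up, cosine similarity is an assumed metric, and a brute-force scan stands in for the indexed search that a vector database such as Qdrant would perform.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, embeddings, k=2):
    """Return the k repo names whose vectors are most similar to the query repo."""
    scored = [
        (name, cosine_similarity(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]

# Hypothetical toy vectors; real embeddings would be learned from star data.
toy_embeddings = {
    "web-framework-a": [0.9, 0.1, 0.0],
    "web-framework-b": [0.8, 0.2, 0.1],
    "ml-library": [0.0, 0.1, 0.9],
    "http-client": [0.7, 0.3, 0.0],
}

print(nearest_neighbors("web-framework-a", toy_embeddings))
```

With these toy vectors the two web-related repos rank ahead of the unrelated one, which is the behavior the description relies on.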
The extension is open source (GPL v3) and available for all major browsers.
Would love feedback on:
- The quality of recommendations
- Performance across different browsers
- Ideas for improving the homepage and recommendations
Thanks for checking it out! I'm happy to answer any questions!
Raycast
SimRepo
@chrismessina
That’s a great point, and yes, freshness is definitely one of the trickier challenges. I’m planning to partly refresh the dataset on a monthly cycle. That way, completely new repos get picked up quickly, and every existing entry gets updated at least once a year. Until recently I didn’t have the infrastructure to support that kind of regular retraining, which is why updates haven’t been as frequent as I’d like. Now that the pieces are in place, I’m working toward making those monthly refreshes the standard.
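One way such a partial monthly refresh could be organized is sketched below. This is an assumption for illustration only: the thread does not say how the dataset is split, and `refresh_slot` and `repos_to_refresh` are hypothetical names. Each known repo is assigned a stable slot out of 12, so it comes up once a year, while new repos are always included.

```python
import zlib

def refresh_slot(repo_name, slots=12):
    """Map a repo name to a stable slot in 0..slots-1 (CRC32 is deterministic across runs)."""
    return zlib.crc32(repo_name.encode("utf-8")) % slots

def repos_to_refresh(known_repos, new_repos, month_index, slots=12):
    """Repos due this month: the known repos in this month's slot, plus every new repo."""
    due = [r for r in known_repos if refresh_slot(r, slots) == month_index % slots]
    return sorted(set(due) | set(new_repos))

# Hypothetical repo names, only to show the call shape.
known = ["owner/repo-%d" % i for i in range(50)]
print(repos_to_refresh(known, ["new/repo"], month_index=3))
```

Over any 12 consecutive months, every known repo lands in exactly one batch, which matches the "at least once a year" goal.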
Raycast
@mubelotix nice — what kind of infra is necessary to do this kind of analysis? Are you using that analysis for other purposes?
SimRepo
@chrismessina You need around 70 GB of RAM and a decent CPU. That's not much for any company, but I'm only a student. There are probably plenty of other use cases for this, but I haven't dug into them yet. Feel free to suggest ideas!
Raycast
@mubelotix also, how does this relate to GitRec (built on Gorse) and GitHub Recommender?
SimRepo
@chrismessina Great question! I wasn't the first one to try this, but everyone else has been using repo metadata for recommendations. That includes tags, keywords, languages, dates... SimRepo focuses on user interactions instead (stars, forks, watches). I haven't seen anyone do this. Maybe they assumed it wasn't possible to crawl enough of GitHub.
SimRepo
@vouchy Great question. Repositories with fewer than ~150 stars tend to produce much noisier patterns. Below that threshold, the stargazer graph is often too sparse to reliably extract meaningful similarity signals, so the recommendations degrade pretty quickly.
On the other hand, there are already over 300,000 repos with 150+ stars, and loading and processing all of them still takes several days, so the cutoff also helps keep the system computationally manageable for now.
That said, I’m actively working on reducing this limitation.
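As a rough illustration of the cutoff described above, the filter could look like the sketch below. The function name and the sample star counts are hypothetical; only the 150-star threshold comes from the thread.

```python
MIN_STARS = 150  # cutoff mentioned in the thread

def eligible_for_embedding(star_counts, min_stars=MIN_STARS):
    """Return the repos with at least min_stars stars, sorted by name."""
    return sorted(name for name, stars in star_counts.items() if stars >= min_stars)

# Made-up star counts for illustration.
sample = {"big/project": 12000, "mid/project": 150, "tiny/project": 42}
print(eligible_for_embedding(sample))
```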
Are they comparing based on the repository’s README and technology set?
Makers Page
Nice idea. I lose hours hopping between repos. The stargazer angle is neat. How does it do with niche or new repos with few stars? Trying it on Firefox tonight on my beat-up laptop. Open source helps.
SimRepo
@alexcloudstar Thanks! For niche or very new repos, the signal is weaker but you can still stumble onto interesting finds. The extension itself should run smoothly on your laptop since the heavy lifting happens on a Qdrant server. Open-source is the way!
Looks slick! Just added the extension and starred the repo.
SimRepo
@tleyden Thank you!
@mubelotix Super easy to get working!
It would be cool if it could also index "star challenged" repos, though I get that the storage costs would go up.
SimRepo
@tleyden Thank you for your input, I will definitely look into getting that working. Until then, you can hide this notice in the settings if you’d like.
@mubelotix No problem. Yes, it might be cool if, the first time I stumbled across a repo with fewer than 150 stars, it queued it up for indexing. That could at least limit the indexed repos with < 150 stars to those that your users are looking at.
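The on-demand indexing idea suggested above could be sketched roughly as follows. This is a hypothetical illustration of the suggestion, not something SimRepo implements: `IndexingQueue`, `record_visit`, and `next_batch` are made-up names, and the queue lives in memory only.

```python
from collections import deque

MIN_STARS = 150  # repos at or above this are assumed to be indexed already

class IndexingQueue:
    """Queue low-star repos for indexing the first time a user visits them."""

    def __init__(self):
        self._pending = deque()
        self._seen = set()

    def record_visit(self, repo, stars):
        """Queue the repo if it is below the cutoff and has not been queued before."""
        if stars >= MIN_STARS or repo in self._seen:
            return False
        self._seen.add(repo)
        self._pending.append(repo)
        return True

    def next_batch(self, size):
        """Pop up to `size` repos in first-visited order."""
        batch = []
        while self._pending and len(batch) < size:
            batch.append(self._pending.popleft())
        return batch

queue = IndexingQueue()
queue.record_visit("small/repo", 40)    # queued
queue.record_visit("small/repo", 41)    # ignored, already queued
queue.record_visit("popular/repo", 900) # ignored, above the cutoff
print(queue.next_batch(10))
```

Deduplicating on first visit keeps the extra indexing work proportional to the repos users actually look at, which is the point of the suggestion.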
Triforce Todos
How does it handle very new repos with few stars? Do recommendations still work well, or is it more tuned to popular projects?
SimRepo
@abod_rehman Not very well, to be honest. Very new or low-star repos usually get their stars from a single source, which creates a skewed signal. The model ends up grouping them by who shared them rather than by what they actually are. Results for repos around 150 stars are noticeably lower quality.