Michael Huang

DataSalon.ai - Discover datasets worth training on.

A dataset discovery platform that aggregates and AI-enriches datasets from 40+ open data sources for AI/ML practitioners.

Hey Product Hunt! 👋 I'm the maker of DataSalon, a dataset discovery platform for ML practitioners.

The problem: Finding the right training dataset takes hours. You open six browser tabs, read inconsistent descriptions, can't tell which is trustworthy, and still miss the ones you didn't know existed. And when you truly can't find it, there's nowhere to ask.

DataSalon solves this in five ways:

🌐 One place for 40+ platforms: Kaggle, Hugging Face, Zenodo, data.gov, Papers with Code, and 35+ more, aggregated and deduplicated. No more tab-juggling.

⭐ Quality you can trust: Every dataset is scored on 4 dimensions (Description · Source · Reputation · Access). Spam is filtered out and duplicates are merged, so you only see what's earned its spot.

🧠 AI that fills the gaps: Raw metadata is messy and uneven. Our AI pipeline normalizes every dataset into a clear title, a structured summary, and a unified taxonomy, so 300K+ datasets finally speak the same language.

🧭 From search to discovery: Keyword search is table stakes. What you can't search for are the combinations, such as "Synthetic weather data for autonomous driving perception" or "Multi-lingual legal contracts with annotations". We surface those as curated Topics, so you find what you didn't know you needed.

🤝 A community for data supply & demand: Can't find it? Post a request. Have it? Post an offer. The community bridges the gap our aggregation can't cover.

What's next: quality radar charts, dataset subscriptions, and an agent interface so your AI assistants can query DataSalon directly.

We're shipping in beta, so feedback and "this platform is missing" shouts are all very welcome 🙏

🔗 https://datasalon.ai