All activity
Victor Strandmoe started a discussion
Democratizing dataset influence on model performance
AI teams are data-constrained, not model-constrained, and waste millions retraining models on data with little or negative impact. They spend most of their budget collecting, processing, and labeling data without knowing what actually improves performance. This leads to repeated failed retraining cycles, wasted GPU runs, and slow iteration because teams lack insight into which datasets improve...
Victor Strandmoe left a comment
Hi. We built a tool for this, Dowser, which uses guided influence benchmarking. It's meant as a one-stop shop for measuring the impact of training data on LMs. Note: it is limited to LMs, not LLMs, at the moment. Feel free to give it a shot.
Dowser doesn't just clean or label data. It directly trains and benchmarks models to show which datasets help or hurt performance. Using influence-guided training, it produces confident influence scores in minutes on commodity hardware across Hugging Face datasets. Teams get precise guidance on which data actually moves the model before spending GPU budget.
After the benchmarks are completed, you can use the app to upload your model to Hugging Face.
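
For readers new to the idea of scoring datasets by their effect on a benchmark, here is a rough sketch of one simple way to think about it. This is not Dowser's method (the post does not describe its internals). It uses a plain leave-one-dataset-out ablation on a toy one-parameter model, and every name in it (`fit_slope`, `benchmark`, `leave_one_out_influence`, the dataset labels) is made up for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def fit_slope(points: List[Point]) -> float:
    """Least-squares slope for y = a * x (toy stand-in for model training)."""
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / sxx if sxx else 0.0


def benchmark(slope: float, heldout: List[Point]) -> float:
    """Negative mean squared error on held-out data, so higher is better."""
    return -sum((y - slope * x) ** 2 for x, y in heldout) / len(heldout)


def leave_one_out_influence(
    datasets: Dict[str, List[Point]], heldout: List[Point]
) -> Dict[str, float]:
    """Influence of a dataset = score with everything - score without it.

    Positive means the dataset helps the benchmark; negative means it hurts.
    """
    all_points = [p for pts in datasets.values() for p in pts]
    full_score = benchmark(fit_slope(all_points), heldout)
    influence = {}
    for name in datasets:
        rest = [p for other, pts in datasets.items() if other != name for p in pts]
        influence[name] = full_score - benchmark(fit_slope(rest), heldout)
    return influence


# Hypothetical toy data: the held-out set follows y = 2x.
datasets = {
    "clean": [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    "noisy": [(1.0, 5.0), (2.0, -1.0)],
    "mislabeled": [(1.0, -2.0), (2.0, -4.0)],
}
heldout = [(1.0, 2.0), (4.0, 8.0)]

scores = leave_one_out_influence(datasets, heldout)
for name, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:+.3f}")
```

In this toy setup the "clean" dataset gets a positive score and the other two get negative scores. Full retraining per dataset like this gets expensive quickly on real models, which is the cost that influence-based tooling aims to avoid.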

Dowser: Find the right data, optimize training, ship models fast
Victor Strandmoe left a comment
We built Dowser because AI teams keep wasting time and GPU budget retraining on data that doesn't help or that actively hurts models. Dowser uses influence-guided training to show which datasets actually improve performance before you retrain. It benchmarks models directly and gives results in minutes on cheap hardware. Happy to answer questions about influence methods, dataset selection, or where this...

