discussion
Ivan
Maker
Founder of Icons8
Bias in machine learning is a serious topic. As a producer of AI imagery, we pay special attention to these issues. Currently, most datasets used in industry and academia are extremely biased, and sadly, we are now seeing the consequences of those poorly constructed inputs. As they say, garbage in, garbage out. We have recently been working with universities and companies to help solve these issues with synthetic data. There is still more work to be done to improve our own generation capabilities, but we believe this is a firm step in the right direction.

Balanced or gap-filling datasets
We can generate full datasets that are evenly distributed across race and gender, or we can provide supplementary data that you can use to even out your existing data.

Synthetic images
We have specifically trained a new machine learning model to ensure that the photos we produce are not heavily biased towards any race or gender. Packed with detailed metadata and options for customizable backgrounds, these safe, synthetic datasets afford the utmost flexibility. No likeness rights, no royalties, no BS.

Real images
Does your training work require more than faces? Then don't reinvent the wheel when you can use ours! We have a professional photography team that has captured a huge library of high-quality, licensed training images in our photo studio. Save the time and money of sourcing models, preparing sets, and hiring photographers. These datasets display a wide variety of poses, facial expressions, and models, and are available with masked backgrounds.

Features
- 175k+ high-resolution real studio portrait photos
- Up to 500k safe-to-use synthetic face photos
- Specific datasets covering balanced races, ages, and emotions
- Full characteristic, position, and image metadata included
- Available with backgrounds included or as precisely masked transparent PNGs

We are also happy to introduce a dataset that is free for academic use!
✨ If you are interested in much larger datasets, please contact us and we will be happy to work with you directly.
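For readers wondering what "gap-filling" could look like in practice, here is a minimal sketch of one way to work out how many supplementary images each group would need so that every group matches the largest one. The function name `gap_fill_counts` and the group labels are hypothetical and purely illustrative; this is not Icons8's actual tooling, and equal group sizes are only one possible definition of "balanced".

```python
from collections import Counter


def gap_fill_counts(labels):
    """Return, per group, how many extra samples are needed to match the largest group.

    `labels` is any iterable of group labels, one per existing image.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())  # size of the largest existing group
    return {group: target - n for group, n in counts.items()}


# Hypothetical labels for an existing, unbalanced dataset
existing = ["group_a"] * 500 + ["group_b"] * 200 + ["group_c"] * 50
print(gap_fill_counts(existing))
```

The resulting per-group counts could then serve as a rough shopping list for supplementary images, under the assumption that your existing data already carries reliable group labels.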
Creating datasets to improve ML models wasn't exactly our intent when starting out last year, but it has been something customers have consistently asked about. I think it bears extra highlighting that by no means do we think we are 'solving bias' with these datasets. Bias exists in every ML model out there, and it is the responsibility of engineers and data scientists to actively avoid it. We are working on capturing significantly more training data to enable the generation of races and physical characteristics that we currently cannot produce. We shoot our own training data, and sadly our photo studio has been closed for much of this year due to the pandemic. That said, we are confident that these datasets can still be very beneficial, especially when used in conjunction with existing data sources. During the beta period we worked with many researchers and are now eager to extend this product to a larger audience. Aside: we launched the Anonymizer last week and have loved seeing all of the entertaining results that have been shared around.
Looks interesting