Radically efficient machine teaching

Prodigy is a new annotation tool for creating training and evaluation data for machine learning models. It comes with an extensible, self-hosted back-end, active learning-powered models that update as you annotate, and a modern web application that helps you stay focused.

8 Reviews · 5.0/5
Thanks for hunting Prodigy – happy to share it with the ProductHunt community today! ✨ My co-founder Matt and I are mostly known for our open-source software – we're the makers of spaCy, a popular Python library for Natural Language Processing. Prodigy is our first commercial product. We started beta testing it earlier this year, and hundreds of testers and five versions later, v1.0 is finally ready.

Prodigy addresses one of the biggest problems AI developers face: sooner or later, you'll always need labelled data – whether it's for training a new model, improving an existing model's predictions or just for evaluation. And in many cases, you won't know whether an idea works until you try it (and usually, that involves writing lengthy annotation manuals and scheduling too many meetings). Prodigy lets developers and data scientists iterate on both the code *and* the data by putting the model in the loop to suggest the most relevant examples, and reducing annotation to a simple, binary decision: accept or reject.

To see Prodigy in action, you can try the live demo: Or read more about the philosophy here:

Btw: Prodigy is a downloadable tool that you extend with code – it runs on your own hardware, and no data ever needs to leave your servers.
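The model-in-the-loop idea above can be sketched in a few lines of plain Python. This is not Prodigy's actual API – all names here (`uncertainty_stream`, `annotate`, `decide`, `update`) are hypothetical – just an illustration of the pattern: sort examples by model uncertainty, show the most uncertain first, and update the model after every binary decision.

```python
def uncertainty_stream(examples, score):
    """Yield examples whose model score is closest to 0.5 first,
    i.e. the ones the model is least sure about."""
    return sorted(examples, key=lambda ex: abs(score(ex) - 0.5))

def annotate(stream, decide, update):
    """Binary annotation loop: each example is accepted or rejected,
    and the model is updated immediately with that decision."""
    for ex in stream:
        answer = decide(ex)   # "accept" or "reject"
        update(ex, answer)    # the model learns from the decision
        yield {"example": ex, "answer": answer}
```

Because the model is updated as you go, the stream of suggestions keeps shifting toward whatever the model is currently most unsure about.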
@_inesmontani congrats on shipping Prodigy! Looks really neat :) Is it mostly focused around NLP type data or can I also use it to annotate e.g. images?
@metakermit Thanks! The built-in models are focused on NLP, since this is what we know best and where we have the best answers. However, Prodigy comes with interfaces for image classification, object detection and image segmentation. There's also a built-in recipe that lets you test the image annotation using one of the YOLOv2 models for object detection. See here for examples: You can easily plug in your own models using custom recipes (simple Python functions). All you need is a method that updates the model, and one that assigns predictions to incoming examples. As we started testing the image capabilities, we realised that the results were actually very promising – so a robust, built-in image model is definitely high on our list of future features 😊
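The two hooks mentioned above – one function that assigns predictions to incoming examples and one that updates the model – might look roughly like this. The function names and the `ToyModel` are made up for illustration; any object exposing a predict and a train method would slot into this shape.

```python
class ToyModel:
    """Stand-in for a real model; anything with predict() and train() works."""
    def predict(self, text):
        # hypothetical scoring: label the text "CAT" if it mentions a cat
        return ("CAT", 0.9) if "cat" in text else ("OTHER", 0.2)

    def train(self, accepted, rejected):
        self.n_updates = len(accepted) + len(rejected)

def predict_stream(stream, model):
    """Attach a prediction and score to each incoming example."""
    for ex in stream:
        label, score = model.predict(ex["text"])
        yield {**ex, "label": label, "score": score}

def update(model, answers):
    """Update the model from the annotator's accept/reject decisions."""
    accepted = [a for a in answers if a["answer"] == "accept"]
    rejected = [a for a in answers if a["answer"] == "reject"]
    model.train(accepted, rejected)
```

Wrapping a custom model this way is what lets the same annotation front-end drive text, image or any other kind of model.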
@_inesmontani awesome, thanks! Now I really want to try it out and throw my meticulously organised image folders and text files out of the window :)
@_inesmontani Congrats on shipping! It is a truly excellent tool to get started with NLP! We participated in the beta, and it is amazing what you can accomplish in a day. And, special mention for the excellent support!
@redevries Thanks, it's really nice to hear that Prodigy's already making a difference. And thanks for taking part in the beta! Shipping Prodigy so soon definitely wouldn't have been possible without our beta testers and the discussions on the support forum 🙏
Awesome tool! Congrats on shipping it. How hard is it to train it on a new language? I am training a classifier to find similar documents, and I am having a hard time labeling the data. This tool might be what I was looking for!
@firasalmanna Training on new languages should be no problem. It's easy to train word vectors, and you can create a term list to get you started, like in the insult classifier example. Starting entity recognition from scratch in a new language is a bit harder, but for finding similar documents you should have no trouble!
@firasalmanna To add to @honnibal's comment, here are the relevant docs for training vectors and creating terminology lists with Prodigy: If the language you're working with is already supported by spaCy out-of-the-box, you can use this as the basis for your model. Otherwise, you can always add your own, or start with the 'xx' Language class, which is language-neutral and only includes the most basic tokenization rules. Once you have vectors and a terminology list, you can use the textcat.teach recipe to start training your classifier. We've also recorded a video tutorial that shows the whole end-to-end workflow:
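As a rough picture of how a terminology list can bootstrap a classifier before any model exists: score each text by how many seed terms it contains, and surface the highest-scoring texts for annotation first. This is only a sketch of the idea, not Prodigy's `textcat.teach` recipe itself, and the seed terms and function names are invented for the example.

```python
def seed_score(text, terms):
    """Fraction of seed terms present in the text – a crude bootstrap
    signal for surfacing likely-positive examples to annotate."""
    tokens = set(text.lower().split())
    return len(tokens & terms) / len(terms)

def rank_for_annotation(texts, terms):
    """Show the texts matching the most seed terms first."""
    return sorted(texts, key=lambda t: seed_score(t, terms), reverse=True)
```

Once a first batch is labelled this way, the trained model's own uncertainty takes over as the signal for which example to show next.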