Nanonets OCR - Intelligent text extraction using OCR and deep learning

Transform unstructured, human-readable text into structured and validated data using OCR + Deep Learning. Digitize everything from documents and PDFs to number plates and utility meters, and extract the relevant information and key fields.

Looks promising!! Are these ready-to-use APIs, or do you always use custom models?
Thanks! Currently you build your own custom models with a handful of your data. We've seen that one-size-fits-all models don't work out too well.
Hello fellow hunters,
Thank you for stopping by to have a look at Nanonets' OCR product. I'm one of the co-founders of Nanonets and I would like to give a quick overview of it.
We set out to simplify OCR integration into your product, especially to automate manual data entry and validation processes in your pipelines. Through this integration, users can easily build production-ready OCR models. To give you a little bit of background, Nanonets is a machine learning API for developers to integrate cutting-edge ML into their products.
Let me give you a quick walkthrough of this feature.
1. Assume you have a large number of invoices generated every day, and an entire team dedicated to digitizing and extracting key fields from these images.
2. With Nanonets, you can upload these images and teach your model what to look for. For example, for invoices you can build a model to extract the product names and prices.
3. Once your annotations are done and your model is built, integrating it is as easy as copying 2 lines of code :)
I would urge you to take a look at the product webpage. We have built the product with a lot of passion and would love to have your feedback on it. Happy to answer any questions.
Prathamesh
Great job! The product looks awesome. On the landing page you have examples of documents in English and Czech. Does Nanonets work with Latin text only?
Hey, glad you liked it! Nanonets works with most languages, not only the Latin script. For example, we support Mandarin and Japanese characters as well.
cool! 👍
This looks great! I'm assuming that I'll need to set up the model with a set of my pre-existing formatted documents? Is there a minimum number that's needed?
Thanks! 50 documents and you're good to go!
Hey, great product. How long does it take to train a model after uploading the images?
It generally takes 30 mins - 3 hours. Currently, we're really backed up due to the PH traffic :)
Isn't this template specific again? Or have you generalised it?
Hey, it isn't template specific. So if you have, say, 50 sets of different document types containing similar data, we're able to pull it out for you. Hope this helps.
Does this also work for hand written documents?
Hey Shikhar, that's a great question. Given enough examples, we're definitely able to make it work on handwritten text.
Just checked your app! Awesome! Good luck!
The most important questions, which I have not seen addressed and which are a must, are: what is the accuracy (in %) of the OCR output before and after training, and what happens when there is a sudden change in the placement of fields that was not part of the training set? There is already plenty of software that addresses the same problem (some use AI, others use a different approach), but what makes all of them unusable in real-world scenarios, where data coherence is critical, is the failure rate, which forces you to outsource manual review of all OCR output to contractors in third-world countries (as expensive as having the contractor enter the whole dataset).
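One way to put a number on the accuracy question above is per-field exact-match accuracy, measured on a held-out set of documents before and after training. The snippet below is only a minimal sketch under stated assumptions: the function name `field_accuracy`, the field names, and the sample values are all hypothetical, and it does not reflect how Nanonets itself measures or reports accuracy.

```python
from typing import Dict, List


def field_accuracy(
    predicted: List[Dict[str, str]],
    expected: List[Dict[str, str]],
) -> Dict[str, float]:
    """Return per-field exact-match accuracy across paired documents."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")

    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for pred, truth in zip(predicted, expected):
        for field, value in truth.items():
            total[field] = total.get(field, 0) + 1
            # A missing or differing field counts as a failure.
            if pred.get(field, "").strip() == value.strip():
                correct[field] = correct.get(field, 0) + 1

    return {field: correct.get(field, 0) / total[field] for field in total}


# Hypothetical ground truth and OCR output for two invoices.
expected = [
    {"product_name": "Widget", "price": "9.99"},
    {"product_name": "Gadget", "price": "24.50"},
]
predicted = [
    {"product_name": "Widget", "price": "9.99"},
    {"product_name": "Gadget", "price": "24.80"},  # one misread digit
]

scores = field_accuracy(predicted, expected)
print(scores)
```

Running the same comparison on documents whose field placement differs from the training set would give a rough answer to the second half of the question.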
Awesome! Does it work for any specific file format or any image?
It works for most image types. For a few document digitization customers, we have processed PDFs as well. Are you looking for support for a specific file format?