Nanonets OCR - Intelligent text extraction using OCR and deep learning

7yr ago

Transform unstructured, human-readable text into structured and validated data using OCR + Deep Learning to extract relevant information. Digitize everything from documents and PDFs to number plates and utility meters. Extract relevant info and key fields.

Replies

Looks promising! Are these ready-to-use APIs, or do you always use custom models?

7yr ago

Nanonets

Maker

@anup_surana Thanks! Currently, you build your own custom models with a small sample of your own data. We've seen that one-size-fits-all models don't work out too well.

7yr ago

Nanonets

Maker

Hello fellow hunters,

Thank you for stopping by to have a look at Nanonets' OCR product. I'm one of the co-founders of Nanonets and I would like to give a quick overview of our OCR product.

We set out to simplify OCR integration into your product, especially to automate manual data entry and validation processes in your pipelines. Through this integration, users can easily build production-ready OCR models. To give you a little bit of background, Nanonets is a machine learning API for developers to integrate cutting-edge ML into their products.

Let me give you a quick walkthrough of this feature.

1. Assume you have a large number of invoices that are generated every day. You have an entire team dedicated to digitizing and extracting key fields from these images.

2. With Nanonets, you can upload these images and teach your model what to look for. For example, for invoices, you can build a model to extract the product names and prices.

3. Once your annotations are done and your model is built, integrating it is as easy as copying 2 lines of code :)

I would urge you to take a look at the product webpage. We have built the product with a lot of passion and would love to have your feedback on it. Happy to answer any questions.

Prathamesh

7yr ago

UDAPTOR

Great job! The product looks awesome. On the landing page you have examples of documents in English and Czech. Does Nanonets work with Latin text only?

7yr ago

Nanonets

Maker

@yevgeniy_pozdeyev Hey, glad you liked it! Nanonets works with most languages, not only the Latin script. For example, we support Mandarin and Japanese characters as well.

7yr ago

UDAPTOR

@rushabh_nagda cool! 👍

7yr ago

HOMERUN

This looks great! I'm assuming that I'll need to set up the model with a set of my pre-existing formatted documents? Is there a minimum number that's needed?

7yr ago

Nanonets

Maker

@screenshake Thanks! 50 documents and you're good to go!

7yr ago

Hey, great product. How long does it take to train a model after uploading the images?

7yr ago

Nanonets

Maker

@earlctate It generally takes 30 minutes to 3 hours. Currently, we're really backed up due to the Product Hunt traffic :)

7yr ago

Isn't this template specific again? Or have you generalised it?

7yr ago

Nanonets

Maker

@yash_agarwal8 Hey, it isn't template specific. So if you have, say, 50 sets of different document types containing similar data, we're able to extract that data for you. Hope this helps.

7yr ago

Does this also work for handwritten documents?

7yr ago

Nanonets

Maker

@shikhar_khanna2 Hey Shikhar, that's a great question. Given enough examples, we're definitely able to make it work on handwritten text.

7yr ago

Atlassian

Just checked your app! Awesome! Good luck!

7yr ago

Nanonets

Maker

@unrealartemg Thanks!

7yr ago

The most important questions, which I have not seen addressed and which are a must, are: what is the accuracy (in %) of the OCR output before and after training, and what happens when there's a sudden change in the placement of fields that was not part of the training set? There is already plenty of software that addresses the same problem (some use AI, others use a different approach), but what makes all of them unusable in real-world scenarios, where data coherence is critical, is the percentage of failures, which forces you to outsource the manual review of all OCR output to contractors in third-world countries (as expensive as having the contractor enter the whole dataset).

7yr ago

Awesome! Does it work for any specific file format or any image?

7yr ago

Nanonets

Maker

@pramod_kk It works for most image types. For a few document digitization customers, we have processed PDFs as well. Are you looking for support for a specific file format?

7yr ago
