TrainYourTesseract

A free font training tool for your OCR use case

Discussion

Rodrigo Escobar @rodaescobar · Senior iOS dev at innovation.rocks
It sounds like magic. What happens behind the scenes when you give it a font? Basically, how do you train the font?
Carina Wetzlhütter (Maker) @trackingcarina · customer experience at selma.io
@rodaescobar Haha yes, lots of magic happening there! Thanks for the question :D Basically, the font file is sent to our servers, where the usual training steps run automatically. The uploaded font is used to "draw" images. Then a "box" is wrapped around each character, which defines the position of the symbol and its ASCII representation. This image + box combination is used to train the Tesseract model. The outcome is the trained font file, which lets Tesseract detect and classify text in the chosen font on images! Hope this explains the magic :)
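As a rough illustration of the "box" step described above, here is a minimal Python sketch that places characters in fixed-width cells and emits lines in Tesseract's box-file format (`char left bottom right top page`, with the origin at the bottom-left of the image). The fixed cell sizes and the names `Box` and `layout_boxes` are assumptions for illustration only; the actual service renders glyphs with the uploaded font, and its pipeline is not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One character and its bounding box, in box-file coordinates."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0

    def to_line(self) -> str:
        # Box-file line: <char> <left> <bottom> <right> <top> <page>
        return f"{self.char} {self.left} {self.bottom} {self.right} {self.top} {self.page}"


def layout_boxes(text, glyph_w=20, glyph_h=30, margin=5, image_h=40):
    """Lay out `text` on a single line of fixed-width cells (an assumption;
    a real renderer would measure each glyph of the uploaded font)."""
    boxes = []
    x = margin
    for ch in text:
        if ch != " ":  # assumption: spaces advance the cursor but get no box
            # Convert from top-left image coordinates to the bottom-left origin.
            bottom = image_h - (margin + glyph_h)
            top = image_h - margin
            boxes.append(Box(ch, x, bottom, x + glyph_w, top))
        x += glyph_w
    return boxes


if __name__ == "__main__":
    for box in layout_boxes("Ab"):
        print(box.to_line())
```

Each printed line pairs a character label with its rectangle, which is the "image + box combination" the training step consumes.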
Jakob Reiter @reiter_jakob
Nice product, really handy! Just one question: if I click 'Umlauts', are symbols like ñ etc. included?
Matthias Gasser (Maker) @matthias_gasser · Head of Product at Anyline
@reiter_jakob thx! Umlauts are either lowercase äöü, uppercase ÄÖÜ, or both, depending on what you select :)
Bernhard Schaffer @bernischaffer
Awesome! :D Can you explain a little bit how this will help me with my Tesseract training? Will it decrease the time I spend on training, or will it eliminate the need to train it on my own?
Matthias Gasser (Maker) @matthias_gasser · Head of Product at Anyline
@bernischaffer thx! 😄 It cuts the time needed to train new symbols to just a few seconds! You provide the font file, and in a few moments you get a Tesseract traineddata file via email, ready to use in your project. Hassle-free! 🦄
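For readers wondering what "ready to use" could look like, here is a small Python sketch that builds a Tesseract command line pointing at a custom traineddata file. The `--tessdata-dir` and `-l` options are standard Tesseract CLI options; the language name `myfont`, the file names, and the helper `build_tesseract_command` are hypothetical, and the sketch assumes the received file was saved as `myfont.traineddata` inside the given directory.

```python
import subprocess


def build_tesseract_command(image_path, lang, tessdata_dir):
    """Build the argument list for a Tesseract run that loads
    <tessdata_dir>/<lang>.traineddata and prints the text to stdout."""
    return [
        "tesseract",
        image_path,
        "stdout",                      # output base "stdout" prints the result
        "--tessdata-dir", tessdata_dir,
        "-l", lang,
    ]


def recognize(image_path, lang, tessdata_dir):
    """Run Tesseract and return the recognized text.
    Requires the `tesseract` binary to be installed and on PATH."""
    cmd = build_tesseract_command(image_path, lang, tessdata_dir)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical paths: adjust to where the image and traineddata file live.
    print(recognize("scan.png", "myfont", "./tessdata"))
```

Keeping the command construction separate from the subprocess call makes it easy to inspect or log the exact invocation before running it.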
Bernhard Hauser @_bernhard · CEO, oratio
Thanks for the hunt, @__tosh! I was wondering: can you train fonts for Tesseract on your own, or do I need a tool for that?
Carina Wetzlhütter (Maker) @trackingcarina · customer experience at selma.io
@_bernhard @__tosh Hi Bernhard, and thx for the great question! It kinda brings me to why we built the whole tool! Training fonts on your own with Tesseract is quite a hassle. You'd need to download the whole Tesseract training tool chain with all its dependencies and compile it, which takes a few hours, and a few hours for only one trained font file doesn't really pay off when looking at OCR implementations. Here is a link to a description of what training fonts for Tesseract looks like manually: https://github.com/tesseract-ocr... :) Some tools do exist already, but none of them really works the way ours does: 1. Upload font file, 2. Get back trained font file. Our devs actually always use the tool for our internal projects as well! Hope that helped!
Harald Reingruber @harald3dcv · Visual Computing Engineer, Three10
Awesome tool! Does it also work with Unicode fonts? I guess Arial Unicode MS would be overkill?
Matthias Gasser (Maker) @matthias_gasser · Head of Product at Anyline
@harald3dcv Hi Harald, and thx for this great question. In theory yes, but there are some obstacles on the way. Currently we train on a fixed set of characters, i.e. the lowercase/uppercase standard alphabet and a few common special characters. The process itself should also work with any (drawable) Unicode symbol. We'll think about extending the training process to let the user define which symbols should be trained. Hassle-free :) Hope this helps! :) Let me know if you need something specially trained for a project!
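To make the "fixed set of characters" idea concrete, here is a minimal Python sketch of how such a training character set might be assembled from checkbox-style options. This is not the tool's actual code: the function name `build_charset`, its parameters, and the default special characters `.,-` are all assumptions, since the thread does not say which special characters are included.

```python
import string


def build_charset(lower=True, upper=True,
                  umlauts_lower=False, umlauts_upper=False,
                  specials=".,-"):
    """Assemble the characters to train on from a few on/off options.
    `specials` is a placeholder; the real tool's list is not documented here."""
    chars = ""
    if lower:
        chars += string.ascii_lowercase
    if upper:
        chars += string.ascii_uppercase
    if umlauts_lower:
        chars += "äöü"   # as described in the thread: umlauts only, no ñ etc.
    if umlauts_upper:
        chars += "ÄÖÜ"
    return chars + specials


if __name__ == "__main__":
    print(build_charset(umlauts_lower=True, umlauts_upper=True))
```

Letting the user pass an arbitrary string instead of these flags would be one way to support the user-defined symbol sets mentioned above.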
Harald Reingruber @harald3dcv · Visual Computing Engineer, Three10
@matthias_gasser Cool, do you already know what the maximum number of characters will be?