@rodaescobar Haha yes, lots of magic happening there! Thanks for that question :D
Basically, the font file is sent to our servers, where the usual training steps happen automatically.
The uploaded font is used to "draw" images. Then a "box" is wrapped around each character, which defines the position of the symbol and its ASCII representation. This image + box combination is used to train the Tesseract model.
The outcome is the trained font file, which lets Tesseract detect and classify the text with the chosen font on images! Hope this explains the magic :)
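To make the "box" idea a bit more concrete, here is a minimal Python sketch of what such a box description can look like. It writes lines in the Tesseract box file layout (`<symbol> <left> <bottom> <right> <top> <page>`, with the origin in the bottom-left corner of the image). The fixed glyph size, the margin, and the names `Box` and `layout_boxes` are purely illustrative assumptions, not how our servers actually do it; a real pipeline would measure each glyph from the rendered font.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One character box in Tesseract box-file coordinates (origin bottom-left)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0

    def to_line(self) -> str:
        # Box file line: <symbol> <left> <bottom> <right> <top> <page>
        return f"{self.char} {self.left} {self.bottom} {self.right} {self.top} {self.page}"


def layout_boxes(text, glyph_w=20, glyph_h=30, margin=10, image_h=50):
    """Hypothetical layout: every glyph gets the same fixed-size cell.

    Assumption for illustration only. A real renderer would ask the font
    for the actual bounding box of each drawn glyph.
    """
    boxes = []
    x = margin
    top = image_h - margin      # distance from the bottom edge of the image
    bottom = top - glyph_h
    for ch in text:
        if ch != " ":           # spaces advance the pen but get no box
            boxes.append(Box(ch, x, bottom, x + glyph_w, top))
        x += glyph_w
    return boxes


if __name__ == "__main__":
    for box in layout_boxes("Ab c"):
        print(box.to_line())
```

Each printed line pairs one symbol with its position on the rendered image, which is the image + box combination mentioned above.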
@_bernhard @__tosh Hi Bernhard and thx for the great question! Kinda brings me to why we built the whole tool!
Training fonts on your own with Tesseract is quite a hassle. You'd need to download the whole Tesseract training tool chain with all its dependencies and compile it, which takes a few hours. A few hours for only one trained font file doesn't really pay off when you look at OCR implementations.
Here is a link to a description of what training fonts for Tesseract manually looks like: https://github.com/tesseract-ocr... :)
Some tools already exist, but none of them really works the way ours does: 1. Upload Font File, 2. Get back Trained Font File. Our devs actually use the tool all the time for our internal projects as well! Hope that helped!
Awesome! :D Can you explain a little bit how this will help me with my Tesseract training? Will it decrease the time I spend on training, or will it eliminate the need to train it on my own?
@bernischaffer thx! 😄 it minimizes the time needed to train new symbols to just a few seconds! You provide the font file, and in a few moments you get a tesseract traineddata file via mail. Ready to use in your project. Hassle free! 🦄
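For anyone wondering what "ready to use" can look like in practice, here is a small Python sketch that builds a Tesseract command line for a received traineddata file. It relies on the standard `-l` and `--tessdata-dir` options of the `tesseract` CLI; the file names and the helper `build_tesseract_command` are made up for illustration, and how you wire this into your own project may well differ.

```python
from pathlib import Path


def build_tesseract_command(image, traineddata, outputbase="stdout"):
    """Build the argument list for running tesseract with a custom font model.

    Hypothetical helper for illustration. The language name passed to -l is
    the traineddata file name without its ".traineddata" extension, and
    --tessdata-dir points at the folder that contains the file.
    """
    td = Path(traineddata)
    if td.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {td.name!r}")
    return [
        "tesseract",
        str(image),
        outputbase,                 # "stdout" prints the recognized text
        "--tessdata-dir", str(td.parent),
        "-l", td.stem,
    ]


if __name__ == "__main__":
    # Example file names are assumptions, not something the tool dictates.
    cmd = build_tesseract_command("scan.png", "fonts/myfont.traineddata")
    print(" ".join(cmd))
    # To actually run it (requires tesseract to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```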
Awesome tool! Does it also work with Unicode fonts? I guess Arial Unicode MS would be overkill?
@harald3dcv Hi Harald and thx for this great question.
In theory yes, but there are some obstacles along the way. Currently we train on a fixed set of characters, i.e. the standard lower-case and upper-case alphabet plus a few common special characters. The process itself should also work with any (drawable) Unicode symbol. We'll think about extending the training process to let users define which symbols should be trained. Hassle free :)
Hope this helps? :) Let me know if you need something specially trained for a project!
@matthias_gasser Cool, do you already know what the maximum number of characters will be?
Hi, I don't understand how to get the tool. Do we have to contact you?