TrainYourTesseract

A free font training tool for your OCR use case

2 followers

Launch tags: Android • iOS • Robots

Mimo 2.0

It sounds like magic. What happens behind the scenes when you give it a font? Basically, how do you train the font?

9yr ago

Anyline

Maker

@rodaescobar Haha yes, lots of magic happening there! Thanks for that question :D Basically, the font file is sent to our servers, where the usual training steps happen automatically. The uploaded font is used to "draw" images. Then a "box" is wrapped around each character, which defines the position of the symbol and its ASCII representation. This image + box combination is used to train the Tesseract model. The outcome is the trained font file, which lets Tesseract detect and classify text in the chosen font on images! Hope this explains the magic :)
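
To make the "image + box" idea concrete, here is a minimal sketch of what a box description for rendered text could look like. It is not the maker's actual code. It assumes a hypothetical fixed-width layout (`glyph_w`, `glyph_h`, and `margin` are made-up values, and `Box` and `layout_boxes` are illustrative names), and it emits lines in the Tesseract box-file layout: character, left, bottom, right, top, page, with the origin at the bottom-left of the image.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One character and the rectangle drawn around it (bottom-left origin)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0

    def to_line(self) -> str:
        # Box-file line: <char> <left> <bottom> <right> <top> <page>
        return f"{self.char} {self.left} {self.bottom} {self.right} {self.top} {self.page}"


def layout_boxes(text, glyph_w=20, glyph_h=30, margin=10):
    """Place each character on one line, assuming every glyph has the same size.

    A real pipeline would take glyph extents from the font renderer;
    the fixed width and height here are assumptions for illustration only.
    """
    boxes = []
    x = margin
    for ch in text:
        if ch != " ":  # spaces advance the pen but get no box
            boxes.append(Box(ch, x, margin, x + glyph_w, margin + glyph_h))
        x += glyph_w
    return boxes


if __name__ == "__main__":
    for box in layout_boxes("Ab 1"):
        print(box.to_line())
```

Each printed line pairs one symbol with its position, which is the information the training step consumes together with the rendered image.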

9yr ago

Awesome! :D Can you explain a little bit how this will help me with my Tesseract training? Will it decrease the time I spend on training, or will it eliminate the need to train it on my own?

9yr ago

Anyline

Maker

@bernischaffer thx! 😄 It cuts the time needed to train new symbols down to just a few seconds! You provide the font file, and in a few moments you get a Tesseract traineddata file by email, ready to use in your project. Hassle free! 🦄

9yr ago

orat.io

Thanks for the hunt, @__tosh! I was wondering if you can train fonts for Tesseract on your own or do I need a tool for that?

9yr ago

Anyline

Maker

@_bernhard @__tosh Hi Bernhard and thx for the great question! Kinda brings me to why we've built the whole tool! Training fonts on your own with Tesseract is quite a hassle. You'd need to download the whole Tesseract training tool chain with all its dependencies and compile it, which takes a few hours, and a few hours for only one trained font file doesn't really pay off when looking at OCR implementations. Here is a link to a description of what training fonts for Tesseract manually would look like: https://github.com/tesseract-ocr... :) Some tools already exist, but none of them really works the way ours does: 1. Upload font file, 2. Get back trained font file. Our devs actually always use the tool for our internal projects as well! Hope that helped!
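
For a sense of what the manual route involves, the sketch below lists the command sequence as it is commonly described for Tesseract 3.x font training. It only builds and prints the commands and runs nothing. The language code, font name, and file names are placeholders, `manual_training_commands` is an illustrative helper, and the exact flags should be checked against the training guide for your Tesseract version before use.

```python
import shlex


def manual_training_commands(lang="eng", font="MyFont",
                             fonts_dir="./fonts", text_file="training_text.txt"):
    """Return the manual training steps as argument lists (nothing is executed).

    All names are placeholders; this mirrors the Tesseract 3.x tool chain
    as an assumption, not the hosted tool's actual pipeline.
    """
    base = f"{lang}.{font}.exp0"
    return [
        # 1. Render training text in the font: produces a .tif image and a .box file
        ["text2image", f"--text={text_file}", f"--outputbase={base}",
         f"--font={font}", f"--fonts_dir={fonts_dir}"],
        # 2. Extract features from the image + box pair: produces a .tr file
        ["tesseract", f"{base}.tif", base, "box.train"],
        # 3. Collect the set of characters seen in the box file
        ["unicharset_extractor", f"{base}.box"],
        # 4. Cluster shape features (expects a font_properties file)
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", f"{base}.tr"],
        # 5. Compute character normalization prototypes
        ["cntraining", f"{base}.tr"],
        # 6. Bundle everything into <lang>.traineddata; the output files of
        #    steps 4 and 5 must first be renamed with the "<lang>." prefix
        ["combine_tessdata", f"{lang}."],
    ]


if __name__ == "__main__":
    for cmd in manual_training_commands():
        print(shlex.join(cmd))
```

Six tools plus manual file renaming for a single font is the overhead the upload-and-download workflow is meant to replace.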

9yr ago

Awesome tool! Does it also work with Unicode fonts? I guess Arial Unicode MS would be overkill?

9yr ago

Anyline

Maker

@harald3dcv Hi Harald and thx for this great question. In theory yes, but there are some obstacles on the way. Currently we train on a fixed set of characters, i.e. the lower-case / upper-case standard alphabet and a few common special characters. The process itself should also work with any (drawable) Unicode symbol. We'll think about extending the training process to let the user define which symbols should be trained. Hassle free :) Hope this helps? :) Let me know if you need something specially trained for a project!

9yr ago

@matthias_gasser Cool, do you already know what the maximum number of characters will be?

9yr ago