Let’s Take Things Step by Step, Shall We?

A scanner generates an image of the paper document and the text is intelligently “extracted” from that image. But what really happens? Can we be more specific about the recognition process?

Scanned image

Color image before binarization

Black-and-white image after binarization

Text and graphic zones on a scanned image

The OCR software extracts text information from the black-and-white pixels of the selected zones: it recognizes the shapes and assigns characters. This is done in several steps.
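To make the "black-and-white pixels" concrete, here is a minimal sketch of binarization, the step that produces them. It assumes a grayscale image stored as rows of values from 0 (black) to 255 (white) and a fixed cutoff of 128. The function name `binarize`, the threshold value, and the sample patch are all illustrative assumptions, not how any particular OCR package does it; real software may pick the threshold in a more adaptive way.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 0 (black) or 1 (white).

    The fixed threshold is an assumption made for this sketch.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


# Hypothetical 3x4 grayscale patch: dark ink strokes on a light background.
patch = [
    [250, 30, 240, 245],
    [245, 25, 35, 250],
    [240, 20, 235, 248],
]

bw = binarize(patch)

# Show black pixels as '#' and white pixels as '.'.
for row in bw:
    print("".join("#" if px == 0 else "." for px in row))
```

After this step every pixel is either ink or background, which is the form in which the recognition steps examine shapes.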

OCRed text

Scanned image with touching lines

Uppercase letter A in various typefaces

The stage is set. Let’s now discuss the successive steps of the OCR process in detail!
