What is optical character recognition (OCR) and how does BHL use it?

Optical Character Recognition, usually called “OCR,” is the process of converting images of text into machine-readable text characters. This process is performed by specialized software such as ABBYY FineReader (https://www.abbyy.com/en-us/finereader/).

BHL runs OCR on every page image in our collection so that the text within those images can be indexed and searched. The resulting text supports both full-text search and the taxonomic name-finding algorithm.
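
For readers curious how OCR text becomes searchable, here is a minimal sketch of a word index in Python. It is an illustration only and does not describe BHL's actual search system: the page IDs, the sample text, and the function names (tokenize, build_index, search) are all assumptions made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """Map each word to the set of page IDs whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, ocr_text in pages.items():
        for word in tokenize(ocr_text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted page IDs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical OCR output keyed by made-up page IDs (not real BHL data).
pages = {
    "page-1": "The common raven, Corvus corax, is found across the Northern Hemisphere.",
    "page-2": "Quercus alba is the white oak of eastern North America.",
}
index = build_index(pages)
print(search(index, "corvus corax"))
```

Because the index is built from the OCR output rather than from the page images, any character the OCR software misreads will also be missing from search results for that page.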

Tags: search, text mining, data mining, text recognition