Optical Character Recognition (OCR)

Max can convert all typewritten text using optical character recognition (OCR). We convert paper-based records, microfilm or existing digital images into searchable PDF files. Our specialised methods give exceptional accuracy, turning your offline material into a searchable online resource.

We handle jobs of all sizes and can work with all kinds of original materials, including bound volumes and broadsheet newspapers. We can output to a variety of formats, including PDF/A, plain text, Microsoft Word, XML and HTML.

Our sophisticated OCR system uses pattern recognition algorithms to identify individual characters. A dictionary-based analysis then enables the system to deduce the content word by word, even where individual characters have not been recognised correctly. The OCR process recognises and retains layout elements such as columns, tables and illustrations. This means that the document can be displayed in its original layout in the PDF while remaining fully searchable.
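As a rough illustration of the dictionary-based step described above, the following Python sketch replaces a misread word with its closest match from a word list. It is not our production system: the word list, the function names `correct_word` and `correct_text`, and the similarity cutoff of 0.8 are all assumptions chosen for the example, and the matching uses the standard library's `difflib` rather than any specialised OCR engine.

```python
from difflib import get_close_matches

# Hypothetical word list for illustration; a real lexicon would be far larger.
DICTIONARY = ["archive", "character", "optical", "recognition", "searchable"]


def correct_word(word, dictionary=DICTIONARY, cutoff=0.8):
    """Return the closest dictionary word, or the input unchanged if none is close enough."""
    lowered = word.lower()
    if lowered in dictionary:
        # Already a known word: leave it exactly as recognised.
        return word
    # Pick the single most similar dictionary entry above the cutoff (assumed value).
    matches = get_close_matches(lowered, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text):
    """Apply word-by-word correction to whitespace-separated OCR output."""
    return " ".join(correct_word(w) for w in text.split())


if __name__ == "__main__":
    # The digit 0 misread in place of the letter o is a typical OCR confusion.
    print(correct_text("0ptical character rec0gnition"))
```

In this sketch a word such as "rec0gnition", where one character was misread, is mapped back to "recognition", while a word with no close match in the list is left untouched.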

Clients for whom we have undertaken large-scale OCR projects include:

  • London School of Economics
  • British Universities Film & Video Council
  • Anti-Slavery International Library
  • Greenwich University

Testimonials

Max Communications and the RSA Archive have a successful, long-standing working relationship. We have undertaken several digitisation projects together to scan our numerous large-sized artworks, prints and diagrams. More recently we successfully migrated to Max’s archive management and digital preservation services, DRYAD and SOTERIA. I cannot fault either the consistent quality of the digitisation or the quality of professional service provided and would recommend Max Communications without reservation.

--Eve Watson | Head of Archive | The Royal Society of Arts