Optical Character Recognition (OCR)

Image courtesy of the LSE

Max can convert all typewritten text using optical character recognition (OCR). We convert paper-based records, microfilm or existing digital images into searchable PDF files. Our specialised methods give exceptional accuracy, turning your off-line material into a searchable on-line resource.

We handle jobs of all sizes and work with all kinds of original materials, including bound volumes and broadsheet newspapers. Output is available in a variety of formats, including PDF/A, plain text, MS Word, XML and HTML.

Our sophisticated OCR system uses pattern recognition algorithms to identify individual characters. A dictionary-based analysis then enables the system to deduce the content word by word, even where individual characters have not been picked up correctly. The OCR process also recognises and retains content layout such as columns, tables and illustrations. This means that the document can be displayed in its original layout in the PDF while remaining fully searchable.
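To give a rough idea of how a dictionary-based step can recover a word from imperfect character recognition, here is a minimal Python sketch. It is an illustration only and not Max's production system: the word list, the function names `correct_word` and `correct_text`, and the similarity cutoff are all assumptions made for this example. It uses the standard library's `difflib.get_close_matches` to pick the closest dictionary word.

```python
import difflib

# Hypothetical word list for illustration; a real system would use a full lexicon.
DICTIONARY = ["optical", "character", "recognition", "searchable", "archive"]


def correct_word(word, dictionary=DICTIONARY, cutoff=0.75):
    """Return the closest dictionary word, or the word unchanged if none is close."""
    lower = word.lower()
    if lower in dictionary:
        return word
    # get_close_matches ranks candidates by similarity ratio (0.0 to 1.0).
    matches = difflib.get_close_matches(lower, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text):
    """Apply word-by-word correction to whitespace-separated OCR output."""
    return " ".join(correct_word(w) for w in text.split())


if __name__ == "__main__":
    # "1" misread for "l" and "0" misread for "o" are typical character-level errors.
    print(correct_text("optica1 character rec0gnition"))
```

Words with no sufficiently close match are left untouched, so names and unusual terms are not forced into the dictionary. The cutoff of 0.75 is an arbitrary choice for this sketch; a lower value corrects more aggressively at the risk of changing words that were read correctly.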

Clients for whom we have undertaken large-scale OCR projects include:

  • London School of Economics
  • British Universities Film & Video Council
  • Anti-Slavery International Library
  • Greenwich University

For further details of the services we offer, please contact us on 020 8309 5445 or via our contact page.
