Cornell University-affiliated researchers can scan text from physical collections at Cornell University Library using Optical Character Recognition (OCR) software. OCR software examines a scanned image or document for text and creates a digital copy with machine-readable text, often as a PDF, Microsoft Word document, or Microsoft Excel sheet.
OCR software can be a time-saving alternative to manually transcribing typed text into a word processor. But OCR software is not perfect. Depending on the image quality of the original document and the corresponding scans you make, OCR software may not produce completely accurate renditions of the original document. It also does not work well for handwritten documents.
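One way to gauge how accurate an OCR result is: transcribe a short sample passage by hand and compare it with the OCR output. The sketch below is only an illustration using Python's standard `difflib` module. The function name `ocr_similarity` and the sample strings are hypothetical, and the similarity ratio is a rough indicator, not a formal character error rate.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line-wrapping differences are not counted as errors."""
    return " ".join(text.split())


def ocr_similarity(ocr_text: str, reference_text: str) -> float:
    """Return a 0.0-1.0 similarity ratio between OCR output and a trusted transcription."""
    matcher = SequenceMatcher(None, normalize(ocr_text), normalize(reference_text))
    return matcher.ratio()


# Hypothetical sample: the OCR engine misread "rn" as "m" and "l" as "1".
reference = "The modern library holds early printed books."
ocr_output = "The modem 1ibrary holds early printed books."

score = ocr_similarity(ocr_output, reference)
print(f"Similarity: {score:.2%}")
```

A low score on a sample page suggests rescanning at higher quality before running OCR on the full document.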
Learn about the OCR software available at Cornell University Library and how to produce high-quality scans for best OCR results.
If you have more specific questions about your PDFs (each case can differ depending on image quality, language, and other factors), contact digitalcolab@cornell.edu.
Keep in mind the following tips on successful OCR scanning:
OCR software may have trouble reading the document if:
Two high-capacity OCR programs to know are Tesseract and ABBYY FineReader. Learn more about their advantages and limitations below:
| OCR engine | Advantages | Limitations |
|---|---|---|
| Tesseract | | |
| ABBYY FineReader | | |
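For readers who want to try Tesseract from a script, the following is a minimal sketch. It assumes the `tesseract` command-line program is installed and on your PATH. The helper names `build_tesseract_command` and `make_searchable_pdf` and the file names in the usage note are hypothetical, chosen for illustration. It relies on Tesseract's standard invocation, `tesseract IMAGE OUTPUTBASE -l LANG pdf`, which writes a searchable PDF.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_command(image: Path, output_base: Path, lang: str = "eng") -> List[str]:
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG pdf"""
    # The trailing "pdf" config asks Tesseract to write a searchable PDF to OUTPUTBASE.pdf.
    return ["tesseract", str(image), str(output_base), "-l", lang, "pdf"]


def make_searchable_pdf(image: Path, output_base: Path, lang: str = "eng") -> Optional[Path]:
    """Run Tesseract if it is installed; return the PDF path, or None if it is not."""
    if shutil.which("tesseract") is None:
        return None  # Tesseract is not available on this machine.
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
    # Tesseract appends ".pdf" to the output base name.
    return Path(f"{output_base}.pdf")
```

Calling `make_searchable_pdf(Path("page.png"), Path("page"))` would, under these assumptions, produce `page.pdf` next to the scan. Pass a different `lang` code (for example `"fra"` for French) only if the matching Tesseract language data is installed.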