Week 4: Text Objects, Part 1
This week we will learn about the digitization of text-based media, or “text objects,” such as books, manuscripts, comics, and notebooks. We will cover character encodings, optical character recognition (OCR), and typical workflows for text digitization.
Weekly Learning Objectives
- distinguish between a “textual image” and a “text file” and explain the functional differences between the two
- define character encoding, ASCII, and Unicode
- use tesseract to perform OCR on image files
- diagram a typical text digitization workflow
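As a preview of the character encoding objective, here is a minimal Python sketch (not part of the assigned materials). It shows how the same text is represented as Unicode code points and as UTF-8 bytes, and how to check whether a string fits in ASCII. The sample word and the helper name `is_ascii` are illustrative assumptions.

```python
# A minimal sketch: one string, viewed as Unicode code points and as UTF-8 bytes.
text = "caf\u00e9"  # "café"; \u00e9 is the single code point for "é"

# Unicode assigns every character a number (a code point).
code_points = [ord(ch) for ch in text]

# An encoding such as UTF-8 turns those code points into bytes for storage.
utf8_bytes = text.encode("utf-8")


def is_ascii(s: str) -> bool:
    """Return True if every character is in the ASCII range (code points 0-127)."""
    return all(ord(ch) < 128 for ch in s)


print(code_points)                  # four code points, one per character
print(utf8_bytes, len(utf8_bytes))  # five bytes: "é" takes two bytes in UTF-8
print(is_ascii("cafe"), is_ascii(text))
```

The string has four characters but five bytes, because ASCII characters take one byte each in UTF-8 while “é” takes two. This is one reason a text file needs a known encoding to be read correctly.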
Before class: Readings, Resources, and Tasks