Week 8: text analysis
Summary
This week we will learn about imaging and OCR tools for converting images of comics, fanzines, and similar documents into searchable text. We will also use a few text analysis tools to analyze textual data about comics and comics readers.
Weekly Learning Objectives
- Use image-capture and OCR tools to convert images of comic book letters of comment pages into searchable text.
- Use Voyant to analyze a corpus of letters of comment.
- Use AntConc to analyze a corpus of letters of comment.
- Use MALLET to generate topic models from a corpus of letters of comment.
Before class: Readings, resources, and tasks
Readings
- Webinar on text analysis with Voyant Tools by its developer, Professor Geoffrey Rockwell.
- Anthony, L. (2022). AntConc 4 Tutorials. Watch all the tutorials; in total they run about 90 minutes.
- Froehlich, H. (2015, 2022). Corpus Analysis with Antconc. Programming Historian.
- Underwood, T. (2012). Topic modeling made just simple enough. Retrieved from http://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/
- Goldstone, A., & Underwood, T. (2012). What can topic models of PMLA teach us about the history of literary scholarship? Retrieved from https://tedunderwood.com/2012/12/14/what-can-topic-models-of-pmla-teach-us-about-the-history-of-literary-scholarship/
- Graham, S., Weingart, S., & Milligan, I. (2012). Getting Started with Topic Modeling and MALLET. Retrieved from http://programminghistorian.org/lessons/topic-modeling-and-mallet
Discussion
In class
- Lecture and demo: In the first half of class, I will demo the text analysis tools that you read about this week: Voyant, AntConc, and MALLET.
- Command-Line Basics:
- Lab work: In the second half of class, use any or all of these tools to explore your own text corpus, the texts in our data folder in Canvas, or other texts you find online.
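If you would like to see what two of the basic operations behind these tools look like in code, here is a minimal Python sketch of a word-frequency count and a keyword-in-context (KWIC) concordance. It is only an illustration and not how Voyant or AntConc are actually implemented. The function names, the simple tokenizer, and the sample letter are all made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    This tokenizer is an assumption made for illustration; real tools
    handle punctuation, hyphens, and non-English text more carefully.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top_n=10):
    """Return the top_n most frequent words as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(top_n)


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each hit."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    # Hypothetical letter of comment, invented for this example.
    letter = (
        "Dear Editor, I loved the last issue. The art in this issue "
        "was great, but the letters page was too short."
    )
    print(word_frequencies(letter, top_n=3))
    for left, word, right in kwic(letter, "issue", window=2):
        print(f"{left:>15} [{word}] {right}")
```

To try it on your own corpus, you could read a plain-text file into a string and pass that string to `word_frequencies` or `kwic` in place of the sample letter.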