Week 9: Text analysis lab
Summary
Today we will spend the class period experimenting with the text analysis tools we learned about before the break.
Explore a text corpus using the text analysis tools we have learned about in class (Voyant, AntConc, Mallet) or other tools you have found on your own. You may use a corpus you are assembling for your final project, the Amazing Spider-Man fan mail (“The Spider’s Web”) corpus, or another set of texts available on the Web. You may work alone or in small groups of two or three.
Another possible activity is to view your Mallet topic model results in Excel or another spreadsheet program and experiment with creating charts or visualizations of the data. If you are having difficulty installing or running Mallet on your computer, you may use these pre-generated topic models.
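If you would rather summarize the topic model output before charting it, the following Python sketch shows one possible approach. It assumes a tab-separated document-topics file in which each row holds a document index, a document name, and then one proportion per topic in topic order; some Mallet versions instead write alternating topic-number and proportion pairs, which this sketch does not handle. The sample rows, file names, and function names are hypothetical and are not taken from the course files.

```python
import csv
import io

# Hypothetical sample in the ASSUMED layout: doc index, doc name, then one
# proportion per topic. These values are made up, not taken from the course files.
SAMPLE = (
    "0\tletters/a.txt\t0.70\t0.20\t0.10\n"
    "1\tletters/b.txt\t0.10\t0.30\t0.60\n"
)

def read_doc_topics(handle):
    """Return {document name: [topic proportions]} from a tab-separated file."""
    doc_topics = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Skip blank lines and any header line starting with '#'.
        if not fields or fields[0].startswith("#"):
            continue
        doc_topics[fields[1]] = [float(value) for value in fields[2:] if value]
    return doc_topics

def mean_topic_weights(doc_topics):
    """Average each topic's proportion across all documents."""
    rows = list(doc_topics.values())
    return [sum(column) / len(rows) for column in zip(*rows)]

def dominant_topic(doc_topics):
    """Map each document to the index of its highest-weighted topic."""
    return {name: weights.index(max(weights)) for name, weights in doc_topics.items()}

doc_topics = read_doc_topics(io.StringIO(SAMPLE))
means = mean_topic_weights(doc_topics)
top = dominant_topic(doc_topics)
for topic, weight in enumerate(means):
    print(f"topic {topic}: mean weight {weight:.2f}")
```

To try this on a real file, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, after checking that the file matches the assumed layout. The averaged weights can then be pasted into a spreadsheet and charted.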
Weekly Learning Objectives
- explore one or more text corpora using one or more text analysis tools
- acquire hands-on experience with one or more text analysis tools
Before class: Readings, resources, and tasks
- review tools, readings, and tutorials from last week.
- prepare, for use in class, a text corpus relevant to your final project or another corpus of interest.
In class
With your own texts related to your final project or with the text corpora below, spend the class period exploring the texts with any or all of the tools listed below and sharing your findings with me and your fellow students.
Text Analysis Tools
- Voyant Tools
- AntConc
- Mallet
- See also this tutorial on topic modelling with Mallet by Shawn Graham, Scott Weingart, and Ian Milligan.
- Text analysis on the command line with common Unix/Linux tools from Unix for Poets.
- Named Entity Recognition with Python
- Tutorial: The Best Way to do Named Entity Recognition (NER)
- Focus on the first two options (spaCy and NLTK), which use open source tools; the final option uses the author’s API tool, which encourages a subscription.
- Named Entity Recognition (NER) with the Stanford Named Entity Recognizer
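For a sense of what the word-counting and concordance tools above do under the hood, here is a minimal Python sketch that uses only the standard library. It counts word frequencies and prints keyword-in-context lines. The tokenizer is deliberately crude (lowercase letters and apostrophes only), and the sample sentence and function names are made up for illustration; it does not reproduce the behavior of Voyant or AntConc.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, top_n=10):
    """Return the top_n most frequent tokens as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(top_n)

def keyword_in_context(text, keyword, window=3):
    """List each occurrence of keyword with `window` tokens on either side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Made-up sample text; swap in the contents of one of your own corpus files.
sample = (
    "The letter praised the new issue. "
    "The letter also asked about the next issue."
)

print(word_frequencies(sample, 3))
for line in keyword_in_context(sample, "issue", window=2):
    print(line)
```

To run it on a corpus file, read the file into a string first, for example with `pathlib.Path(path).read_text(encoding="utf-8")`, and pass that string in place of `sample`.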
Comics-related text corpora
- “The Spider’s Web” fan mail from Amazing Spider-Man, 1964-1995: spiders_web_1963-1995_txt.zip
- Mallet topic modeling output of “The Spider’s Web” fan mail from Amazing Spider-Man, 1964-1995: asm_topic_model_mallet_output.zip