# Week 8: Text Analysis

### Voyant, AntConc, MALLET

John A. Walsh

---

## Framing the Session

Today is about the **questions** we can ask—and sometimes answer—with these tools.

Notes:
> Say this slowly. Then: “And we’ll see that different tools make different kinds of answers possible.”

---

## Three Approaches

- **Voyant** → exploratory, macro patterns
- **AntConc** → precise, contextual analysis
- **MALLET** → latent structure (topics)

Notes:
> Emphasize: these are not just tools—they are different ways of seeing text.

---

## A Working Question

> What can we learn about **Spider-Man readers** from their letters?

Notes:
> Keep this as a throughline. Return to it later.

---

## Key Principle

**Frequency ≠ meaning**

Notes:
> Give a quick example: just because “Spider-Man” is frequent doesn’t by itself tell us anything interesting.

---

## Another Principle

**Context = interpretation**

Notes:
> This sets up AntConc. Words only matter in use.

---

## Voyant

### What does it show us?

- Frequent words
- Patterns across documents
- Quick comparisons

---

## Voyant: What to Look For

- Repeated vocabulary
- Shifts across subsets
- Evaluative language (“great,” “love,” etc.)

---

## Voyant: Limits

- Flattens context
- Overemphasizes frequency
- Can feel like a “black box”

Notes:
> Don’t dwell—just plant the idea.

---

## AntConc

### What does it show us?

- Words in context (concordance)
- Collocations
- Repeated phrases (clusters)

---

## AntConc: What to Look For

- How words are used
- Recurring phrases
- Patterns of reader language

---

## AntConc: Advantage

**Precision**

- You control the query
- You see the evidence
- You can justify claims

---

## Transition

> Now we move from patterns → arguments

Notes:
> This bridges into discussion and later MALLET.

---

## Synthesis Question

> If you had to make an argument about Spider-Man readers, what would it be?

---

## Enter Topic Modeling

Notes:
> Pause. Shift tone slightly—new conceptual move.

---

## What is a Topic?
A **topic** is a group of words that frequently appear together.

---

## Important

Topics are **not** themes. They are **statistical patterns** we interpret as themes.

Notes:
> This is the most important slide in this section.

---

## How It Works (Very Simplified)

- Input: documents (letters)
- Process: model finds word co-occurrence patterns
- Output:
  - Topics (word lists)
  - Documents as mixtures of topics

---

## The Key Move

> The computer gives us word clusters.
> We decide what they mean.

---

## MALLET Commands

```
mallet import-dir \
  --input years \
  --output years.mallet \
  --keep-sequence \
  --remove-stopwords
```

```
mallet train-topics \
  --input years.mallet \
  --num-topics 30 \
  --optimize-interval 20 \
  --optimize-burn-in 50 \
  --num-top-words 20 \
  --output-state years-topic-state.gz \
  --output-topic-keys years_topic_words.tsv \
  --output-doc-topics years_topic_distribution.tsv
```

---

## What Could Go Wrong?

- Topics that don’t make sense
- Mixed or incoherent clusters
- Too many common words

---

## Your Task

You will:

1. Interpret topics
2. Label them
3. Track them over time
4. Evaluate the model

---

## Phase 1: Reading Topics

For each topic:

- What is this about?
- Characters?
- Events?
- Reader opinions?

---

## Phase 2: Labeling Topics

Give each topic a **2–4 word label**

Examples:

- “Gwen Stacy death”
- “Villains and crime”
- “Fan evaluation language”

---

## Phase 3: Topics Over Time

- When is your topic strongest?
- Does it rise or fall?

---

## Phase 4: Critical Turn

- Which topics make sense?
- Which don’t?
- Why?

---

## Key Question

> Are these really “themes”?

---

## Final Reflection

> What is the difference between:
> - a statistical pattern
> - and an interpretive claim?

---

## Takeaways

- Topic modeling produces **patterns, not meanings**
- Interpretation is still required
- Some topics are useful; others are artifacts

---

## Exit Question

> What did you learn about how tools shape interpretation?
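
---

## Appendix: Reading Topic Keys (sketch)

To support Phases 1–2, here is a minimal Python sketch for loading the topic keys produced by `--output-topic-keys` above. It assumes MALLET's usual topic-keys layout (topic number, Dirichlet weight, then the top words, tab-separated); the file name `years_topic_words.tsv` comes from the `train-topics` command on the earlier slide.

```python
import csv
import os

# Path from the --output-topic-keys flag in the train-topics command above.
KEYS_PATH = "years_topic_words.tsv"


def read_topic_keys(path):
    """Parse a MALLET topic-keys file into (topic_id, weight, top_words) tuples."""
    topics = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            topic_id = int(row[0])
            weight = float(row[1])  # Dirichlet weight for this topic
            words = row[2].split()  # top words, space-separated
            topics.append((topic_id, weight, words))
    return topics


if os.path.exists(KEYS_PATH):
    for topic_id, weight, words in read_topic_keys(KEYS_PATH):
        # Print the first ten words; your 2-4 word label goes next to each.
        print(f"Topic {topic_id:2d}: {' '.join(words[:10])}")
```

Reading the keys file in a script rather than a spreadsheet makes it easy to keep your labels alongside the word lists as you iterate on the model.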
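
---

## Appendix: Topics Over Time (sketch)

For Phase 3, a sketch of averaging each topic's proportion per year from the `--output-doc-topics` file. Two assumptions to flag: it expects the recent MALLET doc-topics layout (doc id, source path, then one proportion column per topic — older MALLET versions emit sorted topic/proportion pairs instead), and it assumes each source file is named for a year (e.g. `1972.txt`), which is a hypothetical convention for this corpus.

```python
import csv
import os
import re
from collections import defaultdict

# Path from the --output-doc-topics flag in the train-topics command above.
DOCS_PATH = "years_topic_distribution.tsv"


def topic_by_year(path):
    """Average each topic's proportion per year.

    Assumes the recent MALLET doc-topics format (doc id, source path,
    one proportion column per topic) and year-named source files
    (hypothetical, e.g. .../1972.txt).
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip header/comment lines
            match = re.search(r"(\d{4})", os.path.basename(row[1]))
            if not match:
                continue  # no recognizable year in the file name
            year = int(match.group(1))
            counts[year] += 1
            for topic_id, prop in enumerate(row[2:]):
                sums[year][topic_id] += float(prop)
    return {year: {t: s / counts[year] for t, s in topics.items()}
            for year, topics in sums.items()}


if os.path.exists(DOCS_PATH):
    by_year = topic_by_year(DOCS_PATH)
    for year in sorted(by_year):
        strongest = max(by_year[year], key=by_year[year].get)
        print(f"{year}: strongest topic = {strongest}")
```

Plotting these per-year averages (or just scanning the "strongest topic" line) is one way to answer "when is your topic strongest, and does it rise or fall?"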