# Week 8: Text Analysis

### Voyant, AntConc, MALLET

John A. Walsh

---

## Framing the Session

Today is about the **questions** we can ask—and sometimes answer—with these tools.

Notes:
> Say this slowly. Then: “And we’ll see that different tools make different kinds of answers possible.”

---

## Three Approaches

- **Voyant** → exploratory, macro patterns
- **AntConc** → precise, contextual analysis
- **MALLET** → latent structure (topics)

Notes:
> Emphasize: these are not just tools—they are different ways of seeing text.

---

## A Working Question

> What can we learn about **Spider-Man readers** from their letters?

Notes:
> Keep this as a throughline. Return to it later.

---

## Key Principle

**Frequency ≠ meaning**

Notes:
> Give a quick example: just because “Spider-Man” is frequent doesn’t by itself tell us anything interesting.

---

## Another Principle

**Context = interpretation**

Notes:
> This sets up AntConc. Words only matter in use.

---

## Voyant

### What does it show us?

- Frequent words
- Patterns across documents
- Quick comparisons

---

## Voyant: What to Look For

- Repeated vocabulary
- Shifts across subsets
- Evaluative language (“great,” “love,” etc.)

---

## Voyant: Limits

- Flattens context
- Overemphasizes frequency
- Can feel like a “black box”

Notes:
> Don’t dwell—just plant the idea.

---

## AntConc

### What does it show us?

- Words in context (concordance)
- Collocations
- Repeated phrases (clusters)

---

## AntConc: What to Look For

- How words are used
- Recurring phrases
- Patterns of reader language

---

## AntConc: Advantage

**Precision**

- You control the query
- You see the evidence
- You can justify claims

---

## Transition

> Now we move from patterns → arguments

Notes:
> This bridges into discussion and later MALLET.

---

## Synthesis Question

> If you had to make an argument about Spider-Man readers, what would it be?

---

## Enter Topic Modeling

Notes:
> Pause. Shift tone slightly—new conceptual move.

---

## What is a Topic?
A **topic** is a group of words that frequently appear together.

---

## Important

Topics are **not** themes. They are **statistical patterns** we interpret as themes.

Notes:
> This is the most important slide in this section.

---

## How It Works (Very Simplified)

- Input: documents (letters)
- Process: model finds word co-occurrence patterns
- Output:
  - Topics (word lists)
  - Documents as mixtures of topics

---

## The Key Move

> The computer gives us word clusters.
> We decide what they mean.

---

## MALLET Commands

```
mallet import-dir \
  --input years \
  --output years.mallet \
  --keep-sequence \
  --remove-stopwords
```

```
mallet train-topics \
  --input years.mallet \
  --num-topics 30 \
  --optimize-interval 20 \
  --optimize-burn-in 50 \
  --num-top-words 20 \
  --output-state years-topic-state.gz \
  --output-topic-keys years_topic_words.tsv \
  --output-doc-topics years_topic_distribution.tsv
```

---

## What Could Go Wrong?

- Topics that don’t make sense
- Mixed or incoherent clusters
- Too many common words

---

## Your Task

You will:

1. Interpret topics
2. Label them
3. Track them over time
4. Evaluate the model

---

## Phase 1: Reading Topics

For each topic:

- What is this about?
- Characters?
- Events?
- Reader opinions?

---

## Phase 2: Labeling Topics

Give each topic a **2–4 word label**

Examples:

- “Gwen Stacy death”
- “Villains and crime”
- “Fan evaluation language”

---

## Phase 3: Topics Over Time

- When is your topic strongest?
- Does it rise or fall?

---

## Phase 4: Critical Turn

- Which topics make sense?
- Which don’t?
- Why?

---

## Key Question

> Are these really “themes”?

---

## Final Reflection

> What is the difference between:
> - a statistical pattern
> - and an interpretive claim?

---

## Takeaways

- Topic modeling produces **patterns, not meanings**
- Interpretation is still required
- Some topics are useful; others are artifacts

---

## Exit Question

> What did you learn about how tools shape interpretation?
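
---

## Appendix: Reading Topic Keys (sketch)

To support Phases 1–2, here is a minimal Python sketch for loading the topic keys produced by `--output-topic-keys` above. It assumes MALLET's usual topic-keys layout (topic number, Dirichlet weight, then the top words, tab-separated); the file name `years_topic_words.tsv` comes from the `train-topics` command on the earlier slide.

```python
import csv
import os

# Path from the --output-topic-keys flag in the train-topics command above.
KEYS_PATH = "years_topic_words.tsv"


def read_topic_keys(path):
    """Parse a MALLET topic-keys file into (topic_id, weight, top_words) tuples."""
    topics = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            topic_id = int(row[0])
            weight = float(row[1])  # Dirichlet weight for this topic
            words = row[2].split()  # top words, space-separated
            topics.append((topic_id, weight, words))
    return topics


if os.path.exists(KEYS_PATH):
    for topic_id, weight, words in read_topic_keys(KEYS_PATH):
        # Print the first ten words; your 2-4 word label goes next to each.
        print(f"Topic {topic_id:2d}: {' '.join(words[:10])}")
```

Reading the keys file in a script rather than a spreadsheet makes it easy to keep your labels alongside the word lists as you iterate on the model.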
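
---

## Appendix: Topics Over Time (sketch)

For Phase 3, a sketch of averaging each topic's proportion per year from the `--output-doc-topics` file. Two assumptions to flag: it expects the recent MALLET doc-topics layout (doc id, source path, then one proportion column per topic — older MALLET versions emit sorted topic/proportion pairs instead), and it assumes each source file is named for a year (e.g. `1972.txt`), which is a hypothetical convention for this corpus.

```python
import csv
import os
import re
from collections import defaultdict

# Path from the --output-doc-topics flag in the train-topics command above.
DOCS_PATH = "years_topic_distribution.tsv"


def topic_by_year(path):
    """Average each topic's proportion per year.

    Assumes the recent MALLET doc-topics format (doc id, source path,
    one proportion column per topic) and year-named source files
    (hypothetical, e.g. .../1972.txt).
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip header/comment lines
            match = re.search(r"(\d{4})", os.path.basename(row[1]))
            if not match:
                continue  # no recognizable year in the file name
            year = int(match.group(1))
            counts[year] += 1
            for topic_id, prop in enumerate(row[2:]):
                sums[year][topic_id] += float(prop)
    return {year: {t: s / counts[year] for t, s in topics.items()}
            for year, topics in sums.items()}


if os.path.exists(DOCS_PATH):
    by_year = topic_by_year(DOCS_PATH)
    for year in sorted(by_year):
        strongest = max(by_year[year], key=by_year[year].get)
        print(f"{year}: strongest topic = {strongest}")
```

Plotting these per-year averages (or just scanning the "strongest topic" line) is one way to answer "when is your topic strongest, and does it rise or fall?"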