Digital and Empirical Methods for Studying Readership and Fandom.
View the Project on GitHub jawalsh/z604-z672-comic-books-and-their-readers-SP24
Regular expressions are a way of matching patterns in text. They give you a more powerful tool for search and replace, which can be especially handy for cleaning data.
As with a normal search in MS Word or “find in page” in a browser, you type the string you are looking for. But in addition to the letters, numbers, punctuation, and symbols on your keyboard, you can type other “symbols” that specify what you’re looking for. (Skip down to Examples to see some.)
^
start of a string
$
end of a string
\b
word boundary
.
any character (except a newline, by default)
[]
creates a character class
[abc]
a single character of a, b, or c (change to make any characters you want)
[a-z]
a character between a and z (inclusive)
[a-zA-Z]
a character between a and z or between A and Z (inclusive)
[0-9]
a digit between 0 and 9 (inclusive)
[^abc]
a character that is not a, b, or c
[^a-z]
a character not in the range a to z (inclusive)
()
creates a group (a sub-pattern to match as a unit)
a|b
match a or b
\s
any whitespace character
\d
any digit
\D
any non-digit
\w
any word character
\W
any non-word character
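To see a few of these symbols in action, here is a minimal sketch using Python's built-in re module. The sample sentence and variable names are made up for illustration; the patterns themselves use only symbols from the list above.

```python
import re

# Made-up sample text for illustration.
text = "Issue 12 ships Tuesday; issue 7 shipped on a Thursday in 1999."

# \d matches a single digit, so findall returns each digit separately.
digits = re.findall(r"\d", text)

# \b is a word boundary: whole words that start with T and end in "day".
t_days = re.findall(r"\bT[a-z]*day\b", text)

# (a|b) matches either alternative.
verbs = re.findall(r"(ships|shipped)", text)

# ^ anchors the pattern to the start of the string.
starts_with_issue = re.search(r"^Issue", text) is not None

# [^...] negates a class: anything that is not a letter, digit, or whitespace.
punctuation = re.findall(r"[^a-zA-Z0-9\s]", text)

print(digits)
print(t_days)
print(verbs)
print(starts_with_issue)
print(punctuation)
```

Note that \d returns one digit at a time; matching a whole number such as 1999 needs a quantifier.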
Escaping: sometimes you need to type a character that the computer treats as part of the regex syntax (e.g. \ in this case) and make the computer understand that it is content (a literal). Put a \ in front of any character you need to escape.
Use \\ to escape \
Use \/ to escape /
Use \. to escape .
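The difference an escape makes can be sketched in Python's re module. The sample strings below are invented for illustration. One assumption worth flagging: Python does not require / to be escaped (that matters mainly in tools that wrap patterns in slashes), but \/ still matches a literal slash there.

```python
import re

# Made-up sample text for illustration.
text = "Price: 3.50 or 3x50? See docs/readme.txt"

# Unescaped: "." means "any character", so "3x50" matches as well.
loose = re.findall(r"3.50", text)

# Escaped: "\." means a literal dot, so only "3.50" matches.
strict = re.findall(r"3\.50", text)

# "\\" in the pattern matches one literal backslash.
path = r"C:\comics\scans"  # hypothetical path
backslashes = re.findall(r"\\", path)

# "\/" matches a literal forward slash.
slashes = re.findall(r"\/", text)

# re.escape adds the backslashes for you when a literal string
# contains special characters.
escaped = re.escape("readme.txt")

print(loose)
print(strict)
print(len(backslashes))
print(slashes)
print(escaped)
```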
“a” here stands in for any character.
a?
0 or 1 of a
a*
0 or more of a
.*
0 or more of any character (that is, “match anything”)
a+
1 or more of a
a{3}
exactly 3 of a in a row
a{3,}
3 or more of a in a row
a{3,6}
between 3 and 6 of a in a row (inclusive)
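One way to check the quantifiers above is to test each against strings of repeated a's. This is a small sketch in Python; the helper name full_matches and the sample list are hypothetical, and re.fullmatch is used so that a pattern must cover the entire string to count.

```python
import re

# Strings of 0, 1, 2, 3, 4, and 7 copies of "a" (made-up test data).
samples = ["", "a", "aa", "aaa", "aaaa", "aaaaaaa"]

def full_matches(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(full_matches(r"a?"))      # 0 or 1 of a
print(full_matches(r"a*"))      # 0 or more of a
print(full_matches(r"a+"))      # 1 or more of a
print(full_matches(r"a{3}"))    # exactly 3 of a
print(full_matches(r"a{3,}"))   # 3 or more of a
print(full_matches(r"a{3,6}"))  # between 3 and 6 of a

# .* accepts any run of characters, including none at all.
anything = re.fullmatch(r".*", "Tuesday, 12 issues!") is not None
print(anything)
```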
Regex101 (Alex’s preference)
Regexr
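Examples
\s$
(a single whitespace character at end of line)
\s{1}$
(the same: exactly 1 whitespace character at end of line)
\s+$
(1 or more whitespaces before end of line)
\s*$
(0 or more whitespaces before end of line)
^\s+
(1 or more whitespaces at start of line)
/ at end of a line
\/ $
\/\s$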
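These whitespace patterns are typical of data cleaning. As a rough sketch, the same patterns can be used for search and replace in Python with re.sub. The list of titles and the path below are made up, and each line is processed on its own so that $ and ^ refer to the end and start of that line.

```python
import re

# Made-up lines with stray whitespace, as might appear in a messy spreadsheet.
lines = ["Batman  ", "X-Men\t", "  Saga", "Maus"]

# \s+$ removes every whitespace character at the end of a line.
trailing_removed = [re.sub(r"\s+$", "", line) for line in lines]

# ^\s+ removes every whitespace character at the start of a line.
leading_removed = [re.sub(r"^\s+", "", line) for line in lines]

# \s$ matches only ONE whitespace character, so one trailing space survives.
one_removed = re.sub(r"\s$", "", "Batman  ")

# \/\s$ removes a slash followed by one whitespace character at the end.
no_slash = re.sub(r"\/\s$", "", "scans/covers/ ")  # hypothetical path

print(trailing_removed)
print(leading_removed)
print(repr(one_removed))
print(no_slash)
```

When the whole file is a single string rather than a list of lines, keep in mind that \s also matches newline characters, so a narrower class such as [ \t] may be the safer choice.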
^T.*day$
(assuming a list of days on separate lines)
T[a-z]*day\b
(assuming the days are not on separate lines)
[abc]
(abc)
(abc|ABC)
[ab]{2}
will match aa, bb, ab, ba and not ac, cc, bc
abc{1,3}
will match abc abcc abccc
abc{1,3}
in “abc abcc abccc abcccc” will match abc, abcc, abccc, and the first five characters (abccc) of abcccc
abc{1,3}\b
(have added word boundary) in “abc abcc abccc abcccc” will match only abc, abcc, and abccc
a(bc){1,2}
will match abc or abcbc
a(bc|cb)
will match abc or acb
Wikipedia: Regular Expressions
Regular Expressions Quick Start
Jonny Fox, Regex tutorial
Loyola Marymount Explanation of Regexes