Exploratory Data Analysis in R
Categorical data
Andrew Bray
Assistant Professor, Reed College
Comics dataset
comics
# A tibble: 23,272 x 11
    name                                   id                align
    <fctr>                                 <fctr>            <fctr>
1   Spider-Man (Peter Parker)              Secret Identity   Good
2   Captain America (Steven Rogers)        Public Identity   Good
3   Wolverine (James \\"Logan\\" Howlett)  Public Identity   Neutral
4   Iron Man (Anthony \\"Tony\\" Stark)    Public Identity   Good
5   Thor (Thor Odinson)                    No Dual Identity  Good
6   Benjamin Grimm (Earth-616)             Public Identity   Good
7   Reed Richards (Earth-616)              Public Identity   Good
8   Hulk (Robert Bruce Banner)             Public Identity   Good
9   Scott Summers (Earth-616)              Public Identity   Neutral
10  Jonathan Storm (Earth-616)             Public Identity   Good
# ... with 23,262 more rows, and 8 more variables: eye <fctr>,
#   hair <fctr>, gender <fctr>, gsm <fctr>, alive <fctr>,
#   appearances <int>, first_appear <fctr>, publisher <fctr>
levels(comics$id)
"No Dual" "Public" "Secret" "Unknown" # Note: NAs ignored by levels() function
table(comics$id, comics$align)
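The slide shows only the call, so here is a minimal sketch of the same kind of two-way table built on a small made-up data frame. The `toy` data frame and its values are hypothetical stand-ins, not rows from `comics`. It also illustrates the earlier note about missing values: `NA` is not a factor level, and `table()` drops it unless `useNA` is set.

```r
# Hypothetical toy data standing in for comics$id and comics$align
toy <- data.frame(
  id    = factor(c("Secret", "Public", "Public", NA,    "Public", "No Dual")),
  align = factor(c("Good",   "Good",   "Bad",    "Bad", "Good",   "Neutral"))
)

levels(toy$id)   # NA is not a level, so it is not listed

# Two-way contingency table: rows are id levels, columns are align levels
tab <- table(toy$id, toy$align)
tab

tab["Public", "Good"]   # number of rows that are both Public and Good

# Keep missing values as their own category when they occur
table(toy$id, toy$align, useNA = "ifany")
```

By default the row with a missing `id` is left out of `tab`, so the cell counts add up to one fewer than the number of rows in `toy`.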
From counts to proportions
options(scipen = 999, digits = 3) # Simplify display format
tab_cnt <- table(comics$id, comics$align)
tab_cnt
prop.table(tab_cnt)
sum(prop.table(tab_cnt))
prop.table(tab_cnt, 2)
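The calls above differ only in the `margin` argument of `prop.table()`. As a sketch on hypothetical toy vectors (the `id` and `align` values below are made up, not taken from `comics`): with no margin each cell is divided by the grand total, with margin 1 by its row total, and with margin 2 by its column total.

```r
# Hypothetical toy vectors standing in for comics$id and comics$align
id    <- c("Secret", "Public", "Public", "Secret", "Public", "No Dual")
align <- c("Good",   "Good",   "Bad",    "Bad",    "Good",   "Neutral")
toy_tab <- table(id = id, align = align)

joint  <- prop.table(toy_tab)      # joint proportions: cells sum to 1 overall
by_row <- prop.table(toy_tab, 1)   # conditional on id: each row sums to 1
by_col <- prop.table(toy_tab, 2)   # conditional on align: each column sums to 1

sum(joint)
rowSums(by_row)
colSums(by_col)
```

Reading `by_col` answers questions of the form "among Good characters, what share have a public identity?", while `by_row` conditions the other way around.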
Marginal distribution
table(comics$id)
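A one-way table like this gives the marginal counts of a single variable. As a sketch on hypothetical toy vectors (made-up values, not the `comics` data), the same counts can be recovered from a two-way table by summing over the other variable, using either `rowSums()` or `margin.table()`.

```r
# Hypothetical toy vectors standing in for comics$id and comics$align
id    <- c("Secret", "Public", "Public", "Secret", "Public", "No Dual")
align <- c("Good",   "Good",   "Bad",    "Bad",    "Good",   "Neutral")

one_way <- table(id)          # marginal counts of id
two_way <- table(id, align)   # joint counts of id and align

one_way
rowSums(two_way)              # summing across align gives the same counts
margin.table(two_way, 1)      # equivalent, returned as a table

prop.table(one_way)           # marginal distribution as proportions
```

The two routes agree here because the toy data has no missing values. If `align` contained `NA`s, those rows would be dropped from the two-way table and its row sums could fall short of the one-way counts.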