Exploratory Data Analysis in R

Exploring categorical data
EXPLORATORY DATA ANALYSIS IN R

Andrew Bray
Assistant Professor, Reed College
Comics dataset
comics

# A tibble: 23,272 x 11
   name                                   id                align
   <fctr>                                 <fctr>            <fctr>
 1 Spider-Man (Peter Parker)              Secret Identity   Good
 2 Captain America (Steven Rogers)        Public Identity   Good
 3 Wolverine (James \\"Logan\\" Howlett)  Public Identity   Neutral
 4 Iron Man (Anthony \\"Tony\\" Stark)    Public Identity   Good
 5 Thor (Thor Odinson)                    No Dual Identity  Good
 6 Benjamin Grimm (Earth-616)             Public Identity   Good
 7 Reed Richards (Earth-616)              Public Identity   Good
 8 Hulk (Robert Bruce Banner)             Public Identity   Good
 9 Scott Summers (Earth-616)              Public Identity   Neutral
10 Jonathan Storm (Earth-616)             Public Identity   Good
# ... with 23,262 more rows, and 8 more variables: eye <fctr>,
#   hair <fctr>, gender <fctr>, gsm <fctr>, alive <fctr>,
#   appearances <int>, first_appear <fctr>, publisher <fctr>

Working with factors
levels(comics$align)

"Bad" "Good" "Neutral" "Reformed Criminals"

levels(comics$id)

"No Dual" "Public" "Secret" "Unknown" # Note: NAs ignored by levels() function

table(comics$id, comics$align)

          Bad Good Neutral Reformed Criminals
No Dual   474  647     390                  0
Public   2172 2930     965                  1
Secret   4493 2475     959                  1
Unknown     7    0       2                  0
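
The note that levels() ignores NAs can be checked on a small made-up example. The sketch below uses hypothetical toy vectors (not the comics data) and shows that table() also drops missing values unless asked to keep them through its useNA argument.

```r
# Hypothetical toy data, not the comics dataset: one id value is missing.
id    <- factor(c("Secret", "Public", NA, "Secret", "No Dual"))
align <- factor(c("Good", "Bad", "Good", "Neutral", "Good"))

# levels() lists only the defined categories; NA is not a level.
id_levels <- levels(id)

# table() drops rows with NA by default, so only 4 of the 5 rows are counted.
tab_default <- table(id, align)

# useNA = "ifany" keeps the missing values as their own category.
tab_with_na <- table(id, align, useNA = "ifany")

print(id_levels)
print(tab_default)
print(tab_with_na)
```

Comparing sum(tab_default) with sum(tab_with_na) is a quick way to see how many observations a contingency table silently left out.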

Bar chart
library(ggplot2) # Load package
ggplot(comics, aes(x = id, fill = align)) +
  geom_bar()

Let's practice!
Counts vs. proportions

From counts to proportions
options(scipen = 999, digits = 3) # Simplify display format
tab_cnt <- table(comics$id, comics$align)
tab_cnt

         Bad Good Neutral
No Dual  474  647     390
Public  2172 2930     965
Secret  4493 2475     959
Unknown    7    0       2

prop.table(tab_cnt)

             Bad     Good  Neutral
No Dual 0.030553 0.041704 0.025139
Public  0.140003 0.188862 0.062202
Secret  0.289609 0.159533 0.061815
Unknown 0.000451 0.000000 0.000129

sum(prop.table(tab_cnt))

1

Conditional proportions
prop.table(tab_cnt, 1)

          Bad  Good Neutral
No Dual 0.314 0.428   0.258
Public  0.358 0.483   0.159
Secret  0.567 0.312   0.121
Unknown 0.778 0.000   0.222

prop.table(tab_cnt, 2)

             Bad     Good  Neutral
No Dual 0.066331 0.106907 0.168394
Public  0.303946 0.484137 0.416667
Secret  0.628743 0.408956 0.414076
Unknown 0.000980 0.000000 0.000864
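
To make the three kinds of proportions concrete, the sketch below rebuilds the counts table by hand from the numbers on the slides (the comics data frame itself is not needed) and computes each proportion with plain division. The object names joint, by_row and by_col are illustrative only.

```r
# Counts copied from the id-by-align table on the slides.
tab_cnt <- as.table(matrix(
  c(474, 2172, 4493, 7,    # Bad
    647, 2930, 2475, 0,    # Good
    390,  965,  959, 2),   # Neutral
  nrow = 4,
  dimnames = list(
    id    = c("No Dual", "Public", "Secret", "Unknown"),
    align = c("Bad", "Good", "Neutral")
  )
))

# Joint proportions: every cell divided by the grand total.
joint <- tab_cnt / sum(tab_cnt)

# Conditioning on rows (margin 1): every cell divided by its row total.
by_row <- tab_cnt / rowSums(tab_cnt)

# Conditioning on columns (margin 2): every cell divided by its column total.
by_col <- sweep(tab_cnt, 2, colSums(tab_cnt), "/")

# Compare the values with the corresponding prop.table() calls.
print(all.equal(as.vector(joint),  as.vector(prop.table(tab_cnt))))
print(all.equal(as.vector(by_row), as.vector(prop.table(tab_cnt, 1))))
print(all.equal(as.vector(by_col), as.vector(prop.table(tab_cnt, 2))))
```

Reading one cell both ways shows why the margin matters: about 57% of characters with a secret identity are bad (row condition), while about 63% of bad characters have a secret identity (column condition).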

Conditional bar chart
ggplot(comics, aes(x = id, fill = align)) +
  geom_bar(position = "fill") +
  ylab("proportion")


Conditional bar chart
ggplot(comics, aes(x = align, fill = id)) +
  geom_bar(position = "fill") +
  ylab("proportion")
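
The two conditional bar charts can be reproduced without the comics data frame. The sketch below is one possible approach, not the slides' own code: it rebuilds the counts from the table shown earlier and uses geom_col(), which plots precomputed counts, in place of geom_bar(). The names cnt_df, p_by_id and p_by_align are illustrative.

```r
library(ggplot2)

# Counts copied from the id-by-align table on the slides.
tab_cnt <- as.table(matrix(
  c(474, 2172, 4493, 7,    # Bad
    647, 2930, 2475, 0,    # Good
    390,  965,  959, 2),   # Neutral
  nrow = 4,
  dimnames = list(
    id    = c("No Dual", "Public", "Secret", "Unknown"),
    align = c("Bad", "Good", "Neutral")
  )
))

# Long format: one row per (id, align) cell, with the count in column Freq.
cnt_df <- as.data.frame(tab_cnt)

# position = "fill" rescales every bar to height 1, so each segment shows
# the proportion of an alignment within one identity type.
p_by_id <- ggplot(cnt_df, aes(x = id, y = Freq, fill = align)) +
  geom_col(position = "fill") +
  ylab("proportion")

# Swapping x and fill conditions on alignment instead of identity.
p_by_align <- ggplot(cnt_df, aes(x = align, y = Freq, fill = id)) +
  geom_col(position = "fill") +
  ylab("proportion")

print(p_by_id)
print(p_by_align)
```

The first plot corresponds to prop.table(tab_cnt, 1) and the second to prop.table(tab_cnt, 2): whichever variable sits on the x axis is the one being conditioned on.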

Let's practice!
Distribution of one variable

Marginal distribution
table(comics$id)

No Dual  Public  Secret Unknown
   1511    6067    7927       9

tab_cnt <- table(comics$id, comics$align)
tab_cnt

         Bad Good Neutral
No Dual  474  647     390
Public  2172 2930     965
Secret  4493 2475     959
Unknown    7    0       2
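
The marginal counts of id are the row totals of the two-way table. As a minimal sketch, the code below rebuilds the table by hand from the numbers above and sums over align in two equivalent ways; id_margin, id_margin_tbl and id_prop are illustrative names.

```r
# Counts copied from the id-by-align table on the slides.
tab_cnt <- as.table(matrix(
  c(474, 2172, 4493, 7,    # Bad
    647, 2930, 2475, 0,    # Good
    390,  965,  959, 2),   # Neutral
  nrow = 4,
  dimnames = list(
    id    = c("No Dual", "Public", "Secret", "Unknown"),
    align = c("Bad", "Good", "Neutral")
  )
))

# Summing across each row collapses align and leaves the counts of id alone.
id_margin <- rowSums(tab_cnt)

# margin.table() is the base R helper for the same operation.
id_margin_tbl <- margin.table(tab_cnt, 1)

# Marginal proportions: each id count divided by the grand total.
id_prop <- id_margin / sum(id_margin)

print(id_margin)
print(id_margin_tbl)
print(round(id_prop, 3))
```

The row totals agree with the one-way table(comics$id) output shown on the slide: 1511, 6067, 7927 and 9.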

Simple bar chart
ggplot(comics, aes(x = id)) +
  geom_bar()

Faceting
tab_cnt <- table(comics$id, comics$align)
tab_cnt

         Bad Good Neutral
No Dual  474  647     390
Public  2172 2930     965
Secret  4493 2475     959
Unknown    7    0       2

Faceted bar charts
ggplot(comics, aes(x = id)) +
  geom_bar() +
  facet_wrap(~align)
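
Each facet draws one column of the counts table as its own bar chart, whereas a stacked chart places the same counts in a single panel. The sketch below shows both side by side under the assumption that only the counts from the slides are available; it uses geom_col() on a hand-built table, and the names cnt_df, p_facet and p_stack are illustrative.

```r
library(ggplot2)

# Counts copied from the id-by-align table on the slides.
tab_cnt <- as.table(matrix(
  c(474, 2172, 4493, 7,    # Bad
    647, 2930, 2475, 0,    # Good
    390,  965,  959, 2),   # Neutral
  nrow = 4,
  dimnames = list(
    id    = c("No Dual", "Public", "Secret", "Unknown"),
    align = c("Bad", "Good", "Neutral")
  )
))

# Long format: one row per (id, align) cell, with the count in column Freq.
cnt_df <- as.data.frame(tab_cnt)

# Faceted: one panel per alignment, each showing the distribution of id.
p_facet <- ggplot(cnt_df, aes(x = id, y = Freq)) +
  geom_col() +
  facet_wrap(~align) +
  ylab("count")

# Stacked: the same counts in one panel, with alignment mapped to fill.
p_stack <- ggplot(cnt_df, aes(x = id, y = Freq, fill = align)) +
  geom_col() +
  ylab("count")

print(p_facet)
print(p_stack)
```

Facets make it easier to compare bar heights within one alignment, while the stacked version keeps the overall total for each id visible.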

Faceting vs. stacking

Pie chart vs. bar chart

Let's practice!