Tidyverse - Tidyr and Dplyr
HELLO!
I am Elijah Appiah from
Ghana.
I am an Economist by
profession.
You can reach me:
eappiah.uew@gmail.com
R, raw data, leisure: the secret behind the smile!
Lesson Goal
A practical guide to two core
tidyverse packages for data
manipulation: tidyr and dplyr.
tidyverse
The tidyverse is a collection of
widely used R packages for:
data manipulation
data analysis
data visualization.
tidyverse Collection
tidyr: reshape and tidy data
stringr: work with strings
dplyr: manipulate data
forcats: factors and categorical variables
tibble: tidy DataFrames
lubridate: dates and time
readr: read data
purrr: functional programming and loops
tidyverse Collection
ggplot2
data visualization
tidyr
Messy Data
tidyr
Tidy data is in two forms:
Wide data
Long data
tidyr
Wide Data
Has a column for each variable and a row for
each observation.
tidyr
Long Data
Has one column naming the variable
recorded in each row, and another
column holding the value of that
variable.
tidyr
Reshape Data
Wide to Long Data
pivot_longer()
tidyr
Wide to Long Data Format
pivot_longer(
data,
cols,
names_to = "name",
values_to = "value"
)
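To see the arguments in action, here is a minimal sketch on a small made-up table. The data frame scores_wide and its column names are hypothetical and exist only for this illustration.

```r
library(tidyr)

# Hypothetical wide table: one row per student, one column per subject
scores_wide <- data.frame(
  student = c("Ama", "Kofi"),
  math    = c(80, 65),
  english = c(72, 90)
)

# Stack the two subject columns into name/value pairs:
# the old column names go to "subject", the cell values go to "score"
scores_long <- pivot_longer(
  scores_wide,
  cols      = c(math, english),
  names_to  = "subject",
  values_to = "score"
)

print(scores_long)
```

Each student now occupies two rows, one per subject, which is the long layout.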
tidyr
Long to Wide Data Format
pivot_wider(
data,
names_from = name,
values_from = value
)
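The reverse direction can be sketched the same way. The long table scores_long below is hypothetical; the point is only to show which column supplies the new column names and which supplies the values.

```r
library(tidyr)

# Hypothetical long table: one row per student/subject pair
scores_long <- data.frame(
  student = c("Ama", "Ama", "Kofi", "Kofi"),
  subject = c("math", "english", "math", "english"),
  score   = c(80, 72, 65, 90)
)

# Spread "subject" into column names and fill the cells from "score"
scores_wide <- pivot_wider(
  scores_long,
  names_from  = subject,
  values_from = score
)

print(scores_wide)
```

Columns not named in names_from or values_from (here, student) identify the rows of the wide result.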
tidyr
DataFrame
airquality {datasets}
New York Air Quality Measurements
Description
Daily air quality measurements in New York, May to
September 1973.
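As one possible exercise with this dataset, the sketch below reshapes airquality to long format and back. The choice of columns and the names "measure" and "reading" are assumptions made for the illustration, not part of the dataset.

```r
library(tidyr)

# airquality ships with R; its columns are
# Ozone, Solar.R, Wind, Temp, Month, Day

# Wide to long: stack the four measurement columns,
# keeping Month and Day as identifiers
air_long <- pivot_longer(
  airquality,
  cols      = Ozone:Temp,
  names_to  = "measure",
  values_to = "reading"
)

# Long to wide: rebuild one column per measurement
air_wide <- pivot_wider(
  air_long,
  names_from  = measure,
  values_from = reading
)

dim(air_long)   # four rows per original day
dim(air_wide)   # same shape as airquality
```

Month and Day together identify each day uniquely, so the round trip recovers one row per day.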
tidyr
There are two more important
functions in tidyr.
unite() – unite columns
separate() – separate columns
tidyr
unite(
data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE
)
data - dataset
col - new column name
... - columns to unite
sep - separator placed between the united values
remove = TRUE - remove input columns from the output DataFrame
na.rm = FALSE - if TRUE, missing values are dropped before the values are united
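A short sketch of these arguments, assuming a hypothetical data frame with separate year, month and day columns:

```r
library(tidyr)

# Hypothetical data: date parts stored in three columns
dates <- data.frame(
  year  = c(1973, 1973),
  month = c(5, 9),
  day   = c(1, 30)
)

# Paste the three columns into one "date" column, joined by "-".
# With the default remove = TRUE the input columns are dropped.
united <- unite(dates, "date", year, month, day, sep = "-")
print(united)

# remove = FALSE keeps year, month and day alongside the new column
kept <- unite(dates, "date", year, month, day, sep = "-", remove = FALSE)
print(kept)
```

The united column is always character, even when the inputs are numeric.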
tidyr
separate(
data, col, into, sep = "[^[:alnum:]]+", remove = TRUE
)
data - dataset
col - name or position of the column to separate
into - character vector of new column names
sep - separator between values; the default splits on any run of non-alphanumeric characters
remove = TRUE - remove the input column from the output DataFrame
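The sketch below splits a hypothetical "date" column into three parts. The data frame is made up for the illustration; convert = TRUE is an additional separate() argument not listed on the slide.

```r
library(tidyr)

# Hypothetical data: one character column holding year-month-day
dates <- data.frame(date = c("1973-5-1", "1973-9-30"))

# Split on "-" into three new columns; the results are character
parts <- separate(dates, date, into = c("year", "month", "day"), sep = "-")
print(parts)

# convert = TRUE asks separate() to convert the new columns
# to a more suitable type (here, integer)
parts_num <- separate(
  dates, date,
  into = c("year", "month", "day"),
  sep = "-", convert = TRUE
)
str(parts_num)
```

Recent tidyr releases mark separate() as superseded by separate_wider_delim() and related functions, but separate() continues to work as shown.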
DATA MANIPULATION
dplyr
dplyr provides a
grammar of data manipulation.
dplyr
Common dplyr verbs:
select(): select columns
filter(): filter data based on conditions
mutate(): add new variables or transform existing ones
rename(): rename columns
arrange(): arrange data in asc./desc. order
summarise(): reduce data to summary values, usually
used with group_by()
dplyr
The pipe operator %>% passes the object on its left as the first argument of the function on its right.
E.g.
summary(mtcars)
mtcars %>% summary()
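The benefit shows most when several calls are chained. As a small sketch (the variable names nested and piped are only for this illustration), the same computation is written inside-out and then left-to-right:

```r
library(dplyr)  # loading dplyr makes %>% available

# Nested calls read from the inside out
nested <- round(mean(mtcars$mpg), 1)

# Piped calls read left to right: take mpg, average it, round it
piped <- mtcars$mpg %>%
  mean() %>%
  round(1)

identical(nested, piped)
```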
dplyr
select(data, ...)
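A few ways to fill in the column arguments, sketched on the built-in mtcars data (chosen here only because it ships with R):

```r
library(dplyr)

# Keep three columns by name
cars_small <- select(mtcars, mpg, cyl, hp)

# Drop one column with a minus sign
no_weight <- mtcars %>% select(-wt)

# Use a selection helper: columns whose names start with "d"
d_cols <- mtcars %>% select(starts_with("d"))

names(cars_small)
names(d_cols)
```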
dplyr
filter(data, ...)
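The conditions are logical expressions on the columns. A minimal sketch on mtcars, with thresholds picked arbitrarily for the illustration:

```r
library(dplyr)

# Keep rows where a single condition holds
efficient <- filter(mtcars, mpg > 25)

# Several conditions separated by commas are combined with AND:
# four-cylinder cars with a manual transmission (am == 1)
four_cyl_manual <- mtcars %>%
  filter(cyl == 4, am == 1)

nrow(efficient)
nrow(four_cyl_manual)
```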
dplyr
mutate(data, ...)
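Each argument is a name = expression pair. The sketch below adds one column and overwrites another in mtcars; the column name kpl and the conversion factor are choices made for this illustration.

```r
library(dplyr)

cars_metric <- mtcars %>%
  mutate(
    # New column: kilometres per litre (1 mpg is about 0.4251 km/l)
    kpl = mpg * 0.4251,
    # Existing column transformed: wt is recorded in 1000 lbs,
    # so this converts it to lbs
    wt = wt * 1000
  )

head(cars_metric[, c("mpg", "kpl", "wt")])
```

Using an existing name on the left-hand side replaces that column; a new name appends a column.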
dplyr
rename(data, ...)
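Renaming uses new_name = old_name pairs. A small sketch on mtcars, with the new names invented for the illustration:

```r
library(dplyr)

# new_name = old_name; all other columns are left unchanged
renamed <- mtcars %>%
  rename(miles_per_gallon = mpg, horsepower = hp)

names(renamed)
```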
dplyr
group_by(data, ...) %>%
summarise(...)
Equivalently:
data %>%
group_by(...) %>%
summarise(...)
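A sketch of the grouped-summary pattern on mtcars, with arrange() added at the end to order the result. The summary column names n_cars and mean_mpg are chosen for this illustration.

```r
library(dplyr)

mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%              # one group per cylinder count
  summarise(
    n_cars   = n(),              # rows in each group
    mean_mpg = mean(mpg)         # average mpg in each group
  ) %>%
  arrange(desc(mean_mpg))        # highest average first

print(mpg_by_cyl)
```

summarise() returns one row per group, so the result has as many rows as there are distinct cyl values.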
dplyr
DataFrame
wage1 {wooldridge}
Description
Data from the 1976 Current Population Survey,
collected by Henry Farber, formerly of MIT.
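One way the verbs might be combined on this dataset is sketched below. It assumes the wooldridge package is installed from CRAN and uses the wage1 columns wage, educ and female; the filter threshold and the summary names are arbitrary choices for the illustration.

```r
library(dplyr)

# wooldridge is a CRAN package and may not be installed,
# so the example only runs when it is available
if (requireNamespace("wooldridge", quietly = TRUE)) {
  data("wage1", package = "wooldridge")

  wage_by_sex <- wage1 %>%
    select(wage, educ, female) %>%   # keep the columns of interest
    filter(educ >= 12) %>%           # at least 12 years of education
    group_by(female) %>%             # female is coded 0/1
    summarise(
      n         = n(),
      mean_wage = mean(wage),
      mean_educ = mean(educ)
    )

  print(wage_by_sex)
}
```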
THANKS!
Any questions?
Reach me anytime!
Email
eappiah.uew@gmail.com
LinkedIn
https://www.linkedin.com/in/appiah-elijah-383231123/