
Tidyverse - Tidyr and Dplyr

Tidyr is used for reshaping and tidying data, specifically converting between wide and long data formats using pivot_longer() and pivot_wider(). Dplyr is used for manipulating data through common verbs like select(), filter(), mutate(), rename(), arrange(), and summarise(), often used with the pipe operator (%>%) to link functions together. Both packages are part of the tidyverse collection of packages for data science in R.

Uploaded by

APPIAH ELIJAH

tidyr

reshape and tidy data

dplyr
manipulate data

TIDYR & DPLYR



HELLO!
I am Elijah Appiah from Ghana.
I am an Economist by profession.
R, Raw Data, Leisure
You can reach me:
eappiah.uew@gmail.com
secret behind the smile!

Lesson Goal
A practical guide to two important tidyverse packages
for data manipulation: tidyr and dplyr.

tidyverse
The tidyverse is a collection of some of R's most powerful packages for:
data manipulation
data analysis
data visualization

tidyverse Collection
tidyr - reshape and tidy data
dplyr - manipulate data
tibble - tidy DataFrames
readr - read data
stringr - work with strings
forcats - factors and categorical variables
lubridate - dates and time
purrr - functional programming and loops
ggplot2 - data visualization

RESHAPE & TIDY DATA



tidyr
Messy Data

tidyr
Tidy data comes in two forms:
Wide data
Long data

tidyr
Wide Data
Has a column for each variable and a row for
each observation.

tidyr
Long Data
Has one column indicating the type of
variable contained in each row, and
another column containing the value of
that variable.


tidyr
Reshape Data
Wide to Long Data: pivot_longer()
Long to Wide Data: pivot_wider()

tidyr
Wide to Long Data Format
pivot_longer(
  data,
  cols,
  names_to = "name",
  values_to = "value"
)
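As a minimal sketch of the call above, assuming a small made-up table of test scores (the data frame `scores` and its column names are hypothetical, not from the slides):

```r
library(tidyr)

# Hypothetical wide data: one row per student, one column per subject
scores <- data.frame(
  student = c("A", "B"),
  maths   = c(70, 85),
  english = c(65, 90)
)

# Gather the two subject columns into a "subject" column (the old
# column names) and a "score" column (the old cell values)
scores_long <- pivot_longer(
  scores,
  cols      = c(maths, english),
  names_to  = "subject",
  values_to = "score"
)

print(scores_long)
```

Each student now occupies one row per subject, so the two-row wide table becomes a four-row long table.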

tidyr
Long to Wide Data Format
pivot_wider(
  data,
  names_from = name,
  values_from = value
)
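The reverse direction can be sketched the same way, again assuming a made-up long table of test scores (`scores_long` and its columns are hypothetical):

```r
library(tidyr)

# Hypothetical long data: one row per student-subject pair
scores_long <- data.frame(
  student = c("A", "A", "B", "B"),
  subject = c("maths", "english", "maths", "english"),
  score   = c(70, 65, 85, 90)
)

# Spread the values of "subject" into column names and fill the
# new columns with the matching "score" values
scores_wide <- pivot_wider(
  scores_long,
  names_from  = subject,
  values_from = score
)

print(scores_wide)
```

Columns not named in names_from or values_from (here, student) identify the rows of the wide table.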

tidyr
DataFrame
airquality {datasets}
New York Air Quality Measurements

Description
Daily air quality measurements in New York, May to
September 1973.
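One possible practice exercise with this dataset, offered as a sketch rather than the exercise used in the lesson: reshape the four measurement columns to long format and back. The names `aq_long`, `aq_wide`, "variable" and "value" are choices made here for illustration.

```r
library(tidyr)

# airquality ships with R; Month and Day identify each observation
aq_long <- pivot_longer(
  airquality,
  cols      = c(Ozone, Solar.R, Wind, Temp),
  names_to  = "variable",
  values_to = "value"
)

# Reverse the reshape: one column per measurement again
aq_wide <- pivot_wider(
  aq_long,
  names_from  = variable,
  values_from = value
)

dim(airquality)  # original wide data
dim(aq_long)     # four rows per original row
dim(aq_wide)     # same shape as the original
```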

Now, let’s practice



tidyr
tidyr has two more important functions:
unite() - combine several columns into one
separate() - split one column into several

tidyr
unite(
  data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE
)
data - dataset
col - name of the new column
... - columns to unite
sep - separator placed between the united values
remove = TRUE - remove the input columns from the output DataFrame
na.rm = FALSE - if TRUE, missing values are removed before uniting each value
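A short sketch of unite() in use, assuming a made-up table with date parts stored in separate columns (`dates` and its columns are hypothetical):

```r
library(tidyr)

# Hypothetical data: year, month and day in separate columns
dates <- data.frame(
  year  = c(2023, 2024),
  month = c(5, 11),
  day   = c(1, 30)
)

# Paste the three columns into one "date" column, separated by "-".
# remove = TRUE (the default) drops year, month and day afterwards.
dates_united <- unite(dates, col = "date", year, month, day, sep = "-")

print(dates_united)
```

The united column is always a character column, even when the inputs are numeric.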

tidyr
separate(
  data, col, into, sep = "[^[:alnum:]]+", remove = TRUE
)
data - dataset
col - name or position of the column to separate
into - character vector of new column names
sep - separator between columns (a regular expression, or a numeric position to split at)
remove = TRUE - remove the input column from the output data frame
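A matching sketch of separate(), assuming a made-up table with dates stored as text (`dates` is hypothetical):

```r
library(tidyr)

# Hypothetical data: dates stored as a single text column
dates <- data.frame(date = c("2023-05-01", "2024-11-30"))

# Split "date" at each "-" into three new columns.
# The default sep would also work here, since "-" is non-alphanumeric.
dates_split <- separate(
  dates,
  col  = date,
  into = c("year", "month", "day"),
  sep  = "-"
)

print(dates_split)
```

The new columns stay as character by default, so "05" keeps its leading zero.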

Now, let’s practice



DATA MANIPULATION

dplyr
dplyr is a grammar of data manipulation.
It provides functional verbs for data manipulation.

dplyr
Common dplyr verbs:
select(): select columns
filter(): keep rows that meet conditions
mutate(): add new variables or transform existing ones
rename(): rename columns
arrange(): sort rows in ascending or descending order
summarise(): summarise data, usually used with group_by()
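Of the verbs listed, arrange() has no slide of its own, so here is a minimal sketch using the built-in mtcars data (the object names `lowest_first` and `highest_first` are chosen here for illustration):

```r
library(dplyr)

# Sort rows by miles per gallon; ascending order is the default
lowest_first <- mtcars %>% arrange(mpg)

# Wrap the column in desc() to sort from highest to lowest
highest_first <- mtcars %>% arrange(desc(mpg))

head(lowest_first, 3)
head(highest_first, 3)
```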

dplyr
The pipe operator %>% passes the object on its left as the first argument of the function on its right, linking functions to data.
E.g.
summary(mtcars)
mtcars %>% summary()

mean(mtcars$mpg, na.rm = TRUE)
mtcars$mpg %>% mean(na.rm = TRUE)
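The benefit of the pipe shows once several steps are chained. A small sketch comparing a nested call with its piped equivalent (the rounding step and the names `nested` and `piped` are added here for illustration):

```r
library(dplyr)

# Nested call: read from the inside out
nested <- round(mean(mtcars$mpg, na.rm = TRUE), 1)

# Piped call: read left to right; each result becomes the
# first argument of the next function
piped <- mtcars$mpg %>%
  mean(na.rm = TRUE) %>%
  round(1)

identical(nested, piped)  # TRUE: both forms compute the same value
```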

dplyr
select(data, …)

data %>% select(…)
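A few common ways to fill in the select() call, sketched with the built-in mtcars data (the object names are chosen here for illustration):

```r
library(dplyr)

# Keep columns by name
by_name <- mtcars %>% select(mpg, cyl, hp)

# Drop a column with a minus sign
without_mpg <- mtcars %>% select(-mpg)

# Use a selection helper: columns whose names start with "c"
starts_c <- mtcars %>% select(starts_with("c"))

names(by_name)
names(starts_c)
```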



dplyr
filter(data, …)

data %>% filter(…)
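A minimal sketch of filter() with one and with two conditions, using the built-in mtcars data (the thresholds and object names are chosen here for illustration):

```r
library(dplyr)

# Keep rows where a single condition is TRUE
high_mpg <- mtcars %>% filter(mpg > 25)

# Conditions separated by commas are combined with AND:
# four-cylinder cars with a manual transmission (am == 1)
four_cyl_manual <- mtcars %>% filter(cyl == 4, am == 1)

nrow(high_mpg)
nrow(four_cyl_manual)
```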



dplyr
mutate(data, …)

data %>% mutate(…)
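A sketch of both uses of mutate(), adding a new column and transforming an existing one, with the built-in mtcars data. The column name `kpl` and the approximate conversion factor 0.425 (miles per US gallon to kilometres per litre) are choices made here for illustration.

```r
library(dplyr)

cars2 <- mtcars %>%
  mutate(
    kpl = mpg * 0.425,  # new column: approximate kilometres per litre
    wt  = wt * 1000     # transform in place: wt is recorded in 1000 lbs
  )

head(cars2[, c("mpg", "kpl", "wt")], 3)
```

Assigning to an existing name (wt) overwrites that column; a new name (kpl) appends a column.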



dplyr
rename(data, …)

data %>% rename(…)
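In rename(), each argument takes the form new_name = old_name. A minimal sketch with the built-in mtcars data (the new names are chosen here for illustration):

```r
library(dplyr)

# new_name = old_name; all other columns are left untouched
renamed <- mtcars %>%
  rename(miles_per_gallon = mpg, horsepower = hp)

names(renamed)
```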



dplyr
group_by(data, …) %>%
summarise(…)

data %>%
group_by(…) %>%
summarise(…)
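A minimal sketch of the grouped summary pattern above, using the built-in mtcars data (the summary column names `n` and `mean_mpg` are chosen here for illustration):

```r
library(dplyr)

# One output row per distinct value of cyl
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n        = n(),                     # number of cars in the group
    mean_mpg = mean(mpg, na.rm = TRUE)  # average miles per gallon
  )

print(by_cyl)
```

Without group_by(), summarise() would collapse the whole data frame into a single row.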

dplyr
DataFrame
wage1 {wooldridge}

Description
Data from the 1976 Current Population Survey,
collected by Henry Farber, who was formerly at MIT.

Now, let’s practice



THANKS!
Any questions?
Reach me anytime!
Email
eappiah.uew@gmail.com
LinkedIn
https://www.linkedin.com/in/appiah-elijah-383231123/
