CheatSheet - Data Wrangling with Tidyverse

Commands Syntax Description Example

install package
    Syntax:      install.packages("packagename")
    Description: install.packages() installs packages from a repository such as CRAN into the local R library.
    Example:     install.packages("tidyverse")

load package
    Syntax:      library(packagename)
    Description: library() loads an installed package from the R library.
    Example:     library(tidyverse)

download.file
    Syntax:      download.file(url, destfile, method, quiet = FALSE, mode = "w", cacheOK = TRUE, headers = NULL, …)
    Description: download.file() downloads a file and saves it locally.
        url:      a character string naming the URL of a resource to be downloaded.
        destfile: a character string with the name where the downloaded file is saved.
    Example:     download.file(url, destfile = "lax_to_jfk.tar.gz")

untar
    Syntax:      untar(tarfile, ...)
    Description: untar() extracts files from a tar archive. The function comes from the utils package.
    Example:     untar("lax_to_jfk.tar.gz")

read_csv
    Syntax:      read_csv(file)
    Description: read_csv() reads a CSV file using the readr package.
    Example:     read_csv("lax_to_jfk/lax_to_jfk.csv")

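The download, extract, and read steps above can be sketched end to end. To stay runnable offline, this sketch builds a small local archive in place of the download.file() call; the directory name, file names, and toy values are assumptions for illustration and are not the airline data.

```r
library(readr)

# Work in a temporary directory so nothing is left behind.
old_wd <- setwd(tempdir())

# Stand-in for the downloaded archive: write a toy CSV and tar it.
# In practice this archive would come from download.file(url, destfile = ...).
dir.create("toy_flights", showWarnings = FALSE)
write_csv(data.frame(arrdelay = c(5, NA, -3)), "toy_flights/toy_flights.csv")
tar("toy_flights.tar.gz", files = "toy_flights", compression = "gzip")
unlink("toy_flights", recursive = TRUE)

# Extract the archive, then read the CSV that was inside it.
untar("toy_flights.tar.gz")
flights <- read_csv("toy_flights/toy_flights.csv", show_col_types = FALSE)

setwd(old_wd)
```
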
Missing Values and Formatting
is.na
    Syntax:      is.na(x)
    Description: is.na(x) returns a vector of TRUE or FALSE depending on whether the corresponding element in x is NA.
    Example:     is.na(c(1, NA)) # FALSE TRUE

anyNA
    Syntax:      anyNA(x, recursive = FALSE)
    Description: anyNA() returns TRUE if x contains any NAs and FALSE otherwise.
    Example:     anyNA(c(1, NA)) # TRUE

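sum
    Syntax:      sum(object)
    Description: sum() calculates the sum of its arguments. Applied to a logical vector such as is.na(x), it counts the TRUE values.
    Example:     sum(is.na(carrierdelay))
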
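A minimal sketch of how the three functions above combine to detect and count missing values. The vector `delays` is a made-up example, not a column from the airline data.

```r
# Toy vector with two missing values (an assumption for illustration).
delays <- c(12, NA, 0, NA, 7)

missing <- is.na(delays)      # logical vector, TRUE where the value is NA
any_missing <- anyNA(delays)  # single TRUE/FALSE answer
n_missing <- sum(missing)     # TRUE counts as 1, FALSE as 0
```
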
summarize
    Syntax:      summarize(.data, ...)
    Description: summarize() (dplyr) reduces a data frame to a summary of just one vector or value.
    Example:     summarize(count = sum(is.na(carrierdelay)))
    Note:        The longer signature summarize(X, by, FUN, …, stat.name = deparse(substitute(X)), type = c('variables', 'matrix'), subset = TRUE, keepcolnames = FALSE) belongs to summarize() in the Hmisc package, not to dplyr. Its arguments are:
        X:   a vector or matrix capable of being operated on by the function specified as the FUN argument.
        by:  one or more stratification variables. If a single variable, by may be a vector; otherwise it should be a list.
        FUN: a function of a single vector argument, used to create the statistical summaries for summarize. FUN may compute any number of statistics.

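As a hedged illustration of the dplyr usage in the example column, the sketch below counts missing values overall and per airline. The small `sub_airline` tibble is invented for this sketch, and `group_by()` is an addition that does not appear in the cheat sheet.

```r
library(dplyr)

# Toy stand-in for sub_airline (values are assumptions).
sub_airline <- tibble(
  reporting_airline = c("AA", "AA", "DL", "DL"),
  carrierdelay = c(10, NA, NA, 4)
)

# One row: total number of missing carrier delays.
na_count <- sub_airline %>%
  summarize(count = sum(is.na(carrierdelay)))

# One row per airline: missing carrier delays within each group.
na_by_airline <- sub_airline %>%
  group_by(reporting_airline) %>%
  summarize(count = sum(is.na(carrierdelay)))
```
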
map
    Syntax:      map(.x, .f, ...)
    Description: The map() functions transform their input by applying a function to each element and returning an object the same length as the input. map() itself always returns a list.
    Example:     map(sub_airline, ~sum(is.na(.)))

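A short sketch of the per-column missing-value count from the example above, run on a made-up data frame. `map_int()` is shown next to `map()` as an optional variant that returns an integer vector instead of a list; it is not part of the cheat sheet.

```r
library(purrr)

# Toy stand-in for sub_airline (values are assumptions).
sub_airline <- data.frame(
  arrdelay = c(5, NA, -3),
  carrierdelay = c(NA, NA, 2)
)

# A data frame is a list of columns, so map() visits one column at a time.
na_list <- map(sub_airline, ~ sum(is.na(.)))     # named list
na_vec <- map_int(sub_airline, ~ sum(is.na(.)))  # named integer vector
```
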
dim
    Syntax:      dim(object)
    Description: dim() returns the dimensions of a matrix, array, or data frame.
    Example:     dim(sub_airline)

drop_na
    Syntax:      drop_na(object)
    Description: drop_na() drops rows containing missing values.
    Example:     drop_na(carrierdelay)

replace_na
    Syntax:      replace_na(data, replace, ...)
    Description: replace_na() replaces missing values.
        data:    a data frame or vector.
        replace: if data is a data frame, a named list giving the value to replace NA with for each column. If data is a vector, a single value used for replacement.
    Example:     replace_na(list(carrierdelay = 0, weatherdelay = 0, nasdelay = 0, securitydelay = 0, lateaircraftdelay = 0))

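The two ways of handling missing values described above, dropping rows and filling in a replacement, can be compared on a small example. The data frame below is a toy stand-in, and the choice of 0 as the replacement simply follows the cheat sheet's example.

```r
library(tidyr)

# Toy stand-in for sub_airline (values are assumptions).
sub_airline <- data.frame(
  arrdelay = c(5, 20, -3),
  carrierdelay = c(NA, 15, 0),
  weatherdelay = c(NA, NA, 0)
)

# Drop only the rows where carrierdelay is missing.
complete_rows <- drop_na(sub_airline, carrierdelay)

# Keep every row, replacing NA with 0 in the named columns.
filled <- replace_na(sub_airline, list(carrierdelay = 0, weatherdelay = 0))
```

Dropping rows shrinks the data set, while replacing keeps its size but bakes an assumption (here, "missing means no delay") into the values.
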
mean
    Syntax:      mean(x, na.rm)
    Description: mean() calculates the arithmetic mean of the elements of the numeric vector passed to it as an argument.
    Example:     mean(drop_na_rows$carrierdelay)

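A brief sketch of why the `na.rm` argument matters when a column still contains missing values. The vector is a made-up example.

```r
# Toy vector with one missing value (an assumption for illustration).
carrierdelay <- c(10, NA, 20)

mean_with_na <- mean(carrierdelay)                # NA: one missing value poisons the result
mean_without <- mean(carrierdelay, na.rm = TRUE)  # 15: NAs are removed before averaging
```
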
mutate, mutate_all, mutate_if
    Syntax:      mutate(data, ...)
    Description: The mutate family (mutate, mutate_all, mutate_if, and mutate_at) creates new columns in a data frame or modifies existing ones.
    Example:     date_airline %>% select(year, month, day) %>% mutate_all(type.convert) %>% mutate_if(is.character, as.numeric)

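A minimal sketch of type conversion with the mutate family, assuming date columns that were read in as text. The `date_airline` tibble and the `late_in_month` column are invented for illustration. `across()` is shown alongside `mutate_if()` because current dplyr documentation marks the scoped variants (`mutate_all`, `mutate_if`, `mutate_at`) as superseded by it.

```r
library(dplyr)

# Toy stand-in for date_airline with character columns (values are assumptions).
date_airline <- tibble(
  year = c("2003", "2004"),
  month = c("3", "11"),
  day = c("28", "5")
)

# Scoped variant used in the cheat sheet: convert every character column.
via_if <- date_airline %>% mutate_if(is.character, as.numeric)

# Equivalent conversion written with across().
via_across <- date_airline %>% mutate(across(where(is.character), as.numeric))

# Plain mutate(): add a new column derived from an existing one.
with_new <- via_across %>% mutate(late_in_month = day > 15)
```
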
Data Normalization
Simple scaling
    Syntax:      xnew = xold / xmax
    Description: Simple scaling divides each value by the maximum value in a feature. For non-negative data the new range is between 0 and 1.
    Example:     sub_airline$arrdelay / max(sub_airline$arrdelay)

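A small worked example of simple scaling on made-up values. The second vector illustrates a caveat that is easy to miss: when a feature contains negative values, as arrival delays can, the scaled result is no longer confined to the range 0 to 1.

```r
# Toy non-negative values (assumptions for illustration).
arrdelay <- c(10, 25, 50)
scaled <- arrdelay / max(arrdelay)             # 0.2 0.5 1.0

# With a negative value, the lower bound of 0 no longer holds.
early_and_late <- c(-10, 25, 50)
scaled_signed <- early_and_late / max(early_and_late)  # -0.2 0.5 1.0
```
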
Min-max
    Syntax:      xnew = (xold - xmin) / (xmax - xmin)
    Description: Min-max scaling subtracts the minimum value from the original and divides by the maximum minus the minimum. The minimum becomes 0 and the maximum becomes 1.
    Example:     (sub_airline$arrdelay - min(sub_airline$arrdelay)) / (max(sub_airline$arrdelay) - min(sub_airline$arrdelay))

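Min-max scaling can be wrapped in a small helper so the formula is written once. The function name `min_max` and the input values are assumptions for this sketch.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1].
min_max <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

arrdelay <- c(-10, 20, 50)        # toy values, including a negative delay
rescaled <- min_max(arrdelay)     # 0.0 0.5 1.0
```

Unlike simple scaling, the result stays within 0 and 1 even when the input contains negative values. The helper divides by zero if all values are equal, so a constant column would need separate handling.
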
Z-score
    Syntax:      xnew = (xold - μ) / σ
    Description: Standardization (Z-score) subtracts the mean (μ) of the feature and divides by the standard deviation (σ).
    Example:     (sub_airline$arrdelay - mean(sub_airline$arrdelay)) / sd(sub_airline$arrdelay)

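A worked example of the Z-score formula on made-up values chosen so the arithmetic is easy to follow: the mean is 20 and the sample standard deviation is 30. Note that R's `sd()` uses the sample formula with an n - 1 denominator.

```r
arrdelay <- c(-10, 20, 50)   # toy values (assumptions for illustration)

mu <- mean(arrdelay)         # 20
sigma <- sd(arrdelay)        # 30, computed with the n - 1 denominator
z <- (arrdelay - mu) / sigma # -1 0 1
```

After standardization the feature has mean 0 and standard deviation 1; its values are not restricted to any fixed range.
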
Binning Data
ggplot
    Syntax:      ggplot(df, aes(x, y, other aesthetics))
    Description: ggplot() comes from ggplot2, a plotting package that makes it simple to create complex plots from data in a data frame.
    Example:     ggplot(data = sub_airline, mapping = aes(x = arrdelay)) + geom_histogram(bins = 100, color = "white", fill = "red")

ntile
    Syntax:      ntile(x, n)
    Description: ntile() divides the data into n bins, thereby assigning each value a bin rank from 1 to n.
    Example:     sub_airline %>% mutate(quantile_rank = ntile(sub_airline$arrdelay, 4))

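A small sketch of binning with `ntile()`, using eight made-up delays so that each of the four bins receives exactly two values. Inside `mutate()` the column can be referenced by its bare name, so the `sub_airline$` prefix is not needed.

```r
library(dplyr)

# Toy stand-in for sub_airline (values are assumptions).
sub_airline <- tibble(arrdelay = c(40, -5, 12, 3, 60, 0, 25, 8))

# Rank the delays and split them into 4 roughly equal-sized bins.
binned <- sub_airline %>%
  mutate(quantile_rank = ntile(arrdelay, 4))

# The two smallest delays (-5 and 0) land in bin 1,
# the two largest (40 and 60) in bin 4.
```
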
geom_histogram
    Syntax:      geom_histogram(...)
    Description: geom_histogram() divides a continuous variable into bins and displays the count in each bin as a bar.
    Example:     geom_histogram(bins = 4, color = "white", fill = "red")

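The histogram layer above only does something once it is added to a `ggplot()` call. The sketch below builds the plot on made-up data and uses `layer_data()`, a ggplot2 helper, to inspect the bin counts that would be drawn.

```r
library(ggplot2)

# Toy stand-in for sub_airline (values are assumptions).
sub_airline <- data.frame(arrdelay = c(40, -5, 12, 3, 60, 0, 25, 8))

# Histogram of arrival delays with 4 bins.
p <- ggplot(data = sub_airline, mapping = aes(x = arrdelay)) +
  geom_histogram(bins = 4, color = "white", fill = "red")

# Pull out the computed counts, one per bin.
bin_counts <- layer_data(p)$count
```

Printing `p` at the console (or calling `print(p)` in a script) renders the plot.
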
Indicator variable
spread
    Syntax:      spread(data, key, value)
    Description: spread() spreads a key-value pair across multiple columns.
        * data is your data frame of interest.
        * key is the column whose values will become variable names.
        * value is the column whose values will fill in under the new variables created from key.
    Example:     sub_airline %>% spread(reporting_airline, arrdelay)

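A hedged sketch of `spread()` on a toy data frame. The `flight_id` column is a hypothetical row identifier added so that every row has a unique key, which `spread()` needs when the same airline appears more than once. The second call shows one way to get 0/1 indicator columns, using a helper column named `dummy` (also an assumption) and the `fill` argument. tidyr's documentation marks `spread()` as superseded by `pivot_wider()`, but it still works.

```r
library(tidyr)

# Toy stand-in for sub_airline (values and flight_id are assumptions).
sub_airline <- data.frame(
  flight_id = 1:3,
  reporting_airline = c("AA", "DL", "AA"),
  arrdelay = c(5, 20, -3)
)

# One column per airline, holding that flight's arrival delay (NA elsewhere).
wide <- spread(sub_airline, reporting_airline, arrdelay)

# Indicator variables: 1 if the flight belongs to the airline, 0 otherwise.
with_dummy <- transform(sub_airline, dummy = 1)
indicators <- spread(with_dummy, reporting_airline, dummy, fill = 0)
```
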
slice
    Syntax:      slice(num1:num5)
    Description: slice() selects the specified rows by position.
    Example:     slice(1:5)

factor
    Syntax:      factor(x)
    Description: factor() encodes a vector as a factor. If the argument ordered is TRUE, the factor levels are assumed to be ordered.
    Example:     sub_airline %>% mutate(reporting_airline = factor(reporting_airline, labels = c("aa", "as", "dl", "ua", "b6", "pa (1)", "hp", "tw", "vx")))
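
A small sketch of how `labels` is matched to the data, using made-up airline codes. When `levels` is not given, `factor()` sorts the unique values and assigns the labels in that sorted order, so the order of the `labels` vector matters. Passing `levels` explicitly makes the pairing visible.

```r
airline <- c("DL", "AA", "DL", "UA")   # toy values (assumptions)

# Levels default to the sorted unique values: "AA", "DL", "UA".
# Labels are applied in that order, not in order of appearance.
relabelled <- factor(airline, labels = c("aa", "dl", "ua"))

# Same result with the level-to-label pairing spelled out.
explicit <- factor(airline,
                   levels = c("AA", "DL", "UA"),
                   labels = c("aa", "dl", "ua"))
```
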

Author(s)
D.M. Naidu
Changelog
Date Version Changed by Change Description
2023-05-11 1.1 Eric Hao & Vladislav Boyko Updated Page Frames
2020-08-11 1.0 D.M. Naidu Initial Version
