Week2 Cheat Sheet Data Wrangling With Tidyverse
destfile: a character string with the name (path) of the file where the download is saved.
untar
Description: untar() extracts files from a tar archive. It is provided by the utils package.
Syntax: untar(tarfile, ...)
Example: untar("lax_to_jfk.tar.gz")
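The sketch below runs the untar() step offline, assuming no copy of lax_to_jfk.tar.gz is available. It first builds a small stand-in archive with utils::tar(), then extracts it with untar(). The folder name, file name and columns are hypothetical. The exdir argument chooses the folder to extract into.

```r
# Sketch: create a small .tar.gz archive, then extract it with untar().
# Folder, file and column names are hypothetical stand-ins for lax_to_jfk.
work <- tempfile("untar_demo_")
dir.create(file.path(work, "lax_to_jfk"), recursive = TRUE)
write.csv(
  data.frame(flight = 1:3, carrierdelay = c(5, NA, 0)),
  file.path(work, "lax_to_jfk", "lax_to_jfk.csv"),
  row.names = FALSE
)

# tar() stores paths relative to the working directory, so switch there briefly
old_wd <- setwd(work)
utils::tar("demo.tar.gz", files = "lax_to_jfk", compression = "gzip")
setwd(old_wd)

# Extract into a separate folder; untar() creates exdir if it does not exist
out <- file.path(work, "extracted")
untar(file.path(work, "demo.tar.gz"), exdir = out)
list.files(out, recursive = TRUE)   # "lax_to_jfk/lax_to_jfk.csv"
```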
read_csv
Description: read_csv() reads a CSV file into a tibble. It is provided by the readr package.
Syntax: read_csv(file)
Example: read_csv("lax_to_jfk/lax_to_jfk.csv")
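A minimal, self-contained sketch of read_csv(), assuming a tiny hypothetical CSV written to a temporary file instead of the course data. The show_col_types argument (available in readr 2.0 and later) only silences the column-specification message.

```r
# Sketch: write a tiny CSV to a temporary file and read it back with readr.
# Column names and values are hypothetical.
library(readr)

path <- tempfile(fileext = ".csv")
writeLines(c("flight,carrierdelay", "1,5", "2,NA", "3,0"), path)

flights <- read_csv(path, show_col_types = FALSE)
flights   # a 3 x 2 tibble; the "NA" string is read as a missing value
```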
Missing Values and Formatting
is.na
Description: is.na(x) returns a vector of TRUE or FALSE values, depending on whether the corresponding element in x is NA or not.
Syntax: is.na(x)
Example: is.na(c(1, NA)) # FALSE TRUE
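The logical vector from is.na() is usually used to locate, filter or count missing values. A short sketch on hypothetical values:

```r
# is.na() flags each element; the logical result can be used to index or count.
x <- c(1, NA, 3, NA)
is.na(x)           # FALSE TRUE FALSE TRUE
which(is.na(x))    # 2 4
x[!is.na(x)]       # 1 3

# On a data frame, is.na() returns a logical matrix, so colSums() counts NAs per column
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"))
colSums(is.na(df)) # a: 1, b: 2
```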
anyNA
Description: anyNA() returns TRUE if x contains any NAs and FALSE otherwise.
Syntax: anyNA(x, recursive = FALSE)
Example: anyNA(c(1, NA)) # TRUE
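A sketch of anyNA() on a vector, a data frame and a nested list. The recursive argument matters only for lists: with the default recursive = FALSE, NAs hidden inside a nested list are not searched. The objects are hypothetical.

```r
# anyNA() gives a single TRUE/FALSE answer, unlike the element-wise is.na().
anyNA(c(1, NA))                     # TRUE
anyNA(c(1, 2, 3))                   # FALSE

df <- data.frame(a = c(1, 2), b = c("x", NA))
anyNA(df)                           # TRUE

nested <- list(1, list(NA))
anyNA(nested)                       # FALSE: the inner list is not searched
anyNA(nested, recursive = TRUE)     # TRUE
```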
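sum
Description: sum() calculates the sum of its arguments. Because TRUE counts as 1, sum(is.na(x)) counts the missing values in x.
Syntax: sum(..., na.rm = FALSE)
Example: sum(is.na(carrierdelay))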
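A sketch contrasting counting NAs with summing a column that contains NAs, assuming a hypothetical carrierdelay vector. A single NA makes the plain sum NA unless na.rm = TRUE is set.

```r
# Hypothetical delay values (minutes) with two missing entries
carrierdelay <- c(5, NA, 0, NA, 12)

sum(is.na(carrierdelay))          # 2: each TRUE is counted as 1
sum(carrierdelay)                 # NA: any NA propagates
sum(carrierdelay, na.rm = TRUE)   # 17: NAs are skipped
```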
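summarize
Description: summarize() reduces a data frame to a summary of just one vector or value.
Syntax: summarize(X, by, FUN, ..., stat.name = deparse(substitute(X)), type = c('variables', 'matrix'), subset = TRUE, keepcolnames = FALSE)
Example: summarize(count = sum(is.na(carrierdelay)))
Note: this syntax and the X, by and FUN arguments belong to summarize() from the Hmisc package. The tidyverse version used in the example is dplyr's summarize(.data, ...), which is normally called in a pipe.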
X: a vector or matrix capable of being operated on by the function specified as the FUN argument.
by: one or more stratification variables. If a single variable, by may be a vector; otherwise it should be a list.
FUN: a function of a single vector argument, used to create the statistical summaries for summarize. FUN may compute any number of statistics.
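A sketch of the tidyverse (dplyr) summarize() from the example, assuming a small hypothetical sub_airline tibble in place of the course data. Without grouping it returns one row. After group_by() it returns one row per group.

```r
# Sketch using dplyr's summarize() on a small hypothetical airline tibble.
library(dplyr)

sub_airline <- tibble(
  reporting_airline = c("AA", "AA", "UA", "UA", "UA"),
  carrierdelay      = c(5, NA, NA, NA, 12)
)

# Whole data frame reduced to one row
total <- sub_airline %>%
  summarize(count = sum(is.na(carrierdelay)))
total$count   # 3

# One row per airline after group_by()
by_airline <- sub_airline %>%
  group_by(reporting_airline) %>%
  summarize(missing = sum(is.na(carrierdelay)), .groups = "drop")
by_airline    # AA: 1, UA: 2
```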
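map
Description: map() functions transform their input by applying a function to each element and returning an object the same length as the input. map() itself always returns a list; typed variants such as map_int() and map_dbl() return atomic vectors.
Syntax: map(.x, .f, ...)
Example: map(sub_airline, ~sum(is.na(.)))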
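A sketch of the map() example on a hypothetical two-column data frame. A data frame is a list of columns, so .f runs once per column. The formula ~sum(is.na(.)) is purrr's shorthand for an anonymous function, where . is the current column.

```r
library(purrr)

# Hypothetical data: a data frame is a list of columns
sub_airline <- data.frame(
  carrierdelay = c(5, NA, 0),
  weatherdelay = c(NA, NA, 1)
)

na_counts <- map(sub_airline, ~sum(is.na(.)))   # a named list
map_int(sub_airline, ~sum(is.na(.)))            # named integer vector: 1 and 2
```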
dim
Description: dim() returns the dimensions of a matrix, array, or data frame.
Syntax: dim(object)
Example: dim(sub_airline)
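A sketch of dim() on a hypothetical data frame and a matrix. The result is (rows, columns). A plain vector has no dim attribute, so dim() returns NULL for it.

```r
# Hypothetical data frame with 4 rows and 3 columns
sub_airline <- data.frame(
  flight       = 1:4,
  carrierdelay = c(5, NA, 0, 12),
  airline      = c("AA", "UA", "AA", "UA")
)

dim(sub_airline)              # 4 3
nrow(sub_airline)             # 4
ncol(sub_airline)             # 3
dim(matrix(1:6, nrow = 2))    # 2 3
dim(1:6)                      # NULL
```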
drop_na
Description: drop_na() drops rows containing missing values, optionally checking only the columns you name.
Syntax: drop_na(data, ...)
Example: drop_na(carrierdelay)
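A sketch of drop_na() in a pipe, assuming a small hypothetical tibble. Naming a column restricts the NA check to that column. Calling it with no columns checks every column.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: flight 1 lacks weatherdelay, flight 2 lacks carrierdelay
sub_airline <- tibble(
  flight       = 1:4,
  carrierdelay = c(5, NA, 0, 12),
  weatherdelay = c(NA, 3, 1, 2)
)

# Only rows with NA in carrierdelay are dropped: flights 1, 3, 4 remain
sub_airline %>% drop_na(carrierdelay)

# With no columns named, any NA drops the row: flights 3, 4 remain
sub_airline %>% drop_na()
```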
replace_na
Description: replace_na() replaces missing values.
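A sketch of tidyr's replace_na() in its two common forms, assuming a small hypothetical tibble. On a data frame it takes a named list of per-column replacements. On a single column, typically inside mutate(), it takes one value. Filling with the column mean is just one possible choice.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with gaps in both columns
sub_airline <- tibble(
  carrierdelay = c(5, NA, 0),
  weatherdelay = c(NA, 3, NA)
)

# Data frame form: a named list gives the replacement value per column
filled <- sub_airline %>%
  replace_na(list(carrierdelay = 0, weatherdelay = 0))

# Vector form inside mutate(): fill carrierdelay with its mean; weatherdelay is untouched
imputed <- sub_airline %>%
  mutate(carrierdelay = replace_na(carrierdelay, mean(carrierdelay, na.rm = TRUE)))
imputed$carrierdelay   # 5.0 2.5 0.0
```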
Author(s)
D.M. Naidu
Changelog
Date | Version | Changed by | Change Description
2023-05-11 | 1.1 | Eric Hao & Vladislav Boyko | Updated Page Frames
2020-08-11 | 1.0 | D.M. Naidu | Initial Version