Unit - 2: Data Manipulation With R & Data Visualization in Watson Studio
• select(df, var1, var2, ...)
• arrange(df, var1, desc(var2), ...)
• library(dplyr)
• dat<-data.frame(
• Sno=c(2,1,4,3,5),
• name=c("bb","aa","cc","dd","aa"),
• dept=c("CSE","ECE","IT","ECE","CSE")
• )
• dat1<-mutate(dat,marks=c(50,60,70,80,90))
• > print(dat1)
•   Sno name dept marks
• 1   2   bb  CSE    50
• 2   1   aa  ECE    60
• 3   4   cc   IT    70
• 4   3   dd  ECE    80
• 5   5   aa  CSE    90
• summarize(dat1,avg=mean(marks))
• sample_n(dat1,2)
• sample_frac(dat1,0.4)
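The generic select() and arrange() forms above can be tried on the same dat1 data frame. The following is a minimal sketch, assuming the dplyr package is installed; group_by() does not appear on the slides and is added here only to illustrate a per-department summary.

```r
library(dplyr)

# Same data as on the slides
dat <- data.frame(
  Sno  = c(2, 1, 4, 3, 5),
  name = c("bb", "aa", "cc", "dd", "aa"),
  dept = c("CSE", "ECE", "IT", "ECE", "CSE")
)
dat1 <- mutate(dat, marks = c(50, 60, 70, 80, 90))

# select(): keep only the name and marks columns
sel <- select(dat1, name, marks)

# arrange(): sort by dept (ascending), then by marks (descending)
srt <- arrange(dat1, dept, desc(marks))

# summarize() after group_by() (an addition, not on the slides):
# one average per department instead of one overall average
by_dept <- summarize(group_by(dat1, dept), avg = mean(marks))

print(sel)
print(srt)
print(by_dept)
```

Note that sample_n() and sample_frac() pick rows at random, so their output changes from run to run unless set.seed() is called first.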
data.table package
• A data table is a group of related facts
arranged in labeled rows and columns and is used to
record information.
• The data.table package can be used to perform faster
manipulation of a data set. Using data.table typically reduces
computing time when compared to data.frame.
• A data.table query has 3 parts, written DT[i, j, by]. Here, we
are instructing R to subset the rows using ‘i’, then to
calculate ‘j’, grouped by ‘by’. Most of the
time, ‘by’ refers to a categorical variable.
• dat<-data.frame(
• Sno=c(2,1,4,3,5),
• name=c("bb","aa","cc","dd","aa"),
• dept=c("CSE","ECE","IT","ECE","CSE")
• )
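• library(data.table)
• dat1<-data.table(dat)
• class(dat1) # "data.table" "data.frame"
• dat1[2:4] # rows 2 to 4
• dat1[dept=='CSE'] # rows where dept is CSE
• dat1[dept %in% c('CSE','IT')] # rows where dept is CSE or IT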
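The examples above use only the ‘i’ part. A minimal sketch of the full DT[i, j, by] form follows, assuming the data.table package is installed. The marks column is borrowed from the dplyr example and is an assumption here, since the data.table slides do not define it.

```r
library(data.table)

# Same rows as on the slides, plus a marks column (assumed for illustration)
dat1 <- data.table(
  Sno   = c(2, 1, 4, 3, 5),
  name  = c("bb", "aa", "cc", "dd", "aa"),
  dept  = c("CSE", "ECE", "IT", "ECE", "CSE"),
  marks = c(50, 60, 70, 80, 90)
)

# i:  keep rows with marks above 50
# j:  compute the mean of marks
# by: one result row per department
res <- dat1[marks > 50, .(avg = mean(marks)), by = dept]
print(res)

# j alone with by: number of rows per department (.N is the row count)
counts <- dat1[, .N, by = dept]
print(counts)
```

Groups are returned in the order in which they first appear in the data, not in alphabetical order.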
reshape2 Package
• reshape2 is an R package written by
Hadley Wickham that makes it easy to
transform data between wide and long
formats.
• Use the reshape2 package to reshape your
data. Using the reshape2 package, we can
combine features that have unique values. It
has 2 main functions, namely melt and cast.
• melt: Converts data from wide format to long
format. It is a form of restructuring where
multiple categorical columns are ‘melted’ into
unique rows. Let us understand it using the
code below.
• cast: Converts data from long format to wide
format. It starts with melted data and reshapes
it into wide format. It is the reverse of the melt
function. It has two variants,
namely dcast and acast.
• - dcast returns a data frame as output.
• - acast returns a vector/matrix/array as the
output.
• dat
• sno name
• 1 aa
• 2 bb
• 3 cc
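A round trip through melt and dcast can be sketched as follows, assuming the reshape2 package is installed. The maths and science columns are hypothetical and are added to the small sno/name table only so that there is something to melt.

```r
library(reshape2)

# Wide format: one row per student, one column per subject (hypothetical marks)
wide <- data.frame(
  sno     = c(1, 2, 3),
  name    = c("aa", "bb", "cc"),
  maths   = c(50, 60, 70),
  science = c(55, 65, 75)
)

# melt: wide -> long; sno and name identify a row, the rest are stacked
long <- melt(wide, id.vars = c("sno", "name"),
             variable.name = "subject", value.name = "marks")
print(long)

# dcast: long -> wide again; one column per value of subject
back <- dcast(long, sno + name ~ subject, value.var = "marks")
print(back)
```

The long table has one row per student and subject, and dcast restores the original wide layout as a data frame.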
readr Package
• The readr package was also developed by Hadley
Wickham; it reads large flat files quickly.
• ‘readr’ is used to read various forms of data into R. It is
very fast. Character columns are not converted to factors.
It helps in reading the following data:
• Delimited files with read_delim(), read_csv(),
read_tsv(), and read_csv2().
• Fixed width files with read_fwf() and read_table().
• Web log files with read_log().
library(readr)
dum <- read_csv("D:/dum1.csv")
dum
class(dum)
dum$name # only the name column
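Because D:/dum1.csv is not available to every reader, here is a self-contained sketch of the same steps, assuming the readr package is installed. It writes a small temporary CSV first; the file contents are made up for illustration.

```r
library(readr)

# Write a small CSV to a temporary file (contents are illustrative only)
path <- tempfile(fileext = ".csv")
writeLines(c("sno,name", "1,aa", "2,bb", "3,cc"), path)

# Read it back; read_csv() returns a tibble
dum <- read_csv(path)
print(dum)
print(class(dum))

# The name column stays character; it is not converted to a factor
print(class(dum$name))
```

read_csv() also prints a short message describing the column types it guessed, which is a quick way to check that each column was parsed as intended.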
tidyr Package
• tidyr is a package developed by
Hadley Wickham that makes it easy to tidy
your data.
• To make the data look neat and tidy, use the
tidyr package. The package has 4 major
functions. You can use these functions, along
with dplyr, if you are stuck in the data
exploration phase.
• gather() – ‘gathers’ multiple columns and converts them into
key:value pairs. This function transforms the wide form of data into the
long form. It can be used as an alternative to ‘melt’ in the reshape2
package.
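A minimal sketch of gather() follows, assuming the tidyr package is installed. The maths and science columns are hypothetical example data.

```r
library(tidyr)

# Wide format: one column per subject (hypothetical marks)
wide <- data.frame(
  name    = c("aa", "bb", "cc"),
  maths   = c(50, 60, 70),
  science = c(55, 65, 75)
)

# gather(): the column names go into the key column "subject",
# and the cell values go into the value column "marks"
long <- gather(wide, key = "subject", value = "marks", maths, science)
print(long)
```

Recent tidyr releases still provide gather() but recommend pivot_longer() for new code; the result here has the same long shape either way.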