Data Cleansing-2

Data cleaning is an important part of the data analysis process, as a technically correct dataset can still be incorrect for analysis. Various functions from packages like dplyr can be used to preview, filter, arrange, and transform data. Common data types in R include factors, dates, and POSIXct class for date-time values.

Uploaded by

dipesh

Ignoring missing values in your dataset is an easier and more correct approach than updating the dataset with mean / median values

Maybe correct
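The trade-off in this question can be sketched in base R. The vector below is a hypothetical toy example, not data from the quiz; it only shows that dropping missing values shrinks the sample while median imputation keeps every observation.

```r
# Hypothetical toy vector with two missing values (illustration only)
x <- c(10, 12, NA, 14, 100, NA, 11)

# Option 1: ignore the missing values when summarising
mean_ignoring_na <- mean(x, na.rm = TRUE)

# Option 2: replace each NA with the median of the observed values
x_imputed <- ifelse(is.na(x), median(x, na.rm = TRUE), x)

n_after_dropping <- sum(!is.na(x))     # observations left if NAs are dropped
n_after_imputing <- length(x_imputed)  # observations kept after imputation
```

Which option is appropriate depends on how much data is missing and why, which is why the answer is not a plain yes or no.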

Data munging is
A Process to clean messy data

Can a technically correct dataset still be incorrect for data analysis?


Yes

Binning is a method to manage ___ data

Noisy
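One common form of binning is smoothing by bin means. A minimal sketch in base R, assuming equal-frequency bins of three values each; the numbers and the bin size are hypothetical choices for illustration.

```r
# Hypothetical noisy measurements, already sorted
values <- c(4, 8, 15, 21, 21, 24, 25, 28, 34)

# Assign the sorted values to 3 equal-frequency bins of 3 values each
bin_id <- rep(1:3, each = 3)

# Smoothing by bin means: replace every value with the mean of its bin
smoothed <- ave(values, bin_id, FUN = mean)
```

Each value is pulled toward its neighbours, which damps small random fluctuations at the cost of some detail.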

Data cleaning is the most time consuming process in data analysis


True

tail() function shows ___ by default


6 rows
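The default can be checked on any built-in data frame. The sketch below uses mtcars as an arbitrary example and also shows the n argument for overriding the default.

```r
# head() and tail() both return 6 rows unless n is given
first_rows <- head(mtcars)
last_rows  <- tail(mtcars)

# Ask for a different number of rows with n
last_three <- tail(mtcars, n = 3)

nrow(last_rows)   # default row count
nrow(last_three)  # row count requested through n
```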

print() is the recommended function to view the dataset

No

____ can be used to view the data distribution of a single variable and ____ can be used to view the relation between 2 variables
hist(), plot()
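A short sketch of both calls, using the built-in cars dataset as an assumed example (the question itself does not name a dataset). The titles and axis labels are illustrative.

```r
# Distribution of a single variable: histogram of stopping distance
hist(cars$dist, main = "Stopping distance", xlab = "dist")

# Relation between two variables: scatter plot of speed against distance
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")

# hist() can also return the bin counts without drawing anything
h <- hist(cars$dist, plot = FALSE)
```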

Consider the built-in R dataset cars. What is the median of the dist variable?
36.00

Using the head() function, identify the 8th row of the built-in cars dataset
10 26 (speed = 10, dist = 26)
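Both lookups can be reproduced in base R. Because head() returns only six rows by default, n = 8 is needed to reach the 8th row. The 8th row of mtcars is included for comparison.

```r
# Median of the dist variable in the built-in cars dataset
dist_median <- median(cars$dist)

# 8th row of cars: ask head() for 8 rows, then take the last one
row8_cars <- head(cars, n = 8)[8, ]

# For comparison, the name of the 8th row of mtcars
row8_mtcars_name <- rownames(head(mtcars, n = 8))[8]
```

The pair 10 26 matches the speed and dist columns of cars; the 8th row of mtcars is the Merc 240D.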

Identify the function which is part of dplyr package that helps in previewing the
data.
glimpse()

In a tidy dataset, ___ forms a row and ____ forms a column
Observation, Variable

A dataset with columns (country, disease, #ofdeaths) has values Row1 - (CONGO, TB, 28), Row2 - (SPAIN, TB, 2), Row3 - (EGYPT, TB, 0). Is this a tidy or a messy dataset?
Tidy Data

filter() is for selecting columns and select() is for selecting rows


False

___ allows you to create new variables


mutate()

Which function(s) of dplyr would you use to first subset the columns and then sort the rows on a particular column?
select(), arrange()
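The roles of the four dplyr verbs in the questions above can be sketched with base R equivalents, with the matching dplyr call shown in a comment beside each step (the dplyr lines assume the package is installed). The mtcars dataset, the chosen columns, and hp_per_cyl are hypothetical choices for illustration.

```r
# select(): keep a subset of columns
#   dplyr: select(mtcars, mpg, cyl, hp)
subset_cols <- mtcars[, c("mpg", "cyl", "hp")]

# arrange(): sort the rows on a particular column
#   dplyr: arrange(subset_cols, mpg)
sorted <- subset_cols[order(subset_cols$mpg), ]

# filter(): keep a subset of rows
#   dplyr: filter(sorted, cyl == 4)
four_cyl <- sorted[sorted$cyl == 4, ]

# mutate(): add a new variable computed from existing ones
#   dplyr: mutate(sorted, hp_per_cyl = hp / cyl)
sorted$hp_per_cyl <- sorted$hp / sorted$cyl
```

Subsetting columns and then sorting therefore corresponds to select() followed by arrange(); filter() works on rows, not columns.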

What is the class of Sys.Date() and Sys.time()?

Date for Sys.Date(); POSIXct for Sys.time()
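The classes are easy to inspect directly. Note that the functions are spelled Sys.Date() and Sys.time(), with a capital S, and that the two return different classes.

```r
today <- Sys.Date()  # current date
now   <- Sys.time()  # current date-time

class(today)  # "Date"
class(now)    # "POSIXct" "POSIXt"
```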

Can a variable of factor type be converted to a date type?

Yes (for example, with as.Date() after converting the factor to character)
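In base R a factor holding date strings can be converted with as.Date(). A minimal sketch, going through as.character() first so the labels are used rather than the internal integer codes; the two dates are hypothetical.

```r
# Hypothetical factor whose labels are ISO-formatted date strings
date_factor <- factor(c("2016-12-21", "2017-01-05"))

# Convert the labels to character, then to Date
as_dates <- as.Date(as.character(date_factor))

class(as_dates)               # "Date"
days_between <- diff(as_dates)  # date arithmetic now works
```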

If the value of time is the system time 2016-12-21 18:33:31 UTC, what is the output of time + 60?
2016-12-21 18:34:31 UTC (60 seconds later)
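Arithmetic on a POSIXct value is done in seconds, so adding 60 moves the time forward by one minute. A sketch that fixes the timestamp from the question rather than calling Sys.time():

```r
# Recreate the timestamp from the question as a POSIXct value in UTC
time <- as.POSIXct("2016-12-21 18:33:31", tz = "UTC")

# Adding a number to a POSIXct value adds that many seconds
later <- time + 60

later_text <- format(later, "%Y-%m-%d %H:%M:%S")
```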

What are the possible outlier treatments?


all the options

Identify the correct ones


separate() makes

____ is similar to separate() function


extract()

Which one is NOT a special value in R


None of the options
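The answer options are not reproduced here. Assuming they were the usual special values NA, NULL, NaN and Inf, each one has its own test function in base R:

```r
# Assumed options (not listed in the question): NA, NULL, NaN, Inf
na_check   <- is.na(NA)           # missing value
null_check <- is.null(NULL)       # empty object
nan_check  <- is.nan(0 / 0)       # not a number
inf_check  <- is.infinite(1 / 0)  # infinity
```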

____ can be used to identify the existence of a matching pattern in a string

str_detect()
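The function in the stringr package is spelled str_detect(), with an underscore. The sketch below uses base R's grepl(), which returns the same kind of logical vector, so it runs without extra packages. The strings are hypothetical.

```r
# Hypothetical strings to search
fruits <- c("apple", "banana", "cherry")

# TRUE where the pattern "an" occurs in the string
has_an <- grepl("an", fruits)

# stringr equivalent, assuming stringr is installed:
#   stringr::str_detect(fruits, "an")
```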
