Practical 1 EDA

The document outlines practical exercises for Exploratory Data Analysis (EDA) and data visualization in R. It includes instructions for reading CSV files, extracting data, checking data types, summarizing datasets, subsetting data based on conditions, sorting, handling missing values, and creating various plots. Additionally, it provides a series of tasks to be performed in R, including creating a new CSV file and performing specific data manipulations.

Uploaded by

yashhmehtaa1807


Practical No.: 01
Aim: EDA & Data Visualization
1. To read data from a CSV file into R
Diabetes <- read.csv(file.choose(), header = TRUE, sep = ",")
- Diabetes: the variable we create to store the CSV file as a data frame
- read.csv(): used because we are reading a CSV file
- file.choose(): opens a file browser so you can select the desired CSV file
- header = TRUE: treats the first row as the column names
- sep = ",": a CSV file is comma-separated, so "," is the field separator
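A minimal sketch, assuming hypothetical column names and values: read.csv() also accepts a text = argument, which lets the example run without opening a file browser. The header and sep arguments behave exactly as they do with file.choose() or a file path.

```r
# Hypothetical sample data standing in for a real CSV file
csv_text <- "Name,Age,Gender,Education,Salary
Asha,34,F,Grad,52000
Ravi,51,M,PG,61000
Meena,28,F,Grad,NA"

# text = reads from a character string instead of a file;
# header and sep work the same way as with a file path
eda_data <- read.csv(text = csv_text, header = TRUE, sep = ",")

# With a real file, a fixed path avoids the interactive dialog, e.g.
# eda_data <- read.csv("eda_data.csv", header = TRUE, sep = ",")  # hypothetical path

dim(eda_data)   # number of rows and columns
```

The literal NA in the last row is read as a missing value, because "NA" is read.csv()'s default na.strings.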
2. To extract the first few rows of the dataset
head(data_set_name)
head(eda_data)   # the dataset name is eda_data
- This returns the first few rows of the dataset, 6 by default.
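A short sketch on a hypothetical 10-row data frame showing the n argument of head() and its counterpart tail(); a negative n means "all rows except the last |n|".

```r
# Hypothetical 10-row data frame used only to illustrate head() and tail()
df <- data.frame(id = 1:10, score = seq(55, 100, by = 5))

first_six   <- head(df)       # default: first 6 rows
first_three <- head(df, 3)    # first 3 rows
last_two    <- tail(df, 2)    # last 2 rows
all_but_8   <- head(df, -8)   # negative n: every row except the last 8

first_three
last_two
```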

3. To check the data type of every variable in the dataset

str() compactly displays the internal structure of an R object.
str(data_set_name)
Categorical columns stored as factors are shown as "Factor"; the terms ‘category’ and ‘enumerated type’ are also used for factors.
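A small sketch with a hypothetical data frame: sapply(df, class) lists the class of every column at once, and levels() shows a factor's categories. Since R 4.0.0, read.csv() and data.frame() keep text columns as character unless stringsAsFactors = TRUE is given, so factors usually have to be created explicitly.

```r
df <- data.frame(
  Name      = c("Asha", "Ravi", "Meena"),
  Age       = c(34L, 51L, 28L),
  Education = factor(c("Grad", "PG", "Grad")),  # explicit factor (categorical)
  Salary    = c(52000, 61000, NA)
)

str(df)                           # type and first values of every column
col_classes <- sapply(df, class)  # named vector: one class per column
col_classes
levels(df$Education)              # categories of the factor: "Grad" "PG"
```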
4. To check the summary of the entire data frame
summary(data_set_name)
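A minimal sketch on hypothetical data: summary() reports quartiles, mean and an NA count for numeric columns and per-level counts for factors, while dim() and names() give the dimensions and column names (also asked for in the exercise below).

```r
df <- data.frame(
  Age    = c(30, 50, 40, NA),
  Gender = factor(c("F", "M", "F", "M"))
)

summary(df)          # numeric: Min, quartiles, Mean, Max, NA's; factor: counts per level
dims <- dim(df)      # c(rows, columns)
cols <- names(df)    # column names
age_stats <- summary(df$Age)
age_stats[["Mean"]]  # 40 (the NA is excluded)
```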

5. To display the first 10 rows of the dataset

data_set_name[first_row:last_row, ]
Rows 1 to 10, all columns: [1:10, ]

6. To display only the first 2 columns of the dataset

data_set_name[, first_col:last_col]
All rows, columns 1 to 2: [, 1:2]
7. To display the first 10 rows and only 2 columns of the dataset
data_set_name[rows, columns]
- Rows 1 to 10, columns 1 and 2: [1:10, 1:2]
NOTE:
1. To fetch rows, put the row index before the comma: datasetname[rows, ]
2. To fetch columns, put the column index after the comma: datasetname[, cols]
3. To fetch both, give both indexes: datasetname[rows, cols]
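The three indexing forms in the note can be checked on a hypothetical 12-row data frame. The sketch also shows non-adjacent columns with c() and columns chosen by name.

```r
# Hypothetical 12-row data frame
df <- data.frame(
  id    = 1:12,
  name  = paste0("S", 1:12),
  marks = seq(500, 940, by = 40)
)

first10    <- df[1:10, ]                # rows 1-10, all columns
first2cols <- df[, 1:2]                 # all rows, columns 1-2
block      <- df[1:10, 1:2]             # rows 1-10, columns 1-2
picked     <- df[1:8, c(1, 3)]          # rows 1-8, columns 1 and 3 (non-adjacent)
by_name    <- df[1:3, c("id", "marks")] # columns chosen by name
```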
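8. To display the observations for students who have graduated (Education is "Grad")
Syntax:
newdata1 <- subset(datasetname, datasetname$column_name == "value")
newdata1
(Inside subset() the datasetname$ prefix is optional: subset(datasetname, column_name == "value") gives the same result.)
Example:
newdata1 <- subset(EDA_data, EDA_data$Education == "Grad")
newdata1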

Here we create a new variable, newdata1, and store the subset of EDA_data in it. An assignment does not print anything, so we type newdata1 again on its own line to view the result in the console.
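A minimal sketch on a hypothetical four-row data frame. Wrapping an assignment in parentheses assigns and prints in one step. It also shows one difference from plain [ ] indexing: subset() drops rows where the condition is NA, while [ ] returns them as rows of NA.

```r
eda <- data.frame(
  Name      = c("Asha", "Ravi", "Meena", "Kiran"),
  Education = c("Grad", "PG", "Grad", NA),
  Age       = c(34, 51, 28, 51)
)

(grads_a <- subset(eda, Education == "Grad"))  # assign and print together

grads_b <- eda[eda$Education == "Grad", ]      # [ ] keeps an all-NA row for Kiran
grads_b
```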

9. To subset on multiple conditions

newdata2 <- subset(EDA_data, EDA_data$Age == 51 & EDA_data$Gender == "M")
newdata2
Here we extract the details of students whose age is 51 and whose gender is male. The & operator keeps a row only when both conditions are true.
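A sketch of the common ways to combine conditions, using a hypothetical five-row data frame: & (both), | (either), %in% (any of several values), and select = to keep only some columns.

```r
eda <- data.frame(
  Name      = c("Asha", "Ravi", "Meena", "Kiran", "Dev"),
  Age       = c(34, 51, 28, 51, 45),
  Gender    = c("F", "M", "F", "M", "M"),
  Education = c("Grad", "PG", "Grad", "HSC", "PG")
)

male_51    <- subset(eda, Age == 51 & Gender == "M")        # & : both must hold
grad_or_51 <- subset(eda, Education == "Grad" | Age == 51)  # | : either may hold
pg_or_hsc  <- subset(eda, Education %in% c("PG", "HSC"))    # %in% : any listed value
# select = keeps only the named columns
male_51_names <- subset(eda, Age == 51 & Gender == "M", select = c(Name, Age))
```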

10. To sort the data of a column in ascending order
Sorting data
To sort a data frame in R, use the order() function. Sorting is ASCENDING by default. For a numeric column, prefix the sorting variable with a minus sign to sort in DESCENDING order; for a column of any type, use decreasing = TRUE. Here are some examples.

Syntax:
newdata4 <- datasetname[order(datasetname$column_name), ]
newdata4
Example:
i.) newdata4 <- EDA_data[order(EDA_data$Name), ]
newdata4
ii.) newdata5 <- EDA_data[order(EDA_data$Education, EDA_data$Salary), ]
newdata5
This sorts by Education and, within each Education level, by Salary.

All rows are reordered and every column is kept, so nothing is written after the comma.
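A short sketch with hypothetical data: order() returns row positions (smallest first), not sorted values, and indexing the data frame with those positions reorders its rows. Extra arguments to order() break ties.

```r
eda <- data.frame(
  Name      = c("Ravi", "Asha", "Meena", "Kiran"),
  Education = c("PG", "Grad", "Grad", "PG"),
  Salary    = c(61000, 52000, 48000, 58000)
)

idx <- order(eda$Salary)   # row positions from lowest to highest Salary
idx                        # 3 2 4 1

by_name       <- eda[order(eda$Name), ]                  # A to Z by Name
by_edu_salary <- eda[order(eda$Education, eda$Salary), ] # Education first, ties by Salary
```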
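11. To sort the data of a column in descending order
newdata5 <- EDA_data[order(-EDA_data$Age), ]
newdata5
OR
newdata5 <- EDA_data[order(EDA_data$Age, decreasing = TRUE), ]
For descending order we can use decreasing = TRUE. The minus sign works only for numeric columns: order(-EDA_data$Name) fails with "invalid argument to unary operator" because Name is text, so use order(EDA_data$Name, decreasing = TRUE) for character columns.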
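A sketch of descending sorts on hypothetical data. xtfrm() converts a vector to numbers that sort the same way (ranks for text), so a minus sign can be applied to it; this is also the usual way to mix ascending and descending keys on text columns.

```r
eda <- data.frame(
  Name = c("Ravi", "Asha", "Meena"),
  Age  = c(51, 34, 28)
)

age_desc   <- eda[order(-eda$Age), ]                     # minus sign: numeric columns
name_desc  <- eda[order(eda$Name, decreasing = TRUE), ]  # works for any column type
name_desc2 <- eda[order(-xtfrm(eda$Name)), ]             # xtfrm() turns text into numeric ranks

# Mixed directions: Name ascending, then Age descending within each Name
eda2  <- data.frame(Name = c("B", "A", "B"), Age = c(20, 30, 40))
mixed <- eda2[order(eda2$Name, -eda2$Age), ]

# The minus sign on a character column is an error
msg <- tryCatch(order(-eda$Name), error = function(e) conditionMessage(e))
msg   # "invalid argument to unary operator"
```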
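12. To check whether any column contains missing observations

colSums(is.na(datasetname))   OR   summary(datasetname)

colSums(is.na(datasetname)) counts the NA values in each column; summary() reports them as "NA's" for numeric columns. NA is a logical constant of length 1 which contains a missing value indicator.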
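A minimal sketch on hypothetical data showing why is.na() is needed (a comparison with NA is itself NA), how to count and locate missing values, and how na.rm = TRUE changes summary statistics.

```r
df <- data.frame(
  Rooms  = c(3, NA, 4, 2, NA),
  Garage = c(1, 0, NA, 1, 2)
)

NA == NA                                     # NA, not TRUE: comparisons cannot detect NA
na_per_col <- colSums(is.na(df))             # NA count per column
na_per_col                                   # Rooms 2, Garage 1
rows_with_na <- which(!complete.cases(df))   # rows with at least one NA: 2 3 5
clean <- na.omit(df)                         # drop those rows
rooms_mean <- mean(df$Rooms, na.rm = TRUE)   # 3; without na.rm the result is NA
```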

Histogram, boxplot, scatterplot, barplot

13. To plot a histogram of a particular column in the dataset
hist(datasetname$column_name)
14. To plot a boxplot of a particular column in the dataset
boxplot(datasetname$column_name)
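Both plotting functions also return the numbers behind the picture. This sketch, on hypothetical ages, uses plot = FALSE to inspect the histogram bin counts and the boxplot statistics (whiskers, hinges, median and the points flagged as outliers).

```r
# Hypothetical ages; 90 is far from the rest
ages <- c(22, 25, 27, 31, 34, 35, 38, 41, 45, 51, 58, 90)

h <- hist(ages, breaks = seq(20, 100, by = 20), plot = FALSE)  # compute bins only
h$counts   # 7 4 0 1  (bins 20-40, 40-60, 60-80, 80-100)

b <- boxplot(ages, plot = FALSE)  # compute the boxplot statistics only
b$stats    # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out      # 90: more than 1.5 * IQR above the upper hinge

# Drop plot = FALSE (and add labels) to draw them:
# hist(ages, breaks = seq(20, 100, by = 20), main = "Age distribution", xlab = "Age")
# boxplot(ages, main = "Age", horizontal = TRUE)
```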
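15. To view summary statistics of a particular column
mean(datasetname$column_name)
median(datasetname$column_name)
max(datasetname$column_name)
min(datasetname$column_name)
Each of these returns NA if the column has missing values; add na.rm = TRUE to ignore them, e.g. mean(datasetname$column_name, na.rm = TRUE).
Mode: R has no built-in function for the statistical mode (mode() returns the storage type), so count the values with table() and keep the most frequent:
y <- table(eda_data$Baths)
names(y)[which(y == max(y))]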

my_mode <- function(x) {   # create a mode function
  unique_x <- unique(x)                          # distinct values
  tabulate_x <- tabulate(match(x, unique_x))     # count of each distinct value
  unique_x[tabulate_x == max(tabulate_x)]        # value(s) with the highest count
}
my_mode(eda_data$Baths)
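A variant sketch, assuming a hypothetical column with missing values: my_mode() above counts NA as a value, so with three NAs it would return NA. The version below drops NAs first and returns every tied value. It also shows that table() returns the mode as character names, and that base R's mode() means something else entirely.

```r
# Variant of my_mode that ignores missing values
my_mode_na <- function(x) {
  x <- x[!is.na(x)]
  unique_x <- unique(x)
  counts <- tabulate(match(x, unique_x))   # frequency of each distinct value
  unique_x[counts == max(counts)]          # every value tied for the top count
}

baths <- c(2, 1, 2, 3, NA, 1, NA, NA)   # hypothetical column with a tie and three NAs
my_mode_na(baths)   # 2 1: a tie, so both values are returned

y <- table(baths)          # table() drops NA by default
names(y)[y == max(y)]      # "1" "2": character names, in sorted order

mode(baths)   # "numeric": base R's mode() reports the storage mode, not the most frequent value
```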
---------------------------------------------------------------
EDA R PRACTICAL

eda_data<-read.csv(file.choose(), header=TRUE, sep=",")
eda_data
head(eda_data)
summary(eda_data)
str(eda_data)
eda_data[1:8,]
head(eda_data,3)
head(eda_data,8)
tail(eda_data,8)
eda_data[1:8, c(1,5)]
eda_data[,1:5]
newdata1<-subset(eda_data,eda_data$Education == "Grad")
newdata1
newdata2<-subset(eda_data, eda_data$Age==51 & eda_data$Gender=="M")
newdata2
a<-eda_data[order(eda_data$Name),]
a
a<-eda_data[order(eda_data$Education),]
a
a<-eda_data[order(eda_data$Education, decreasing = TRUE),]
a
a<-colSums(is.na(eda_data))
a
hist(eda_data$Age)
boxplot(eda_data$Age)
mean(eda_data$Age)
min(eda_data$Age)
max(eda_data$Age)
median(eda_data$Age)
mode(eda_data$Garage)   # storage mode (e.g. "numeric"), NOT the statistical mode

y<-table(eda_data$Garage)
y

names(y)[which(y==max(y))]

ma<-max(y)
ma
whch<-which(y==ma)
whch
names(y)[whch]

x<-eda_data$Garage
x
y<-unique(x)
y
mat<-match(x, y)
mat
tab<-tabulate(mat)
tab
m<-max(tab)
m
y[tab==m]
x<-eda_data$Age
x
my_mode <- function(x) {   # create a mode function
  unique_x <- unique(x)
  tabulate_x <- tabulate(match(x, unique_x))
  unique_x[tabulate_x == max(tabulate_x)]
}

my_mode(x)

x<-c(0,0,0,1,1,1,1,2,2,2,2,4)
x
y<-table(x)
y
y[y == max(y)]   # the most frequent value(s) and their count; y[max(y)] would pick the 4th entry instead

hist(eda_data$Rooms)
hist(eda_data$Salary)
#b<-skewness(eda_data$Rooms)
#hist(b)
#Two-way table
#barplot
counts = table(eda_data$Education,eda_data$Gender)
counts
barplot(counts, main = "Data distribution by Education vs Gender", col = c("blue","red"))

plot(factor(eda_data$Education), factor(eda_data$Gender), col = c("blue","red"))   # two factors give a spine plot

#scatterplot
plot(eda_data$Age, eda_data$Salary)

library(PerformanceAnalytics)
a<-skewness(eda_data$Rooms)
a
hist(eda_data$Rooms)

#imputing missing values

library(e1071)
b<-skewness(eda_data$Garage)
b
hist(eda_data$Garage)

#library(ggplot2)
#ggplot(eda_data, aes(x = Rooms)) + stat_density(geom = "line")

eda_data$Garage[is.na(eda_data$Garage)]<-
mean(eda_data$Garage, na.rm=TRUE)
View(eda_data)

a<-skewness(eda_data$Rooms)
a
hist(eda_data$Rooms)

hist(eda_data$Rooms)
eda_data$Rooms[is.na(eda_data$Rooms)]<-median(eda_data$Rooms, na.rm=TRUE)   # median imputation
b<-eda_data$Rooms
b
hist(b)
View(eda_data)

#mode
getmode <- function(v){
v=v[nchar(as.character(v))>0]
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}

#Identifying duplicate data

data<-eda_data[1:5, 3:4]
data
duplicated(data)
#removing duplicate data
a<-data[!duplicated(data),]
a
#removing an outlier
boxplot(eda_data$AppraisedValue)

plot(eda_data$AppraisedValue)
x<-eda_data$AppraisedValue
x
out <- boxplot.stats(x)$out   # identify the outliers
out   # boxplot.stats() flags the value 1200
x<-x[!(x %in% out)]
x     # 1200 has been removed from x
boxplot(x)
#imputing
q<-quantile(eda_data$AppraisedValue, .95)   # 95th percentile
q   # 850 for this dataset
summary(eda_data$AppraisedValue)
#ifelse(2==2, "equal", "not equal") #example of ifelse
app_val<- ifelse(eda_data$AppraisedValue >= 1000, 850, eda_data$AppraisedValue)   # cap values >= 1000 at the 95th percentile (850)
app_val
boxplot(app_val)

#conversion: character to numeric values

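gen<-eda_data$Gender   # named gen rather than str to avoid confusion with the str() function
gen
str(eda_data$Gender)
str(eda_data$Education)
num<-as.numeric(gen)   # text such as "M" becomes NA (with a warning); a factor gives its integer level codes
num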
str(num)
typeof(num)
class(num)

num<-as.factor(num)
num
class(num)

num<-as.character(num)
num
class(num)
typeof(num)

#numeric to logical values

v<-c(0, 0, 1, 1)
v
logi<-as.logical(v)
logi

#logical to numeric
int<-as.integer(logi)
int
typeof(int)

fact<-as.factor(int)
fact
str(eda_data$Name)
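The cleaning steps in the script above (imputation, duplicate removal, outlier handling and type conversion) can be condensed into one self-contained sketch. The data frame is hypothetical; its 1200 mimics the outlier mentioned in the script's comments. Median imputation is shown because the median, unlike the mean, is not pulled toward extreme values in a skewed column.

```r
df <- data.frame(
  Rooms          = c(3, 4, NA, 5, 4, 12),
  Gender         = c("F", "M", "M", "F", "M", "F"),
  AppraisedValue = c(300, 320, 310, 330, 320, 1200)
)

# 1. Median imputation: median of 3, 4, 5, 4, 12 is 4 (the mean, 5.6, is pulled up by 12)
df$Rooms[is.na(df$Rooms)] <- median(df$Rooms, na.rm = TRUE)

# 2. Duplicates: duplicated() is TRUE for a row that repeats an earlier row
dup_flags <- duplicated(df)          # only row 5 (a copy of row 2)
df_unique <- df[!dup_flags, ]

# 3. Outliers: boxplot.stats() flags points beyond 1.5 * IQR from the hinges
out  <- boxplot.stats(df$AppraisedValue)$out   # 1200
kept <- df$AppraisedValue[!(df$AppraisedValue %in% out)]

# 4. Text to numbers: as.numeric("F") is NA (with a warning),
#    so convert through factor() to get integer level codes
gender_codes <- as.integer(factor(df$Gender))  # F = 1, M = 2 (levels sorted alphabetically)
```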
Perform the following operations in R

1. Create a Student.csv file with the fields rollno, name, gender, class and Tmarks (note: total marks are out of 1000), then read the file in R
2. Extract the first few lines from the dataset
3. Check the data type of each field in the dataset
4. Get the summary of the dataset
5. Check the dimensions of the dataset and list the column names
6. List the rows where total marks are more than 750
7. List only the first 2 columns of the rows where total marks are more than 750 and class is SYCS
8. Sort the data in ascending order of total marks
9. List the records where total marks are not entered
10. Plot a scatter plot showing the relation between average marks and class
11. Draw a box plot of total marks
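One possible solution sketch for the tasks, using hypothetical student rows (replace them with real data). The file is written to a temporary folder so the example runs anywhere. For task 10 it assumes "average marks" means the mean Tmarks of each class, which the task does not define.

```r
# Hypothetical sample rows; replace them with the real student data
students <- data.frame(
  rollno = 1:6,
  name   = c("Amit", "Bina", "Chetan", "Diya", "Esha", "Farhan"),
  gender = c("M", "F", "M", "F", "F", "M"),
  class  = c("SYCS", "FYCS", "SYCS", "TYCS", "SYCS", "FYCS"),
  Tmarks = c(812, 640, 755, NA, 701, 880)   # out of 1000; NA = not entered
)

# Task 1: write Student.csv (to a temporary folder here) and read it back
path <- file.path(tempdir(), "Student.csv")
write.csv(students, path, row.names = FALSE)
st <- read.csv(path, header = TRUE, sep = ",")

head(st)                 # task 2
str(st)                  # task 3
summary(st)              # task 4
dims <- dim(st)          # task 5: rows and columns
cols <- names(st)        #         column names

above750    <- subset(st, Tmarks > 750)                                 # task 6
sycs_top2   <- subset(st, Tmarks > 750 & class == "SYCS", select = 1:2) # task 7
sorted      <- st[order(st$Tmarks), ]                                   # task 8 (NA rows go last)
not_entered <- st[is.na(st$Tmarks), ]                                   # task 9

# Task 10, assuming "average marks" means the mean Tmarks of each class
class_avg <- aggregate(Tmarks ~ class, data = st, FUN = mean)   # NA rows are dropped
class_avg
```

The plots can then be drawn with, for example, stripchart(Tmarks ~ class, data = st, vertical = TRUE) for task 10 and boxplot(st$Tmarks) for task 11.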
