Da (22C01156)
Case Studies in R
Presidency College, Bangalore-24
Report
Department of Computer
Applications
Presidency
Group
Case Study Title: A Data Analytics Approach to Understanding
Student Performance in Studies
Course: BCA
Subject: DATA ANALYTICS
Class & Section: V BCA ‘C’
Certificate
Table of Contents
1 Abstract
2 Introduction
3 Data Collection
4 Data Exploration
5 Data Preprocessing
6 Data Analysis
7 Conclusion
Abstract
Introduction
Problem statement:
Consider the dataset indicating the number of hours of study put in by the students
(NoOfHours) and their score (Score).
This report presents a data analytics case study developed using R programming as part of the
curriculum, focusing on the relationship between the number of study hours (Hours) and
scores (Score) of students. The dataset, originally formatted in Excel, is converted to CSV for
analysis in R. Key analytical techniques employed include descriptive statistics, data
exploration, and data preprocessing, which facilitate a deeper understanding of the data's
structure and patterns. A simple predictive model is built to assess the impact of study hours
on academic performance, supported by various visualizations and charts. This work aims to
illustrate the practical application of data analytics methodologies in R and provide insights
into factors influencing student success.
Techniques used:
A basic data science project consists of the following six steps:
5. If you have more than one candidate model, apply each and evaluate their goodness-of-fit using independent data that was not used for training the model.
6. Use the best model to make your final predictions.
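As a rough sketch of how these steps map onto the R functions used later in this report (assuming the data file studentscore.csv and the object names introduced in the following sections):

# Steps 2-3: collect and explore the data
score <- read.csv("studentscore.csv")
summary(score); str(score); head(score)

# Step 4: preprocess (handle missing values)
score1 <- na.omit(score)

# Step 5: build a candidate model and check its fit
model <- lm(SCORE ~ HOURS, data = score1)
summary(model)

# Step 6: use the model for the final predictions
round(predict(model), 0)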
Model details:
Data Collection
DATASET
HOURS   SCORE
2.5     21
5.1     47
3.2     27
8.5     75
3.5     30
1.5     20
9.2     88
5.5
8.3     80
2       25
7.7     85
5.9     62
4.5     41
3.3     42
1.1     17
8.9     95
        56
1.9     24
6.1     67
        69
2.7     30
4.8     54
3.8
6.9     76
7.8     86
6       45
8       89
        57
10      90
14      98
1.5     24
12
4.6     79
7       90
3.6     66
6.9     78
1       23
2.2     79
6.8     88
3       50
Data Exploration
Reading CSV files: A CSV file uses the .csv extension and stores data in a tabular format as plain text. The following function reads data from a CSV file: read.csv("filename"), where filename is the name of the CSV file to be imported.
setwd("C:/Users/shivp/OneDrive/Desktop/SHEEETAL STUDY-PC")
score<-read.csv("studentscore.csv")
View(score)
Exploring a dataset means viewing its contents from different perspectives. Datasets are the central object of analytical data processing, and exploration examines different forms or parts of them. With the help of R commands, analysts can easily explore a dataset in several ways.
summary(score)
     HOURS           SCORE
 Min.   : 1.00   Min.   :17.00
 1st Qu.: 3.00   1st Qu.:38.25
 Median : 5.00   Median :59.00
 Mean   : 5.45   Mean   :58.75
 3rd Qu.: 7.25   3rd Qu.:79.25
 Max.   :14.00   Max.   :98.00
str(score)
> head(score)
HOURS SCORE
1 2 21
2 5 47
3 3 27
4 8 75
5 4 30
6 2 20
> tail(score)
HOURS SCORE
35 4 66
36 7 78
37 1 23
38 2 79
39 7 88
40 3 50
> dim(score)
[1] 40 2
> is.na(score)
HOURS SCORE
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE
[4,] FALSE FALSE
[5,] FALSE FALSE
[6,] FALSE FALSE
[7,] FALSE FALSE
[8,] FALSE TRUE
[9,] FALSE FALSE
[10,] FALSE FALSE
[11,] FALSE FALSE
[12,] FALSE FALSE
[13,] FALSE FALSE
[14,] FALSE FALSE
[15,] FALSE FALSE
[16,] FALSE FALSE
[17,] TRUE FALSE
[18,] FALSE FALSE
[19,] FALSE FALSE
[20,] TRUE FALSE
[21,] FALSE FALSE
[22,] FALSE FALSE
[23,] FALSE TRUE
[24,] FALSE FALSE
[25,] FALSE FALSE
[26,] FALSE FALSE
[27,] FALSE FALSE
[28,] TRUE FALSE
[29,] FALSE FALSE
[30,] FALSE FALSE
[31,] FALSE FALSE
[32,] FALSE TRUE
[33,] FALSE FALSE
[34,] FALSE FALSE
[35,] FALSE FALSE
[36,] FALSE FALSE
[37,] FALSE FALSE
[38,] FALSE FALSE
[39,] FALSE FALSE
[40,] FALSE FALSE
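A more compact way to see the same information is to count the missing values per column; a minimal sketch (assuming the same score data frame, with the counts below read off the is.na() matrix above):

# Count the missing values in each column instead of scanning the full logical matrix
colSums(is.na(score))
# HOURS SCORE
#     3     3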
Data Preprocessing
Method 1: Editing using edit()
edit(score)
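edit() opens an interactive, spreadsheet-style editor for correcting values by hand. A non-interactive alternative is sketched below, assuming the intent is to fill the missing scores with the median score (which matches the value 59 that appears for those rows later in the report):

# Replace missing SCORE values with the median score instead of editing by hand
score$SCORE[is.na(score$SCORE)] <- median(score$SCORE, na.rm = TRUE)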
Method 2: Removing missing rows using na.omit()
score1<-na.omit(score)
> print(score1)
HOURS SCORE
1 2 21
2 5 47
3 3 27
4 8 75
5 4 30
6 2 20
7 9 88
8 6 59
9 8 80
10 2 25
11 8 85
12 6 62
13 4 41
14 3 42
15 1 17
16 9 95
18 2 24
19 6 67
21 3 30
22 5 54
23 4 59
24 7 76
25 8 86
26 6 45
27 8 89
29 10 90
30 14 98
31 2 24
32 12 59
33 5 79
34 7 90
35 4 66
36 7 78
37 1 23
38 2 79
39 7 88
40 3 50
We can round off the values:
> print(score1$HOURS <- as.numeric(format(round(score1$HOURS, 0))))
 [1]  2  5  3  8  4  2  9  8  2  8  6  4  3  1  9  2  6  3  5  7  8  6  8 10 14  2  5  7  4  7  1  2  7  3
 [1] 21 47 27 75 30 20 88 59 80 25 85 62 41 42 17 95 56 24 67 69 30 54 59 76 86 45 89 57 90 98 24 59 79 90 66 78 23 79 88 50
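The same rounding can be applied to every column in one step; a minimal sketch, assuming all columns of score1 are numeric:

# Round every column of the cleaned data frame to whole numbers
score1[] <- lapply(score1, function(x) round(as.numeric(x), 0))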
Preprocessed data
Data Analysis
plot(score$HOURS,score$SCORE)
When we use the mean to predict the score, in some instances we observe a significant difference between the actual (observed) value and the predicted value. So we try correlation instead.
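To see why the mean is a weak predictor here, the sketch below (assuming the cleaned data frame score1) overlays the mean score on the scatter plot and summarises its errors:

# Use the overall mean score as a naive predictor and inspect its errors
mean_pred <- mean(score1$SCORE)           # one constant prediction for every student
errors    <- score1$SCORE - mean_pred     # residuals of the "mean model"
summary(errors)                           # wide spread => the mean predicts poorly
plot(score1$HOURS, score1$SCORE, xlab = "Hours studied", ylab = "Score")
abline(h = mean_pred, lty = 2)            # dashed horizontal line at the mean score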
Correlation Coefficient
The degree and direction of a linear association can be determined using correlation. The Pearson correlation coefficient of the association between the number of hours studied and the score is computed as follows:
> cor(score$HOURS,score$SCORE)
[1] 0.7920693
The correlation value here suggests that there is a strong association between the number of hours
studied and the freshmen score.
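The strength of this association can also be tested formally; a short sketch using cor.test(), which reports a p-value and a confidence interval for the correlation (incomplete pairs are dropped automatically):

# Test whether the correlation between hours and score differs from zero
cor.test(score$HOURS, score$SCORE)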
A correlation value close to 0 indicates that the variables are not linearly associated. However, these
variables may still be related. Thus, it is advised to plot the data.
Since correlation analysis cannot establish causation or describe the form of the relationship, we use regression techniques to quantify the nature of the relationship between the variables.
When a regression model is of a linear form, such a regression is called a linear regression. Similarly,
when a regression model is of non-linear form, then such a regression is called a non-linear
regression.
A linear equation is an equation of degree at most one, while a nonlinear equation has degree two or higher. A linear equation forms a straight line on a graph, whereas a nonlinear equation forms a curve.
1. Simple linear form: there is one predictor and one dependent variable: f(X) = b0 + b1x1 + e
2. Multiple linear form: there are multiple predictor variables and one dependent variable: f(X) = b0 + b1x1 + b2x2 + ... + bnxn + e
Since the scatter plot between the number of hours of study put in by students and the freshmen
scores suggested a linear association, let us build a linear regression model to quantify the nature
of this relationship.
> model <- lm(score$SCORE ~ score$HOURS)
> print(model)
Call:
lm(formula = score$SCORE ~ score$HOURS)
Coefficients:
(Intercept) score$HOURS
22.980 6.575
> summary(model)
Call:
Residuals:
Coefficients:
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The first item shown in the output is the formula, lm(formula = score$SCORE ~ score$HOURS), that R uses to fit the data. lm() is the linear model function in R used to create a simple regression model. score$HOURS is the predictor variable and score$SCORE is the target/response variable. The next item in the model output describes the residuals. What are "residuals"? The difference between the actual observed response values and the response values that the model predicted is called the residuals. The residuals section of the model output breaks them down into five summary points: Minimum, 1Q (first quartile), Median, 3Q (third quartile), and Maximum. When assessing how well the model fits the data, one should look for a symmetrical distribution of these points around a mean value of zero.
Coefficient: Estimate
The Estimate column of the Coefficients table contains two rows. The first is the intercept, which is the mean of the response Y when all predictors X equal 0. Note that the intercept is only meaningful if the predictors in the model can actually take the value zero. The second row in the Coefficients table is the slope, or in our example, the effect HOURS has on SCORE. The slope term in our model indicates that for every one-hour increase in HOURS, the predicted SCORE goes up by about 6.575 points.
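These estimates can also be extracted directly from the fitted object; a small sketch:

# Intercept and slope reported in the Coefficients table
coef(model)       # (Intercept) 22.980, slope 6.575
# 95% confidence intervals for both coefficients
confint(model)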
> res <- predict(model)
> print(res)
        1         2         3         4         5         6         7         8         9        10        11        12
 39.41696  56.51106  44.01922  78.86489  45.99161  32.84230  83.46715  59.14093  77.54996  36.12963  73.60517  61.77079
       13        14        15        16        17        18        19        20        21        22        23        24
 52.56627  44.67668  30.21244  81.49476  58.75000  35.47216  63.08572  58.75000  40.73189  54.53867  47.96401  68.34544
       25        26        27        28        29        30        31        32        33        34        35        36
 74.26263  62.42825  75.57757  58.75000  88.72688 115.02550  32.84230 101.87619  53.22374  69.00291  46.64908  68.34544
       37        38        39        40
 29.55497  37.44456  67.68798  42.70429
Rounding of results:
> res <- round(res, 0)
> print(res)
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
 39  57  44  79  46  33  83  59  78  36  74  62  53  45  30  81  59  35  63  59
 21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
 41  55  48  68  74  62  76  59  89 115  33 102  53  69  47  68  30  37  68  43
   HOURS SCORE PREDICTED RESIDUAL
18 1.900000 24 35 -11.4721637
19 6.100000 67 63 3.9142809
20 5.440541 69 59 10.2500000
21 2.700000 30 41 -10.7318885
22 4.800000 54 55 -0.5386663
23 3.800000 59 48 11.0359898
24 6.900000 76 68 7.6545560
25 7.800000 86 74 11.7373656
26 6.000000 45 62 -17.4282535
27 8.000000 89 76 13.4224344
28 5.440541 57 59 -1.7500000
29 10.000000 90 89 1.2731223
30 14.000000 98 115 -17.0255019
31 1.500000 24 33 -8.8423013
32 12.000000 59 102 -42.8761898
33 4.600000 79 53 25.7762650
34 7.000000 90 69 20.9970904
35 3.600000 66 47 19.3509210
36 6.900000 78 68 9.6545560
37 1.000000 23 30 -6.5549733
38 2.200000 79 37 41.5554395
39 6.800000 88 68 20.3120216
40 3.000000 50 43 7.2957146
write.csv(data,"C:/Users/welcome/Desktop/PREDICTED.csv")
print ('CSV file written Successfully :)')
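The data object written above is not constructed in the code shown; a minimal sketch of how such a data frame might be assembled, assuming score holds the preprocessed values (no remaining missing entries) and res the rounded predictions:

# Hypothetical construction of the 'data' object combining actual and predicted scores
data <- data.frame(HOURS     = score$HOURS,
                   SCORE     = score$SCORE,
                   PREDICTED = res,
                   RESIDUAL  = score$SCORE - predict(model))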
   HOURS SCORE
23 3.800000 59
24 6.900000 76
25 7.800000 86
26 6.000000 45
27 8.000000 89
28 5.440541 57
29 10.000000 90
30 14.000000 98
31 1.500000 24
32 12.000000 59
33 4.600000 79
34 7.000000 90
35 3.600000 66
36 6.900000 78
37 1.000000 23
38 2.200000 79
39 6.800000 88
40 3.000000 50
Call:
lm(formula = new$SCORE ~ new$HOURS)
Coefficients:
(Intercept) new$HOURS
22.980 6.575
> summary(model)
Call:
lm(formula = new$SCORE ~ new$HOURS)
Residuals:
Min 1Q Median 3Q Max
-42.88 -11.21 -0.34 10.45 41.55
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.980 5.099 4.506 6.12e-05 ***
new$HOURS 6.575 0.822 7.999 1.14e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If the number of data points is small, a large F-statistic is required to ascertain that there may be a relationship between the predictor and response variables. The F statistic is computed as F = (explained variation / (k - 1)) / (unexplained variation / (n - k)), where k is the number of variables in the dataset and n is the number of observations.
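As an illustration, a small sketch (assuming the fitted object model is available) that computes the F statistic by hand and can be compared with the value reported by summary(model):

# Manual F statistic for a simple linear regression (k = 2 parameters: intercept + slope)
y     <- model$model[[1]]                  # response values used to fit the model
y_hat <- fitted(model)                     # fitted (predicted) values
n     <- length(y)                         # number of observations
k     <- 2                                 # intercept + one predictor

ss_explained   <- sum((y_hat - mean(y))^2) # explained (regression) variation
ss_unexplained <- sum((y - y_hat)^2)       # unexplained (residual) variation

F_stat <- (ss_explained / (k - 1)) / (ss_unexplained / (n - k))
print(F_stat)                              # compare with summary(model)$fstatistic[1]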
write.csv(data,"C:/Users/shivp/OneDrive/Desktop/SHEEETAL STUDY-PC")
> print(data)
Conclusion
The model is validated by checking the assumptions of linear regression. This case study demonstrates the critical relationship between study habits, quantified by the number of
study demonstrates the critical relationship between study habits, quantified by the number of
hours spent studying, and academic performance, as reflected in freshmen scores. Through the
application of various data analytics techniques using R, we were able to explore and analyze the
dataset effectively.
The findings indicate a positive correlation between the hours of study and freshmen scores,
suggesting that increased study time is associated with better academic outcomes.
Descriptive statistics and visualizations highlighted the distribution of study hours and
scores, while data pre-processing ensured the integrity and suitability of the dataset for
analysis.
The predictive model developed during this study provides a foundational understanding of
how study habits can influence academic performance. This insight not only underscores the
importance of effective study practices among students but also offers valuable guidance for
educators and policymakers aiming to enhance student success.
Overall, this analysis serves as a practical example of how data analytics can inform
educational strategies and encourage students to adopt more effective study routines for
improved academic achievement. Future research could expand on this analysis by
incorporating additional variables, such as study methods and student engagement, to
provide a more comprehensive understanding of factors influencing academic success.