
ML Assignment No. 1: 1.1 Title

This document outlines an assignment on linear regression. It provides: 1) Details of the assignment such as the title, problem definition, prerequisites, software/hardware requirements, and learning objectives. 2) An explanation of linear regression theory concepts including definitions of linear regression, prerequisites for regression, the least squares regression line, coefficient of determination, and standard error. 3) An example problem to find the regression equation from data and use it to predict a student's grade based on their aptitude test score.

Uploaded by

Kirti Phegade

Lab Practices-III Fourth Year Computer Engineering

Engineering

ML Assignment No. 1
R C V T Total Dated Sign
(2) (4) (2) (2) (10)

1.1 Title

Assignment based on Linear Regression.

1.2 Problem Definition:

The following table shows the results of a recently conducted study on the correlation of the number of
hours spent driving with the risk of developing acute backache. Find the equation of the best fit line for this
data.

1.3 Prerequisite:

Basics of Python, Data Mining Algorithms

1.4 Software Requirements:

Anaconda with Python 3.7

1.5 Hardware Requirement:

PIV, 2GB RAM, 500 GB HDD, Lenovo A13-4089Model.

1.6 Learning Objectives:

Learn to apply Linear Regression to different given datasets.

1.7 Outcomes:

After completing this assignment, students will understand how to find the correlation between
two variables, how to calculate the accuracy of a linear model, and how to plot a graph using matplotlib.

1.8 Theory Concepts:

1.8.1 Linear Regression

Regression analysis is used in statistics to find trends in data. For example, you might guess that there is a
connection between how much you eat and how much you weigh; regression analysis can help you
quantify that.

What is Linear Regression?

In a cause and effect relationship, the independent variable is the cause, and the dependent variable is
the effect. Least squares linear regression is a method for predicting the value of a dependent variable Y,
based on the value of an independent variable X.

Prerequisites for Regression

Simple linear regression is appropriate when the following conditions are satisfied.

• The dependent variable Y has a linear relationship to the independent variable X. To check this,
  make sure that the XY scatterplot is linear and that the residual plot shows a random pattern.
• For each value of X, the probability distribution of Y has the same standard deviation σ.
  When this condition is satisfied, the variability of the residuals will be relatively constant across all
  values of X, which is easily checked in a residual plot.
• For any given value of X,
    ◦ The Y values are independent, as indicated by a random pattern on the residual plot.
    ◦ The Y values are roughly normally distributed (i.e., symmetric and unimodal). A
      little skewness is ok if the sample size is large. A histogram or a dotplot will show the shape
      of the distribution.

The Least Squares Regression Line

Linear regression finds the straight line, called the least squares regression line or LSRL, that best
represents observations in a bivariate data set. Suppose Y is a dependent variable, and X is an independent
variable. The population regression line is:

Y = Β0 + Β1X

where Β0 is a constant, Β1 is the regression coefficient, X is the value of the independent variable, and Y is
the value of the dependent variable.

Given a random sample of observations, the population regression line is estimated by:

ŷ = b0 + b1x

where b0 is a constant, b1 is the regression coefficient, x is the value of the independent variable, and ŷ is
the predicted value of the dependent variable.

How to Define a Regression Line

Normally, you will use a computational tool - a software package (e.g., Excel) or a graphing calculator - to
find b0 and b1. You enter the X and Y values into your program or calculator, and the tool solves for each
parameter.

In the unlikely event that you find yourself on a desert island without a computer or a graphing calculator,
you can solve for b0 and b1 "by hand". Here are the equations.

b1 = Σ [ (xi - x̄)(yi - ȳ) ] / Σ [ (xi - x̄)² ]

b1 = r * (sy / sx)

b0 = ȳ - b1 * x̄

where b0 is the constant in the regression equation, b1 is the regression coefficient, r is the correlation
between x and y, xi is the X value of observation i, yi is the Y value of observation i, x̄ is the mean of X, ȳ is
the mean of Y, sx is the standard deviation of X, and sy is the standard deviation of Y.
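
The "by hand" equations can be sketched in plain Python. This is an illustration only: the function names and the small dataset are hypothetical and are not part of the assignment. The second function checks that the form b1 = r * (sy / sx) gives the same slope as the sum-of-products form.

```python
from math import sqrt


def least_squares_fit(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations and sum of squared x deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def slope_from_correlation(xs, ys):
    """Return b1 computed as r * (sy / sx), using sample standard deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)
    s_x = sqrt(sxx / (n - 1))
    s_y = sqrt(syy / (n - 1))
    return r * (s_y / s_x)


# Hypothetical data scattered around y = 2x + 1 (not from the assignment)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = least_squares_fit(xs, ys)
print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("b1 via r * (sy / sx) =", round(slope_from_correlation(xs, ys), 3))
```

Both forms divide by zero if every x value is identical, so the sketch assumes the X values vary.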

Properties of the Regression Line

When the regression parameters (b0 and b1) are defined as described above, the regression line has the
following properties.

• The line minimizes the sum of squared differences between observed values (the y values) and
  predicted values (the ŷ values computed from the regression equation).
• The regression line passes through the mean of the X values (x̄) and through the mean of
  the Y values (ȳ).
• The regression constant (b0) is equal to the y intercept of the regression line.
• The regression coefficient (b1) is the average change in the dependent variable (Y) for a 1-unit
  change in the independent variable (X). It is the slope of the regression line.

The least squares regression line is the only straight line that has all of these properties.

The Coefficient of Determination

The coefficient of determination (denoted by R²) is a key output of regression analysis. It is interpreted as
the proportion of the variance in the dependent variable that is predictable from the independent variable.

• The coefficient of determination ranges from 0 to 1.
• An R² of 0 means that the dependent variable cannot be predicted from the independent variable.
• An R² of 1 means the dependent variable can be predicted without error from the independent
  variable.
• An R² between 0 and 1 indicates the extent to which the dependent variable is predictable. An R² of
  0.10 means that 10 percent of the variance in Y is predictable from X; an R² of 0.20 means that 20
  percent is predictable; and so on.

The coefficient of determination (R²) for a linear regression model with one independent variable is:

R² = { ( 1 / N ) * Σ [ (xi - x̄) * (yi - ȳ) ] / (σx * σy ) }²

where N is the number of observations used to fit the model, Σ is the summation symbol, xi is the x value
for observation i, x̄ is the mean x value, yi is the y value for observation i, ȳ is the mean y value, σx is the
standard deviation of x, and σy is the standard deviation of y.

If you know the linear correlation (r) between two variables, then the coefficient of determination (R²) is
easily computed using the following formula: R² = r².
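
A minimal sketch of the formula follows, assuming population standard deviations (dividing by N), as the definition above does. The function name and the two tiny datasets are hypothetical. Because the correlation is squared, a perfectly decreasing line and a perfectly increasing line both give an R² of 1.

```python
from math import sqrt


def r_squared(xs, ys):
    """Coefficient of determination for one independent variable.

    Assumes xs and ys each contain at least two distinct values;
    otherwise a standard deviation is zero and the ratio is undefined.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sigma_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    # (1 / N) * sum of products of deviations
    mean_product = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    r = mean_product / (sigma_x * sigma_y)
    return r ** 2


# Hypothetical perfectly linear data, increasing and decreasing
increasing = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
decreasing = r_squared([1, 2, 3, 4], [8, 6, 4, 2])
print(round(increasing, 6), round(decreasing, 6))
```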

Standard Error

The standard error about the regression line (often denoted by SE) is a measure of the average amount
that the regression equation over- or under-predicts. The higher the coefficient of determination, the lower
the standard error; and the more accurate predictions are likely to be.
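
The text does not give a formula for the standard error. The sketch below assumes the common definition for simple linear regression, SE = sqrt( Σ (yi - ŷi)² / (N - 2) ), and applies it to the five-student data from the Example section of this handout. The variable names are illustrative.

```python
from math import sqrt

# Aptitude scores (x) and statistics grades (y) from the Example section
xs = [95, 85, 80, 70, 60]
ys = [85, 95, 70, 65, 70]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept, kept at full precision
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar

# Sum of squared residuals, then the assumed standard error definition
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
standard_error = sqrt(sse / (n - 2))
print("SE =", round(standard_error, 2))
```

Under this definition the value is expressed in the units of Y (grade points here), so it can be read as a typical size of the prediction error.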

Example

Last year, five randomly selected students took a math aptitude test before they began their statistics course.
The Statistics Department has three questions.

• What linear regression equation best predicts statistics performance, based on math aptitude scores?
• If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?
• How well does the regression equation fit the data?

How to Find the Regression Equation

In the table below, the xi column shows scores on the aptitude test. Similarly, the yi column shows statistics
grades. The last two columns show deviation scores: the difference between the student's score and the
average score on each test. The last two rows show sums and mean scores that we will use to conduct the
regression analysis.

Student    xi     yi    (xi - x̄)   (yi - ȳ)
1          95     85       17          8
2          85     95        7         18
3          80     70        2         -7
4          70     65       -8        -12
5          60     70      -18         -7
Sum       390    385
Mean       78     77

And for each student, we also need to compute the squares of the deviation scores (the last two columns in
the table below).

Student    xi     yi    (xi - x̄)²   (yi - ȳ)²
1          95     85      289          64
2          85     95       49         324
3          80     70        4          49
4          70     65       64         144
5          60     70      324          49
Sum       390    385      730         630
Mean       78     77

And finally, for each student, we need to compute the product of the deviation scores.

Student    xi     yi    (xi - x̄)(yi - ȳ)
1          95     85         136
2          85     95         126
3          80     70         -14
4          70     65          96
5          60     70         126
Sum       390    385         470
Mean       78     77

The regression equation is a linear equation of the form: ŷ = b0 + b1x. To conduct a regression analysis, we
need to solve for b0 and b1. Computations are shown below. Notice that all of our inputs for the regression
analysis come from the above three tables.

First, we solve for the regression coefficient (b1):

b1 = Σ [ (xi - x̄)(yi - ȳ) ] / Σ [ (xi - x̄)² ]

b1 = 470/730

b1 = 0.644

Once we know the value of the regression coefficient (b1), we can solve for the regression constant (b0):

b0 = ȳ - b1 * x̄

b0 = 77 - (0.644)(78)

b0 = 26.768

Therefore, the regression equation is: ŷ = 26.768 + 0.644x.
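
As a cross-check, the three tables and the two coefficients can be reproduced with a few lines of Python. This is a sketch with illustrative variable names. It rounds b1 to three decimals before computing b0, as the hand calculation does; carrying full precision instead gives b0 ≈ 26.781.

```python
# Aptitude scores (x) and statistics grades (y) for the five students
xs = [95, 85, 80, 70, 60]
ys = [85, 95, 70, 65, 70]

n = len(xs)
x_bar = sum(xs) / n  # 78
y_bar = sum(ys) / n  # 77

# Deviation scores (first table)
dx = [x - x_bar for x in xs]
dy = [y - y_bar for y in ys]

# Column sums from the second and third tables
sum_dx2 = sum(d ** 2 for d in dx)
sum_dy2 = sum(d ** 2 for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

# Coefficients, rounding b1 first to mirror the hand calculation
b1 = round(sum_dxdy / sum_dx2, 3)
b0 = round(y_bar - b1 * x_bar, 3)

print("sums:", sum_dx2, sum_dy2, sum_dxdy)
print("b1 =", b1, "b0 =", b0)
```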

How to Use the Regression Equation

Once you have the regression equation, using it is a snap. Choose a value for the independent variable (x),
perform the computation, and you have an estimated value (ŷ) for the dependent variable.

In our example, the independent variable is the student's score on the aptitude test. The dependent variable
is the student's statistics grade. If a student made an 80 on the aptitude test, the estimated statistics grade (ŷ)
would be:

ŷ = b0 + b1x

ŷ = 26.768 + 0.644x = 26.768 + 0.644 * 80

ŷ = 26.768 + 51.52 = 78.288

Warning: When you use a regression equation, do not use values for the independent variable that are
outside the range of values used to create the equation. That is called extrapolation, and it can produce
unreasonable estimates.

In this example, the aptitude test scores used to create the regression equation ranged from 60 to 95.
Therefore, only use values inside that range to estimate statistics grades. Using values outside that range
(less than 60 or greater than 95) is problematic.
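
One way to enforce the extrapolation warning in code is to reject inputs outside the fitted range. The sketch below uses the rounded coefficients from this example; the function name and the choice to raise an error (rather than, say, print a warning) are assumptions made for illustration.

```python
# Coefficients and fitted x range taken from the worked example
B0 = 26.768
B1 = 0.644
X_MIN = 60
X_MAX = 95


def predict_grade(aptitude_score):
    """Estimate the statistics grade, refusing to extrapolate."""
    if not X_MIN <= aptitude_score <= X_MAX:
        raise ValueError(
            "score %s is outside the fitted range %s to %s"
            % (aptitude_score, X_MIN, X_MAX)
        )
    return B0 + B1 * aptitude_score


print(round(predict_grade(80), 3))

try:
    predict_grade(50)
except ValueError as err:
    print("rejected:", err)
```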

How to Find the Coefficient of Determination

Whenever you use a regression equation, you should ask how well the equation fits the data. One way to
assess fit is to check the coefficient of determination, which can be computed from the following formula.

R² = { ( 1 / N ) * Σ [ (xi - x̄) * (yi - ȳ) ] / (σx * σy ) }²

where N is the number of observations used to fit the model, Σ is the summation symbol, xi is the x value
for observation i, x̄ is the mean x value, yi is the y value for observation i, ȳ is the mean y value, σx is the
standard deviation of x, and σy is the standard deviation of y.

Computations for the sample problem of this lesson are shown below. We begin by computing the standard
deviation of x (σx):

σx = sqrt [ Σ ( xi - x̄ )² / N ]

σx = sqrt( 730/5 ) = sqrt(146) = 12.083

Next, we find the standard deviation of y, (σy):

σy = sqrt [ Σ ( yi - ȳ )² / N ]

σy = sqrt( 630/5 ) = sqrt(126) = 11.225

And finally, we compute the coefficient of determination (R²):

R² = { ( 1 / N ) * Σ [ (xi - x̄) * (yi - ȳ) ] / (σx * σy ) }²

R² = [ ( 1/5 ) * 470 / ( 12.083 * 11.225 ) ]²

R² = ( 94 / 135.632 )² = ( 0.693 )² = 0.48

A coefficient of determination equal to 0.48 indicates that about 48% of the variation in statistics grades
(the dependent variable) can be explained by the relationship to math aptitude scores (the independent
variable). This would be considered a good fit to the data, in the sense that it would substantially improve
an educator's ability to predict student performance in statistics class.
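
The "proportion of variation explained" reading can be checked numerically. The sketch below is a cross-check rather than the method used above: it computes 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squared deviations of y from its mean. For a least squares line with an intercept, this equals the squared correlation. Variable names are illustrative.

```python
# Aptitude scores (x) and statistics grades (y) for the five students
xs = [95, 85, 80, 70, 60]
ys = [85, 95, 70, 65, 70]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
sst = sum((y - y_bar) ** 2 for y in ys)  # total variation in y

# Full-precision least squares coefficients
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Variation left unexplained by the line
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r2_explained = 1 - sse / sst
r2_correlation = sxy ** 2 / (sxx * sst)
print(round(r2_explained, 2), round(r2_correlation, 2))
```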

Residuals

The difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is
called the residual (e). Each data point has one residual.

Residual = Observed value - Predicted value

e = y - ŷ

Both the sum and the mean of the residuals are equal to zero. That is, Σ e = 0 and ē = 0.
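
The zero-sum property can be confirmed on the five-student data. This sketch uses full-precision coefficients and illustrative variable names; in floating point the sum comes out as a value very close to zero rather than exactly zero.

```python
# Aptitude scores (x) and statistics grades (y) for the five students
xs = [95, 85, 80, 70, 60]
ys = [85, 95, 70, 65, 70]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar

# One residual per data point: e = y - y-hat
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, y, e in zip(xs, ys, residuals):
    print(x, y, round(e, 3))
print("sum of residuals:", round(sum(residuals), 9))
```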

What is a Best Fit Line?

A line of best fit (or "trend" line) is a straight line that best represents the data on a
scatter plot. This line may pass through some of the points, none of the points, or all of the points.

1.9 Given Dataset in Our Definition-

1.10 Algorithm

1. Import the required packages.
2. Read the given dataset.
3. Import LinearRegression, create an object of it, and fit it to the data.
4. Find the accuracy of the model using the score function.
5. Predict values using the regressor object.
6. Take input from the user.
7. Calculate the value of y for that input.
8. Draw the scatter plot.
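
The steps above can be sketched with scikit-learn and matplotlib. Several things here are assumptions: the assignment's driving-hours dataset is not reproduced in this handout, so the five-student example data stands in for it; a fixed value replaces the user input of step 6 so the script runs unattended; and the output filename is hypothetical.

```python
# Step 1: import the required packages
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Step 2: read the dataset (stand-in: the five-student example data)
X = np.array([95, 85, 80, 70, 60], dtype=float).reshape(-1, 1)
y = np.array([85, 95, 70, 65, 70], dtype=float)

# Step 3: create the regressor object and fit it
model = LinearRegression()
model.fit(X, y)
slope = model.coef_[0]
intercept = model.intercept_

# Step 4: score() returns the coefficient of determination R^2
r2 = model.score(X, y)

# Step 5: predictions for the training inputs
fitted = model.predict(X)

# Steps 6-7: a fixed value stands in for input(); predict y for it
new_x = 80.0
predicted = model.predict(np.array([[new_x]]))[0]

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("R^2:", round(r2, 2), "prediction at 80:", round(predicted, 2))

# Step 8: scatter plot of the data with the fitted line
fig, ax = plt.subplots()
ax.scatter(X[:, 0], y, label="observed")
ax.plot(X[:, 0], fitted, color="red", label="fitted line")
ax.set_xlabel("x (independent variable)")
ax.set_ylabel("y (dependent variable)")
ax.legend()
fig.savefig("regression_scatter.png")  # hypothetical output filename
```

The intercept printed here differs slightly from the hand-computed 26.768 because the library does not round the slope before computing it.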

1.11 Important Functions Used for Linear Regression

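1. coef_ - holds the slope (regression coefficient) estimated by the fitted model.
2. intercept_ - holds the intercept estimated by the fitted model.
3. fit - fits the model to the data, estimating the relationship between the two variables.
4. score - returns the accuracy score of the model; for linear regression this is the coefficient of determination (R²).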

1.12 Scatter Plot generated after Implementation of Code

1.13 Conclusion

Thus we learned how to find the trend in data using Linear Regression, with X as the independent variable
and Y as the dependent variable.

1.14 Assignment Questions

1. What is Linear Regression?
2. What is Sigma?
3. What is Mu?
4. What is a Probability Distribution in terms of Binomial and Frequency Distributions?
5. What is Standard Deviation?
6. What is the Coefficient of Determination?

References:-

1. https://stattrek.com/regression/linear-regression.aspx?Tutorial=AP
2. Mittu Skillologies YouTube Channel
