
Name: Akkshada Anil Jagtap
Roll No.: 4301002

Assignment No. 2

Aim: Generate a proper 2-D data set of N points. Split the data set
into a Training Data set and a Test Data set.
i) Perform linear regression analysis with the Least Squares Method.
ii) Plot the graphs for Training MSE and Test MSE and comment on Curve Fitting and
Generalization Error.
iii) Verify the Effect of Data Set Size and the Bias-Variance Tradeoff.
iv) Apply Cross Validation and plot the graphs for errors.
v) Apply the Subset Selection Method and plot the graphs for errors.
vi) Describe your findings in each case.

Theory :

Mean Squared Error: General steps to calculate the MSE from a set of
X and Y values (a short code sketch follows the list):
1. Find the regression line.
2. Insert your X values into the linear regression equation to find the new Y values (Y').
3. Subtract each new Y value from the original to get the error.
4. Square the errors.
5. Add up the squared errors.
6. Divide by the number of points to find the mean.
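
A minimal sketch of these steps in Python (the helper name is illustrative, and it assumes the line's intercept b0 and slope b1 are already known and that X and Y are NumPy arrays):

import numpy as np

def mean_squared_error(X, Y, b0, b1):
    # Step 2: predicted Y values (Y') from the regression line
    Y_pred = b0 + b1 * X
    # Steps 3-4: errors, then squared errors
    squared_errors = (Y - Y_pred) ** 2
    # Steps 5-6: add up the squared errors and take the mean
    return np.sum(squared_errors) / len(X)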

Simple Linear Regression :


When we have a single input attribute (x) and we want to use linear regression, this is
called simple linear regression. If we had multiple input attributes (e.g. x1, x2, x3), this
would be called multiple linear regression. The procedure for simple linear regression is
simpler than that for multiple linear regression, so it is a good place to start.
In this section we are going to create a simple linear regression model from our training
data, then make predictions for our training data to get an idea of how well the model
learned the relationship in the data. With simple linear regression we want to model our
data as follows:
y = B0 + B1 * x
This is a line where y is the output variable we want to predict, x is the input variable we
know, and B0 and B1 are coefficients that we need to estimate that move the line around.
Technically, B0 is called the intercept because it determines where the line intercepts the y-
axis. In machine learning we can call this the bias, because it is added to offset all
predictions that we make. The B1 term is called the slope because it defines the slope of the
line or how x translates into a y value before we add our bias. The goal is to find the best
estimates for the coefficients to minimize the errors in predicting y from x. Simple
regression is great, because rather than having to search for values by trial and error or
calculate them analytically using more advanced linear algebra, we can estimate them
directly from our data.
We can start off by estimating the value for B1 as:
B1 = Σ (Xi − mean(X)) * (Yi − mean(Y)) / Σ (Xi − mean(X))^2
Where mean() is the average value for the variable in our dataset. The Xi and Yi refer to the
fact that we need to repeat these calculations across all values in our dataset, and i refers to
the i'th value of X or Y. We can calculate B0 using B1 and some statistics from our dataset,
as follows:
B0 = mean(Y) − B1 * mean(X)
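
As a minimal sketch of these two formulas (assuming x and y are NumPy arrays of equal length; the function name is illustrative), the estimates can also be computed in vectorized form. The loop-based implementation further below does the same thing element by element:

import numpy as np

def estimate_coefficients(x, y):
    # B1 = Σ(xi - mean(x)) * (yi - mean(y)) / Σ(xi - mean(x))^2
    x_mean, y_mean = np.mean(x), np.mean(y)
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # B0 = mean(y) - B1 * mean(x)
    b0 = y_mean - b1 * x_mean
    return b0, b1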

Implementation:
# Making imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data=pd.read_csv("headbrain.csv")

data.head()

Gender Age Range Head Size(cm^3) Brain Weight(grams)


0 1 1 4512 1530
1 1 1 3738 1297
2 1 1 4261 1335
3 1 1 3777 1282
4 1 1 4177 1590
Let us divide the data set into training and testing data sets:
X_train =data[:200]
X_test =data[200:]

X = X_train['Head Size(cm^3)'].values
Y = X_train['Brain Weight(grams)'].values
mean_x = np.mean(X)
mean_y = np.mean(Y)

print("Printing mean x")


print(mean_x)
print("Printing mean y")
print(mean_y)
m = len(X)
print("Number of samples in training set")
print(m)
numer = 0
denom = 0
for i in range(m):
    numer += (X[i] - mean_x) * (Y[i] - mean_y)
    denom += (X[i] - mean_x) ** 2
b1 = numer / denom
b0 = mean_y - (b1 * mean_x)
print("Coefficient and bias is as follows")
print(b1, b0)

Printing mean x
3679.225
Printing mean y
1299.01
Number of samples in training set
200
Coefficient and bias are as follows
0.24984027563731726 379.79141186829145
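
In other words, the fitted line is approximately Brain Weight ≈ 379.79 + 0.2498 * Head Size, i.e. each additional cm^3 of head size adds about 0.25 g to the predicted brain weight.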

Plotting Values and Regression Line


max_x = np.max(X)
min_x = np.min(X)

Calculating line values x and y


x = np.linspace(min_x, max_x)
y = b0 + b1 * x

Plotting Line & Scatter Points


# Plotting Line
plt.plot(x, y, color='yellow', label='Regression Line')
# Plotting Scatter Points
plt.scatter(X, Y, c='green', label='Scatter Plot: head size vs brain weight')
plt.xlabel('Head Size in cm^3')
plt.ylabel('Brain Weight in grams')
plt.legend()
plt.show()

Calculating Root Mean Squared Error


sse = 0
for i in range(m):
    y_pred = b0 + b1 * X[i]  # predicted brain weight = b0 + b1 * head size
    sse += (Y[i] - y_pred) ** 2  # Y[i] is the actual brain weight
print("Sum of squared errors of brain weight (training data):", sse)
mse = sse / m
rmse = np.sqrt(mse)
print("Root Mean Squared Error is", rmse)

Sum of squared errors of brain weight (training data): 1069588.9925686093


Root Mean Squared Error is 73.12964489755879
The coefficient of determination (denoted by R2) is a key output of regression analysis. It is
the square of the correlation (r) between predicted y scores and actual y scores; thus, it
ranges from 0 to 1.
ss_t = 0
ss_r = 0
for i in range(m):
    y_pred = b0 + b1 * X[i]
    ss_t += (Y[i] - mean_y) ** 2  # total sum of squares
    ss_r += (Y[i] - y_pred) ** 2  # residual sum of squares
scorer2 = 1 - (ss_r / ss_t)
print("R^2 score for training data is", scorer2)

R^2 score for training data is 0.5949551398852462
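
The aim also asks for the Test MSE (point ii). A minimal sketch, reusing the fitted b0 and b1 on the held-out X_test split defined earlier (assuming the same column names as the training split):

Xt = X_test['Head Size(cm^3)'].values
Yt = X_test['Brain Weight(grams)'].values
# Predict on unseen data with the coefficients learned from the training set
Yt_pred = b0 + b1 * Xt
test_mse = np.mean((Yt - Yt_pred) ** 2)
print("Test MSE is", test_mse)
print("Test RMSE is", np.sqrt(test_mse))

Comparing the test MSE with the training MSE indicates how well the model generalizes: a test error close to the training error suggests the fitted line is not overfitting.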

Conclusion: Hence, we successfully studied how to generate a proper 2-D data set of N
points, split it into a Training Data set and a Test Data set, and perform linear regression
analysis with the Least Squares Method.
