Cl-Vii Ass2 4301063
: 4301002
Assignment No. 2
Aim: Generate a proper 2-D data set of N points. Split the data set
into Training Data set and Test Data set.
i) Perform linear regression analysis with Least Squares Method.
ii) Plot the graphs for Training MSE and Test MSE and comment on Curve Fitting and
Generalization Error.
iii) Verify the Effect of Data Set Size and Bias-Variance Tradeoff.
iv) Apply Cross Validation and plot the graphs for errors.
v) Apply Subset Selection Method and plot the graphs for errors.
vi) Describe your findings in each case
Theory:
Mean Squared Error: General steps to calculate the MSE from a set of
X and Y values:
1. Find the regression line.
2. Insert your X values into the linear regression equation to find the new Y values (Y').
3. Subtract the new Y value from the original to get the error.
4. Square the errors.
5. Add up the squared errors.
6. Find the mean by dividing the sum by the number of points.
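The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the assignment: the function name mean_squared_error, the toy values, and the line y' = 2x + 1 are assumptions made for the example, and step 1 (finding the regression line) is taken as already done.

```python
import numpy as np

def mean_squared_error(x, y, slope, intercept):
    """Return the MSE of the line y' = slope * x + intercept on (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_pred = slope * x + intercept   # step 2: predicted values Y'
    errors = y - y_pred              # step 3: original minus predicted
    squared = errors ** 2            # step 4: square the errors
    return squared.sum() / len(x)    # steps 5 and 6: sum, then mean

# Hypothetical toy data; the line y' = 2x + 1 misses only the last point by 1.
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [3.0, 5.0, 7.0, 10.0]
mse_demo = mean_squared_error(x_demo, y_demo, slope=2.0, intercept=1.0)
print(mse_demo)
```

Calling the same function once with the training arrays and once with the test arrays would give the Training MSE and Test MSE asked for in the aim.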
Implementation:
# Making imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
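# Load the head size / brain weight data set
data = pd.read_csv("headbrain.csv")
data.head()
# Training split: the original listing uses X_train without defining it.
# Taking the first 200 rows is an assumption, chosen to match the
# training-set size of 200 reported in the output.
X_train = data.iloc[:200]
# Predictor (head size) and response (brain weight) as NumPy arrays
X = X_train['Head Size(cm^3)'].values
Y = X_train['Brain Weight(grams)'].values
# Sample means used by the least squares formulas
mean_x = np.mean(X)
mean_y = np.mean(Y)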
Printing mean x
3679.225
Printing mean y
1299.01
Number of samples in training set
200
Coefficient and bias are as follows
0.24984027563731726 379.79141186829145
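The listing does not show how the coefficient and bias were obtained. One common way, sketched here under the assumption that the ordinary least squares closed form for a single predictor was used, is slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2) and bias = mean_y - slope * mean_x. The function name fit_least_squares and the toy data are hypothetical.

```python
import numpy as np

def fit_least_squares(x, y):
    """Fit y = slope * x + intercept by ordinary least squares (one predictor)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    # Slope: co-deviation of x and y divided by the squared deviation of x
    slope = np.sum((x - mean_x) * (y - mean_y)) / np.sum((x - mean_x) ** 2)
    # Intercept (bias): forces the line through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical toy data lying exactly on y = 2x + 1
x_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_toy = 2.0 * x_toy + 1.0
slope_toy, intercept_toy = fit_least_squares(x_toy, y_toy)
print(slope_toy, intercept_toy)
```

The reported values agree with the intercept formula: 1299.01 - 0.2498403 * 3679.225 is approximately 379.7914, which matches the printed bias.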