Lab Experiment No. 3 (Part A and Part B)
Name: Dhruv Jain
SAP ID: 60004190030
Div/Batch: A/A2
AIM: Implement linear regression for
1. Single-variable (univariate) data
2. Multi-variable (multivariate) data
Description:
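Part A: Univariate linear regression that predicts Humidity from Temperature (C) in weatherHistory.csv.
Part B: Multivariate linear regression that predicts Humidity from the remaining columns of weatherHistory.csv.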
Theory:
Univariate Data:
When we conduct a study that looks at only one variable, we say that we are working with univariate
data. Suppose, for example, that we conducted a survey to estimate the average weight of high school
students. Since we are only working with one variable (weight), we would be working with univariate
data.
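As a small illustration of the survey example, the sketch below summarizes a single variable. The weights are made-up values chosen for illustration, not survey data.

```python
import statistics

# Hypothetical weights (kg) of ten high school students; illustrative values only.
weights_kg = [52.0, 61.5, 58.0, 49.5, 70.0, 66.5, 55.0, 63.0, 59.5, 64.0]

# Univariate analysis: every statistic is computed from this one variable.
mean_weight = statistics.mean(weights_kg)
median_weight = statistics.median(weights_kg)
stdev_weight = statistics.stdev(weights_kg)  # sample standard deviation

print(f"mean = {mean_weight:.2f} kg")
print(f"median = {median_weight:.2f} kg")
print(f"sample std dev = {stdev_weight:.2f} kg")
```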
Multivariate Data:
Multivariate data involve several variables at once. In the regression setting used here, multiple
independent variables together determine one outcome. Most real-world problems are multivariate.
For example, we cannot predict the weather from the season alone; factors such as pollution,
humidity, and precipitation also matter.
Linear Regression:
Linear regression analysis is used to predict the value of a variable based on the value of another variable.
The variable you want to predict is called the dependent variable. The variable you are using to predict
the other variable's value is called the independent variable.
This form of analysis estimates the coefficients of the linear equation, involving one or more independent
variables, that best predict the value of the dependent variable. Linear regression fits a straight line or
surface that minimizes the discrepancies between predicted and actual output values. For a set of paired
data, the “least squares” method finds the best-fit line by minimizing the sum of squared discrepancies.
You then estimate the value of Y (dependent variable) from X (independent variable).
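For a single independent variable, the least-squares line has a closed form: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept is mean(y) - slope * mean(x). The following is a minimal sketch in plain Python; the helper name `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative paired data that lie exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Because the sample points lie exactly on a line, the fit recovers that line's slope and intercept; with noisy data the same formulas give the line with the smallest sum of squared errors.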
Part A:
Code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the weather dataset and select the two columns of interest
weather = pd.read_csv('weatherHistory.csv')
temp = weather['Temperature (C)']
humidity = weather['Humidity']

# Scatter plot of humidity against temperature
plt.xlabel('Temperature (C)')
plt.ylabel('Humidity')
plt.scatter(temp, humidity, s=5)
plt.show()

# Split into training and test sets (default: 75% train, 25% test)
X_train, X_test, y_train, y_test = train_test_split(temp, humidity)

# scikit-learn expects a 2-D feature array, hence reshape(-1, 1)
LR = LinearRegression()
LR.fit(X_train.values.reshape(-1, 1), y_train.values.reshape(-1, 1))
y_pred = LR.predict(X_test.values.reshape(-1, 1))
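The Part A code fits the model and predicts, but does not report the fitted line or how well it fits. The sketch below shows one way to inspect the slope, intercept, and R^2 of a fitted `LinearRegression`. It runs on synthetic data standing in for the temperature and humidity columns; the generating line (0.9 - 0.01 * temp) and the noise level are assumptions made only so the example is self-contained, not values from weatherHistory.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the temperature/humidity columns (assumption, not the real dataset).
rng = np.random.default_rng(0)
temp = rng.uniform(-10, 35, size=500)
humidity = 0.9 - 0.01 * temp + rng.normal(0, 0.05, size=500)

X = temp.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
X_train, X_test, y_train, y_test = train_test_split(X, humidity, random_state=0)

model = LinearRegression().fit(X_train, y_train)
slope = model.coef_[0]            # change in humidity per degree of temperature
intercept = model.intercept_      # predicted humidity at 0 degrees
r2 = model.score(X_test, y_test)  # coefficient of determination R^2 on the test set

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, R^2 = {r2:.3f}")
```

Passing `random_state` to `train_test_split` makes the split, and therefore the reported numbers, reproducible between runs.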
Part B:
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder

weather = pd.read_csv('weatherHistory.csv')

# Encode the text column 'Daily Summary' as integer labels
LE = LabelEncoder()
weather['Daily Summary'] = LE.fit_transform(weather['Daily Summary'])

# Drop the remaining non-numeric columns
weather.drop(['Formatted Date', 'Summary', 'Precip Type'], inplace=True, axis=1)

# Humidity is the target; all other columns are features
Y = weather.pop('Humidity')
X = weather
X_train, X_test, y_train, y_test = train_test_split(X, Y)

LR = LinearRegression()
LR.fit(X_train, y_train)
y_pred = LR.predict(X_test)

# score() returns the coefficient of determination (R^2), not classification accuracy
print("R^2 score of multivariate Linear Regression (%): ", LR.score(X_test, y_test)*100)
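The value returned by `LR.score` for a regressor is the coefficient of determination R^2, so it is often reported together with an error measure in the units of the target. The sketch below computes R^2 and RMSE with `sklearn.metrics` for a multivariate fit. The three-feature synthetic data and the `true_coef` values are assumptions used only to keep the example self-contained; they do not come from weatherHistory.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with three features (assumption; stands in for the weather columns).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 3.0 + rng.normal(0, 0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)                        # same value as model.score(X_test, y_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))   # typical error, in the units of y

print("coefficients:", model.coef_)  # one coefficient per feature
print(f"R^2 = {r2:.4f}, RMSE = {rmse:.4f}")
```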
Output:
Conclusion:
Linear regression is an effective way to fit univariate and multivariate data and to predict
continuous values. When the dependent variable is continuous and is linearly correlated (positively
or negatively) with the independent variables, we can predict it using regression. Regression
analyses the input variables and predicts values, whether the mapping is one to one (univariate)
or many to one (multivariate). Since multivariate regression involves more than one factor that
determines the output, it is difficult to visualize such predictions.