Linear Regression
Introduction
Linear Regression is a commonly used supervised Machine Learning algorithm that predicts continuous values. It assumes that a linear relationship exists between the dependent and independent variables; in simple words, it finds the best-fitting line (or plane) that describes two or more variables. Logistic Regression, on the other hand, is another supervised Machine Learning algorithm that is used fundamentally for binary classification (separating discrete classes).
Although Linear Regression and Logistic Regression are used for completely different tasks, mathematically we can observe that with one additional step, Linear Regression can be converted into Logistic Regression.
Prerequisite:
This tutorial requires Python 3 and pip3 installed on your local computer. In case you are unfamiliar with the basics of Python, have a look at our free Introduction to Python tutorial. For other data science and machine learning courses, you can visit https://courses.analyticsvidhya.com/.
Learning Objectives
This beginner's tutorial gives you a brief overview of linear regression and logistic regression, along with their similarities and differences.
You will learn, step by step, how linear regression and logistic regression are calculated.
Both models are very important for data scientists, as well as for those preparing for roles in data science and artificial intelligence. At the end, you will learn about the similarities and differences between linear regression and logistic regression.
This article was published as a part of the Data Science Blogathon.
1. Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning.
2. In some situations, regression analysis can be used to infer causal relationships between the independent and dependent variables.
Types of Regression:
1. Simple Linear Regression: a model that estimates the relationship between one independent variable and one dependent (target) variable using a straight line.
2. Multiple Linear Regression: a model that analyzes the relationship between two or more independent variables and a single dependent (target) variable.
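As an illustration of simple linear regression, the closed-form least-squares fit can be sketched in plain Python (the function name and the synthetic data points are my own, chosen for illustration):

```python
# Minimal sketch of simple linear regression via the closed-form
# least-squares solution (no external libraries needed).
def fit_simple_linear_regression(xs, ys):
    """Return slope m and intercept c of the best-fitting line Y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # m = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    m = num / den
    c = y_mean - m * x_mean
    return m, c

# Points lying exactly on y = 2x + 1, so the fit should recover m=2, c=1
m, c = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

For data that lies exactly on a line, the closed form recovers the slope and intercept exactly; for noisy data it returns the least-squares best fit.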
Step 1
Let’s assume that we have a dataset where x is the independent variable and Y is a function of x (Y=f(x)). Thus, by using
Linear Regression we can form the following equation (equation for the best-fitted line):
Y = mx + c
Step 2
Now, to derive the best-fitted line, we first assign random values to m and c and calculate the corresponding value of Y for each x in the training data. This calculated Y value is the output value.
Step 3
Now that we have our calculated output value (let's represent it as ŷ), we can verify whether our prediction is accurate or not. In the case of Linear Regression, we calculate this error (residual) using the MSE method (mean squared error), and we call it the loss function:

Loss = (1/n) Σ (Yᵢ − ŷᵢ)²
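The MSE loss can be computed as follows; a minimal sketch with an illustrative function name:

```python
# Minimal sketch of the mean squared error (MSE) loss.
def mse_loss(y_true, y_pred):
    """Average of the squared residuals between actual Y and predicted ŷ."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

# Residuals are 0, 0.5, and -0.5, so the loss is (0 + 0.25 + 0.25) / 3
loss = mse_loss([1.0, 2.0, 3.0], [1.0, 2.5, 2.5])
```

Squaring the residuals makes the loss non-negative and penalizes large errors more heavily, which is why the loss surface is parabolic in the weights.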
Step 4
To achieve the best-fitted line, we have to minimize the value of the loss function. To minimize the loss function, we use a
technique called gradient descent. Let’s discuss how gradient descent works (although I will not dig into detail as this is
not the focus of this article).
Gradient Descent
A cost function is a mathematical formula used to calculate the error: the difference between the predicted value and the actual value. If we look at the formula for the loss function, it is the mean squared error, which means the error is represented in second-order terms. If we plot the loss function against the weights (in our equation, the weights are m and c), it forms a parabolic curve. Since our motto is to minimize the loss function, we have to reach the bottom of the curve.
To achieve this, we take the first-order derivative of the loss function with respect to the weights (m and c). Then we subtract the derivative, multiplied by a learning rate (α), from the current weight. We keep repeating this step until we reach the minimum value (we call it the global minimum). We fix a very small threshold (for example, 0.0001) and stop when the updates fall below it. If we don't set a threshold value, it may take forever to reach exactly zero.
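The update loop described above can be sketched as follows (the learning rate, threshold, and iteration cap are illustrative choices, not prescribed values):

```python
# Sketch of gradient descent for the line Y = m*x + c under an MSE loss.
def gradient_descent(xs, ys, lr=0.05, threshold=1e-6, max_iters=100000):
    m, c = 0.0, 0.0                  # arbitrary initial weights
    n = len(xs)
    for _ in range(max_iters):
        preds = [m * x + c for x in xs]
        # first-order derivatives of the MSE loss w.r.t. m and c
        dm = (2 / n) * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        dc = (2 / n) * sum(p - y for p, y in zip(preds, ys))
        m -= lr * dm
        c -= lr * dc
        # stop once the updates become smaller than the threshold
        if max(abs(lr * dm), abs(lr * dc)) < threshold:
            break
    return m, c

# Data generated from y = 2x + 1; the loop should converge near m=2, c=1
m, c = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```

Note that the learning rate must be small enough for the updates to shrink each step; too large a value makes the weights oscillate or diverge instead of descending the parabola.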
Step 5
Once the loss function is minimized, we get the final equation for the best-fitted line and we can predict the value of Y for any given X.
This is where Linear Regression ends, and we are just one step away from reaching Logistic Regression.
Step 1
To calculate the binary separation, first, we determine the best-fitted line by following the Linear Regression steps.
Step 2
The regression line we get from Linear Regression is highly susceptible to outliers, so it will not do a good job of separating two classes.
Thus, the predicted value is converted into a probability by feeding it to the sigmoid function.
The logistic regression hypothesis generalizes the linear regression hypothesis by passing its output through the logistic function, also known as the sigmoid function (an activation function).
As we can see in Fig 3, we can feed any real number to the sigmoid function and it will return a value between 0 and 1.
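A minimal sketch of the sigmoid function, showing that any real input is mapped into the interval (0, 1):

```python
import math

# The sigmoid (logistic) function: squashes any real number into (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

mid = sigmoid(0)      # exactly 0.5, the natural decision boundary
high = sigmoid(10)    # large positive inputs approach 1
low = sigmoid(-10)    # large negative inputs approach 0
```

The function is symmetric around zero, so sigmoid(z) + sigmoid(-z) = 1 for any z.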
Now that we have a basic idea of how Linear Regression and Logistic Regression are related, let us revisit the process with an example.
Suppose we have a dataset of Height and Weight values and want to predict Weight from Height. We can figure out that this is a regression problem, where we will build a Linear Regression model. We will train the model with the provided Height and Weight values. Once the model is trained, we can predict the Weight for any given unknown Height value.
Now suppose we have an additional field Obesity and we have to classify whether a person is obese or not depending on
their provided height and weight. This is clearly a classification problem where we have to segregate the dataset into
two classes (Obese and Not-Obese).
So, for the new problem, we can again follow the Linear Regression steps and build a regression line. This time, the line will be based on two parameters, Height and Weight, and the regression line will fit between two discrete sets of values.
As this regression line is highly susceptible to outliers, it will not do a good job in classifying two classes.
To get a better classification, we will feed the output values from the regression line to the sigmoid function. The
sigmoid function returns the probability for each output value from the regression line. Now based on a predefined
threshold value, we can easily classify the output into two classes Obese or Not-Obese.
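The pipeline above — fit a regression line to the 0/1 labels, squash its output with the sigmoid, and apply a threshold — can be sketched end to end. For brevity this uses a single feature (Weight), and the data and helper names are made up for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_line(xs, ys):
    """Least-squares line through the (feature, label) points."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    m = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
        / sum((x - x_mean) ** 2 for x in xs)
    return m, y_mean - m * x_mean

def classify(xs, labels, x_new, threshold=0.5):
    m, c = fit_line(xs, labels)           # regression line over 0/1 labels
    prob = sigmoid(m * x_new + c)         # convert line output to probability
    return 1 if prob >= threshold else 0  # Obese (1) vs Not-Obese (0)

# Hypothetical weights (kg) with labels: 1 = Obese, 0 = Not-Obese
weights = [50, 55, 60, 90, 95, 100]
labels  = [0,  0,  0,  1,  1,  1]
heavy = classify(weights, labels, 100)   # should fall in class 1
light = classify(weights, labels, 50)    # should fall in class 0
```

This is only a didactic sketch of the linear-to-logistic idea; in practice logistic regression fits its weights by maximizing the likelihood rather than by least squares on the labels.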
Confusion Matrix
A confusion matrix is a table that describes the performance of a classification algorithm; it visualizes and summarizes how predicted classes compare with actual classes. The most frequently used performance metrics derived from these values are accuracy (ACC), precision (P), sensitivity (Sn), specificity (Sp), and the F-score.
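The confusion-matrix counts and the metrics listed above can be computed from scratch for a binary classifier; a small sketch with made-up predictions:

```python
# Count the four cells of a binary confusion matrix:
# true positives, true negatives, false positives, false negatives.
def confusion_matrix(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
tp, tn, fp, fn = confusion_matrix(y_true, y_pred)

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
sensitivity = tp / (tp + fn)   # also called recall
specificity = tn / (tn + fp)
f_score     = 2 * precision * sensitivity / (precision + sensitivity)
```

Here tp=2, tn=1, fp=1, fn=1, giving an accuracy of 0.6 and precision, sensitivity, and F-score of 2/3 each.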
Overfitting
Overfitting occurs when our model tries to fit all the data points in the given dataset, including its noise, rather than the underlying pattern, so it fails to generalize to new data.
Finally, we can summarize the similarities and differences between these two models.
The linear and logistic probability models are given by the following equations:

p = b0 + b1·x1 + b2·x2 + …   (1)
ln(p / (1 − p)) = b0 + b1·x1 + b2·x2 + …   (2)

Where p = probability.
From equations (1) and (2), the probability p is a linear function of the regressors in the linear model, whereas in the logistic model it is the log odds, ln(p / (1 − p)), that is a linear function of the regressors.
Linear Regression and Logistic Regression are both parametric regression models, i.e., both use linear equations for predictions.
However, functionality-wise these two are completely different. Following are the differences.
Conclusion
Linear Regression and Logistic Regression are both supervised Machine Learning algorithms. Both are parametric regression models, i.e., both use linear equations for predictions. Logistic regression is considered a generalized linear model because the outcome depends on a linear combination of the inputs and parameters.
Key Takeaways
Q2. What’s the difference between logistic regression and linear regression?
A. Linear Regression is used to solve regression problems, whereas Logistic Regression is used to solve classification problems.