Unit 2 Regression

The document provides an overview of regression analysis, including methods such as least squares, linear regression, and multiple regression, along with their applications in prediction and causal inference. It explains key concepts like slope, intercept, and standard error of regression, and outlines various types of regression such as polynomial and logistic regression. Additionally, it discusses the importance of measuring the goodness-of-fit through metrics like the standard error and correlation coefficient.


Regression:

By
Prof. R. B. Darade
• UNIT-II
• Regression: Curve fitting by the method of least squares, fitting the lines y = a + bx and x = a + by, multiple regression, standard error of regression – Pharmaceutical examples
• Probability: Definition of probability, binomial distribution, normal distribution, Poisson distribution, properties – problems; sample, population, large sample, small sample, null hypothesis, alternative hypothesis, sampling, essence of sampling, types of sampling, Type I error, Type II error, standard error of mean (SEM) – Pharmaceutical examples
• Parametric tests: t-test (sample, pooled or unpaired, and paired), ANOVA (one-way and two-way), least significant difference
WHAT IS REGRESSION ANALYSIS?

Regression analysis is the statistical method used to determine the structure of a relationship between two variables (simple linear regression) or between three or more variables (multiple regression).
Slope describes the steepness of a line. It is defined as the rise of the line over its run, that is, the change in the vertical direction (y) over the change in the horizontal direction (x).
The point where the line or curve crosses the axis of the graph is called the intercept.
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (the outcome or response variable, or a label in machine-learning parlance) and one or more error-free independent variables (the inputs), often called regressors, predictors, covariates, explanatory variables, or features.

The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion.
Regression analysis is primarily used for two conceptually distinct
purposes.
 First, regression analysis is widely used for prediction and
forecasting, where its use has substantial overlap with the field of
machine learning.
 Second, in some situations regression analysis can be used to infer
causal relationships between the independent and dependent
variables.
Regression model
Prediction (interpolation and extrapolation)
Regression models predict a value of the Y variable given known values of the X
variables.
Prediction within the range of values in the dataset used for model-fitting is known
informally as interpolation. Prediction outside this range of the data is known as
extrapolation.
Performing extrapolation relies strongly on the regression assumptions. The
further the extrapolation goes outside the data, the more room there is for the
model to fail due to differences between the assumptions and the sample data or
the true values.
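For instance, a minimal sketch in Python (the data values are invented for illustration) that fits a line and then predicts both inside and outside the observed x-range:

```python
# Fit y = m*x + c by least squares, then interpolate and extrapolate.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # observed x values (range 1..5)
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])   # observed y values

m, c = np.polyfit(x, y, deg=1)             # least-squares line of best fit

print(m * 3.5 + c)    # interpolation: 3.5 lies inside [1, 5]
print(m * 10.0 + c)   # extrapolation: 10 lies outside [1, 5]; less reliable
```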
[Figure: the fitted straight line balances the points above and below it; the dotted straight lines show the two extreme lines when only the slope varies; the inner curves show the estimated range when both slope and intercept vary; the outer curves show the prediction interval for a new measurement.]
In regression analysis, least squares is a parameter estimation method based on minimizing the
sum of the squares of the residuals (a residual being the difference between an observed value
and the fitted value provided by a model) made in the results of each individual equation. (More
simply, least squares is a mathematical procedure for finding the best-fitting curve to a given set
of points by minimizing the sum of the squares of the offsets ("the residuals") of the points
from the curve.)
The most important application is in data fitting. When the problem has substantial
uncertainties in the independent variable (the x variable), then simple regression and least-
squares methods have problems; in such cases, the methodology required for fitting errors-in-
variables models may be considered instead of that for least squares.
Least squares problems fall into two categories: linear or ordinary least squares and nonlinear
least squares, depending on whether or not the model functions are linear in all unknowns.
[Figure: the result of fitting a set of data points with a quadratic function.]
[Figure: conic fitting of a set of points using least-squares approximation.]
The least squares method is a fundamental mathematical technique widely used in data analysis, statistics, and regression modeling to identify the best-fitting curve or line for a given set of data points. The method reduces the overall squared error, providing an accurate model for predicting future data trends.

In statistics, when the data can be represented on a Cartesian plane by using the independent and dependent variables as the x and y coordinates, it is called scatter data. Such data might not be directly useful for making interpretations or for predicting the value of the dependent variable from the independent variable. So, we try to get the equation of a line that best fits the given data points with the help of the least squares method.
What is the Least Squares Method?
The least squares method is used to derive a generalized linear equation between two variables when the values of the independent and dependent variables are represented as the x and y coordinates in a 2D Cartesian coordinate system. Initially, the known values are marked on a plot; the plot obtained at this point is called a scatter plot.

Then, we try to represent all the marked points as a straight line, that is, a linear equation. The equation of such a line is obtained with the help of the least squares method. This is done to obtain the value of the dependent variable at a value of the independent variable for which it has not been observed.
This method aims at minimizing the sum of squares of deviations as much as possible. The line obtained from such a method is called a regression line or line of best fit.

Ref: https://www.geeksforgeeks.org/least-square-method/
https://www.cuemath.com/data/least-squares/
The sum of squares measures how widely a set of data points is spread out from the mean; it is also known as variation. A higher sum of squares indicates higher variability, while a lower value indicates low variability about the mean.

To calculate the sum of squares, subtract the mean from each data point, square the differences, and add them together. In a regression setting, square the vertical distance between each data point and the line of best fit, then add the squares together; the line of best fit is the line that minimizes this value.
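As a minimal sketch with made-up numbers (not from the slides), the calculation looks like this in Python:

```python
# Sum of squares about the mean: SS = sum of (value - mean)^2.
# The data values below are invented purely for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(data) / len(data)                 # mean = 5.0
ss = sum((v - mean) ** 2 for v in data)      # squared deviations, added up
print(mean, ss)                              # prints 5.0 32.0
```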
Formula for the Least Squares Method
The least squares formulas are used to find the best-fitting line through a set of data points. For simple linear regression, the line has the form y = mx + c, where y is the dependent variable, x is the independent variable, m is the slope of the line, and c is the y-intercept. The slope (m) and intercept (c) are calculated from the following equations:

1. Slope (m): m = [n(∑xy) − (∑x)(∑y)] / [n(∑x²) − (∑x)²]

2. Intercept (c): c = [(∑y) − m(∑x)] / n

Where:
• n is the number of data points,
• ∑xy is the sum of the products of each pair of x and y values,
• ∑x is the sum of all x values,
• ∑y is the sum of all y values,
• ∑x² is the sum of the squares of the x values.
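A direct, plain-Python implementation of these two formulas (a minimal sketch; the x and y values are invented for illustration):

```python
# Least squares slope and intercept from the summation formulas:
#   m = [n*Sxy - Sx*Sy] / [n*Sxx - Sx^2],   c = [Sy - m*Sx] / n
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
Sx, Sy = sum(xs), sum(ys)
Sxy = sum(x * y for x, y in zip(xs, ys))
Sxx = sum(x * x for x in xs)

m = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)
c = (Sy - m * Sx) / n
print(m, c)   # slope 0.6, intercept 2.2 for this data
```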


A correlation’s strength can be quantified by calculating
the correlation coefficient, sometimes represented by r.
The correlation coefficient falls between negative one
and positive one.

r = -1 indicates a perfect negative correlation.

r = 1 indicates a perfect positive correlation.

r = 0 indicates no correlation.
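As a hedged sketch, r can be computed from the same kinds of sums used in the slope formula above (the data values are illustrative):

```python
# Pearson correlation coefficient:
#   r = [n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2] * [n*Syy - Sy^2])
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
Sx, Sy = sum(xs), sum(ys)
Sxy = sum(x * y for x, y in zip(xs, ys))
Sxx = sum(x * x for x in xs)
Syy = sum(y * y for y in ys)

r = (n * Sxy - Sx * Sy) / math.sqrt((n * Sxx - Sx**2) * (n * Syy - Sy**2))
print(r)   # about 0.77 here: a fairly strong positive correlation
```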
Types of Regression Analysis

Simple Linear Regression: studies the relationship between two variables (predictor and outcome).
Multiple Linear Regression: captures the impact of all variables.
Polynomial Regression: finds and represents complex patterns and non-linear relationships.
Logistic Regression: estimates a probability based on predictor variables.
Ridge Regression: used in cases with high correlation between variables; can also be used as a regularization method for accuracy.
Lasso Regression: used to minimize the effect of correlated variables on predictions.

Simple Linear Regression
Useful for exploring the relationship between two continuous variables in straightforward cause-and-effect investigations, simple linear regression is the most basic form of regression analysis. It involves studying the relationship between two variables: an independent variable (the predictor) and a dependent variable (the outcome).
Multiple Linear Regression (MLR)
MLR extends the concept of simple linear regression by capturing the combined impact of all factors, allowing for a more comprehensive analysis of how several factors collectively influence the outcome.
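One common way to fit such a model is with a least-squares solver; the sketch below uses NumPy and invented data for two predictors:

```python
# Multiple linear regression: y ≈ b0 + b1*x1 + b2*x2 (illustrative data).
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([5.1, 6.9, 12.2, 12.8, 17.1])

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

b0, b1, b2 = coeffs
print(b0, b1, b2)    # fitted intercept and coefficients
print(X @ coeffs)    # fitted values for the training points
```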
Polynomial Regression
For non-linear relationships, polynomial regression accommodates curves and enables accurate representation of complex patterns. This method involves fitting a polynomial equation to the data, allowing for more flexible modeling of complex relationships.

For example, a second-order polynomial regression, also known as quadratic regression, can be used to capture a U-shaped or inverted U-shaped pattern in the data.
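A minimal sketch of such a quadratic fit with NumPy (the U-shaped data below is invented for illustration):

```python
# Second-order polynomial (quadratic) regression: fit y = a*x^2 + b*x + c.
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([ 4.2,  1.1, 0.1, 0.9, 4.1, 9.2])   # roughly y = x**2

a, b, c = np.polyfit(x, y, deg=2)
print(a, b, c)                       # a close to 1, b and c near 0
print(np.polyval([a, b, c], 1.5))    # predict y at a new x value
```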
Logistic Regression
Logistic regression estimates the probability of an event occurring based on one or more predictor variables. In contrast to linear regression, logistic regression is designed to predict categorical outcomes, which are typically binary in nature, for example yes/no or 0/1.
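As a hedged sketch (assuming scikit-learn is available; the dose/response numbers are invented), a binary outcome can be modeled like this:

```python
# Logistic regression: estimate P(response = 1) from a single predictor.
import numpy as np
from sklearn.linear_model import LogisticRegression

dose     = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
response = np.array([0, 0, 0, 1, 1, 1])   # binary outcome: effect yes/no

model = LogisticRegression()
model.fit(dose, response)

print(model.predict_proba([[3.5]]))   # estimated [P(class 0), P(class 1)]
print(model.predict([[3.5]]))         # predicted class label (0 or 1)
```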
What is the Standard Error of the Regression (SER)?

The standard error of the regression expresses the degree of uncertainty in the accuracy of the dependent variable's projected values. It conveniently tells you how far off the regression model is on average, in the units of the response variable. It is also called the SE of the estimate. Graphically, the relationship is stronger when the actual (x, y) data points lie closer to the regression line (the errors are smaller).
Why is SER important?

The SER is an absolute measure of how far, on average, the data points fall from the regression line, expressed in the units of the dependent variable.
The difference between the actual value of the dependent variable y (in the sample data) and the predicted value ŷ obtained from the multiple regression model is called the error or residual:
Error = Actual Value − Predicted Value, i.e. e = y − ŷ
For the simple linear regression model, the standard error of the estimate
measures the average vertical distance (the error) between the points on the
scatter diagram and the regression line.

Best Ref: https://ecampusontario.pressbooks.pub/introstats/chapter/13-3-standard-error-of-the-estimate/


Standard Error of Regression
The standard error is a statistical measure of the average distance between the observed values and the regression line. It defines how much the actual data are spread around the line. In other words, it provides a measure of how much the actual dependent value deviates from the predicted value. Since it is an error, the lower its value, the better our prediction.

The standard error of the regression (S) and R-squared are two key goodness-of-fit measures for regression analysis. While R-squared is the best known of the goodness-of-fit statistics, I think it is a bit over-hyped. The standard error of the regression is also known as the residual standard error.
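A minimal sketch of the calculation for a simple linear fit, reusing the illustrative slope and intercept computed earlier: S = sqrt(SSE / (n − 2)), where SSE is the sum of squared residuals and n − 2 the degrees of freedom.

```python
# Standard error of the regression (residual standard error).
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
m, c = 0.6, 2.2   # slope and intercept from the earlier least-squares sketch

residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (len(xs) - 2))
print(s)   # about 0.894: average distance from the line, in units of y
```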
References:
https://www.datamation.com/big-data/what-is-regression-analysis/
