Unit 2 Regression

The document provides an overview of regression analysis, including methods such as least squares, linear regression, and multiple regression, along with their applications in prediction and causal inference. It explains key concepts like slope, intercept, and standard error of regression, and outlines various types of regression such as polynomial and logistic regression. Additionally, it discusses the importance of measuring the goodness-of-fit through metrics like the standard error and correlation coefficient.


Regression:

By
Prof. R. B. Darade
• UNIT-II
• Regression: Curve fitting by the method of least squares, fitting the lines y = a + bx and x = a + by, multiple regression, standard error of regression – Pharmaceutical examples
• Probability: Definition of probability, binomial distribution, normal distribution, Poisson distribution, properties – problems; sample, population, large sample, small sample, null hypothesis, alternative hypothesis, sampling, essence of sampling, types of sampling, Type I error, Type II error, standard error of mean (SEM) – Pharmaceutical examples
• Parametric tests: t-test (sample, pooled or unpaired, and paired), ANOVA (one-way and two-way), least significant difference
WHAT IS REGRESSION ANALYSIS?

Regression analysis is the statistical method used to determine the structure of a relationship between two variables (simple linear regression) or between three or more variables (multiple regression).
Slope describes the steepness of a line. It is defined as the rise of the line over its run, that is, the change in the vertical direction (y) over the change in the horizontal direction (x).
The point where the line or curve crosses the axis of the graph is called the intercept.
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (the outcome or response variable, or a label in machine-learning parlance) and one or more error-free independent variables (the inputs), often called regressors, predictors, covariates, explanatory variables, or features.

The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion.
Regression analysis is primarily used for two conceptually distinct
purposes.
 First, regression analysis is widely used for prediction and
forecasting, where its use has substantial overlap with the field of
machine learning.
 Second, in some situations regression analysis can be used to infer
causal relationships between the independent and dependent
variables.
Regression model
Prediction (interpolation and extrapolation)
Regression models predict a value of the Y variable given known values of the X
variables.
Prediction within the range of values in the dataset used for model-fitting is known
informally as interpolation. Prediction outside this range of the data is known as
extrapolation.
Performing extrapolation relies strongly on the regression assumptions. The
further the extrapolation goes outside the data, the more room there is for the
model to fail due to differences between the assumptions and the sample data or
the true values.
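For instance, a minimal sketch in Python (the data values are invented for illustration) that fits a line and then predicts both inside and outside the observed x-range:

```python
# Fit y = m*x + c by least squares, then interpolate and extrapolate.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # observed x values (range 1..5)
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])   # observed y values

m, c = np.polyfit(x, y, deg=1)             # least-squares line of best fit

print(m * 3.5 + c)    # interpolation: 3.5 lies inside [1, 5]
print(m * 10.0 + c)   # extrapolation: 10 lies outside [1, 5]; less reliable
```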
[Figure: the fitted straight line balances the points above and below it; the dotted straight lines show the two extreme lines when only the slope varies; the inner curves show the estimated range when both slope and intercept vary; the outer curves show the prediction interval for a new measurement.]
In regression analysis, least squares is a parameter estimation method based on minimizing the
sum of the squares of the residuals (a residual being the difference between an observed value
and the fitted value provided by a model) made in the results of each individual equation. (More
simply, least squares is a mathematical procedure for finding the best-fitting curve to a given set
of points by minimizing the sum of the squares of the offsets ("the residuals") of the points
from the curve.)
The most important application is in data fitting. When the problem has substantial
uncertainties in the independent variable (the x variable), then simple regression and least-
squares methods have problems; in such cases, the methodology required for fitting errors-in-
variables models may be considered instead of that for least squares.
Least squares problems fall into two categories: linear or ordinary least squares and nonlinear
least squares, depending on whether or not the model functions are linear in all unknowns.
[Figure: the result of fitting a set of data points with a quadratic function.]
[Figure: conic fitting of a set of points using least-squares approximation.]
The least squares method is a fundamental mathematical technique widely used in data analysis, statistics, and regression modeling to identify the best-fitting curve or line for a given set of data points. The method reduces the overall squared error, providing an accurate model for predicting future data trends.

In statistics, when the data can be represented on a Cartesian plane by using the independent and dependent variables as the x and y coordinates, it is called scatter data. Such data might not be directly useful for making interpretations or for predicting the value of the dependent variable from the independent variable. So, we try to get the equation of a line that best fits the given data points with the help of the least squares method.
What is the Least Squares Method?
The least squares method is used to derive a generalized linear equation between two variables when the values of the independent and dependent variables are represented as the x and y coordinates in a 2D Cartesian coordinate system. Initially, the known values are marked on a plot; the plot obtained at this point is called a scatter plot.

Then, we try to represent all the marked points as a straight line, that is, a linear equation. The equation of such a line is obtained with the help of the least squares method. This is done to obtain the value of the dependent variable at a value of the independent variable for which it has not been observed.
This method aims at minimizing the sum of squares of deviations as much as possible. The line obtained from such a method is called a regression line or line of best fit.

Ref: https://www.geeksforgeeks.org/least-square-method/
https://www.cuemath.com/data/least-squares/
The sum of squares measures how widely a set of data points is spread out from the mean; it is also known as variation. A higher sum of squares indicates higher variability, while a lower value indicates low variability about the mean.

To calculate the sum of squares, subtract the mean from each data point, square the differences, and add them together. In a regression setting, square the vertical distance between each data point and the line of best fit, then add the squares together; the line of best fit is the line that minimizes this value.
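As a minimal sketch with made-up numbers (not from the slides), the calculation looks like this in Python:

```python
# Sum of squares about the mean: SS = sum of (value - mean)^2.
# The data values below are invented purely for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(data) / len(data)                 # mean = 5.0
ss = sum((v - mean) ** 2 for v in data)      # squared deviations, added up
print(mean, ss)                              # prints 5.0 32.0
```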
Formula for the Least Squares Method
The least squares formulas are used to find the best-fitting line through a set of data points. For simple linear regression, the line has the form y = mx + c, where y is the dependent variable, x is the independent variable, m is the slope of the line, and c is the y-intercept. The slope (m) and intercept (c) are calculated from the following equations:

1. Slope (m): m = [n(∑xy) − (∑x)(∑y)] / [n(∑x²) − (∑x)²]

2. Intercept (c): c = [(∑y) − m(∑x)] / n

Where:
• n is the number of data points,
• ∑xy is the sum of the products of each pair of x and y values,
• ∑x is the sum of all x values,
• ∑y is the sum of all y values,
• ∑x² is the sum of the squares of the x values.
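A direct, plain-Python implementation of these two formulas (a minimal sketch; the x and y values are invented for illustration):

```python
# Least squares slope and intercept from the summation formulas:
#   m = [n*Sxy - Sx*Sy] / [n*Sxx - Sx^2],   c = [Sy - m*Sx] / n
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
Sx, Sy = sum(xs), sum(ys)
Sxy = sum(x * y for x, y in zip(xs, ys))
Sxx = sum(x * x for x in xs)

m = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)
c = (Sy - m * Sx) / n
print(m, c)   # slope 0.6, intercept 2.2 for this data
```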


A correlation’s strength can be quantified by calculating
the correlation coefficient, sometimes represented by r.
The correlation coefficient falls between negative one
and positive one.

r = -1 indicates a perfect negative correlation.

r = 1 indicates a perfect positive correlation.

r = 0 indicates no correlation.
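As a hedged sketch, r can be computed from the same kinds of sums used in the slope formula above (the data values are illustrative):

```python
# Pearson correlation coefficient:
#   r = [n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2] * [n*Syy - Sy^2])
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
Sx, Sy = sum(xs), sum(ys)
Sxy = sum(x * y for x, y in zip(xs, ys))
Sxx = sum(x * x for x in xs)
Syy = sum(y * y for y in ys)

r = (n * Sxy - Sx * Sy) / math.sqrt((n * Sxx - Sx**2) * (n * Syy - Sy**2))
print(r)   # about 0.77 here: a fairly strong positive correlation
```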
Types of Regression Analysis

Simple Linear Regression: studies the relationship between two variables (predictor and outcome).
Multiple Linear Regression: captures the impact of all variables.
Polynomial Regression: finds and represents complex patterns and non-linear relationships.
Logistic Regression: estimates a probability based on predictor variables.
Ridge Regression: used in cases with high correlation between variables; can also be used as a regularization method for accuracy.
Lasso Regression: used to minimize the effect of correlated variables on predictions.

Simple Linear Regression
Useful for exploring the relationship between two continuous variables in straightforward cause-and-effect investigations, simple linear regression is the most basic form of regression analysis. It involves studying the relationship between two variables: an independent variable (the predictor) and a dependent variable (the outcome).
Multiple Linear Regression (MLR)
MLR extends the concept of simple linear regression by capturing the combined impact of all factors, allowing for a more comprehensive analysis of how several factors collectively influence the outcome.
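One common way to fit such a model is with a least-squares solver; the sketch below uses NumPy and invented data for two predictors:

```python
# Multiple linear regression: y ≈ b0 + b1*x1 + b2*x2 (illustrative data).
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([5.1, 6.9, 12.2, 12.8, 17.1])

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

b0, b1, b2 = coeffs
print(b0, b1, b2)    # fitted intercept and coefficients
print(X @ coeffs)    # fitted values for the training points
```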
Polynomial Regression
For non-linear relationships, polynomial regression accommodates curves and enables accurate representation of complex patterns. This method involves fitting a polynomial equation to the data, allowing for more flexible modeling of complex relationships.

For example, a second-order polynomial regression, also known as quadratic regression, can be used to capture a U-shaped or inverted U-shaped pattern in the data.
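A minimal sketch of such a quadratic fit with NumPy (the U-shaped data below is invented for illustration):

```python
# Second-order polynomial (quadratic) regression: fit y = a*x^2 + b*x + c.
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([ 4.2,  1.1, 0.1, 0.9, 4.1, 9.2])   # roughly y = x**2

a, b, c = np.polyfit(x, y, deg=2)
print(a, b, c)                       # a close to 1, b and c near 0
print(np.polyval([a, b, c], 1.5))    # predict y at a new x value
```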
Logistic Regression
Logistic regression estimates the probability of an event occurring based on one or more predictor variables. In contrast to linear regression, logistic regression is designed to predict categorical outcomes, which are typically binary in nature, for example yes/no or 0/1.
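As a hedged sketch (assuming scikit-learn is available; the dose/response numbers are invented), a binary outcome can be modeled like this:

```python
# Logistic regression: estimate P(response = 1) from a single predictor.
import numpy as np
from sklearn.linear_model import LogisticRegression

dose     = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
response = np.array([0, 0, 0, 1, 1, 1])   # binary outcome: effect yes/no

model = LogisticRegression()
model.fit(dose, response)

print(model.predict_proba([[3.5]]))   # estimated [P(class 0), P(class 1)]
print(model.predict([[3.5]]))         # predicted class label (0 or 1)
```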
What is the Standard Error of the Regression (SER)?

The standard error of the regression expresses the degree of uncertainty in the accuracy of the dependent variable's projected values. It conveniently tells you how far off the regression model is on average, in the units of the response variable. It is also called the SE of the estimate. Graphically, the relationship is stronger when the actual (x, y) data points lie closer to the regression line (the errors are smaller).
Why is SER important?

The SER is an absolute measure of how far, on average, the data points fall from the regression line, expressed in the units of the dependent variable.
The difference between the actual value of the dependent variable y (in the sample data) and the predicted value ŷ obtained from the multiple regression model is called the error or residual:
Error = Actual Value − Predicted Value, i.e. e = y − ŷ
For the simple linear regression model, the standard error of the estimate
measures the average vertical distance (the error) between the points on the
scatter diagram and the regression line.

Best Ref: https://ecampusontario.pressbooks.pub/introstats/chapter/13-3-standard-error-of-the-estimate/


Standard Error of Regression
The standard error is a statistical measure of the average distance between the observed values and the regression line. It defines how much the actual data are spread around the line. In other words, it provides a measure of how much the actual dependent value deviates from the predicted value. Since it is an error, the lower its value, the better our prediction.

The standard error of the regression (S) and R-squared are two key goodness-of-fit measures for regression analysis. While R-squared is the best known of the goodness-of-fit statistics, I think it is a bit over-hyped. The standard error of the regression is also known as the residual standard error.
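A minimal sketch of the calculation for a simple linear fit, reusing the illustrative slope and intercept computed earlier: S = sqrt(SSE / (n − 2)), where SSE is the sum of squared residuals and n − 2 the degrees of freedom.

```python
# Standard error of the regression (residual standard error).
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
m, c = 0.6, 2.2   # slope and intercept from the earlier least-squares sketch

residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (len(xs) - 2))
print(s)   # about 0.894: average distance from the line, in units of y
```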
References:
https://www.datamation.com/big-data/what-is-regression-analysis/
