
EE708: Fundamentals of Data Science and Machine Intelligence
Rajesh M Hegde
Dept. of EE, IIT Kanpur

Module 3A
Regression Analysis and Modeling: Linear, Non-Linear, and Logistic Regression

Regression
* It is a statistical approach used to analyze the relationship between a dependent variable and one or more independent variables.
* The objective is to determine the most suitable function that characterizes the connection between these variables.
* Data for regression problems comes as a set of N input/output observation pairs
  {(x_n, y_n)}_{n=1}^{N} = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}
* x_n: independent variable, known as the input/regressor/predictor/covariate.
  * Each dimension of the input is referred to as a feature/explanatory variable.
* y_n: dependent variable, known as the output/response/prediction/estimation.
* In classification, the output values y_n are called labels, and all points sharing the same label value are referred to as a class of data.

Regression
* Regression is performed to
  * produce a trend line or a curve that helps visually summarize the data,
  * drive home a particular point about the data under study,
  * learn a model so that precise predictions can be made regarding output values in the future.
* Types of regression
  * Linear regression: predicts a continuous output by modeling a straight-line relationship.
  * Logistic regression: models the probability of binary outcomes by fitting a logistic (sigmoid) curve.
  * Polynomial regression: captures nonlinear relationships by fitting a polynomial curve.
  * Time series regression: predicts future values in a time-dependent data set.
  * Support vector regression: approximates a continuous function using a hyperplane that best fits the data.

Linear Regression
* Linear regression is a type of supervised learning algorithm.
* It computes the linear relationship between the dependent variable and one or more independent features by fitting a linear equation to observed data.

Classification of linear regression
* Based on the independent variables:
  * Simple Linear Regression: only one independent variable (one dimension).
  * Multiple Linear Regression: more than one independent variable (multiple dimensions).
* Based on the dependent variables:
  * Univariate Linear Regression: only one dependent variable.
  * Multivariate Regression: more than one dependent variable.



Simple Linear Regression


* Fitting of a representative line to a set of input/output data points.
* The N observed inputs form a column vector x ∈ R^N of length N:
  x = [x_1, x_2, ..., x_N]^T
* The relationship between input and output is
  y_n = w_0 + w_1 x_n
* This linear equation is called the model f_{w_0, w_1}(x):
  f_{w_0, w_1}(x_n) = w_0 + w_1 x_n
  where w_0 and w_1 are the parameters: w_1 is the regression coefficient (slope) and w_0 is the bias/intercept.

[Figure: simple linear regression with one independent variable x_n and fitted line y = w_0 + x w_1.]

Multiple Linear Regression


* This involves more than one independent variable and one dependent variable.
* With a bias w_0 and P associated slope weights to tune properly,
  y_n ≈ w_0 + x_{1,n} w_1 + x_{2,n} w_2 + ... + x_{P,n} w_P,   n = 1, ..., N
* The '≈' sign is used because we cannot be sure that all data lies exactly on a single hyperplane.
* Using the notation x̊ to denote an input x with a 1 placed on top of it,
  w = [w_0, w_1, ..., w_P]^T,   x̊_n = [1, x_{1,n}, ..., x_{P,n}]^T
* The desired linear relationship is more compactly given by
  y_n ≈ f_w(x_n) = x̊_n^T w,   n = 1, ..., N

[Figure: multiple linear regression for P = 2, y = w_0 + x_1 w_1 + x_2 w_2.]

Linear Regression: Least Squares Cost Function


* Find a weight vector w that
  * tightly holds each of the N approximate equalities y_n ≈ x̊_n^T w,
  * or keeps the error between x̊_n^T w and y_n small,
  * or keeps the deviation of the observations from the true regression line small.

[Figure: observed values y_n around the estimated regression line. Left: the best line and its errors; right: the best line and its squared errors.]

Linear Regression: Least Squares Cost Function


* The point-wise cost that measures the error/residual e_n of a model for the point {x_n, y_n} is given by
  g_n(w) = e_n^2 = (x̊_n^T w - y_n)^2
* For all N such values to be small, the cost is averaged over the dataset, forming the least squares cost function g(w):
  g(w) = (1/N) Σ_{n=1}^{N} g_n(w) = (1/N) Σ_{n=1}^{N} (x̊_n^T w - y_n)^2
* It is a function of the weights w as well as the data.
* The best fitting hyperplane is the one whose parameters minimize this error:
  min_w g(w) = min_w (1/N) Σ_{n=1}^{N} (x̊_n^T w - y_n)^2
* Objective: find the parameters w that minimize the cost function.
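
A minimal NumPy sketch of this cost function (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def least_squares_cost(w, X, y):
    """Least squares cost g(w) = (1/N) * sum_n (x_ring_n^T w - y_n)^2.

    X is an (N, P) array of inputs, y an (N,) array of outputs, and
    w a (P + 1,) array [w_0, w_1, ..., w_P].
    """
    N = X.shape[0]
    X_ring = np.hstack([np.ones((N, 1)), X])  # prepend a 1 to each input for the bias w_0
    residuals = X_ring @ w - y                # e_n = x_ring_n^T w - y_n
    return np.mean(residuals ** 2)            # average of the squared errors
```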

Linear Regression: Least Squares Estimators


* Recollect: y_n ≈ w_0 + x_{1,n} w_1 + x_{2,n} w_2 + ... + x_{P,n} w_P
* Consider P = 1; then g(w) = (1/N) Σ_{n=1}^{N} (x̊_n^T w - y_n)^2 = (1/N) Σ_{n=1}^{N} (w_0 + w_1 x_n - y_n)^2
* The least squares estimators of w_0 and w_1, say ŵ_0 and ŵ_1, must satisfy
  ∂g(w)/∂w_0 = 0  and  ∂g(w)/∂w_1 = 0   at   w_0 = ŵ_0, w_1 = ŵ_1
* Setting ∂g(w)/∂w_0 = 0:
  (2/N) Σ_{n=1}^{N} (ŵ_0 + ŵ_1 x_n - y_n) = 0   ⇒   ŵ_0 = ȳ - ŵ_1 x̄
* Setting ∂g(w)/∂w_1 = 0:
  ŵ_0 Σ_{n=1}^{N} x_n + ŵ_1 Σ_{n=1}^{N} x_n^2 = Σ_{n=1}^{N} x_n y_n
* Substituting ŵ_0 = ȳ - ŵ_1 x̄:
  ŵ_1 = (Σ_{n=1}^{N} x_n y_n - N x̄ ȳ) / (Σ_{n=1}^{N} x_n^2 - N x̄^2) = S_xy / S_xx
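
A small NumPy sketch of these closed-form estimators (illustrative names, assuming x and y are 1-D arrays of equal length):

```python
import numpy as np

def simple_ls_estimators(x, y):
    """Closed-form least squares estimates (w0_hat, w1_hat) for y ≈ w0 + w1*x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    N = len(x)
    Sxx = np.sum(x**2) - N * x.mean()**2              # S_xx = Σ x_n^2 - N x̄^2
    Sxy = np.sum(x * y) - N * x.mean() * y.mean()     # S_xy = Σ x_n y_n - N x̄ ȳ
    w1_hat = Sxy / Sxx
    w0_hat = y.mean() - w1_hat * x.mean()
    return w0_hat, w1_hat
```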

Case Study: Oxygen Purity Dataset


[Table: 20 observations of hydrocarbon level x (%) and oxygen purity y (%); e.g. observation 1: x = 0.99, y = 90.01, ..., observation 20: x = 0.95, y = 87.33.]

[Figure: scatter diagram of oxygen purity versus hydrocarbon level.]

Objective: fit a simple linear regression model to the oxygen purity data in the table.

Case Study: Oxygen Purity Dataset


* Solution: from the table,
  N = 20,  Σ_{n=1}^{20} x_n = 23.92,  x̄ = 1.196,  Σ_{n=1}^{20} y_n = 1843.21,  ȳ = 92.1605
  Σ_{n=1}^{20} x_n^2 = 29.2892,  Σ_{n=1}^{20} y_n^2 = 170,044.5321,  Σ_{n=1}^{20} x_n y_n = 2214.6566
  S_xx = 0.68088,  S_xy = 10.17744
* The least squares estimates:
  ŵ_1 = S_xy / S_xx = 14.94748,   ŵ_0 = ȳ - ŵ_1 x̄ = 74.2833
* The fitted simple linear regression model:
  ŷ = ŵ_0 + ŵ_1 x = 74.283 + 14.947x
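
A quick NumPy check of these estimates from the summary statistics quoted above (illustrative only):

```python
import numpy as np

# Summary statistics from the table (N = 20 observations)
N, sum_x, sum_y = 20, 23.92, 1843.21
sum_x2, sum_xy = 29.2892, 2214.6566

x_bar, y_bar = sum_x / N, sum_y / N
Sxx = sum_x2 - N * x_bar**2           # 0.68088
Sxy = sum_xy - N * x_bar * y_bar      # 10.17744
w1_hat = Sxy / Sxx                    # ≈ 14.947
w0_hat = y_bar - w1_hat * x_bar       # ≈ 74.283
print(f"y_hat = {w0_hat:.3f} + {w1_hat:.3f} x")
```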

Case Study: Oxygen Purity Dataset

[Figure: scatter plot of oxygen purity (y) versus hydrocarbon level (x) and the regression model ŷ = 74.283 + 14.947x.]

Linear Regression: Empirical Models


* The scatter diagram indicates that
  * no simple curve will pass exactly through all the points,
  * the points are scattered randomly around the line.
* Hence y_n may be considered a random variable related to x_n as
  E(y_n | x_n) = w_0 + w_1 x_n
* For a fixed x_n, the actual y_n is determined by the linear model plus a random error term:
  y_n = w_0 + w_1 x_n + ε

[Figure: scatter diagram of oxygen purity versus hydrocarbon level and the regression line.]

Linear Regression: Empirical Models

* Let the mean and variance of ε be μ = 0 and σ^2; then the mean of y given x is
  E(y|x) = E(w_0 + w_1 x + ε) = w_0 + w_1 x + E(ε) = w_0 + w_1 x
* The variance of y given x is
  var(y|x) = var(w_0 + w_1 x + ε) = var(ε) = σ^2

[Figure: the distribution of y (oxygen purity) for a given value of x (hydrocarbon level), with the true regression line μ_{y|x} = w_0 + w_1 x = 75 + 15x shown at x = 1.00 and x = 1.25.]

Linear Regression: Role of w_0 and w_1


* Model: f_{w_0, w_1}(x_n) = w_0 + w_1 x_n;  Parameters: w_0, w_1
* Cost function: g(w) = (1/N) Σ_{n=1}^{N} (w_0 + w_1 x_n - y_n)^2
* What do w_0 and w_1 do?

[Figure: purity (y) versus hydrocarbon level (x). Left: w_1 varies with w_0 = 75; right: w_1 = 15 with w_0 varying.]

Linear Regression: Gradient Cost


* Model: f_{w_0, w_1}(x_n) = w_0 + w_1 x_n;  Cost: g(w) = (1/N) Σ_{n=1}^{N} (f_{w_0, w_1}(x_n) - y_n)^2

[Figure: model output obtained by varying w_1 with w_0 = 75 (left), and the corresponding cost (MSE) as a function of the weight w_1 (right).]


Linear Regression: Cost Function Visualization


* Model: f_{w_0, w_1}(x_n) = w_0 + w_1 x_n;  Cost: g(w) = (1/N) Σ_{n=1}^{N} (f_{w_0, w_1}(x_n) - y_n)^2

[Figure: surface plot (left) and contour plot (right) of the cost (MSE) obtained by varying both the bias w_0 and the weight w_1.]

Linear Regression: Gradient Descent along the Cost Function


* Objective: find the values of w_0 and w_1 that minimize the cost/error function g(w).
* Outline:
  * Start with some w_0, w_1.
  * Keep changing w_0, w_1 to reduce g(w).
  * Settle at or near a minimum.
* Model: f_{w_0, w_1}(x_n) = w_0 + w_1 x_n
* Cost function: g(w) = (1/N) Σ_{n=1}^{N} (f_{w_0, w_1}(x_n) - y_n)^2

Weight/Parameter Updating
  w_0 = w_0 - α ∂g(w)/∂w_0
  w_1 = w_1 - α ∂g(w)/∂w_1
  Simultaneously update the weights; repeat until convergence. α: learning rate.
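
A minimal NumPy sketch of this update loop for the simple linear model (the function name, initialization, learning rate, and iteration count are illustrative choices):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Gradient descent on g(w) = (1/N) * sum_n (w0 + w1*x_n - y_n)^2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    N = len(x)
    w0, w1 = 0.0, 0.0                                  # start with some w0, w1
    for _ in range(iterations):
        residuals = w0 + w1 * x - y                    # f_{w0,w1}(x_n) - y_n
        grad_w0 = (2.0 / N) * np.sum(residuals)        # ∂g/∂w0
        grad_w1 = (2.0 / N) * np.sum(residuals * x)    # ∂g/∂w1
        w0, w1 = w0 - alpha * grad_w0, w1 - alpha * grad_w1   # simultaneous update
    return w0, w1
```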

Linear Regression: Gradient Descent along the Cost Function

  w = w - α ∂g(w)/∂w

* If ∂g(w)/∂w < 0 (a negative number):
  w = w - α(negative number), so w increases.
* If ∂g(w)/∂w > 0 (a positive number):
  w = w - α(positive number), so w decreases.

[Figure: one-dimensional cost g(w) versus the weight w, illustrating both cases.]

Linear Regression: Bias and Regression Coefficient Update

* Linear regression model: f_{w_0, w_1}(x_n) = w_0 + w_1 x_n
* Cost function:
  g(w) = (1/N) Σ_{n=1}^{N} (f_{w_0, w_1}(x_n) - y_n)^2 = (1/N) Σ_{n=1}^{N} (w_0 + w_1 x_n - y_n)^2
* Simultaneously update the bias and the regression coefficient as
  Bias update:                    w_0 = w_0 - α (2/N) Σ_{n=1}^{N} (f_{w_0, w_1}(x_n) - y_n)
  Regression coefficient update:  w_1 = w_1 - α (2/N) Σ_{n=1}^{N} (f_{w_0, w_1}(x_n) - y_n) x_n

Linear Regression: Simultaneous Updates of Bias and Regression Coefficient

Simultaneous weight update
  w_0 = w_0 - α ∂g(w)/∂w_0,   w_1 = w_1 - α ∂g(w)/∂w_1

Correct (simultaneous update):
  temp_w1 = w_1 - α ∂g(w)/∂w_1
  temp_w0 = w_0 - α ∂g(w)/∂w_0
  w_1 = temp_w1
  w_0 = temp_w0

Incorrect (sequential update):
  temp_w1 = w_1 - α ∂g(w)/∂w_1
  w_1 = temp_w1
  temp_w0 = w_0 - α ∂g(w)/∂w_0   (this gradient sees the already-updated w_1)
  w_0 = temp_w0

A small Python illustration of the difference is given below.
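
A tiny runnable illustration on a hypothetical coupled cost (the cost and its gradients below are made up for the example, not from the slides):

```python
# Toy coupled cost g(w0, w1) = (w0 + 2*w1 - 1)**2, whose partial derivatives
# both depend on w0 and w1, so the update order matters if done sequentially.
def grad_w0(w0, w1):
    return 2 * (w0 + 2 * w1 - 1)

def grad_w1(w0, w1):
    return 4 * (w0 + 2 * w1 - 1)

alpha = 0.1
w0, w1 = 0.0, 0.0

# Correct: both gradients are evaluated at the same point, then applied together.
new_w0 = w0 - alpha * grad_w0(w0, w1)
new_w1 = w1 - alpha * grad_w1(w0, w1)
w0, w1 = new_w0, new_w1

# Incorrect (sequential): grad_w0 below would already see the updated w1.
# w1 = w1 - alpha * grad_w1(w0, w1)
# w0 = w0 - alpha * grad_w0(w0, w1)
print(w0, w1)
```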

Linear Regression: Fixing the Learning Rate

  w = w - α ∂g(w)/∂w

[Figure: cost (MSE) versus weight w for a learning rate that is too small (left) and too large (right).]

* α too small: gradient descent may be slow,
  * needing more iterations to converge.
* α too large: gradient descent may
  * overshoot and never reach the minimum,
  * fail to converge (diverge).

Linear Regression: Fixed Learning Rate ?


* Can a fixed learning rate reach the minimum?
  w = w - α ∂g(w)/∂w
* Near the minimum,
  * the derivative becomes smaller,
  * so the update steps become smaller,
  * and the minimum can be reached with a fixed learning rate.

[Figure: cost (MSE) versus weight w, with progressively smaller steps near the minimum.]

Linear Regression: Cost Function at Various Learning Rates

[Figure: cost versus iteration (0 to 1200) for various learning rates.]

Linear Regression: Illustrating Gradient Descent on a Contour Plot

[Figure: gradient descent path towards the minimum on the contour plot of the cost function, for various learning rates.]

Logistic Regression
* It is a supervised machine learning algorithm used for two-class classification.
* y_n must be a categorical or discrete value, e.g., y_n ∈ {0, 1} or y_n ∈ {True, False}.
* 'Steps' are the simplest shape such a dataset takes.

[Figure: Class I and Class II data. When P = 1, projecting the points onto the x-axis shows the classes separated by a point; when P = 2, projecting onto the x_1-x_2 plane shows the classes separated by a line.]

Logistic Regression: Trying to Fit Linear Function for Irregular Data

* Fitting the line y = w_0 + w_1 x results in extremely subpar results.
* The line is too inflexible to account for the nonlinearity present in such data.
* A function that matches the general shape of the data fits best.
* In other words, such data needs to be fit with a step function.

Logistic Regression: Trying to Fit a Discontinuous Step Function

[Figure: two-class data with a linear fit y = w_0 + w_1 x and a discontinuous step function.]

* Fit a discontinuous step function with a linear boundary.
* A linear model defining this boundary is step(w_0 + w_1 x), where the step function is defined as
  step(x) = 1 if x ≥ 0.5, and step(x) = 0 if x < 0.5


Logistic Regression: Trying to Fit a Discontinuous Step Function


Linear Boundary
* The linear boundary between the two steps is defined by all points x where
  w_0 + w_1 x = 0.5
* With a P-dimensional input, the linear model defining the boundary is
  x̊^T w = w_0 + w_1 x_1 + ... + w_P x_P = 0.5
* The linear boundary is therefore defined by all points where x̊^T w = 0.5.

[Figure: discontinuous step function y = step(w_0 + w_1 x) fit to the two-class data.]

Logistic Regression: Computing Weights of Discontinuous Step Function

Tuning the parameters
* w_0 and w_1 are tuned after composing the linear model with the step function.
* A Least Squares cost function is set up which, when minimized, recovers the ideal weights.
* If y_n = 1, the point lies in the positive region where x̊_n^T w > 0.5, so step(x̊_n^T w) = 1 matches the label value.
* If y_n = 0, the point lies in the negative region where x̊_n^T w < 0.5, so step(x̊_n^T w) = 0 matches the label value.
* In short, the evaluation matches its label value, i.e., step(x̊_n^T w) = y_n.
* Define the Least Squares cost function as
  g(w) = (1/N) Σ_{n=1}^{N} (step(x̊_n^T w) - y_n)^2
* The weights are then computed by minimizing g(w).
* But differentiating a discontinuous function like the step is a problem.


Logistic Regression: Minimizing the Cost Function (LSE)

* g(w) takes on only a discrete set of values (it effectively counts misclassified points).
* It is impossible to minimize with local optimization, as at every point the function is completely flat.
* Local optimization cannot easily take even a single step downhill.
* This problem is inherited from the use of the step function, itself a discontinuous function.

[Figure: Least Squares cost function for the toy dataset.]

Logistic Regression: Why Logistic Sigmoid Function


* Least Squares cannot be minimized due to the discontinuous step function.
« It needs to be replaced with a continuous function that matches it very closely everywhere.
* Such a function is the sigmoid function o (-)

W=
« Also called a logistic or logistic sigmoid function.
* When such a function is fit to a classification dataset,

the performed regression is Logistic regression.

x
Sigmoid function


Logistic Regression: Cost function using Logistic Sigmoid Function

* With the sigmoid function, we want
  σ(x̊_n^T w) ≈ y_n,   n = 1, ..., N
* Least Squares cost for logistic regression:
  g(w) = (1/N) Σ_{n=1}^{N} (σ(x̊_n^T w) - y_n)^2
* This function is generally non-convex and contains large flat regions, making it difficult (but not impossible) to minimize properly.
* Standard gradient descent, normalized gradient descent, and Newton's method are used.
* (Recall the vector norms ||x||_1 = Σ_i |x_i| and ||x||_2 = (Σ_i |x_i|^2)^{1/2}.)

Example: Normalized Gradient Descent for LSE Logistic Regression

* Use normalized gradient descent to minimize the logistic Least Squares cost.

[Figure: toy dataset (left) and its corresponding Least Squares cost function using the sigmoid function (right).]


Normalized Gradient Descent


Gradient Descent:
  w = w - α ∇g(w)
* For non-convex g(w), ∇g(w) vanishes near saddle points, which slows the overall convergence rate and makes detection of local minima difficult.

Normalized Gradient Descent:
  w = w - α ∇g(w) / ||∇g(w)||
* ∇g(w) / ||∇g(w)|| preserves the direction of the gradient but ignores its magnitude.
* ∇g(w) / ||∇g(w)|| does not vanish near saddle points, so normalized gradient descent does not slow down in the neighborhood of saddle points and escapes them "quickly".
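
A minimal NumPy sketch of a normalized gradient descent loop for the sigmoid Least Squares cost (the gradient follows from the chain rule; function names, initialization, and the small-norm guard are illustrative choices):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def normalized_gradient_descent(X, y, alpha=1.0, iterations=900):
    """Minimize g(w) = (1/N) * sum_n (sigmoid(x_ring_n^T w) - y_n)^2 with unit-length steps."""
    N = X.shape[0]
    X_ring = np.hstack([np.ones((N, 1)), X])          # prepend a 1 for the bias w_0
    w = np.zeros(X_ring.shape[1])
    for _ in range(iterations):
        s = sigmoid(X_ring @ w)
        # Chain rule: grad = (2/N) * sum_n (s_n - y_n) * s_n * (1 - s_n) * x_ring_n
        grad = (2.0 / N) * (X_ring.T @ ((s - y) * s * (1 - s)))
        norm = np.linalg.norm(grad)
        if norm < 1e-12:                               # avoid dividing by ~zero at a stationary point
            break
        w = w - alpha * grad / norm                    # step of length alpha in the gradient direction
    return w
```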

Normalized Gradient Descent

[Figure: normalized gradient descent illustrated on a non-convex cost and on a convex cost.]

Example: Normalized Gradient Descent for LS Logistic Regression


* Run normalized gradient descent for 900 iterations, initialized at w_0 = -w_1 = 20, with the step length parameter fixed at α = 1.
* The normalized gradient descent steps are colored green to red as the run progresses.
* Note: this illustration is for an error function with several local minima.

[Figure: Least Squares logistic regression fit to the data (left panel), and the gradient descent path on the contour plot of the cost function g(w_0, w_1) (right panel).]

Logistic Regression: Logit Response Function


* Linear regression often works very well when the response variable is quantitative.
* Consider a situation where the response variable takes on only two possible values, 0 and 1.
* Suppose that the model has the form
  y_n = w_0 + w_1 x_n + ε_n
* y_n is a Bernoulli random variable with probability distribution P(y_n = 1) = π_n and P(y_n = 0) = 1 - π_n.
* Since E(ε_n) = 0, the expected value of the response variable is
  E(y_n) = 1(π_n) + 0(1 - π_n) = π_n
  This implies E(y_n) = w_0 + w_1 x_n = π_n.
* This restriction can cause serious problems with the choice of a linear response function, since
  0 ≤ E(y_n) = π_n ≤ 1


Logistic Regression: Logit Response Function


* The expected response given by the response function E(y_n) = w_0 + w_1 x_n is just the probability that the response variable takes on the value 1.
* The error term ε_n can only take on two values, namely
  ε_n = 1 - (w_0 + w_1 x_n)   when y_n = 1
  ε_n = -(w_0 + w_1 x_n)      when y_n = 0
* Consequently,
  * the errors in this model cannot possibly be normal,
  * the error variance is not constant.
* In logistic regression, it is assumed that E(y) is related to x by the logit function:
  E(y) = exp(w_0 + w_1 x) / (1 + exp(w_0 + w_1 x)) = 1 / (1 + exp[-(w_0 + w_1 x)])   (logit response function)

Logistic Regression: Logit Response Function and Odds Ratio


* Odds ratio:
  E(y) / (1 - E(y)) = exp(w_0 + w_1 x)
* If the odds ratio is 2 for a value of x, success is twice as likely as failure at that value of x.
* Estimation of the parameters w_0 and w_1 is done using maximum likelihood estimation (MLE).
* The quantity exp(w_0 + w_1 x) on the right-hand side of the equation is called the odds.

[Figure: examples of the logistic response function, e.g. E(Y) = 1 / (1 + exp(-6.0 - 1.0x)).]


Logistic Regression Using Maximum Log-Likelihood Estimation

* Odds ratio: E(y) / (1 - E(y)) = exp(w_0 + w_1 x)
* Pr(y = 1) = E(y),   Pr(y = 0) = 1 - E(y)
  Pr(y) = (E(y))^y (1 - E(y))^{1-y}
* Likelihood of logistic regression:
  L(w_0, w_1 | y; x) = Π_{n=1}^{N} E(y_n)^{y_n} (1 - E(y_n))^{1-y_n}
* Log-likelihood of logistic regression:
  l(w_0, w_1 | y; x) = log(L(w_0, w_1 | y; x)) = Σ_{n=1}^{N} [ y_n log(E(y_n)) + (1 - y_n) log(1 - E(y_n)) ]
* Find w_0, w_1 that maximize the log-likelihood.

Gradient Ascent for Maximizing Log-Likelihood


Gradient Descent (to reach a minimum):
  w_new = w - α ∂g(w)/∂w
Gradient Ascent (to reach a maximum):
  w_new = w + α ∂g(w)/∂w

[Figure: gradient descent moves against the gradient toward a minimum (left); gradient ascent moves along the gradient toward a maximum (right).]


Logistic Regression Using Maximum Log-Likelihood Estimation


* Gradient of the log-likelihood, with E(y_n) = exp(w_0 + w_1 x_n) / (1 + exp(w_0 + w_1 x_n)):
  ∂l(w_0, w_1 | y; x)/∂w_0 = Σ_{n=1}^{N} [ y_n / E(y_n) - (1 - y_n) / (1 - E(y_n)) ] ∂E(y_n)/∂w_0
  where ∂E(y_n)/∂w_0 = E(y_n)(1 - E(y_n)), so
  ∂l(w_0, w_1 | y; x)/∂w_0 = Σ_{n=1}^{N} (y_n - E(y_n))
* Similarly, since ∂E(y_n)/∂w_1 = E(y_n)(1 - E(y_n)) x_n,
  ∂l(w_0, w_1 | y; x)/∂w_1 = Σ_{n=1}^{N} (y_n - E(y_n)) x_n

Logistic Regression Using Maximum Log-Likelihood Estimation

The weights can be obtained using gradient ascent:
  w_0 = w_0 + α ∂l(w_0, w_1 | y; x)/∂w_0
  w_1 = w_1 + α ∂l(w_0, w_1 | y; x)/∂w_1
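
A minimal NumPy sketch of this gradient ascent loop (function name, initialization, learning rate, and iteration count are illustrative choices):

```python
import numpy as np

def logistic_mle_gradient_ascent(x, y, alpha=0.01, iterations=5000):
    """Fit E(y) = sigmoid(w0 + w1*x) by gradient ascent on the log-likelihood."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    w0, w1 = 0.0, 0.0
    for _ in range(iterations):
        E_y = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))   # E(y_n)
        grad_w0 = np.sum(y - E_y)                    # ∂l/∂w0
        grad_w1 = np.sum((y - E_y) * x)              # ∂l/∂w1
        w0, w1 = w0 + alpha * grad_w0, w1 + alpha * grad_w1   # ascent step
    return w0, w1
```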


Case Study: O-ring Failure Dataset


* The dataset is on launch temperature and O-ring failure for the 24 space shuttle launches prior to the Challenger disaster of January 1986.
* There are six O-rings used on the rocket motor assembly to seal field joints.
* A 1 in the "O-Ring Failure" column indicates that at least one O-ring failure had occurred on that launch.

Temperature  O-Ring Failure   Temperature  O-Ring Failure   Temperature  O-Ring Failure
53           1                68           0                75           0
56           1                69           0                75           1
57           1                70           0                76           0
63           0                70           1                76           0
66           0                70           1                78           0
67           0                70           1                79           0
67           0                72           0                80           0
67           0                73           0                81           0

Case Study: O-ring Failure Dataset


* The fitted logistic regression model is
  ŷ = 1 / (1 + exp[-(10.875 - 0.17132x)])

[Figure: scatter plot of O-ring failures versus launch temperature for the 24 space shuttle flights (left), and the probability of O-ring failure versus launch temperature based on the fitted logistic regression model (right).]
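
A short sketch that evaluates this fitted model at a few launch temperatures (the helper name is illustrative):

```python
import numpy as np

def p_oring_failure(temp_f):
    """Fitted probability of at least one O-ring failure at launch temperature temp_f (°F)."""
    return 1.0 / (1.0 + np.exp(-(10.875 - 0.17132 * temp_f)))

# Probability of failure across the observed temperature range
for t in (53, 60, 70, 80):
    print(t, round(float(p_oring_failure(t)), 3))
```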


Case Study: Iris Dataset


* We can use the iris dataset to illustrate logistic regression.
* A classifier is built to detect the Iris virginica type based only on the petal width feature.

[Figure: estimated probabilities ("Iris virginica proba" and "Not Iris virginica proba") versus petal width (cm), and the decision boundary.]

Case Study: Iris Dataset


* Petal width comparison:
  * Iris virginica flowers (shown as triangles) range from 1.4 cm to 2.5 cm.
  * The other iris flowers (shown as squares) range from 0.1 cm to 1.8 cm.
* Decision boundary:
  * Above about 2 cm the classifier is highly confident that the flower is an Iris virginica (it outputs a high probability for that class).
  * Below 1 cm it is highly confident that it is not an Iris virginica (high probability for the "Not Iris virginica" class).
  * In between these extremes, the classifier is unsure; however, the predictor will return whichever class is the most likely.
* A decision boundary exists at around 1.6 cm, where both probabilities are equal to 50%:
  * If the petal width is greater than 1.6 cm the classifier will predict that the flower is an Iris virginica, and otherwise it will predict that it is not (see the sketch below).
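
A minimal scikit-learn sketch of this one-feature classifier (the exact training options behind the slide's figure are not stated, so library defaults are assumed and the boundary found may differ slightly from 1.6 cm):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, 3:]                     # petal width (cm) only
y = (iris.target == 2).astype(int)       # 1 if Iris virginica, else 0

clf = LogisticRegression()
clf.fit(X, y)

# Estimated probabilities over a grid of petal widths, and the ~50% point
petal_widths = np.linspace(0, 3, 1000).reshape(-1, 1)
proba = clf.predict_proba(petal_widths)[:, 1]
boundary = petal_widths[np.argmin(np.abs(proba - 0.5)), 0]
print(f"Decision boundary near {boundary:.2f} cm")
print(clf.predict([[1.7], [1.5]]))       # expected around [1 0] if the boundary is near 1.6 cm
```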


Case Study: Illustration of Various Decision Boundaries


* A classifier to detect the Iris virginica type can instead be based on two features: petal width and petal length.
* The logistic regression classifier, based on these two features, estimates the probability that a new flower is an Iris virginica.

[Figure: linear decision boundary in the petal length-petal width plane.]

* The dashed line represents the points where the model estimates a 50% probability: this is the model's decision boundary.
* Each parallel line represents the points where the model outputs a specific probability, from 15% (bottom left) to 90% (top right).
* All the flowers beyond the top-right line have over a 90% chance of being Iris virginica, according to the model.

Multiple Linear Regression

Example models with two regressors:
  Y = 50 + 10x_1 + 7x_2 + 5x_1x_2
  E(Y) = 800 + 10x_1 + 7x_2 - 8.5x_1^2 - 5x_2^2 + 4x_1x_2


Multiple Linear Regression: Data

The data consist of n observations on k regressors and one response:
  (x_{i1}, x_{i2}, ..., x_{ik}, y_i),   i = 1, 2, ..., n

Multiple Linear Regression: Least Square Error


Model with k regressors:
  y_i = w_0 + w_1 x_{i1} + w_2 x_{i2} + ... + w_k x_{ik} + ε_i = w_0 + Σ_{j=1}^{k} w_j x_{ij} + ε_i

Least squares function:
  g(w) = Σ_{i=1}^{n} ε_i^2 = Σ_{i=1}^{n} (y_i - w_0 - Σ_{j=1}^{k} w_j x_{ij})^2


Multiple Linear Regression: Estimating Parameters

Setting the partial derivatives of the least squares function to zero:
  ∂g(w)/∂w_0 = -2 Σ_{i=1}^{n} (y_i - w_0 - Σ_{j=1}^{k} w_j x_{ij}) = 0
  ∂g(w)/∂w_j = -2 Σ_{i=1}^{n} (y_i - w_0 - Σ_{j=1}^{k} w_j x_{ij}) x_{ij} = 0,   j = 1, 2, ..., k

Least squares normal equations:
  n w_0 + w_1 Σ_i x_{i1} + w_2 Σ_i x_{i2} + ... + w_k Σ_i x_{ik} = Σ_i y_i
  w_0 Σ_i x_{i1} + w_1 Σ_i x_{i1}^2 + w_2 Σ_i x_{i2} x_{i1} + ... + w_k Σ_i x_{ik} x_{i1} = Σ_i x_{i1} y_i
  ⋮
  w_0 Σ_i x_{ik} + w_1 Σ_i x_{i1} x_{ik} + w_2 Σ_i x_{i2} x_{ik} + ... + w_k Σ_i x_{ik}^2 = Σ_i x_{ik} y_i

These can be solved using any appropriate technique to obtain the parameters.

Multiple Linear Regression: Estimating Parameters


Matrix Approach
Regression model:
  y = Xw + ε
Cost function:
  g(w) = ε^T ε = (y - Xw)^T (y - Xw)
Normal equation:
  X^T X w = X^T y
Least squares estimate of the parameters:
  ŵ = (X^T X)^{-1} X^T y
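
A minimal NumPy sketch of this matrix solution (the function name is illustrative; the normal equations are solved directly rather than forming an explicit inverse):

```python
import numpy as np

def multiple_linear_regression(X, y):
    """Least squares estimate w_hat solving X_design^T X_design w = X_design^T y."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    X_design = np.hstack([np.ones((X.shape[0], 1)), X])   # column of 1s for the intercept w_0
    w_hat = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)
    return w_hat
```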


Case Study: Wire Bond Data

[Table: 25 observations of wire bond pull strength y, wire length x_1, and die height x_2; e.g. observation 1: y = 9.95, x_1 = 2, x_2 = 50, ..., observation 25: y = 21.15, x_1 = 5, x_2 = 400.]


Case Study: Wire Bond Data


Solution: the least squares normal equations are
  25 w_0 + 206 w_1 + 8294 w_2 = 725.82
  206 w_0 + 2396 w_1 + 77177 w_2 = 8008.47
  8294 w_0 + 77177 w_1 + 3531848 w_2 = 274816.71

Solving gives
  ŵ_0 = 2.26379,   ŵ_1 = 2.74427,   ŵ_2 = 0.01253

The fitted model:
  ŷ = 2.26379 + 2.74427 x_1 + 0.01253 x_2
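
As a check, the normal equations above can be solved numerically; this sketch should reproduce the quoted estimates up to rounding:

```python
import numpy as np

# Normal equations from the slide: A @ w = b
A = np.array([[25, 206, 8294],
              [206, 2396, 77177],
              [8294, 77177, 3531848]], dtype=float)
b = np.array([725.82, 8008.47, 274816.71])

w0, w1, w2 = np.linalg.solve(A, b)
print(f"y_hat = {w0:.5f} + {w1:.5f} x1 + {w2:.5f} x2")
```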

