EE708 Module 3A
Regression Analysis and Modeling: Linear, Non-Linear, and Logistic Regression
* It is a statistical approach used to analyze the relationship between a dependent variable and one or more independent variables.
* The objective is to determine the most suitable function that characterizes the connection between these variables.
* Data for regression problems comes as a set of N input/output observation pairs
  $\{(x_n, y_n)\}_{n=1}^{N} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$
* $x_n$: independent variable, known as input / regressor / predictor / covariate.
Regression
* Regression is performed to
  * produce a trend line or a curve that can be used to help visually summarize the data,
  * drive home a particular point about the data under study,
  * learn a model so that precise predictions can be made regarding output values in the future.
* Types of regression
Linear Regression
* Linear regression is a type of supervised learning algorithm.
* It computes the linear relationship between the dependent variable and one or more independent features by fitting a linear equation to observed data.
[Figure: linear regression based on independent variables; an input vector $x = (x_1, \ldots, x_P)$ is mapped to the output $y \approx w_0 + \sum_{p=1}^{P} w_p x_p$.]
* The relationship between input and output is
  $y_n \approx w_0 + w_1 x_n$
* The '$\approx$' sign is used because we cannot be sure that all the data lies exactly on a single line.
* Using the notation $\mathring{x}$ to denote an input $x$ with a 1 placed on top of it, for $P = 2$:
  $y \approx w_0 + x_1 w_1 + x_2 w_2$
  $w = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_P \end{bmatrix}, \qquad \mathring{x}_n = \begin{bmatrix} 1 \\ x_{1n} \\ \vdots \\ x_{Pn} \end{bmatrix}$
* The desired linear relationship (multiple linear regression) is more compactly given by
  $y_n \approx f_w(x_n) = \mathring{x}_n^\top w, \qquad n = 1, \ldots, N$
[Figure: estimated regression line; the best line and its errors (left) and the best line and its squared errors (right).]
Setting the partial derivatives of the cost to zero:
$\frac{\partial g(w)}{\partial w_0} = 0, \qquad \frac{\partial g(w)}{\partial w_1} = 0$
$\frac{2}{N} \sum_{n=1}^{N} (w_0 + w_1 x_n - y_n) = 0 \;\Rightarrow\; N w_0 + w_1 \sum_{n=1}^{N} x_n = \sum_{n=1}^{N} y_n$
$\frac{2}{N} \sum_{n=1}^{N} (w_0 + w_1 x_n - y_n) x_n = 0 \;\Rightarrow\; w_0 \sum_{n=1}^{N} x_n + w_1 \sum_{n=1}^{N} x_n^2 = \sum_{n=1}^{N} y_n x_n$
Substituting $w_0 = \bar{y} - w_1 \bar{x}$ into the second equation gives
$w_1 = \frac{\sum_{n=1}^{N} x_n y_n - N \bar{x} \bar{y}}{\sum_{n=1}^{N} x_n^2 - N \bar{x}^2}$
Example: oxygen purity versus hydrocarbon level.

  Observation   Hydrocarbon level, x (%)   Purity, y (%)
       1                 0.99                  90.01
       2                 1.02                  89.05
       3                 1.15                  91.43
       4                 1.29                  93.74
       5                 1.46                  96.73
       6                 1.36                  94.45
       7                 0.87                  87.59
       8                 1.23                  91.77
       9                 1.55                  99.42
      10                 1.40                  93.65

[Figure: scatter plot of oxygen purity (y) versus hydrocarbon level (x) and regression model $\hat{y} = 74.283 + 14.947x$.]
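As a check, the closed-form estimates above can be computed directly from the ten tabulated observations. A minimal Python sketch follows; note the slide's model ŷ = 74.283 + 14.947x was apparently fit on a larger version of this dataset, so this ten-point subset gives slightly different coefficients (roughly w0 ≈ 73.6, w1 ≈ 15.6).

```python
# Closed-form least squares estimates for simple linear regression:
#   w1 = (sum x_n y_n - N*xbar*ybar) / (sum x_n^2 - N*xbar^2),  w0 = ybar - w1*xbar
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40]           # hydrocarbon level
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65]  # purity

N = len(x)
xbar = sum(x) / N
ybar = sum(y) / N

w1 = (sum(xn * yn for xn, yn in zip(x, y)) - N * xbar * ybar) / \
     (sum(xn ** 2 for xn in x) - N * xbar ** 2)
w0 = ybar - w1 * xbar
print(f"y_hat = {w0:.3f} + {w1:.3f} x")
```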
* Variance of y given x:
  $\mathrm{var}(y|x) = \mathrm{var}(w_0 + w_1 x + \epsilon)$
[Figure: distribution of y about the regression line at x = 1.00 and x = 1.25 (hydrocarbon level).]
[Figures: purity (y) versus hydrocarbon level (x); cost (MSE) as a function of the weight w; cost (MSE) surface and contour plot over the bias $w_0$ and the weight $w_1$.]
Weight/Parameter Updating
$w_0 = w_0 - \alpha \frac{\partial}{\partial w_0} g(w), \qquad w_1 = w_1 - \alpha \frac{\partial}{\partial w_1} g(w)$
Simultaneously update the weights; repeat until convergence ($\alpha$: learning rate).
In vector form: $w = w - \alpha \frac{\partial}{\partial w} g(w)$
* Cost function:
  $g(w) = \frac{1}{2N} \sum_{n=1}^{N} \big( f_{w_0, w_1}(x_n) - y_n \big)^2 = \frac{1}{2N} \sum_{n=1}^{N} (w_0 + w_1 x_n - y_n)^2$
* Simultaneously update the bias and the regression coefficient as
  Bias update: $w_0 = w_0 - \alpha \frac{\partial g(w)}{\partial w_0}$, where $\frac{\partial g(w)}{\partial w_0} = \frac{1}{N} \sum_{n=1}^{N} \big( f_{w_0, w_1}(x_n) - y_n \big)$
  Regression coefficient update: $w_1 = w_1 - \alpha \frac{\partial g(w)}{\partial w_1}$, where $\frac{\partial g(w)}{\partial w_1} = \frac{1}{N} \sum_{n=1}^{N} \big( f_{w_0, w_1}(x_n) - y_n \big) x_n$
Correct (simultaneous) update:
$\text{temp}_0 = w_0 - \alpha \frac{\partial}{\partial w_0} g(w)$; $\text{temp}_1 = w_1 - \alpha \frac{\partial}{\partial w_1} g(w)$; then $w_0 = \text{temp}_0$, $w_1 = \text{temp}_1$
Incorrect (sequential) update:
$\text{temp}_0 = w_0 - \alpha \frac{\partial}{\partial w_0} g(w)$; $w_0 = \text{temp}_0$; $\text{temp}_1 = w_1 - \alpha \frac{\partial}{\partial w_1} g(w)$; $w_1 = \text{temp}_1$ (the second gradient is evaluated with the already-updated $w_0$)
In vector form: $w = w - \alpha \frac{\partial}{\partial w} g(w)$
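A minimal Python sketch of these updates on the oxygen-purity data, assuming the cost $g(w) = \frac{1}{2N}\sum_n (w_0 + w_1 x_n - y_n)^2$ from above; the learning rate and iteration count are illustrative choices.

```python
# Gradient descent for simple linear regression with *simultaneous* updates:
# both gradients are computed from the current (w0, w1) before either changes.
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65]
N = len(x)

w0, w1 = 0.0, 0.0
alpha = 0.05                       # learning rate (assumed; tune for convergence)
for _ in range(50_000):            # iteration count is illustrative
    # gradients of g(w) = (1/2N) * sum (w0 + w1*x_n - y_n)^2
    d0 = sum(w0 + w1 * xn - yn for xn, yn in zip(x, y)) / N
    d1 = sum((w0 + w1 * xn - yn) * xn for xn, yn in zip(x, y)) / N
    temp_w0 = w0 - alpha * d0
    temp_w1 = w1 - alpha * d1
    w0, w1 = temp_w0, temp_w1      # assign only after both are computed

print(f"w0 = {w0:.3f}, w1 = {w1:.3f}")   # approaches the closed-form estimates
```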
[Figure: cost (MSE) versus weight w for two different learning rates.]
$w = w - \alpha \frac{\partial}{\partial w} g(w)$
* Near the minimum:
  * the derivative becomes smaller,
  * the update steps become smaller,
  * so the minimum can be reached even with a fixed learning rate.
[Figure: cost (MSE) versus weight w, showing progressively smaller steps near the minimum.]
[Figure: gradient descent paths toward the minimum on the contour plot of the cost function, for various learning rates.]
Logistic Regression
* It is a supervised machine learning algorithm used for two-class classification.
* $y_n$ must be a categorical or discrete value, e.g., $y_n \in \{0, 1\}$ or $y_n \in \{\text{True}, \text{False}\}$.
* 'Steps' are the simplest shape such a dataset takes.
[Figure: Class-I and Class-II are separated by a point when P = 1 (left) and by a line when P = 2 (right).]
[Figure: two-class data in one dimension fit with the linear model $\hat{y} = w_0 + w_1 x$; the data follows a discontinuous step function, which the line cannot capture.]
[Figure: the same data fit with a discontinuous step function applied to the linear model $w_0 + w_1 x$.]
* In short, the evaluation matches its label value, i.e., $\text{step}(\mathring{x}_n^\top w) = y_n$.
* Define the Least Squares cost function as
  $g(w) = \frac{1}{N} \sum_{n=1}^{N} \big( \text{step}(\mathring{x}_n^\top w) - y_n \big)^2$
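A small sketch of why this cost is hard to minimize: step(.) makes g(w) piecewise constant in w, so tiny changes in w usually leave the cost unchanged and the gradient is zero almost everywhere. The toy data here is an assumption for illustration.

```python
# The step-based Least Squares cost is piecewise constant in w:
# nudging a weight usually changes no step(.) output, so the gradient
# is zero almost everywhere. Toy 1-D two-class data assumed.
x = [-1.0, -0.5, 0.0, 1.5, 2.0, 2.5]
y = [0, 0, 0, 1, 1, 1]

def step(t):
    return 1.0 if t >= 0 else 0.0

def cost(w0, w1):
    return sum((step(w0 + w1 * xn) - yn) ** 2 for xn, yn in zip(x, y)) / len(x)

# A small change in w1 leaves the cost unchanged -> flat region, zero gradient:
print(cost(-1.0, 0.5), cost(-1.0, 0.501))   # both 0.1667
```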
* The sigmoid function is $\sigma(t) = \frac{1}{1 + e^{-t}}$, also called a logistic or logistic sigmoid function.
* When such a function is fit to a classification dataset, it provides a smooth approximation to the discontinuous step.
[Figure: sigmoid function.]
10-Feb-25
* The function is generally non-convex and contains large flat regions, making it difficult, but not impossible, to minimize properly.
* The normalized gradient direction is used:
  $\frac{\nabla g(w)}{\|\nabla g(w)\|_2}, \qquad \|\nabla g(w)\|_2 = \Big( \sum_{j} \Big( \frac{\partial g(w)}{\partial w_j} \Big)^2 \Big)^{1/2}$
* Using normalized gradient descent to minimize the logistic Least Squares cost.
[Figure: toy dataset (left) and its corresponding Least Squares cost function using the sigmoid function (right).]
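A minimal sketch of normalized gradient descent on the sigmoid Least Squares cost, assuming a toy one-dimensional dataset and the standard unit-length step $w \leftarrow w - \alpha\, \nabla g(w)/\|\nabla g(w)\|_2$ (the slide's exact variant may differ). With a fixed step length the iterates hover near a minimum rather than settling exactly.

```python
import math

# Normalized gradient descent on g(w) = (1/N) sum (sigma(w0 + w1*x_n) - y_n)^2.
# The unit-length step keeps progress through the cost's large flat regions.
x = [-1.0, -0.5, 0.0, 1.5, 2.0, 2.5]   # toy data (assumed)
y = [0, 0, 0, 1, 1, 1]
N = len(x)

def sigma(t):
    return 1.0 / (1.0 + math.exp(-t))

def grad(w0, w1):
    g0 = g1 = 0.0
    for xn, yn in zip(x, y):
        s = sigma(w0 + w1 * xn)
        c = 2.0 * (s - yn) * s * (1.0 - s) / N   # chain rule through sigma
        g0 += c
        g1 += c * xn
    return g0, g1

w0, w1, alpha = 0.0, 0.0, 0.1
for _ in range(5_000):
    g0, g1 = grad(w0, w1)
    norm = math.hypot(g0, g1) + 1e-12            # guard against zero gradient
    w0 -= alpha * g0 / norm                      # step of fixed length alpha
    w1 -= alpha * g1 / norm

print(f"w0 = {w0:.2f}, w1 = {w1:.2f}")
```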
[Figure: the non-convex Least Squares cost (top) compared with a convex cost (bottom), as surface and contour plots.]
[Figure: Least Squares logistic regression fit to the data (left panel), and the gradient descent path on the contour plot of the cost function (right panel).]
* If $y_n \in \{0, 1\}$ is modeled as $y_n = w_0 + w_1 x_n + \epsilon_n$, the error must take one of two values:
  $\epsilon_n = \begin{cases} 1 - (w_0 + w_1 x_n), & y_n = 1 \quad (1 = w_0 + w_1 x_n + \epsilon_n) \\ -(w_0 + w_1 x_n), & y_n = 0 \quad (0 = w_0 + w_1 x_n + \epsilon_n) \end{cases}$
* Consequently,
  * errors in this model cannot possibly be normal;
  * the error variance is not constant.
* In logistic regression, it is assumed that E(y) is related to x by the logit function:
  $E(y) = \frac{\exp(w_0 + w_1 x)}{1 + \exp(w_0 + w_1 x)} = \frac{1}{1 + \exp[-(w_0 + w_1 x)]}$ (logit response function)
* The quantity $\exp(w_0 + w_1 x)$ on the right-hand side is the odds ratio:
  $\frac{E(y)}{1 - E(y)} = \exp(w_0 + w_1 x)$ (odds ratio)
* If the odds ratio is 2 for a value of x, success is twice as likely as failure at that value.
* Estimation of the parameters $w_0$ and $w_1$ is done using maximum likelihood estimation (MLE).
$\Pr(y = 1) = E(y), \qquad \Pr(y = 0) = 1 - E(y)$
$\Pr(y) = \big( E(y) \big)^{y} \big( 1 - E(y) \big)^{1 - y}$
* Log-likelihood of logistic regression:
  $\ell(w_0, w_1 \mid y; x) = \log L(w_0, w_1 \mid y; x) = \sum_{n=1}^{N} \big[ y_n \log E(y_n) + (1 - y_n) \log\big(1 - E(y_n)\big) \big]$
* Find $w_0$, $w_1$ that maximize the log-likelihood.
[Figure: log-likelihood versus $w_0$ (left) and versus $w_1$ (right), with regions of positive and negative gradient marked.]
$\frac{\partial \ell(w_0, w_1 \mid y; x)}{\partial w_0} = \sum_{n=1}^{N} \bigg[ \frac{y_n}{E(y_n)} - \frac{1 - y_n}{1 - E(y_n)} \bigg] \frac{\partial E(y_n)}{\partial w_0}, \qquad \frac{\partial E(y_n)}{\partial w_0} = E(y_n)\big(1 - E(y_n)\big)$
which simplifies to
$\frac{\partial \ell}{\partial w_0} = \sum_{n=1}^{N} \big( y_n - E(y_n) \big), \qquad \frac{\partial \ell}{\partial w_1} = \sum_{n=1}^{N} \big( y_n - E(y_n) \big) x_n$
Gradient ascent updates:
$\begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} + \alpha \begin{bmatrix} \partial \ell(w_0, w_1 \mid y; x) / \partial w_0 \\ \partial \ell(w_0, w_1 \mid y; x) / \partial w_1 \end{bmatrix}$
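A minimal sketch of the gradient-ascent updates above, on an assumed (non-separable) toy dataset; with perfectly separable data the maximum-likelihood weights grow without bound, so overlapping classes are used here.

```python
import math

# Gradient ascent on the log-likelihood, using the derivatives above:
#   dl/dw0 = sum (y_n - E(y_n)),   dl/dw1 = sum (y_n - E(y_n)) * x_n
# Overlapping (non-separable) toy data assumed, so the MLE is finite.
x = [0.2, 0.8, 1.0, 1.2, 1.8, 2.4]
y = [0, 0, 1, 0, 1, 1]

def E(w0, w1, xn):                 # E(y) = 1 / (1 + exp(-(w0 + w1 x)))
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * xn)))

w0, w1, alpha = 0.0, 0.0, 0.1
for _ in range(10_000):
    d0 = sum(yn - E(w0, w1, xn) for xn, yn in zip(x, y))
    d1 = sum((yn - E(w0, w1, xn)) * xn for xn, yn in zip(x, y))
    w0 += alpha * d0               # ascent: move *up* the gradient
    w1 += alpha * d1

print(f"w0 = {w0:.2f}, w1 = {w1:.2f}")
```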
Example: probability of O-ring failure versus temperature. The fitted model is
$\hat{y} = \frac{1}{1 + \exp[-(10.875 - 0.17132\,x)]}$
[Figure: O-ring failure data and the fitted probability of O-ring failure versus temperature x.]
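Evaluating the fitted model at two illustrative temperatures (the temperature values are assumptions chosen for the check):

```python
import math

# Fitted O-ring model from the slide: y_hat = 1 / (1 + exp(-(10.875 - 0.17132 x)))
def p_failure(x):
    return 1.0 / (1.0 + math.exp(-(10.875 - 0.17132 * x)))

print(f"P(failure | x = 50) = {p_failure(50):.3f}")   # ~0.91: failure very likely
print(f"P(failure | x = 75) = {p_failure(75):.3f}")   # ~0.12: failure unlikely
```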
[Figure: estimated probabilities of Iris virginica and not-Iris-virginica as a function of petal width (cm), with the decision boundary. Estimated probabilities and decision boundary.]
* A logistic regression classifier, based on two features (petal width and petal length), estimates the probability that a new flower is an Iris virginica; see the sketch after this list.
[Figure: linear decision boundary in the petal length versus petal width plane.]
* The dashed line represents the points where the model estimates a 50% probability: this is
the model’s decision boundary.
* Each parallel line represents the points where the model outputs a specific probability, from
15% (bottom left) to 90% (top right).
* All the flowers beyond the top-right line have over 90% chance of being Iris virginica,
according to the model.
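A sketch of how such a classifier might be fit with scikit-learn, here using petal width as a single feature for simplicity (the two-feature version described above is analogous):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, 3:]                     # petal width (cm) as the single feature
y = (iris.target == 2).astype(int)       # 1 if Iris virginica, else 0

clf = LogisticRegression()
clf.fit(X, y)

# Estimated P(virginica) rises with petal width; ~0.5 marks the decision boundary
X_new = np.linspace(0, 3, 7).reshape(-1, 1)
print(clf.predict_proba(X_new)[:, 1].round(3))
```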
For multiple linear regression, write the cost in matrix form:
$g(w) = e^\top e = (y - Xw)^\top (y - Xw)$
Setting the partial derivatives to zero,
$\frac{\partial g(w)}{\partial w_0} = 0, \qquad \frac{\partial g(w)}{\partial w_j} = -2 \sum_{i=1}^{N} \Big( y_i - w_0 - \sum_{k=1}^{P} w_k x_{ik} \Big) x_{ij} = 0, \quad j = 1, \ldots, P$
Normal equation:
$X^\top X w = X^\top y$
Least squares estimates of the parameters:
$\hat{w} = (X^\top X)^{-1} X^\top y$
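A minimal numpy sketch of the normal-equation solution, reusing the oxygen-purity data from earlier as an assumed example:

```python
import numpy as np

# Normal-equation solution w_hat = (X^T X)^{-1} X^T y for multiple linear
# regression; X carries a leading column of ones for the bias w0.
x = np.array([0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40])
y = np.array([90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65])

X = np.column_stack([np.ones_like(x), x])     # N x (P+1) design matrix
w = np.linalg.solve(X.T @ X, X.T @ y)         # solve X^T X w = X^T y
print(w)                                      # [w0, w1], matches the closed form
```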