
CSE 422: Artificial Intelligence


Logistic Regression

Swakkhar Shatabda

BRAC University

December 2, 2024


Contents

1 Classification

2 Sigmoid

3 Cross Entropy

4 Stochastic Gradient Descent


Classification

1 In classification problems, we are given data X and labels y.
2 We aim to learn a model in which y is predicted as a function of X.
3 In classification, the label y is categorical, i.e., discrete in value.
4 For example, suppose you are given several features of a fish, such as its length, weight, whether it has eggs, and the month it was caught, and you have to predict whether it is legal to catch. This problem can be formulated as a classification problem.


Data

Here is how the data looks in a supervised setting:


features: x1, x2, x3, x4; label: y

instance no   x1 (length)   x2 (weight)   x3 (has eggs)   x4 (month)   y (legal?)
1             10            250           1               12           No
2             20            1250          0               1            Yes
3             15            750           1               2            No
...           ...           ...           ...             ...          ...
m             17            550           0               3            Yes
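For concreteness, here is the same toy data in Python, a minimal sketch using only the rows shown above (the elided rows are left out; the array names X and y are mine):

import numpy as np

# Rows 1, 2, 3 and m of the fish table above.
# Columns: x1 = length, x2 = weight, x3 = has eggs, x4 = month.
X = np.array([
    [10,  250, 1, 12],
    [20, 1250, 0,  1],
    [15,  750, 1,  2],
    [17,  550, 0,  3],
])
# Label y: 1 = legal ("Yes"), 0 = not legal ("No").
y = np.array([0, 1, 0, 1])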


Experiments
We will first try to predict the class using only two features, x1 and x2.
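The original slide shows a scatter plot of these two features here; a minimal matplotlib sketch of such a plot, assuming the toy values from the data table (the actual figure is not reproduced):

import matplotlib.pyplot as plt
import numpy as np

X = np.array([[10, 250], [20, 1250], [15, 750], [17, 550]])
y = np.array([0, 1, 0, 1])
plt.scatter(X[y == 0, 0], X[y == 0, 1], label="No (0)")   # negative class
plt.scatter(X[y == 1, 0], X[y == 1, 1], label="Yes (1)")  # positive class
plt.xlabel("x1 (length)")
plt.ylabel("x2 (weight)")
plt.legend()
plt.show()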


Logistic Regression

At first, we are going to try a linear classifier called logistic regression. We can apply logistic regression when the data is linearly separable.
The relationship will be modeled as:

y = w_0 + w_1 x_1

This is again the equation of a straight line.

We need the best line that separates the blue points from the orange ones, i.e., we need to learn w_0, w_1, ...
Can we use gradient descent here? A little trick is required!


Gradient Descent for Logistic Regression

The cost (loss) function we minimized with gradient descent:

e = \frac{1}{2} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2

This time too, the predicted label \hat{y} is a function of \vec{x} and \vec{w}.
The labels are discrete; for this binary classification there are two labels, 0 (no, or negative) and 1 (yes, or positive).
Now we try to define \hat{y} with the help of the weights, or coefficients, of the line.


Linear Classification

This linear classifier divides instances based on their location with respect to the line: positive on the right, negative on the left.
Any point on the line satisfies the equation exactly. A point on the right, such as (3, 1), yields a positive value, and a point on the left, such as (1, 1), yields a negative value.
Based on this we can define a linear classifier.

Linear Classification

The following function will help us make the decision:

f(\vec{x}) = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n

LinearClassifier(\vec{x})

    if f(\vec{x}) > 0, i.e., w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n > 0
        return 1
    else
        return 0

This simple classifier just checks whether a point is on the left or the right of the line.
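A minimal Python sketch of such a classifier, assuming the weights are already known. The separating line x1 = 2 (w0 = -2, w1 = 1, w2 = 0) is a made-up example, chosen so that the slide's two test points fall on opposite sides:

import numpy as np

def linear_classifier(x, w):
    # f(x) = w0 + w1*x1 + ... + wn*xn; return 1 on the positive side.
    f = w[0] + np.dot(w[1:], x)
    return 1 if f > 0 else 0

w = np.array([-2.0, 1.0, 0.0])                 # hypothetical line x1 - 2 = 0
print(linear_classifier(np.array([3, 1]), w))  # (3, 1) is on the right -> 1
print(linear_classifier(np.array([1, 1]), w))  # (1, 1) is on the left  -> 0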


A step function!

Alas! This is not a continuous function and thus not differentiable. We can't calculate gradients! We need to find an alternative!


A sigmoid function!

\sigma(z) = \frac{1}{1 + \exp(-z)}, where z = f(\vec{x})

Good things about the sigmoid!

1 It is continuous and differentiable.
2 \sigma'(z) = \sigma(z)(1 - \sigma(z))
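A small NumPy sketch of the sigmoid and its derivative (the clipping is a safeguard added here against overflow in exp; it is not part of the formula):

import numpy as np

def sigmoid(z):
    z = np.clip(z, -500, 500)      # avoid overflow for very negative z
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)           # sigma'(z) = sigma(z) * (1 - sigma(z))

print(sigmoid(0.0))                # 0.5
print(sigmoid_prime(0.0))          # 0.25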

Let's go back to the loss function now.


A new loss function - Cross-entropy

Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1.

e = \sum_{i=1}^{m} \left( -y^{(i)} \log \hat{y}^{(i)} - (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right)


Cross Entropy Loss Function

How does it work?

e = \sum_{i=1}^{m} \left( -y^{(i)} \log \hat{y}^{(i)} - (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right)

When y^{(i)} = 1, only the -\log \hat{y}^{(i)} term is active: it is near 0 when \hat{y}^{(i)} is close to 1 and grows without bound as \hat{y}^{(i)} approaches 0. The case y^{(i)} = 0 is symmetric, penalizing \hat{y}^{(i)} close to 1.
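A sketch of the loss in Python; the epsilon clip is an added safeguard against log(0), not part of the formula:

import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # Sum over instances of -y*log(y_hat) - (1 - y)*log(1 - y_hat).
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return np.sum(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0])
print(cross_entropy(y, np.array([0.9, 0.2])))  # confident and right: small loss
print(cross_entropy(y, np.array([0.1, 0.8])))  # confident and wrong: much larger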


Cross Entropy Loss Function


How to find the gradient? Let's try! We use \hat{y}^{(i)} = \sigma(z^{(i)}) with z^{(i)} = w_0 + w_1 x_1^{(i)} + \cdots + w_n x_n^{(i)}, so that \partial z^{(i)} / \partial w_0 = 1:

\frac{\partial e}{\partial w_0}
  = \frac{\partial}{\partial w_0} \sum_{i=1}^{m} \left( -y^{(i)} \log \hat{y}^{(i)} - (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right)
  = \sum_{i=1}^{m} \left( -y^{(i)} \frac{1}{\hat{y}^{(i)}} \hat{y}^{(i)} (1 - \hat{y}^{(i)}) \cdot 1 - (1 - y^{(i)}) \frac{1}{1 - \hat{y}^{(i)}} (-1) \hat{y}^{(i)} (1 - \hat{y}^{(i)}) \cdot 1 \right)
  = \sum_{i=1}^{m} \left( -y^{(i)} + y^{(i)} \hat{y}^{(i)} + \hat{y}^{(i)} - y^{(i)} \hat{y}^{(i)} \right) \cdot 1
  = \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) \cdot 1    (1)

In a similar way,

\frac{\partial e}{\partial w_j} = \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x_j^{(i)}    (2)

Now the same gradient descent will work!
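The result can be sanity-checked numerically. This sketch (the data and names are made up purely for the check) compares the analytic gradient X^T (ŷ − y) against a central finite-difference estimate of the loss:

import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # x0 = 1 bias column
y = rng.integers(0, 2, size=m).astype(float)
w = rng.normal(size=n + 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w):
    p = sigmoid(X @ w)
    return np.sum(-y * np.log(p) - (1 - y) * np.log(1 - p))

analytic = X.T @ (sigmoid(X @ w) - y)          # equations (1) and (2)

h = 1e-6
numeric = np.array([
    (loss(w + h * np.eye(n + 1)[j]) - loss(w - h * np.eye(n + 1)[j])) / (2 * h)
    for j in range(n + 1)
])
print(np.max(np.abs(analytic - numeric)))      # tiny (~1e-8): the derivation checks out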


Comments on Gradient Descent

1 It is slow when the dataset is too large!
2 Rather than learning from the whole dataset at once, it is possible to learn in chunks!
3 What if we process only a single item at each iteration?
4 Let's have another look!


Gradient Descent Algorithm

GradientDescent(X, y, α, maxIter)

    for i = 1 to m
        x0(i) = 1                              // add the bias feature to every instance
    initialize w0, w1, ..., wn randomly
    for iter = 1 to maxIter
        for j = 0 to n
            slopej = 0
        for i = 1 to m
            ŷ = σ(w0 + w1 x1(i) + w2 x2(i) + ... + wn xn(i))
            e = ŷ − y(i)
            for j = 0 to n
                slopej = slopej + e × xj(i)
        for j = 0 to n
            wj = wj − α × slopej
    return w0, w1, ..., wn
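A runnable NumPy version of this algorithm, as a sketch (the function and variable names are mine); the sigmoid is applied to the linear combination, matching the derivation above:

import numpy as np

def gradient_descent(X, y, alpha=0.01, max_iter=1000):
    # Batch gradient descent for logistic regression.
    # X: (m, n) feature matrix, y: (m,) labels in {0, 1}.
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])           # prepend x0 = 1 for w0
    w = np.random.default_rng(0).normal(size=n + 1)
    for _ in range(max_iter):
        y_hat = 1.0 / (1.0 + np.exp(-(Xb @ w)))    # sigmoid of the linear part
        slope = Xb.T @ (y_hat - y)                 # gradient, eqs. (1)-(2)
        w = w - alpha * slope
    return w

In practice, scaling the features first helps here: a large raw feature (like the fish weight) would otherwise dominate the gradient.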


Lighter Gradient Descent Algorithm

LighterGradientDescent(X, y, α, maxIter)

    for i = 1 to m
        x0(i) = 1
    initialize w0, w1, ..., wn randomly
    for iter = 1 to maxIter
        i = ((iter − 1) mod m) + 1             // one instance per iteration, cycling through the data
        ŷ = σ(w0 + w1 x1(i) + w2 x2(i) + ... + wn xn(i))
        e = ŷ − y(i)
        for j = 0 to n
            slopej = e × xj(i)
        for j = 0 to n
            wj = wj − α × slopej
    return w0, w1, ..., wn

Processing a single instance per update is exactly stochastic gradient descent.
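A matching sketch of the stochastic variant (again with made-up names): the only change from the batch version is that each iteration updates from a single instance, so one update costs O(n) instead of O(mn).

import numpy as np

def lighter_gradient_descent(X, y, alpha=0.01, max_iter=1000):
    # Stochastic gradient descent: one instance per iteration.
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])
    w = np.random.default_rng(0).normal(size=n + 1)
    for it in range(max_iter):
        i = it % m                                  # cycle through the data
        y_hat = 1.0 / (1.0 + np.exp(-(Xb[i] @ w)))
        w = w - alpha * (y_hat - y[i]) * Xb[i]      # update from one example
    return w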


That's it!

Thank you

