Binary Logistic Regression 2
Matt Gormley
Lecture 10
Feb. 17, 2020
1
Reminders
• Midterm Exam 1
– Tue, Feb. 18, 7:00pm – 9:00pm
• Homework 4: Logistic Regression
– Out: Wed, Feb. 19
– Due: Fri, Feb. 28 at 11:59pm
• Today’s In-Class Poll
– http://p10.mlcourse.org
• Reading on Probabilistic Learning is reused
later in the course for MLE/MAP
3
MLE
Suppose we have data D = {x^(i)}_{i=1}^N

Principle of Maximum Likelihood Estimation:
Choose the parameters that maximize the likelihood of the data.

MLE:  θ_MLE = argmax_θ ∏_{i=1}^N p(x^(i) | θ)

MAP:  θ_MAP = argmax_θ ∏_{i=1}^N p(x^(i) | θ) p(θ)
Maximum Likelihood Estimate (MLE)
[Figures: the likelihood L(θ) plotted over a single parameter and as a surface L(θ1, θ2) over two parameters, each maximized at θ_MLE]
5
MLE
What does maximizing likelihood accomplish?
• There is only a finite amount of probability
mass (i.e. sum-to-one constraint)
• MLE tries to allocate as much probability
mass as possible to the things we have
observed…
6
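As a concrete (made-up) illustration of this point, consider a Bernoulli model of coin flips: the MLE is the empirical fraction of heads, which is exactly the parameter that puts as much probability mass as possible on what was observed. A minimal Python sketch with hypothetical data:

import numpy as np

# Hypothetical data: 10 coin flips, 7 heads (1) and 3 tails (0).
x = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])

def log_likelihood(phi, x):
    # log p(x | phi) for i.i.d. Bernoulli(phi) observations
    return np.sum(x * np.log(phi) + (1 - x) * np.log(1 - phi))

# Closed-form MLE: the fraction of observed heads.
phi_mle = x.mean()

# Numerical check: the closed form matches the grid-search maximizer.
grid = np.linspace(0.01, 0.99, 99)
phi_grid = grid[np.argmax([log_likelihood(p, x) for p in grid])]
print(phi_mle, phi_grid)   # both approximately 0.7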
MOTIVATION:
LOGISTIC REGRESSION
7
Example: Image Classification
• ImageNet LSVRC-2010 contest:
– Dataset: 1.2 million labeled images, 1000 classes
– Task: Given a new image, label it with the correct class
– Multiclass classification problem
• Examples from http://image-net.org/
10
[Slides 11–13: example images from ImageNet]
Example: Image Classification
CNN for Image Classification
(Krizhevsky, Sutskever & Hinton, 2012)
17.5% error on ImageNet LSVRC-2010 contest
• Input: image (pixels)
• Five convolutional layers (w/ max-pooling)
• Three fully connected layers
• Output: 1000-way softmax
[Figure 2 from the paper: "An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities…"]
14
LOGISTIC REGRESSION
16
Logistic Regression
Data: Inputs are continuous vectors of length M. Outputs
are discrete.
We are back to
classification.
17
Recall…
Linear Models for Classification
Key idea: Try to learn this hyperplane directly
Half-spaces:
Using gradient ascent for linear
classifiers
Key idea behind today’s lecture:
1. Define a linear classifier (logistic regression)
2. Define an objective function (likelihood)
3. Optimize it with gradient descent to learn
parameters
4. Predict the class with highest probability under
the model
20
Using gradient ascent for linear
classifiers
This decision function isn't differentiable:
    h(x) = sign(θ^T x)
Use a differentiable function instead:
    p_θ(y = 1 | x) = 1 / (1 + exp(−θ^T x))
where the logistic function is
    logistic(u) ≡ 1 / (1 + e^(−u))
[Plots: the step function sign(x) and the smooth logistic curve]
21
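A minimal sketch of these two functions in code (the names are mine, not from the slides; a bias term can be handled by appending a constant 1 feature to x):

import numpy as np

def logistic(u):
    # logistic(u) = 1 / (1 + e^(-u))
    return 1.0 / (1.0 + np.exp(-u))

def predict_proba(theta, x):
    # p_theta(y = 1 | x) = logistic(theta^T x)
    return logistic(np.dot(theta, x))

def predict_label(theta, x):
    # Predict the class with the highest probability under the model.
    return 1 if predict_proba(theta, x) >= 0.5 else 0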
Logistic Regression
Data: Inputs are continuous vectors of length M. Outputs
are discrete.
24
Learning for Logistic Regression
Whiteboard
– Partial derivative for Logistic Regression
– Gradient for Logistic Regression
25
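The whiteboard derivation is not reproduced in the slides. As a hedged sketch: for the negative log conditional likelihood objective (defined a few slides below) with y^(i) ∈ {0, 1}, the partial derivatives take the standard form ∂J(θ)/∂θ_m = Σ_i (logistic(θ^T x^(i)) − y^(i)) x_m^(i), which vectorizes as follows:

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def gradient(theta, X, y):
    # X: N x M design matrix; y: length-N vector of {0,1} labels.
    # Gradient of the negative log conditional likelihood:
    #   dJ/dtheta = X^T (sigmoid(X theta) - y)
    return X.T @ (sigmoid(X @ theta) - y)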
LOGISTIC REGRESSION ON
GAUSSIAN DATA
26
Logistic Regression
[Slides 27–29: example plots of logistic regression fit to Gaussian data]
LEARNING LOGISTIC REGRESSION
30
Maximum Conditional
Likelihood Estimation
Learning: finds the parameters that minimize some
objective function.
θ* = argmin_θ J(θ)

We minimize the negative log conditional likelihood:

J(θ) = −∑_{i=1}^N log p_θ(y^(i) | x^(i))
Why?
1. We can’t maximize likelihood (as in Naïve Bayes)
because we don’t have a joint model p(x,y)
2. It worked well for Linear Regression (least squares is
MCLE)
31
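A minimal sketch of this objective in code (assuming y^(i) ∈ {0, 1} and the logistic parameterization from earlier; no numerical safeguards, for illustration only):

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def neg_log_conditional_likelihood(theta, X, y):
    # J(theta) = - sum_i log p_theta(y^(i) | x^(i))
    p = sigmoid(X @ theta)   # p_theta(y = 1 | x^(i)) for each example i
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))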
Maximum Conditional
Likelihood Estimation
Learning: Four approaches to solving θ* = argmin_θ J(θ)
32
Maximum Conditional
Likelihood Estimation
Learning: Four approaches to solving θ* = argmin_θ J(θ)
Answer:
At each step (i.e. iteration) of SGD for Logistic Regression we…
A. (1) compute the gradient of the log-likelihood for all examples (2) update all
the parameters using the gradient
B. (1) ask Matt for a description of SGD for Logistic Regression, (2) write it down,
(3) report that answer
C. (1) compute the gradient of the log-likelihood for all examples (2) randomly
pick an example (3) update only the parameters for that example
D. (1) randomly pick a parameter, (2) compute the partial derivative of the log-
likelihood with respect to that parameter, (3) update that parameter for all
examples
E. (1) randomly pick an example, (2) compute the gradient of the log-likelihood
for that example, (3) update all the parameters using that gradient
F. (1) randomly pick a parameter and an example, (2) compute the gradient of
the log-likelihood for that example with respect to that parameter, (3) update
that parameter using that gradient
34
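For reference, a hedged sketch of a single SGD step for binary logistic regression (hypothetical names; rng is a numpy Generator such as np.random.default_rng(0), lr is the learning rate):

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def sgd_step(theta, X, y, lr, rng):
    # (1) randomly pick a training example
    i = rng.integers(len(y))
    x_i, y_i = X[i], y[i]
    # (2) compute the gradient of the negative log-likelihood for that example
    g = (sigmoid(x_i @ theta) - y_i) * x_i
    # (3) update all the parameters using that gradient
    return theta - lr * g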
Recall…
Gradient Descent
Algorithm 1 Gradient Descent
1: procedure GD(D, θ^(0))
2:     θ ← θ^(0)
3:     while not converged do
4:         θ ← θ − λ ∇_θ J(θ)
5:     return θ

In order to apply GD to Logistic Regression all we need is the gradient of the objective function (i.e. vector of partial derivatives):

∇_θ J(θ) = [ ∂J(θ)/∂θ_1, ∂J(θ)/∂θ_2, …, ∂J(θ)/∂θ_M ]^T
35
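A minimal sketch of this procedure in code (hypothetical names grad_J, lr; the norm-based convergence test is one common choice, not the only one):

import numpy as np

def gradient_descent(grad_J, theta0, lr=0.1, max_iters=1000, tol=1e-6):
    # Repeatedly step in the direction of the negative gradient of J.
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iters):
        g = grad_J(theta)
        theta = theta - lr * g
        if np.linalg.norm(g) < tol:   # "not converged" check
            break
    return theta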
Recall…
Stochastic Gradient Descent (SGD)
37
Matching Game
Goal: Match the Algorithm to its Update Rule
[Matching table partially lost in extraction; surviving fragments: "3. Perceptron", h_θ(x) = sign(θ^T x), and update rule "6.": θ_k ← θ_k + (h_θ(x^(i)) − y^(i)) x_k^(i)]
39
Mini-Batch SGD
• Gradient Descent:
Compute true gradient exactly from all N
examples
• Stochastic Gradient Descent (SGD):
Approximate true gradient by the gradient
of one randomly chosen example
• Mini-Batch SGD:
Approximate true gradient by the average
gradient of K randomly chosen examples
40
Mini-Batch SGD
41
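A minimal sketch of mini-batch SGD for binary logistic regression under this description (hypothetical names; K is the batch size, lr the learning rate):

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def minibatch_sgd(X, y, K=16, lr=0.1, epochs=10, seed=0):
    # Approximate the true gradient by the average gradient of K random examples.
    rng = np.random.default_rng(seed)
    N, M = X.shape
    theta = np.zeros(M)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(N), max(1, N // K)):
            Xb, yb = X[idx], y[idx]
            g = Xb.T @ (sigmoid(Xb @ theta) - yb) / len(idx)
            theta -= lr * g
    return theta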
Summary
1. Discriminative classifiers directly model the
conditional, p(y|x)
2. Logistic regression is a simple linear classifier that retains a probabilistic semantics
3. Parameters in LR are learned by iterative
optimization (e.g. SGD)
50
Logistic Regression Objectives
You should be able to…
• Apply the principle of maximum likelihood estimation (MLE) to
learn the parameters of a probabilistic model
• Given a discriminative probabilistic model, derive the conditional
log-likelihood, its gradient, and the corresponding Bayes
Classifier
• Explain the practical reasons why we work with the log of the
likelihood
• Implement logistic regression for binary or multiclass
classification
• Prove that the decision boundary of binary logistic regression is
linear
• For linear regression, show that the parameters which minimize
squared error are equivalent to those that maximize conditional
likelihood
51
MULTINOMIAL LOGISTIC
REGRESSION
54
55
Multinomial Logistic Regression
Chalkboard
– Background: Multinomial distribution
– Definition: Multi-class classification
– Geometric intuitions
– Multinomial logistic regression model
– Generative story
– Reduction to binary logistic regression
– Partial derivatives and gradients
– Applying Gradient Descent and SGD
– Implementation w/ sparse features
56
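The chalkboard derivation is not reproduced in the slides. As a hedged sketch, the model is commonly written p_θ(y = k | x) = exp(θ_k^T x) / Σ_j exp(θ_j^T x); in code (hypothetical names, Theta a K x M matrix of per-class parameter vectors):

import numpy as np

def softmax(scores):
    # Numerically stable softmax over the K class scores.
    scores = scores - np.max(scores)
    e = np.exp(scores)
    return e / e.sum()

def predict_proba(Theta, x):
    # p_theta(y = k | x) = exp(theta_k^T x) / sum_j exp(theta_j^T x)
    return softmax(Theta @ x)

def predict_label(Theta, x):
    # Predict the class with the highest probability under the model.
    return int(np.argmax(predict_proba(Theta, x)))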
Debug that Program!
In-Class Exercise: Think-Pair-Share
Debug the following program which is (incorrectly)
attempting to run SGD for multinomial logistic regression
Buggy Program:
while not converged:
    for i in shuffle([1,…,N]):
        for k in [1,…,K]:
            theta[k] = theta[k] - lambda * grad(x[i], y[i], theta, k)
57
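The exercise itself is left to the reader. For comparison only, a self-contained sketch of SGD for multinomial logistic regression that a debugged program could be checked against (all names, defaults, and conventions here are hypothetical, not the official solution):

import numpy as np

def softmax(scores):
    scores = scores - np.max(scores)
    e = np.exp(scores)
    return e / e.sum()

def sgd_multinomial_lr(X, y, K, lr=0.1, epochs=10, seed=0):
    # X: N x M features; y: length-N labels in {0, ..., K-1}.
    rng = np.random.default_rng(seed)
    N, M = X.shape
    Theta = np.zeros((K, M))
    for _ in range(epochs):
        for i in rng.permutation(N):
            p = softmax(Theta @ X[i])        # p_theta(y = k | x^(i)) for all k
            p[y[i]] -= 1.0                   # gradient of -log p_theta(y^(i) | x^(i)) w.r.t. the scores
            Theta -= lr * np.outer(p, X[i])  # update every class's parameter vector
    return Theta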