Logistic Regression (Probability Concepts) and Perceptron

The document discusses classification algorithms, particularly focusing on binary classification, where outcomes are discrete values, as in disease diagnosis or spam detection. It explains the limitations of applying linear regression to classification problems and introduces the logistic function as a better alternative for predicting probabilities between 0 and 1. It also outlines the process of fitting parameters using maximum likelihood estimation and gradient ascent for logistic regression, contrasting this with the perceptron algorithm for binary outputs.

Classification Algorithms
Recall that:
Regression problems are those where the variables you're trying to predict are continuous values.
In classification problems, the values you're trying to predict are discrete; they take on only a small number of values. Here I'll talk about binary classification, where the output takes on only two values.
Classification
One classification problem is medical diagnosis: based on some features, decide whether the patient has a disease or does not have a disease.
Or, in the housing example, maybe you're trying to decide whether this house will be sold in the next six months, and the answer is either yes or no.
Another example: if you want to build a spam filter, is this e-mail spam or not? It's yes or no.
So it's a yes-or-no answer.
So there's X, there's Y, and Y ∈ {0, 1}.

[Figure: training examples for a binary classification problem; inputs to the left of a point q have y = 0 and inputs to the right have y = 1.]
One thing you could do is take linear regression, as we've described it so far, and apply it to this problem.
Given this data set you can fit a straight line to it, and maybe you get a straight line that makes this an easy classification problem.
So you apply linear regression to this data set, you get a reasonable fit, and you can then take your linear regression hypothesis (the straight line) and threshold it at 0.5.

[Figure: the same data with a straight line fit by linear regression; the line crosses 0.5 near the mid-point q.]
If you do that, you'll certainly get the right answer: you predict Y = 1 if X is to the right of the mid-point q, and Y = 0 if X is to the left of that mid-point.
So some people actually apply linear regression to classification problems, and sometimes it will work well, but in general it's a bad idea to apply linear regression to classification problems like these.
Here is why: let's say I change the training set by giving you just one more training example, all the way off to the right.

[Figure: the same data with one additional positive example far to the right of q.]
With this training set it is still entirely obvious what the relationship between X and Y is: take the value q, and if X is greater than q then Y = 1, while if X is less than q then Y = 0.
Adding this extra training example really shouldn't change anything; there's no surprise that it corresponds to Y = 1.
But if you now fit linear regression to this data set, you end up with a quite different line.
The predictions of your hypothesis have changed completely if you threshold the hypothesis at 0.5.
What is the relationship between X and Y now?

[Figure: linear regression re-fit to the data including the far-right example; the line is tilted, so thresholding it at 0.5 no longer splits the data at q.]
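As a small illustration, here is a minimal NumPy sketch of this effect. The one-dimensional data set and the position of the extra example are made up for the illustration; the point is only that the thresholded least-squares line changes its predictions when one far-away positive example is added.

```python
import numpy as np

def fit_least_squares(x, y):
    """Fit y ≈ theta0 + theta1 * x by ordinary least squares."""
    X = np.column_stack([np.ones_like(x), x])        # add an intercept column
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def predict_class(theta, x):
    """Threshold the linear hypothesis at 0.5 to get a 0/1 prediction."""
    return (theta[0] + theta[1] * x >= 0.5).astype(int)

# Five negative examples (x = 1..5) and five positive ones (x = 6..10).
x = np.arange(1, 11, dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

theta = fit_least_squares(x, y)
print(predict_class(theta, x))          # [0 0 0 0 0 1 1 1 1 1]: every label is recovered

# Add one more, obviously positive, example far to the right.
x_extra = np.append(x, 50.0)
y_extra = np.append(y, 1.0)
theta_extra = fit_least_squares(x_extra, y_extra)
print(predict_class(theta_extra, x))    # the line tilts and the prediction for x = 6 flips to 0
```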
So what shall we do?
If Y ∈ {0, 1}, let's start by changing the form of our hypothesis so that the hypothesis always lies in the unit interval between 0 and 1:
hϴ(x) ∈ [0, 1]
If I know Y is either 0 or 1, then at least the hypothesis shouldn't predict values much larger than 1 or much smaller than 0.
And so, instead of choosing a linear function for the hypothesis, we are going to choose this function:
hϴ(x) = g(ϴᵀx),  where  g(z) = 1 / (1 + e^(−z))

So,
hϴ(x) = g(ϴᵀx) = 1 / (1 + e^(−ϴᵀx))
This g is the sigmoid function (also called the logistic function).
[Figure: the sigmoid function g(z); it approaches 0 as z becomes very negative, crosses 0.5 at z = 0, and approaches 1 as z becomes very large.]
So g(z) tends towards 0 as z becomes very negative, tends towards 1 as z becomes very large, and crosses the vertical axis at 0.5.
As z → −∞, g(z) → 0
As z → +∞, g(z) → 1
So this is the sigmoid function, also called the logistic function, and the values output by my hypothesis will always be between 0 and 1.
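As a quick sketch of these properties, here is the sigmoid in NumPy (the particular test values are arbitrary):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(-10.0))   # ~0.000045, close to 0
print(sigmoid(0.0))     # exactly 0.5
print(sigmoid(10.0))    # ~0.999955, close to 1
```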
Furthermore, just like we did for linear regression, I'm going to endow the outputs of the hypothesis with a probabilistic interpretation. I'm going to assume that the probability that y = 1, given x and parameterized by ϴ, is
P(y = 1 | x; ϴ) = hϴ(x)
So, in other words, we imagine that the hypothesis is outputting numbers that lie between zero and one, and we think of the hypothesis as trying to estimate the probability that y = 1.
And because y has to be either 0 or 1, the probability that y equals zero is
P(y = 0 | x; ϴ) = 1 − hϴ(x)
We can take the previous two equations and write them more compactly as:
P(y | x; ϴ) = (hϴ(x))^y (1 − hϴ(x))^(1−y)
Given this model and the data, how do I fit the parameters ϴ of my model?
The likelihood of the parameters is, as before, just the probability of the data:
L(ϴ) = P(Y | X; ϴ) = ∏_{i=1}^{m} P(y^(i) | x^(i); ϴ)
Plugging the previous compact form into this equation yields:
L(ϴ) = ∏_{i=1}^{m} (hϴ(x^(i)))^{y^(i)} (1 − hϴ(x^(i)))^{1 − y^(i)}

So, as before, let's say we want to find a maximum likelihood estimate of the parameters ϴ.
It turns out that, when you work through the derivations, it is often much easier to maximize the log of the likelihood rather than the likelihood itself:
l(ϴ) = log L(ϴ) = Σ_{i=1}^{m} [ y^(i) log hϴ(x^(i)) + (1 − y^(i)) log(1 − hϴ(x^(i))) ]

And so, to fit the parameters ϴ of our model, we'll find the value of ϴ that maximizes this log likelihood.
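As a small sketch, this is what the log likelihood looks like in NumPy, assuming each row of X already contains the intercept term as its first entry (the data below are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    """l(theta) = sum_i [ y^(i) log h(x^(i)) + (1 - y^(i)) log(1 - h(x^(i))) ]."""
    h = sigmoid(X @ theta)                     # h_theta(x^(i)) for every training example
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

X = np.array([[1.0, 0.5],
              [1.0, 2.0],
              [1.0, 3.5]])                     # first column is the intercept term
y = np.array([0.0, 0.0, 1.0])

print(log_likelihood(np.zeros(2), X, y))       # 3 * log(0.5) ≈ -2.079 when theta = 0
```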
So, to maximize this function, we can actually apply the same gradient algorithm that we learned earlier, which was the first algorithm we used to minimize the quadratic error function J(ϴ); we can use essentially the same algorithm to maximize the log likelihood.
That algorithm repeatedly takes the value of ϴ and replaces it with the previous value of ϴ plus a learning rate α times the gradient of the objective function:
ϴ := ϴ + α ∇ϴ l(ϴ)
One small change: previously we were trying to minimize the quadratic error term, whereas today we're trying to maximize rather than minimize, so rather than a minus sign we have a plus sign.
So this is gradient ascent rather than gradient descent, used for maximization rather than minimization, but it's really the same algorithm.
What you need to do is compute the partial derivatives of your objective function with respect to each of your parameters ϴ_j. If you take the derivatives and work through the algebra, it turns out the update simplifies down to this formula:
ϴ_j := ϴ_j + α Σ_{i=1}^{m} (y^(i) − hϴ(x^(i))) x_j^(i)
We end up with exactly the same learning rule as for least squares regression.
So, is this the same learning algorithm as the least squares regression we declared before to be a bad idea for classification problems?
Actually, it is not the same: in logistic regression, hϴ(x) is no longer the linear function ϴᵀx.
hϴ(x) = g(ϴᵀx)
It is the logistic function applied to ϴᵀx, so this is actually a totally different learning algorithm, even though the update rule looks identical. It is, though, one of the most elegant of the generalized learning models.
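Putting the pieces together, here is a minimal batch gradient ascent sketch for logistic regression in NumPy. The toy data set, learning rate, and iteration count are illustrative choices, not values from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_ascent(X, y, alpha=0.005, n_iters=20000):
    """Batch gradient ascent on the log likelihood l(theta).

    Implements theta_j := theta_j + alpha * sum_i (y^(i) - h_theta(x^(i))) * x_j^(i).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ theta)              # h_theta(x^(i)) for every training example
        theta += alpha * (X.T @ (y - h))    # gradient of the log likelihood
    return theta

# Toy one-dimensional data: an intercept column plus one feature.
x = np.array([0.0, 1.0, 2.0, 8.0, 9.0, 10.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

theta = logistic_gradient_ascent(X, y)
print(sigmoid(X @ theta))   # probabilities near 0 for the first three inputs, near 1 for the last three
```

Note the plus sign in the update: because we are maximizing the log likelihood rather than minimizing a squared error, this is gradient ascent, exactly as in the formula above.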
Perceptron Algorithm
What if you want to force g(z) to output a value that is either exactly 0 or exactly 1, rather than using the logistic function, which outputs values in between 0 and 1?
The perceptron algorithm defines g(z) to be the step function:
g(z) = 1 if z ≥ 0, and g(z) = 0 otherwise
[Figure: the step function g(z), which jumps from 0 to 1 at z = 0.]
and hϴ(x) = g(ϴᵀx).
The learning rule is:
ϴ_j := ϴ_j + α Σ_{i=1}^{m} (y^(i) − hϴ(x^(i))) x_j^(i)
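For comparison, here is the same kind of sketch for the perceptron, reusing the toy data from the previous example (again an illustrative choice, not data from the lecture):

```python
import numpy as np

def step(z):
    """Perceptron threshold function: g(z) = 1 if z >= 0, else 0."""
    return (np.asarray(z) >= 0).astype(float)

def perceptron_train(X, y, alpha=0.1, n_iters=100):
    """Batch perceptron updates: theta_j := theta_j + alpha * sum_i (y^(i) - h_theta(x^(i))) * x_j^(i)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = step(X @ theta)
        theta += alpha * (X.T @ (y - h))
    return theta

x = np.array([0.0, 1.0, 2.0, 8.0, 9.0, 10.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

theta = perceptron_train(X, y)
print(step(X @ theta))   # [0. 0. 0. 1. 1. 1.] -- the hard 0/1 predictions match the labels here
```

The only difference from the logistic regression sketch is the choice of g: a hard threshold instead of the sigmoid, so the hypothesis outputs exactly 0 or 1.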
Recall that:
In the logistic regression model, to find the value of ϴ that maximizes the log likelihood, gradient ascent (or, equivalently, gradient descent on the negative log likelihood) is a perfectly fine algorithm to use.
P(y = 1 | x; ϴ) = hϴ(x)
hϴ(x) = g(ϴᵀx) = 1 / (1 + e^(−ϴᵀx))
l(ϴ) = log L(ϴ) = Σ_{i=1}^{m} [ y^(i) log hϴ(x^(i)) + (1 − y^(i)) log(1 − hϴ(x^(i))) ]
ϴ_j := ϴ_j + α (y^(i) − hϴ(x^(i))) x_j^(i)   (the update for a single training example)
