Learning XOR - Gradient Based Learning - Hidden Units

Perceptrons

[Figure: a perceptron with inputs x₁, x₂, …, x_D, a bias input fixed at 1, weights w₁, w₂, …, w_D, a bias weight b, and output z = h(b + wᵀx).]
• A perceptron is a function that maps D-dimensional vectors x = [x₁, x₂, …, x_D]ᵀ to real numbers.
• For notational convenience, we add an extra input, called the bias input. The bias input is always equal to 1.
• b is called the bias weight. It is optimized during training.
• w₁, …, w_D are also weights that are optimized during training.
Perceptrons

[Figure: the same perceptron diagram, with output z = h(b + wᵀx).]
• A perceptron computes its output in two steps:
  – Step 1: a = b + wᵀx
  – Step 2: z = h(a)
• h is called an activation function.
• For example, h could be the sigmoid function σ(a) = 1 / (1 + e⁻ᵃ).
Perceptrons

[Figure: the same perceptron diagram.]
• A perceptron computes its output in two steps:
  – Step 1: a = b + wᵀx
  – Step 2: z = h(a)
• In a single formula: z = h(b + wᵀx)
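To make the two steps concrete, here is a minimal Python sketch of a perceptron's forward computation; the function and variable names are our own illustration, not notation from the slides.

```python
import numpy as np

def perceptron_output(b, w, x, h):
    """Compute a perceptron's output z = h(b + w^T x) in two steps."""
    a = b + np.dot(w, x)   # Step 1: weighted sum of inputs plus bias weight
    return h(a)            # Step 2: apply the activation function h
```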
Notation for Bias Weight

[Figure: the perceptron diagram with the bias input labeled x₀ = 1 and its weight labeled w₀, so that the output is z = h(wᵀx).]
• There is an alternative representation that we will not use, where b is denoted as w₀, the bias input is denoted as x₀ = 1, and the weight vector is w = [w₀, w₁, …, w_D]ᵀ.
• Then, instead of writing z = h(b + wᵀx), we can simply write z = h(wᵀx).
• In our slides, we will denote the bias weight as b and treat it separately from the other weights. That will make life easier later.
Perceptrons and Neurons

[Figure: the same perceptron diagram.]
• Perceptrons are inspired by neurons.
  – Neurons are the cells that form the nervous system and the brain.
  – Neurons somehow sum up their inputs, and if the sum exceeds a threshold, they "fire".
• Since brains are "intelligent", computer scientists have been hoping that perceptron-based systems can be used to model intelligence.
Activation Functions

• A perceptron produces output z = h(b + wᵀx).
• One choice for the activation function h: the step function, which outputs 1 if its input is positive and 0 otherwise.
• The step function is useful for providing some intuitive examples.
• It is not useful for actual real-world systems.
  – Reason: it is not differentiable, so it does not allow optimization via gradient descent.
Activation Functions

• A perceptron produces output z = h(b + wᵀx).
• Another choice for the activation function h: the sigmoidal function σ(a) = 1 / (1 + e⁻ᵃ).
• The sigmoidal function is often used in real-world systems.
• It is a differentiable function, so it allows use of gradient descent.
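As an aside, here is a small Python sketch of both activation functions, together with the sigmoid's derivative (the quantity gradient descent needs); this is our own illustration, not code from the slides.

```python
import numpy as np

def step(a):
    # Step function: 1 if the input is positive, 0 otherwise.
    # Not differentiable at 0, and has zero gradient everywhere else.
    return 1.0 if a > 0 else 0.0

def sigmoid(a):
    # Sigmoid: sigma(a) = 1 / (1 + e^(-a)), a smooth "soft step".
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_derivative(a):
    # sigma'(a) = sigma(a) * (1 - sigma(a)); defined everywhere,
    # which is what makes gradient-based optimization possible.
    s = sigmoid(a)
    return s * (1.0 - s)
```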
Example: The AND Perceptron
• Suppose we use the step function for activation.
• Suppose boolean value false is represented as number 0.
• Suppose boolean value true is represented as number 1.
• Then, the perceptron below computes the boolean AND function:

false AND false = false
false AND true = false
true AND false = false
true AND true = true

[Figure: a perceptron with inputs x₁ and x₂, bias weight b = -1.5, weights w₁ = 1 and w₂ = 1, and output z = h(b + wᵀx).]
Example: The AND Perceptron
• Verification: If x₁ = 0 and x₂ = 0:
  – z = h(-1.5 + 1·0 + 1·0) = h(-1.5) = 0.
• Corresponds to case false AND false = false.

[Figure: the same AND perceptron (b = -1.5, w₁ = 1, w₂ = 1).]
Example: The AND Perceptron
• Verification: If x₁ = 0 and x₂ = 1:
  – z = h(-1.5 + 1·0 + 1·1) = h(-0.5) = 0.
• Corresponds to case false AND true = false.

[Figure: the same AND perceptron.]
Example: The AND Perceptron
• Verification: If x₁ = 1 and x₂ = 0:
  – z = h(-1.5 + 1·1 + 1·0) = h(-0.5) = 0.
• Corresponds to case true AND false = false.

[Figure: the same AND perceptron.]
Example: The AND Perceptron
• Verification: If x₁ = 1 and x₂ = 1:
  – z = h(-1.5 + 1·1 + 1·1) = h(0.5) = 1.
• Corresponds to case true AND true = true.

[Figure: the same AND perceptron.]
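The four verification steps above can be checked mechanically. Here is a short sketch (helper names are our own) that evaluates the AND perceptron on all four input combinations:

```python
def step(a):
    # Step activation: 1 if the weighted sum is positive, 0 otherwise.
    return 1 if a > 0 else 0

def perceptron(b, w, x):
    # z = h(b + w^T x), with h being the step function.
    return step(b + sum(wi * xi for wi, xi in zip(w, x)))

# AND perceptron: b = -1.5, w1 = w2 = 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', perceptron(-1.5, [1, 1], [x1, x2]))
# Output: 0 0 -> 0, 0 1 -> 0, 1 0 -> 0, 1 1 -> 1 (the AND truth table).
```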
Example: The OR Perceptron
• Suppose we use the step function for activation.
• Suppose boolean value false is represented as number 0.
• Suppose boolean value true is represented as number 1.
• Then, the perceptron below computes the boolean OR function:

false OR false = false
false OR true = true
true OR false = true
true OR true = true

[Figure: a perceptron with inputs x₁ and x₂, bias weight b = -0.5, weights w₁ = 1 and w₂ = 1, and output z = h(b + wᵀx).]
Example: The OR Perceptron
• Verification: If x₁ = 0 and x₂ = 0:
  – z = h(-0.5 + 1·0 + 1·0) = h(-0.5) = 0.
• Corresponds to case false OR false = false.

[Figure: the same OR perceptron (b = -0.5, w₁ = 1, w₂ = 1).]
Example: The OR Perceptron
• Verification: If x₁ = 0 and x₂ = 1:
  – z = h(-0.5 + 1·0 + 1·1) = h(0.5) = 1.
• Corresponds to case false OR true = true.

[Figure: the same OR perceptron.]
Example: The OR Perceptron
• Verification: If x₁ = 1 and x₂ = 0:
  – z = h(-0.5 + 1·1 + 1·0) = h(0.5) = 1.
• Corresponds to case true OR false = true.

[Figure: the same OR perceptron.]
Example: The OR Perceptron
• Verification: If x₁ = 1 and x₂ = 1:
  – z = h(-0.5 + 1·1 + 1·1) = h(1.5) = 1.
• Corresponds to case true OR true = true.

[Figure: the same OR perceptron.]
Example: The NOT Perceptron
• Suppose we use the step function for activation.
• Suppose boolean value false is represented as number 0.
• Suppose boolean value true is represented as number 1.
• Then, the perceptron below computes the boolean NOT function:

NOT(false) = true
NOT(true) = false

[Figure: a perceptron with a single input x₁, bias weight b = 0.5, weight w₁ = -1, and output z = h(b + wᵀx).]
Example: The NOT Perceptron
• Verification: If x₁ = 0:
  – z = h(0.5 + (-1)·0) = h(0.5) = 1.
• Corresponds to case NOT(false) = true.

[Figure: the same NOT perceptron (b = 0.5, w₁ = -1).]
Example: The NOT Perceptron
• Verification: If x₁ = 1:
  – z = h(0.5 + (-1)·1) = h(-0.5) = 0.
• Corresponds to case NOT(true) = false.

[Figure: the same NOT perceptron.]
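The OR and NOT perceptrons can be verified the same way, reusing the perceptron helper sketched after the AND example:

```python
# OR perceptron: b = -0.5, w1 = w2 = 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', perceptron(-0.5, [1, 1], [x1, x2]))
# Output: 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 1 (the OR truth table).

# NOT perceptron: b = 0.5, w1 = -1, single input.
for x1 in (0, 1):
    print(x1, '->', perceptron(0.5, [-1], [x1]))
# Output: 0 -> 1, 1 -> 0 (the NOT truth table).
```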
The XOR Function

false XOR false = false
false XOR true = true
true XOR false = true
true XOR true = false

[Figure: the four input points of the XOR function plotted in the plane, colored by output value.]

• As before, we represent false with 0 and true with 1.
• The figure shows the four input points of the XOR function.
  – Red corresponds to output value true.
  – Green corresponds to output value false.
• The two classes (true and false) are not linearly separable.
• Therefore, no perceptron can compute the XOR function.
Our First Neural Network: XOR
• A neural network is built using perceptrons as building blocks.
• The inputs to some perceptrons are outputs of other perceptrons.
• Here is an example neural network computing the XOR function.

[Figure: the XOR network. Input units 1,1 and 1,2 feed hidden units 2,1 and 2,2, which feed output unit 3,1. The weights are: for Unit 2,1, w₂,₁,₁ = 1, w₂,₁,₂ = 1, b₂,₁ = -0.5; for Unit 2,2, w₂,₂,₁ = 1, w₂,₂,₂ = 1, b₂,₂ = -1.5; for Unit 3,1, w₃,₁,₁ = 1, w₃,₁,₂ = -1, b₃,₁ = -0.5.]
Our First Neural Network: XOR
• Terminology: inputs and perceptrons are all called “units”.
• Units are grouped in layers: layer 1 (input), layer 2, layer 3 (output).
• The input layer just represents the inputs to the network.
– There are two inputs: x₁ and x₂.

[Figure: the same XOR network.]
Our First Neural Network: XOR
• Such networks are called layered networks, more details later.
• Each unit is indexed by two numbers (layer index, unit index).
• Each bias weight is indexed by the same two numbers as its unit.
• Each weight is indexed by three numbers (layer, unit, weight).
[Figure: the same XOR network.]
Our First Neural Network: XOR
• Note: every weight is associated with two units: it connects the
output of a unit with an input of another unit.
– Which of the two units do we use to index the weight?

[Figure: the same XOR network.]
Our First Neural Network: XOR
• To index a weight w, we use the layer number and unit number of the unit for which w is an incoming weight.
• Weights incoming to unit l,n are indexed as wₗ,ₙ,ⱼ, where j ranges from 1 to the number of incoming weights for unit l,n.

[Figure: the same XOR network.]
Our First Neural Network: XOR
• Weights incoming to unit l,n are indexed as wₗ,ₙ,ⱼ, where j ranges from 1 to the number of incoming weights for unit l,n.
• Since the input layer (which is layer 1) has no incoming weights, there are no weights indexed as w₁,ₙ,ⱼ.

[Figure: the same XOR network.]
Our First Neural Network: XOR
• The XOR network shows how individual perceptrons can be
combined to perform more complicated functions.

[Figure: the same XOR network, annotated: Unit 2,1 computes logical OR, Unit 2,2 computes logical AND, and output Unit 3,1 computes A AND (NOT B), where A is the output of the OR unit and B is the output of the AND unit.]
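Combining the three perceptrons gives a direct implementation of the network. The sketch below uses the weights from the figure and the perceptron helper defined after the AND example; the function name is our own.

```python
def xor_network(x1, x2):
    # Hidden layer (layer 2).
    a = perceptron(-0.5, [1, 1], [x1, x2])    # Unit 2,1: logical OR
    b = perceptron(-1.5, [1, 1], [x1, x2])    # Unit 2,2: logical AND
    # Output layer (layer 3): A AND (NOT B).
    return perceptron(-0.5, [1, -1], [a, b])  # Unit 3,1
```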
Computing the Output: An Example
• Suppose that x₁ = 0 and x₂ = 1 (corresponding to false XOR true).
• For Unit 2,1, which performs a logical OR:
  – The output is h(-0.5 + 1·0 + 1·1) = h(0.5).
  – Assuming that h is the step function, h(0.5) = 1, so Unit 2,1 outputs 1.

[Figure: the XOR network evaluated on inputs 0 and 1.]
Computing the Output: An Example
• Suppose that x₁ = 0 and x₂ = 1 (corresponding to false XOR true).
• For Unit 2,2, which performs a logical AND:
  – The output is h(-1.5 + 1·0 + 1·1) = h(-0.5).
  – Since h is the step function, h(-0.5) = 0, so Unit 2,2 outputs 0.

[Figure: the XOR network evaluated on inputs 0 and 1.]
Computing the Output: An Example
• Suppose that x₁ = 0 and x₂ = 1 (corresponding to false XOR true).
• Unit 3,1 is the output unit, computing the A AND (NOT B) function:
  – One input is the output of the OR unit, which is 1.
  – The other input is the output of the AND unit, which is 0.

[Figure: the XOR network evaluated on inputs 0 and 1.]
Computing the Output: An Example
• Suppose that x₁ = 0 and x₂ = 1 (corresponding to false XOR true).
• For the output unit (computing the A AND (NOT B) function):
  – The output is h(-0.5 + 1·1 + (-1)·0) = h(0.5).
  – Since h is the step function, h(0.5) = 1, so Unit 3,1 outputs 1.

[Figure: the XOR network evaluated on inputs 0 and 1, producing output 1.]
Verifying the XOR Network
• We can follow the same process to compute the output of this
network for the other three cases.
– Here we consider the case where x₁ = 0 and x₂ = 0 (corresponding to false XOR false).
  – The output is 0, as it should be.

[Figure: the XOR network on inputs 0 and 0: the OR unit outputs 0, the AND unit outputs 0, and the output unit computes h(-0.5 + 1·0 - 1·0) = 0.]
Verifying the XOR Network
• We can follow the same process to compute the output of this
network for the other three cases.
– Here we consider the case where x₁ = 1 and x₂ = 0 (corresponding to true XOR false).
  – The output is 1, as it should be.

[Figure: the XOR network on inputs 1 and 0: the OR unit outputs 1, the AND unit outputs 0, and the output unit computes h(-0.5 + 1·1 - 1·0) = 1.]
Verifying the XOR Network
• We can follow the same process to compute the output of this
network for the other three cases.
– Here we consider the case where x₁ = 1 and x₂ = 1 (corresponding to true XOR true).
  – The output is 0, as it should be.

[Figure: the XOR network on inputs 1 and 1: the OR unit outputs 1, the AND unit outputs 1, and the output unit computes h(-0.5 + 1·1 - 1·1) = 0.]
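Running the xor_network sketch from earlier on all four cases reproduces these verifications:

```python
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', xor_network(x1, x2))
# Output: 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0 (the XOR truth table).
```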
Neural Networks
• Our XOR neural network consists of five units:
– Two input units, which just represent the two inputs to the network.
– Three perceptrons.

[Figure: the same XOR network.]
Neural Network Layers
• Oftentimes, as in the XOR example, neural networks are organized into layers.
• The input layer is the initial layer of input units (units 1,1 and 1,2 in our
example).
• The output layer is at the end (unit 3,1 in our example).
• Zero, one or more hidden layers can be between the input and output layers.

[Figure: the same XOR network.]
Neural Network Layers
• There is only one hidden layer in our example, containing units 2,1 and 2,2.
• Each hidden layer's inputs are outputs from the previous layer.
• Each hidden layer's outputs are inputs to the next layer.
• The first hidden layer's inputs come from the input layer.
• The last hidden layer's outputs are inputs to the output layer.
[Figure: the same XOR network.]
Feedforward Networks
• Feedforward networks are networks where there are no directed loops.
• If there are no loops, the output of a unit cannot (directly or indirectly)
influence its input.
• While there are varieties of neural networks that are not feedforward or
layered, our main focus will be layered feedforward networks.

[Figure: the same XOR network.]
Computing the Output
• Notation: L is the number of layers. Layer 1 is the input layer, layer L is the output layer.
• The outputs of the units of layer 1 are simply the inputs to the network.
• For l = 2, …, L:
  – Compute the outputs of layer l, given the outputs of layer l - 1.

[Figure: the same XOR network.]
Computing the Output
• To compute the outputs of layer l (where l > 1), we simply need to compute the output of each perceptron belonging to layer l.
  – For each such perceptron, its inputs come from outputs of units at layer l - 1, which we have already computed.
  – Remember, we compute layer outputs in increasing order of l.

[Figure: the same XOR network.]
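This layer-by-layer procedure can be written generically. Below is a minimal sketch, under our own assumption that each layer is represented by a weight matrix and a bias vector; applied to the XOR weights, it reproduces the network above.

```python
import numpy as np

def step(a):
    # Elementwise step activation.
    return (a > 0).astype(float)

def forward(x, weights, biases, h=step):
    # Outputs of layer 1 are just the network inputs.
    z = np.asarray(x, dtype=float)
    # For l = 2, ..., L: compute layer l's outputs from layer l-1's outputs.
    for W, b in zip(weights, biases):
        z = h(b + W @ z)
    return z

# The XOR network: a hidden layer with the OR and AND units, then the output unit.
weights = [np.array([[1.0, 1.0], [1.0, 1.0]]),   # rows: units 2,1 and 2,2
           np.array([[1.0, -1.0]])]              # unit 3,1
biases = [np.array([-0.5, -1.5]), np.array([-0.5])]
print(forward([0, 1], weights, biases))  # [1.]: false XOR true = true
```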
What Neural Networks Can Compute
• An individual perceptron is a linear classifier.
– The weights of the perceptron define a linear boundary between two classes.
• Layered feedforward neural networks with one hidden layer can approximate any continuous function to arbitrary accuracy, given enough hidden units.
• Layered feedforward neural networks with two hidden layers can approximate any mathematical function.
• This has been known for decades, and is one reason scientists have been optimistic about the potential of neural networks to model intelligent systems.
• Another reason is the analogy between neural networks and biological brains, which have long been a standard of intelligence we are still trying to achieve.
• There is only one catch: How do we find the right weights?
Finding the Right Weights
• The goal of training a neural network is to figure out
good values for the weights of the units in the
network.

