
CHAPTER 7 | Neural Networks and Neural Language Models

“[M]achines of this character can behave in a very complicated manner when the number of units is large.”
                Alan Turing (1948) “Intelligent Machinery”, page 6

Introduction

They are called neural because:

 Their origins lie in the McCulloch-Pitts neuron
 A simplified model of the human neuron as a kind of computing element
 Described in terms of propositional logic
Introduction
A neural network is a
  Network of small computing units
  Each of which takes a vector of input values
  And produces a single output value
  Called a feedforward network
 • Because the computation proceeds iteratively from one layer of units to the next
Unit
 Takes a weighted sum of its inputs
 Plus one additional bias term b

        z = b + Σ_i w_i x_i

 Using vector notation:

        z = w · x + b
 [w: the weight vector, b: a scalar bias, x: the input vector]
Activation

 Apply a non-linear function f to z
 The output of this function is the activation value, a

        y = a = f(z)

 Different activation functions:
 • Sigmoid
 • Tanh
 • Rectified linear unit (ReLU)
Sigmoid
 Maps the output into the range (0, 1)

        σ(z) = 1 / (1 + e^(−z))

 The output of a neural unit:

        y = σ(w · x + b) = 1 / (1 + exp(−(w · x + b)))

 Example:
     weight vector w = [0.2, 0.3, 0.9], bias b = 0.5, and input x = [0.5, 0.6, 0.1]
     z = w · x + b = 0.1 + 0.18 + 0.09 + 0.5 = 0.87, so y = σ(0.87) ≈ 0.70

Sigmoid

 Used in the output layer for binary classification

 Disadvantage:
 • Its output is not zero-centered
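
A minimal sketch of the worked example above in Python (assuming NumPy is available; the numbers are the ones given on the slide):

    import numpy as np

    def sigmoid(z):
        # squashes any real value into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    w = np.array([0.2, 0.3, 0.9])    # weight vector from the slide
    b = 0.5                          # bias
    x = np.array([0.5, 0.6, 0.1])    # input vector

    z = np.dot(w, x) + b             # weighted sum plus bias: 0.87
    y = sigmoid(z)                   # activation value: about 0.70
    print(z, y)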
Tanh

        tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z))

 Advantage: the mean of the activations is closer to zero
ReLU (rectified linear unit)

        y = ReLU(z) = max(z, 0)

 Advantages:
 • Avoids the vanishing gradient problem
 • Does not saturate for large positive values of z
Vanishing gradient:
A network is trained by
 Propagating an error signal backwards through the layers

Now, with saturating activations such as sigmoid or tanh,
 Gradients that are almost 0 cause the error signal to become too small to be useful for training
 This is called the vanishing gradient problem
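
A small illustrative check (not from the slides): the sigmoid derivative σ(z)(1 − σ(z)) shrinks toward 0 as z grows, while the ReLU derivative stays at 1 for any positive z.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for z in [0.0, 2.0, 5.0, 10.0]:
        s = sigmoid(z)
        sigmoid_grad = s * (1.0 - s)        # derivative of sigmoid at z
        relu_grad = 1.0 if z > 0 else 0.0   # derivative of ReLU at z (taken as 0 at z = 0)
        print(z, round(sigmoid_grad, 5), relu_grad)
    # the sigmoid gradient falls from 0.25 toward 0; the ReLU gradient stays 1 for z > 0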
The XOR problem
Can neural units compute simple functions of input?
AND              OR               XOR
x1  x2  y        x1  x2  y        x1  x2  y
 0   0  0         0   0  0         0   0  0
 0   1  0         0   1  1         0   1  1
 1   0  0         1   0  1         1   0  1
 1   1  1         1   1  1         1   1  0
Perceptrons

A very simple neural unit
• Binary output (0 or 1)
• No non-linear activation function
Easy to build AND or OR with perceptrons

AND (for example, with weights w1 = w2 = 1 and bias b = −1; the unit outputs 1 if w·x + b > 0, otherwise 0):

x1  x2    w·x + b    output
 0   0      −1         0
 0   1       0         0
 1   0       0         0
 1   1       1         1

OR (for example, with weights w1 = w2 = 1 and bias b = 0):

x1  x2    w·x + b    output
 0   0       0         0
 0   1       1         1
 1   0       1         1
 1   1       2         1
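
A minimal sketch of these two perceptrons in Python (assuming NumPy; the unit weights of 1 are the conventional choice rather than values stated explicitly on the slides):

    import numpy as np

    def perceptron(w, b, x):
        # binary threshold unit: no non-linear activation beyond the step
        return 1 if np.dot(w, x) + b > 0 else 0

    w = np.array([1.0, 1.0])
    for x1 in (0, 1):
        for x2 in (0, 1):
            x = np.array([x1, x2])
            print(x1, x2,
                  "AND:", perceptron(w, -1.0, x),   # bias -1 implements AND
                  "OR:",  perceptron(w,  0.0, x))   # bias 0 implements OR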
Not possible to capture XOR with perceptrons!

 The perceptron equation, given inputs x1 and x2, is the equation of a line:

        w1·x1 + w2·x2 + b = 0

 In standard linear form: x2 = (−w1/w2)·x1 + (−b/w2)

 This line acts as a decision boundary
 • Output 0 if the input is on one side of the line
 • Output 1 if it is on the other side
Decision boundaries
[Figure: decision boundaries in the (x1, x2) plane for a) AND, b) OR, c) XOR; a single line separates the classes for AND and OR, but no such line exists for XOR (marked with a “?”).]
 Filled circles represent inputs for which the perceptron should output 1,
 white circles, inputs for which it should output 0
 There is no way to draw a single line that correctly separates the two categories for XOR
The solution: neural networks
 XOR can be computed by a layered network of units

 For example, by two layers of ReLU-based units
 • The middle (hidden) layer, called h, has two units
 • The output layer, called y, has one unit
The solution: neural networks
        0                       0
                           0
                 0
                                       0  input x = [0, 0]
                                  0                                           In hidden layer,  [0, -1]
        0                        0        -1 =>[0]  After Relu  h layer as
                                  0  [0, 0]
                                 -1  Final output 0
The solution: neural networks
        0                        0
                           1
                 1
                                       1  input x = [0, 1]
                                  0                                           In hidden layer,  [1, 0]
        1                        0                 0   After Relu  h layer as
                                  1  [1, 0]
                                 -1  Final output 1
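
A minimal sketch of this two-layer ReLU network in Python; the specific weight values below are the standard textbook solution for XOR (the slides show only the resulting hidden values and outputs, so treat the weights themselves as an assumption):

    import numpy as np

    def relu(z):
        return np.maximum(z, 0)

    # Hidden-layer weights/bias and output-layer weights for the XOR network
    W = np.array([[1.0, 1.0],
                  [1.0, 1.0]])
    b = np.array([0.0, -1.0])
    U = np.array([1.0, -2.0])

    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        x = np.array(x, dtype=float)
        h = relu(W @ x + b)   # hidden representation
        y = U @ h             # single output unit
        print(x, h, y)
    # prints h = [0,0], [1,0], [1,0], [2,1] and y = 0, 1, 1, 0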
The hidden representation h
 The two inputs x = [0, 1] and x = [1, 0] are merged into the same hidden representation h = [1, 0]
 This merger makes it easy to linearly separate the positive and negative cases of XOR
Feedforward Neural Networks
 The simplest kind of neural network
 A multilayer network
 Units are connected with no cycles
 Outputs from units in each layer are passed to units in the next higher layer
 No outputs are passed back to lower layers
 Sometimes called multi-layer perceptrons (MLPs)
Feedforward Neural Networks
 Feedforward networks have three kinds of nodes
 • Input units, hidden units, and output units
 Layers are fully-connected
 • Each unit takes as input the outputs of all units in the previous layer
 • There is a link between every pair of units from two adjacent layers
Hidden layer computation
Three steps: multiplying the weight matrix W by the input vector x, adding the
bias vector b, and applying the activation function g (here the sigmoid σ)

        h = σ(Wx + b)

Where the activation function is applied elementwise:
        g([z1, z2, z3]) = [g(z1), g(z2), g(z3)]
Output layer computation
 Weight matrix U, with U ∈ ℝ^(n2×n1)
 Input vector: the hidden layer h
 Intermediate output z, with z ∈ ℝ^n2

        z = Uh

 z is a vector of real-valued numbers
 It can't be the output of the classifier as-is; it must first be normalized into a probability distribution
Softmax
 Normalizes a vector of real values into a vector that encodes a probability distribution
 • Each value lies between 0 and 1
 • The values sum to 1
 Used for multiclass classification

        softmax(z_i) = exp(z_i) / Σ_j exp(z_j)
Final Equation

        h = σ(Wx + b)
        z = Uh
        y = softmax(z)

Where x ∈ ℝ^n0, h ∈ ℝ^n1, b ∈ ℝ^n1, W ∈ ℝ^(n1×n0), U ∈ ℝ^(n2×n1), and the output vector y ∈ ℝ^n2
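
Putting the three equations together, a minimal NumPy sketch of a full forward pass; the layer sizes n0 = 3, n1 = 4, n2 = 2 and the random weights are illustrative placeholders, not values from the slides:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        # subtract the max for numerical stability before exponentiating
        e = np.exp(z - np.max(z))
        return e / e.sum()

    rng = np.random.default_rng(0)
    n0, n1, n2 = 3, 4, 2            # input, hidden, and output sizes (illustrative)

    W = rng.normal(size=(n1, n0))   # hidden-layer weights, W in R^(n1 x n0)
    b = rng.normal(size=n1)         # hidden-layer bias,    b in R^n1
    U = rng.normal(size=(n2, n1))   # output-layer weights, U in R^(n2 x n1)

    x = np.array([0.5, 0.6, 0.1])   # input vector x in R^n0
    h = sigmoid(W @ x + b)          # h = sigmoid(Wx + b)
    z = U @ h                       # z = Uh
    y = softmax(z)                  # y = softmax(z), a probability distribution

    print(h, z, y, y.sum())         # y.sum() is 1.0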
