Neural Networks
1. Introduction:
1.1 History:
The history of artificial neural networks is filled with colorful, creative individuals from
a variety of fields, many of whom struggled for decades to develop concepts that we now take
for granted.
The modern view of neural networks began in the 1940s with the work of Warren
McCulloch and Walter Pitts, who showed that networks of artificial neurons could, in principle,
compute any arithmetic or logical function. Their work is often acknowledged as the origin of
the neural network field.
The first practical application of artificial neural networks came in the late 1950s, with
the invention of the perceptron network and associated learning rule by Frank Rosenblatt.
Rosenblatt and his colleagues built a perceptron network and demonstrated its ability to
perform pattern recognition. At about the same time, Bernard Widrow and Ted Hoff introduced
a new learning algorithm and used it to train adaptive linear neural networks, which were
similar in structure and capability to Rosenblatt’s perceptron. The Widrow-Hoff learning rule
is still in use today.
Interest in neural networks had faltered during the late 1960s because of the lack of new
ideas and powerful computers with which to experiment. During the 1980s both of these
impediments were overcome, and research in neural networks increased dramatically. New
personal computers and workstations, which rapidly grew in capability, became widely
available. In addition, important new concepts were introduced.
Two new concepts were most responsible for the rebirth of neural networks. The first
was the use of statistical mechanics to explain the operation of a certain class of recurrent
network, which could be used as an associative memory.
The second key development of the 1980s was the backpropagation algorithm for
training multilayer perceptron networks, which was discovered independently by several
different researchers.
Many of the advances in neural networks have had to do with new concepts, such as
innovative architectures and training rules. Just as important has been the availability of
powerful new computers on which to test these new concepts.
1.2 Applications:
Google uses neural networks for image tagging (automatically identifying an image and
assigning keywords), and Microsoft has developed neural networks that can help convert
spoken English speech into spoken Chinese speech. These examples are indicative of the broad
range of applications that can be found for neural networks. The applications are expanding
because neural networks are good at solving problems, not just in engineering, science and
mathematics, but in medicine, business, finance and literature as well. Their application to a
wide variety of problems in many fields makes them very attractive. Also, faster computers
and faster algorithms have made it possible to use neural networks to solve complex industrial
problems that formerly required too much computation.
The following list gives some neural network applications:
1. Aerospace :
High performance aircraft autopilots, flight path simulations, aircraft control
systems, autopilot enhancements, aircraft component simulations, aircraft
component fault detectors.
2. Automotive:
Automobile automatic guidance systems, fuel injector control, automatic braking
systems, misfire detection, virtual emission sensors, warranty activity analyzers.
3. Banking:
Check and other document readers, credit application evaluators, cash forecasting,
firm classification, exchange rate forecasting, predicting loan recovery rates,
measuring credit risk.
4. Defense:
Weapon steering, target tracking, object discrimination, facial recognition, new kinds
of sensors, sonar, radar and image signal processing including data compression,
feature extraction and noise suppression, signal/image identification.
5. Electronics:
Code sequence prediction, integrated circuit chip layout, process control, chip failure
analysis, machine vision, voice synthesis, nonlinear modeling.
6. Entertainment:
Animation, special effects, market forecasting.
7. Financial:
Real estate appraisal, loan advisor, mortgage screening, corporate bond rating, credit
line use analysis, portfolio trading program, corporate financial analysis, currency
price prediction.
The axon is a single long fiber that carries the signal from the cell body out to other neurons. The point of
contact between an axon of one cell and a dendrite of another cell is called a synapse. It is the
arrangement of neurons and the strengths of the individual synapses, determined by a complex
chemical process, that establishes the function of the neural network. Figure 1.1 is a simplified
schematic diagram of two biological neurons.
The sum 𝑤𝑝 + 𝑏, referred to as the net input 𝑛, goes into a transfer function (activation function) 𝑓, which produces the scalar neuron output 𝑎.
If we relate this simple model back to the biological neuron that we discussed in section
1.3, the weight 𝑤 corresponds to the strength of a synapse, the cell body is represented by the
summation and the transfer function, and the neuron output 𝑎 represents the signal on the axon.
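To make this single-input neuron concrete, here is a minimal Python sketch of the computation 𝑎 = 𝑓(𝑤𝑝 + 𝑏); the numeric values and the choice of a log-sigmoid transfer function are illustrative, not taken from the lectures.

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function: 1 / (1 + e^(-n))."""
    return 1.0 / (1.0 + np.exp(-n))

def single_neuron(p, w, b, f=logsig):
    """Single-input neuron: net input n = w*p + b, output a = f(n)."""
    n = w * p + b
    return f(n)

# Illustrative values: weight 3, bias -1.5, input 1.0
print(single_neuron(1.0, 3.0, -1.5))   # logsig(1.5) ≈ 0.8176
```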
1. Hard limit (threshold) transfer function:
$f(n) = \begin{cases} 0 & n < 0 \\ 1 & n \ge 0 \end{cases}$ ………. (2.1)
This function will be used to create neurons that classify inputs into two distinct categories.
The log-sigmoid transfer function is commonly used in multilayer networks that are
trained using the backpropagation algorithm.
4. Hyperbolic tangent transfer function:
The hyperbolic tangent is a sigmoid function and is defined by
$f(n) = \frac{e^{n} - e^{-n}}{e^{n} + e^{-n}}$ ………….. (2.4)
Since $\frac{\tanh(n/2) + 1}{2} = \frac{1}{1 + e^{-n}}$, using the tanh function instead of the logistic one is equivalent. The tanh function has the advantage of being symmetrical with respect to the origin.
5. Radial Basis transfer function:
$f(n) = e^{-n^{2}}$ ………….. (2.5)
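These transfer functions can be sketched in a few lines of Python; the function names (hardlim, logsig, tansig, radbas) follow common usage and are assumptions, not names used in the lectures.

```python
import numpy as np

def hardlim(n):
    """Hard limit (Eq. 2.1): 0 for n < 0, 1 for n >= 0."""
    return np.where(n < 0, 0.0, 1.0)

def logsig(n):
    """Log-sigmoid: 1 / (1 + e^(-n)), output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def tansig(n):
    """Hyperbolic tangent (Eq. 2.4): (e^n - e^(-n)) / (e^n + e^(-n))."""
    return np.tanh(n)

def radbas(n):
    """Radial basis (Eq. 2.5): e^(-n^2)."""
    return np.exp(-n ** 2)

n = np.array([-1.8, 0.0, 1.5])
for f in (hardlim, logsig, tansig, radbas):
    print(f.__name__, f(n))
```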
Example 2.2:
Let 𝑤 = 4, 𝑝 = 2 and 𝑏 = −2 with 𝑓 the radial basis transfer function. What is the single neuron output?
$f(n) = e^{-n^{2}}$
$a = e^{-(4 \cdot 2 + (-2))^{2}} = e^{-(6)^{2}} \approx 2.31952 \times 10^{-16}$
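A quick check of this arithmetic in Python:

```python
import numpy as np

# Example 2.2: n = w*p + b = 4*2 - 2 = 6, a = exp(-n^2)
w, p, b = 4.0, 2.0, -2.0
n = w * p + b
print(np.exp(-n ** 2))   # ≈ 2.31952e-16
```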
As noted previously, the row indices of the elements of matrix 𝐖 indicate the
destination neuron associated with that weight, while the column indices indicate the source of
the input for that weight. Thus, the indices in 𝑤3,2 say that this weight represents the connection
to the third neuron from the second source.
Fortunately, the S-neuron, R-input, one-layer network also can be drawn in abbreviated notation, as shown in Figure 2.10.
As shown, there are 𝑅 inputs, 𝑆¹ neurons in the first layer, 𝑆² neurons in the second layer, etc. As noted, different layers can have different numbers of neurons.
The outputs of layers one and two are the inputs for layers two and three. Thus layer 2 can be viewed as a one-layer network with 𝑅 = 𝑆¹ inputs, 𝑆 = 𝑆² neurons, and an 𝑆² × 𝑆¹ weight matrix 𝐖². The input to layer 2 is 𝐚¹, and the output is 𝐚².
A layer whose output is the network output is called an output layer. The other layers
are called hidden layers. The network shown above has an output layer (layer 3) and two hidden
layers (layers 1 and 2).
The same three-layer network discussed previously also can be drawn using our
abbreviated notation, as shown in Figure 2.12.
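The layer-by-layer computation can be written as a short sketch; the layer sizes and random weights below are purely illustrative, and a log-sigmoid transfer function is assumed for every layer.

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def forward(p, layers):
    """Propagate the input through a list of (W, b, f) layers:
    a0 = p, then a(m+1) = f(W a(m) + b) for each layer."""
    a = p
    for W, b, f in layers:
        a = f(W @ a + b)
    return a

# Illustrative three-layer network: R = 3 inputs, S1 = 4, S2 = 3, S3 = 2 neurons
rng = np.random.default_rng(0)
layers = [
    (rng.standard_normal((4, 3)), rng.standard_normal(4), logsig),  # W1 is S1 x R
    (rng.standard_normal((3, 4)), rng.standard_normal(3), logsig),  # W2 is S2 x S1
    (rng.standard_normal((2, 3)), rng.standard_normal(2), logsig),  # W3 is S3 x S2
]
print(forward(np.array([1.0, -0.5, 2.0]), layers))
```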
Thus the output is the input delayed by one time step. (This assumes that time is updated
in discrete steps and takes on only integer values.) Eq. (2.6) requires that the output be
initialized at time = 0 . This initial condition is indicated in Figure 2.13 by the arrow coming
into the bottom of the delay block.
Another related building block, which we will use for the continuous-time recurrent networks, is the integrator, shown in Figure 2.14. The integrator output 𝐚(𝑡) is computed from its input 𝐮(𝑡) according to
$\mathbf{a}(t) = \int_{0}^{t} \mathbf{u}(\tau)\, d\tau + \mathbf{a}(0)$ ……….. (2.7)
𝐚(1) = 𝐬𝐚𝐭𝐥𝐢𝐧𝐬(𝐖𝐚(0) + 𝐛)
𝐚(2) = 𝐬𝐚𝐭𝐥𝐢𝐧𝐬(𝐖𝐚(1) + 𝐛)
⋮
Figure 2.15 Recurrent Network
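A minimal sketch of this recurrence, with an illustrative 2-neuron weight matrix and bias; satlins is the symmetric saturating linear function that clips its input to [−1, 1].

```python
import numpy as np

def satlins(n):
    """Symmetric saturating linear transfer function: clip n to [-1, 1]."""
    return np.clip(n, -1.0, 1.0)

def run_recurrent(W, b, a0, steps):
    """Iterate a(t) = satlins(W a(t-1) + b) from the initial condition a(0)."""
    a = a0
    for _ in range(steps):
        a = satlins(W @ a + b)
    return a

# Illustrative 2-neuron recurrent network
W = np.array([[0.5, 0.2], [0.1, 0.4]])
b = np.array([0.1, -0.1])
print(run_recurrent(W, b, np.array([1.0, -1.0]), steps=5))
```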
Examples:
Example 3.1:
Given a two-input neuron with the following parameters: 𝑏 = 1.2 , 𝐖 = [3 2] and
𝐩 = [−5 6]T , calculate the neuron output for the following transfer functions:
1. A symmetrical hard limit transfer function.
2. A saturating linear transfer function.
3. A hyperbolic tangent sigmoid transfer function
Answer:
First calculate the net input :
$n = \mathbf{Wp} + b = \begin{bmatrix} 3 & 2 \end{bmatrix}\begin{bmatrix} -5 \\ 6 \end{bmatrix} + 1.2 = -1.8$
Now find the outputs for each of the transfer functions.
1. $f(n) = \begin{cases} -1 & n < 0 \\ 1 & n \ge 0 \end{cases}$, so $a = f(-1.8) = -1$
2. $f(n) = \begin{cases} 0 & n < 0 \\ n & 0 \le n \le 1 \\ 1 & n > 1 \end{cases}$, so $a = f(-1.8) = 0$
3. $f(n) = \frac{e^{n} - e^{-n}}{e^{n} + e^{-n}}$, so $a = f(-1.8) = -0.9468$
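The three answers can be checked with a few lines of NumPy (the function names here are just labels for the three transfer functions):

```python
import numpy as np

W = np.array([[3.0, 2.0]])
p = np.array([-5.0, 6.0])
b = 1.2
n = (W @ p).item() + b                                 # net input: -1.8

hardlims = lambda n: -1.0 if n < 0 else 1.0            # symmetrical hard limit
satlin = lambda n: 0.0 if n < 0 else (n if n <= 1 else 1.0)  # saturating linear
tansig = lambda n: np.tanh(n)                          # hyperbolic tangent sigmoid

print(n, hardlims(n), satlin(n), tansig(n))            # -1.8, -1, 0, -0.9468...
```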
Example 3.2:
A single-layer neural network is to have six inputs and two outputs. The outputs are to
be limited to, and continuous over, the range 0 to 1. What can you tell about the network
architecture? Specifically:
1. How many neurons are required?
2. What are the dimensions of the weight matrix?
3. What kind of transfer functions could be used?
4. Is a bias required?
Answer:
1. Two neurons, one for each output, are required.
2. The weight matrix has two rows corresponding to the two neurons and six columns
corresponding to the six inputs. (The product 𝐖𝐩 is a two-element vector).
3. The logistic (log-sigmoid) transfer function would be most appropriate.
4. Not enough information is given to determine if a bias is required.
3.1.1 Classification:
Classification is the process of classifying input into groups. For example, an insurance
company may want to classify insurance applications into different risk categories, or an online
organization may want its email system to classify incoming mail into groups of spam and non-
spam messages.
Often, the neural network is trained by presenting it with a sample group of data and
instructions as to which group each data element belongs. This allows the neural network to
learn the characteristics that may indicate group membership.
3.1.2 Prediction
Prediction is another common application for neural networks. Given a time-based series
of input data, a neural network will predict future values. The accuracy of the prediction
depends on many factors, such as the quantity and relevance of the input data. For
example, neural networks are commonly applied to problems involving predicting movements
in financial markets.
3.1.3 Pattern Recognition
Pattern recognition is one of the most common uses for neural networks. Pattern recognition is a form of classification: the ability to recognize a pattern even when it is distorted. Consider the following everyday use of pattern recognition.
Every person who holds a driver’s license should be able to accurately identify a traffic
light. This is an extremely critical pattern recognition procedure carried out by countless drivers
every day. However, not every traffic light looks the same, and the appearance of a particular
traffic light can be altered depending on the time of day or the season. In addition, many
variations of the traffic light exist. Still, recognizing a traffic light is not a hard task for a human
driver.
How hard is it to write a computer program that accepts an image and tells you if it is a
traffic light? Without the use of neural networks, this could be a very complex task. Most
common programming algorithms are quickly exhausted when presented with a complex
pattern recognition problem.
3.1.4 Optimization
Another common use for neural networks is optimization. Optimization can be applied
to many different problems for which an optimal solution is sought. The neural network may
not always find the optimal solution; rather, it seeks to find an acceptable solution.
Optimization problems include circuit board assembly, resource allocation, and many
others.
Perhaps one of the most well-known optimization problems is the traveling salesman
problem (TSP). A salesman must visit a set number of cities. He would like to visit all cities
and travel the fewest number of miles possible. With only a few cities, this is not a complex
problem. However, with a large number of cities, brute force methods of calculation do not
work nearly as well as a neural network approach.
𝒑𝟏 𝒑𝟐 𝒑𝟏 AND 𝒑𝟐
0 0 0
0 1 0
1 0 0
1 1 1
Table 3.1 The AND Logical Operation (Binary)
A simple neural network can be created that recognizes the AND logical operation.
This network will contain two inputs and one neuron (perceptron) with threshold as the transfer
function. A neural network that recognizes the AND logical operation is shown in Figure 3.1.
There are two inputs to the network shown in Figure 3.1. Each input has a weight of one. The threshold is T = 1.5. Therefore, the neuron will only fire (output = 1) if both inputs are true. If either input is false, the sum of the two inputs will not exceed the threshold T = 1.5 (output = 0).

Figure 3.1 A neural network that recognizes the AND logical operation.

$f(n) = \begin{cases} 0 & n < 1.5 \\ 1 & n \ge 1.5 \end{cases}$
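A sketch of this AND perceptron in Python, using the weights of one and threshold T = 1.5 described above:

```python
def threshold(n, T):
    """Hard limit with threshold T: fire (output 1) when n >= T, otherwise 0."""
    return 1 if n >= T else 0

def and_neuron(p1, p2, w=(1.0, 1.0), T=1.5):
    """Two-input perceptron for AND: both weights 1, threshold 1.5 (as in Figure 3.1)."""
    n = w[0] * p1 + w[1] * p2
    return threshold(n, T)

for p1 in (0, 1):
    for p2 in (0, 1):
        print(p1, p2, and_neuron(p1, p2))   # reproduces Table 3.1
```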
To find the separating line, we need to find the slope and the intercepts of the line, i.e. the points (0, 1.5) and (1.5, 0):
$w_{1,1} p_1 + w_{1,2} p_2 = T \;\Longrightarrow\; p_2 = \frac{T}{w_{1,2}} - \frac{w_{1,1}}{w_{1,2}} p_1$
$m = \frac{1.5 - 0}{0 - 1.5} = -1$
$\frac{T}{w_{1,2}} = 1.5 \;\Longrightarrow\; T = 1.5\, w_{1,2}$
Since the slope of the decision line is $-\frac{w_{1,1}}{w_{1,2}} = -1$, we have $w_{1,1} = w_{1,2}$; choosing $w_{1,1} = w_{1,2} = 1$ gives T = 1.5, as in Figure 3.1.
Now, Table 3.2 shows the truth table for the AND logical operation as bipolar
representation.
𝒑𝟏 𝒑𝟐 𝒑𝟏 AND 𝒑𝟐
−1 −1 −1
−1 1 −1
1 −1 −1
1 1 1
Table 3.2 The AND Logical Operation (Bipolar)
Another way to determine the values of 𝑤1,1 , 𝑤1,2 and T is to find a line separating the three “−1” points from the single “1” point; one possible decision boundary for this function is shown in Figure 3.3.
So, the equation of the separating line in Figure 3.3 is:
$m = \frac{1 - 0}{0 - 1} = -1$
$y = m(x - x_1) + y_1 = -x + 1$
As we saw previously, $p_2 = \frac{T}{w_{1,2}} - \frac{w_{1,1}}{w_{1,2}} p_1$
$\therefore \frac{w_{1,1}}{w_{1,2}} = 1 \;\Longrightarrow\; w_{1,1} = w_{1,2}$
$\frac{T}{w_{1,2}} = 1 \;\Longrightarrow\; T = w_{1,2}$
$\Longrightarrow\; p_2 = -\frac{b}{w_{1,2}} - \frac{w_{1,1}}{w_{1,2}} p_1$
If $w_{1,1} = w_{1,2} = 1 \;\Longrightarrow\; b = -1$
𝒑𝟏 𝒑𝟐 𝒑𝟏 OR 𝒑𝟐
0 0 0
0 1 1
1 0 1
1 1 1
Table 3.3 The OR Logical Operation (Binary)
The neural network that will recognize the OR operation is shown in Figure 3.5.
The OR neural network looks very similar to the AND neural network. The biggest difference is the threshold value. Because the threshold is lower, only one of the inputs needs to have a value of true.

Figure 3.6 Truth Table (Binary)

$\therefore \frac{w_{1,1}}{w_{1,2}} = 1 \;\Longrightarrow\; w_{1,1} = w_{1,2}$
$\frac{T}{w_{1,2}} = 0.5 \;\Longrightarrow\; T = 0.5\, w_{1,2}$
If $w_{1,1} = w_{1,2} = 1 \;\Longrightarrow\; T = 0.5$
Table 3.4 shows the truth table for the OR logical operation as bipolar representation.
𝒑𝟏 𝒑𝟐 𝒑𝟏 OR 𝒑𝟐
−1 −1 −1
−1 1 1
1 −1 1
1 1 1
Table 3.4 The OR Logical Operation (Bipolar)
Another way to determine the values of 𝑤1,1 , 𝑤1,2 and T is to find a line separating the single “−1” point from the three “1” points; one possible decision boundary for this function is shown in Figure 3.7.
So, the equation of the separating line in Figure 3.7 is:
$m = \frac{-1 - 0}{0 + 1} = -1$
$y = m(x - x_1) + y_1 = -x - 1$
As we saw previously, $p_2 = \frac{T}{w_{1,2}} - \frac{w_{1,1}}{w_{1,2}} p_1$
$\therefore \frac{w_{1,1}}{w_{1,2}} = 1 \;\Longrightarrow\; w_{1,1} = w_{1,2}$
$\frac{T}{w_{1,2}} = -1 \;\Longrightarrow\; T = -w_{1,2}$
If $w_{1,1} = w_{1,2} = 1 \;\Longrightarrow\; T = -1$
From the above we see that a neuron (perceptron) with two inputs and no bias can satisfy the truth table for OR logic by changing the boundary of the threshold (hard limit) transfer function to the value T.
If we don’t want to change the boundary of the function
$f(n) = \begin{cases} 0 & n < 0 \\ 1 & n \ge 0 \end{cases}$
then we need to add a bias to the neuron, as in Figure 3.8.
Then, the input (Table 3.4 ) to the neuron is 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 + 𝑏
The decision line is
$w_{1,1} p_1 + w_{1,2} p_2 + b = 0 \;\Longrightarrow\; p_2 = -\frac{b}{w_{1,2}} - \frac{w_{1,1}}{w_{1,2}} p_1$
$a = 1$ if $w_{1,1} p_1 > T$
$a = 0$ if $w_{1,1} p_1 < T$
Applying truth Table 3.5, we get
$0 > T$
$w_{1,1} < T$
Let $w_{1,1} = -0.5$ and T = −0.1 (which satisfy the inequalities).
Figure 3.9 A neural network that recognizes the NOT logical operation.
𝒑𝟏 𝒑𝟐 𝒑𝟏 AND 𝒑𝟐 𝒑𝟏 NAND 𝒑𝟐
0 0 0 1
0 1 0 1
1 0 0 1
1 1 1 0
Table 3.6 The NAND Logical Operation (Binary)
𝑤1,1 > T
𝑤1,1 + 𝑤1,2 < T
Let 𝑤1,1 = −0.5 , 𝑤1,2 = − 0.5 & T = −0.8 ( which satisfy the inequalities).
𝑤1,1 < T
𝑤1,1 + 𝑤1,2 < T
Let 𝑤1,1 = −0.5 , 𝑤1,2 = − 0.5 & T = −0.1 ( which satisfy the inequalities).
𝒑𝟏 𝒑𝟐 𝒑𝟏 XOR 𝒑𝟐
0 0 0
0 1 1
1 0 1
1 1 0
Table 3.8 The XOR Logical Operation
Figure 3.12 The four XOR input points plotted in the (𝑝1 , 𝑝2 ) plane.
It is easy to see from Figure 3.12 that no straight line can separate the points into two groups.
The XOR logical operation requires a slightly more complex neural network than the
AND and OR operators. The neural networks presented so far have had only one neuron
(perceptron) with one or two inputs. More complex neural networks also include one or more
hidden layers. The XOR operator requires a hidden layer.
Figure 3.13 shows a two-layer neural network that can be used to recognize the XOR
operator.
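A two-layer sketch that recognizes XOR, using one possible weight assignment (not necessarily the one in Figure 3.13): the hidden layer computes OR and NAND of the inputs, and the output neuron ANDs the two hidden outputs, which is logically equivalent to XOR.

```python
def threshold(n, T):
    """Hard limit with threshold T."""
    return 1 if n >= T else 0

def xor_net(p1, p2):
    """Two-layer perceptron network for XOR (one possible weight assignment):
    h1 = OR(p1, p2), h2 = NAND(p1, p2), output = AND(h1, h2)."""
    h1 = threshold(1.0 * p1 + 1.0 * p2, 0.5)       # OR: fires if at least one input is 1
    h2 = threshold(-1.0 * p1 - 1.0 * p2, -1.5)     # NAND: fires unless both inputs are 1
    return threshold(1.0 * h1 + 1.0 * h2, 1.5)     # AND of the two hidden outputs

for p1 in (0, 1):
    for p2 in (0, 1):
        print(p1, p2, xor_net(p1, p2))   # reproduces Table 3.8
```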
Homework 3.1:
Solve the examples 3.5, 3.6 and 3.7 by using bipolar representation.
Where 𝑝𝑗𝑞 is the 𝑗𝑡ℎ element of the 𝑞 𝑡ℎ input vector 𝐩𝑞 , 𝑎𝑖𝑞 is the 𝑖 𝑡ℎ element of the
network output when the 𝑞 𝑡ℎ input vector is presented to the network, and 𝛼 is a positive
constant (0 < 𝛼 ≤ 1), called the learning rate.
Note: The choice of the value of learning rate is important when we implement a neural
network. A large learning rate corresponds to rapid learning but might also result in oscillations.
The Hebb rule defined in Eq. 4.1 is an unsupervised learning rule. It does not require any
information concerning the target output.
The supervised Hebb rule substitutes the target (desired) output for the neuron output. The weight update can then be written as:
$w_{ij}^{new} = w_{ij}^{old} + t_{iq}\, p_{jq}$ …………. (4.2)
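A minimal sketch of the supervised Hebb rule of Eq. 4.2, starting from zero weights and accumulating $t_q p_q^T$ over the training pairs; the bipolar AND patterns of Table 4.2 (below) are used here as a quick check.

```python
import numpy as np

def hebb_supervised(P, T):
    """Supervised Hebb rule (Eq. 4.2): w_ij <- w_ij + t_iq * p_jq for every pair.
    P is R x Q (inputs as columns), T is S x Q (targets as columns)."""
    W = np.zeros((T.shape[0], P.shape[0]))
    for q in range(P.shape[1]):
        W += np.outer(T[:, q], P[:, q])
    return W

# Bipolar AND patterns (Table 4.2): inputs as columns, targets as a 1 x 4 row
P = np.array([[1, 1, -1, -1],
              [1, -1, 1, -1]])
T = np.array([[1, -1, -1, -1]])
print(hebb_supervised(P, T))   # learned weight vector, here [[2. 2.]]
```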
Example 4.1:
A Hebb net for AND function: binary case
Input Target
𝒑𝟏 𝒑𝟐 𝒑𝟏 AND 𝒑𝟐
1 1 1
0 1 0
1 0 0
0 0 0
Table 4.1 The AND Logical Operation (Binary)
Recall the equation $p_2 = -\frac{b}{w_{1,2}} - \frac{w_{1,1}}{w_{1,2}} p_1$
Because the target value is 0, no learning occurs. Thus, using binary target values
prevents the net from learning any pattern for which the target is “off”.
The choice of training patterns can play a significant role in determining which problems
can be solved using the Hebb rule. The next example shows that the AND function can be
solved if we modify its representation to bipolar form.
Example 4.2:
A Hebb net for AND function: bipolar case
Input Target
𝒑𝟏 𝒑𝟐 𝒑𝟏 AND 𝒑𝟐
1 1 1
1 −1 −1
−1 1 −1
−1 −1 −1
Table 4.2 The AND Logical Operation (Bipolar)
Figure 4.4 shows the response of the net after the third training pair. Even though the weights have changed, the separating line is still
$p_2 = -p_1 + 1$
Example 4.3:
Let $\mathbf{p}_1 = \begin{bmatrix} 0.5 \\ -0.5 \\ 0.5 \\ -0.5 \end{bmatrix}$, $\mathbf{t}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $\mathbf{p}_2 = \begin{bmatrix} 0.5 \\ 0.5 \\ -0.5 \\ -0.5 \end{bmatrix}$, $\mathbf{t}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Use the Hebbian learning rule to train the neural network. (Don’t use a bias.)
This means we need to train the network again to get the right output (i.e., repeat the above calculation for the pattern ($\mathbf{p}_2$, $\mathbf{t}_2$)).
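A short NumPy run of Example 4.3, accumulating the supervised Hebb updates for both pairs and then checking the result; the raw products W p reproduce the targets, so no output transfer function is assumed here.

```python
import numpy as np

# Training pairs from Example 4.3 (column vectors written as 1-D arrays)
p1 = np.array([0.5, -0.5, 0.5, -0.5]); t1 = np.array([1.0, -1.0])
p2 = np.array([0.5, 0.5, -0.5, -0.5]); t2 = np.array([1.0, 1.0])

W = np.zeros((2, 4))                    # 2 outputs, 4 inputs, no bias
for p, t in ((p1, t1), (p2, t2)):
    W += np.outer(t, p)                 # supervised Hebb update (Eq. 4.2)

print(W)
print(W @ p1, W @ p2)                   # should reproduce t1 and t2
```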
Recall from section 2.2.2 that the three-layer network in abbreviated notation is shown in Figure 4.5.
As we discussed earlier, for multilayer networks the output of one layer becomes the
input to the following layer. The equations that describe this operation are
$\mathbf{a}^{m+1} = \mathbf{f}^{m+1}(\mathbf{W}^{m+1}\mathbf{a}^{m} + \mathbf{b}^{m+1})$ for $m = 0, 1, \dots, M-1$ ……….. (4.4)
where 𝑀 is the number of layers in the network.
Step 4: Update the weights and biases using the approximate steepest descent rule:
$\mathbf{W}^{m}(k+1) = \mathbf{W}^{m}(k) - \alpha\, \mathbf{s}^{m} (\mathbf{a}^{m-1})^{T}$
$\mathbf{b}^{m}(k+1) = \mathbf{b}^{m}(k) - \alpha\, \mathbf{s}^{m}$
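Putting the forward pass (Eq. 4.4), the sensitivity backpropagation, and the Step 4 updates together, here is a sketch of one iteration for a two-layer network. The log-sigmoid hidden layer, the linear output layer, and the mean-squared-error sensitivity s = −2(t − a) at the output are assumptions chosen to match the worked examples that follow.

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def backprop_step(p, t, W1, b1, W2, b2, alpha=0.1):
    """One backpropagation iteration for a two-layer network
    (assumed: logsig hidden layer, linear output layer)."""
    # Step 1: propagate the input forward (Eq. 4.4)
    a0 = np.atleast_1d(p)
    a1 = logsig(W1 @ a0 + b1)
    a2 = W2 @ a1 + b2                       # linear output layer
    # Steps 2-3: propagate the sensitivities backward
    s2 = -2.0 * (t - a2)                    # output layer (linear, derivative = 1)
    F1 = np.diag(a1 * (1.0 - a1))           # logsig derivative, (1 - a1)(a1)
    s1 = F1 @ W2.T @ s2
    # Step 4: approximate steepest descent updates
    W2 = W2 - alpha * np.outer(s2, a1)
    b2 = b2 - alpha * s2
    W1 = W1 - alpha * np.outer(s1, a0)
    b1 = b1 - alpha * s1
    return W1, b1, W2, b2
```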
Example 4.4:
Consider the following network shown in Figure 4.6
Now we are ready to start the algorithm. The training points can be presented in any
order, but they are often chosen randomly. For our initial input we will choose 𝑝 = 1 , which
is the 16𝑡ℎ training point:
𝑎0 = 𝑝 = 1
$\mathbf{a}^1 = \mathbf{f}^1(\mathbf{W}^1 a^0 + \mathbf{b}^1) = \mathbf{logsig}\left(\begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}[1] + \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}\right) = \mathbf{logsig}\left(\begin{bmatrix} -0.75 \\ -0.54 \end{bmatrix}\right) = \begin{bmatrix} \frac{1}{1+e^{0.75}} \\ \frac{1}{1+e^{0.54}} \end{bmatrix} = \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix}$
The next stage of the algorithm is to backpropagate the sensitivities. Before we begin the
backpropagation, recall that we will need the derivatives of the transfer functions, 𝑓 ′1 (𝑛) and
𝑓 ′2 (𝑛). For the first layer
$f'^{1}(n) = \frac{d}{dn}\left(\frac{1}{1+e^{-n}}\right) = \frac{e^{-n}}{(1+e^{-n})^{2}} = \left(1 - \frac{1}{1+e^{-n}}\right)\left(\frac{1}{1+e^{-n}}\right) = (1 - a^{1})(a^{1}).$
$\mathbf{s}^1 = \begin{bmatrix} (1-0.321)(0.321) & 0 \\ 0 & (1-0.368)(0.368) \end{bmatrix}\begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix}[-2.522] = \begin{bmatrix} 0.218 & 0 \\ 0 & 0.233 \end{bmatrix}\begin{bmatrix} -0.227 \\ 0.429 \end{bmatrix} = \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}$
The final stage of the algorithm is to update the weights. For simplicity, we will use a
learning rate 𝛼 = 0.1. We have
$\mathbf{W}^2(1) = \mathbf{W}^2(0) - \alpha s^2 (\mathbf{a}^1)^T = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} - (0.1)[-2.522]\begin{bmatrix} 0.321 & 0.368 \end{bmatrix} = \begin{bmatrix} 0.171 & -0.0772 \end{bmatrix}$
$\mathbf{b}^2(1) = \mathbf{b}^2(0) - \alpha s^2 = [0.48] - (0.1)[-2.522] = [0.732]$
$\mathbf{W}^1(1) = \mathbf{W}^1(0) - \alpha \mathbf{s}^1 (\mathbf{a}^0)^T = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} - (0.1)\begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}[1] = \begin{bmatrix} -0.265 \\ -0.420 \end{bmatrix}$
$\mathbf{b}^1(1) = \mathbf{b}^1(0) - \alpha \mathbf{s}^1 = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} - (0.1)\begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} = \begin{bmatrix} -0.475 \\ -0.140 \end{bmatrix}$
This completes the first iteration of the backpropagation algorithm. We next proceed to
randomly choose another input from the training set and perform another iteration of the
algorithm. We continue to iterate until the difference between the network response and the
target function reaches some acceptable level.
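The first-iteration arithmetic can be verified in a few lines of NumPy, taking the output-layer sensitivity s² = −2.522 as given above:

```python
import numpy as np

alpha = 0.1
a0 = np.array([1.0])
W1, b1 = np.array([[-0.27], [-0.41]]), np.array([-0.48, -0.13])
W2, b2 = np.array([[0.09, -0.17]]), np.array([0.48])

a1 = 1.0 / (1.0 + np.exp(-(W1 @ a0 + b1)))     # [0.321, 0.368]
s2 = np.array([-2.522])                        # output-layer sensitivity given above
s1 = np.diag(a1 * (1 - a1)) @ W2.T @ s2        # [-0.0495, 0.0997]

print(W2 - alpha * np.outer(s2, a1))           # [[ 0.171 -0.0772]]
print(b2 - alpha * s2)                         # [0.732]
print(W1 - alpha * np.outer(s1, a0))           # [[-0.265] [-0.420]]
print(b1 - alpha * s1)                         # [-0.475 -0.140]
```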
Example 4.5:
Use the backpropagation algorithm to train the network shown in Figure 4.8, given the following training data:
𝑝1 𝑝2 𝑡
0.4 −0.7 0.1
0.3 −0.5 0.05
0.6 0.1 0.3
0.2 0.4 0.25
0.1 −0.2 0.12
Let $\mathbf{a}^0 = \begin{bmatrix} 0.4 \\ -0.7 \end{bmatrix}$
The first layer output is
$\mathbf{a}^1 = \mathbf{f}^1(\mathbf{W}^1\mathbf{a}^0 + \mathbf{b}^1) = \mathbf{logsig}\left(\begin{bmatrix} 0.1 & -0.2 \\ 0.4 & 0.2 \end{bmatrix}\begin{bmatrix} 0.4 \\ -0.7 \end{bmatrix}\right) = \mathbf{logsig}\left(\begin{bmatrix} 0.18 \\ 0.02 \end{bmatrix}\right) = \begin{bmatrix} \frac{1}{1+e^{-0.18}} \\ \frac{1}{1+e^{-0.02}} \end{bmatrix} = \begin{bmatrix} 0.5448 \\ 0.505 \end{bmatrix}$
$\mathbf{s}^1 = \begin{bmatrix} (1-0.5448)(0.5448) & 0 \\ 0 & (1-0.505)(0.505) \end{bmatrix}\begin{bmatrix} 0.2 \\ -0.5 \end{bmatrix}[-0.18116] = \begin{bmatrix} 0.248 & 0 \\ 0 & 0.25 \end{bmatrix}\begin{bmatrix} 0.2 \\ -0.5 \end{bmatrix}[-0.18116] = \begin{bmatrix} 0.0496 \\ -0.125 \end{bmatrix}[-0.18116] = \begin{bmatrix} -0.009 \\ 0.023 \end{bmatrix}$
This completes the first iteration of the backpropagation algorithm. We next proceed to
randomly choose another input from the training set and perform another iteration of the
algorithm. We continue to iterate until the difference between the network response and the
target function reaches some acceptable level.
Summary:
The above Example 4.5 can be summarized in the following figure:
The Bias:
Some networks employ a bias unit as part of every layer except the output layer. These units have a constant activation value of 1 or −1, and their weights may be adjusted during learning. The bias unit provides a constant term in the weighted sum, which improves the convergence properties of the network.
Biases are almost always helpful. In effect, a bias value allows you to shift the
activation (transfer) function to the left or right, which may be critical for successful
learning.
It might help to look at a simple example. Consider this 1-input, 1-output network that
has no bias:
The output of the network is computed by multiplying the input 𝑝 by the weight 𝑤0 and
passing the result through some kind of activation function (e.g. a sigmoid function.)
Here is the function that this network computes, for various values of 𝑤0 :
Changing the weight 𝑤0 essentially changes the "steepness" of the sigmoid. That’s useful, but what if you wanted the network to output 0 when 𝑝 is 2? Just changing the steepness of the sigmoid won’t really work; you want to be able to shift the entire curve to the right.
That's exactly what the bias allows you to do. If we add a bias to that network, like so:
Then the output of the network becomes 𝑠𝑖𝑔(𝑤0 ∗ 𝑝 + 𝑤1 ∗ 1.0). Here is what the
output of the network looks like for various values of 𝑤1 :
A bias acts exactly as a weight on a connection from a unit whose activation is always
1. Increasing the bias increases the net input to the unit. If a bias is included, the input to the
activation function is:
𝑛 = 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 + … + 𝑤1,𝑅 𝑝𝑅 + 𝑏
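A small sketch of the shift: with bias weight 𝑤1 on a constant input of 1, the sigmoid output crosses 0.5 where 𝑤0·𝑝 + 𝑤1 = 0, i.e. at 𝑝 = −𝑤1/𝑤0, so changing 𝑤1 slides the curve left or right (the values below are illustrative).

```python
import numpy as np

def sig(n):
    return 1.0 / (1.0 + np.exp(-n))

w0 = 1.0
p = np.linspace(-6, 6, 7)
for w1 in (-4.0, 0.0, 4.0):                    # w1 is the weight on the constant bias input
    a = sig(w0 * p + w1 * 1.0)
    print(f"w1 = {w1:+.1f}, crossover at p = {-w1 / w0:+.1f}, outputs:", np.round(a, 3))
```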
Hopfield Network:
5.1 Discrete Hopfield Network:
The Hopfield neural network is perhaps the simplest type of neural network. As shown
in Figure 5.1, a Hopfield network consists of a single layer of neurons, 1, 2, . . . , 𝑠. The network
is fully interconnected; that is, every neuron in the network is connected to every other
neuron. The network is recurrent; that is, it has feedforward/feedbackward capabilities,
which means input to the neurons comes from external input as well as from the neurons
themselves internally. Also, the network is autoassociative which means that if the neural
network recognizes a pattern, it will return that pattern (i.e. input 𝐩 = target 𝐭).
The Hopfield network is classified under supervised learning.
$w_{ij} = \begin{cases} 0 & \text{for } i = j \\ \sum_{r=1}^{q} p_{ir}\, p_{jr} & \text{for } i \ne j \end{cases}$ ……. (5.1)
Or, in matrix notation, let $\mathbf{P} = \begin{bmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \\ \vdots \\ \mathbf{p}_q \end{bmatrix} \;\Longrightarrow\; \mathbf{W} = \mathbf{P}^{T}\mathbf{P}$
If the resulting 𝐖 has diagonal 𝑤𝑖𝑖 ≠ 0, we can change the diagonal such that 𝑤𝑖𝑖 = 0.
𝐸 always decreases when 𝑎𝑖 changes the value, and 𝐸 stays the same when there is no
change in 𝑎𝑖 's. The term "energy" here represents a measure that reflects the state of the
solution, and is somewhat analogous to the concept of physical energy.
1. Synchronous updates: all units compute their activations and then update their states
simultaneously.
2. Asynchronous updates: one unit at a time computes its activation and updates its state. The
sequence of selected units can be a fixed or a random one.
Example 5.1:
Let $\mathbf{p}_1 = \begin{bmatrix} -1 \\ 1 \\ -1 \\ -1 \end{bmatrix}$, $\mathbf{p}_2 = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}$, $\mathbf{p}_3 = \begin{bmatrix} -1 \\ -1 \\ -1 \\ 1 \end{bmatrix}$ be the example patterns. Use a Hopfield network to recognize the unknown pattern $\begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix}$.
Solution:
Figure 5.3 Hopfield Network (4 neurons)
$\mathbf{P} = \begin{bmatrix} -1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix}$
$\mathbf{W} = \mathbf{P}^{T}\mathbf{P} = \begin{bmatrix} -1 & 1 & -1 \\ 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix}\begin{bmatrix} -1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 3 & -1 & 3 & -1 \\ -1 & 3 & -1 & -1 \\ 3 & -1 & 3 & -1 \\ -1 & -1 & -1 & 3 \end{bmatrix}$
Since we must have $w_{ii} = 0$, we get
$\mathbf{W} = \begin{bmatrix} 0 & -1 & 3 & -1 \\ -1 & 0 & -1 & -1 \\ 3 & -1 & 0 & -1 \\ -1 & -1 & -1 & 0 \end{bmatrix}$
Synchronous updates:
Iteration 1:
Let $\mathbf{a}^{old} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix}$
$\mathbf{W}\mathbf{a}^{old} = \begin{bmatrix} 0 & -1 & 3 & -1 \\ -1 & 0 & -1 & -1 \\ 3 & -1 & 0 & -1 \\ -1 & -1 & -1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \\ 3 \\ -3 \end{bmatrix}$
$\mathbf{a}^{new} = f(\mathbf{W}\mathbf{a}^{old}) = f\left(\begin{bmatrix} 3 \\ -1 \\ 3 \\ -3 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}$
$E_1 = -\frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & -1 \end{bmatrix}\begin{bmatrix} 0 & -1 & 3 & -1 \\ -1 & 0 & -1 & -1 \\ 3 & -1 & 0 & -1 \\ -1 & -1 & -1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix} = -\frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & -1 \end{bmatrix}\begin{bmatrix} 3 \\ -1 \\ 3 \\ -3 \end{bmatrix} = -\frac{8}{2} = -4$
$E_2 = -\frac{1}{2}\begin{bmatrix} 1 & -1 & 1 & -1 \end{bmatrix}\begin{bmatrix} 0 & -1 & 3 & -1 \\ -1 & 0 & -1 & -1 \\ 3 & -1 & 0 & -1 \\ -1 & -1 & -1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix} = -\frac{1}{2}\begin{bmatrix} 1 & -1 & 1 & -1 \end{bmatrix}\begin{bmatrix} 5 \\ -1 \\ 5 \\ -1 \end{bmatrix} = -\frac{12}{2} = -6$
Iteration 2:
Now, $\mathbf{a}^{old} = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}$
$\mathbf{W}\mathbf{a}^{old} = \begin{bmatrix} 0 & -1 & 3 & -1 \\ -1 & 0 & -1 & -1 \\ 3 & -1 & 0 & -1 \\ -1 & -1 & -1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \\ 5 \\ -1 \end{bmatrix}$
$\mathbf{a}^{new} = f(\mathbf{W}\mathbf{a}^{old}) = f\left(\begin{bmatrix} 5 \\ -1 \\ 5 \\ -1 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}$
Since $\mathbf{a}^{new}$ does not change in the second iteration, we stop; $\mathbf{a}^{new}$ is the solution (it matches the stored pattern $\mathbf{p}_2$).
Asynchronous updates:
$a_1 = f\left(\begin{bmatrix} 0 & -1 & 3 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix}\right) = f(3) = 1$
$a_2 = f\left(\begin{bmatrix} -1 & 0 & -1 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix}\right) = f(-1) = -1$
$a_3 = f\left(\begin{bmatrix} 3 & -1 & 0 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix}\right) = f(3) = 1$
$a_4 = f\left(\begin{bmatrix} -1 & -1 & -1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix}\right) = f(-3) = -1$
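A compact NumPy sketch that reproduces Example 5.1: the weight matrix of Eq. 5.1, the synchronous updates, and the energy values −4 and −6. The hard limit here maps a net input of 0 to +1, a convention the example never actually needs.

```python
import numpy as np

# Stored patterns from Example 5.1, one pattern per row of P
P = np.array([[-1,  1, -1, -1],
              [ 1, -1,  1, -1],
              [-1, -1, -1,  1]])

W = P.T @ P                      # W = P^T P (Eq. 5.1 in matrix form)
np.fill_diagonal(W, 0)           # force w_ii = 0

f = lambda n: np.where(n >= 0, 1, -1)      # hard limit on -1/+1 states
energy = lambda a: -0.5 * a @ W @ a

# Synchronous updates starting from the unknown pattern
a = np.array([1, 1, 1, -1])
print(a, "energy:", energy(a))             # energy -4
while True:
    a_new = f(W @ a)
    print(a_new, "energy:", energy(a_new)) # converges to p2 with energy -6
    if np.array_equal(a_new, a):
        break                              # state unchanged: stop
    a = a_new
```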