
University of Baghdad, College of Science, Dept. of Computer Science
Neural Networks Lectures, Dr. Najlaa Mohammed, 3rd Class, 2015

Artificial Neural Networks (ANN)

1. Introduction:
1.1 History:
The history of artificial neural networks is filled with colorful, creative individuals from
a variety of fields, many of whom struggled for decades to develop concepts that we now take
for granted.
The modern view of neural networks began in the 1940s with the work of Warren
McCulloch and Walter Pitts, who showed that networks of artificial neurons could, in principle,
compute any arithmetic or logical function. Their work is often acknowledged as the origin of
the neural network field.
The first practical application of artificial neural networks came in the late 1950s, with
the invention of the perceptron network and associated learning rule by Frank Rosenblatt.
Rosenblatt and his colleagues built a perceptron network and demonstrated its ability to
perform pattern recognition. At about the same time, Bernard Widrow and Ted Hoff introduced
a new learning algorithm and used it to train adaptive linear neural networks, which were
similar in structure and capability to Rosenblatt’s perceptron. The Widrow-Hoff learning rule
is still in use today.
Interest in neural networks had faltered during the late 1960s because of the lack of new
ideas and powerful computers with which to experiment. During the 1980s both of these
impediments were overcome, and research in neural networks increased dramatically. New
personal computers and workstations, which rapidly grew in capability, became widely
available. In addition, important new concepts were introduced.
Two new concepts were most responsible for the rebirth of neural networks. The first
was the use of statistical mechanics to explain the operation of a certain class of recurrent
network, which could be used as an associative memory.
The second key development of the 1980s was the backpropagation algorithm for
training multilayer perceptron networks, which was discovered independently by several
different researchers.
Many of the advances in neural networks have had to do with new concepts, such as
innovative architectures and training rules. Just as important has been the availability of
powerful new computers on which to test these new concepts.


Neural networks have clearly taken a permanent place as important mathematical/engineering tools. They don’t provide solutions to every problem, but they are essential tools to be used in appropriate situations.

1.2 Applications:
Google uses neural networks for image tagging (automatically identifying an image and
assigning keywords), and Microsoft has developed neural networks that can help convert
spoken English speech into spoken Chinese speech. These examples are indicative of the broad
range of applications that can be found for neural networks. The applications are expanding
because neural networks are good at solving problems, not just in engineering, science and
mathematics, but in medicine, business, finance and literature as well. Their application to a
wide variety of problems in many fields makes them very attractive. Also, faster computers
and faster algorithms have made it possible to use neural networks to solve complex industrial
problems that formerly required too much computation.
The following list gives some neural network applications:
1. Aerospace :
High performance aircraft autopilots, flight path simulations, aircraft control
systems, autopilot enhancements, aircraft component simulations, aircraft
component fault detectors.
2. Automotive:
Automobile automatic guidance systems, fuel injector control, automatic braking
systems, misfire detection, virtual emission sensors, warranty activity analyzers.
3. Banking:
Check and other document readers, credit application evaluators, cash forecasting,
firm classification, exchange rate forecasting, predicting loan recovery rates,
measuring credit risk.
4. Defense:
Weapon steering, target tracking, object discrimination, facial recognition, new kinds
of sensors, sonar, radar and image signal processing including data compression,
feature extraction and noise suppression, signal/image identification.
5. Electronics:
Code sequence prediction, integrated circuit chip layout, process control, chip failure
analysis, machine vision, voice synthesis, nonlinear modeling.
6. Entertainment:
Animation, special effects, market forecasting.
7. Financial:
Real estate appraisal, loan advisor, mortgage screening, corporate bond rating, credit
line use analysis, portfolio trading program, corporate financial analysis, currency
price prediction.

8. Insurance: Policy application evaluation, product optimization.


9. Manufacturing:
Manufacturing process control, product design and analysis, process and machine
diagnosis, real-time particle identification, visual quality inspection systems, beer
testing, welding quality analysis, paper quality prediction, computer chip quality
analysis, analysis of grinding operations, chemical product design analysis, machine
maintenance analysis, project bidding, planning and management, dynamic
modeling of chemical process systems.
10. Medical:
Breast cancer cell analysis, EEG and ECG analysis, prosthesis design, optimization
of transplant times, hospital expense reduction, hospital quality improvement,
emergency room test advisement.
11. Oil and Gas:
Exploration, smart sensors, reservoir modeling, well treatment decisions, seismic
interpretation.
12. Robotics:
Trajectory control, forklift robot, manipulator controllers, vision systems,
autonomous vehicles.
13. Speech:
Speech recognition, speech compression, vowel classification, text to speech
synthesis.
14. Securities:
Market analysis, automatic bond rating, stock trading advisory systems.
15. Telecommunications:
Image and data compression, automated information services, real-time translation
of spoken language, customer payment processing systems.
16. Transportation:
Truck brake diagnosis systems, vehicle scheduling, routing systems.

1.3 Biological Inspiration:


In this section we will briefly describe the characteristics of brain function that have
inspired the development of artificial neural networks.
The brain consists of a large number (approximately 10¹¹) of highly connected
elements (approximately 10⁴ connections per element) called neurons. For our purposes these
neurons have three principal components: the dendrites, the cell body and the axon. The
dendrites are tree-like receptive networks of nerve fibers that carry electrical signals into the
cell body. The cell body effectively sums and thresholds these incoming signals. The axon is a


single long fiber that carries the signal from the cell body out to other neurons. The point of
contact between an axon of one cell and a dendrite of another cell is called a synapse. It is the
arrangement of neurons and the strengths of the individual synapses, determined by a complex
chemical process, that establishes the function of the neural network. Figure 1.1 is a simplified
schematic diagram of two biological neurons.

Figure 1.1: Schematic Drawing of Biological Neurons
Artificial neural networks do not approach the complexity of the brain. There are,
however, two key similarities between biological and artificial neural networks. First, the
building blocks of both networks are simple computational devices (although artificial neurons
are much simpler than biological neurons) that are highly interconnected. Second, the
connections between neurons determine the function of the network.

2. Neuron Model and Network Architectures:


Notation: Scalars — small italic letters: a, b, c
Vectors — small bold non italic letters: a, b, c
Matrices — capital BOLD non italic letters: A, B, C

2.1 Neuron Model:


2.1.1 Single-Input Neuron:
A single-input neuron is shown in Figure 2.1. The scalar input 𝑝 is multiplied by the scalar weight 𝑤 to form 𝑤𝑝, one of the terms that is sent to the summer. The other input, 1, is multiplied by a bias (offset) 𝑏 and then passed to the summer. The summer output 𝑛, often


referred to as the net input, goes into a transfer function (activation function) 𝑓, which produces the scalar neuron output 𝑎.
If we relate this simple model back to the biological neuron that we discussed in section
1.3, the weight 𝑤 corresponds to the strength of a synapse, the cell body is represented by the
summation and the transfer function, and the neuron output 𝑎 represents the signal on the axon.

Figure 2.1: Single-Input Neuron


The neuron output is calculated as
𝑎 = 𝑓(𝑤𝑝 + 𝑏)
Example 2.1:
Let 𝑤 = 3 , 𝑝 = 2 and 𝑏 = – 1.5, what is the single-input neuron output ?
𝑎 = 𝑓(3 ∗ 2 + (−1.5)) = 𝑓(4.5)
The actual output depends on the particular transfer function that is chosen.
Notes:
1. The bias is much like a weight, except that it has a constant input of 1. However, if you
do not want to have a bias in a particular neuron, it can be omitted.
2. Note that 𝑤 and 𝑏 are both adjustable scalar parameters of the neuron. Typically the
transfer function is chosen by the designer and then the parameters 𝑤 and 𝑏 will be
adjusted by some learning rule so that the neuron input/output relationship meets some
specific goal.
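
As a quick illustration of Example 2.1, the single-input neuron 𝑎 = 𝑓(𝑤𝑝 + 𝑏) can be sketched in a few lines of Python; the hard limit and tanh functions used here are just two possible choices of 𝑓:

import math

def hardlim(n):
    # hard limit (threshold) transfer function
    return 1.0 if n >= 0 else 0.0

def single_input_neuron(p, w, b, f):
    n = w * p + b        # net input
    return f(n)          # neuron output a = f(wp + b)

# Example 2.1: w = 3, p = 2, b = -1.5, so n = 4.5
print(single_input_neuron(2, 3, -1.5, hardlim))    # 1.0
print(single_input_neuron(2, 3, -1.5, math.tanh))  # about 0.9998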


2.1.2 Transfer Functions:


The transfer function in Figure 2.1 may be a linear or a nonlinear function of 𝑛. A
particular transfer function is chosen to satisfy some specification of the problem that the
neuron is attempting to solve.
There are a variety of transfer functions; some of them are listed below:
1. Threshold ( Hard Limit ) transfer function:

𝑓(𝑛) = { 0  if 𝑛 < 0 ;  1  if 𝑛 ≥ 0 }          ………. (2.1)

This function is used to create neurons that classify inputs into two distinct categories.

2. Linear transfer function:

𝑓(𝑛) = 𝑛 ………….. (2.2)

This transfer function is used in ADALINE networks.


3. Logistic (Log-Sigmoid ) transfer function:


The logistic function is a standard sigmoid function and is defined by
𝑓(𝑛) = 1 / (1 + 𝑒^(−𝑛))          ………….. (2.3)

The derivative of 𝑓 is defined by 𝑓 ′ = 𝑓 (1 − 𝑓)

The log-sigmoid transfer function is commonly used in multilayer networks that are
trained using the backpropagation algorithm.
4. Hyperbolic tangent transfer function:
The hyperbolic tangent is a sigmoid function and is defined by
𝑓(𝑛) = (𝑒^𝑛 − 𝑒^(−𝑛)) / (𝑒^𝑛 + 𝑒^(−𝑛))          ………….. (2.4)

The derivative of 𝑓 is defined by 𝑓 ′ = (1 − 𝑓²)

Figure 2.5 Hyperbolic tangent transfer function


Since (tanh(𝑛/2) + 1) / 2 = 1 / (1 + 𝑒^(−𝑛)), using the tanh function instead of the logistic one is equivalent. The tanh function has the advantage of being symmetrical with respect to the
origin.
5. Radial Basis transfer function:

𝑓(𝑛) = 𝑒^(−𝑛²)          ………….. (2.5)

Figure 2.6 Radial Basis transfer function

Most of the transfer functions used are summarized in Table 2.1.


Example 2.2:
Let 𝑤 = 4 , 𝑝 = 2 and 𝑏 = −2 with 𝑓 radial basis, what is the single-input neuron output?
𝑓(𝑛) = 𝑒^(−𝑛²)
𝑎 = 𝑒^(−(4∗2+(−2))²) = 𝑒^(−(6)²) = 𝑒^(−36) ≈ 2.31952 × 10⁻¹⁶
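
The transfer functions above translate directly into Python; a minimal sketch using only the standard math module, checked against Example 2.2:

import math

def hardlim(n): return 1.0 if n >= 0 else 0.0       # Eq. (2.1), threshold
def purelin(n): return n                            # Eq. (2.2), linear
def logsig(n):  return 1.0 / (1.0 + math.exp(-n))   # Eq. (2.3), log-sigmoid
def tansig(n):  return math.tanh(n)                 # Eq. (2.4), hyperbolic tangent
def radbas(n):  return math.exp(-n ** 2)            # Eq. (2.5), radial basis

n = 4 * 2 + (-2)          # Example 2.2: n = wp + b = 6
print(radbas(n))          # e**(-36), about 2.3195e-16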

2.1.3 Multiple-Input Neuron:


Typically, a neuron has more than one input. A neuron with 𝑅 inputs is shown in Figure
2.7. The individual inputs 𝐩 = (𝑝1 , 𝑝2 , 𝑝3 , … , 𝑝𝑅 ) are each weighted by corresponding
elements 𝑤1,1 , 𝑤1,2 , … , 𝑤1,𝑅 of the weight matrix 𝐖.


Figure 2.7 Multiple-Input Neuron


The neuron has a bias 𝑏, which is summed with the weighted inputs to form the net input 𝑛 :
𝑛 = 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 + … + 𝑤1,𝑅 𝑝𝑅 + 𝑏 ……….. (2.4)
This expression can be written in matrix form:
𝑛 = 𝐖𝐩 + 𝑏 ……….. (2.5)
where the matrix 𝐖 for the single neuron case has only one row.
Now the neuron output can be written as:
𝑎 = 𝑓(𝐖𝐩 + 𝑏) ………… (2.6)
The elements of the weight matrix have two indices: the first index indicates the particular neuron destination for that weight, and the second index indicates the source of the signal fed to the neuron. Thus, the indices in 𝑤1,2 say that this weight represents the connection
to the first (and only) neuron from the second source. Of course, this convention is more useful
if there is more than one neuron, as will be the case later.
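
For a single neuron with 𝑅 inputs, Eq. (2.6) is one line of NumPy; the weight and input values below are arbitrary illustration values, not taken from the notes:

import numpy as np

def neuron(W, p, b, f):
    n = W @ p + b      # net input, Eq. (2.5)
    return f(n)        # output, Eq. (2.6)

W = np.array([[1.0, 2.0, -1.5]])   # 1 x R weight matrix (R = 3 here)
p = np.array([2.0, 1.0, 3.0])      # R-element input vector
print(neuron(W, p, b=0.5, f=np.tanh))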
We would like to draw networks with several neurons, each having several inputs.
Further, we would like to have more than one layer of neurons. You can imagine how complex
such a network might appear if all the lines were drawn. It would take a lot of ink, could hardly
be read, and the mass of detail might obscure the main features. Thus, we will use an
abbreviated notation. A multiple-input neuron using this notation is shown in Figure 2.8.


Figure 2.8 Neuron with 𝑅 Inputs, Abbreviated Notation


Note that the number of inputs to a network is set by the external specifications of the
problem. If, for instance, you want to design a neural network that is to predict kite-flying
conditions and the inputs are air temperature, wind velocity and humidity, then there would be
three inputs to the network.

2.2 Network Architectures:


Commonly one neuron, even with many inputs, may not be sufficient. We might need
five or ten, operating in parallel, in what we will call a “layer”. This concept of a layer is
discussed below.

2.2.1 Single Layer network:


A single layer network of neurons is shown in Figure 2.9. Note that each of the inputs is
connected to each of the neurons and that the weight matrix now has 𝑆 rows.
The layer includes the weight matrix, the summers, the bias vector, the transfer function
boxes and the output vector.
Each element of the input vector 𝐩 is connected to each neuron through the weight
matrix 𝐖. Each neuron has a bias 𝑏𝑖 , a summer, a transfer function 𝑓 and an output 𝑎𝑖 . Taken
together, the outputs form the output vector 𝐚.


Figure 2.9 Single Layer of 𝑆 Neurons


You might ask if all the neurons in a layer must have the same transfer function. The
answer is no; you can define a single (composite) layer of neurons having different transfer
functions by combining two of the networks shown above in parallel. Both networks would
have the same inputs, and each network would create some of the outputs.
The input vector elements enter the network through the weight matrix 𝐖:
𝐖 = [ 𝑤1,1  𝑤1,2  …  𝑤1,𝑅
      𝑤2,1  𝑤2,2  …  𝑤2,𝑅
        ⋮              ⋮
      𝑤𝑆,1  𝑤𝑆,2  …  𝑤𝑆,𝑅 ]

As noted previously, the row indices of the elements of matrix 𝐖 indicate the
destination neuron associated with that weight, while the column indices indicate the source of
the input for that weight. Thus, the indices in 𝑤3,2 say that this weight represents the connection
to the third neuron from the second source.
Fortunately, the S-neuron, R-input, one-layer network also can be drawn in abbreviated notation, as shown in Figure 2.10.

Figure 2.10 Layer of 𝑆 Neurons, Abbreviated Notation


2.2.2 Multiple Layers of Neurons:


Now consider a network with several layers. Each layer has its own weight matrix 𝐖, its
own bias vector 𝐛 , a net input vector 𝐧 and an output vector 𝐚. We need to introduce some
additional notation to distinguish between these layers. We will use superscripts to identify the
layers. Specifically, we append the number of the layer as a superscript to the names for each
of these variables. Thus, the weight matrix for the first layer is written as 𝐖 𝟏 , and the weight
matrix for the second layer is written as 𝐖 𝟐 . This notation is used in the three-layer network
shown in Figure 2.11.

Figure 2.11 Three-Layer Network

As shown, there are 𝑅 inputs, 𝑆 1 neurons in the first layer, 𝑆 2 neurons in the second
layer, etc. As noted, different layers can have different numbers of neurons.
The outputs of layers one and two are the inputs for layers two and three. Thus layer 2
can be viewed as a one-layer network with 𝑅 = 𝑆 1 inputs, 𝑆 = 𝑆 2 neurons, and an 𝑆 2 × 𝑆 1
weight matrix 𝐖 𝟐 . The input to layer 2 is 𝐚𝟏 , and the output is 𝐚𝟐 .
A layer whose output is the network output is called an output layer. The other layers
are called hidden layers. The network shown above has an output layer (layer 3) and two hidden
layers (layers 1 and 2).
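
A minimal sketch of the layer-by-layer computation for a three-layer network; the layer sizes and random weights are illustration values only:

import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def forward(p, weights, biases, transfer_fns):
    # a(m+1) = f(m+1)( W(m+1) a(m) + b(m+1) ), starting from a(0) = p
    a = p
    for W, b, f in zip(weights, biases, transfer_fns):
        a = f(W @ a + b)
    return a

rng = np.random.default_rng(0)
R, S1, S2, S3 = 3, 4, 5, 2               # R inputs, S1 and S2 hidden neurons, S3 outputs
weights = [rng.normal(size=(S1, R)), rng.normal(size=(S2, S1)), rng.normal(size=(S3, S2))]
biases  = [np.zeros(S1), np.zeros(S2), np.zeros(S3)]
a3 = forward(np.array([1.0, -0.5, 0.2]), weights, biases, [logsig, logsig, logsig])
print(a3)                                # output of layer 3 (the output layer)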
The same three-layer network discussed previously also can be drawn using our
abbreviated notation, as shown in Figure 2.12.


Figure 2.12 Three-Layer Network, Abbreviated Notation


Multilayer networks are more powerful than single-layer networks. For instance, a two-layer network having a sigmoid first layer and a linear second layer can be trained to approximate most functions arbitrarily well. Single-layer networks cannot do this.
As for the number of layers, most practical neural networks have just two or three layers. Four or more layers are rarely used.
We should say something about the use of biases. One can choose neurons with or
without biases. The bias gives the network an extra variable, and so you might expect that
networks with biases would be more powerful than those without, and that is true. Note, for
instance, that a neuron without a bias will always have a net input 𝑛 of zero when the network
inputs 𝐩 are zero. This may not be desirable and can be avoided by the use of a bias.

2.2.3 Recurrent Networks:


Before we discuss recurrent networks, we need to introduce some simple building
blocks. The first is the delay block, which is illustrated in Figure 2.13.
The delay output 𝐚(𝑡) is computed from its
input 𝐮(𝑡) according to

𝐚(𝑡) = 𝐮(𝑡 − 1) ……….. (2.6)

Figure 2.13 Delay Block



Thus the output is the input delayed by one time step. (This assumes that time is updated in discrete steps and takes on only integer values.) Eq. (2.6) requires that the output be initialized at time 𝑡 = 0. This initial condition is indicated in Figure 2.13 by the arrow coming
into the bottom of the delay block.
Another related building block, which we will use for the continuous-time recurrent
networks is the integrator, which is shown in Figure 2.14.
The Integrator output 𝐚(𝑡) is computed from its
input 𝐮(𝑡) according to

𝐚(𝑡) = ∫₀ᵗ 𝐮(𝜏) 𝑑𝜏 + 𝐚(0)          ……….. (2.7)

Figure 2.14 Integrator Block


The initial condition 𝐚(0) is indicated by the arrow coming into the bottom of the
integrator block.
A recurrent network is a network with feedback; some of its outputs are connected to its
inputs. This is quite different from the networks that we have mentioned before, which were
strictly feedforward with no backward connections. One type of discrete-time recurrent
network is shown in Figure 2.15.
In this particular network the vector 𝐩 supplies the initial condition (i.e., 𝐚(0) = 𝐩). Then future outputs of the network are computed from previous outputs:
𝐚(1) = 𝐬𝐚𝐭𝐥𝐢𝐧𝐬(𝐖𝐚(0) + 𝐛)
𝐚(2) = 𝐬𝐚𝐭𝐥𝐢𝐧𝐬(𝐖𝐚(1) + 𝐛)
⋮
Figure 2.15 Recurrent Network

Recurrent networks are potentially more powerful than feedforward networks.
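
A hedged sketch of the discrete-time recurrent network of Figure 2.15, with 𝐚(0) = 𝐩 and a satlins (saturating linear, clipped to [−1, 1]) transfer function; the weights are arbitrary illustration values:

import numpy as np

def satlins(n):
    # symmetric saturating linear transfer function
    return np.clip(n, -1.0, 1.0)

def run_recurrent(W, b, p, steps):
    a = p.copy()                  # a(0) = p supplies the initial condition
    history = [a]
    for _ in range(steps):
        a = satlins(W @ a + b)    # a(t) = satlins( W a(t-1) + b )
        history.append(a)
    return history

W = np.array([[0.5, -0.2], [0.1, 0.4]])   # illustration values
b = np.array([0.1, -0.1])
for t, a in enumerate(run_recurrent(W, b, np.array([1.0, -1.0]), steps=3)):
    print(t, a)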


How to Pick an Architecture:


Problem specifications help define the network in the following ways:
1. Number of network inputs = number of problem inputs.
2. Number of neurons in output layer = number of problem outputs.
3. Output layer transfer function choice at least partly determined by problem
specification of the outputs.

Examples:
Example 3.1:
Given a two-input neuron with the following parameters: 𝑏 = 1.2 , 𝐖 = [3 2] and 𝐩 = [−5, 6]ᵀ , calculate the neuron output for the following transfer functions:
1. A symmetrical hard limit transfer function.
2. A saturating linear transfer function.
3. A hyperbolic tangent sigmoid transfer function
Answer:
First calculate the net input:
𝑛 = 𝐖𝐩 + 𝑏 = [3 2] [−5, 6]ᵀ + 1.2 = −1.8
Now find the outputs for each of the transfer functions.
1. 𝑓(𝑛) = { −1  if 𝑛 < 0 ;  1  if 𝑛 ≥ 0 }
   𝑎 = 𝑓(−1.8) = −1
2. 𝑓(𝑛) = { 0  if 𝑛 < 0 ;  𝑛  if 0 ≤ 𝑛 ≤ 1 ;  1  if 𝑛 > 1 }
   𝑎 = 𝑓(−1.8) = 0
3. 𝑓(𝑛) = (𝑒^𝑛 − 𝑒^(−𝑛)) / (𝑒^𝑛 + 𝑒^(−𝑛))
   𝑎 = 𝑓(−1.8) = −0.9468
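
The three results of Example 3.1 can be checked with a short Python sketch of the symmetrical hard limit, saturating linear and hyperbolic tangent sigmoid functions:

import numpy as np

def hardlims(n): return -1.0 if n < 0 else 1.0       # symmetrical hard limit
def satlin(n):   return float(np.clip(n, 0.0, 1.0))  # saturating linear
def tansig(n):   return float(np.tanh(n))            # hyperbolic tangent sigmoid

W = np.array([3.0, 2.0])
p = np.array([-5.0, 6.0])
n = W @ p + 1.2                                      # net input = -1.8
print(n, hardlims(n), satlin(n), tansig(n))          # -1.8, -1.0, 0.0, about -0.9468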

Example 3.2:
A single-layer neural network is to have six inputs and two outputs. The outputs are to
be continuous and limited to the range 0 to 1. What can you tell about the network
architecture? Specifically:
1. How many neurons are required?
2. What are the dimensions of the weight matrix?
3. What kind of transfer functions could be used?
4. Is a bias required?
Answer:
1. Two neurons, one for each output, are required.
2. The weight matrix has two rows corresponding to the two neurons and six columns
corresponding to the six inputs. (The product 𝐖𝐩 is a two-element vector).
3. The logistic (log-sigmoid) transfer function would be most appropriate.
4. Not enough information is given to determine if a bias is required.

3.1 Types of Problems:


There are many different problems that can be solved with a neural network. However,
neural networks are commonly used to address particular types of problems. The following
four types of problem are frequently solved with neural networks:
• Classification.
• Prediction.
• Pattern recognition.
• Optimization.

3.1.1 Classification:
Classification is the process of classifying input into groups. For example, an insurance
company may want to classify insurance applications into different risk categories, or an online
organization may want its email system to classify incoming mail into groups of spam and non-
spam messages.


Often, the neural network is trained by presenting it with a sample group of data and
instructions as to which group each data element belongs. This allows the neural network to
learn the characteristics that may indicate group membership.

3.1.2 Prediction
Prediction is another common application for neural networks. Given a time-based series
of input data, a neural network will predict future values. The accuracy of the guess will be
dependent upon many factors, such as the quantity and relevancy of the input data. For
example, neural networks are commonly applied to problems involving predicting movements
in financial markets.

3.1.3 Pattern Recognition

Pattern recognition is one of the most common uses for neural networks. Pattern
recognition is a form of classification. Pattern recognition is simply the ability to recognize a
pattern. The pattern must be recognized even when it is distorted. Consider the following
everyday use of pattern recognition.
Every person who holds a driver’s license should be able to accurately identify a traffic
light. This is an extremely critical pattern recognition procedure carried out by countless drivers
every day. However, not every traffic light looks the same, and the appearance of a particular
traffic light can be altered depending on the time of day or the season. In addition, many
variations of the traffic light exist. Still, recognizing a traffic light is not a hard task for a human
driver.
How hard is it to write a computer program that accepts an image and tells you if it is a
traffic light? Without the use of neural networks, this could be a very complex task. Most
common programming algorithms are quickly exhausted when presented with a complex
pattern recognition problem.

3.1.4 Optimization

Another common use for neural networks is optimization. Optimization can be applied
to many different problems for which an optimal solution is sought. The neural network may
not always find the optimal solution; rather, it seeks to find an acceptable solution.
Optimization problems include circuit board assembly, resource allocation, and many
others.


Perhaps one of the most well-known optimization problems is the traveling salesman
problem (TSP). A salesman must visit a set number of cities. He would like to visit all cities
and travel the fewest number of miles possible. With only a few cities, this is not a complex
problem. However, with a large number of cities, brute force methods of calculation do not
work nearly as well as a neural network approach.

The following simple examples will illustrate classification problems.

Example 3.3: The AND Operation


We will now look at a neural network that acts as an AND gate. Table 3.1 shows
the truth table for the AND logical operation.

𝒑𝟏 𝒑𝟐 𝒑𝟏 AND 𝒑𝟐
0 0 0
0 1 0
1 0 0
1 1 1
Table 3.1 The AND Logical Operation (Binary)

A simple neural network can be created that recognizes the AND logical operation.
This network will contain two inputs and one neuron (perceptron) with threshold as the transfer
function. A neural network that recognizes the AND logical operation is shown in Figure 3.1.

There are two inputs to the network shown in Figure 3.1. Each input has a weight of one. The threshold is T = 1.5. Therefore, a neuron will only fire (𝑜𝑢𝑡𝑝𝑢𝑡 = 1) if both inputs are true. If either input is false, the sum of the two inputs will not exceed the threshold of T = 1.5 (𝑜𝑢𝑡𝑝𝑢𝑡 = 0).

Figure 3.1 A neural network that recognizes the AND logical operation.

𝑓(𝑛) = { 0  if 𝑛 < 1.5 ;  1  if 𝑛 ≥ 1.5 }


𝑎=1 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 > T


𝑎=0 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 < T
We make use of the truth table (3.1) and the following inequalities are obtained.
0<T
𝑤1,2 < T
𝑤1,1 < T
𝑤1,1 + 𝑤1,2 > T
The task is to determine the values for 𝑤1,1 , 𝑤1,2 & T so that the output satisfies the
logic AND function.
We can choose 𝑤1,1 = 0.5 , 𝑤1,2 = 0.8 & T = 1 or 𝑤1,1 = 1 , 𝑤1,2 = 1 & T = 1.5 as the solution (these values satisfy the inequalities).
Another way to determine the values for 𝑤1,1 , 𝑤1,2 & T is to find a line separating the three “0”s from the “1”, as in Figure 3.2; such a line is called a decision boundary. Apparently, the 3 lines shown and many other lines can satisfy the requirement. We say that the problem is linearly separable.

Figure 3.2 Truth Table (Binary)

To find the separating line, we need its slope and intercepts, i.e. the points (0, 1.5) and (1.5, 0):
𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 = T ⟹ 𝑝2 = T/𝑤1,2 − (𝑤1,1/𝑤1,2) 𝑝1

𝑚 = (1.5 − 0) / (0 − 1.5) = −1

(𝑦 − 𝑦1 ) = 𝑚(𝑥 − 𝑥1 ) ⟹ 𝑦 = 𝑚(𝑥 − 𝑥1 ) + 𝑦1 = −𝑥 + 1.5

∴ 𝑤1,1/𝑤1,2 = 1 ⟹ 𝑤1,1 = 𝑤1,2

T/𝑤1,2 = 1.5 ⟹ T = 1.5 𝑤1,2

If 𝑤1,1 = 𝑤1,2 = 1 ⟹ T = 1.5
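
The choice 𝑤1,1 = 𝑤1,2 = 1 and T = 1.5 can be checked against Table 3.1 with a few lines of Python:

def and_neuron(p1, p2, w1=1.0, w2=1.0, T=1.5):
    # fires (output 1) only when the weighted sum reaches the threshold T
    return 1 if w1 * p1 + w2 * p2 >= T else 0

for p1 in (0, 1):
    for p2 in (0, 1):
        print(p1, p2, and_neuron(p1, p2))   # reproduces the AND truth table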

Now, Table 3.2 shows the truth table for the AND logical operation as bipolar
representation.

𝒑𝟏 𝒑𝟐 𝒑𝟏 AND 𝒑𝟐
−1 −1 −1
−1 1 −1
1 −1 −1
1 1 1
Table 3.2 The AND Logical Operation (Bipolar)

𝑎=1 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 > T


𝑎 = −1 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 < T
We make use of the truth table (3.2) and the following inequalities are obtained.
−𝑤1,1 − 𝑤1,2 < T
−𝑤1,1 + 𝑤1,2 < T
𝑤1,1 − 𝑤1,2 < T
𝑤1,1 + 𝑤1,2 > T
The task is to determine the values for 𝑤1,1 , 𝑤1,2 & T so that the output satisfies the
logic AND function.
We can choose 𝑤1,1 = 0.5 , 𝑤1,2 = 0.8 & T = 1 or 𝑤1,1 = 1 , 𝑤1,2 = 1 & T = 1.5 as the solution (these values satisfy the inequalities).


Another way to determine the values for 𝑤1,1 , 𝑤1,2 & T is to find a line separating the three “−1”s from the “1”; one possible decision boundary for this function is shown in Figure 3.3.

Figure 3.3 Truth Table (Bipolar)

So, the equation for the separating line in Figure 3.3 is:
𝑚 = (1 − 0) / (0 − 1) = −1
𝑦 = 𝑚(𝑥 − 𝑥1 ) + 𝑦1 = −𝑥 + 1
As we saw previously, 𝑝2 = T/𝑤1,2 − (𝑤1,1/𝑤1,2) 𝑝1

∴ 𝑤1,1/𝑤1,2 = 1 ⟹ 𝑤1,1 = 𝑤1,2

T/𝑤1,2 = 1 ⟹ T = 𝑤1,2

If 𝑤1,1 = 𝑤1,2 = 1 ⟹ T=1


From the above we see that a neuron (perceptron) with two inputs and no bias can satisfy the truth table for AND logic by changing the boundary of the threshold (hard limit) transfer function to the value T.
If we don’t want to change the boundary of the function
𝑓(𝑛) = { 0  if 𝑛 < 0 ;  1  if 𝑛 ≥ 0 }
In this situation, we need to add a bias to the neuron as in Figure 3.4.
Then, the input (Table 3.2) to the neuron is 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 + 𝑏


The decision line is
𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 + 𝑏 = 0
⟹ 𝑝2 = − 𝑏/𝑤1,2 − (𝑤1,1/𝑤1,2) 𝑝1
From above, the separating line in Figure 3.3 is:
𝑦 = −𝑥 + 1
∴ 𝑤1,1/𝑤1,2 = 1 ⟹ 𝑤1,1 = 𝑤1,2
− 𝑏/𝑤1,2 = 1 ⟹ 𝑏 = − 𝑤1,2
If 𝑤1,1 = 𝑤1,2 = 1 ⟹ 𝑏 = −1

Figure 3.4 A neural network that recognizes the AND logical operation.

Example 3.4: The OR Operation


Neural networks can be created to recognize other logical operations as well. Consider
the OR logical operation. The truth table for the OR logical operation is shown in Table 3.3.
The OR logical operation is true if either input is true.

𝒑𝟏 𝒑𝟐 𝒑𝟏 OR 𝒑𝟐
0 0 0
0 1 1
1 0 1
1 1 1
Table 3.3 The OR Logical Operation (Binary)

The neural network that will recognize the OR operation is shown in Figure 3.5.
The OR neural network looks very similar to the AND neural network. The biggest difference is the threshold value. Because the threshold is lower (T = 0.9), only one of the inputs needs to have a value of true for the output neuron to fire.

Figure 3.5 A neural network that recognizes the OR logical operation.

𝑎=1 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 > T


𝑎=0 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 < T
We make use of the truth table (3.3) and the following inequalities are obtained.
0<T
𝑤1,2 > T
𝑤1,1 > T
𝑤1,1 + 𝑤1,2 > T
We can choose 𝑤1,1 = 0.5 , 𝑤1,2 = 0.8 & T = 0.4 or 𝑤1,1 = 1 , 𝑤1,2 = 1 & T = 0.9 as the solution (these values satisfy the inequalities).
One of the decision boundaries for the OR function can be seen in Figure 3.6:
𝑦 = −𝑥 + 0.5
𝑝2 = T/𝑤1,2 − (𝑤1,1/𝑤1,2) 𝑝1

∴ 𝑤1,1/𝑤1,2 = 1 ⟹ 𝑤1,1 = 𝑤1,2
T/𝑤1,2 = 0.5 ⟹ T = 0.5 𝑤1,2
If 𝑤1,1 = 𝑤1,2 = 1 ⟹ T = 0.5

Figure 3.6 Truth Table (Binary)
Table 3.4 shows the truth table for the OR logical operation as bipolar representation.

𝒑𝟏 𝒑𝟐 𝒑𝟏 OR 𝒑𝟐
−1 −1 −1
−1 1 1
1 −1 1
1 1 1
Table 3.4 The OR Logical Operation (Bipolar)


𝑎=1 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 > T


𝑎 = −1 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 < T
We make use of the truth table (3.4) and the following inequalities are obtained.
−𝑤1,1 − 𝑤1,2 < T
−𝑤1,1 + 𝑤1,2 > T
𝑤1,1 − 𝑤1,2 > T
𝑤1,1 + 𝑤1,2 > T
The task is to determine the values for 𝑤1,1 , 𝑤1,2 & T so that the output satisfies the
logic OR function.
We can choose 𝑤1,1 = 0.8 , 𝑤1,2 = 0.5 & T = −0.4 or 𝑤1,1 = 1 , 𝑤1,2 = 1 & T = −0.9 as the solution (these values satisfy the inequalities).

Another way to determine the values for 𝑤1,1 , 𝑤1,2 & T is to find a line separating the three “−1”s from the “1”; one possible decision boundary for this function is shown in Figure 3.7.

Figure 3.7 Truth Table (Bipolar)

So, the equation for the separating line in Figure 3.7 is:
𝑚 = (−1 − 0) / (0 + 1) = −1
𝑦 = 𝑚(𝑥 − 𝑥1 ) + 𝑦1 = −𝑥 − 1
As we saw previously, 𝑝2 = T/𝑤1,2 − (𝑤1,1/𝑤1,2) 𝑝1


∴ 𝑤1,1/𝑤1,2 = 1 ⟹ 𝑤1,1 = 𝑤1,2

T/𝑤1,2 = −1 ⟹ T = − 𝑤1,2

If 𝑤1,1 = 𝑤1,2 = 1 ⟹ T = −1
From the above we see that a neuron (perceptron) with two inputs and no bias can satisfy the truth table for OR logic by changing the boundary of the threshold (hard limit) transfer function to the value T.
If we don’t want to change the boundary of the function
𝑓(𝑛) = { 0  if 𝑛 < 0 ;  1  if 𝑛 ≥ 0 }
In this situation, we need to add a bias to the neuron as in Figure 3.8.
Then, the input (Table 3.4 ) to the neuron is 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 + 𝑏
The decision line is
𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 + 𝑏 = 0
⟹ 𝑝2 = − 𝑏/𝑤1,2 − (𝑤1,1/𝑤1,2) 𝑝1
From above, the separating line in Figure 3.7 is:
𝑦 = −𝑥 − 1
∴ 𝑤1,1/𝑤1,2 = 1 ⟹ 𝑤1,1 = 𝑤1,2
− 𝑏/𝑤1,2 = −1 ⟹ 𝑏 = 𝑤1,2
If 𝑤1,1 = 𝑤1,2 = 1 ⟹ 𝑏 = 1

Figure 3.8 A neural network that recognizes the OR logical operation.

Example 3.5: The NOT Operation


Table 3.5 shows the truth table for the NOT logical operation.
𝒑𝟏 NOT 𝒑𝟏
0 1
1 0
Table 3.5 The NOT Logical Operation (Binary)


𝑎 = 1 if 𝑤1,1 𝑝1 > T
𝑎 = 0 if 𝑤1,1 𝑝1 < T
Applying truth Table 3.5, we get
0 > T
𝑤1,1 < T
Let 𝑤1,1 = −0.5 & T = −0.1 (which satisfy the inequalities).

Figure 3.9 A neural network that recognizes the NOT logical operation.

Example 3.6: The NAND Operation


Table 3.6 shows the truth table for the NAND logical operation.

𝒑𝟏   𝒑𝟐   𝒑𝟏 AND 𝒑𝟐   𝒑𝟏 NAND 𝒑𝟐
0 0 0 1
0 1 0 1
1 0 0 1
1 1 1 0
Table 3.6 The NAND Logical Operation (Binary)

For the NAND logic function, we have


𝑎 = 1 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 > T
𝑎 = 0 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 < T
Applying truth Table 3.6, we get
0 > T
𝑤1,2 > T
𝑤1,1 > T
𝑤1,1 + 𝑤1,2 < T

Figure 3.10 A neural network that recognizes the NAND logical operation.

Let 𝑤1,1 = −0.5 , 𝑤1,2 = − 0.5 & T = −0.8 ( which satisfy the inequalities).


Example 3.7: The NOR Operation


Table 3.7 shows the truth table for the NOR logical operation.
𝒑𝟏 𝒑𝟐 𝒑𝟏 OR 𝒑𝟐 𝒑𝟏 NOR 𝒑𝟐
0 0 0 1
0 1 1 0
1 0 1 0
1 1 1 0
Table 3.7 The NOR Logical Operation (Binary)

For the NOR logic function, we have


𝑎 = 1 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 > T
𝑎 = 0 if 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 < T
Applying truth Table 3.7, we get
0 > T
𝑤1,2 < T
𝑤1,1 < T
𝑤1,1 + 𝑤1,2 < T

Figure 3.11 A neural network that recognizes the NOR logical operation.

Let 𝑤1,1 = −0.5 , 𝑤1,2 = − 0.5 & T = −0.1 ( which satisfy the inequalities).

Example 3.8: The XOR Operation


Next we will consider a neural network for the exclusive or (XOR) logical operation.
The XOR truth table is shown in Table 3.8.

𝒑𝟏 𝒑𝟐 𝒑𝟏 XOR 𝒑𝟐
0 0 0
0 1 1
1 0 1
1 1 0
Table 3.8 The XOR Logical Operation


Figure 3.12 Truth Table (Binary)

It is easy to see from Figure 3.12 that no straight line can separate the points into two groups.
The XOR logical operation requires a slightly more complex neural network than the
AND and OR operators. The neural networks presented so far have had only one neuron
(perceptron) with one or two inputs. More complex neural networks also include one or more
hidden layers. The XOR operator requires a hidden layer.
Figure 3.13 shows a two-layer neural network that can be used to recognize the XOR
operator.

Figure 3.13 A neural network that recognizes the XOR logical operation.


Homework 3.1:
Solve the examples 3.5, 3.6 and 3.7 by using bipolar representation.

4. Neural Network Learning:


One of the questions raised by the above examples is: “How do we determine the weight matrix and bias for perceptron networks with many inputs, where it is impossible to visualize the decision boundaries?” The answer is to build an algorithm for training perceptron networks, so that they can learn to solve classification problems.
4.1 Types of Learning:
A learning rule is a procedure for modifying the weights and biases of a network. (This procedure may also be referred to as a training algorithm.) The purpose of the learning rule is to train the
network to perform some task. There are many types of neural network learning rules. They
fall into two broad categories: supervised learning and unsupervised learning.

4.1.1 Supervised Learning:


The learning rule is provided with a set of examples (the training set) of proper network
behavior:
{𝐩1 , 𝐭1 }, {𝐩2 , 𝐭 2 }, {𝐩3 , 𝐭 3 }, … , {𝐩𝑞 , 𝐭 𝑞 }
where 𝐩𝑞 is an input to the network and 𝐭 𝑞 is the corresponding correct (target) output.
As the inputs are applied to the network, the network outputs are compared to the targets. The
learning rule is then used to adjust the weights and biases of the network in order to move the
network outputs closer to the targets.

4.1.2 Unsupervised Learning:


The weights and biases are modified in response to network inputs only. There are no
target outputs available. At first glance this might seem to be impractical. How can you train a
network if you don’t know what it is supposed to do? Most of these algorithms perform some
kind of clustering operation. They learn to categorize the input patterns into a finite number of
classes.


4.2 Learning Rules:

4.2.1 Hebbian Learning Rule:


The earliest and simplest learning rule for neural networks is generally known as the Hebb rule. The Hebb (1949) learning rule is based on the assumption that if two neighboring neurons are activated and deactivated at the same time, then the weight connecting these neurons should increase. For neurons operating in opposite phases, the weight between them should decrease (i.e. the weight increases if both 𝑝𝑗 and 𝑎𝑖 are positive or both negative; the weight decreases whenever 𝑝𝑗 and 𝑎𝑖 have opposite signs). The weight update can be written as:

𝑤𝑖𝑗𝑛𝑒𝑤 = 𝑤𝑖𝑗𝑜𝑙𝑑 + 𝛼 𝑎𝑖𝑞 𝑝𝑗𝑞 …………. (4.1)

Where 𝑝𝑗𝑞 is the 𝑗𝑡ℎ element of the 𝑞 𝑡ℎ input vector 𝐩𝑞 , 𝑎𝑖𝑞 is the 𝑖 𝑡ℎ element of the
network output when the 𝑞 𝑡ℎ input vector is presented to the network, and 𝛼 is a positive
constant (0 < 𝛼 ≤ 1), called the learning rate.

Note: The choice of the value of learning rate is important when we implement a neural
network. A large learning rate corresponds to rapid learning but might also result in oscillations.

The Hebb rule defined in Eq. 4.1 is an unsupervised learning rule. It does not require any
information concerning the target output.
The supervised Hebb rule substitutes the target (desired) output for the neuron output. The weight update can be written as:
𝑤𝑖𝑗𝑛𝑒𝑤 = 𝑤𝑖𝑗𝑜𝑙𝑑 + 𝑡𝑖𝑞 𝑝𝑗𝑞 …………. (4.2)

Where 𝑡𝑖𝑞 is the 𝑖 𝑡ℎ element of the 𝑞 𝑡ℎ target vector 𝐭 𝑞 . (𝛼 = 1 for simplicity).


Equation 4.2 can be written in vector notation:
𝐰 𝑛𝑒𝑤 = 𝐰 𝑜𝑙𝑑 + 𝐭 𝑞 𝐩𝑞 …………. (4.3)
4.2.1.1 Algorithm:

Step 1: Initialize all weights : 𝑤𝑖𝑗 = 0


Step 2: For each of the 𝑞 training input vector and target output pairs (𝐩, 𝐭), do steps 3 – 4.
Step 3: Adjust the weights for
𝑤𝑖𝑗𝑛𝑒𝑤 = 𝑤𝑖𝑗𝑜𝑙𝑑 + 𝑡𝑖 𝑝𝑗


Step 4: Adjust the bias:


𝑏𝑖𝑛𝑒𝑤 = 𝑏𝑖𝑜𝑙𝑑 + 𝑡𝑖
Note that the bias is adjusted exactly like a weight.

Example 4.1:
A Hebb net for AND function: binary case
Input Target
𝒑𝟏 𝒑𝟐 𝒑𝟏 AND 𝒑𝟐
1 1 1
0 1 0
1 0 0
0 0 0
Table 4.1 The AND Logical Operation (Binary)

To use the Hebb rule, the initial weights are:
𝑤1,1 = 0 , 𝑤1,2 = 0 and 𝑏 = 0
The weight changes for a training (input, target) pair are:
Δ𝑤1,1 = 𝑡 𝑝1 , Δ𝑤1,2 = 𝑡 𝑝2 and Δ𝑏 = 𝑡
The weight updates for the first input pair are as follows:
Input Target Weights
𝒑𝟏 𝒑𝟐 b 𝒑𝟏 AND 𝒑𝟐 𝑤1,1 𝑤1,2 𝑏
1 1 1 1 1 1 1

Recall the equation 𝑝2 = − 𝑏/𝑤1,2 − (𝑤1,1/𝑤1,2) 𝑝1

So, the separating line becomes 𝑝2 = −𝑝1 − 1


Figure 4.1 shows that the response of the net after first training pair.
The following Table shows weight update after second, third and fourth training inputs
pairs:
Input Target Weights
𝒑𝟏 𝒑𝟐 b 𝒑𝟏 AND 𝒑𝟐 𝑤1,1 𝑤1,2 𝑏
0 1 1 0 1 1 1
1 0 1 0 1 1 1
0 0 1 0 1 1 1


Because the target value is 0, no learning occurs. Thus, using binary target values
prevents the net from learning any pattern for which the target is “off”.

Figure 4.1 Decision boundary for binary AND function using Hebb rule after first training pair.

The choice of training patterns can play a significant role in determining which problems
can be solved using the Hebb rule. The next example shows that the AND function can be
solved if we modify its representation to bipolar form.

Example 4.2:
A Hebb net for AND function: bipolar case
Input Target
𝒑𝟏 𝒑𝟐 𝒑𝟏 AND 𝒑𝟐
1 1 1
1 −1 −1
−1 1 −1
−1 −1 −1
Table 4.2 The AND Logical Operation (Bipolar)

Take the initial weights to be:
𝑤1,1 = 0 , 𝑤1,2 = 0 and 𝑏 = 0
The weight changes for a training (input, target) pair are:
Δ𝑤1,1 = 𝑡 𝑝1 , Δ𝑤1,2 = 𝑡 𝑝2 and Δ𝑏 = 𝑡


Presenting the first input pair, we get


Input Target Weights
𝒑𝟏 𝒑𝟐 b 𝒑𝟏 AND 𝒑𝟐 𝑤1,1 𝑤1,2 𝑏
1 1 1 1 1 1 1

So, the separating line becomes 𝑝2 = −𝑝1 − 1


Figure 4.2 shows that the response of the net will now be correct for the first training pair.

Figure 4.2 Decision boundary for bipolar AND function using Hebb rule after first training pair

Presenting the second input pair, we get


Input Target Weights
𝒑𝟏 𝒑𝟐 b 𝒑𝟏 AND 𝒑𝟐 𝑤1,1 𝑤1,2 𝑏
1 −1 1 −1 0 2 0

So, the separating line becomes 𝑝2 = 0

Figure 4.3 shows the response of the net after the second training pair.

Figure 4.3 Decision boundary for bipolar AND function using Hebb rule after second training pair

Presenting the third input pair, we get

Input Target Weights


𝒑𝟏 𝒑𝟐 b 𝒑𝟏 AND 𝒑𝟐 𝑤1,1 𝑤1,2 𝑏
−1 1 1 −1 1 1 −1

So, the separating line becomes


𝑝2 = −𝑝1 + 1

Figure 4.4 shows that the response of the net after third training pair.

Figure 4.4 Decision boundary for bipolar AND function using Hebb rule after third training pair

Presenting the fourth input pair, we get

Input Target Weights


𝒑𝟏 𝒑𝟐 b 𝒑𝟏 AND 𝒑𝟐 𝑤1,1 𝑤1,2 𝑏
−1 −1 1 −1 2 2 −2

Even though the weights have changed, the separating line is still
𝑝2 = −𝑝1 + 1

So the graph of the decision regions remains as in Figure 4.4.
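
The sequence of weights in Example 4.2 can be reproduced with a short sketch of the supervised Hebb rule (weight change 𝑡𝑝, bias change 𝑡):

# supervised Hebb rule on the bipolar AND data of Example 4.2
samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

w1, w2, b = 0.0, 0.0, 0.0       # initial weights and bias
for (p1, p2), t in samples:
    w1 += t * p1                # w_new = w_old + t * p
    w2 += t * p2
    b  += t                     # the bias is adjusted exactly like a weight
    print(w1, w2, b)            # 1 1 1 -> 0 2 0 -> 1 1 -1 -> 2 2 -2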


Example 4.3:
Let 𝐩1 = [0.5, −0.5, 0.5, −0.5]ᵀ , 𝐭1 = [1, −1]ᵀ and 𝐩2 = [0.5, 0.5, −0.5, −0.5]ᵀ , 𝐭2 = [1, 1]ᵀ. Use the Hebbian learning rule to train the neural network. (Don’t use a bias.)

Take the initial weights to be:
𝑤1,𝑖 = 0 , 𝑤2,𝑖 = 0 for 𝑖 = 1, 2, 3, 4

Presenting the first input pattern (𝐩1 , 𝐭1 )

𝑤1,1 = 𝑡1 𝑝1 = (1)(0.5) = 0.5


𝑤1,2 = 𝑡1 𝑝2 = (1)(−0.5) = −0.5
𝑤1,3 = 𝑡1 𝑝3 = (1)(0.5) = 0.5
𝑤1,4 = 𝑡1 𝑝4 = (1)(−0.5) = −0.5

𝑤2,1 = 𝑡2 𝑝1 = (−1)(0.5) = −0.5


𝑤2,2 = 𝑡2 𝑝2 = (−1)(−0.5) = 0.5
𝑤2,3 = 𝑡2 𝑝3 = (−1)(0.5) = −0.5
𝑤2,4 = 𝑡2 𝑝4 = (−1)(−0.5) = 0.5

Now, if we present the pattern (𝐩2 , 𝐭 2 ), we get the wrong output:


𝑎1 = 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 + 𝑤1,3 𝑝3 + 𝑤1,4 𝑝4
= (0.5)(0.5) + (−0.5)(0.5) + (0.5)(−0.5) + (−0.5)(−0.5) = 0
𝑎2 = 𝑤2,1 𝑝1 + 𝑤2,2 𝑝2 + 𝑤2,3 𝑝3 + 𝑤2,4 𝑝4
= (−0.5)(0.5) + (0.5)(0.5) + (−0.5)(−0.5) + (0.5)(−0.5) = 0

This means we need to train the network again to get the right output (i.e. repeat the above calculation for the pattern (𝐩2 , 𝐭 2 ) ).


Note: The above calculation can be done using matrix notation


𝐖 = 𝐭1 𝐩1ᵀ + 𝐭2 𝐩2ᵀ + ⋯ + 𝐭𝑄 𝐩𝑄ᵀ
𝐖 = [𝐭1 𝐭2 … 𝐭𝑄 ] [𝐩1ᵀ ; 𝐩2ᵀ ; … ; 𝐩𝑄ᵀ] = 𝐓𝐏ᵀ
So, the weights for Example 4.3 are
𝐖 = 𝐓𝐏ᵀ = [1  1 ; −1  1] [0.5  −0.5  0.5  −0.5 ; 0.5  0.5  −0.5  −0.5] = [1  0  0  −1 ; 0  1  −1  0]
Now, check:
𝐖𝐩1 = [1, −1]ᵀ , 𝐖𝐩2 = [1, 1]ᵀ.
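
The matrix form 𝐖 = 𝐓𝐏ᵀ is a single line of NumPy; the sketch below reproduces the weight matrix of Example 4.3 and the check 𝐖𝐩1, 𝐖𝐩2:

import numpy as np

P = np.array([[0.5, -0.5,  0.5, -0.5],     # p1 stored as a row
              [0.5,  0.5, -0.5, -0.5]])    # p2 stored as a row
T = np.array([[ 1, 1],                     # columns are t1 and t2
              [-1, 1]])

W = T @ P                    # W = T P^T (P already holds the patterns row-wise)
print(W)                     # rows [1, 0, 0, -1] and [0, 1, -1, 0]
print(W @ P[0], W @ P[1])    # [1, -1] and [1, 1], the stored targets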

4.2.2 The Backpropagation Algorithm:

Recall from section 2.2.2 the three-layer network in abbreviated notation is shown in
Figure 4.5.

Figure 4.5 Three-Layer Network


As we discussed earlier, for multilayer networks the output of one layer becomes the
input to the following layer. The equations that describe this operation are
𝐚𝑚+1 = 𝐟 𝑚+1 (𝐖 𝑚+1 𝐚𝑚 + 𝐛𝑚+1 ) 𝑓𝑜𝑟 𝑚 = 0,1, … , 𝑀 − 1 ……….. (4.4)
where 𝑀 is the number of layers in the network.

Then the algorithm is:


Step 1: Propagate the input forward through the network:
𝐚0 = 𝐩
Step 2: Compute the output for each layer
𝐚𝑚+1 = 𝐟 𝑚+1 (𝐖 𝑚+1 𝐚𝑚 + 𝐛𝑚+1 ) 𝑓𝑜𝑟 𝑚 = 0,1, … , 𝑀 − 1
The outputs of the neurons in the last layer are considered the network outputs:
𝐚 = 𝐚𝑀
Step 3: Propagate the sensitivities backward through the network:
(𝐬 𝑀 → 𝐬 𝑀−1 → ⋯ → 𝐬 2 → 𝐬1 )
𝐬 𝑀 = −2 𝐅 ′𝑀 (𝐧𝑀 )(𝐭 − 𝐚)

𝐬 𝑚 = 𝐅 ′𝑚 (𝐧𝑚 )(𝐖 𝑚+1 )𝑇 𝐬 𝑚+1 𝑓𝑜𝑟 𝑚 = 𝑀 − 1, … , 2, 1.


Where 𝐅′𝑚(𝐧𝑚) = diag( 𝑓′𝑚(𝑛1𝑚), 𝑓′𝑚(𝑛2𝑚), … , 𝑓′𝑚(𝑛𝑆𝑚𝑚) ) is the diagonal matrix of the transfer function derivatives evaluated at the net inputs of layer 𝑚.

Step 4: Update the weights and biases using the approximate steepest descent rule:
𝐖 𝑚 (𝑘 + 1) = 𝐖 𝑚 (𝑘) − 𝛼 𝐬 𝑚 (𝐚𝑚−1 )𝑇
𝐛𝑚 (𝑘 + 1) = 𝐛𝑚 (𝑘) − 𝛼 𝐬 𝑚


Example 4.4:
Consider the following network shown in Figure 4.6

Figure 4.6 Function Approximation Network

which approximates the function
𝑔(𝑝) = 1 + sin( (𝜋/4) 𝑝 ) ,  −2 ≤ 𝑝 ≤ 2
To solve this problem we use the backpropagation algorithm but before that, we need to
choose some initial values for the network weights and biases. Generally, these are chosen to
be small random values.
𝐖1(0) = [−0.27, −0.41]ᵀ , 𝐛1(0) = [−0.48, −0.13]ᵀ , 𝐖2(0) = [0.09  −0.17] , 𝐛2(0) = [0.48]

The response of the network for these initial values is illustrated in Figure 4.7, along with the sine function we wish to approximate.
Next, we need to select a training set {𝑝1 , 𝑡1 }, {𝑝2 , 𝑡2 }, … , {𝑝𝑄 , 𝑡𝑄 }. In this case, we will sample the function at 21 points in the range [−2, 2] at equally spaced intervals of 0.2. The training points are indicated by the circles in Figure 4.7.

Figure 4.7 Initial Network Response


Now we are ready to start the algorithm. The training points can be presented in any
order, but they are often chosen randomly. For our initial input we will choose 𝑝 = 1 , which
is the 16𝑡ℎ training point:
𝑎0 = 𝑝 = 1
𝐚1 = 𝐟1(𝐖1 𝑎0 + 𝐛1) = 𝐥𝐨𝐠𝐬𝐢𝐠( [−0.27, −0.41]ᵀ [1] + [−0.48, −0.13]ᵀ ) = 𝐥𝐨𝐠𝐬𝐢𝐠( [−0.75, −0.54]ᵀ )
= [ 1/(1 + 𝑒^0.75) , 1/(1 + 𝑒^0.54) ]ᵀ = [0.321, 0.368]ᵀ

The second layer output is
𝑎2 = 𝑓2(𝐖2 𝐚1 + 𝐛2) = 𝑝𝑢𝑟𝑒𝑙𝑖𝑛( [0.09  −0.17] [0.321, 0.368]ᵀ + [0.48] ) = [0.446]
The error would then be
𝑒 = 𝑡 − 𝑎 = {1 + sin( (𝜋/4) 𝑝 )} − 𝑎2 = {1 + sin( 𝜋/4 )} − 0.446 = 1.261.

The next stage of the algorithm is to backpropagate the sensitivities. Before we begin the
backpropagation, recall that we will need the derivatives of the transfer functions, 𝑓 ′1 (𝑛) and
𝑓 ′2 (𝑛). For the first layer
𝑓′1(𝑛) = d/d𝑛 ( 1/(1 + 𝑒^(−𝑛)) ) = 𝑒^(−𝑛) / (1 + 𝑒^(−𝑛))² = ( 1 − 1/(1 + 𝑒^(−𝑛)) ) ( 1/(1 + 𝑒^(−𝑛)) ) = (1 − 𝑎1)(𝑎1).

For the second layer we have
𝑓′2(𝑛) = d/d𝑛 (𝑛) = 1
𝑑𝑛
We can now perform the backpropagation. The starting point is found at the second layer,
using:
s 2 = −2 F ′2 (n2 )(t − a) = −2[𝑓 ′2 (𝑛2 )](1.261) = −2[1](1.261) = −2.522.
The first layer sensitivity is then computed by backpropagating the sensitivity from the second layer, using:
𝐬1 = 𝐅′1(𝐧1)(𝐖2)ᵀ s2 = [ (1 − 𝑎¹₁)(𝑎¹₁)  0 ; 0  (1 − 𝑎¹₂)(𝑎¹₂) ] [0.09, −0.17]ᵀ [−2.522]
= [ (1 − 0.321)(0.321)  0 ; 0  (1 − 0.368)(0.368) ] [0.09, −0.17]ᵀ [−2.522]
= [ 0.218  0 ; 0  0.233 ] [−0.227, 0.429]ᵀ = [−0.0495, 0.0997]ᵀ

The final stage of the algorithm is to update the weights. For simplicity, we will use a
learning rate 𝛼 = 0.1. We have
𝐖2(1) = 𝐖2(0) − 𝛼 s2 (𝐚1)ᵀ = [0.09  −0.17] − (0.1)[−2.522][0.321  0.368] = [0.171  −0.0772]
𝐛2(1) = 𝐛2(0) − 𝛼 s2 = [0.48] − (0.1)[−2.522] = [0.732]
𝐖1(1) = 𝐖1(0) − 𝛼 𝐬1 (𝐚0)ᵀ = [−0.27, −0.41]ᵀ − (0.1)[−0.0495, 0.0997]ᵀ [1] = [−0.265, −0.420]ᵀ
𝐛1(1) = 𝐛1(0) − 𝛼 𝐬1 = [−0.48, −0.13]ᵀ − (0.1)[−0.0495, 0.0997]ᵀ = [−0.475, −0.140]ᵀ

This completes the first iteration of the backpropagation algorithm. We next proceed to
randomly choose another input from the training set and perform another iteration of the
algorithm. We continue to iterate until the difference between the network response and the
target function reaches some acceptable level.
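
The first iteration of Example 4.4 can be reproduced with a short NumPy sketch of the backpropagation steps (logsig first layer, purelin second layer, 𝛼 = 0.1):

import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

# initial parameters of the 1-2-1 network of Example 4.4
W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([[-0.48], [-0.13]])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([[0.48]])
alpha, p = 0.1, 1.0
t = 1 + np.sin(np.pi / 4 * p)                 # target from g(p)

# forward pass
a0 = np.array([[p]])
a1 = logsig(W1 @ a0 + b1)                     # about [0.321, 0.368]
a2 = W2 @ a1 + b2                             # purelin output, about 0.446

# backward pass (sensitivities)
s2 = -2 * 1.0 * (t - a2)                      # about -2.522
s1 = np.diagflat((1 - a1) * a1) @ W2.T @ s2   # about [-0.0495, 0.0997]

# approximate steepest descent updates
W2, b2 = W2 - alpha * s2 @ a1.T, b2 - alpha * s2
W1, b1 = W1 - alpha * s1 @ a0.T, b1 - alpha * s1
print(W1.ravel(), b1.ravel(), W2, b2)         # matches the values computed above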

Example 4.5:
Use Backpropagation Algorithm to train the following network shown in Figure 4.8

Figure 4.8 Neural Network

Where 𝐟1 & 𝑓2 are sigmoid functions, with the initial weights
𝐖1(0) = [0.1  −0.2 ; 0.4  0.2] , 𝐖2(0) = [0.2  −0.5]

And the training set {𝐩𝟏 , 𝑡1 }, {𝐩𝟐 , 𝑡2 }, … , {𝐩𝟓 , 𝑡5 } is

𝑝1 𝑝2 𝑡
0.4 −0.7 0.1
0.3 −0.5 0.05
0.6 0.1 0.3
0.2 0.4 0.25
0.1 −0.2 0.12

Let 𝐚0 = [0.4, −0.7]ᵀ
The first layer output is
𝐚1 = 𝐟1(𝐖1 𝐚0) = 𝐥𝐨𝐠𝐬𝐢𝐠( [0.1  −0.2 ; 0.4  0.2] [0.4, −0.7]ᵀ ) = 𝐥𝐨𝐠𝐬𝐢𝐠( [0.18, 0.02]ᵀ )
= [ 1/(1 + 𝑒^(−0.18)) , 1/(1 + 𝑒^(−0.02)) ]ᵀ = [0.5448, 0.505]ᵀ

The second layer output is
𝑎2 = 𝑓2(𝐖2 𝐚1) = 𝑙𝑜𝑔𝑠𝑖𝑔( [0.2  −0.5] [0.5448, 0.505]ᵀ ) = 𝑙𝑜𝑔𝑠𝑖𝑔([−0.14354]) = [0.4642]
The error would then be
𝑒 = 𝑡 − 𝑎 = 𝑡 − 𝑎2 = 0.1 − 0.4642 = −0.3642.
We can now perform the backpropagation. The second layer sensitivity is
s2 = −2 [𝑓′2(𝑛2)](t − a) = −2 (1 − 𝑎2) 𝑎2 (−0.3642)
= −2 (1 − 0.4642)(0.4642)(−0.3642) = −2 (−0.09058) = 0.18116.
The first layer sensitivity is
𝐬1 = 𝐅′1(𝐧1)(𝐖2)ᵀ s2 = [ (1 − 𝑎¹₁)(𝑎¹₁)  0 ; 0  (1 − 𝑎¹₂)(𝑎¹₂) ] [0.2, −0.5]ᵀ [0.18116]
= [ (1 − 0.5448)(0.5448)  0 ; 0  (1 − 0.505)(0.505) ] [0.2, −0.5]ᵀ [0.18116]
= [ 0.248  0 ; 0  0.25 ] [0.2, −0.5]ᵀ [0.18116] = [0.0496, −0.125]ᵀ [0.18116] = [0.009, −0.023]ᵀ

Now, update the weights with learning rate 𝛼 = 0.6.

𝐖2(1) = 𝐖2(0) − 𝛼 s2 (𝐚1)ᵀ = [0.2  −0.5] − (0.6)[0.18116][0.5448  0.505]
= [0.2  −0.5] − [0.109][0.5448  0.505] = [0.2  −0.5] − [0.059  0.055] = [0.141  −0.555]
𝐖1(1) = 𝐖1(0) − 𝛼 𝐬1 (𝐚0)ᵀ = [0.1  −0.2 ; 0.4  0.2] − (0.6) [0.009, −0.023]ᵀ [0.4  −0.7]
= [0.1  −0.2 ; 0.4  0.2] − (0.6) [0.0036  −0.0063 ; −0.0092  0.0161]
= [0.1  −0.2 ; 0.4  0.2] − [0.0022  −0.0038 ; −0.0055  0.0097] = [0.098  −0.196 ; 0.406  0.190]

This completes the first iteration of the backpropagation algorithm. We next proceed to
randomly choose another input from the training set and perform another iteration of the
algorithm. We continue to iterate until the difference between the network response and the
target function reaches some acceptable level.

Summary:
The above example 4.5 can be summarized in the following figure:


The Bias:
Some networks employ a bias unit as part of every layer except the output layer. These units have a constant activation value of 1 or −1; their weights may be adjusted during learning. The bias unit provides a constant term in the weighted sum, which results in an improvement in the convergence properties of the network.
Biases are almost always helpful. In effect, a bias value allows you to shift the
activation (transfer) function to the left or right, which may be critical for successful
learning.
It might help to look at a simple example. Consider this 1-input, 1-output network that
has no bias:

The output of the network is computed by multiplying the input 𝑝 by the weight 𝑤0 and
passing the result through some kind of activation function (e.g. a sigmoid function.)
Here is the function that this network computes, for various values of 𝑤0 :

Changing the weight 𝑤0 essentially changes the "steepness" of the sigmoid. That’s useful, but what if you wanted the network to output 0 when 𝑝 is 2? Just changing the steepness of the sigmoid won’t really work; you want to be able to shift the entire curve to the right. That's exactly what the bias allows you to do. If we add a bias to that network, like so:


Then the output of the network becomes 𝑠𝑖𝑔(𝑤0 ∗ 𝑝 + 𝑤1 ∗ 1.0). Here is what the
output of the network looks like for various values of 𝑤1 :

A bias acts exactly as a weight on a connection from a unit whose activation is always
1. Increasing the bias increases the net input to the unit. If a bias is included, the input to the
activation function is:
𝑛 = 𝑤1,1 𝑝1 + 𝑤1,2 𝑝2 + … + 𝑤1,𝑅 𝑝𝑅 + 𝑏

5. Hopfield Network:
5.1 Discrete Hopfield Network:
The Hopfield neural network is perhaps the simplest type of neural network. As shown
in Figure 5.1, a Hopfield network consists of a single layer of neurons, 1, 2, . . . , 𝑠. The network
is fully interconnected; that is, every neuron in the network is connected to every other
neuron. The network is recurrent; that is, it has feedforward/feedbackward capabilities,
which means input to the neurons comes from external input as well as from the neurons
themselves internally. Also, the network is autoassociative which means that if the neural
network recognizes a pattern, it will return that pattern (i.e. input 𝐩 = target 𝐭).
The Hopfield network is classified under supervised learning.

Figure 5.2 Sample Pattern

Figure 5.1 Hopfield Neural Network

Each input/output, 𝑝𝑖 𝑜𝑟 𝑎𝑗 , takes a discrete bipolar value of either 1 or −1 . The


number of neurons, 𝑠, is the size required for each pattern in the bipolar representation. For
example, suppose that each pattern is a letter represented by a 10 × 12 two-dimensional
array, where each array element is either 1 for a black square or −1 for a blank square (for
example, Figure 5.2). Then 𝑠 will be 10 × 12 = 120. Each neuron is associated by weight,
𝑤𝑖𝑗 , which satisfies the following conditions:
𝑤𝑖𝑗 = 𝑤𝑗𝑖 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑖, 𝑗 = 1, 2, … , 𝑠 (i.e. 𝐖 is symmetric)
𝑤𝑖𝑖 = 0 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑖 = 1, 2, … , 𝑠 (i.e. There is no self-feedback)
So, the weight matrix is
𝐖 = [ 0  𝑤1,2  𝑤1,3  …  𝑤1,𝑠  ;  𝑤2,1  0  𝑤2,3  …  𝑤2,𝑠  ;  ⋮  ;  𝑤𝑠,1  𝑤𝑠,2  𝑤𝑠,3  …  0 ]

With the symmetrical hard limit transfer function
𝑓(𝑛) = { −1  if 𝑛 < 0 ;  1  if 𝑛 ≥ 0 }

5.1.1 Determining Weights:


Suppose that 𝑞 example patterns 𝐩1 , 𝐩2 , 𝐩3 , … , 𝐩𝑞 are presented. Each pattern 𝐩𝑞 has 𝑠
inputs 𝑝1𝑞 , 𝑝2𝑞 , 𝑝3𝑞 , … , 𝑝𝑠𝑞 where 𝑝𝑘𝑞 = 1 𝑜𝑟 − 1, 𝑘 = 1, 2, … , 𝑠.
Determine 𝑤𝑖𝑗 for 𝑖, 𝑗 = 1, 2, … , 𝑠 by using Hebb rule:

𝑤𝑖𝑗 = { 0  for 𝑖 = 𝑗 ;  ∑𝑟=1..𝑞 𝑝𝑖𝑟 𝑝𝑗𝑟  for 𝑖 ≠ 𝑗 }          ……. (5.1)
Or, in matrix notation:
Let 𝐏 = [𝐩1 ; 𝐩2 ; … ; 𝐩𝑞 ] (the 𝑞 × 𝑠 matrix whose rows are the patterns) ⟹ 𝐖 = 𝐏ᵀ𝐏
If the resulting 𝐖 has diagonal 𝑤𝑖𝑖 ≠ 0, we can change the diagonal such that 𝑤𝑖𝑖 = 0.

5.1.2 Determining Outputs:


The output of each neuron is
𝑎𝑖𝑛𝑒𝑤 = 𝑓( ∑𝑗=1..𝑠, 𝑗≠𝑖 𝑤𝑖𝑗 𝑎𝑗𝑜𝑙𝑑 ) ,  𝑖 = 1, 2, … , 𝑠          ……. (5.2)

5.1.3 Energy function:

Define 𝐸, an energy (or Lyapunov) function, as:


𝐸 = − ½ ∑𝑖=1..𝑠 ∑𝑗=1..𝑠 𝑤𝑖𝑗 𝑎𝑖𝑜𝑙𝑑 𝑎𝑗𝑜𝑙𝑑          ……. (5.3)

𝐸 always decreases when 𝑎𝑖 changes value, and 𝐸 stays the same when there is no
change in 𝑎𝑖 's. The term "energy" here represents a measure that reflects the state of the
solution, and is somewhat analogous to the concept of physical energy.

5.1.4 Hopfield Network Algorithm:


Step 1: Determine 𝑤𝑖𝑗 , 𝑖, 𝑗 = 1, 2, … , 𝑠 by using Eq. (5.1).
Step 2: Initialize with the unknown pattern: 𝑎𝑖 = 𝑝𝑖 , 𝑖 = 1, 2, … , 𝑠.
Step 3: Perform iterations updating the 𝑎𝑖 's by using Eq. (5.2) until the energy function 𝐸 in Eq. (5.3) stops decreasing, i.e. reaches its lowest value (or, equivalently, the 𝑎𝑖 's remain unchanged). Then the 𝑎𝑖 's represent the solution, that is, the pattern that best matches (associates to) the unknown input.


5.1.5 Types of Update:

1. Synchronous updates: all units compute their activations and then update their states
simultaneously.
2. Asynchronous updates: one unit at a time computes its activation and updates its state. The
sequence of selected units can be a fixed or a random one.

Example 5.1:
Let 𝐩1 = [−1, 1, −1, −1]ᵀ , 𝐩2 = [1, −1, 1, −1]ᵀ , 𝐩3 = [−1, −1, −1, 1]ᵀ be example patterns. Use a Hopfield network to recognize the unknown pattern [1, 1, 1, −1]ᵀ.
Solution:

Figure 5.3 Hopfield Network (4 neurons)

𝐏 = [ −1  1  −1  −1 ;  1  −1  1  −1 ;  −1  −1  −1  1 ]


𝐖 = 𝐏ᵀ𝐏 = [ 3  −1  3  −1 ;  −1  3  −1  −1 ;  3  −1  3  −1 ;  −1  −1  −1  3 ]
Since we must have 𝑤𝑖𝑖 = 0, we get
𝐖 = [ 0  −1  3  −1 ;  −1  0  −1  −1 ;  3  −1  0  −1 ;  −1  −1  −1  0 ]

Alternatively, you can compute the weight matrix using Eq. (5.1).


𝑤11 = 0
𝑤12 = 𝑝11 𝑝21 + 𝑝12 𝑝22 + 𝑝13 𝑝23 = (−1)(1) + (1)(−1) + (−1)(−1) = −1 = 𝑤21
𝑤13 = 𝑝11 𝑝31 + 𝑝12 𝑝32 + 𝑝13 𝑝33 = (−1)(−1) + (1)(1) + (−1)(−1) = 3 = 𝑤31
𝑤14 = 𝑝11 𝑝41 + 𝑝12 𝑝42 + 𝑝13 𝑝43 = (−1)(−1) + (1)(−1) + (−1)(1) = −1 = 𝑤41
𝑤23 = 𝑝21 𝑝31 + 𝑝22 𝑝32 + 𝑝23 𝑝33 = (1)(−1) + (−1)(1) + (−1)(−1) = −1 = 𝑤32
𝑤24 = 𝑝21 𝑝41 + 𝑝22 𝑝42 + 𝑝23 𝑝43 = (1)(−1) + (−1)(−1) + (−1)(1) = −1 = 𝑤42
𝑤34 = 𝑝31 𝑝41 + 𝑝32 𝑝42 + 𝑝33 𝑝43 = (−1)(−1) + (1)(−1) + (−1)(1) = −1 = 𝑤43

Synchronous updates:
Iteration 1:
Let 𝐚𝑜𝑙𝑑 = [1, 1, 1, −1]ᵀ
𝐖𝐚𝑜𝑙𝑑 = [ 0  −1  3  −1 ;  −1  0  −1  −1 ;  3  −1  0  −1 ;  −1  −1  −1  0 ] [1, 1, 1, −1]ᵀ = [3, −1, 3, −3]ᵀ
𝐚𝑛𝑒𝑤 = 𝑓(𝐖𝐚𝑜𝑙𝑑 ) = 𝑓( [3, −1, 3, −3]ᵀ ) = [1, −1, 1, −1]ᵀ

𝐸1 = − ½ [1  1  1  −1] 𝐖 [1, 1, 1, −1]ᵀ = − ½ [1  1  1  −1] [3, −1, 3, −3]ᵀ = − 8/2 = −4
𝐸2 = − ½ [1  −1  1  −1] 𝐖 [1, −1, 1, −1]ᵀ = − ½ [1  −1  1  −1] [5, −1, 5, −1]ᵀ = − 12/2 = −6

Since 𝐸2 < 𝐸1 ⟹ 𝐚𝑛𝑒𝑤 is the solution

Iteration 2:
Now, 𝐚𝑜𝑙𝑑 = [1, −1, 1, −1]ᵀ
𝐖𝐚𝑜𝑙𝑑 = [5, −1, 5, −1]ᵀ
𝐚𝑛𝑒𝑤 = 𝑓(𝐖𝐚𝑜𝑙𝑑 ) = 𝑓( [5, −1, 5, −1]ᵀ ) = [1, −1, 1, −1]ᵀ

Since 𝐚𝑛𝑒𝑤 in the second iteration does not change we must stop and 𝐚𝑛𝑒𝑤 is the solution.

Asynchronous updates:
𝑎1 = 𝑓( [0  −1  3  −1] [1, 1, 1, −1]ᵀ ) = 𝑓(3) = 1
𝑎2 = 𝑓( [−1  0  −1  −1] [1, 1, 1, −1]ᵀ ) = 𝑓(−1) = −1
𝑎3 = 𝑓( [3  −1  0  −1] [1, 1, 1, −1]ᵀ ) = 𝑓(3) = 1
𝑎4 = 𝑓( [−1  −1  −1  0] [1, 1, 1, −1]ᵀ ) = 𝑓(−3) = −1
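
Example 5.1 can be reproduced with a short NumPy sketch of the discrete Hopfield network (Hebbian weights with zero diagonal, symmetrical hard limit, synchronous updates, energy check):

import numpy as np

P = np.array([[-1,  1, -1, -1],
              [ 1, -1,  1, -1],
              [-1, -1, -1,  1]])       # p1, p2, p3 stored as rows

W = P.T @ P                            # Hebb rule, W = P^T P
np.fill_diagonal(W, 0)                 # enforce w_ii = 0

def hardlims(n):
    return np.where(n >= 0, 1, -1)     # symmetrical hard limit

def energy(a):
    return -0.5 * a @ W @ a            # Eq. (5.3)

a = np.array([1, 1, 1, -1])            # the unknown pattern
for _ in range(10):                    # synchronous updates
    a_new = hardlims(W @ a)
    if np.array_equal(a_new, a):       # state (and energy) no longer changes
        break
    a = a_new
print(a, energy(a))                    # [ 1 -1  1 -1], E = -6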
