Multilayer Perceptron and Neural Networks
Abstract: The attempts to solve linearly inseparable problems have led to different variations in the number of layers of neurons and in the activation functions used. The backpropagation algorithm is the best-known and most widely used supervised learning algorithm. Also called the generalized delta algorithm, because it extends the training method of the Adaline network, it is based on minimizing the difference between the desired output and the actual output through the gradient descent method (the gradient tells us how a function varies in different directions). Training a multilayer perceptron is often quite slow, requiring thousands or tens of thousands of epochs for complex problems. The best known methods to accelerate learning are the momentum method and the application of a variable learning rate. The paper presents the possibility of controlling an induction motor drive using neural systems.
activation functions are linear, because a linear function of linear functions is also a linear function.

f(s) = (1 − e^(−a·s)) / (1 + e^(−a·s)). (2)

It may be noted that the sigmoid functions act approximately linearly for small absolute values of the argument and saturate for high absolute values of the argument, somewhat taking over the role of a threshold. It has been shown [4] that a network (possibly infinite) with one hidden layer is able to approximate any continuous function.

Fig. 6: Example of a "feed forward" network. Each circle represents a unit of the type shown in Figure 6. Each connection between units has a weight. Each unit also has a bias input, not shown.
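For illustration (our own sketch, not part of the paper), the unipolar sigmoid referenced later as equation (1) and the bipolar sigmoid of equation (2) can be evaluated numerically to see the near-linear behaviour around the origin and the saturation for large arguments; the function names are ours.

```python
import math

def unipolar_sigmoid(s, a=1.0):
    # single-pole (logistic) sigmoid, the standard form of equation (1)
    return 1.0 / (1.0 + math.exp(-a * s))

def bipolar_sigmoid(s, a=1.0):
    # bipolar sigmoid, equation (2): f(s) = (1 - e^(-a*s)) / (1 + e^(-a*s))
    return (1.0 - math.exp(-a * s)) / (1.0 + math.exp(-a * s))

# near the origin the bipolar sigmoid is approximately linear (slope a/2)
print(bipolar_sigmoid(0.01))   # close to 0.005 for a = 1
# for large |s| it saturates, acting like a threshold
print(bipolar_sigmoid(10.0))   # close to +1
print(bipolar_sigmoid(-10.0))  # close to -1
```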
For some types of applications, recurrent networks (i.e. not "feed forward"), in which some interconnections form loops, are also used. We have seen in Figure 6 an example of a feed forward network. As mentioned, the interconnections between the units of this type of network do not form loops, so the network is called feed forward. Networks in which there are one or more loops of interconnections, as represented in Figure 7.a, are called recurrent.

In feed forward networks, units are usually arranged in levels (layers), as in Figure 7.b, but other topologies can also be used. Figure 7.c shows a type of network that is useful in some applications, in which direct links between input and output units are used. Figure 7.d shows a network with 3 units which is fully connected, i.e. all the interconnections allowed by the feed forward restriction are present.

Fig. 7: Common types of networks: a) a recurrent network; b) a stratified network; c) a network with links between units of input and output; d) a fully connected feed forward network.

2 The backpropagation algorithm
Learning in networks is typically achieved in a supervised manner. It can be assumed that a learning environment is available, containing both the learning patterns and the desired output patterns corresponding to the inputs (known as "target patterns"). As we will see, learning is typically based on the minimization of the measured error between the network outputs and the desired outputs. This implies a backward propagation through a network similar to the one being trained. For this reason, the learning algorithm is called back-propagation. The method was first proposed by [2], but at that time it was virtually ignored, because it required a volume of computation too large for that period. It was then rediscovered by [20], but only in the mid-'80s was it established by Williams [18] as a generally accepted tool for training the multilayer perceptron. The idea is to find the minimum of the error function e(w) with respect to the connection weights. The algorithm for a multilayer perceptron with one hidden layer is the following [8]:

Step 1: Initialization. All network weights and thresholds are initialized with random values, distributed evenly in a small range, for example (−2.4/Fi, 2.4/Fi), where Fi is the total number of inputs of neuron i [6]. If these values are 0, the gradients calculated during training will also be 0 (if there is no direct link between input and output) and the network will not learn. Several training attempts, with different initial weights, are indicated in order to find the best value of the cost function (minimum error). Conversely, if the initial values are large, they tend to saturate the units. In this case, the derivative of the sigmoid function is very small. It acts as a multiplying factor during the learning process and thus the saturated units will be nearly blocked, which makes learning very slow.

Step 2: A new training epoch. An epoch means the presentation of all the examples in the training set. In most cases, training the network requires many training epochs. To maintain mathematical rigor, the weights will be adjusted only after all the training vectors have been applied to the network. Therefore, the gradients of the weights must be memorized and updated after each pattern in the training set, and at the end of a training epoch the weights are changed only once (there is also a simpler, "on-line" variant, in which the weights are updated directly; in this case, the order in which the vectors are presented to the network might matter). All the gradients of the weights and the current error are initialized with 0 (Δwij = 0 and E = 0).

Step 3: The forward propagation of the signal.
3.1 An example from the training set is applied to the inputs.
3.2 The outputs of the neurons from the hidden layer are calculated:

yj(p) = f( Σi=1..n xi(p)·wij − θj ), (3)

where n is the number of inputs of neuron j from the hidden layer, and f is the sigmoid activation function.
3.3 The real outputs of the network are calculated, and the current error is accumulated:

E = E + (ek(p))² / 2. (5)

Step 4: The backward propagation of the errors and the adjustment of the weights.
4.1 The error gradients for the neurons in the output layer are calculated:

δk(p) = f′ · ek(p), (6)

where f′ is the derivative of the activation function and the error is ek(p) = yd,k(p) − yk(p).
If we use the single-pole sigmoid (equation 1), its derivative is:

f′(x) = e^(−x) / (1 + e^(−x))² = f(x)·(1 − f(x)). (7)

If we use the bipolar sigmoid (equation 2), its derivative is:

f′(x) = 2a·e^(−a·x) / (1 + e^(−a·x))² = (a/2)·(1 − f(x))·(1 + f(x)). (8)

For the single-pole sigmoid, equation (6) becomes:

δk(p) = yk(p)·(1 − yk(p))·ek(p). (9)

4.2 The gradients of the weights between the hidden layer and the output layer are updated:

Δwjk(p) = Δwjk(p) + yj(p)·δk(p). (10)

4.3 The error gradients for the neurons in the hidden layer are calculated:

δj(p) = yj(p)·(1 − yj(p))·Σk=1..l δk(p)·wjk(p), (11)

where l is the number of outputs of the network.
4.4 The gradients of the weights between the input layer and the hidden layer are updated:

Δwij(p) = Δwij(p) + xi(p)·δj(p). (12)

At the end of the epoch, the weights are corrected using the accumulated gradients, where η is the learning rate. If an epoch is completed, we test whether the termination criterion is fulfilled (E < Emax, or a maximum number of training epochs has been reached). If not, we pass to step 2. If yes, the algorithm ends.

Example: The MATLAB program [11] allows the generation of the logical OR function, which means that the perceptron separates the class of 0 from the class of 1. In the Matlab workspace we obtain:
epoch: 1 SSE: 3
epoch: 2 SSE: 1
epoch: 3 SSE: 1
epoch: 4 SSE: 0
Test on the input [0 1]: s = 1
After the fourth iteration, the perceptron separates the two classes (0 and 1) by a line. The perceptron was tested in the presence of the input vector [0; 1].
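The batch ("off-line") variant of Steps 1–4 can be sketched in NumPy as follows. This is our illustrative sketch, not the paper's MATLAB program [11]: the OR training data, the random seed, and the variable names are ours, and the thresholds θ are represented as biases (b = −θ).

```python
import numpy as np

def f(s):
    # single-pole (logistic) sigmoid, equation (1)
    return 1.0 / (1.0 + np.exp(-s))

# training set for the logical OR function (illustrative data)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [1.]])

rng = np.random.default_rng(1)
# Step 1: initialization in (-2.4/Fi, 2.4/Fi), Fi = fan-in of the neuron (here 2)
W1 = rng.uniform(-1.2, 1.2, (2, 2))   # input -> hidden weights
b1 = rng.uniform(-1.2, 1.2, (1, 2))   # hidden biases (play the role of -theta)
W2 = rng.uniform(-1.2, 1.2, (2, 1))   # hidden -> output weights
b2 = rng.uniform(-1.2, 1.2, (1, 1))

eta, E_max = 0.2, 0.001
for epoch in range(20000):
    # Step 2: gradients and current error start each epoch at 0
    dW1 = np.zeros_like(W1); db1 = np.zeros_like(b1)
    dW2 = np.zeros_like(W2); db2 = np.zeros_like(b2)
    E = 0.0
    for x, t in zip(X, T):                # Steps 3-4 for every pattern p
        x, t = x[None, :], t[None, :]
        yh = f(x @ W1 + b1)               # eq (3): hidden outputs
        y = f(yh @ W2 + b2)               # real outputs of the network
        e = t - y
        E += (e.item() ** 2) / 2.0        # eq (5): accumulate the error
        dk = y * (1 - y) * e              # eq (9): output error gradients
        dW2 += yh.T @ dk; db2 += dk       # eq (10)
        dj = yh * (1 - yh) * (dk @ W2.T)  # eq (11): hidden error gradients
        dW1 += x.T @ dj; db1 += dj        # eq (12)
    # weights changed once per epoch, with learning rate eta
    W1 += eta * dW1; b1 += eta * db1
    W2 += eta * dW2; b2 += eta * db2
    if E < E_max:                         # termination criterion
        break
```

After training, `f(f(X @ W1 + b1) @ W2 + b2)` rounds to the OR truth table.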
The perceptron realizes the logical OR function, for which the classes are linearly separable; this is one of the conditions for the perceptron. If the previous program is run for the exclusive OR function, we will observe that there is no line that allows the separation of the two classes (0 and 1).

3 Methods to accelerate the learning
The momentum method [18] proposes adding a term to the weight adjustment. This term is proportional to the last modification of the weight, i.e. the values with which the weights are adjusted are stored and they directly influence all further adjustments:

Δwij(p) = Δwij(p) + α·Δwij(p − 1). (14)

The new term is added after the update of the weight gradients from equations 10 and 12.
The method of the variable learning rate [19] consists in using an individual learning rate for each weight and adapting these parameters at each iteration, depending on the successive signs of the gradients [9]:

ηij(p) = u·ηij(p − 1), if sgn(Δwij(p)) = sgn(Δwij(p − 1));
ηij(p) = d·ηij(p − 1), if sgn(Δwij(p)) = −sgn(Δwij(p − 1)). (15)

If during training the error starts to increase, rather than decrease, the learning rates are reset to their initial values and the process continues.

4 Practical considerations of working with multilayer perceptrons
For relatively simple problems, a learning rate of η = 0.7 is acceptable, but in general it is recommended that the learning rate be around 0.2. For acceleration through the momentum method, a satisfactory value for α is 0.9. If the learning rate is variable, typical values that work well in most situations are u = 1.2 and d = 0.8.

Choosing the activation function for the output layer of the network depends on the nature of the problem to be solved. For the hidden layers of neurons, sigmoid functions are preferred, because they have the advantage of both non-linearity and differentiability (a prerequisite for applying the backpropagation algorithm). The biggest influence of a sigmoid on the performance of the algorithm seems to be its symmetry about the origin [1]. The bipolar sigmoid is symmetrical about the origin, while the unipolar sigmoid is symmetrical about the point (0, 0.5), which decreases the speed of convergence. For the output neurons, activation functions adapted to the distribution of the output data are recommended. Therefore, for binary classification problems (0/1), the single-pole sigmoid is appropriate. For a classification with n classes, each corresponding to a binary output of the network (for example, an application of optical character recognition), the softmax extension of the single-pole sigmoid may be used:

y′k = e^(yk) / Σi=1..n e^(yi). (16)

For continuous values, we can apply a pre-processing and a post-processing of the data, so that the network operates with scaled values, for example in the range [−0.9, 0.9] for the hyperbolic tangent. Also, for continuous values, the activation function of the output neurons may be linear, especially if there are no known limits for the range in which these values can be found.

In a local minimum, the gradients of the error become 0 and the learning no longer continues. A solution is multiple independent trials, with weights initialized differently at the beginning, which raises the probability of finding the global minimum. For large problems this can be hard to achieve, and local minima may be accepted, on the condition that the errors are small enough. Also, different configurations of the network might be tried, with a larger number of neurons in the hidden layer or with more hidden layers, which in general lead to smaller local minima. Still, although local minima are indeed a problem, in practice they are not unsolvable.

An important issue is the choice of the best configuration for the network in terms of the number of neurons in the hidden layers. In most situations, a single hidden layer is sufficient. There are no precise rules for choosing the number of neurons. In general, the network can be seen as a system in which the number of training vectors multiplied by the number of outputs is the number of equations, and the number of weights represents the number of unknowns. The equations are generally nonlinear and very complex, so it is very difficult to solve them exactly through conventional means. The training algorithm aims precisely to find approximate solutions that minimize the errors. If the network approximates the training set well, this is not a guarantee that it will find equally good solutions for the data in another set, the testing set. Generalization implies the existence of regularities in the data, of a model that can be learned. In analogy with classical linear systems, this would mean some redundant equations. Thus, if the number of weights is less than the number of training vectors, for a correct approximation the network must rely on intrinsic patterns of the data, patterns which are to be found in the test data as well. A heuristic rule states that the number of weights should be around or below one tenth of the product of the number of training vectors and the number of outputs. In some situations however (e.g., if the training data are relatively few), the number of weights can be even half of this product. For a multilayer perceptron, it is considered that the number of neurons in a layer must be sufficiently large so that this layer provides three or more edges for each convex region identified by the next layer [5]. So the number of neurons in a layer should be more than three times that of the next layer. As mentioned before, an insufficient number of weights leads to under-fitting, while too many weights lead to over-fitting, phenomena presented in Figure 9.

Example: We associate an input vector X = [1 −0.5] and a target vector T = [0.5 1], of sizes imposed by two restrictions that can be reduced to the two degrees of freedom (the weight W and the bias B) of a single Adaline neuron [9]. We suggest solving the linear system of 2 equations with 2 unknowns [12]:

w + b = 0.5, −0.5·w + b = 1, (17)

obtaining in the end the solutions w = −1/3 and b = 5/6. The Matlab program offers the solutions obtained with the help of the Adaline neuron, either by points or by slopes [3], [7], [10], [21].
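Equations (14)–(16) from the two sections above can be sketched as follows, using the recommended constants α = 0.9, u = 1.2, d = 0.8; the function and variable names, and the toy gradient values, are our own illustration, not from the paper.

```python
import numpy as np

alpha, u, d = 0.9, 1.2, 0.8   # momentum and rate-adaptation constants

def momentum_step(delta_w, prev_delta_w):
    # equation (14): add a term proportional to the previous weight change
    return delta_w + alpha * prev_delta_w

def adapt_rates(eta, delta_w, prev_delta_w):
    # equation (15): per-weight learning rates grow by u while the gradient
    # keeps its sign, and shrink by d when the sign flips
    same_sign = np.sign(delta_w) == np.sign(prev_delta_w)
    return np.where(same_sign, u * eta, d * eta)

def softmax(y):
    # equation (16): softmax extension of the single-pole sigmoid
    ey = np.exp(y - y.max())   # shifted for numerical stability
    return ey / ey.sum()       # components sum to 1

# toy usage with illustrative gradient values
eta = np.full(3, 0.2)
prev = np.array([0.1, -0.2, 0.05])
curr = np.array([0.3, 0.1, 0.02])
print(momentum_step(curr, prev))     # [0.39, -0.08, 0.065]
print(adapt_rates(eta, curr, prev))  # [0.24, 0.16, 0.24]
```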
… the speed of the learning procedure.

Output units and target values. Most practical applications of multilayer perceptrons can be clearly divided into two different classes. In one class, the target outputs have a continuous range of values, and the network has to perform an operation of non-linear regression. Normally, in this case it is not convenient to put a non-linearity at the output of the network. In fact, we normally want outputs that are able to cover the entire range of possible target values, which is often wider than the range of the sigmoid. One could, of course, scale the output amplitudes of the sigmoid, but this rarely brings any advantage relative to the simple use of units without non-linearity at the output. Such output units are said to be linear: they simply output the weighted sum of their inputs plus their bias term.

In another class, which includes mainly applications of classification and pattern recognition, the target outputs are binary, i.e., they take only 2 values. In this case it is usual to use output units with a sigmoid non-linearity, similar to the other units in the network. The binary target values that are most appropriate depend on the sigmoid used. Often the target values are chosen to be equal to the 2 asymptote values of the sigmoid (0 and 1 for the logistic function and ±1 for tanh and scaled arctan). In this case, in order to bring the error to 0, the output units would need to reach complete saturation, i.e. the sum of their inputs should become infinite. This would tend to drive the weights of these units to increase indefinitely in absolute value and would slow the learning process. To improve the speed of learning, target values which are close, but not equal, to the asymptotes of the sigmoid are therefore usually used (e.g. 0.05 and 0.95 for the logistic function and ±0.9 for tanh and scaled arctan).

Weight initialization. Before the back-propagation algorithm can be started, it is necessary to set the weights of the network to some initial values. A natural choice would be to initialize all of them with the value 0, so as not to bias the learning outcome in a particular direction. However, it can easily be seen, by applying the back propagation rule, that if the initial weights are all 0 the gradient is 0 (except for the weights of direct links between input and output units, if such links exist in the network). Furthermore, the gradient components will always remain 0 during learning, even if there are such direct links. Therefore, it is normally necessary to initialize the weights with values different from 0. The most common procedure is to initialize them with random values drawn from a uniform distribution on a symmetric interval [−a, a]. As mentioned above, several learning runs with independent random initializations can be used to find the best minimum of the cost function. It is understandable that large weights (resulting from high values of a) will tend to saturate the units. At saturation, the derivative of the sigmoid non-linearity is very small. Since these derivatives act as multipliers in the back propagation, the derivatives with respect to the weights of the unit's inputs will be very small. The unit will be largely "locked", learning very slowly.

If the inputs of a unit in the network all have the same root mean square (rms) value, are all independent of each other, and the weights are initialized in a fixed interval, then the rms value of the unit's input sum will be proportional to fi^(1/2), where fi is the number of inputs of the unit (often called the fan-in of the unit). To keep the rms input sums similar to each other, and to avoid the saturation of units with high fan-in, the parameter a, controlling the size of the initialization range, is sometimes varied from one unit to another, taking a = k/fi^(1/2). There are various options for the choice of k. Some prefer to initialize the weights very close to zero, choosing a very small k (e.g. 0.01 to 0.1), thus keeping the units in their central, linear region at the beginning of the learning process. Others prefer high values of k (e.g. 1 or higher), driving their units into the non-linear region even at the beginning of the learning process.

Decorrelation and normalization of inputs. Let us consider the simplest network one can design, consisting of a single linear unit. Networks with a single linear unit (adalines) have been used for a long time in the area of discrete-time signal processing. Finite impulse response (FIR) filters can be seen as single linear units without a bias. The inputs are consecutive samples of the input signal and the filter coefficients are the weights. Therefore, adaptive filtering with FIR filters is an essential form of real-time learning in linear networks. It is thus no surprise that the first adaptive filtering algorithms were derived from the delta rule [14]. It is well known in adaptive filter theory that learning is fastest, because the error surface is well-conditioned (has no narrow valleys), if the inputs are uncorrelated with each other, which means that <xi·xj> = 0 for i ≠ j, and have equal mean squares, <xi²> = <xj²> for all i, j. Here <.> denotes the expected value (often, when training perceptrons, the expected value can be estimated simply by averaging over the learning set). If a bias is also used in the units, it acts as a further input which is always equal to 1. This means that its mean square is 1, and therefore the mean squares of the other inputs should all be equal to 1. On the other hand, the cross-correlations of the other inputs with this new input are simply the expected values of those inputs, which should be equal to 0, like all the cross-correlations between inputs:

<xi> = <xj> = 0. (18)
7 Conclusion
Multilayer perceptrons are the most commonly used types of neural networks. Using the backpropagation algorithm for training, they can be used for a wide range of applications, from functional approximation to prediction in various fields, such as estimating the load of a computing system or modelling the evolution of the chemical reactions of polymerization, described by complex systems of differential equations. In implementing the algorithm, there are a number of practical problems, mostly related to the choice of the parameters and of the network configuration. First, a small learning rate leads to a slow convergence of the algorithm, while a too high rate may cause failure (the algorithm will "jump" over the solution). Another problem characteristic of this training method is given by local minima. A neural network must be capable of generalization.

The advantage of the fuzzy logic controller disappears when it is compared to an anti-windup PI controller, knowing that the latter works in a linear regime. On the other hand, an anti-windup PI controller does not raise any problems when the output variable reaches the saturation value, since the signal corresponding to the difference between the limited and the unlimited output is fed back to the controller for desaturation. For the same control surface, the advantage of using a neural controller consists in the decrease of the computation time compared to that spent when a fuzzy controller with a larger number of linguistic labels is used.

Fig. 12: Control surface approximation of the fuzzy controller by a neural network: a) normalized coordinates; b) actual values.
Fig. 13: On-load start-up of the drive, using a fuzzy controller (f) and a neural controller of the perceptron type (n) with 4 layers: a) speed shape; b) stator current shape.