
Article in WSEAS Transactions on Circuits and Systems · July 2009


WSEAS TRANSACTIONS on CIRCUITS and SYSTEMS
Marius-Constantin Popescu, Valentina E. Balas, Liliana Perescu-Popescu, Nikos Mastorakis

Multilayer Perceptron and Neural Networks

MARIUS-CONSTANTIN POPESCU (1), VALENTINA E. BALAS (2), LILIANA PERESCU-POPESCU (3), NIKOS MASTORAKIS (4)
(1) Faculty of Electromechanical and Environmental Engineering, University of Craiova, ROMANIA
(2) Faculty of Engineering, "Aurel Vlaicu" University of Arad, ROMANIA
(3) "Elena Cuza" College of Craiova, ROMANIA
(4) Technical University of Sofia, BULGARIA
popescu.marius.c@gmail.com, balas@inext.ro, mastor@wses.org

Abstract: - The attempts to solve linearly inseparable problems have led to different variations in the number of layers of neurons and in the activation functions used. The backpropagation algorithm is the best-known and most widely used supervised learning algorithm. Also called the generalized delta algorithm, because it extends the training rule of the adaline network, it is based on minimizing the difference between the desired output and the actual output by the gradient descent method (the gradient tells us how a function varies in different directions). Training a multilayer perceptron is often quite slow, requiring thousands or tens of thousands of epochs for complex problems. The best-known methods to accelerate learning are the momentum method and the use of a variable learning rate. The paper presents the possibility of controlling an induction motor drive using neural systems.

Key-Words: - Backpropagation algorithm, Gradient method, Multilayer perceptron, Induction driving.

1 Introduction
The multilayer perceptron is the best known and most frequently used type of neural network. On most occasions, the signals are transmitted within the network in one direction: from input to output. There is no loop: the output of each neuron does not affect the neuron itself. This architecture is called feed-forward (Fig. 1).

Fig. 1: Feed-forward multilayer neural network.

Layers which are not directly connected to the environment are called hidden. In the reference material, there is a controversy regarding whether the first layer (the input layer) should be considered a stand-alone layer of the network, since its only function is to transmit the input signals to the upper strata, without any processing of the inputs. In what follows, we will count only the layers consisting of stand-alone neurons, but we will mention that the inputs are grouped in the input layer. There are also feed-back networks, which can transmit impulses in both directions, due to reaction connections in the network. These types of networks are very powerful and can be extremely complicated. They are dynamic, changing their condition all the time until the network reaches an equilibrium state, and the search for a new balance occurs with each input change.
The introduction of several layers was determined by the need to increase the complexity of the decision regions. As shown in the previous paragraph, a perceptron with a single layer and one input generates decision regions in the form of half-planes. By adding another layer, each neuron acts as a standard perceptron on the outputs of the neurons in the anterior layer, so the output of the network can approximate convex decision regions, resulting from the intersection of the half-planes generated by the neurons. In turn, a three-layer perceptron can generate arbitrary decision areas (Fig. 2). Regarding the activation function of the neurons, it was found that multilayer networks do not provide an increase in computing power compared to networks with a single layer, if the

ISSN: 1109-2734 579 Issue 7, Volume 8, July 2009



activation functions are linear, because a linear function of linear functions is also a linear function.

Fig. 2: Decision regions of multilayer perceptrons.

The power of the multilayer perceptron comes precisely from the non-linear activation functions. Almost any non-linear function can be used for this purpose, except polynomial functions. The functions most commonly used today are the single-pole (or logistic) sigmoid, shown in Figure 3:

    f(s) = 1 / (1 + e^(-s)),                    (1)

Fig. 3: Sigmoid single-pole activation function.

and the bipolar sigmoid (the hyperbolic tangent) function, shown in Figure 4 for a = 2:

    f(s) = (1 - e^(-a·s)) / (1 + e^(-a·s)).     (2)

Fig. 4: Sigmoid bipolar activation function.

It may be noted that the sigmoid functions act approximately linearly for small absolute values of the argument and saturate for large absolute values of the argument, somewhat taking over the role of a threshold. It has been shown [4] that a network (possibly infinite) with one hidden layer is able to approximate any continuous function. This justifies the property of the multilayer perceptron to act as a universal approximator. Also, by applying the Stone-Weierstrass theorem to neural networks, it was demonstrated that they can calculate certain polynomial expressions: if there are two networks that calculate exactly two functions f1 and f2, then there is a larger network that calculates exactly a given polynomial expression of f1 and f2.
The multilayer perceptron is built from trained units of the type shown in Fig. 5. Each of these units forms a weighted sum of its inputs, to which a constant (the bias) is added. This sum is then passed through a non-linear function, often called the activation function. Most units are interconnected in a "feed-forward" manner, i.e. with interconnections that do not form any loops, as shown in Fig. 6.

Fig. 5: A multilayer perceptron unit.

Fig. 6: Example of a "feed-forward" network. Each circle represents a unit of the type shown in Figure 5. Each connection between units has a weight. Each unit also has a bias input, not shown in the figure.
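The two activation functions above can be checked numerically. The sketch below is in Python rather than the MATLAB used elsewhere in the paper; it verifies that the bipolar sigmoid of eq. (2) with a = 2 coincides with the hyperbolic tangent, and that the two families are related by bipolar(s, 2) = 2·f(2s) - 1, where f is the logistic sigmoid of eq. (1):

```python
import math

def logistic(s):
    # single-pole (logistic) sigmoid, eq. (1)
    return 1.0 / (1.0 + math.exp(-s))

def bipolar(s, a=2.0):
    # bipolar sigmoid, eq. (2); for a = 2 it coincides with tanh(s)
    return (1.0 - math.exp(-a * s)) / (1.0 + math.exp(-a * s))

for s in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # identity with the hyperbolic tangent
    assert abs(bipolar(s, 2.0) - math.tanh(s)) < 1e-12
    # relation between the two families: bipolar(s, 2) = 2*logistic(2s) - 1
    assert abs(bipolar(s, 2.0) - (2.0 * logistic(2.0 * s) - 1.0)) < 1e-12
```

Because of this relation, the choice between the two is mainly a matter of output range ((0, 1) versus (-1, 1)) rather than of representational power.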




For some types of applications, recurrent networks (i.e. not "feed-forward"), in which some interconnections form loops, are also used. Figure 6 showed an example of a feed-forward network: as mentioned, the interconnections of this type of network do not form loops, which is why the network is called feed-forward. Networks in which there are one or more loops of interconnections, as represented in Figure 7.a, are called recurrent.

Fig. 7: Common types of networks: a) a recurrent network; b) a stratified network; c) a network with links between input and output units; d) a fully connected feed-forward network.

In feed-forward networks, units are usually arranged in levels (layers), as in Figure 7.b, but other topologies can be used. Figure 7.c shows a type of network that is useful in some applications in which direct links between input and output units are used. Figure 7.d shows a network with 3 units which is fully connected, i.e. in which all interconnections allowed by the feed-forward restriction are present.

2 The backpropagation algorithm
Learning in such networks is typically achieved in a supervised manner. A learning environment can be assumed to be available, containing both the training patterns and the desired output patterns corresponding to the inputs (the latter are known as "target patterns"). As we will see, learning is typically based on the minimization of an error measure between the network outputs and the desired outputs. This implies a backward propagation of errors through a network similar to the one being trained; for this reason the learning algorithm is called backpropagation. The method was first proposed by [2], but at that time it was virtually ignored, because it required a volume of calculation too large for that time. It was then rediscovered by [20], but only in the mid-'80s was it established by Williams [18] as a generally accepted tool for the training of the multilayer perceptron. The idea is to find the minimum of the error function e(w) with respect to the connection weights. The algorithm for a multilayer perceptron with one hidden layer is the following [8]:
Step 1: Initialization. All network weights and thresholds are initialized with random values, distributed evenly in a small range, for example (-2.4/Fi, 2.4/Fi), where Fi is the total number of inputs of neuron i [6]. If these values are 0, the gradients calculated during training will also be 0 (if there is no direct link between input and output) and the network will not learn. Several training attempts, with different initial weights, are indicated in order to find the best value of the cost function (minimum error). Conversely, if the initial values are large, they tend to saturate the units. In that case the derivative of the sigmoid function is very small; since it acts as a multiplying factor during the learning process, the saturated units will be nearly blocked, which makes learning very slow.
Step 2: A new training epoch. An epoch means presenting all the examples in the training set. In most cases, training the network involves several training epochs. To maintain mathematical rigor, the weights will be adjusted only after all the test vectors have been applied to the network. Therefore, the gradients of the weights must be memorized and accumulated after each pattern in the training set, and at the end of a training epoch the weights are changed only once (there is a simpler "on-line" variant, in which the weights are updated directly; in this case, the order in which the vectors are presented to the network might matter).
All the gradients of the weights and the current error are initialized with 0 (Δwij = 0 and E = 0).
Step 3: The forward propagation of the signal




3.1 An example from the training set is applied to the inputs.
3.2 The outputs of the neurons in the hidden layer are calculated:

    yj(p) = f( Σ(i=1..n) xi(p)·wij - θj ),          (3)

where n is the number of inputs of neuron j in the hidden layer, and f is the sigmoid activation function.
3.3 The real outputs of the network are calculated:

    yk(p) = f( Σ(j=1..m) xjk(p)·wjk(p) - θk ),      (4)

where m is the number of inputs of neuron k in the output layer.
3.4 The error per epoch is updated:

    E = E + (ek(p))² / 2.                           (5)

Step 4: The backward propagation of the errors and the adjustment of the weights.
4.1 The gradients of the errors for the neurons in the output layer are calculated:

    δk(p) = f'·ek(p),                               (6)

where f' is the derivative of the activation function, and the error is ek(p) = yd,k(p) - yk(p).
If we use the single-pole sigmoid (equation 1), its derivative is:

    f'(x) = e^(-x) / (1 + e^(-x))² = f(x)·(1 - f(x)).                       (7)

If we use the bipolar sigmoid (equation 2), its derivative is:

    f'(x) = 2a·e^(-a·x) / (1 + e^(-a·x))² = (a/2)·(1 - f(x))·(1 + f(x)).    (8)

Further, let us suppose that the function utilized is the single-pole sigmoid. Then equation (6) becomes:

    δk(p) = yk(p)·(1 - yk(p))·ek(p).                (9)

4.2 The gradients of the weights between the hidden layer and the output layer are updated:

    Δwjk(p) = Δwjk(p) + yj(p)·δk(p).                (10)

4.3 The gradients of the errors for the neurons in the hidden layer are calculated:

    δj(p) = yj(p)·(1 - yj(p))·Σ(k=1..l) δk(p)·wjk(p),    (11)

where l is the number of outputs of the network.
4.4 The gradients of the weights between the input layer and the hidden layer are updated:

    Δwij(p) = Δwij(p) + xi(p)·δj(p).                (12)

Step 5: A new iteration.
If there are still test vectors in the current training epoch, pass to step 3. If not, the weights of all the connections are updated based on the gradients of the weights:

    wij = wij + η·Δwij,                             (13)

where η is the learning rate.
If an epoch is completed, we test whether it fulfils the termination criterion (E < Emax, or a maximum number of training epochs has been reached). If not, we pass to step 2. If yes, the algorithm ends.
Example: A MATLAB program [11] allows the generation of the logical OR function, which means that the perceptron separates the class of 0 from the class of 1. The output obtained in the MATLAB workspace is:

    epoch: 1  SSE: 3
    epoch: 2  SSE: 1
    epoch: 3  SSE: 1
    epoch: 4  SSE: 0
    Test on the lot [0 1]  s = 1

After the fourth iteration, the perceptron separates the two classes (0 and 1) by a line. The perceptron was tested in the presence of the input vector [0; 1].

Fig. 8: The evolution of the sum of squared errors.
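The five steps of the algorithm can be summarized in a short sketch. The code below is a Python/NumPy transcription, not the authors' MATLAB program [11]; the network sizes, learning rate and epoch limit are illustrative choices, and the bias b plays the role of -θ in equations (3) and (4). It trains a one-hidden-layer perceptron on the logical OR example from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training set for the logical OR example: inputs and target outputs.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [1.]])

def f(s):
    # single-pole sigmoid, eq. (1)
    return 1.0 / (1.0 + np.exp(-s))

n_in, n_hid, n_out = 2, 2, 1
# Step 1: weights and biases start uniform in (-2.4/Fi, 2.4/Fi).
W1 = rng.uniform(-2.4 / n_in, 2.4 / n_in, (n_in, n_hid))
b1 = rng.uniform(-2.4 / n_in, 2.4 / n_in, n_hid)
W2 = rng.uniform(-2.4 / n_hid, 2.4 / n_hid, (n_hid, n_out))
b2 = rng.uniform(-2.4 / n_hid, 2.4 / n_hid, n_out)

eta = 0.5                                   # learning rate of eq. (13)
for epoch in range(5000):                   # Step 2: a new training epoch
    dW1 = np.zeros_like(W1); db1 = np.zeros_like(b1)
    dW2 = np.zeros_like(W2); db2 = np.zeros_like(b2)
    E = 0.0
    for x, t in zip(X, T):
        y_hid = f(x @ W1 + b1)              # Step 3: eq. (3), with b = -theta
        y_out = f(y_hid @ W2 + b2)          # eq. (4)
        e = t - y_out
        E += 0.5 * float(e @ e)             # eq. (5)
        # Step 4: backward propagation of the errors
        d_out = y_out * (1 - y_out) * e                 # eq. (9)
        dW2 += np.outer(y_hid, d_out); db2 += d_out     # eq. (10)
        d_hid = y_hid * (1 - y_hid) * (W2 @ d_out)      # eq. (11)
        dW1 += np.outer(x, d_hid); db1 += d_hid         # eq. (12)
    # Step 5: batch update of all the weights, eq. (13)
    W1 += eta * dW1; b1 += eta * db1
    W2 += eta * dW2; b2 += eta * db2
    if E < 1e-3:                            # termination criterion E < Emax
        break

y_net = f(f(X @ W1 + b1) @ W2 + b2)         # outputs of the trained network
```

With batch updates, the order of the patterns inside an epoch does not matter, exactly as noted in Step 2; the on-line variant would apply eq. (13) after every pattern instead.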




The perceptron realizes the logical OR function, for which the classes are linearly separable; that is one of the conditions of the perceptron. If the previous program is run for the exclusive OR function, we will observe that there is no line allowing the separation of the two classes (0 and 1).

3 Methods to accelerate the learning
The momentum method [18] proposes adding a term to the weight adjustments. This term is proportional to the last modification of the weight, i.e. the values with which the weights are adjusted are stored and they directly influence all further adjustments:

    Δwij(p) = Δwij(p) + α·Δwij(p - 1).              (14)

The new term is added after the update of the gradients of the weights from equations 10 and 12.
The method of the variable learning rate [19] consists of using an individual learning rate for each weight and adapting these parameters in each iteration, depending on the successive signs of the gradients [9]:

    ηij(p) = u·ηij(p - 1),  if sgn(Δwij(p)) = sgn(Δwij(p - 1)),
    ηij(p) = d·ηij(p - 1),  if sgn(Δwij(p)) = -sgn(Δwij(p - 1)).    (15)

If during the training the error starts to increase, rather than decrease, the learning rates are reset to their initial values and the process continues.

4 Practical considerations of working with multilayer perceptrons
For relatively simple problems, a learning rate of η = 0.7 is acceptable, but in general it is recommended that the learning rate be around 0.2. For acceleration through the momentum method, a satisfactory value for α is 0.9. If the learning rate is variable, typical values that work well in most situations are u = 1.2 and d = 0.8.
The choice of the activation function for the output layer of the network depends on the nature of the problem to be solved. For the hidden layers of neurons, sigmoid functions are preferred, because they have the advantage of both non-linearity and differentiability (a prerequisite for applying the backpropagation algorithm). The property of a sigmoid with the biggest influence on the performance of the algorithm seems to be its symmetry about the origin [1]. The bipolar sigmoid is symmetrical about the origin, while the unipolar sigmoid is symmetrical about the point (0, 0.5), which decreases the speed of convergence. For the output neurons, activation functions adapted to the distribution of the output data are recommended. Therefore, for problems of binary classification (0/1), the single-pole sigmoid is appropriate. For a classification with n classes, each corresponding to a binary output of the network (for example, an application of optical character recognition), the softmax extension of the single-pole sigmoid may be used:

    y'k = e^(yk) / Σ(i=1..n) e^(yi).                (16)

For continuous values, we can pre-process and post-process the data, so that the network operates with scaled values, for example in the range [-0.9, 0.9] for the hyperbolic tangent. Also, for continuous values, the activation function of the output neurons may be linear, especially if there are no known limits for the range in which these values can be found.
In a local minimum, the gradients of the error become 0 and learning no longer continues. A solution is multiple independent trials, with weights initialized differently at the beginning, which raises the probability of finding the global minimum. For large problems this can be hard to achieve, and local minima may be accepted, on the condition that the errors are small enough. Also, different configurations of the network might be tried, with a larger number of neurons in the hidden layer or with more hidden layers, which in general lead to smaller local minima. Still, although local minima are indeed a problem, in practice they are not unsolvable.
An important issue is the choice of the best configuration for the network in terms of the number of neurons in the hidden layers. In most situations, a single hidden layer is sufficient. There are no precise rules for choosing the number of neurons. In general, the network can be seen as a system in which the number of test vectors multiplied by the number of outputs gives the number of equations, and the number of weights represents the number of unknowns. The equations are generally nonlinear and very complex, so it is very difficult to solve them exactly through conventional means. The training algorithm aims precisely to find approximate solutions that minimize the error. If the network approximates the training set well, this is not a guarantee that it will find equally good solutions for the data in another set, the testing set. Generalization implies the existence of regularities in




the data, of a model that can be learned. In analogy with classical linear systems, this would mean some redundant equations. Thus, if the number of weights is less than the number of test vectors, for a correct approximation the network must rely on intrinsic patterns of the data, patterns which are to be found in the test data as well. A heuristic rule states that the number of weights should be around or below one tenth of the product of the number of training vectors and the number of outputs. In some situations, however (e.g., if the training data are relatively few), the number of weights can be even half of that product. For a multilayer perceptron, it is considered that the number of neurons in a layer must be sufficiently large that this layer provides three or more edges for each convex region identified by the next layer [5]. So the number of neurons in a layer should be more than three times higher than that of the next layer. As mentioned before, an insufficient number of weights leads to under-fitting, while too many weights lead to over-fitting, phenomena presented in Figure 9.

Fig. 9: The approximation capacity of a neural network as a function of the number of weights.

The same occurs if the number of training epochs is too small or too large. A method of solving this problem is stopping the training when the best generalization is reached. For a network large enough, it has been verified experimentally that the training error decreases continuously while the number of training epochs increases. However, for data different from those in the training set, we find that the error decreases from the beginning up to a point, and then starts increasing again. That is why the training must be stopped when the error on the validation set is minimal [13]. This is done by dividing the training data in two: about 90% of the data is used for the training itself and the rest, called the cross-validation set, is used for the measurement of the error. Training stops when the error starts to increase for the cross-validation set, the moment called the "point of maximum generalization". Depending on the network performance at this moment, different configurations can then be tried, lowering or increasing the number of neurons in the intermediate layer (or layers).
Example: We associate an input vector X = [1 -0.5] and a target vector T = [0.5 1], of size imposed by two restrictions, which can be reduced to the two degrees of freedom (the weight W and the bias B) of a single Adaline neuron [9]. We suggest solving the linear system of 2 equations with 2 unknowns [12]:

    w + b = 0.5,   -0.5·w + b = 1,                  (17)

obtaining in the end the solutions w = -1/3 and b = 5/6. A MATLAB program offers the solutions obtained with the help of the Adaline neuron, either by points or by slopes [3], [7], [10], [21].

Fig. 10: The point (weight) and slope (bias) of the neuron identified as algebraic solutions.

5 Implementation
In this section we discuss some issues related to the practical implementation of the perceptron and of the backpropagation algorithm.
Sigmoid. As stated above, the activation functions most commonly used in the units of multilayer perceptrons are of sigmoid type. Other types of non-linearity have been tested, but their behaviour appears to be generally inferior to that of the sigmoids. Within the class of sigmoids there are still wide choices. The feature of the sigmoid that seems to have the greatest influence on the performance of the learning algorithm is its symmetry about the origin; the logistic function, for example, is symmetric about the point of coordinates (0, 0.5). Symmetry about the origin gives a bipolar sigmoid, which normally tends to lead to better-conditioned error surfaces. Sigmoids such as the logistic curve tend to induce a narrower error function, which weakens




the speed of the learning procedure.
Output units and target values. Most practical applications of multilayer perceptrons can be divided relatively clearly into two different classes. In one class, the target outputs have a continuous range of values, and the network has to perform a non-linear regression. In this case it is normally not convenient to put a non-linearity at the output of the network. In fact, we normally want outputs that are able to cover the entire range of possible target values, which is often larger than the output range of the sigmoid. One could of course scale the output amplitudes of the sigmoid, but this rarely has any advantage over the simple use of units with no output non-linearity. Such output units are said to be linear: they simply output the weighted sum of their inputs plus their bias term.
The other class includes mainly applications of classification and pattern recognition, in which the target outputs are binary, i.e. take only 2 values. In this case it is usual to use output units with a sigmoid non-linearity, similar to the other units in the network. The most appropriate binary target values depend on the sigmoid used. Often the target values are chosen equal to the 2 asymptote values of the sigmoid (0 and 1 for the logistic function, and ±1 for tanh and the scaled arctan). In this case, for the error to reach 0, the output units would need to reach complete saturation, i.e. the sum of their inputs would have to become infinite. This would tend to drive the weights of these units to increase indefinitely in absolute value and would slow down the learning process. To improve the speed of learning, target values are therefore usually chosen which are close, but not equal, to the asymptotes of the sigmoid (e.g. 0.05 and 0.95 for the logistic function, and ±0.9 for tanh and the scaled arctan).
Weight initialization. Before the backpropagation algorithm can be started, it is necessary to set the weights of the network to some initial values. A natural choice would be to initialize them all with the value 0, so as not to bias the learning outcome in any particular direction. However, it can easily be seen, by applying the backpropagation rule, that if the initial weights are all 0, the gradient is also 0 (except for the weights of direct links between input and output units, if such links exist in the network). Furthermore, the gradient components will remain 0 during learning even if there are such direct links. Therefore, it is normally necessary to initialize the weights with values different from 0. The most common procedure is to initialize them with random values drawn from a uniform distribution on a symmetric interval [-a, a]. As mentioned above, several independent learning runs with independent random initializations can be used to find the best minimum of the cost function. It is understandable that large weights (resulting from high values of a) will tend to saturate the units. At saturation, the derivative of the sigmoid non-linearity is very small. Since these derivatives act as multipliers in the backward propagation, the derivatives relative to the input weights of the unit will be very small. The unit will be largely "locked", learning very slowly.
If the inputs of a unit all have the same root mean square (rms) value and are all independent of each other, and the weights are initialized in a fixed interval, then the rms value of the input sum of the unit will be proportional to fi^(1/2), where fi is the number of inputs of the unit (often called the fan-in of the unit). To keep the rms input sums similar to each other, and to avoid the saturation of units with high fan-in, the parameter a, which controls the size of the initialization range, is sometimes varied from one unit to another, taking a = k/fi^(1/2). There are various options for the choice of k. Some prefer to initialize the weights very close to the origin, taking k very small (e.g. 0.01 to 0.1), and thus keep the units in their central, linear zone at the beginning of the learning process. Others prefer high values of k (e.g. 1 or higher), driving the units into the non-linear zone even at the beginning of the learning process.
Decorrelation and normalization of inputs. Consider the simplest network one can design, consisting of a single linear unit. Networks with a single linear unit (adalines) have long been used in the area of discrete-time signal processing. Finite impulse response (FIR) filters can in fact be seen as single linear units without a bias: the inputs are consecutive samples of the input signal, and the filter coefficients are the weights. Therefore, adaptive filtering with FIR filters is essentially a form of real-time learning in networks with linear units, and it is no surprise that the first adaptive filtering algorithms were derived from the delta rule [14]. It is well known in adaptive filter theory that learning is fastest, because the error surface is well conditioned (it has no narrow valleys), if the inputs of the linear units are uncorrelated with one another, which means that <xi·xj> = 0 for i ≠ j, and have equal mean squares, <xi²> = <xj²> for all i, j. Here <.> denotes the expected value (often, when training perceptrons, the expected value can be estimated simply by averaging over the learning set). If a bias is also used in the units, it acts as a further input which is constantly equal to 1. This means that its mean square is 1, and therefore the mean squares of the other inputs must all be equal to 1. On the other hand, the cross-correlations of the other inputs with this new input are simply the expected values of those inputs, which should therefore be equal to 0, like all the cross-correlations between inputs:

    <xi> = <xj> = 0.                                (18)
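The initialization rule a = k/fi^(1/2) described above can be illustrated numerically. The following Python sketch uses arbitrary, illustrative values for k and the fan-ins; it draws weights uniformly from [-a, a] for two very different fan-ins and estimates the rms of the resulting input sum. With the scaling applied, both come out near k/√3, independent of the fan-in:

```python
import numpy as np

rng = np.random.default_rng(1)

def rms_input_sum(fan_in, k, n_trials=4000):
    # half-range a = k / sqrt(fan_in), as suggested in the text
    a = k / np.sqrt(fan_in)
    x = rng.standard_normal((n_trials, fan_in))   # unit-rms, independent inputs
    w = rng.uniform(-a, a, (n_trials, fan_in))    # freshly initialized weights
    s = (x * w).sum(axis=1)                       # the unit's input sum
    return float(np.sqrt((s ** 2).mean()))

r_small = rms_input_sum(fan_in=5, k=1.0)
r_large = rms_input_sum(fan_in=500, k=1.0)
# both estimates are close to k / sqrt(3) ~= 0.577, independent of fan-in
```

Without the 1/√fi scaling, the unit with fan-in 500 would see an input-sum rms ten times larger than the unit with fan-in 5, pushing it into saturation at the very start of training.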



In conclusion, for a faster learning of a single unit ∂E m ∂E ∂wi m ∂E


with the diagonal line should be amended so that =∑ =∑ (21)
∂a i =1 ∂wi ∂a i =1 ∂wi
the process averages each component input is 0.
Derivatives appearing in the last line can be
<xi>=0, (19)
calculated by the normal procedure of dissemination
- back.
and components are normalized and decorrelating:
In conclusion, we should calculate derivatives
<xixj>=δij, (20) relative weight to each individual through the normal
method, and then use the amount to update them and
where δij is Kronecker symbol. thus to adjust all the weights together. Also we must
Experience revealed that this type of processing remember that the common weights are initialized
also tends to accelerate learning for multilayer with the same value.
perceptrons. Setting the components of the input 0
may be made simply by adding a constant suitable
for everyone. Decorrelation can then be accomplished by any orthogonalization technique, for example the one described in [15]. Finally, the normalization can be achieved by a suitable scaling of each component. The hardest step is the orthogonalization, and many people skip it, simply setting the mean of each input to 0 and its mean square to 1. This simplified process is usually designated as input normalization; it often increases the learning speed of the network. A technique developed to accelerate learning further, involving adaptive normalization and decorrelation of the input lines of the network, is described in [16].

Shared weights. In some cases one would like to constrain some of the network weights to be equal to others. This situation may occur, for example, if we want to perform the same kind of processing on different parts of the input pattern. It is a situation often encountered in image processing, where one would like to detect the same feature in different parts of the input image. An example in a binary application is described in [17]. Two examples of situations with shared weights will be described below, in the presentation of recurrent networks. The difficulty in handling shared weights manually is that, even if the weights are initialized with the same value, the partial derivatives of the cost function with respect to each of them will generally differ.

The solution is quite simple. Assume that we have collected all the weights in a weight vector w = (w1, w2, ...)^T (where T denotes transposition), and that the first I of them must be kept equal to one another. These weights are not actually independent arguments of the cost function E. To keep all the arguments of the function independent, these weights should be replaced by a single argument, to which they are all equal. Then the partial derivative of E should be computed relative to that single argument, and not relative to the individual weights. By the chain rule, this derivative is simply the sum of the partial derivatives of E relative to each of the tied weights.

6 Experimental results
Inside the vector control of an induction motor, a quasi-PI standard fuzzy controller can be implemented. The optimization criterion (the integral of the absolute error) for such controllers must guarantee the robustness of the system. This quasi-PI fuzzy controller replaces the classic speed controller in the vector control scheme of the driving system ([11], [12]). The fuzzy control can be implemented inside a numerical control that involves the use of a digital signal processor, DSP (for example, TMS320C31). Taking into account the mathematical model developed in [11], [12], a quasi-PI standard fuzzy controller can be implemented in the induction driving environment (Fig. 11):

J·dω/dt = Me − (M0 + k1·ω + k2·ω²),   Me = p²·(Lm/Lr)·Ψsq·isq,          (22)

where M0 is the constant component of the static torque Ms; k1 and k2 are proportionality constants; Me is the electromagnetic torque; ω is the angular speed; isq is the stator current along the q axis; Lm is the mutual inductivity between the stator and the rotor; Lr is the inductivity of the rotor.

Fig. 11: Fuzzy logic controller.

Instead of the fuzzy controller [11], a neural controller is placed, which should be able to learn the control surface of the fuzzy controller.
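As a rough illustration of the mechanical equation (22), the speed response can be integrated numerically. The sketch below uses a simple forward-Euler step, and all parameter values (J, M0, k1, k2, the torque level) are illustrative assumptions, not values taken from the paper:

```python
# Numerical sketch of the mechanical part of equation (22):
#   J * dω/dt = Me - (M0 + k1*ω + k2*ω²)
# Parameter values below are illustrative assumptions only.

def simulate_speed(Me, J=0.05, M0=0.5, k1=0.01, k2=0.001,
                   dt=1e-3, t_end=2.0):
    """Forward-Euler integration of the angular speed ω from rest."""
    omega = 0.0
    for _ in range(int(t_end / dt)):
        domega = (Me - (M0 + k1 * omega + k2 * omega ** 2)) / J
        omega += dt * domega
    return omega

# With a constant electromagnetic torque, the speed settles where
# Me = M0 + k1*ω + k2*ω² (the static torque balances Me).
print(simulate_speed(Me=2.0, t_end=20.0))
```

At steady state the residual torque Me − (M0 + k1·ω + k2·ω²) goes to zero, which gives a quick sanity check on the integration.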

ISSN: 1109-2734 586 Issue 7, Volume 8, July 2009


Marius-Constantin Popescu, Valentina E. Balas,
WSEAS TRANSACTIONS on CIRCUITS and SYSTEMS Liliana Perescu-Popescu, Nikos Mastorakis

Using the Simulink structured scheme [11], [12], the vectors e, Δe and Δu are extracted in the MATLAB environment. They are fed into a four-layer neural network with sigmoid activation functions (MATLAB/Neural Network Toolbox). Finally, after the network is activated and the vectors e, Δe and Δu are passed through the learning process (using the Levenberg-Marquardt method), the solution towards which the neural network converges is obtained (the learning surface of the neural controller - Fig. 12, Fig. 13). It can be observed that the flat areas are reduced, and the control surface peaks of the fuzzy controller, in quadrants 1 and 3, are no longer outlined by the neural network of perceptron type. It would be necessary to make an analysis in the (e, Δe) phase plane, because only some points of the surface are significant from the control point of view. It is interesting to know the accurate value of the output increment when analyzing points of the surface remote from the reference point (e=0, Δe=0), which is the main objective of the control.
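The paper performs this learning with the MATLAB Neural Network Toolbox and the Levenberg-Marquardt method. As a rough stand-in, the sketch below trains a small one-hidden-layer sigmoid network by plain batch gradient descent to reproduce a control surface u(e, Δe); the target surface (a hypothetical saturated-PI shape), the network size, and the learning rate are all assumptions, not the paper's actual fuzzy surface or configuration:

```python
# Illustrative sketch: fit a small sigmoid MLP to samples of a
# control surface u(e, Δe). Plain gradient descent stands in for
# the Levenberg-Marquardt training used in the paper.
import numpy as np

rng = np.random.default_rng(0)

# Training samples over the (e, Δe) plane.
E, DE = np.meshgrid(np.linspace(-1, 1, 21), np.linspace(-1, 1, 21))
X = np.column_stack([E.ravel(), DE.ravel()])              # (N, 2)
y = np.clip(0.8 * E + 0.5 * DE, -1, 1).ravel()[:, None]   # hypothetical target

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 10 sigmoid units, linear output unit.
W1 = rng.normal(0, 0.5, (2, 10)); b1 = np.zeros(10)
W2 = rng.normal(0, 0.5, (10, 1)); b2 = np.zeros(1)

def forward(X):
    H = sigmoid(X @ W1 + b1)
    return H, H @ W2 + b2

def mse(X, y):
    return float(np.mean((forward(X)[1] - y) ** 2))

loss0 = mse(X, y)
lr = 0.5
for _ in range(3000):
    H, out = forward(X)
    err = out - y                                   # (N, 1)
    # Backpropagation of the mean squared error.
    gW2 = H.T @ err / len(X); gb2 = err.mean(0)
    dH = (err @ W2.T) * H * (1 - H)
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(loss0, mse(X, y))  # the training error drops noticeably
```

The same idea scales to the four-layer network mentioned above; only the number of backpropagated layers changes.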

Fig. 12: Control surface approximation of the fuzzy controller by a network of neurons: a) normalized coordinates; b) actual values.

Fig. 13: Driving on-load start-up, using a fuzzy controller (f) and a neural controller of perceptron type (n) with 4 layers: a) speed shape; b) stator current shape.

7 Conclusion
Multilayer perceptrons are the most commonly used types of neural networks. Using the backpropagation algorithm for training, they can be applied to a wide range of problems, from function approximation to prediction in various fields, such as estimating the load of a computing system or modelling the evolution of chemical polymerization reactions described by complex systems of differential equations. In implementing the algorithm there are a number of practical problems, mostly related to the choice of the parameters and the network configuration. First, a small learning rate leads to slow convergence of the algorithm, while a rate that is too high may cause failure (the algorithm will "jump" over the solution). Another problem characteristic of this training method is given by local minima. A neural network must also be capable of generalization.

The advantage of the fuzzy logic controller disappears when it is compared to an anti-windup PI controller, as long as the latter works in its linear region. On the other hand, an anti-windup PI controller poses no problems when the output variable reaches the saturation value, since the signal corresponding to the difference between the limited and the unlimited output is fed back to the controller for desaturation. For the same control surface, the advantage of using a neural controller consists in the decrease of computation time, compared with the time lost when a fuzzy controller with a larger number of linguistic labels is used.
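The learning-rate trade-off noted in the conclusion can be seen on a one-dimensional toy cost E(w) = w², whose gradient is 2w; the rates below are arbitrary choices for the demonstration:

```python
# Toy demonstration of the learning-rate trade-off on E(w) = w**2.
# A small rate converges slowly, a moderate rate converges quickly,
# and a rate above the stability limit makes the iterate diverge.

def descend(lr, steps=50, w0=1.0):
    """Run `steps` gradient-descent updates on E(w) = w**2."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w   # gradient of w**2 is 2*w
    return abs(w)

print(descend(lr=0.01))  # slow convergence
print(descend(lr=0.4))   # fast convergence
print(descend(lr=1.1))   # "jumps" over the minimum and diverges
```

On this quadratic the update is w ← (1 − 2·lr)·w, so any lr above 1 makes the factor exceed 1 in magnitude and the iterates grow without bound, which is exactly the failure mode described above.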




References
[1] Almeida, L.B., Multilayer perceptrons, in Handbook of Neural Computation, IOP Publishing Ltd and Oxford University Press, 1997.
[2] Bryson, A.E., Ho, Y.C., Applied Optimal Control, Blaisdell, New York, 1969.
[3] Curteanu, S., Petrila, C., Ungureanu, Ş., Leon, F., Genetic Algorithms and Neural Networks Used in Optimization of a Radical Polymerization Process, Buletinul Universităţii Petrol-Gaze din Ploieşti, vol. LV, seria tehnică, nr. 2, pp. 85-93, 2003.
[4] Cybenko, G., Approximation by superpositions of a sigmoidal function, Math. Control Signals Syst. 2, pp. 303-314, 1989.
[5] Dumitrescu, D., Costin, H., Reţele neuronale. Teorie şi aplicaţii, Ed. Teora, Bucureşti, 1996.
[6] Haykin, S., Neural Networks: A Comprehensive Foundation, Macmillan, IEEE Press, 1994.
[7] Leon, F., Gâlea, D., Zaharia, M.H., Load Balancing in Distributed Systems Using Cognitive Behavioural Models, Bulletin of the Technical University of Iaşi, Tome XLVIII (LII), fasc. 1-4, 2002.
[8] Negnevitsky, M., Artificial Intelligence: A Guide to Intelligent Systems, Addison-Wesley, England, 2002.
[9] Popescu, M.C., Hybrid neural network for prediction of process parameters in injection moulding, Annals of the University of Petroşani, Electrical Engineering, Universities Publishing House, Petroşani, Vol. 9, pp. 312-319, 2007.
[10] Popescu, M.C., Olaru, O., Mastorakis, N., Equilibrium Dynamic Systems Integration, Proceedings of the 10th WSEAS Int. Conf. on Automation & Information, Prague, pp. 424-430, March 23-25, 2009.
[11] Popescu, M.C., Modelarea şi simularea proceselor, Editura Universitaria, Craiova, pp. 261-273, 2008.
[12] Popescu, M.C., Petrişor, A., Neuro-fuzzy control of induction driving, 6th International Carpathian Control Congress, pp. 209-214, Miskolc-Lillafured, Budapest, 2005.
[13] Popescu, M.C., Reţele neuronale şi algoritmi genetici utilizaţi în optimizarea proceselor, Sesiunea Naţională de Comunicări Ştiinţifice, Ediţia a IX-a, Secţiunea Matematică, Târgu-Jiu, November 24-25, 2001.
[14] Popescu, M.C., Balas, V., Olaru, O., Mastorakis, N., The Backpropagation Algorithm Functions for the Multilayer Perceptron, Proceedings of the 11th WSEAS International Conference on Sustainability in Science Engineering, pp. 28-31, Timisoara, Romania, May 27-29, 2009.
[15] Popescu, M.C., Olaru, O., Mastorakis, N., Equilibrium Dynamic Systems Intelligence, WSEAS Transactions on Information Science and Applications, Issue 5, Volume 6, pp. 725-735, May 2009.
[16] Popescu, M.C., Olaru, O., Mastorakis, N., Equilibrium Dynamic Systems Integration, Proceedings of the 10th WSEAS Int. Conf. on Automation & Information (ICAI '09), March 23-25, 2009.
[17] Popescu, M.C., Petrişor, A., Drighiciu, A., Fuzzy Control Algorithm Implementation using LabWindows - Robot, WSEAS Transactions on Systems, Issue 1, Volume 8, pp. 117-126, January 2009.
[18] Principe, J.C., Euliano, N.R., Lefebvre, W.C., Neural and Adaptive Systems: Fundamentals Through Simulations, John Wiley & Sons, Inc., 2000.
[19] Rumelhart, D.E., Hinton, G.E., Williams, R.J., Learning representations by back-propagating errors, Nature 323, pp. 533-536, 1986.
[20] Silva, F.M., Almeida, L.B., Acceleration techniques for the backpropagation algorithm, in L.B. Almeida, C.J. Wellekens (eds.), Neural Networks, Springer, Berlin, pp. 110-119, 1990.
[21] Werbos, P.J., The Roots of Backpropagation, John Wiley & Sons, New York, 1974.
