Sio-Iong Ao · Len Gelman
Editors

Advances in Electrical Engineering and Computational Science
Lecture Notes in Electrical Engineering
Volume 39
Editors

Dr. Sio-Iong Ao
Harvard School of Engineering and Applied Sciences
Harvard University
60 Oxford Street
Cambridge, MA 02138, USA
siao@seas.harvard.edu

Dr. Len Gelman
Cranfield University, School of Engineering
Dept. of Process and Systems Engineering
Cranfield, Beds MK43 0AL
United Kingdom
DOI 10.1007/978-90-481-2311-7
Preface
Dr. Sio-Iong Ao
Prof. Len Gelman
C. Grabner
1.1 Introduction
As the motor design process has to consider multi-physical effects, such as electrical, thermal and mechanical aspects, a very deep and complex analysis has to take place in order to find the best technical and commercial choice for the product. Thus, extended investigations of the acoustic behavior of the complete speed-variable induction drive system, as schematically depicted in Fig. 1.1, have been performed in the lab to obtain reproducible results with high accuracy.
C. Grabner
Research and Development, Siemens AG, Frauenauracherstrasse 80, D-91056 Erlangen, Germany
E-mail: Grabner.Christian@SIEMENS.COM
Sound is the mechanical vibration of a gaseous medium through which the energy is
transferred away from the source by progressive sound waves. Whenever an object
vibrates, a small portion of the energy involved is lost to the surrounding area as
unwanted sound – the acoustic noise [1, 2].
As the acoustic intensity, the power passing through a unit area in space, is propor-
tional in the far field to the square of the sound pressure, a convenient scale for the
measurement can be defined as sound pressure level
L_p(t) = 10 \log \frac{p^2(t)}{p_0^2} = 20 \log \frac{p(t)}{p_0}, \qquad (1.1)

in decibels, with p_0 as the reference sound pressure of 20 µPa. The measured sound pressure p(t) in (1.1) depends on many uncertain factors, such as the orientation and distance of the receiver and the temperature and velocity gradients inside the involved medium.
The analysis of (1.1) in the frequency domain is conveniently done with the discrete fast Fourier transform. Therefore, the time-periodic signal (1.1) is sampled as L_{p,n} and further processed over a distinct number of N samples as

\hat{L}_{p,\nu} = \sum_{n=0}^{N-1} L_{p,n}\, e^{-j(2\pi n \nu / N)}, \qquad \nu = 0, 1, 2, \ldots, N-1. \qquad (1.2)
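As a concrete illustration of (1.2), the short Python sketch below computes the spectrum of a sampled sound-pressure-level signal with the discrete Fourier transform; the sampling rate, tone frequency and amplitude are arbitrary illustrative values, not measurement data from this chapter.

    import numpy as np

    fs = 8000                       # sampling rate in Hz (illustrative value)
    t = np.arange(0, 1.0, 1.0 / fs)
    p0 = 20e-6                      # reference sound pressure, 20 micro-pascal
    p = 0.02 * np.sin(2 * np.pi * 1000 * t)     # synthetic 1 kHz pressure signal in Pa

    # Sound pressure level samples L_{p,n} according to (1.1)
    Lp = 20 * np.log10(np.abs(p) / p0 + 1e-12)

    # Discrete Fourier transform of the sampled level signal, Eq. (1.2)
    Lp_hat = np.fft.fft(Lp)
    freqs = np.fft.fftfreq(len(Lp), d=1.0 / fs)

    # Report the strongest non-DC spectral component
    k = np.argmax(np.abs(Lp_hat[1:len(Lp) // 2])) + 1
    print(f"dominant component near {freqs[k]:.1f} Hz")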
The quietest sound at 1,000 Hz which can be heard by the average person is found in Fig. 1.2 to be about 20 µPa and this value has been standardized as the nomi-
nal hearing threshold for the purpose of sound level measuring. At the other end
of the scale the threshold of pain occurs at a sound pressure of approximately
100 Pa.
An inspection of Fig. 1.2 shows that a pure tone having a sound pressure level of,
e.g. 20 dB and a frequency of 1,000 Hz is plainly audible, whereas one having the
same sound pressure level but a frequency of 100 Hz is well below the threshold of
audibility and cannot be heard at all.
The loudness of a pure tone of constant sound pressure level, perhaps the simplest
acoustic signal of all, varies with its frequency, even though the sound pressure may
be the same in every case.
Fig. 1.2 Typical sound pressure levels in decibel and equal loudness levels in phon for different
noise sources
Although our hearing mechanism is not well adapted for making quantitative measurements of the relative loudness of different sounds, there is fair agreement between observers when two pure tones of different frequencies appear to be equally loud. It is therefore possible to establish in Fig. 1.2 contour plots of equal loudness in phon. These subjectively perceived loudness curves are obtained by alternately sounding a reference tone of 1,000 Hz and a second tone of some other frequency. The intensity level of the second tone is then adjusted to the value that makes the two tones appear equally loud.
A pure tone having a frequency of 100 Hz and a sound pressure level of about 35 dB sounds as loud as a pure 1,000 Hz tone whose intensity level is 20 dB, and hence the loudness level of the 100 Hz tone is by definition 20 phon. The 1,000 Hz frequency is thus the reference for all loudness measurements, and all contours of equal loudness expressed in phon have the same numerical value as the sound pressure level in decibels at 1,000 Hz.
Due to the human ear's assessment of loudness, the defined purely physical sound pressure term has to be modified by an implicit weighting process in a way which corresponds to the more complex response of the human being. Therefore, among several possibilities, distinguished by B-, C-, D-, E- or SI-weighting in Fig. 1.3, the A-weighted level has been found to be the most suitable for modifying the frequency response to follow approximately the equal-loudness contour of 20 phon in Fig. 1.2.
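The standard A-weighting curve can be evaluated in closed form; the following Python sketch implements the commonly used IEC 61672 expression as an illustration of such a frequency weighting (the formula and constants are quoted from that standard, not from this chapter).

    import numpy as np

    def a_weighting_db(f):
        """A-weighting attenuation in dB for frequency f in Hz (IEC 61672 formula)."""
        f2 = np.asarray(f, dtype=float) ** 2
        ra = (12194.0 ** 2 * f2 ** 2) / (
            (f2 + 20.6 ** 2)
            * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
            * (f2 + 12194.0 ** 2)
        )
        return 20 * np.log10(ra) + 2.00    # normalised to 0 dB at 1 kHz

    for f in (100, 1000, 10000):
        print(f"{f:6d} Hz: {a_weighting_db(f):+.1f} dB")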
Fig. 1.4 Stator design with totally-closed (left) and semi-closed (right) stator slots
Table 1.1 Test results with regard to their suitability for acoustic noise minimization
Combination Stator slot design Rotor design
Motor ① Totally-closed Skewed
Motor ② Totally-closed Un-skewed
Motor ③ Semi-closed Un-skewed
Motor ④ Semi-closed Skewed
The four motor combinations listed in Table 1.1 have been consecutively tested with regard to their suitability for acoustic noise minimization.
All investigated motor designs are operated at the rated electrical voltage level of 400 V. They have a rated power of approximately 750 W at the nominal speed of about 1,395 rpm. As all samples listed in Table 1.1 are fitted with the same motor fan, a direct comparison of the acoustic test results is feasible.
The noise measurement has been carried out for each motor design over a wide speed range, beginning from the lower value of 500 rpm up to the rated speed of 1,395 rpm at a constant mechanical rated torque of 5 Nm. In the higher speed ranges, the drive works in the field-weakening range and so the load has been continuously reduced.
After reaching the physically quasi-steady operational state for each adjusted speed value, the measured data set has been stored. Thus, purely dynamic acoustic noise arising from a transient real-time run-up is not considered here.
The depicted sound pressure level courses in Figs. 1.6 and 1.7, which represent the skewed rotor designs ① and ④, show a very smooth and continuous track behavior, which fortunately avoids extensive noise peaks over the total speed range. Thus, the utilization of the standard skewing technology avoids the generation of undesired significant noise peaks over the complete speed range.
As is obvious from Figs. 1.6 and 1.7, the course of the proposed novel motor combination ① advantageously has the lowest sound pressure level in comparison to the usual design ④ at several speed values.
The introduced novel motor topology of the standard squirrel cage induction
motor with totally-closed slot tips is therefore suitable to reduce undesired noise
emission not only at the nominal speed of 1,395 rpm. The emitted noise of the pro-
posed motor design ① is reduced by the amount of 10 dB at no-load in Fig. 1.6 and
about 8 dB at rated-load in Fig. 1.7 in comparison to the state of the art design ④.
Fig. 1.6 Measured sound pressure level versus speed for the motor ① (blue), motor ② (red), motor
③ (green) and motor ④ (violet) at sinusoidal power supply and no-load
Fig. 1.7 Measured sound pressure level versus speed for the motor ① (blue), motor ② (red), motor
③ (green) and motor ④ (violet) at sinusoidal power supply and rated-load
Contrarily, in the case of both test objects ② and ③, a very discontinuous and even speed-sensitive sound level behavior, characterized by some very distinct noise peaks, is found from Figs. 1.6 and 1.7.
Especially the varying operational state from no-load to rated-load causes extended magnetic saturation effects within the totally-closed stator tooth tip regions. This fact has, depending on the motor design ② or ③, unexpected inverse impacts on the emitted noise levels. There are some interesting secondary aspects occurring at higher speed ranges. Considering the test object ③, the maximal occurring peak of 81 dB in Fig. 1.6 at 1,842 rpm is slightly reduced at the same speed value to 79 dB in Fig. 1.7. A completely contrary observation can be made for motor ②. The noise peak of 55 dB at 1,670 rpm at no-load is significantly shifted to much higher levels, with an additional peak of 62 dB arising at the same speed in the case of the rated-load condition in Fig. 1.7.
Fig. 1.8 FFT of the sound pressure level for motor ① at rated-load depicted for the speed range
500–3,000 rpm
Fig. 1.9 FFT of the sound pressure level for motor ② at rated-load depicted for the speed range
500–3,000 rpm
Fig. 1.10 FFT of the sound pressure level for motor ③ at rated-load depicted for the speed range
500–3,000 rpm
Fig. 1.11 FFT of the sound pressure level for motor ④ at rated-load depicted for the speed range
500–3,000 rpm
Some very distinct spectral components due to stator slot harmonics of motor ③ at the speed of 1,900 rpm can be identified in Fig. 1.10, with magnitudes of 59 dB around the single frequency of 2,300 Hz.
1.5 Conclusion
References
1. L.E. Kinsler and A.R. Frey, Fundamentals of Acoustics, Wiley, New York/London, 1962.
2. M. Bruneau, Fundamentals of acoustics, ISTE, London, 2006.
3. H. Jordan, Der geräuscharme Elektromotor, Verlag W. Girardet, Essen, 1950.
Chapter 2
Model Reduction of Weakly Nonlinear Systems
Abstract In general, model reduction techniques fall into two categories: moment-matching and Krylov techniques, and balancing techniques. The present contribution is concerned with the former and proposes the use of a perturbative representation as an alternative to the bilinear representation [4]. While for weakly nonlinear systems either approximation is satisfactory, it will be seen that the perturbative method has several advantages over the bilinear representation.
In this contribution, an improved reduction method is proposed. Illustrative exam-
ples are chosen, and the errors obtained from the different reduction strategies will
be compared.
2.1 Introduction
As the level of detail and intricacy of dynamical system models continues to grow,
the essential behaviour of such systems can be lost in a cloud of complexity. Fur-
thermore, numerical simulations of such systems can consume extensive memory
resources and take inordinate amounts of time. Consequently, to cope with the grow-
ing complexity and dimensionality of systems, model reduction has become a vital
aspect of modern system simulation. Model reduction techniques for linear systems
are well studied (e.g. [1, 2] and references therein). However, the study of nonlin-
ear systems is much more complicated and the development of model reduction
methods for large-scale nonlinear systems represents a formidable challenge.
M. Condon (B)
School of Electronic Engineering, Dublin City University, Glasnevin, Dublin 9, Ireland,
E-mail: Marissa.Condon@dcu.ie
\dot{x}(t) = f(x(t)) + B u(t), \qquad y(t) = C x(t), \qquad (2.1)

where f: \mathbb{R}^n \to \mathbb{R}^n is a non-linear function and x \in \mathbb{R}^n are the state-space variables. The initial condition is x(0) = x_0 and u(t), y(t) \in \mathbb{R}. B, C \in \mathbb{R}^n are constant vectors (C is a row vector and B is a column vector). It is assumed that x = 0 is a stable equilibrium point of the system (2.1) and x_0 = 0. Under this assumption, f(x) can be expanded in a generalised Taylor series about x = 0:
\dot{x}_1 = A_1 x_1 + B u
\dot{x}_2 = A_1 x_2 + A_2 (x_1 \otimes x_1) \qquad (2.4)
\dot{x}_3 = A_1 x_3 + A_2 (x_1 \otimes x_2 + x_2 \otimes x_1) + A_3 (x_1 \otimes x_1 \otimes x_1)
\vdots

\dot{x} = A x + B u
y = C x, \qquad (2.6)
where

A = \begin{bmatrix} A_1 & & & \\ & A_1 & & \\ & & A_1 & \\ & & & \ddots \end{bmatrix}, \qquad B = \begin{bmatrix} B & 0 & 0 & 0 & \cdots \\ 0 & A_2 & 0 & 0 & \cdots \\ 0 & 0 & A_2 & A_3 & \cdots \\ & & & & \ddots \end{bmatrix}, \qquad (2.7)

u(t) = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \end{bmatrix}, \qquad C = [\,C, C, C, \ldots\,], \qquad u_1 = u(t), \quad u_2 = x_1 \otimes x_1,

u_3 = x_1 \otimes x_2 + x_2 \otimes x_1, \quad u_4 = x_1 \otimes x_1 \otimes x_1, \ \ldots
The source u2 for the second equation in (2.4) depends only on the state vector x1
determined from the first equation and so on. Note that since A1 is a stable matrix,
A is also automatically stable. Now, u2 , u3 , etc. are not independent inputs like u1
and therefore, linear system theory cannot be applied directly to the representation
in (2.6). However, subsequent sections will show how the representation may be
adapted so that linear system theory and consequently, linear model reduction may
be applied to the representation in (2.6).
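To make the cascade structure of (2.4) concrete, the sketch below integrates the first two perturbative subsystems for a small, randomly chosen stable linear part; A1, A2, B, C and the input are illustrative placeholders rather than the example systems used later in this chapter.

    import numpy as np
    from scipy.integrate import solve_ivp

    n = 4
    rng = np.random.default_rng(0)
    A1 = -np.eye(n) + 0.1 * rng.standard_normal((n, n))   # stable linear part (assumed)
    A2 = 0.05 * rng.standard_normal((n, n * n))           # quadratic term acting on x1 (x) x1
    B = rng.standard_normal(n)
    C = rng.standard_normal(n)
    u = lambda t: np.sin(t)

    def rhs(t, z):
        x1, x2 = z[:n], z[n:]
        dx1 = A1 @ x1 + B * u(t)                          # first equation of (2.4)
        dx2 = A1 @ x2 + A2 @ np.kron(x1, x1)              # second equation, driven only by x1
        return np.concatenate([dx1, dx2])

    sol = solve_ivp(rhs, (0.0, 10.0), np.zeros(2 * n), max_step=0.01)
    y = C @ (sol.y[:n] + sol.y[n:])                       # output y ~ C(x1 + x2), as in (2.6)
    print(y[-5:])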
where

\hat{x}(t) = \begin{bmatrix} x^{(1)} \\ x^{(2)} \\ \vdots \end{bmatrix}.
The goal of any model reduction technique is to replace the n-dimensional system (2.1) with a system of much smaller dimension k ≪ n, such that the behaviour of the reduced-order system satisfactorily represents the behaviour of the full system. In projection-based reduction schemes, a projection matrix, V, is selected such that its columns span the required subspace [4]. The reduced system is then formed by approximating the state vector x with V x̂. Consider a linear state-space representation:
\dot{x}(t) = A x(t) + B u(t)
y(t) = C x(t), \qquad (2.9)
The state-space equations for the reduced system are then given as:

\dot{\hat{x}}(t) = \hat{A} \hat{x}(t) + \hat{B} u(t)
\hat{y}(t) = \hat{C} \hat{x}(t), \qquad (2.10)

where

\hat{A} = V^{T} A V, \qquad \hat{B} = V^{T} B, \qquad \hat{C} = C V.
In Krylov-based methods, the projection matrix is chosen to span the columns of
the Krylov subspace
The rationale for selection of this subspace is that it results in matching the first m
moments of the original and reduced systems. Here s0 is the point in the complex
plane about which moments are matched. However, the Krylov-based reduction
methods are preferable to direct moment matching techniques as the methods avoid
the numerical problems arising in explicit moment matching.
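A minimal sketch of such a Krylov projection, assuming s0 = 0 so that the subspace is spanned by A^{-1}B, A^{-2}B, ..., is given below; the matrices are random placeholders and the orthogonalisation uses a plain QR factorisation rather than any particular algorithm from the literature.

    import numpy as np

    def krylov_projection(A, B, C, k):
        """Reduce (A, B, C) to order k by projecting onto span{A^{-1}B, ..., A^{-k}B}."""
        n = A.shape[0]
        K = np.empty((n, k))
        v = np.linalg.solve(A, B)          # A^{-1} B
        for j in range(k):
            K[:, j] = v
            v = np.linalg.solve(A, v)      # next Krylov vector
        V, _ = np.linalg.qr(K)             # orthonormal basis of the Krylov space
        return V.T @ A @ V, V.T @ B, C @ V, V

    rng = np.random.default_rng(1)
    n, k = 30, 6
    A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))    # stable full-order system (assumed)
    B = rng.standard_normal(n)
    C = rng.standard_normal(n)
    Ar, Br, Cr, V = krylov_projection(A, B, C, k)
    print(Ar.shape, Br.shape, Cr.shape)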
For nonlinear systems, model reduction is not as straightforward as for linear systems. In this contribution, we look at some of the properties of linear systems with a view to adapting the perturbative representation of the nonlinear system so that a reduction strategy similar to that for linear systems can be applied to it.
Consider the behaviour of a linear system when u → αu. In this case, the output also changes as y → αy. We term this the scale-invariance property, which holds for linear systems. The result is that the Krylov-based reduction method is unaffected when u → αu. Similarly, if x → βx, the reduction process is unaffected. However, nonlinear systems are not scale invariant. For example, consider the perturbative system under a rescaling of the input, that is, consider u → αu. The Bu term of (2.6) transforms as:
B u \to \alpha\, \mathrm{diag}(1, \alpha, \alpha^2, \ldots)\, B u. \qquad (2.11)
It is evident from (2.11) that the scale-invariance property does not hold. To enable application of linear theory to (2.6) would require that Bu → αBu, which is not the case, as is evident from (2.11). Consequently, linear model reduction techniques may not be applied directly to the perturbative representation and hence a modification is required. The input term in (2.6) is therefore rewritten as
B u = D_\lambda B U_\lambda,

where

D_\lambda = \mathrm{diag}(1, \lambda, \lambda^2, \ldots),

with

U_\lambda = [\,u_1,\ \lambda^{-1} u_2,\ \lambda^{-2} u_3,\ \lambda^{-2} u_4,\ \ldots\,]^{T}

for any nonzero function λ. If, when u_1(t) → αu_1(t), λ transforms as λ → αλ, then U_λ transforms as

U_\lambda \to \alpha U_\lambda. \qquad (2.12)
It transforms in the same manner as the input to a linear system. The property in (2.12) is very important as it shows that, to enable application of linear systems theory to (2.6), the proper input to (2.6) is actually U_λ and not u.
An estimate for λ may be determined as follows. If λ = 0, then the system in (2.6) is linear. Thus, λ must be proportional to the output due to the nonlinearity, y − y_1, where y_1 = C x_1 is the output from the linear part of (2.6). For the purposes of practical implementation of the reduction scheme, it is convenient to take λ as a constant parameter. Hence, the following is deemed an appropriate choice for λ:

\lambda = \frac{\overline{|y - y_1|}}{\overline{|u|}}, \qquad (2.13)

where the bar denotes the average value of the waveform over the time interval [0, T] for which the behaviour of the system is under examination, provided ū ≠ 0. An exact optimal value of λ for a particular model parameterisation may be chosen from computer simulations for a particular 'test input' that is close to the inputs for which the system is designed. λ is then determined by minimising an error function using the Nelder–Mead algorithm. A suitable error function is the following:
\mathrm{err} = \frac{\sqrt{\sum (f_{\mathrm{ex}} - f_{\mathrm{red}})^2}}{N}, \qquad (2.14)

where f_ex is the output from the exact model, f_red is the output from the reduced model and N is the number of samples of f_ex taken to compute the error.
However, for most practical cases, the estimate in (2.13) suffices. Obviously, the reduced model approximates the input–output behaviour of the system locally. No conclusions, however, can be drawn about its global behaviour.
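The two ways of fixing λ can be sketched as follows, assuming the simulated waveforms are available as NumPy arrays; the signals and the stand-in reduced model are illustrative placeholders, and the Nelder–Mead step simply calls the general-purpose optimiser from SciPy.

    import numpy as np
    from scipy.optimize import minimize

    def lambda_estimate(y, y1, u):
        """Direct estimate of lambda from (2.13): mean |y - y1| over mean |u|."""
        return np.mean(np.abs(y - y1)) / np.mean(np.abs(u))

    def error(lmbda, f_exact, reduce_and_simulate):
        """Error function (2.14) between the exact output and the reduced model output."""
        f_red = reduce_and_simulate(lmbda)           # user-supplied reduction + simulation
        return np.sqrt(np.sum((f_exact - f_red) ** 2)) / len(f_exact)

    # Illustrative use with synthetic data standing in for the simulated waveforms.
    t = np.linspace(0.0, 5.0, 500)
    u = np.exp(-t)
    y1 = 0.01 * (1 - np.exp(-t))                     # placeholder linear-part output
    y = y1 + 0.002 * np.sin(t)                       # placeholder full nonlinear output
    lam0 = lambda_estimate(y, y1, u)

    fake_reduced = lambda lam: y1 + 0.002 * lam * np.sin(t) / (1 + lam)   # stand-in model
    res = minimize(lambda lam: error(lam[0], y, fake_reduced), x0=[lam0],
                   method="Nelder-Mead")
    print(f"initial estimate {lam0:.4f}, refined {res.x[0]:.4f}")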
The reduction process for the perturbative representation proceeds as follows. Let the approximation in (2.2) involve K terms. The dimension of the system representation in (2.6) is thus nK. Suppose it is required to reduce the system to dimension k. The Krylov space for the first-order response x_1 in (2.4) and (2.6) is formed as K_1 = (A_1^{-1}, A_1^{-1}B) (s_0 is set to zero to simplify the explanation, but this is not necessary). An orthogonal projection matrix, V_1, for the first-order system is formed from K_1 and used to approximate x_1 by V_1 x̂_1. Now, the second-order system in (2.4) and (2.6) is formed as in (2.15).
This differs from the standard second-order system such as that presented by Phillips [4]. In the standard version, λ = 1. However, the results in Section 2.6 will show that inclusion of the novel proposal for λ achieves greater accuracy.
The Krylov space for the linear system in (2.15) is then formed as K_2 = (A_2^{-1}, A_2^{-1}B_2). An orthogonal projection matrix, V_2, is formed from K_2 and this matrix is used to reduce the second-order system. The procedure for reducing the higher-order terms in (2.4) and (2.6), i.e. x_3, ..., in the perturbative system is similar.
The circuit employed is the nonlinear ladder shown in Fig. 2.1. The number of nodes in the system is n = 30. The ladder represents a heat flow model [7]. The voltage at the mth node represents the temperature on a rod at a distance proportional to m. The (input) voltage at node 1 represents the heat source. The nonlinearities represent the dependence of the conductivity on the temperature. The average voltage at all nodes is taken as the output and this represents the average temperature of the rod. Varying the circuit parameters corresponds to different spatial or environmental conditions.
Fig. 2.1 Nonlinear ladder circuit (nodes 1, 2, 3, ..., n; resistors r, capacitors c, nonlinear elements i_nl(u), input u)
i_{nl} = g v^2, \qquad (2.16)

for v > 0. The parameters are C = r = 1. The strength of the nonlinearity is varied by varying g.
The dimension of the original state-space is n = 30. The perturbative representation (2.6) contains two terms, i.e. K = 2. The reduction process is performed from the representation (2.6) of order nK = 60 to a representation of order k = 6. The value of λ is λ = 1.6443. Figure 2.2 shows the result for an exponential input e^{-t} from the reduced perturbative model for g = 0.1 superimposed on the result from a full nonlinear model of Fig. 2.1. The root mean square error between this result and that computed from a full model is 0.0026. The reduced model is equally accurate for other input signals. In order to confirm the value of including λ, note that the root mean square error is 0.0042 when λ = 1.
As a second example, consider the 30-section nonlinear RC ladder shown in Fig. 2.3.
Fig. 2.2 Reduced perturbative model, g = 0.1 (solid line = full model, dashed line = reduced model)
The reduction process described in Section 2.5 is applied. The system is reduced from a dimension of 60 to a dimension of 6. The value of λ is determined from (2.13) as 0.6747. With this value of λ, the RMS error is 0.0037. With the standard approach of [4], the RMS error is 0.0076.
2.7 Conclusions
Fig. 2.4 Reduced perturbative model (solid line = full model, dashed line = reduced model)
Acknowledgements This material is based upon works supported by Science Foundation Ireland under Principal Investigator Grant No. 05/IN.1/I18.
References
3.1 Introduction
To obtain a large breakdown voltage, the lateral surface electric field distribution along the silicon surface must be uniform [3]. Several ideas have been used to enhance the breakdown voltage of lateral bipolar devices. These include the Reduced Surface Field (RESURF) principle [4], a fully depleted collector drift region [5], a graded drift region [6, 7] and semi-insulating polysilicon (SIPOS) passivation layers [8]. The concept of an extended box has been used to increase the breakdown voltage of the lateral Schottky transistor [9]. By using a linearly varying doping concentration and a linearly varying oxide thickness, the lateral surface component of the electric field in the device can be made perfectly uniform and the maximum breakdown voltage can be obtained. But it is extremely difficult to fabricate a device with a linearly varying doping concentration and simultaneously a linearly varying oxide thickness [10, 12].
In this paper, numerical simulation of an LBJT with a multizone step doped drift region on a buried oxide thick step (BOTS) has been performed. We have studied and simulated three types of devices in this work. The first type has two doping zones but no BOTS; this is the conventional device. The second type again uses two doping zones but has a BOTS, and is called the two zone proposed (2ZP) device. The third type uses three doping zones and a BOTS, and is called the three zone proposed (3ZP) device. To increase the breakdown voltage and at the same time retain the ease of fabrication of the device, the linearly varying doping and the linearly varying oxide thickness have been replaced with step profiles of both. The multizone step doping and thick step oxide result in increased uniformity of the lateral surface electric field in the drift region, reduction of the base-collector junction electric field by using a lower doping concentration near the junction and enhancement of the collector breakdown voltage. The simulation results using MEDICI [13] have shown that the 2ZP device has a breakdown voltage 200% higher than the conventional device. A 3ZP device possesses a breakdown voltage 260% higher than the conventional one. It has been observed that increasing the number of zones further increases the breakdown voltage only marginally but increases the complexity of the device significantly. The high breakdown voltage in the proposed devices can be obtained even at high drift doping concentrations. This improves the tradeoff between the breakdown voltage and the on-resistance of the device.
Fig. 3.1 (a) An ideal lateral bipolar transistor (b) Conventional LBJT (c) Two zone proposed (2ZP)
LBJT (d) Three zone proposed (3ZP) device
Fabricating the ideal structure of Fig. 3.1a is difficult as it needs a large number of masking and lithographic steps, which renders its fabrication practically impossible. A simple and conventionally used lateral BJT on SOI is shown in Fig. 3.1b. This structure is easier to fabricate in comparison to the ideal one, but at the cost of an increase in on-resistance and a decrease in breakdown voltage. The breakdown voltage is also poor in the conventional structure. The proposed practical alterations to the structure in Fig. 3.1a, that is, 2ZP and 3ZP, are shown in Fig. 3.1c and d respectively. These structures retain the advantages of the ideal structure, that is, high breakdown voltage and an improved tradeoff between breakdown voltage and on-resistance. The proposed structures are easier to fabricate than the ideal one, but are more complex than the conventional one.
Extensive device simulations were carried out using the device simulator MEDICI. The various models activated in the simulation are analytic, prpmob, fldmob, consrh, auger and BGN [13]. The mobility models prpmob and fldmob specify that the perpendicular and parallel electric field components are being used. The concentration-dependent Shockley-Read-Hall model (consrh) and the Auger recombination model (auger) are activated. Since the device is bipolar, having PN junctions, the band gap narrowing effect has been considered by using the BGN model. The various device and simulation parameters used are shown in Table 3.1. The current gain and Gummel plot curves for the conventional and 3ZP devices are shown in Fig. 3.2. The common emitter current gain of all the devices is chosen to be identical (50) by an appropriate choice of the various doping concentrations. This is done for a better comparison of the breakdown voltage. The Gummel plots for the proposed and
Table 3.1 Device and simulation parameters

Parameter                                      Conventional       2ZP                3ZP
Si film thickness                              1 µm               1 µm               1 µm
Oxide thickness                                1.5 µm             1.5 µm             1.5 µm
Emitter length                                 6.4 µm             6.4 µm             6.4 µm
Emitter doping conc. (cm^-3) (n type)          5 × 10^19          5 × 10^19          5 × 10^19
Substrate doping conc. (cm^-3) (p type)        5 × 10^13          5 × 10^13          5 × 10^13
Collector doping conc. (n type) (cm^-3)        n+ = 5 × 10^19     n+ = 5 × 10^19     n+ = 5 × 10^19
                                               n1 = 3 × 10^15     n1 = 3 × 10^15     n1 = 3 × 10^15
                                               n2 = 5 × 10^15     n2 = 5 × 10^15     n2 = 5 × 10^15
                                                                                     n3 = 7 × 10^15
Drift region length                            LDN1 = 15 µm       LDN1 = 15 µm       LDN1 = 10 µm
                                               LDN2 = 19 µm       LDN2 = 19 µm       LDN2 = 11 µm
                                                                                     LDN3 = 13 µm
TAUN0, TAUP0 (SRH electron and hole
carrier lifetime coefficients)                 5 × 10^-6 s        5 × 10^-6 s        5 × 10^-6 s
VSURF at poly base contact (surface
recombination velocity)                        3 × 10^6 cm/s      3 × 10^6 cm/s      3 × 10^6 cm/s
Metal (Al) work function                       4.17 eV            4.17 eV            4.17 eV
Fig. 3.2 (a) Beta versus collector current. (b) Gummel plots for the conventional and 3ZP devices
conventional devices are more or less similar; that is why they overlap for most of the base voltage range. Since the oxide fully covers the base region in both cases, the area of the base remains constant in all cases and hence so does the base current. The output characteristics of the conventional, 2ZP and 3ZP devices are shown in Fig. 3.3a, b and c respectively. The various parameters used in obtaining these characteristics are shown in Table 3.1. In all structures the buried oxide is 1.5 µm thick and the BOTS thickness is varied from 1 to 9 µm.
After comparing the simulation results, we have found that the common emitter breakdown voltage with base open (BVCEO) is significantly higher for the three zone proposed (3ZP) and two zone proposed (2ZP) devices than for the conventional device. The BVCEO of the conventional device is 124 V. Using the same device parameters
Fig. 3.3 Output characteristics of (a) conventional device (b) 2ZP device (c) 3ZP device
as for the conventional device, the 2ZP device with a BOTS length of 34 µm and a thickness of 6 µm has a BVCEO of 370 V. Increasing the number of zones to three, that is, in the 3ZP device, the breakdown voltage increases to 440 V. This shows that the breakdown voltage of the 3ZP device is 260% higher than that of the conventional device and more than 20% higher than that of the 2ZP device.
The reason behind this enhancement in breakdown voltage is explained in Fig. 3.4, which shows that the lateral surface electric field is more uniform in the proposed devices than in the conventional device. The conventional device has the least uniformity in the electric field and the 3ZP device has the highest uniformity. The enhanced uniformity of the surface electric field is due to the presence of the steps in doping and the step in oxide [10]. These steps result in the generation of additional electric field peaks in the collector drift region, which are shown in Fig. 3.4. These generated peaks pull down the peaks at the edges of the collector drift region. Further, the buried oxide thick step (BOTS) reduces the peak electric field at the n2-n+ junction in the 2ZP device and at the n3-n+ junction in the 3ZP device by redistributing the applied potential in the collector drift region and in the thick oxide. The analytical approach deduced in [12] explains the creation of the electric field component due to the BOTS and the increase in breakdown voltage.
The potential contours of the conventional and the proposed devices are shown in Fig. 3.5a and b respectively. The crowding of the potential contours along the oxide layer shown in Fig. 3.5a results in a lowering of the breakdown voltage in the conventional device. In the proposed device, the BOTS helps in reducing the crowding of the potential contours, as shown in Fig. 3.5b. This reduction in crowding of the potential contours in the BOTS makes it possible to apply more voltage to the device before it breaks down.
Figure 3.6a and b show the electric field profiles at the top and bottom of the SOI film respectively. The electric field profile in the oxide layer is shown in Fig. 3.6c. These figures also show how the locations of the electric field peaks vary with the BOTS length. The increase in breakdown voltage can be attributed to how the electric field peaks are distributed in the device for different values of the BOTS length. As can be seen from these figures, there is an optimum length which results in maximum uniformity of the lateral surface electric field and hence the maximum breakdown voltage, as shown in Fig. 3.7a. From
Fig. 3.6 Electric field profile in the proposed device at (a) top (b) bottom of silicon film (c) inside
the oxide layer for different values of LBOTS
Fig. 3.7 Effect of variation in (a) BOTS length and (b) BOTS thickness on breakdown voltage in the proposed devices
this plot it is clear that for a BOTS length of 30–40 µm, the breakdown voltage of the 2ZP device is >200% higher than that of the conventional device. For the 3ZP device, the breakdown voltage is >260% higher than that of the conventional device. At the optimum length, the lateral surface electric field is more uniform in the proposed devices. The impact of oxide thickness on the breakdown voltage is shown in Fig. 3.7b. It shows that the breakdown voltage first increases with an increase in BOTS thickness and then subsequently saturates. This is because the thick BOTS helps to sustain a high electric field while keeping it below the value causing breakdown in the oxide. Since both the horizontal and vertical components of the electric field contribute to the avalanche process, increasing the BOTS thickness further does not improve the breakdown voltage [9]. The simulation results have shown that for T_BOTS = 8 µm and L_BOTS = 36 µm, the breakdown voltage of the 3ZP device is 445 V, that of the 2ZP device is 380 V and that of the conventional device is 124 V. This shows that the breakdown voltage is enhanced by 260% in the 3ZP device and by 206% in the 2ZP device.
Figure 3.8 shows the effect of substrate doping on the breakdown voltage. It is observed that there is an optimum doping concentration which gives the maximum breakdown voltage. The effect is explained in Fig. 3.8b, which shows the electric field profile at specified values of substrate doping concentration in a 2ZP and in a conventional device. At a high substrate doping concentration, the electric field builds up at the n2-n+ junction. This electric field peak breaks down the device at a low voltage. At a low substrate doping, the electric field builds up at the collector-base junction, and ultimately breakdown occurs due to this electric field peak.
Figure 3.9a shows the effect of varying the doping concentration in the two zones of the drift region on the breakdown voltage in the two zone proposed device. It is clear that there are optimum values of the drift region doping concentration in the two zones which give the maximum breakdown voltage. The optimum drift region doping concentration in N1 and N2 is about 6 × 10^15 cm^-3. The electric field profile at specified values of the doping concentration in the two zones N1 and N2 is shown in Fig. 3.9b. For n1 = 3 × 10^16 cm^-3, the electric field at the collector-base junction is high and results in breakdown at a low voltage.
Fig. 3.8 (a) Breakdown voltage versus substrate concentration in conventional and proposed
devices. (b) Electric field profile at specified values of substrate doping concentration in 2ZP and
in conventional devices
Fig. 3.9 (a) Breakdown voltage versus drift doping concentration in 2ZP device. (b) Electric field
profile at specified values of drift region doping concentration in 2ZP device
3.4 Conclusion
Numerical simulation of a novel lateral bipolar junction transistor on SOI has been performed. The novelty of the device is the combination of two/three zone step doping with the thick step oxide. It is known that linearly varying drift doping and linearly varying oxide thickness result in the maximum breakdown voltage. However, it is very difficult to fabricate such a device. In this paper, a device with a high breakdown voltage that is relatively easy to fabricate has been proposed. The BVCEO has been maximized by choosing appropriate values of T_BOTS and L_BOTS. The breakdown voltage has been found to be dependent on the substrate doping and the drift region doping concentration. In this simulation study, a more than 260% increase in breakdown voltage has been observed in the three zone proposed device.
References
1. Ning T. H., "Why BiCMOS and SOI BiCMOS," IBM Journal of Research and Development, vol. 46, no. 2/3, pp. 181–186, Mar./May 2002.
2. I-Shun Michael Sun et al., "Lateral high speed bipolar transistors on SOI for RF SoC applications," IEEE Transactions on Electron Devices, vol. 52, no. 7, pp. 1376–1383, July 2005.
3. Roy S. D. and Kumar M. J., "Realizing high voltage thin film lateral bipolar transistors on SOI with a collector tub," Microelectronics International, vol. 22, no. 1, pp. 3–9, 2005.
4. Huang Y. S. and Baliga B. J., "Extension of RESURF principle to dielectrically isolated power devices," Proceedings of the 3rd International Symposium on Power Semiconductor Devices and ICs, pp. 27–30, April 1991.
5. Kumar M. J. and Bhat K. N., "Collector recombination lifetime from the quasi-saturation analysis of high voltage bipolar transistors," IEEE Trans. Electron Devices, vol. 37, no. 11, pp. 172–177, 1995.
6. Zhang S., Sin J. K. O., Lai T. M. L. and Ko P. K., "Numerical modeling of linear doping profiles for high-voltage thin film SOI devices," IEEE Trans. Electron Devices, vol. 46, no. 5, pp. 1036–1041, 1995.
7. Cao G. J., De Souza M. M. and Narayanan E. M. S., "RESURFed lateral bipolar transistor for high-voltage, high-frequency applications," Proceedings of the 12th International Symposium on Power Semiconductor Devices and ICs, Toulouse, France, pp. 185–187, 2000.
8. Jaume D., Charitat G., Reynes J. M. and Rossel P., "High-voltage planar devices using field plate and semi-resistive layers," IEEE Transactions on Electron Devices, vol. 38, no. 7, pp. 1478–1483, July 1991.
9. Kumar M. J. and Roy S. D., "A new high breakdown voltage lateral Schottky collector bipolar transistor on SOI: Design and analysis," IEEE Transactions on Electron Devices, vol. 52, no. 11, Nov. 2005.
10. Sunkavalli R. et al., "Step drift doping profile for high voltage DI lateral power devices," Proceedings of the IEEE International SOI Conference, pp. 139–140, Oct. 1995.
11. Luo J. et al., "A high performance RF LDMOSFET in thin film SOI technology with step drift profile," Solid State Electronics, vol. 47, pp. 1937–1941, 2003.
12. Kim I. J. et al., "Breakdown voltage improvement for thin film SOI power MOSFETs by a buried oxide step structure," IEEE Electron Device Letters, vol. 15, no. 5, pp. 148–150, May 1994.
13. TMA MEDICI 4.2, Technology Modeling Associates Inc., Palo Alto, USA, 2006.
Chapter 4
Development of a Battery Charging
Management Unit for Renewable Power
Generation
4.1 Introduction
J. Wang (B)
School of Electronic, Electrical and Computer Engineering, University of Birmingham,
Birmingham, B15 2TT, UK,
E-mail: j.h.wang@bham.ac.uk
warming. China has abundant inland and offshore wind energy resources, providing the potential for various types and capacities of in-grid or off-grid wind stations. According to a new study from an environmental group, China's southern province of Guangdong could support 20 GW of wind generating capacity by 2020, providing as much as 35,000 GWh of clean electrical energy annually, which is equivalent to 17% of Guangdong's total current power demand [2]. By the end of 2005, China had built 59 wind farms with 1,854 wind turbine generators and a 1,266 MW installed wind power capacity [2]. In China, a large number of remote rural or mountainous inhabitants have no access to electricity, but in these areas there are abundant natural energy resources such as wind or solar energy. To alleviate the intermittency of renewable power generation, the use of energy storage is an inevitable route of choice.
Batteries are nowadays the most realistic means of energy storage. The charging and discharging of batteries greatly affect the battery lifetime and energy efficiency. At present, batteries have been used to provide electricity for lighting, radios, TVs and communications. The use of small-size wind generators with the assistance of batteries could provide a local alternative means to enable more households to have access to electricity. Generally, the framework of the energy conversion process in a small-scale renewable energy generation system can be described by Fig. 4.1. The energy conversion process illustrated in Fig. 4.1 can be divided into eight steps. Much research, such as the modeling of the wind turbine and the control strategies for the three-phase generator, has been conducted and the results were reported in many publications [3, 4]. The work reported in this chapter mainly focuses on the development of a battery control unit. The energy conversion modes are analyzed. Then the working principles of rechargeable batteries are examined. Control strategies based on fuzzy logic have been suggested to ensure optimal performance in battery charging and discharging. A charging/discharging control module using embedded system technology is designed to charge the battery by following the optimal charging/discharging curve of the battery. The whole unit is controlled and monitored by an embedded microprocessor system.
Fig. 4.1 Framework of the energy conversion process in a small-scale renewable energy generation system (wind generator, embedded microprocessor control, battery and power grid)
If the output electrical power exceeds the load demand, the surplus is used to charge the battery. If the load does not need all the output power and the battery is fully charged, the superfluous power is then sent to the main grid. In the system, the output electrical power is provided to the loads with priority. There exist five possibilities for the relationship among P_output, P_load, P_battery and P_Grid, and five working modes of energy conversion in the system are formed accordingly [3], in which P_output represents the electrical power output, P_load the power consumption of the loads, P_battery the power charged into or discharged from the battery, and P_Grid the power supplied to the main grid. The five modes are:
Mode 1: P_output = 0, P_load = 0, P_battery = 0, P_Grid = 0 – the system is idle.
Mode 2: P_output > P_load, P_output − P_load < P_battery, P_Grid = 0 – the generated electricity is more than the load demand and the battery is charged to less than its full capacity.
Mode 3: P_output > P_load, P_output − P_load > P_battery, P_Grid > 0 – the generated electrical power is more than the load demand and the battery would otherwise be overcharged, so the extra power is sent to the grid.
Mode 4: P_output < P_load, P_battery < 0, P_Grid = 0 – the generated electrical power is less than the load demand and the battery is discharged.
Mode 5: P_output < P_load, P_battery = 0, P_Grid = 0 – the generated electrical power is much less than the demanded energy supply and the battery is fully discharged. The battery must be disconnected from the system to avoid over-discharge.
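The mode selection described above can be expressed as a simple decision rule; the sketch below is an illustrative classification, assuming the hypothetical parameter p_battery_capacity denotes the charging power the battery can still accept (the helper is not code from this chapter).

    def energy_mode(p_output, p_load, p_battery_capacity, battery_full, battery_empty):
        """Classify the operating mode (1-5) of the small-scale generation system."""
        if p_output == 0 and p_load == 0:
            return 1                       # system idle
        if p_output > p_load:
            surplus = p_output - p_load
            if not battery_full and surplus < p_battery_capacity:
                return 2                   # surplus charges the battery
            return 3                       # battery full (or surplus too large): export to grid
        # p_output < p_load: the battery has to supply the deficit
        if not battery_empty:
            return 4                       # battery discharges to the load
        return 5                           # battery exhausted: disconnect it

    print(energy_mode(p_output=800, p_load=500, p_battery_capacity=400,
                      battery_full=False, battery_empty=False))   # -> 2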
From the discussion, it can be seen that the battery works in three statuses: disconnected from the system, charged by the renewable power, or discharged to supply power to the loads, as shown in Fig. 4.2. The status of the battery depends on the working mode of the system, and shifts according to the different modes [4].
The most commonly used batteries are divided into three categories: alkali batteries, nickel-iron batteries and lead-acid batteries. The lead-acid battery is chosen for our system. The working principles of the lead-acid battery can be found in many publications [5, 6]. In the 1960s, the American scientist Mas put forward an optimal charging curve for the battery based on the lowest gas output rate, as shown in Fig. 4.3. If the charging current keeps to the track of the optimal curve, the charging time can be sharply cut down without any side effect on the capacity and life-span of the battery. The Ampere-hour rule for charging lead-acid batteries can be considered the most efficient charging approach, considering the charging time and the excellent performance provided by this method in maintaining the life-span of the battery. One example of the application of the Ampere-hour rule to charge a battery is shown in Fig. 4.4. More details of charging methods for lead-acid batteries are discussed in reference [5].
When the battery is charged, the charging current I_c, the charging voltage (port voltage) U_c, the potential difference E_b between the positive plate and the negative plate of the battery and the internal resistance R_b of the battery have the following
Fig. 4.3 Optimal charging curve of a battery (charging current i(A) versus time t(h) for 40 Ah, 60 Ah and 90 Ah capacities)
Fig. 4.4 Charging voltage U and current I versus time under the Ampere-hour rule (about 2.23 V per cell during charging)
relationship:

I_c = \frac{U_c - E_b}{R_b}. \qquad (4.1)
The set point of the charging current is closely related to the capacity of the battery [7]. As the potential difference of the battery gradually increases, the charging voltage rises with it so that the battery can be charged according to formula (4.1), but it is kept below a certain maximum level, as shown in part one of Fig. 4.4. The second stage starts when the battery voltage reaches a certain level, which is defined by the characteristics of the battery. During this phase, the control variable switches over from current to voltage. The voltage is maintained almost unchanged at a given value, while the current gradually decreases until it drops to a value set at the initial stage [8].
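The two-stage behaviour described above (constant current first, then constant voltage with a decaying current) can be sketched as a simple simulation loop; all numerical values below (set current, switch-over voltage, cut-off current) are illustrative and the first-order battery model is an assumption, not the controller of this chapter.

    # Minimal two-stage (constant-current / constant-voltage) charging sketch.
    I_SET, U_MAX, I_CUTOFF = 6.0, 14.5, 0.6      # A, V, A (illustrative set points)
    E_b, R_b = 12.0, 0.05                        # assumed simple battery model (V, ohm)

    dt_h, t_h, charged_ah = 0.01, 0.0, 0.0
    while True:
        U_c = E_b + I_SET * R_b                  # stage 1: hold the current at I_SET
        if U_c >= U_MAX:                         # stage 2: hold the voltage at U_MAX
            U_c = U_MAX
            I_c = (U_c - E_b) / R_b              # current decays as E_b rises, Eq. (4.1)
        else:
            I_c = I_SET
        if I_c <= I_CUTOFF:
            break
        charged_ah += I_c * dt_h
        E_b += 0.08 * (I_c * dt_h)               # crude rise of internal EMF with charge
        t_h += dt_h

    print(f"charged {charged_ah:.1f} Ah in {t_h:.2f} h")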
Fig. 4.5 Relationship between the wind power and the load consumption
Fuzzy logic control can be readily implemented on a low-cost hardware system and easily updated by amending rules to improve performance or to add new features. Referring to the Ampere-hour charging curve of the battery in Fig. 4.4, the charging process seems infinite. At the beginning of the charging process, the charging current is quite large, while it drops quickly as time elapses. During this period, most of the electrical power is converted into chemical energy. At the end of the charging process, the current is close to zero and hardly changes. In general, when the charging capacity of the battery reaches 90% of its rated value, the charging process is considered complete.
In this system, there are four input variables. ΔP = P − Pn is the difference between the wind power P and the load consumption Pn; ΔP' = d(ΔP)/dt represents the changing rate of ΔP; ΔT = Tn − T is the temperature of the battery relative to the surrounding temperature; ΔT' = d(ΔT)/dt is the changing rate of ΔT. The output variable is the charging voltage U. So, the control function can be described as:

U = f(\Delta P, \Delta P', \Delta T, \Delta T'). \qquad (4.2)

The general fuzzy rules can be represented by (4.3), which is illustrated in Fig. 4.6:

If (ΔP is ... and ΔP' is ... and ΔT is ... and ΔT' is ...) then (U is ...). \qquad (4.3)
Fig. 4.6 Structure of the fuzzy rules: the fuzzified inputs (labelled from Very low to Very high and from Zero to Large) are evaluated by IF–THEN rules and defuzzified into the outputs U1 and U2; the fuzzy controllers F1 (driven by ΔP and ΔP') and F2 (driven by ΔT and ΔT') act on the deviations from the set points Pn and Tn and their outputs are summed to form the charging voltage U applied to the battery
4.4.2 Fuzzification
The fuzzification of the system has two separate parts. One is for the outer loop and the other is for the inner loop. As the fuzzification of the outer control loop is much the same as that of the inner loop, we only present the discussion of the outer loop. The input variables for the outer loop are ΔP and ΔP'. ΔP is positive throughout the charging process. At the same time, it drops gradually during the whole process. As ΔP' is the differential coefficient of ΔP with respect to t, it can be positive or negative in the charging process. Given a value set of X in [0, +1], ΔP is labeled by {PV, PH, PO, PL, PZ}, ΔP' by {PL, PM, PZ, NM, NL}, and the output U1 by {LN, SN, ZE, SP, LP}. The corresponding membership functions are defined over the normalised ranges [0, 1] for ΔP, [−1, 1] for ΔP' and [Min, Max] for U1.
Another approach is to label them with the minimum and maximum values of the analogue-to-digital converter, such as the set [0, 255] or [255, 0] with an 8-bit converter (for more details, see [11]).
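As an illustration of such a fuzzification, the sketch below defines triangular membership functions over a normalised [0, 1] range and evaluates the degree of membership of a sample ΔP value; the five labels follow the text, while the triangular shape and the breakpoints are assumptions made for the example.

    def triangular(x, a, b, c):
        """Triangular membership function with feet at a and c and peak at b."""
        x = float(x)
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    # Five fuzzy sets for the normalised input dP in [0, 1] (breakpoints are illustrative).
    dp_sets = {
        "PZ": (-0.25, 0.0, 0.25),
        "PL": (0.0, 0.25, 0.5),
        "PO": (0.25, 0.5, 0.75),
        "PH": (0.5, 0.75, 1.0),
        "PV": (0.75, 1.0, 1.25),
    }

    dP = 0.4   # sample normalised power difference
    memberships = {name: triangular(dP, *abc) for name, abc in dp_sets.items()}
    print(memberships)   # partly 'PL' and partly 'PO' for this sample value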
Fuzzy rule evaluation is the key step in a fuzzy logic control system. This step processes a list of rules from the knowledge base using input values from RAM to produce a list of fuzzy outputs to RAM. There are many methods to evaluate fuzzy rules, among which the state space method is the most popular one due to its close relationship with the transfer function of the system [11]. The rule base for the outer loop of the system is summarized in Table 4.1.
These rules are expressed in the IF–THEN form generally given in formula (4.3). The fuzzy rules are specified as follows:
Example of rule evaluation (figure): membership values of 0.4 (ΔP is PL) and 0.6 (ΔP' is PZ) give a rule strength of 0.4 AND 0.6 = 0.4, which clips the output membership function of U1 over its [Min, Max] range.
The final step in the fuzzy logic controller is to defuzzify the output, combining the fuzzy outputs into a final system output. The Centroid method is adopted, which favours the rule with the output of greatest area [12, 13]. The rule outputs can be defuzzified using the discrete Centroid computation described in formula (4.9).
U_{\mathrm{out}} = \frac{\sum_{i=1}^{4} S(i)\,F(i)}{\sum_{i=1}^{4} S(i)}
 = \mathrm{SUM}(i = 1\ \text{to}\ 4\ \text{of}\ S(i) \cdot F(i)) / \mathrm{SUM}(i = 1\ \text{to}\ 4\ \text{of}\ S(i))
 = \frac{S(1)F(1) + S(2)F(2) + S(3)F(3) + S(4)F(4)}{S(1) + S(2) + S(3) + S(4)} \qquad (4.9)
In (4.9), S(i) is the truth value of the result membership function for rule i, and F(i) represents the value at which the result membership function (e.g. ZE) is maximum over the output variable fuzzy set range. With the fuzzy control rules discussed above, the typical battery charging current and voltage curves are shown in Fig. 4.12.
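A minimal sketch of the rule evaluation and the centroid defuzzification of (4.9), using min as the AND operator and illustrative truth values and output levels, is given below; the numbers are placeholders rather than values from the controller in this chapter.

    def centroid(rule_strengths, rule_outputs):
        """Discrete centroid defuzzification, Eq. (4.9): sum(S*F) / sum(S)."""
        num = sum(s * f for s, f in zip(rule_strengths, rule_outputs))
        den = sum(rule_strengths)
        return num / den if den else 0.0

    # Four fired rules: each strength S(i) is the AND (min) of its input memberships,
    # and F(i) is the position of the peak of the rule's output membership function.
    memberships = [(0.4, 0.6), (0.2, 0.6), (0.4, 0.3), (0.1, 0.9)]   # (dP, dP') degrees
    S = [min(a, b) for a, b in memberships]
    F = [13.8, 14.0, 14.2, 14.5]                                     # candidate voltages (V)

    print(f"defuzzified charging voltage: {centroid(S, F):.2f} V")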
In the simulation, the set point for the charging voltage is 14.5 V and that of the charging current is 6 A. The charging process can be separated into two stages. During the first stage, the fuzzy control strategy is implemented to determine the proper start of charging and to prevent the battery from being over- or insufficiently charged. At the beginning, the port voltage U_c starts at a small value, while the current is kept constant at the set value, so the battery will be fully charged. During the second stage, a normal charging strategy is used. The control variable switches over from current to voltage. The voltage is maintained almost unchanged at a given value, while the current gradually decreases until it drops to a value set at the initial stage. As can be seen from Fig. 4.12, the performance of the fuzzy controller proves to be very good, and the charging curve is very close to the Ampere-hour curve in Fig. 4.4. The charging time with this method is reduced by about 30% compared with the classical methods. For a fully discharged battery, the required charging time is approximately 2.5 h (9,000 s).
The performance of the renewable energy generation system, including the battery charging and discharging unit, is controlled and monitored by an embedded microprocessor. In our system, a 32-bit RISC microprocessor from Samsung, the S3C2410X, is employed. The S3C2410X was developed using an ARM920T core, 0.18 µm CMOS standard cells and a memory compiler. The S3C2410X provides a
Fig. 4.13 Hardware architecture of the control unit (S3C2410X running the fuzzy control algorithm, A/D and D/A circuits, driver circuit, sensors, battery, disk, USB and a TCP/IP link to a remote server)
Fig. 4.14 Software architecture of the system (application programs – communication, fuzzy control and data management – on top of the drivers and the hardware)
The software architecture of the renewable energy generation system can be separated into three parts, namely the operating system environment, the hardware drivers and the application programs, as shown in Fig. 4.14.
The operating system environment is based on Embedded Linux, which is ported to the S3C2410X microcontroller, providing the basic software environment for the file system, network interface, memory management, multi-task scheduling etc. Embedded Linux is a free and open source real-time operating system, and it is programmed with standard C language. It can be tailored according to specific system requirements, with redundant tasks removed and certain enhanced functions added, which guarantees the system's reliability, real-time behaviour and performance.
The control process of battery charging and discharging is non-linear and time-varying, with pure time delay, multiple variables and many external disturbances. Many parameters, such as the charging rate, the permitted maximum charging current, the internal resistance, the port voltage, and the temperature and moisture, change during the charging and discharging process and cannot be obtained directly, so it is impractical to use a traditional control system. A fuzzy control unit for battery
Acknowledgements The authors would like to express their appreciation to the Science Foundation for New Teachers of the Ministry of Education of China (No. 20070497059), the Open Research Projects Supported by the Project Fund (No. 2007A18) of the Hubei Province Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, and the National Science Foundation of China (No. 50675166).
References
5.1 Introduction
where ω is the optical angular frequency, E_x(t) and E_y(t) are the (x, y)-components of the electric field before the transformation and E'_x(t), E'_y(t) are the (x, y)-components of the electric field after it. Thus, we have:

\begin{bmatrix} E'_x(t) \\ E'_y(t) \end{bmatrix} = Q \begin{bmatrix} E_x(t) \\ E_y(t) \end{bmatrix} \qquad (5.2)
where Q is a complex Jones matrix with unit determinant. A subset of Jones matrices, called the set of matrices of birefringence or optical activity, not only preserves the degree of polarization but also has the additional feature of preserving the orthogonality (according to the Hermitian scalar product) [14] of two fields which were orthogonal before the transformation. Matrices of this kind are complex unitary matrices with unit determinant. Throughout this chapter we strictly refer to the J_j as a subset of Q = [J_0 ... J_j ... J_{k−1}].
By using the Jones representation, the field can be represented by the vector J = [E_x E_y]^T and the intensity of the beam can be normalized so that |E_x|² + |E_y|² = 1. Two SOPs represented by J_1 and J_2 are orthogonal if their inner product is zero. Any SOP can be transformed into another by multiplying it by a Mueller matrix [14]; such matrices are required for SOP processing, e.g. polarizers, rotators and retarders. In PolSK, the angle of one polarization component is switched relative to the other between two angles; therefore, binary data bits are mapped into two Jones vectors. A block diagram of the proposed PolSK–OCDMA transmitter is illustrated in Fig. 5.1. The light source is a highly coherent laser with a fully polarized SOP. If a non-polarized source is used, then a polarizer can be inserted
is illustrated in Fig. 5.1. The light source is a highly coherent laser with a fully
polarized SOP. If a non-polarized source is used, then a polarizer can be inserted
after the laser source. The light beam first passes through the polarization controller
that sets the polarization to an angle of 45ı for simplicity. Then, the lightwave gets
divided through polarization beam splitter (PBS) to become SOP-encoded in PolSK
modulator which switches the SOP of the input beam between two orthogonal states
(i.e. 0ı and 180ı at the phase modulator in Fig. 5.1 N times per bit according to an
externally supplied code (i.e. DPMPC) that spreads the optical signal into CDMA
format. Thereafter, the PolSK–OCDMA modulated signals are combined through
polarization beam combiner (PBC) and broadcasted. It is also displayed in Fig. 5.1
that for a K-user system with the first user as the desired one (for example), the ith
user SOP-encoded signal can be written as:
J_i(t) = \begin{cases} J_0 & \text{if } d_i(t) \oplus c_i(t) = 0 \\ J_1 & \text{if } d_i(t) \oplus c_i(t) = 1 \end{cases} \qquad (5.3)

where d_i(t) is the data signal with symbol duration T_s, c_i(t) is the N-chip code sequence of the DPMPC signal with chip duration T_c, d_i(t), c_i(t) ∈ {0, 1} and ⊕ denotes the signal correlation. As the emitted light is initially (linearly) polarized at an angle of 45°, therefore J_0 = \frac{1}{\sqrt{2}}[1\ \ 1]^T and J_1 = \frac{1}{\sqrt{2}}[1\ -1]^T. In other words, we have [8]:

Q = [J_0\ J_1] = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \qquad (5.4)
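The chip-level mapping of (5.3) and (5.4) can be illustrated with a short sketch that spreads each data bit with a binary code and selects the corresponding Jones vector per chip; the code sequence used here is an arbitrary placeholder, not an actual DPMPC codeword.

    import numpy as np

    J0 = np.array([1.0, 1.0]) / np.sqrt(2)      # Jones vector for d XOR c = 0
    J1 = np.array([1.0, -1.0]) / np.sqrt(2)     # Jones vector for d XOR c = 1 (orthogonal SOP)

    def polsk_ocdma_encode(data_bits, code):
        """Map each data bit, spread by the chip code, onto a sequence of SOPs (Eq. 5.3)."""
        chips = []
        for d in data_bits:
            for c in code:
                chips.append(J0 if (d ^ c) == 0 else J1)
        return np.array(chips)                  # shape: (len(data_bits) * N, 2)

    code = [1, 0, 1, 1, 0, 0, 1]                # placeholder spreading sequence (not a DPMPC)
    sops = polsk_ocdma_encode([1, 0], code)
    print(sops.shape, np.round(sops[:3], 3))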
Therefore, the polarization-modulated signal travels a distance of L km through an optical SMF. Consequently, the SOP-encoded signal undergoes several impairments such as attenuation, dispersion, polarization rotation and fiber nonlinearity. At the receiver end shown in Fig. 5.2, the SOP rotation is compensated by the polarization controller, whose function is to ensure that the received signal and the optical components at the receiver have the same SOP reference axis.
The channel is represented by the Jones matrix Q and Re{·} refers to the real part of the complex Ē'(t). Since equipower signal constellations have been considered, both orthogonal components are assumed to be equally attenuated. Thus, these terms can be included in the constant amplitude of the electric field Ē(t), which neglects a loss of orthogonality on the channel. Since the switching time of the SOP (i.e. the bit rate) is much slower than the chip rate, the elements of the Jones matrix can be understood as time-independent (i.e. T_c ≪ T_s). The x-component of the received electric field vector, based on Q = [J_0 J_1] (see Eq. (5.4)), is:
E'_x(t) = \mathrm{Re}\left\{ \bar{E}(t) \sum_{i=1}^{K} \left[ J_0 d_i(t) + J_1 (1 - d_i(t)) \right] u_T(t - iT_s)\, c_i(t) \right\} \qquad (5.6)
Thus, the orthogonal components of the ith user are given as E_{xi}(t) = J_0 d_i(t) c_i(t) \bar{E}(t) and E_{yi}(t) = J_1 (1 - d_i(t)) c_i(t) \bar{E}(t), where the (x, y)-components of the received modulated signal are [7]:

E'_{xi}(t) = \left( \frac{E_{xi}(t) + E_{yi}(t)}{2} + \frac{E_{xi}(t) - E_{yi}(t)}{2} \sum_{i=1}^{K} c_i(t) d_i(t)\, u_T(t - iT_s) \right) \cos(\varphi_{xi})

E'_{yi}(t) = \left( \frac{E_{xi}(t) - E_{yi}(t)}{2} + \frac{E_{xi}(t) + E_{yi}(t)}{2} \sum_{i=1}^{K} c_i(t) d_i(t)\, u_T(t - iT_s) \right) \cos(\varphi_{yi}) \qquad (5.7)
where φ_xi and φ_yi describe the frequencies and phases of the transmitting lasers in the general form φ = ωt + θ. Based on the concept of CDMA, the field vectors of all K transmitters are combined and multiplexed over the same channel. Thus, the overall channel field vector can be expressed as:

\bar{E}_{\mathrm{Channel}} = \sum_{i=1}^{K} \bar{E}'_i(t) \qquad (5.8)
Figure 5.2 illustrates the application of the OTDLs used as the optical correlator in this incoherent PolSK–OCDMA configuration. The delay coefficients in the OTDLs are designed in such a way as to make them perform as a CDMA chip-decoder in both branches. Additionally, the OTDL in the lower branch is set up with the complement of the code used in the upper branch (i.e. the complemented OTDL) to decode the other symbol (i.e. '1'). It can be observed from Fig. 5.2 that the OTDL outputs contain N chip pulses, so the OTDLs can be regarded as a parallel circuit of many single PDs whose currents are added, and no interference between the OTDL pulses is possible. The signals are photo-detected in the balanced-detector arrangement to generate the differential electrical current (I_diff = I_1 − I_2) ready for data extraction in the decision processor unit. The total upper-branch current (i.e. the x-component), considering all chip currents after photo-detection, is then obtained as:
I^0 = \Re \int_{t=0}^{T_s} \sum_{n=1}^{N} \frac{c(nT_c)+1}{2} \left\{ \sum_{i=1}^{K} \left[ \frac{E_{xi}(t)+E_{yi}(t)}{2} + d_i(t)\, c_i(t-nT_c)\, \frac{E_{xi}(t)-E_{yi}(t)}{2} \right] \cos(\varphi_{xi}) \right\}^2 dt   (5.9)
where ℜ is the PD responsivity and c_i(t − nT_c) is the nth chip of the spreading code assigned to the ith user. After further simplification, Eq. (5.9) can be rewritten as:
I^0 = \frac{\Re}{4} \int_{t=0}^{T_s} \sum_{n=1}^{N} \frac{c(nT_c)+1}{2} \left\{ \sum_{i=1}^{K} \left[ \frac{E_{xi}^2(t)+E_{yi}^2(t)}{2} + d_i(t)\, c_i(t-nT_c)\, \frac{E_{xi}^2(t)-E_{yi}^2(t)}{2} \right] \left(1 - \cos 2\varphi_{xi}\right) \right\} dt
\; + \frac{\Re}{8} \int_{t=0}^{T_s} \sum_{n=1}^{N} \frac{c(nT_c)+1}{2} \left\{ \sum_{i=1}^{K} \sum_{\substack{j=1 \\ j \neq i}}^{K} \left[ E_{xi}(t)+E_{yi}(t) + d_i(t)\, c_i(t-nT_c)\left(E_{xi}(t)-E_{yi}(t)\right) \right] \left[ E_{xj}(t)+E_{yj}(t) + d_j(t)\, c_j(t-nT_c)\left(E_{xj}(t)-E_{yj}(t)\right) \right] \left[ \cos(\varphi_{xi}+\varphi_{xj}) + \cos(\varphi_{xi}-\varphi_{xj}) \right] \right\} dt   (5.10)
Averaging over the bit interval, the double-frequency and beat terms vanish and Eq. (5.10) reduces to

I^0 = \frac{\Re}{4} \sum_{n=1}^{N} \frac{c(nT_c)+1}{2} \left( \sum_{i=1}^{K} \left[ E_{xi}^2(t) + E_{yi}^2(t) + d_i(t)\, c_i(t-nT_c)\left(E_{xi}^2(t) - E_{yi}^2(t)\right) \right] \right)   (5.11)

S_i^0 = E_{xi}^2(t) + E_{yi}^2(t), \qquad S_i^1 = E_{xi}^2(t) - E_{yi}^2(t)   (5.12)
where S_i^0 refers to the signal-intensity part, generated in the upper branch of the polarization modulator at the transmitter, while S_i^1 refers to the linearly polarized part, generated in the lower branch and carrying the data (see Fig. 5.1). Thus, Eq. (5.11) can be rewritten as:
I^0 = \frac{\Re}{4} \sum_{n=1}^{N} \frac{c(nT_c)+1}{2} \left( \sum_{i=1}^{K} \left[ S_i^0 + d_i(t)\, c_i(t-nT_c)\, S_i^1 \right] \right)   (5.13)
Similarly, the total current of the lower branch (i.e. the y-component) can be derived as:

I^1 = \frac{\Re}{4} \sum_{n=1}^{N} \frac{1 - c(nT_c)}{2} \left( \sum_{i=1}^{K} \left[ S_i^0 + d_i(t)\, c_i(t-nT_c)\, S_i^1 \right] \right)   (5.14)
The differential current of the balanced detector then becomes

I = \frac{\Re}{4} \sum_{n=1}^{N} c(nT_c) \sum_{i=1}^{K} \left[ S_i^0 + d_i(t)\, c_i(t-nT_c)\, S_i^1 \right] + n(t)   (5.15)
where n(t) represents the total filtered Gaussian noise, whose variance ⟨σ²_{n(t)}⟩ includes: (i) the PD shot noise, with electrical current variance σ²_{i_av} = 2 e i_av B_o, where i_av is the average photo-current; (ii) the optically filtered ASE noise, with variance σ²_{ASE} = 2 N_0 B_0, where N_0 is the (unilateral) power spectral density (PSD) of the white ASE noise arriving on each polarization and B_0 is the optical filter bandwidth; and (iii) the electronic receiver noise current (i.e. thermal noise) at the low-pass filter, with variance σ²_{LP} = 2 k_b T B_el / R, where R is the filter direct-current (dc) equivalent impedance, T the absolute temperature, k_b the Boltzmann constant and B_el the filter bandwidth. Thus the overall variance of the additive noise n(t) can be represented as:

\langle \sigma^2_{n(t)} \rangle = \sigma^2_{i_{av}} + \sigma^2_{ASE} + \sigma^2_{LP}   (5.16)
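As a numerical illustration of the noise budget in Eq. (5.16), the sketch below simply adds the three variance contributions; all parameter values are illustrative placeholders, not figures taken from this chapter.

```python
e = 1.602e-19        # electron charge (C)
kb = 1.381e-23       # Boltzmann constant (J/K)

i_av = 1e-6          # average photo-current (A)        -- placeholder
B_o = 10e9           # optical filter bandwidth (Hz)    -- placeholder
N0 = 1e-17           # unilateral ASE PSD (W/Hz)        -- placeholder
B_el = 2.5e9         # electrical filter bandwidth (Hz) -- placeholder
T = 300.0            # absolute temperature (K)
R = 50.0             # dc-equivalent impedance (ohm)    -- placeholder

var_shot = 2 * e * i_av * B_o            # PD shot noise
var_ase = 2 * N0 * B_o                   # optically filtered ASE noise
var_lp = 2 * kb * T * B_el / R           # receiver thermal noise
var_total = var_shot + var_ase + var_lp  # Eq. (5.16)
print(var_shot, var_ase, var_lp, var_total)
```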
Considering the first user as the intended user, the differential output current of Eq. (5.15) can be written as:
I = \frac{\Re}{4} S_1^0 \sum_{n=1}^{N} c(nT_c) + \frac{\Re}{4} \sum_{n=1}^{N} c(nT_c)\, c_1(t-nT_c)\, d_1(t)\, S_1^1 + \frac{\Re}{4} \sum_{i=2}^{K} \sum_{n=1}^{N} c(nT_c)\, c_i(t-nT_c)\, d_i(t)\, S_i^1 + n(t)   (5.17)
The first term in Eq. (5.17) is a dc current that must be estimated and removed in the balanced detector. The second term represents the intended data weighted by the auto-correlation of its assigned spreading code and its polarization, the third term represents the interference (i.e. MAI) caused by the other transmitters, and the last term is the noise. Thus, the system SNR can be expressed as:
\mathrm{SNR} = \frac{\left( \dfrac{\Re}{4} \sum_{n=1}^{N} c(nT_c)\, c_1(t-nT_c)\, d_1(t)\, S_1^1 \right)^2}{\left( \dfrac{\Re}{4} \sum_{i=2}^{K} \sum_{n=1}^{N} c(nT_c)\, c_i(t-nT_c)\, d_i(t)\, S_i^1 \right)^2 + \sigma^2_{n(t)}}   (5.18)
Both the auto- and cross-correlation of the DPMPC can be expressed respectively as [9–13]:

\sum_{n=1}^{N} c(nT_c)\, c_1(t-nT_c) = P + 2   (5.19)

X_{li} = \sum_{n=1}^{N} c_l(nT_c)\, c_i(t-nT_c)   (5.20)
Note that SNR(1) = \Re^2 d_1^2(t)\,(S_1^1)^2 (P+2)^2 \big/ \left(16\, \sigma^2_{n(t)}\right) = E_b/N_0, where E_b is the energy of one bit and N_0 is the noise PSD, denotes the single-user SNR. Equation (5.22) is one of the main results of this study, as it represents the SNR of the polarization-modulated optical CDMA system.
The BER of binary PolSK modulation has already been evaluated [6, 14]. Here, numerical results for the BER performance of the proposed transceiver, obtained from the OCDMA system SNR derived above, are presented and discussed.
Figure 5.3 shows the BER of this architecture against the single-user SNR (denoted S_dB in the figures). Different loads, namely 10%, 15%, 20% and 25% of full load (i.e. of P^2 − P interfering users) [2], were evaluated as the number of simultaneous active users, with P = 19. As illustrated in Fig. 5.3, at 25% of full load the system provides BER = 10^{-9} with S_dB = 16.5 dB, whereas with S_dB = 8.5 dB it supports a 20% load, which is still sufficient to deliver network services. Furthermore, the system tolerates a 15% load with only S_dB = 7 dB, which indicates that delivering network services under these conditions is very power-efficient. To support a greater number of users, however, higher values of P and S_dB are recommended.
Figure 5.4 shows the BER performance against the number of simultaneous users (K) for the discussed architecture. As observed in Fig. 5.4, when the number of transmissions (i.e. users) increases, the BER also rises due to the growing interference. With S_dB = 14 dB the system tolerates 80 simultaneous users for P = 19, which equals 24% of full load, while 73 users (21% of full load) are guaranteed a very consistent communication link (BER ≤ 10^{-9}) with only S_dB = 10 dB, which points to a cost-effective design with lower power consumption.
Fig. 5.3 BER performance of the transceiver against single-user SNR, S_dB
Fig. 5.4 BER performance of the transceiver against the number of simultaneous users, K
Compared with [7, 8], the proposed architecture can tolerate a greater number of users at a lower S_dB. Moreover, the results of this analysis are obtained with a code length of 399 (i.e. for P = 19) [2], which is much shorter than the Gold sequences analysed in [7, 8]. This implies that the proposed structure can provide even higher throughput, since the code length is smaller.
5.5 Conclusion
This chapter has proposed and evaluated a novel incoherent PolSK–OCDMA transceiver architecture with dual-balanced detection. The application of OTDLs as the CDMA decoder has also been investigated. The BER performance of PolSK over OCDMA, in cooperation with DPMPC as the spreading code, has been demonstrated taking into account the effects of optical ASE noise, electronic receiver noise, PD shot noise and, principally, the MAI. The results indicate that the architecture can reliably and power-efficiently accommodate a large number of simultaneous users, and is promising for high-data-rate, long-haul optical transmission.
References
1. F. Liu, M. M. Karbassian and H. Ghafouri-Shiraz, “Novel family of prime codes for synch-
ronous optical CDMA”, J. Opt. Quant. Electron., vol. 39, no. 1, pp. 79–90, 2007
2. M. M. Karbassian and H. Ghafouri-Shiraz, “Fresh prime codes evaluation for synchronous
PPM and OPPM signaling for optical CDMA networks”, J. Lightw. Tech., vol. 25, no. 6,
pp. 1422–1430, 2007
3. H. Ghafouri-Shiraz, M. M. Karbassian, F. Liu, “Multiple access interference cancellation
in Manchester-coded synchronous optical PPM-CDMA network”, J. Opt. Quant. Electron.,
vol. 39, no. 9, pp. 723–734, 2007
4. M. M. Karbassian and H. Ghafouri-Shiraz, “Capacity enhancement in synchronous optical
overlapping PPM-CDMA network by a novel spreading code”, in Proceedings of GlobeCom,
pp. 2407–2411, 2007
5. S. Benedetto et al., “Coherent and direct-detection polarization modulation system experi-
ments,” in Proceedings of ECOC, 1994
6. S. Benedetto, R. Gaudino and P. Poggiolini, “Direct detection of optical digital transmission
based on polarization shift keying modulation”, IEEE J. Selected Areas Comms., vol. 13,
no. 3, pp. 531–542, 1995
7. K. Iversen, J. Mueckenheim and D. Junghanns, “Performance evaluation of optical CDMA
using PolSK-DD to improve bipolar capacity”, in SPIE Proceedings, vol. 2450 (Amsterdam),
pp. 319–329, 1995
8. N. Tarhuni, T. O. Korhonen and M. Elmusrati, “State of polarization encoding for optical code
division multiple access networks”, J. Electromagn. Waves Appl. (JEMWA), vol. 21, no. 10,
pp. 1313–1321, 2007
9. M. M. Karbassian and H. Ghafouri-Shiraz, “Performance analysis of heterodyne detected
coherent optical CDMA using a novel prime code family”, J. Lightw. Technol., vol. 25, no. 10,
pp. 3028–3034, 2007
10. M. M. Karbassian and H. Ghafouri-Shiraz, “Phase-modulations analyses in coherent homo-
dyne optical CDMA network using a novel prime code family”, in Proceedings of WCE-
ICEEE (IAENG), pp. 358–362, 2007
11. M. M. Karbassian and H. Ghafouri-Shiraz, “Performance analysis of unipolar code in different
phase modulations in coherent homodyne optical CDMA”, J. Eng. Lett. (IAENG), vol. 16,
no. 1, pp. 50–55, 2008
12. M. M. Karbassian and H. Ghafouri-Shiraz, “Novel channel interference reduction in optical
synchronous FSK-CDMA networks using a data-free reference”, J. Lightw. Technol., vol. 26,
no. 8, pp. 977–985, 2008
13. M. M. Karbassian and H. Ghafouri-Shiraz, “Frequency-shift keying optical code-division
multiple-access system with novel interference cancellation”, J. Microw. Opt. Techno. Lett.,
vol. 50, no. 4, pp. 883–885, 2008
14. S. Benedetto and P. Poggiolini, “Theory of polarization shift keying modulation”, IEEE Trans.
Comms., vol. 40, no. 4, pp. 708–721, 1992
Chapter 6
Template Based: A Novel STG Based Logic
Synthesis for Asynchronous Control Circuits
6.1 Introduction
implement the circuits hazard-free. To find a state encoding, several authors have approached the problem in different ways, such as logic synthesis using Petri-net unfolding, synthesis using state-based techniques [1–3], synthesis using structural encoding, and so on. The state-based technique involves several problems. The major one is the detection of state-encoding conflicts, since the number of conflict states grows exponentially with the size of the specification.
Another problem is the insertion of additional signals to resolve the conflict states. Signal insertion must be done in such a way that the new signals are consistent (rising and falling transitions alternate) and their behaviour is hazard-free. This method is insufficient for specifications with a large number of signals. The structural encoding method is more suitable than the classical state-based technique. The structural encoding scheme works at the Petri-net level: the specification is encoded with two silent transitions [3, 5] to generate an encoded STG. Signal removal begins after the encoded STG has been generated, iteratively removing signals from the encoded STG with a greedy strategy until a reduced STG is produced. The next step is to project the reduced STG onto the CSC support of each non-input signal. This method can handle large state encodings; however, the complexity of the greedy removal phase grows with the number of iterative signal reductions.
This article proposes a template-based method for synthesizing complicated STGs, which is also useful for some large-scale STGs. The template-based technique is not only useful for complicated and large-scale STGs but also removes the reduction complexity of the structural encoding technique. Our template-based technique is explained using one complicated example, an asynchronous DMA controller, which has been used in our asynchronous processor implementation [5].
We introduce three types of synthesis scheme in our explanation. First, the asynchronous DMA controller is synthesized using the state-based technique. We derive the state graph from the DMA's STG and show the limit of synthesizability imposed by the STG rule called complete state coding (CSC). Complete state coding means that every state must have a unique binary code in the state space; the asynchronous DMA controller specification does not satisfy this when the state-based technique is applied. If we try to insert an additional signal directly into the state graph, the complexity increases and the method suffers from state explosion.
Second, the structural encoding technique is applied. The asynchronous DMA controller can be synthesized with this technique, but the signal-removal phase shows considerable complexity. Finally, the template-based technique is introduced. We generate a template STG from the original STG at the Petri-net level, then trace every non-input signal to a small state space and derive its Karnaugh map; if conflicts appear, signals from the template STG are inserted repeatedly until we obtain the final STG, called the STG with appropriate insertion points.
The final part of this paper concludes with a complexity comparison between our template-based technique, the state-based technique and the structural encoding technique.
[Figures: (a) flow of a DMA transfer (setup, transfer request, I/O ready check, data transfer, count = 0) and block diagram of the asynchronous DMA controller with its handshake signals (Hold, Hold_Ack, Dev_Req, WT_Req, WT_Ack, TC_Req, TC_Ack, CPU_INT), bus interface, base address and count registers; (b) the timing-and-control STG referred to in the text as Fig. 6.4]
The internal architecture is closely based on the classic DMA controller and is split into the functional units shown above. The most complex of these units is the timing and control unit, which consists of the large and complicated STG described in Fig. 6.4; the DMA controller operates according to this STG. Finally, the DMA releases the asynchronous system bus and activates the CPU_INT line to indicate to the asynchronous processor that the transfer is complete.
The purpose of this section is to present the synthesis of the DMA controller using the state-based technique. The key steps in this method are the generation of a state graph, which is a binary-encoded reachability graph of the underlying Petri net,
[Fig. 6.5: state graph of the DMA controller specification, with binary-encoded states (e.g. 11111000, 11101000, 11000000, 00000000) and the conflicting (shadowed) states]
and the derivation of Boolean equations for the output signals via their next-state functions obtained from the state graph.
The state graph shown in Fig. 6.5 is the state space of the DMA controller specification derived from Fig. 6.4; the shadowed states are the conflict states. It clearly exhibits the state explosion problem of the state-based method. To resolve the conflicts, additional signals must be inserted, but in the state graph it is hard to find an insertion point for the internal signals because of the many redundant states; this is the major problem of the state-based technique.
Regarding state encoding, at this point we notice a peculiar situation: the next-state function of a signal takes different values in two states with the same binary encoding. This binary encoding is assigned to the shadowed states in Fig. 6.5. The binary encoding of the state-graph signals alone cannot determine the future behaviour, so an ambiguity arises when trying to define the next-state function. This ambiguity is illustrated in the Karnaugh map. When this occurs, the system is said to violate the Complete State Coding (CSC) property. Enforcing CSC is one of the most difficult problems in the synthesis of asynchronous circuits.
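The CSC check itself is straightforward to automate. The sketch below assumes a state graph stored as a list of (binary code, enabled non-input transitions) pairs with purely illustrative entries, and flags binary codes that appear with different sets of enabled non-input transitions.

```python
from collections import defaultdict

# Illustrative states: (binary code, frozenset of enabled non-input transitions)
states = [
    ("1100", frozenset({"CPU_INT+"})),
    ("1010", frozenset({"Hold-"})),
    ("1100", frozenset({"WT_Req+"})),   # same code as the first state, different outputs
]

def csc_conflicts(states):
    """Return the binary codes whose states enable different non-input transitions,
    i.e. the codes that violate Complete State Coding."""
    by_code = defaultdict(set)
    for code, enabled in states:
        by_code[code].add(enabled)
    return {code: groups for code, groups in by_code.items() if len(groups) > 1}

print(csc_conflicts(states))   # {'1100': {frozenset({'CPU_INT+'}), frozenset({'WT_Req+'})}}
```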
This section has shown a limitation and problem of state-based synthesis. A state-based solution is nevertheless possible: internal signals are inserted so that the two conflicting states are disambiguated by the value of the inserted CSC signal, which is the last value in the binary vectors. Boolean minimization can then be performed and logic equations obtained.
The structural encoding scheme provides a way to avoid the state-space explosion problem. The main benefit of the structural method is its ability to deal with large, highly concurrent specifications that cannot be handled by the state-based method; the structural method for the synthesis of asynchronous circuits targets the class of marked graphs. The STG specification is transformed into a Petri net and then encoded into a well-formed specification, as shown in Figs. 6.6 and 6.7.
[Figs. 6.6 and 6.7: the DMA controller STG at the Petri-net level and its structural encoding, in which silent transitions Sp1+/−, ..., Sp18+/− are inserted around the original signal transitions (WT_Ack, TC_Req, TC_Ack, WT_Req, Hold, Hold_Ack, Dev_Req, CPU_INT) and around places such as P5, P6 and P16]
The main idea of the structural method is the insertion of new signals into the initial specification in such a way that a unique encoding is guaranteed in the transformed specification.
The structural encoding technique, working at the Petri-net level, can synthesize large specifications. The state-based technique is used only in the final stage of this method, once the specification has been decomposed into smaller ones [4].
The synthesis flow is as follows: given a consistent STG, all signals are encoded, producing an STG that contains a new set of signals ensuring the unique state coding and complete state coding properties. Depending on the encoding technique applied, many of the encoding signals may be unnecessary to guarantee unique and complete state coding; they are therefore removed iteratively, one at a time, using a greedy heuristic, until no more signals can be removed without violating the unique and complete state coding properties. The reduced STG is then projected onto different sets of signals to implement each individual output signal.
Once the reduced STG is reached, the complete-state-coding support must be computed for each non-input signal by applying the CSC support algorithm shown in the next section. Afterwards, the projection of the STG onto the CSC support of each non-input signal is performed, and finally speed-independent synthesis of each projection is carried out. Provided the projections are small, state-based techniques can be applied to perform the synthesis. When synthesizing the projection for a non-input signal a, every signal in the projection except a itself is treated as an input signal. This prevents the logic synthesis from trying to solve possible conflicts for the rest of the signals (Fig. 6.8).
[Fig. 6.8: structural encoding synthesis flow — initial STG → encoded STG → signal removal → transformation with CSC checking → reduced STG → synthesis]
produced and projected onto all non-input signals to obtain small CSC supports; the state-based method is then applied to each CSC support to obtain a circuit for each non-input signal. However, the structural method still consumes considerable time in the design phase, especially in the signal-reduction phase: at each reduction step it must verify the persistency, consistency and complete state coding properties before the final reduction is reached.
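The greedy signal-removal loop described above can be sketched as follows; the property check still_csc is assumed to be provided elsewhere (it is not defined here), so this is only an outline of the iteration structure, not of the verification itself.

```python
def greedy_removal(stg, encoding_signals, still_csc):
    """Iteratively drop encoding signals while unique/complete state coding still holds.
    `encoding_signals` is a set of signal names; `still_csc(stg, removed)` is an
    assumed external check of the USC/CSC properties after removing `removed`."""
    removed = set()
    changed = True
    while changed:
        changed = False
        for s in sorted(encoding_signals - removed):
            if still_csc(stg, removed | {s}):   # one property verification per candidate
                removed.add(s)
                changed = True
    return removed   # signals that can be dropped from the encoded STG
```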
This work focuses on complexity reduction. Our process begins by encoding the STG at the Petri-net level to form a template STG. Then a projection is made from the original STG for each non-input signal. After that, we trace each projection to a smaller state space. If the small state space shows conflicts, we insert balance signals from the template STG; unbalance signals are inserted afterwards if the space still shows conflicts. Finally, we obtain the STG with appropriate insertion points, which is then projected onto the CSC support of each non-input signal, as shown in Fig. 6.9.
[Fig. 6.9: template-based synthesis flow — the initial STG is encoded into a template STG; each non-input signal s1, ..., sn of the original STG is projected and checked for conflicts, and signals are inserted from the template STG until no conflict remains]
Figure 6.10 shows the template STG encoded with the encoding scheme of Fig. 6.7. After the template-based technique is applied, we obtain the STG with appropriate insertion signals, as shown in Fig. 6.11.
[Figs. 6.10 and 6.11: the template STG of the DMA controller containing all encoding transitions Sp1...Sp18, and the resulting STG with appropriate insertion signals, in which only Sp1, Sp5 and Sp9 remain alongside the original signal transitions]
The CSC support is then calculated for each non-input signal (output and internal signals). Finally, we obtain small state graphs that are guaranteed to be safe from state explosion and also satisfy the synthesizability properties (Figs. 6.12 and 6.13).
Fig. 6.12 STG with insertion points and CSC support calculation (projections for Sp1, Sp5, Sp9, CPU_INT and Hold)
6.7 Result
[Fig. 6.13: synthesized circuits for the non-input signals — 1. Hold, 2. CPU_INT, 3. TC_Req, 4. WT_Req, 5. Sp1, 6. Sp5, 7. Sp9 — derived from their CSC supports]
tools for asynchronous controller synthesis; all circuits in Table 6.1 were synthesized manually.
This paper has proposed a template-based technique for reducing the number of additional signals in signal transition graph (STG) based logic synthesis, which also removes the reduction complexity of the structural encoding method; the method has been explained using an asynchronous DMA controller specification. The final section compared our template-based method with the state-based and structural encoding methods. The number of iterative signal removals required by our method is smaller than for the others; roughly speaking, structural encoding is better than the classical state-based method, and the template-based method has lower complexity than structural encoding.
References
1. S.B. Park. Synthesis of Asynchronous VLSI circuit from Signal Transition Graph Specifica-
tions, Ph.D. thesis, Tokyo Institute of Technology, 1996.
2. J. Carmona, J. Cortadella. ILP models for the synthesis of asynchronous control circuits. In
Proceedings of the International Conference Computer-Aided Design (ICCAD), pp. 818–825,
November 2003.
3. J. Carmona and J. Cortadella. State encoding of large asynchronous controllers. In Proceedings
of the ACM/IEEE Design Automation Conference, pp. 939–944, July 2006.
4. J. Carmona, J. Cortadella, and E. Pastor. A structural encoding technique for the synthesis of
asynchronous circuits. In International Conference on Application of Concurrency to System
Design, pp. 157–166, June 2001.
5. S. Sudeng and A. Thongtak. FPGA Implementation of Quasi-Delay Insensitive Microprocessor. In The World Congress on Engineering and Computer Science (WCECS 2007), Clark Kerr Campus, University of California Berkeley, San Francisco, CA, USA, 24–26 October 2007.
Chapter 7
A Comparison of Induction Motor Speed
Estimation Using Conventional MRAS
and an AI-Based MRAS Parallel System
Abstract The Model Reference Adaptive System (MRAS) is probably the most
widely applied speed sensorless drive control scheme. This chapter compares induc-
tion motor speed estimation using conventional MRAS and AI-based MRAS with
Stator Resistance Compensation. A conventional mathematical model based MRAS
speed estimation scheme can give a relatively precise speed estimation result, but
errors will occur during low frequency operation. Furthermore, it is also very sen-
sitive to machine parameter variations. An AI-based MRAS-based system with a
Stator Resistance Compensation model can improve the speed estimation accuracy
and is relatively robust to parameter variations even at an extremely low frequency.
Simulation results using a validated machine model are used to demonstrate the
improved behaviour.
7.1 Introduction
Much effort has been devoted to speed-sensorless induction machine drive schemes,
with the Model Reference Adaptive System (MRAS) being the most popular [1].
In a conventional mathematical-model-based MRAS, some state variables are estimated in a reference model of the induction machine (e.g. the rotor flux-linkage components ψ_rd, ψ_rq, or the back-e.m.f. components e_d, e_q) obtained using measured quantities (e.g. stator currents and perhaps voltages).
These reference-model components are then compared with the state variables estimated using an adaptive model. The difference between these state variables is then used in an adaptation mechanism which, for example, outputs the estimated value of the rotor speed (ω̂_r) and adjusts the adaptive model until satisfactory performance is obtained [2–6].
Greater accuracy and robustness can be achieved if the mathematical model is
not used at all and instead an AI-based non-linear adaptive model is employed. It is
then also possible to eliminate the need of the separate PI controller, since this can
be integrated into the tuning mechanism of the AI-based model [7].
However, both the conventional MRAS and the AI-based MRAS scheme are easily affected by machine parameter variations, which occur during practical operation [8, 9]. Hence an online stator resistance estimator is applied to the AI-based MRAS scheme; computer simulation shows that this makes the whole scheme more robust and could make it usable for practical operation [10, 11].
The comparison of schemes presented here is felt to be valuable since much of the
literature presents results for the novel approach alone [1].
These two equations represent a so-called stator voltage model, which does not con-
tain the rotor speed and is therefore a reference model. However, when the rotor
voltage equations of the induction machine are expressed in the stationary reference
frame, they contain the rotor fluxes and the speed as well. These are the equations
of the adaptive model:
\hat{\psi}_{rd} = \frac{1}{T_r} \int \left( L_m i_{sD} - \hat{\psi}_{rd} - \hat{\omega}_r T_r \hat{\psi}_{rq} \right) dt   (7.3)

\hat{\psi}_{rq} = \frac{1}{T_r} \int \left( L_m i_{sQ} - \hat{\psi}_{rq} + \hat{\omega}_r T_r \hat{\psi}_{rd} \right) dt   (7.4)
[Fig. 7.1: a reference-model flux estimator and an adaptive-model flux estimator produce ψ'_r and ψ̂'_r; their cross product ψ'_r × ψ̂'_r = Im{ψ'_r ψ̂'_r} forms the speed tuning signal ε_ω, which a PI block (K_p + K_i/p) converts into ω̂_r]
Fig. 7.1 MRAS-based rotor speed observer using rotor flux linkages for the speed tuning signal
The reference and adaptive models are used to estimate the rotor flux linkages, and the angular difference between the outputs of the two estimators is used as the speed tuning signal. This tuning signal is the input to a linear (PI) controller, which outputs the estimated rotor speed as shown in Fig. 7.1. The estimated speed can be expressed as

\hat{\omega}_r = K_p \varepsilon_\omega + K_i \int \varepsilon_\omega \, dt   (7.5)
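A minimal discrete-time sketch of this adaptation loop is given below, assuming the reference-model fluxes and stator currents are available per sample and using simple Euler integration of Eqs. (7.3)–(7.4); the gains and the sign convention of the tuning signal are illustrative and would need adjusting for a particular machine model.

```python
import numpy as np

def mras_speed(psi_rd, psi_rq, i_sd, i_sq, Lm, Tr, Kp, Ki, Ts):
    """Estimate rotor speed from reference-model fluxes and stator currents."""
    w_hat, integ = 0.0, 0.0
    psi_d_hat, psi_q_hat = 0.0, 0.0
    w_out = np.zeros(len(i_sd))
    for k in range(len(i_sd)):
        # Adaptive model, Eqs. (7.3)-(7.4), Euler-integrated with step Ts
        psi_d_hat += Ts * (Lm * i_sd[k] - psi_d_hat - w_hat * Tr * psi_q_hat) / Tr
        psi_q_hat += Ts * (Lm * i_sq[k] - psi_q_hat + w_hat * Tr * psi_d_hat) / Tr
        # Speed tuning signal: cross product of reference and adaptive fluxes
        eps = psi_rq[k] * psi_d_hat - psi_rd[k] * psi_q_hat
        # PI adaptation mechanism, Eq. (7.5)
        integ += Ki * eps * Ts
        w_hat = Kp * eps + integ
        w_out[k] = w_hat
    return w_out
```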
simple two layer neural network, which gives the whole system a fast response and
better accuracy than the conventional MRAS [13, 14].
Compared to the conventional MRAS scheme, the MRAS-based rotor speed estimator containing a two-layer ANN can give more accurate estimation results and is relatively robust to parameter variations. The two-layer ANN replaces the adjustable model and adaptation mechanism of the conventional MRAS, but the reference model is still necessary for estimating the rotor flux used as the speed tuning signal. Several machine parameters, such as the stator resistance (R_s) and stator inductance (L_s), are used to build the conventional reference model. These parameters may change during different periods of motor operation, yet their values are fixed in the reference model, so the ANN speed estimator is still sensitive to parameter variations, especially when the motor runs at low speed. To solve this problem and make the scheme more independent of the machine parameters, a stator resistance estimator is built into the new reference model, in which the stator resistance R_s is estimated online. Figure 7.3 shows the overall scheme of this neural-network-based MRAS with a dynamic reference model.
In this new system, both the reference model and the adaptive model of the conventional MRAS are modified for better performance. The whole system can be divided into two main parts: the dynamic reference model part and the neural network part. The dynamic reference part consists of the dynamic reference model derived from Eqs. (7.1) and (7.2), in which the stator resistance R_s is replaced by
[Fig. 7.3: the dynamic reference model takes u_sD, u_sQ, i_sD, i_sQ and the online-estimated R_s (from a PI adaptive mechanism) and outputs ψ_rd, ψ_rq; the flux errors drive the weight adjustment of the two-layer ANN, whose adjustable weight w2 carries the speed estimate]
Fig. 7.3 MRAS based ANN speed estimator with dynamic reference model
the online estimated value R̂_s obtained from Eqs. (7.6) and (7.7):

\hat{R}_s = \left( K_p + \frac{K_i}{p} \right) e_{Rs}   (7.6)

e_{Rs} = i_{sD} \left( \psi_{rd} - \hat{\psi}_{rd} \right) + i_{sQ} \left( \psi_{rq} - \hat{\psi}_{rq} \right)   (7.7)
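Discretised with a sampling time T_s, the stator-resistance estimator of Eqs. (7.6)–(7.7) can be sketched as a simple PI update; the gains below are placeholders.

```python
def rs_estimator_step(integ, i_sd, i_sq, psi_rd, psi_rq,
                      psi_rd_hat, psi_rq_hat, Kp_r=0.5, Ki_r=5.0, Ts=1e-4):
    """One discrete step of the online stator-resistance estimator."""
    e_rs = i_sd * (psi_rd - psi_rd_hat) + i_sq * (psi_rq - psi_rq_hat)  # Eq. (7.7)
    integ += Ki_r * e_rs * Ts       # integral part of (Kp + Ki/p)
    Rs_hat = Kp_r * e_rs + integ    # Eq. (7.6)
    return Rs_hat, integ
```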
The neural network part contains a simple two-layer network, with only an input layer and an output layer. The network has both adjustable and constant weights, and the adjustable weight is proportional to the rotor speed. The adjustable weight is changed using the error between the outputs of the reference model and the adjustable model, since any mismatch between the actual rotor speed and the estimated rotor speed results in an error between the outputs of the reference and adaptive estimators.
To obtain the required weight adjustments in the ANN, the sampled-data forms of Eqs. (7.3) and (7.4) are considered. Using the backward-difference method, the sampled-data forms of the equations for the rotor flux linkages can be written as (7.8) and (7.9), where T is the sampling time. Thus the rotor flux linkages at the kth sampling instant can be obtained from the previous (k−1)th values as
w_1 = 1 - c, \qquad w_2 = \omega_r c T_r = \omega_r T, \qquad w_3 = c L_m   (7.12)

It can be seen that w_1 and w_3 are constant weights, but w_2 is a variable weight proportional to the speed. Thus Eqs. (7.10) and (7.11) take the following forms:
\hat{\psi}_{rd}(k) = w_1 \hat{\psi}_{rd}(k-1) - w_2 \hat{\psi}_{rq}(k-1) + w_3 i_{sD}(k-1)   (7.13)

\hat{\psi}_{rq}(k) = w_1 \hat{\psi}_{rq}(k-1) + w_2 \hat{\psi}_{rd}(k-1) + w_3 i_{sQ}(k-1)   (7.14)
These equations can be visualized by the very simple two-layer ANN shown in Fig. 7.4. The neural network is trained by the backpropagation method, and the estimated rotor speed can be obtained from the weight update:
Fig. 7.4 Neural network representation for estimated rotor flux linkages
where η is the learning rate and α is a positive constant called the momentum constant. The inclusion of the momentum term in the weight-adjustment mechanism can significantly speed up convergence, which is extremely useful when the ANN shown in Fig. 7.4 is used to estimate the speed of the induction machine in real time.
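A minimal sketch of one sampling step of the ANN adaptive model, Eqs. (7.13)–(7.14), is shown below. Since the chapter's weight-update equation is not reproduced above, the update of w2 is written here as an assumed gradient step with a momentum term (learning rate eta, momentum alpha); the estimated speed then follows from w2 via Eq. (7.12) as ω̂_r = w2/T.

```python
def ann_step(psi_d_hat, psi_q_hat, i_sd, i_sq, w1, w2, w3):
    """One sampling step of the two-layer ANN adaptive flux model."""
    psi_d_new = w1 * psi_d_hat - w2 * psi_q_hat + w3 * i_sd   # Eq. (7.13)
    psi_q_new = w1 * psi_q_hat + w2 * psi_d_hat + w3 * i_sq   # Eq. (7.14)
    return psi_d_new, psi_q_new

def update_w2(w2, dw2_prev, err_d, err_q, psi_d_prev, psi_q_prev, eta=0.1, alpha=0.5):
    """Assumed gradient step on the speed weight w2 with a momentum term.
    err_d/err_q are the reference-minus-adaptive flux errors at instant k;
    d(psi_rd_hat)/dw2 = -psi_rq_hat(k-1) and d(psi_rq_hat)/dw2 = +psi_rd_hat(k-1)."""
    dE_dw2 = err_d * psi_q_prev - err_q * psi_d_prev   # gradient of the squared error
    dw2 = -eta * dE_dw2 + alpha * dw2_prev
    return w2 + dw2, dw2
```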
To compare the conventional MRAS and the AI-based MRAS with the dynamic reference model, simulations were set up in Matlab-Simulink, based on the standard, well-established and validated two-axis machine model [6].
Speed estimation results using the conventional MRAS and the neural-network-based MRAS are shown in Figs. 7.5 and 7.6, respectively. These results assume that the machine parameters are correctly measured and unchanged during operation; both schemes give good speed-tracking results.
Further simulations were carried out with a changed stator resistance to test how much parameter changes affect the speed estimation results. In Figs. 7.7 and 7.8, simulations are carried out with the stator resistance changed by a small amount, 2%. Clearly, both schemes are still sensitive to parameter variations.
A final simulation, for the AI-based MRAS with the dynamic reference model, is shown in Fig. 7.9; the online estimated stator resistance is displayed in Fig. 7.10. This simulation shows that the effect of the stator resistance variation has been considerably reduced.
Fig. 7.7 Speed estimation using conventional MRAS (with stator resistance R_s changed by 2%)
Comparing all the above simulation results shows that the conventional MRAS scheme works well when the parameters are precisely measured and do not change during operation. The MRAS with the adaptive model replaced by the two-layer neural network slightly improves the performance under the same conditions. However, both schemes are still easily affected by the parameter variations that do occur during practical operation. Introducing the online stator resistance estimator gives much improved performance, which should make the scheme usable for practical operation.
Fig. 7.8 Speed estimation using two-layer ANN MRAS (with stator resistance R_s changed by 2%)
Fig. 7.9 Speed estimation using the two-layer ANN MRAS with dynamic reference model (estimated and actual rotor speed over 0–6 s)
7.6 Conclusion
The main objective of this chapter has been to compare the conventional MRAS and an AI-based MRAS parallel system for sensorless speed estimation of an induction motor. The conventional MRAS gives good speed estimation over most of the operating range, but errors occur during low-frequency operation, caused mainly by machine parameter variations. An AI-based MRAS system can give improved accuracy and
[Fig. 7.10: online estimated stator resistance R̂_s compared with the actual R_s over 0–6 s]
bypasses the PI controller tuning problems. The simple structure of the two-layer neural network shown in Fig. 7.4 yields a speed estimation system that works online with a fast response. The simple two-layer neural network also does not require a separate learning stage, since learning takes place during the online speed estimation process. As a result, the development time of such an estimator is short, and the estimator can be made robust to parameter variations and noise. Furthermore, in contrast to most conventional schemes, it avoids the direct use of a speed-dependent mathematical model of the machine.
However, the two-layer neural network MRAS lies more in the realm of adaptive control than of neural networks: the speed value is not obtained at the output but as one of the weights, and only one weight is adjusted during training. Therefore it is still sensitive to parameter variations and system noise.
In the new approach, an online stator resistance estimator is used to compensate for the parameter variations. The computer simulation results show that this new approach makes the whole scheme more robust to parameter variations, enhancing the possibility of practical use of the neural-network-based MRAS scheme. The stator resistance estimator works under an adaptive mechanism (PI controller); further work could replace this PI controller with another simple neural network, which could also estimate more machine parameters.
References
1. Finch, J.W. and Giaouris, D., Controlled AC Electrical Drives, IEEE Transactions on
Industrial Electronics, Feb. 2008, 55, 1, 1–11.
2. Landau, Y.D., Adaptive Control the Model Reference Approach, 1979: Marcel Dekker, New
York.
3. Vas, P., Sensorless Vector and Direct Torque Control, 1998: Oxford University Press, Oxford.
4. Shauder, C., Adaptive Speed Identification for Vector Control of Induction Motors without
Rotational Transducers. IEEE Transactions on Industry Applications, 1992, 28, 1054–1062.
5. Yang, G. and T. Chin, Adaptive-Speed Identification Scheme for a Vector-Controlled Speed
Sensorless Inverter-Induction Motors. IEEE Transactions on Industry Applications, 1993, 29,
820–825.
6. Fitzgerald, A.E., C. Kingsley, and S.D. Umans, Electric Machinery. 6th ed., 2003: McGraw-
Hill International Edition, New York.
7. Vas, P., Artificial-Intelligence-Based Electrical Machines and Drives. 1999: Oxford University
Press, Oxford.
8. Kumara, I.N.S., Speed Sensorless Field Oriented Control for Induction Motor Drive. Ph.D.
thesis, 2006, University of Newcastle upon Tyne.
9. Leonhard, W., Controlled AC Drives, a Successful Transition from Ideas to Industrial Practice,
1996: Elsevier, Amsterdam, Netherlands.
10. Zhen, L. and L. Xu, Sensorless Field Orientation Control of Induction Machines Based on a
Mutual MRAS Scheme. IEEE Transactions on Industrial Electronics, 1998, 45, 824–831.
11. Holtz, J. and J. Quan, Drift-and Parameter-Compensated Flux Estimator for Persistent Zero-
Stator-Frequency Operation of Sensorless-Controlled Induction Motors. IEEE Transactions
on Industry Applications, 2003, 39, 1052–1060.
12. Ohtani, T., N. Takada, and K. Tanaka, Vector Control of Induction Motor without Shaft
Encoder. IEEE Transactions on Industry Applications, 1992, 28, 157–164.
13. Peng, F.Z. and T. Fukao, Robust Speed Identification for Speed-Sensorless Vector Control of
Induction Motors. IEEE Transactions on Industry Applications, 1994, 30, 945–953.
14. Vasic, V. and S. Vukosavic, Robust MRAS-Based Algorithm for Stator Resistance and Rotor
Speed Identification. IEEE Power Engineering Review, 2001, 21, 39–41.
Chapter 8
A New Fuzzy-Based Additive Noise Removal
Filter
Abstract A digital color image C can be represented in different color spaces such as RGB, HSV, L*a*b*, etc. In the proposed method, RGB is used as the basic color space. Different proportions of red, green and blue light give a wide range of colors. Colors in RGB space are represented by a 3-D vector whose first element is red, second is green and third is blue. The general idea of this method is to take into account the fine details of the image, such as edges and color-component distances, which will be preserved by the filter. The goal of the first filter is to distinguish local variations due to image structures such as edges. This is accomplished by using Euclidean distances between color components instead of differences between the components, as done in most existing filters. The proposed method uses 2-D distances instead of 3-D distances, and uses three fuzzy rules to calculate weights for the Takagi-Sugeno fuzzy model.
8.1 Introduction
Images are often degraded by random noise. Noise can occur during image capture,
transmission or processing, and may be dependent on or independent of image con-
tent. Noise is usually described by its probabilistic characteristics. Gaussian noise
is a very good approximation of noise that occurs in many practical cases [1].
The ultimate goal of restoration techniques is to improve an image in some pre-
defined sense. Although there are areas of overlap, image enhancement is largely
M. S. Nair (B)
Rajagiri School of Computer Science, Rajagiri College of Social Sciences, Kalamassery,
Kochi 683104, Kerala, India,
E-mail: madhu s nair2001@yahoo.com
a subjective process, while image restoration is for the most part an objective
process. Restoration attempts to reconstruct or recover an image that has been
degraded by using a priori knowledge of the degradation phenomenon [2]. Thus
restoration techniques are oriented toward modeling the degradation and applying
the inverse process in order to recover the original image. This approach usually
involves formulating a criterion of goodness that will yield an optimal estimate
of the desired result. By contrast, enhancement techniques basically are heuris-
tic procedures designed to manipulate an image in order to take advantage of the
psychophysical aspects of human visual system. For example, histogram equaliza-
tion is considered an enhancement technique because it is primarily on the pleasing
aspects it might present to the viewer, whereas removal of image blur by applying a
deblurring function is considered a restoration technique.
Image restoration differs from image enhancement in that the latter is con-
cerned more with accentuation or extraction of image features rather than restoration
of degradations. Image restoration problems can be quantified precisely, whereas
enhancement criteria are difficult to represent mathematically. Consequently, restor-
ation techniques often depend only on the class or ensemble properties of a data
set, whereas image enhancement techniques are much more image dependent. The
degradation process is usually modeled as a degradation function that, together with
an additive noise term, operates on an input image f .x; y/ to produce a degraded
image g.x; y/. Given g.x; y/, some knowledge about the degradation function H ,
and some knowledge about the additive noise term .x; y/, the objective of restora-
tion is to obtain an estimate fO.x; y/ of the original image. The estimate needs to
be as close as possible to the original input image and, in general, the more about
H and is known, the closer fO.x; y/ will be to f .x; y/.
If H is a linear, position-invariant process, then the degraded image is given in
the spatial domain by
where h.x; y/ is the spatial representation of the degradation function and the
symbol “*” indicates convolution [2].
Image noise reduction has come to mean, specifically, a process of smoothing noise that has somehow corrupted the image. During image transmission, noise occurs which is usually independent of the image signal. Noise may be additive, where the noise and the image signal g are independent:

f(x, y) = g(x, y) + v(x, y)

where f(x, y) is the noisy image signal, g(x, y) is the original image signal and
v(x, y) is the noise signal, which is independent of g [3]. The additive noise image v models an undesirable, unpredictable corruption of g. The process v is called a two-dimensional random process, or a random field. The goal of restoration is to recover an image h that resembles g as closely as possible by reducing v. If there is an adequate model for the noise, then the problem of finding h can be posed as an estimation problem.
Noise reduction is the process of removing noise from a signal. Noise reduction techniques are conceptually very similar regardless of the signal being processed; however, a priori knowledge of the characteristics of the expected signal means that the implementations of these techniques vary greatly depending on the type of signal.
Although linear image enhancement tools are often adequate in many applications,
significant advantages in image enhancement can be attained if non-linear tech-
niques are applied [3]. Non-linear methods effectively preserve edges and details
of images, whereas methods using linear operators tend to blur and distort them.
Additionally, non-linear image enhancement tools are less susceptible to noise.
One method to remove noise is to use linear filters by convolving the original
image with a mask. The Gaussian mask comprises elements determined by a Gaus-
sian function. It gives the image a blurred appearance if the standard deviation of
the mask is high, and has the effect of smearing out the value of a single pixel over
an area of the image. Averaging sets each pixel to the average value of itself and its
nearby neighbors. Averaging tends to blur an image, because pixel intensity values
which are significantly higher or lower than the surrounding neighborhood would
smear across the area. Conservative smoothing is another noise reduction technique
that is explicitly designed to remove noise spikes (e.g. salt and pepper noise) and
is, therefore, less effective at removing additive noise (e.g. Gaussian noise) from an
image [5].
Additive noise is generally more difficult to remove from images than impulse noise, because a value drawn from a certain distribution, for example a Gaussian distribution, is added to each image pixel. A large number of wavelet-based methods [6] are available that achieve good noise reduction (for the additive noise type) while preserving the significant image details. The wavelet denoising procedure usually consists of shrinking the wavelet coefficients: coefficients that contain primarily noise should be reduced to negligible values, while those containing a significant noise-free component should be reduced less. A common shrinkage approach is the application of simple thresholding nonlinearities to the empirical wavelet coefficients [7, 8]. Shrinkage estimators can also result from a Bayesian approach, in which a prior distribution of the noise-free data (e.g., Laplacian [4], generalized Gaussian [9–11]) is integrated into the denoising scheme.
Fuzzy set theory and fuzzy logic offer us powerful tools to represent and process
human knowledge represented as fuzzy if-then rules. Several fuzzy filters for noise
reduction have already been developed, e.g., the iterative fuzzy control based filters
from [12], the GOA filter [13, 14], and so on. Most of these state-of-the-art methods
are mainly developed for the reduction of fat-tailed noise like impulse noise. Never-
theless, most of the current fuzzy techniques do not produce convincing results for
additive noise [15, 16]. Another shortcoming of the current methods is that most of
these filters are especially developed for grayscale images. It is, of course, possible
to extend these filters to color images by applying them on each color component
separately, independent of the other components. However, this introduces many
artifacts, especially on edge or texture elements.
A fuzzy method proposed by Stefan Schulte, Valérie De Witte and Etienne E. Kerre is a simple fuzzy technique [17] for filtering color images corrupted with narrow-tailed and medium narrow-tailed noise (e.g., Gaussian noise) without introducing the above-mentioned artifacts. Their method outperforms conventional filters as well as other fuzzy noise filters. In this paper, we present a modified version of the fuzzy approach proposed by Schulte et al. [17], which uses a Gaussian combination membership function to yield a better result compared to the conventional filters as well as the recently developed advanced fuzzy filters.
A digital color image C can be represented in different color spaces such as RGB, HSV, L*a*b*, etc. In the proposed method, RGB is used as the basic color space. Different proportions of red, green and blue light give a wide range of colors. Colors in RGB space are represented by a 3-D vector whose first element is red, second is green and third is blue. These three primary color components are quantized in the range 0 to 2^m − 1, where m = 8. A color image C can be represented by a 2-D array of vectors, where (i, j) defines a position in C called a pixel and C_{i,j,1}, C_{i,j,2} and C_{i,j,3} denote the red, green and blue components, respectively.
The general idea of this method is to take into account the fine details of the image, such as edges and color-component distances, which will be preserved by the filter. The goal of the first filter is to distinguish local variations due to image structures such as edges. This is accomplished by using Euclidean distances between color components instead of differences between the components, as done in most existing filters. The proposed method uses 2-D distances instead of 3-D distances (distances between the three color components red, green and blue); that is, the distance between the red-green (rg) and red-blue (rb) values of the neighbourhood centered
at (i, j) is used to filter the red component [4]. Similarly, the distance between rg and green-blue (gb) is used to filter the green component, and the distance between rb and gb is used to filter the blue component. The method uses three fuzzy rules to calculate weights for the Takagi-Sugeno fuzzy model [10].
The current image pixel at position (i, j) is processed using a window of size (2K+1) × (2K+1) to obtain the modified color components. Each of the pixels in the window is assigned a weight W_{k,l}, where k, l ∈ {−K, ..., 0, ..., +K}; W_{i+k,j+l,1}, W_{i+k,j+l,2} and W_{i+k,j+l,3} denote the weights for the red, green and blue components at position (i+k, j+l), respectively. These weights are assigned according to the following three fuzzy rules. Let DT(a, b) represent the distance between the parameters a and b, and NG(y) represent the neighbourhood of the parameter y; here y is a pixel with a neighbourhood given by a 3 × 3 window. The three fuzzy rules can be represented as follows:
1. IF DT(rg, NG(rg)) is SMALL AND DT(rb, NG(rb)) is SMALL THEN the weight W_{k,l,1} is LARGE
2. IF DT(rg, NG(rg)) is SMALL AND DT(gb, NG(gb)) is SMALL THEN the weight W_{k,l,2} is LARGE
3. IF DT(rb, NG(rb)) is SMALL AND DT(gb, NG(gb)) is SMALL THEN the weight W_{k,l,3} is LARGE
In the above fuzzy rules, DT represents the Euclidean distance.
Fuzzy sets are commonly represented by membership functions from which the corresponding membership degrees are derived. Membership degrees between zero and one indicate the uncertainty as to whether the distance is small or not. In the proposed approach, a new membership function SMALL is used, which is a two-sided composite of two different Gaussian curves. The Gaussian membership function depends on two parameters σ and c, as given by

f(x; \sigma, c) = e^{-\frac{(x-c)^2}{2\sigma^2}}

where σ_x is the standard deviation of the distance measure and c_x is the mean of the distance measure, respectively.
In the above fuzzy rules, the intersection of two fuzzy sets is involved. The intersection of two fuzzy sets A and B is generally specified by a binary mapping T, which aggregates the two membership functions as μ_{A∩B}(y) = T(μ_A(y), μ_B(y)), where μ_A and μ_B are the membership functions of the fuzzy sets A and B, respectively. The fuzzy intersection operator, known as a triangular norm (T-norm), used in this paper is the algebraic product T-norm. For example, the antecedent of fuzzy rule 1 is the product of the membership degrees of its two conditions. The value so obtained, called the activation degree of the fuzzy rule, is used to obtain the corresponding weight; the weights W_{i+k,j+l,1}, W_{i+k,j+l,2} and W_{i+k,j+l,3} are calculated accordingly from the activation degrees of rules 1–3.
The output of Fuzzy Sub-filter I, denoted F1, is then given by:

F1_{i,j,1} = \frac{\sum_{k=-K}^{+K} \sum_{l=-K}^{+K} W_{i+k,j+l,1}\, C_{i+k,j+l,1}}{\sum_{k=-K}^{+K} \sum_{l=-K}^{+K} W_{i+k,j+l,1}}

F1_{i,j,2} = \frac{\sum_{k=-K}^{+K} \sum_{l=-K}^{+K} W_{i+k,j+l,2}\, C_{i+k,j+l,2}}{\sum_{k=-K}^{+K} \sum_{l=-K}^{+K} W_{i+k,j+l,2}}

F1_{i,j,3} = \frac{\sum_{k=-K}^{+K} \sum_{l=-K}^{+K} W_{i+k,j+l,3}\, C_{i+k,j+l,3}}{\sum_{k=-K}^{+K} \sum_{l=-K}^{+K} W_{i+k,j+l,3}}

where F1_{i,j,1}, F1_{i,j,2} and F1_{i,j,3} denote the red, green and blue components of the Fuzzy Sub-filter I output image, respectively.
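A minimal sketch of Fuzzy Sub-filter I for a single pixel is given below. It simplifies the chapter's membership function to a single Gaussian (rather than the two-sided composite), approximates DT(·, NG(·)) by the distance of each neighbour's colour-component differences from those of the window centre, and omits border handling; sigma, K and the indexing scheme are illustrative.

```python
import numpy as np

def small(d, sigma=10.0, c=0.0):
    """Gaussian membership degree of SMALL for a distance d (simplified)."""
    return np.exp(-((d - c) ** 2) / (2.0 * sigma ** 2))

def subfilter1_pixel(img, i, j, K=1):
    """Weighted average over a (2K+1)x(2K+1) window; the weights come from the
    three fuzzy rules combined with the algebraic-product T-norm."""
    out = np.zeros(3)
    wsum = np.zeros(3)
    c0 = img[i, j].astype(float)
    rg0, rb0, gb0 = c0[0] - c0[1], c0[0] - c0[2], c0[1] - c0[2]
    for k in range(-K, K + 1):
        for l in range(-K, K + 1):
            p = img[i + k, j + l].astype(float)
            d_rg = abs((p[0] - p[1]) - rg0)   # approximation of DT(rg, NG(rg))
            d_rb = abs((p[0] - p[2]) - rb0)   # approximation of DT(rb, NG(rb))
            d_gb = abs((p[1] - p[2]) - gb0)   # approximation of DT(gb, NG(gb))
            w = np.array([small(d_rg) * small(d_rb),   # rule 1 -> red weight
                          small(d_rg) * small(d_gb),   # rule 2 -> green weight
                          small(d_rb) * small(d_gb)])  # rule 3 -> blue weight
            out += w * p
            wsum += w
    return out / wsum   # the three weighted averages of Sub-filter I
```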
The second sub-filter is used as a complement to the first one. The goal of this sub-filter is to improve on the first by reducing the noise in the color-component differences without destroying the fine details of the image. In this step, the local differences in the red, green and blue environments are calculated
separately. These differences are then combined to calculate the local estimate of the central pixel. In this step a window of size (2L+1) × (2L+1), centered at (i, j), is again used to filter the current image pixel. The local differences for each element of the window, for the three color components, are calculated as follows:
F2_{i,j,1} = \frac{\sum_{k=-L}^{+L} \sum_{l=-L}^{+L} \left( F1_{i+k,j+l,1} - \varepsilon_{k,l} \right)}{(2L+1)^2}

F2_{i,j,2} = \frac{\sum_{k=-L}^{+L} \sum_{l=-L}^{+L} \left( F1_{i+k,j+l,2} - \varepsilon_{k,l} \right)}{(2L+1)^2}

F2_{i,j,3} = \frac{\sum_{k=-L}^{+L} \sum_{l=-L}^{+L} \left( F1_{i+k,j+l,3} - \varepsilon_{k,l} \right)}{(2L+1)^2}

where F2_{i,j,1}, F2_{i,j,2} and F2_{i,j,3} denote the red, green and blue components of the output image, respectively, and ε_{k,l} denotes the corresponding local difference at window position (k, l).
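Sub-filter II then reduces to a window average of the corrected values; the sketch below assumes eps holds the local colour-component differences for the window (shape (2L+1, 2L+1, 3)) and again omits border handling.

```python
import numpy as np

def subfilter2_pixel(F1, eps, i, j, L=2):
    """Average the corrected values over a (2L+1)x(2L+1) window (interior pixels only)."""
    win = F1[i - L:i + L + 1, j - L:j + L + 1, :].astype(float)  # (2L+1, 2L+1, 3)
    return (win - eps).sum(axis=(0, 1)) / (2 * L + 1) ** 2
```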
The performance of the proposed filter has been evaluated and compared with conventional filters for additive noise using the MATLAB software tool. As a measure of objective similarity between a filtered image and the original one, we use the peak signal-to-noise ratio (PSNR) in decibels (dB). This similarity measure is based on another measure, namely the mean-square error (MSE).
\mathrm{MSE}(img, org) = \frac{\sum_{c=1}^{3} \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ org(i,j,c) - img(i,j,c) \right]^2}{3 \cdot N \cdot M}

\mathrm{PSNR}(img, org) = 10 \log_{10} \frac{S^2}{\mathrm{MSE}(img, org)}

where org is the original color image, img is the filtered color image of size N × M, and S is the maximum possible intensity value (with m-bit integer values, S will be 2^m − 1). The standard color images used in this paper are the House and Gantrycrane images.
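For reference, the two measures can be computed directly as follows, assuming 8-bit colour images stored as NumPy arrays of identical shape.

```python
import numpy as np

def psnr(org, img, m_bits=8):
    """PSNR in dB between an original and a filtered colour image."""
    mse = np.mean((org.astype(float) - img.astype(float)) ** 2)  # averages over 3*N*M values
    s_max = 2 ** m_bits - 1                                      # maximum possible intensity
    return 10.0 * np.log10(s_max ** 2 / mse)
```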
The original image, the noisy image (the original corrupted with Gaussian noise of a selected σ), and the images restored using the mean filter, the median filter, the fuzzy method of [17] and the modified fuzzy method are shown, together with their corresponding PSNR values, in Figs. 8.1 and 8.2 for these standard color images. From the experimental results, it has been found that our
Fig. 8.1 (a) Original House image (256 × 256). (b) Noisy image (Gaussian noise, σ = 20). (c) After applying the mean filter (3 × 3 window). (d) After applying the median filter (3 × 3 window). (e) After applying the fuzzy filter of [17] with K = 3 (7 × 7 window) and L = 2 (5 × 5 window). (f) After applying the proposed fuzzy filter with K = 3 (7 × 7 window) and L = 2 (5 × 5 window)
Fig. 8.2 (a) Original Gantrycrane image (400 × 264). (b) Noisy image (Gaussian noise, σ = 40). (c) After applying the mean filter (3 × 3 window). (d) After applying the median filter (3 × 3 window). (e) After applying the fuzzy filter of [17] with K = 3 (7 × 7 window) and L = 2 (5 × 5 window). (f) After applying the proposed fuzzy filter with K = 3 (7 × 7 window) and L = 2 (5 × 5 window)
proposed method achieves the best numerical and visual performance for both low and high levels of additive noise, when the window sizes of the two fuzzy sub-filters are chosen appropriately. Numerical results illustrating the denoising capability of the proposed method, the modified method and the conventional methods are given in Table 8.1, which shows the PSNR for the colored House image corrupted with Gaussian noise of σ = 5, 10, 20, 30 and 40. The window size of each filter was chosen to give the best PSNR value. The PSNR values of the noisy image and of the best-performing filter are shown in bold.
Table 8.1 Comparative results in PSNR (dB) of different filtering methods for various distortions of Gaussian noise for the (256 × 256) colored House image

Method                    σ = 5    σ = 10   σ = 20   σ = 30   σ = 40
Noisy                     34.13    28.12    22.10    18.57    16.08
Mean                      28.05    27.70    26.57    25.13    23.72
Median                    32.31    30.81    27.66    25.02    22.95
Proposed fuzzy method     34.12    31.79    28.38    25.85    23.76
Modified fuzzy method     34.22    32.77    29.33    26.51    24.18
8.5 Conclusion
A fuzzy filter for restoring color images corrupted with additive noise has been proposed in this paper. The proposed filter is efficient and produces better restoration of color images than other filters; numerical measures such as PSNR, as well as visual observation, show convincing results. Further work can focus on the construction of other fuzzy filtering methods for color images to suppress multiplicative noise such as speckle noise.
References
12. F. Farbiz and M. B. Menhaj, “A fuzzy logic control based approach for image filtering,” in
Fuzzy Technical Image Processing, E. E. Kerre and M. Nachtegael, Eds., 1st ed. Physica
Verlag, Heidelberg, Germany, 2000, Vol. 52, pp. 194–221.
13. D. Van De Ville, M. Nachtegael, D. Van der Weken, W. Philips, I. Lemahieu, and E. E.
Kerre, “A new fuzzy filter for Gaussian noise reduction,” in Proceedings of SPIE Visual
Communications and Image Processing, 2001, pp. 1–9.
14. D. Van De Ville, M. Nachtegael, D. Van der Weken, E. E. Kerre, and W. Philips, “Noise
reduction by fuzzy image filtering,” IEEE Transactions on Fuzzy Systems, Vol. 11, No. 8, pp.
429–436, August 2003.
15. M. Nachtegael, S. Schulte, D. Van der Weken, V. De Witte, and E. E. Kerre, “Fuzzy fil-
ters for noise reduction: The case of Gaussian noise,” in Proceedings of IEEE International
Conference on Fuzzy Systems, 2005, pp. 201–206.
16. S. Schulte, B. Huysmans, A. Pižurica, E. E. Kerre, and W. Philips, “A new fuzzy-based wavelet
shrinkage image denoising technique,” Lecture Notes in Computer Science, Vol. 4179, pp.
12–23, 2006.
17. S. Schulte, V. De Witte, and E. E. Kerre, “A Fuzzy noise reduction method for color images,”
IEEE Transaction on Image Processing, Vol. 16, No. 5, May 2007, pp. 1425–1436.
18. A. C. Bovik and S. T. Acton, “Basic Linear Filtering with Application to Image Enhancement,”
Handbook of Image and Video Processing, Academic, New York, 2006, pp. 71–79.
19. B. Vidakovic, “Non-linear wavelet shrinkage with Bayes rules and Bayes factors,” Journal of
the American Statistical Association, Vol. 93, pp. 173–179, 1998.
20. H. Chipman, E. Kolaczyk, and R. McCulloch, “Adaptive Bayesian wavelet shrinkage,” Journal
of the American Statistical Association, Vol. 92, pp. 1413–1421, 1997.
21. L. Sendur and I. W. Selesnick, “Bivariate shrinkage functions for wavelet-based denoising
exploiting inter-scale dependency,” IEEE Transactions on Signal Processing, Vol. 50, No. 11,
pp. 2744–2756, November 2002.
22. M. Crouse, R. Nowak, and R. Baranuik, “Wavelet-based statistical signal processing using
hidden Markov models,” IEEE Transactions on Signal Processing, Vol. 46, No. 4, pp. 886–
902, April 1998.
23. J. Romberg, H. Choi, and R. Baraniuk, “Bayesian tree-structured image modeling using
wavelet-domain hidden Markov models,” IEEE Transactions on Image Processing, Vol. 10,
No. 7, pp. 1056–1068, July 2001.
24. M. Malfait and D. Roose, “Wavelet-based image denoising using a Markov random field
a priori models,” IEEE Transactions on Image Processing, Vol. 6, No. 4, pp. 549–565,
April 1997.
Chapter 9
Enhancement of Weather Degraded Color
Images and Video Sequences Using Wavelet
Fusion
9.1 Introduction
One of the major reasons for accidents in the air, at sea and on the road is poor
visibility due to the presence of fog or mist in the atmosphere. During winter, visibility
is worse, sometimes down to only a few feet. Under such conditions, light reaching the
human eye is heavily scattered by atmospheric constituents such as fog, haze and
aerosols, and the image is severely degraded. Images taken under such bad weather
conditions suffer from degradation and severe contrast loss. The loss of image quality
is a nuisance in many imaging applications. For example, in underwater imaging
J. John (B)
Mtech student, Department of Computer Science, University of Kerala, Karyavattom – 695581,
Trivandrum, Kerala, India,
E-mail: jisha.json@yahoo.com
in murky water, the detection of artifacts becomes difficult due to poor image qual-
ity. Hence, imaging must be performed at close range and this usually results in a
long time required to inspect a small area. Another example is in the navigation of
surface ships and aircraft in bad weather. In weather conditions such as fog, visibility
is low and navigation is more difficult, dangerous and slow.
Images of outdoor scenes are degraded by optical scattering of light, which pro-
duces additional lightness in some parts of the image, an effect that has
been referred to as “atmospheric background radiation” [1, 2] or “airlight” [3, 4].
This results in degradation of image contrast, as well as alteration of scene color,
which finally leads to a poor visual perception of the image. Contrast enhancement
methods fall into two groups: non-model-based and model-based.
Non-model-based methods analyze and process the image based solely on the
information from the image. The most commonly used non-model-based methods
are histogram equalization and its variations [5–8]. For color images, histogram
equalization can be applied to R, G, B color channels separately but this leads to
undesirable change in hue. Better results are obtained by first converting the image
to the Hue, Saturation, Intensity color space and then applying histogram equaliza-
tion to the intensity component only [9]. However, even this method does not fully
maintain color fidelity.
There are also other non-model-based methods like unsharp masking [10],
approaches based on the Retinex theory [11–13] and wavelet-based methods [14,
15]. Generally, all non-model-based methods have a problem with maintaining
color fidelity. They also distort clear images, which is an important limitation for
fully automatic operation. Model-based methods use physical models to predict
the pattern of image degradation and then restore image contrast with appropri-
ate compensations. They provide better image rendition but usually require extra
information about the imaging system or the imaging environment.
In [16] John P. Oakley and Hong Bu have suggested an enhancement method that
corrects contrast loss while maintaining color fidelity. In this method it is
assumed that if the distance between the camera position and all points of the scene rep-
resented by the image generated by the camera is approximately constant, the airlight
will be uniform within the image. But in most real situations this assumption
is not valid. This method gives good contrast restoration but does not provide much
visibility enhancement. To enhance visibility, R. T. Tan et al. [17] have proposed
a visibility enhancement method which makes use of color and intensity informa-
tion. Visibility is greatly improved in the resulting images but color fidelity is not
maintained. Hence, in situations where the naturalness of the image is important, this
method cannot be used.
In this work a method for enhancing visibility and maintaining color fidelity is
proposed using wavelet fusion. This method mainly consists of three phases. Given
an input image, first phase is to apply a contrast correction using the depth infor-
mation. Here we compute the value of airlight present in the image by optimizing
a cost function. The second phase consists of finding an approximate airlight value
by using the intensity information of YIQ color model. Contrast restoration in the
first and second phase is performed by removing the airlight from the image and
applying depth information. The third and final phase of the proposed method con-
sists of a wavelet fusion method to get a resultant image which has considerable
visibility improvement and also maintains the color fidelity.
The rest of the paper is organized as follows: In Section 9.2 we will discuss the
atmospheric scattering models concentrating on the airlight model which forms the
basis of this method. In Section 9.3, the contrast correction is given which forms
the first phase of this work. In Section 9.4 the approximate airlight estimation is
discussed. Section 9.5 introduces the third phase, where wavelet fusion is applied
to get the enhanced image. Section 9.6 explains how the method can be applied on
video sequences. To show the effectiveness of the proposed method, performance
analysis is done in Section 9.7 with the help of a contrast improvement index and
sharpness measure. Section 9.8 includes the experimental results and discussion.
Scattering of light by physical media has been one of the main topics of research
in the atmospheric optics and astronomy communities. In general, the exact nature
of scattering is highly complex and depends on the types, orientations, sizes, and
distributions of particles constituting the media, as well as wavelengths, polarization
states, and directions of the incident light [1, 2]. Here, we focus on the airlight
model, which forms the basis of our work.
While observing an extensive landscape, we quickly notice that the scene points
appear progressively lighter as our attention shifts from the foreground toward the
horizon. This phenomenon, known as airlight, results from the scattering of environ-
mental light toward the observer, by the atmospheric particles within the observer’s
cone of vision. The environmental illumination can have several sources, including,
direct sunlight, diffuse skylight and light reflected by the ground. While attenua-
tion causes scene radiance to decrease with path length, airlight increases with path
length. It therefore causes the apparent brightness of a scene point to increase with
depth. Thus, the irradiance due to airlight is given by
$E(x) = I_\infty\, \rho(x)\, e^{-\beta d(x)} + I_\infty \left(1 - e^{-\beta d(x)}\right)$   (9.1)
The first term in the equation represents the direct transmission, while the second
term represents the airlight. E is the image intensity, x is the spatial location, $I_\infty$ is
the atmospheric environmental light, which is assumed to be globally constant, and
$\rho(x)$ is the normalized radiance of a scene point, which is a function of the scene
point reflectance, the normalized sky illumination spectrum, and the spectral response
of the camera. $\beta$ is the atmospheric attenuation coefficient and d is the distance between
the object and the observer. It is the second term in Eq. (9.1) or airlight that causes
degradation in the image taken under bad weather conditions and hence all con-
trast restoration methods are aimed at removing this additional lightness from the
image.
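To make Eq. (9.1) concrete, the sketch below (Python with NumPy, an illustration rather than the chapter's code) synthesizes a weather-degraded image from a normalized radiance map and a depth map; distant pixels converge toward the environmental light. Function and parameter names are assumptions.

import numpy as np

def apply_airlight(rho, depth, beta=0.8, env_light=1.0):
    """Simulate Eq. (9.1): E(x) = I_inf*rho(x)*exp(-beta d(x)) + I_inf*(1 - exp(-beta d(x))).
    rho: normalized scene radiance in [0, 1], shape (H, W) or (H, W, 3); depth: d(x), shape (H, W)."""
    t = np.exp(-beta * depth)
    if rho.ndim == 3:
        t = t[..., None]                       # broadcast the transmission over the color channels
    return env_light * rho * t + env_light * (1.0 - t)

# Toy example: with a depth gradient, distant pixels converge toward the environmental light.
rho = np.random.rand(120, 160, 3)
d = np.tile(np.linspace(0.1, 5.0, 160), (120, 1))
foggy = apply_airlight(rho, d)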
$y = m(x - \lambda)$   (9.2)

$S_{gm}(\lambda) = \mathrm{STD}\left\{ \frac{p'_k}{\bar{p}'_k} : k = 1, 2, \ldots, K \right\} \cdot \mathrm{GM}\left\{ \bar{p}_k : k = 1, 2, \ldots, K \right\}$   (9.4)
where GM{·} denotes the geometric mean. The geometric mean can also be
written as

$\mathrm{GM}\{x_k : k = 1, 2, \ldots, K\} = \exp\left( \frac{1}{K} \sum_{k=1}^{K} \ln(x_k) \right)$   (9.5)
Another possible variation on the cost function is to use the sample variance in Eq.
(9.4) rather than the sample standard deviation, in which case the scaling factor must be
squared:

$S(\lambda) = \left[ \mathrm{STD}\left\{ \frac{p'_k}{\bar{p}'_k} : k = 1, 2, \ldots, K \right\} \right]^2 \cdot \left[ \mathrm{GM}\left\{ \bar{p}_k : k = 1, 2, \ldots, K \right\} \right]^2$   (9.6)
The optimum value of $\lambda$, which minimizes the cost function, is obtained by calculating

$\hat{\lambda} = \arg\min_{\lambda} \left\{ S_{gm}(\lambda) \right\}$   (9.7)
$e^{-\beta d(x)} = 1 - \frac{\lambda}{I_\infty^r + I_\infty^g + I_\infty^b}$   (9.11)

where $(I_\infty^r + I_\infty^g + I_\infty^b)$, namely the environmental light, is assumed to be the
largest intensity in the image. $\lambda$ is found by optimizing the cost function of Eq. (9.4),
and the depth information is obtained from Eq. (9.11). Thus Eq. (9.9) gives the contrast
corrected image. Sections 9.4 and 9.5 describe how visibility enhancement can be
achieved.
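The intermediate Eqs. (9.8)–(9.10) are not reproduced above, so the following is only a plausible sketch of the first-phase contrast correction under the airlight model: the airlight contribution is subtracted and the remainder rescaled by the transmission of Eq. (9.11). The function name, the per-pixel transmission argument and the clipping are illustrative assumptions, not the authors' code.

import numpy as np

def contrast_correct(image, airlight, transmission, eps=1e-3):
    """Hypothetical first-phase correction: subtract the airlight contribution and rescale
    by the transmission e^{-beta d(x)} obtained from Eq. (9.11).
    image: observed RGB image in [0, 1]; airlight: estimated level lambda;
    transmission: per-pixel e^{-beta d(x)}, shape (H, W)."""
    t = np.clip(transmission, eps, 1.0)[..., None]
    return np.clip((image - airlight * (1.0 - t)) / t, 0.0, 1.0)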
The first phase described in Section 9.3 results in an image maintaining the color
fidelity, but the visibility is not enhanced, particularly in scenes where the distribution
of airlight is not uniform. The second phase uses an approximate airlight estimation
method which results in an image with enhanced visibility, but the color fidelity
is not maintained. In the third phase a novel fusion method is used which helps
in extracting the useful information from both images, thereby obtaining an
image with enhanced visibility while maintaining the color fidelity. The
Daubechies wavelet is used here. The two images obtained as described in Sec-
tions 9.3 and 9.4 are decomposed using the Daubechies wavelet method. The wavelet
decomposition is done using a shift invariant wavelet transform. The four images
obtained per image after decomposition are coefficients extracted from the given
image.
The first image contains the approximation coefficients, the second the horizontal
detail coefficients, the third the vertical detail coefficients, and the fourth the
diagonal detail coefficients. These coefficients are obtained by passing the image
through sets of filters. Passing the image through two low pass filters, one aligned
vertically and one aligned horizontally, gives the approximation coefficients.
Passing the image through two filters, a low pass filter aligned horizontally and
a high pass filter aligned vertically, gives the vertical coefficients; the horizontal
coefficients are obtained with a high pass filter aligned horizontally and a low pass
filter aligned vertically; and the diagonal coefficients are
obtained with high pass filters aligned both horizontally and vertically. After obtain-
ing the wavelet bands of the two images, the coefficients are merged by taking the
mean value of the approximation coefficients and the maximum value of the detail
coefficients. The resultant image is an enhanced image which contains the maximum
details and also maintains the color fidelity.
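As an illustration of the fusion rule just described, the sketch below uses PyWavelets with a single-level Daubechies ('db2') decomposition; the chapter's shift-invariant transform could be substituted with pywt.swt2. Detail coefficients are merged by the element-wise maximum, following the text literally, and approximation coefficients by their mean. Function names are assumptions.

import numpy as np
import pywt

def fuse_channel(a, b, wavelet='db2'):
    """Single-level fusion of two gray images: mean of the approximation coefficients,
    element-wise maximum of the horizontal, vertical and diagonal detail coefficients."""
    cA1, (cH1, cV1, cD1) = pywt.dwt2(a, wavelet)
    cA2, (cH2, cV2, cD2) = pywt.dwt2(b, wavelet)
    fused = ((cA1 + cA2) / 2.0,
             (np.maximum(cH1, cH2), np.maximum(cV1, cV2), np.maximum(cD1, cD2)))
    return pywt.idwt2(fused, wavelet)

def fuse_color(img1, img2):
    """Apply the fusion channel by channel to two RGB images of the same size."""
    return np.dstack([fuse_channel(img1[..., c], img2[..., c]) for c in range(3)])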
There are only a few existing methods to enhance weather degraded video sequences.
In order to apply the enhancement process to a video sequence, two methods can
be adopted. Each of the frames can be processed separately, or the method can be
applied to the background and foreground pixels separately.
In the first method the computation time is higher since the whole method is
applied to each frame. In the approach followed in this paper the background and
foreground pixels are separated and processed separately. The computation time is
much lower and the visibility of the resultant video sequence is found to increase
considerably while color fidelity is maintained. In this method, first a static back-
ground frame is estimated. Then, each frame is analyzed to detect the motion
pixels using any standard methodology [7], and morphological operations are applied to
remove all noise pixels. The motion pixels are then bounded by boxes so that each
of the frames will be divided into sub images containing motion pixels. The back-
ground frame is processed using the wavelet fusion method. The estimated lightness
parameter in step 1 is utilized globally for the entire video sequence and hence need
not be computed for the sub images. Wavelet fusion is applied to each of the sub
images using the estimated lightness and approximate lightness parameters. Finally
the sub images are merged with the background image to form the enhanced video
sequence. This method saves a lot of computation time since the processing is done
only on the sub images of each frame and the estimated lightness parameter in step
1 of wavelet fusion method is computed only for the background image.
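A minimal sketch of the per-frame processing described above, using NumPy and SciPy; the `enhance` callable stands for the wavelet-fusion step and is assumed, as are the threshold values and minimum region size.

import numpy as np
from scipy import ndimage

def process_frame(frame_gray, background_gray, enhanced_background, enhance,
                  diff_thresh=25, min_area=50):
    """Enhance only the motion sub-images of a frame and paste them onto the
    already-enhanced background. `enhance` stands for the wavelet-fusion step."""
    diff = np.abs(frame_gray.astype(np.int16) - background_gray.astype(np.int16))
    motion = ndimage.binary_opening(diff > diff_thresh, structure=np.ones((3, 3)))   # remove noise pixels
    labels, _ = ndimage.label(motion)
    out = enhanced_background.copy()
    for box in ndimage.find_objects(labels):
        if box is None:
            continue
        sub = frame_gray[box]
        if sub.size >= min_area:
            out[box] = enhance(sub)          # wavelet fusion applied to the bounded sub-image
    return out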
where C is the average value of the local contrast measured within a 3×3 window as

$C = \frac{\max - \min}{\max + \min}$   (9.14)

where T is the threshold used in the tenengrad computation. The image quality is usually
considered higher if its tenengrad value is larger.
The tenengrad values (TEN) of all images given below have been calculated and
listed in the corresponding figure captions. It is noted that images processed using the
wavelet fusion method described above give significantly larger tenengrad values,
which indicates the effectiveness of this method. This result agrees with the visual
evaluation of the human eye.
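The exact tenengrad and contrast-improvement formulas are not reproduced in the text above, so the following sketch assumes the usual definitions: the local contrast of Eq. (9.14) averaged over 3×3 windows, and a tenengrad value computed from Sobel gradient magnitudes above a threshold T. Function names and the thresholding convention are assumptions.

import numpy as np
from scipy import ndimage

def mean_local_contrast(gray, size=3):
    """Average of C = (max - min) / (max + min) over size x size windows, Eq. (9.14)."""
    g = gray.astype(np.float64)
    mx = ndimage.maximum_filter(g, size=size)
    mn = ndimage.minimum_filter(g, size=size)
    return np.mean((mx - mn) / np.maximum(mx + mn, 1e-6))

def tenengrad(gray, T=0.0):
    """Assumed sharpness measure: sum of squared Sobel gradient magnitudes exceeding T."""
    g = gray.astype(np.float64)
    gx = ndimage.sobel(g, axis=1)
    gy = ndimage.sobel(g, axis=0)
    s2 = gx ** 2 + gy ** 2
    return s2[s2 > T ** 2].sum()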
The performance of the proposed method has been evaluated and compared with
conventional methods of contrast enhancement using MATLAB software tool. The
performance is analyzed using the measures described above. As a measure of
objective similarity between a contrast restored image and the original one, the
mean-squared error (MSE) is used.
$\mathrm{MSE}(\mathrm{img}, \mathrm{org}) = \frac{\sum_{c=1}^{3} \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ \mathrm{org}(i, j, c) - \mathrm{img}(i, j, c) \right]^2}{3 \cdot N \cdot M}$   (9.17)

where org is the original color image and img is the restored color image of size N×M.
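Eq. (9.17) translates directly into a few lines; a minimal sketch, assuming N×M×3 arrays:

import numpy as np

def mse(org, img):
    """Mean-squared error of Eq. (9.17) for N x M x 3 color images."""
    org, img = org.astype(np.float64), img.astype(np.float64)
    return np.sum((org - img) ** 2) / (3.0 * org.shape[0] * org.shape[1])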
The color images used in this paper are Trees, Traffic and Aerialview images
of size 256×256. The original image, the noisy image (original image degraded
by fog) and restored images using visibility enhancement method and proposed
wavelet fusion method along with their corresponding tenengrad, contrast improve-
ment index values and mean-squared error values are shown in Fig. 9.1. From
experimental results, it has been found that the proposed method has good contrast
improvement and visibility (measured by the sharpness measure, tenengrad). It also
maintains color fidelity, which is shown by the lower mean-squared error compared
to the other method.
Fig. 9.1 (a) Original Trees image (256×256). (b) Foggy image, TEN = 224626. (c) After applying
the method of visibility enhancement, TEN = 225000, CI = 2.6356, MSE = 2.0401e+004. (d) After
applying the proposed method, TEN = 225000, CI = 1.1555, MSE = 9.4580e+003
Fig. 9.2 (a) Frame #1 Traffic video sequence. (b) After applying proposed method
Fig. 9.3 (a) Degraded Aerialview image. (b) After applying proposed method
Figures 9.2 and 9.3 give the foggy Traffic image and the Aerialview image
degraded by mist. The proposed method is not restricted to a uniform distribution of
airlight and hence is applicable to all real scenarios.
9.9 Conclusion
References
1. S. Chandrasekhar, Radiative Transfer. Dover, New York, 1960; R. C. Gonzalez and R. E.
Woods, "Digital Image Processing," 2nd Ed., Pearson Education, 2002, pp. 147–163.
2. H. C. Van De Hulst, Light Scattering by Small Particles. Wiley, New York, 1957; A. C.
Bovik and S. T. Acton, "Basic Linear Filtering with Application to Image Enhancement,"
Handbook of Image and Video Processing, Academic, 2006, pp. 71–79.
3. J. P. Oakley and B. L. Satherley, “Improving image quality in poor visibility conditions using
a physical model for contrast degradation,” IEEE Transactions on Image Processing, vol. 7,
no. 2, pp. 167–179, February 1998.
4. S. K. Nayar and S. G. Narasimhan, “Vision in bad weather,” in Proceedings of IEEE
International Conference Computer Vision, 1999, vol. 2, pp. 820–827.
5. R. C. Gonzalez and R. E. Woods, Digital Image Processing. Reading, MA: Addison-Wesley,
1993.
6. S. M. Pizer et al:, “Adaptive histogram equalization and its variations,” Computer Vision,
Graphics, and Image Processing, vol. 39, pp. 355–368, 1987.
7. K. Zuiderveld, “Contrast limited adaptive histogram equalization,” in Graphics Gems IV, P.
Heckbert, Ed. New York: Academic, 1994, ch. VIII.5, pp. 474–485.
8. J. A. Stark, “Adaptive image contrast enhancement using generalizations of histogram
equalization,” IEEE Transactions on Image Processing, vol. 9, no. 5, pp. 889–896, May 2000.
9. J. J. Rodriguez and C. C. Yang, “High-resolution histogram modification of color images,”
Graphical Models and Image Processing, vol. 57, no. 5, pp. 432–440, September 1995.
10. A. Polesel, G. Ramponi, and V. J. Mathews, “Image enhancement via adaptive unsharp
masking,” IEEE Transactions on Image Processing, vol. 9, no. 3, pp. 505–510, March 2000.
11. D. J. Jobson, Z. Rahman, and G. A. Woodell, “Properties and performance of a center/surround
retinex,” IEEE Transactions on Image Processing, vol. 6, no. 3, pp. 451–462, March 1997.
12. D. J. Jobson, Z. Rahman, and G. A. Woodell, “A multi-scale retinex for bridging the gap
between color images and the human observation of scenes,” IEEE Transactions on Image
Processing, vol. 6, no. 7, pp. 965–976, July 1997.
13. Z. Rahman, D. J. Jobson, and G. A. Woodell, “Retinex processing for automatic image
enhancement,” Journal of Electronic Imaging, vol. 13, no. 1, pp. 100–110, January 2004.
14. P. Scheunders, “A multivalued image wavelet representation based on multiscale funda-
mental forms,” IEEE Transactions on Image Processing, vol. 10, no. 5, pp. 568–575, May
2002.
15. L. Grewe, and R. R. Brooks, “Atmospheric attenuation reduction through multi-sensor fusion,”
Proceedings of SPIE, vol. 3376, pp. 102–109, 1998.
16. J. P. Oakley, and H. Bu, “Correction of simple contrast loss in color images,” IEEE
Transactions on Image Processing, vol. 16, no. 2, February 2007.
17. R. T. Tan, N. Pettersson, and L. Petersson, "Visibility enhancement of roads with foggy or hazy
scenes," Proceedings of the 2007 IEEE Intelligent Vehicles Symposium, Istanbul, Turkey, June
13–15, 2007.
18. P. Zeng, H. Dong, J. Chi, X. Xu, School of Information Science and Engineering, Northeast-
ern University. Shenyang, China, Proceedings of the 2004 IEEE International Conference on
Robotics and Biomimetics, August 22–26, 2004, Shenyang, China.
19. E. P. Krotkov, Active Computer vision by Cooperative focus and stereo. Springer-Verlag, New
York, 1989.
20. A. Buerkle, F. Schmoeckel, M. Kiefer, B. P. Amavasai, F. Caparrelli, A. N. Selvan, and J. R.
Travis, “Vision-based closed-loop control of mobile micro robots for micro handling tasks,”
Proceedings of SPIE, vol. 4568, Microrobotics and Micro assembly 111, pp. 187–198, 2001.
Chapter 10
A GA-Assisted Brain Fiber Tracking Algorithm
for DT-MRI Data
L.M. San-José-Revuelta
Abstract This work deals with the problem of fiber tracking in diffusion tensor
(DT) fields acquired via magnetic resonance (MR) imaging. Specifically, we focus
on tuning-up a previously developed probabilistic tracking algorithm by making use
of a genetic algorithm which helps to optimize most of the adjustable parameters of
the tracking algorithm. Since the adjustment of these parameters constitutes a hard
NP-complete problem, traditionally, this task has been heuristically approached. In
previous work, we have already developed a multilayer neural network that was suc-
cessfully applied to this issue. Though robustness was one of its major advantages,
the complexity of the whole algorithm constituted its main drawback. In order to
avoid this situation, here we have explored the possibility of using a computation-
ally simpler method based on a micro-genetic algorithm. This strategy is shown
to outperform the NN-based scheme, leading to a more robust, efficient and human
independent tracking scheme. The tracking of white matter fibers in the human brain
will improve the diagnosis and treatment of many neuronal diseases.
10.1 Introduction
L.M. San-José-Revuelta
E.T.S.I. Telecomunicación, University of Valladolid, 47011 Valladolid, Spain,
E-mail: lsanjose@tel.uva.es
Fiber tracking in DT-MRI data has raised great interest in the neuro-science community for a better understanding
of the fiber tract anatomy of the human brain. Among the many applications that
arise from tractography we find: brain surgery (knowing the extension of the fiber
bundles could minimize the functional damage to the patient), white matter visu-
alization using fiber pathways (for a better understanding of brain anatomy) and
inference of connectivity between different parts of the brain (useful for functional
and morphological research of the brain).
Most DT-MRI visualization techniques focus on the integration of sample
points along fiber trajectories, [3], using only the principal eigenvector of the dif-
fusion ellipsoid as an estimate of the predominant direction of water diffusion, [2].
Several approaches have already been developed, some of them include the Runge-
Kutta approach, the multiple diffusion tensor approach, the tensorline approach and
the exhaustive search approach, to name a few.
Though the in vivo visualization of fiber tracts opens up new perspectives for
neurological research, these algorithms may depict fiber tracts which do not
exist in reality or fail to visualize important connectivity features (e.g. crossing
or branching structures), mainly due to both deficiencies in these methods
and shortcomings inherent in the datasets. In order to avoid misinterpretations,
the viewer must be provided with some information on the uncertainty of every
depicted fiber and of its presence in a certain location. In [4], we proposed an esti-
mation algorithm that takes into account the whole information provided by the
diffusion matrix, i.e., it does not only consider the principal eigenvector direction
but the complete 3D information about the certainty of continuing the path through
every possible future direction. An improved version of this algorithm was devel-
oped in [5]. This article included two main aspects: (i) a procedure that on-line
adapts the number of offspring paths emerging from the actual voxel, to the degree
of anisotropy observed in its proximity (this strategy was proved to enhance the
estimation robustness in areas where multiple fibers cross while keeping complex-
ity to a moderate level), and (ii) an initial version of a neural network (NN) for
adjusting the parameters of the algorithm in a user-directed training stage. Subse-
quent work, [6], studied in more detail the architecture of the neural network
and numerically evaluated its tracking capability, robustness and computational load
when being used with both synthetic and real DT-MR images. This work showed
that in many cases, such as real images with low SNR, a huge computational load
was required.
In this work we propose to use an evolutionary computation-based approach
for tuning-up the parameters of the tracking algorithm instead of using the neural
network scheme. The main aim is to adjust the parameters with a less complex pro-
cedure and to obtain a robust and efficient tracking algorithm. The required human
intervention time should also be reduced. Specifically, we propose a genetic algo-
rithm for this optimization task. Numerical results will prove that this approach
leads to similar and even better convergence results while offering much lower
computational requirements.
The basic version of the algorithm here used is described in [4]. Thus, this section
just presents a summary of the method, with special emphasis on the new aspects.
The algorithm uses probabilistic criteria and iterates over several points in the ana-
lyzed volume (the points given by the highest probabilities in the previous iteration).
The process starts in a user-selected seed voxel, V0 , and, at every iteration, it eval-
uates a set of parameters related to the central voxel of a cubic structure consisting
of 3 × 3 × 3 = 27 voxels, similar to that shown in Fig. 10.1, left.
The central point, Vc , (No. 14 in the figure) represents the last point of the tract
being analyzed. In the first iteration, Vc D V0 . Obviously, there exist 26 possible
directions to take for the next iteration in order to select the next point of the tract.
Once Vc is selected, the previous point and all those points exceeding the limits of
the MR volume are also removed from the list of possible destination points (valid
points).
Fig. 10.1 Modifications of the indices (m, n, p) when moving from Vc to the neighboring voxel Vi,
1 ≤ i ≤ 27, i ≠ 14
$P_i = \sum_{\alpha \in \{x,y,z\}} \sum_{j=1}^{3} \lambda_j \, V_{j,\alpha}$   (10.1)

The values $P_i$ are combined with the local anisotropy into weighted values $P'_i$, where
parameter a allows the user to give a higher relative weight to either the
anisotropy or the local probability, and $\mu_1$ and $\mu_2$ are scaling factors (normally, 1
and 1,000, respectively). The set of values $P'_i$ is properly normalized so that they
can be interpreted as probabilities.
Besides these considerations, the final probability of voxel i also makes use of the
so-called smoothness parameters (described in [7]) which judge the coherence of
fiber directions among the trajectories passing through voxel Vc . The mathematical
expressions of these four parameters, fspi g4iD1 , as well as their geometrical mean-
ing, is explained in [6]. They measure the angles between the directions that join
successive path points, as well as the angles between these directions and the eigen-
vectors associated to the largest eigenvalues found in those voxels. sp2 ; sp3 and
sp4 are used to maintain the local directional coherence of the estimated tract and
avoid the trajectory following unlikely pathways [7]. The threshold for sp1 is set
such that the tracking direction can be moved forward consistently and smoothly,
preventing the computed path from making sharp transitions.
Next, the following parameter is calculated for every valid point whose smooth-
ness parameters satisfy the four corresponding threshold conditions,
$P''_i = b\left( \gamma_1\, sp_1 + \gamma_2\, sp_2 + \gamma_3\, sp_3 + \gamma_4\, sp_4 \right) + (1 - b)\, P'_i$   (10.4)
Probabilities $P''_i$ can be recursively accumulated, yielding the probability of the path
generated by the successive values of Vc,

$P_p(k) = P'''_i \, P_p(k-1)$   (10.5)

with k being the iteration number, and $P'''_i = P''_i / \sum_i P''_i$.
At the end of the visualization stage, every estimated path is plotted with a color
that depends on its probability Pp .
A pool of voxels is formed by selecting, at the end of each iteration, the s best voxels
according to Eq. (10.4). The first voxel of the pool becomes the central voxel Vc at
the next iteration, thereby expanding the current pathway.
As proposed in [6], the value of s is adjusted depending on the degree of
anisotropy found in the current voxel Vc and its surroundings. When this anisotropy
is high, it means that a high directivity exists in that zone, and the probability that
Vc belongs to a region where fibers cross is really low. Consequently, s takes a small
value (1, 2 or 3). On the other hand, if Vc is found to be situated in a region of low
anisotropy, the probability of having fibers crossing or branching is higher. In this
case, it is interesting to explore various paths starting in Vc. This can be achieved by
setting parameter s to a higher value.
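The following sketch illustrates one iteration of the tracker as described above: the scoring of Eq. (10.4), the normalization and accumulation of Eq. (10.5), and the anisotropy-dependent choice of s. The weights, thresholds and the anisotropy cut-off are illustrative assumptions (Eqs. (10.2)–(10.3) and the exact settings are not given above), so this is a sketch rather than the authors' implementation.

import numpy as np

def tracking_step(p_prime, sp, path_prob, fa, b=0.5,
                  gamma=(0.25, 0.25, 0.25, 0.25), sp_thresholds=(0.0, 0.0, 0.0, 0.0),
                  fa_low=0.3):
    """One illustrative iteration: Eq. (10.4) scores, Eq. (10.5) accumulation, adaptive pool size s.
    p_prime: normalized local probabilities P'_i of the valid neighbours, shape (n,)
    sp:      smoothness parameters sp1..sp4 per neighbour, shape (n, 4)
    fa:      anisotropy measure around the current voxel Vc"""
    p2 = b * (sp @ np.asarray(gamma)) + (1.0 - b) * p_prime          # Eq. (10.4)
    passes = np.all(sp >= np.asarray(sp_thresholds), axis=1)         # four threshold conditions
    p2 = np.where(passes, p2, 0.0)
    p3 = p2 / max(p2.sum(), 1e-12)                                   # P'''_i of Eq. (10.5)
    s = 2 if fa > fa_low else 5                                      # small s in highly anisotropic zones
    best = np.argsort(p3)[::-1][:s]                                  # pool of the s best voxels
    return best, p3, p3[best[0]] * path_prob                         # updated P_p(k), Eq. (10.5)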
In this work we propose to use a genetic algorithm for adjusting the parameters
of the algorithm (a, b, $\mu_1$, $\mu_2$, $\gamma_1$, $\gamma_2$, $\gamma_3$, $\gamma_4$), instead of using the complex and time
consuming NN proposed in [5, 6]. This adjustment is necessary when the algorithm
is applied to a different area of the brain (fiber bundles) or even to the same portion
but scanned under different conditions. In these cases, the volume
of interest will have a different smoothness and anisotropy characterization.
When the proposed estimation strategy is used, the user is requested to manually
draw a sample fiber path as well as to compare this fiber path to those estimated by
the GA during its first stages. Specifically, the steps for the estimation of the param-
eters are: (i) the user manually draws a sample fiber path, r_u; (ii) the GA starts with
a randomly generated population of n_p = 10 individuals {u_i}, i = 1, ..., n_p, each of them
being a possible binary representation of the parameter set; (iii) the tracking algorithm
of Section 10.2 is applied n_p times, each time with the set of parameters repre-
sented by one GA individual. This way, n_p different paths r_i are obtained; (iv)
every path r_i is compared with r_u and given a fitness value; (v) the GA is iterated
during n_g = 25 generations and then the procedure returns to step (ii).
Every time the fiber paths are obtained at step (iii), the user must compare
them to his sample r_u and, in case he finds that a tract r_j, 1 ≤ j ≤ n_p, is better
than his first estimation r_u, then r_j becomes the new reference path r_u. At the end, the
solution is obtained from the encoding of the fittest individual.
Though this scheme seems initially complicated, experiments show that a few
iterations lead to parameter sets that allow good tracking results to be obtained. The user
does not have to assign too many fitness values or perform many comparisons. The
extremely reduced size of the population and the low number of generations per
GA execution lead to moderately short training periods.
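A sketch of the tuning loop just described, under stated assumptions: n_p = 10 binary individuals and n_g = 25 generations are taken from the text, while the fitness measure (here a crude mean point-to-point distance to the reference tract, standing in for the user's judgement), the decode function, the bit length and the crossover/mutation details and rates are hypothetical.

import numpy as np

N_P, N_G, N_BITS = 10, 25, 64        # population size and generations from the text; bit length assumed

def path_fitness(path, reference):
    """Placeholder fitness: inverse mean point-to-point distance to the user-drawn
    reference tract (in the chapter, the comparison is ultimately made by the user)."""
    n = min(len(path), len(reference))
    d = np.linalg.norm(np.asarray(path[:n]) - np.asarray(reference[:n]), axis=1).mean()
    return 1.0 / (1.0 + d)

def tune_parameters(track, decode, reference, rng=None):
    """track(params) -> estimated path; decode(bits) -> parameter set. Both are assumed callables."""
    rng = np.random.default_rng() if rng is None else rng
    pop = rng.integers(0, 2, size=(N_P, N_BITS))
    for _ in range(N_G):
        fit = np.array([path_fitness(track(decode(ind)), reference) for ind in pop])
        elite = pop[fit.argmax()].copy()                 # elitism: always keep the best individual
        # random parent pairing, one-point crossover and bit-flip mutation (illustrative rates)
        parents = pop[rng.integers(0, N_P, size=(N_P, 2))]
        cuts = rng.integers(1, N_BITS, size=N_P)
        children = np.array([np.concatenate((p[0][:c], p[1][c:])) for p, c in zip(parents, cuts)])
        flips = rng.random(children.shape) < 0.02
        pop = np.where(flips, 1 - children, children)
        pop[0] = elite
    return decode(pop[0])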
10.3.2 GA Description
The initial set of potential solutions, P[0] (the population at k = 0), is randomly gen-
erated – in our specific application, the user drawn tract could be included in the
initial population. Let us denote by P[k] = {u_i}, i = 1, ..., n_p, the population at iteration k.
As previously mentioned, the population size n_p has been fixed to a low value,
exploiting, this way, the properties of the so-called micro genetic algorithms
(μ-GAs).
The proposed GA also implements an elitism strategy: The best individual in the
population is preserved and directly introduced in the new population. Nevertheless,
a duplicate of it can also be used as input for the genetic operators.
This elitist model of the genetic algorithm presents some convergence advantages
over the standard GA. In fact, using Markov chain modeling, it has been proved that
GAs are guaranteed to asymptotically converge to the global optimum – with any
choice of the initial population – if an elitist strategy is used, where at least the best
chromosome at each generation is always maintained in the population, [9]. How-
ever, Bhandari et al. [9] provided the proof that no finite stopping time can guarantee
the optimal solution, though, in practice, the GA process must terminate after a finite
number of iterations with a high probability that the process has achieved the global
optimal solution.
Following the ideas described in [10], the crossover and mutation probabilities
depend on the Shannon entropy of the population (excluding the elite) fitness, which
is calculated as

$H(P[k]) = -\sum_{i=1}^{n_p} \bar{\phi}_i(k)\, \log \bar{\phi}_i(k)$   (10.6)

with $\bar{\phi}_i(k)$ being the normalized fitness of individual $u_i$, i.e., $\bar{\phi}_i(k) = \phi_i(k) / \sum_{i=1}^{n_p} \phi_i(k)$.
When all the fitness values are very similar, with small dispersion,
H(P[k]) becomes high and pc is decreased – it is not worthwhile wasting time
merging very similar individuals. This way, exploration is boosted, while, con-
versely, exploitation decreases. On the other hand, when this entropy is small, there
exists a high diversity within the population, a fact that can be exploited in order to
increase the horizontal sense of search. Following a similar reasoning, the prob-
ability of mutation is increased when the entropy is high, so as to augment the
diversity of the population and escape from local suboptimal solutions (exploita-
tion decreases, exploration becomes higher). Therefore, we have that probabilities
pm and pc are directly/inversely proportional to the population fitness entropy,
respectively.
Some exponential dependence on time k must also be included in the model –
making use of exponential functions – in order to relax (decrease), over time,
the degree of dependence of the genetic operators' probabilities on the disper-
sion measure. This avoids abandoning good solution estimates when very low
fitness individuals are sporadically created, especially when most of the population
individuals have converged to the global optimum.
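A sketch of how the entropy of Eq. (10.6) can drive the operator probabilities as described: high entropy lowers p_c and raises p_m, and an exponential factor relaxes the dependence over time. The bounds, the decay constant and the normalization are assumptions; [10] should be consulted for the actual rules.

import numpy as np

def operator_probabilities(fitness, k, pc_range=(0.1, 0.9), pm_range=(0.01, 0.2), tau=50.0):
    """Entropy-guided crossover/mutation probabilities for generation k (illustrative bounds).
    fitness: fitness values of the non-elite individuals."""
    phi = np.asarray(fitness, dtype=np.float64)
    phi_bar = phi / phi.sum()
    h = -np.sum(phi_bar * np.log(phi_bar + 1e-12))          # Shannon entropy, Eq. (10.6)
    h_norm = h / max(np.log(len(phi)), 1e-12)               # normalize to [0, 1]
    decay = np.exp(-k / tau)                                 # relax the dependence over time
    pc = pc_range[1] - (pc_range[1] - pc_range[0]) * h_norm * decay   # high entropy -> lower p_c
    pm = pm_range[0] + (pm_range[1] - pm_range[0]) * h_norm * decay   # high entropy -> higher p_m
    return pc, pm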
As a consequence of these entropy dependent genetic operators, the resulting
complexity of the GA is notably decreased since crossover is applied with a very
low probability (and only on individuals not belonging to the elite), and the diver-
sity control allows the algorithm to work properly with a much smaller population
size, [10].
In order to evaluate the proposed algorithm (parameter tuning + tracking), both syn-
thetic and real DT-MR images have been used. For the sake of comparison, we have
used the same test images as in [6].
Fig. 10.2 Tracking results for the "star" synthetic DT-MR image. Black: seed points. Blue: fiber
paths obtained using adjustment with the NN. Green: paths using estimation with the proposed GA.
Red: extrinsic voxels. Initial seeds V0 = {S1, S2, S3}. Top left: original synthetic image
Satisfactory tracing results for the first three cases can be found in [4], where a
simpler algorithm was used. For the sake of brevity, in this paper we have worked
with the most complex case, the star (Fig. 10.2, top left). This image consists of
six orthogonal sine half-waves, each of them with an arbitrary radius. Under this
scenario the diffusion field exhibits variations along the three coordinate axes and
there exists a crossing region. Three different tracking results are shown in Fig. 10.2,
each of them for a different seed V0 ∈ {S1, S2, S3}. Blue tracts were obtained with
an algorithm whose parameters were estimated with the NN [6], while green ones
correspond to the estimation using the proposed GA.
It can be seen that, in both cases, the path estimates pass through isotropic zones
where different fiber bundles cross. It is also appreciated how both methods dif-
ferentiate between the totally isotropic zones extrinsic to the tracts and the fiber
bundles.
The differentiation between voxels belonging to a fiber or to a very isotropic area,
respectively, is attained by mapping the path probabilities given by Eq. (10.5) into
a color scale and classifying them according to some fixed thresholds. Notice that
seeds S1 and S2 belong to the intrinsic volume (voxels with a very high anisotropy).
In this case both methods move through the most probable direction following the
main direction of the star in each situation. When extrinsic point S3 is selected as
seed, the algorithms explore in the neighboring voxels until they find a voxel with a
high anisotropy value (point P1 ). Once P1 is found, the tracking algorithm proceeds
as in the case of S1 and S2 . Fig. 10.2 shows how the algorithm finds the proper
Table 10.1 Convergence performance for different SNR values. Cell values represent the percentage
of right convergence for two configurations of the algorithm: s = 1 / s = 4. Each cell shows: top:
NN-estimation, middle: GA-estimation, bottom: Bayesian tracking [13]

                            SNR (dB)
Image   Method          5          10         15         20         25         30
Cross   NN              78.3/82.8  89.7/93.6  92.1/94.3  98.3/98.7  99.0/99.0  100/100
        GA              81.0/84.9  93.1/94.1  94.2/95.7  98.4/98.3  99.0/100   100/100
        Friman, 2005    76.8       89.0       90.7       97.0       100        100
Earth   NN              77.7/76.2  88.6/87.5  89.9/89.0  98.2/98.2  99.0/99.0  100/100
        GA              81.5/79.0  88.9/91.8  92.8/93.3  98.4/98.5  99.0/100   100/100
        Friman, 2005    74.4       83.2       85.0       97.3       99.2       100
Log     NN              71.0/69.7  82.1/81.0  86.1/85.5  96.0/95.8  98.0/97.8  100/100
        GA              73.5/74.2  84.6/84.2  89.1/87.0  96.7/97.0  97.8/98.0  100/100
        Friman, 2005    68.8       78.3       85.2       96.0       98.0       100
fiber path whatever (extrinsic or intrinsic) seed voxel is chosen, for both methods of
parameters’ estimation.
Next, the robustness of the tracking algorithm with both parameter estimation
methods is now studied. For the sake of brevity, these experiments were run with
parameter s kept constant during the fiber tract estimation (see Section 10.2).
The convergence performance for different SNRs is shown in Table 10.1. The
first row in each cell corresponds to tracking results when parameters were esti-
mated using the NN, the second contains the results when the proposed GA is
used for this estimation, and the third one shows the values obtained with a slightly
modified version of the Bayesian method proposed in [13].
It can be seen that both algorithms (with NN- and GA-based adjustment) con-
verge properly within a wide range of SNRs, with the GA version showing a
convergence gain of about 3–6% in all cases. The percentage values for the "cross"
and the “earth” test images are very close, while for the “log” case both algorithms
exhibit a slightly lower convergence. Comparing our methods with the Bayesian
approach, we see that the proposed tracking algorithm performs slightly better when
the SNR is low, while the three methods tend to similar results with high SNRs.
Analyzing the simulations of the synthetic images considered, it is seen that con-
vergence results improve whenever the MR image contains branching or crossing
areas – as is the case in real DT-MR images. This is the case of our "cross" image.
For this image, the convergence results are improved by about 5% when parameter s is
modified according to the anisotropy. Besides, for the studied cases, we see that
the influence of the procedure that adapts s is higher for low SNRs.
The proposed tracking algorithm has also been applied to real DT-MR images.
Specifically, we have selected the corpus callosum of the brain (see Fig. 10.3).
Fig. 10.3 Tracking results for the corpus callosum area of the human brain. Left: tracts obtained
with the tracking algorithm tuned-up with the proposed GA, Right: parameter estimation with NN
The proposed parameter estimation procedure is useful when the volume being
analyzed varies. For instance, with just 5–10 training iterations (repetitions of the
procedure described in Section 10.3.1) in synthetic images, or 8–16 in real images, the
parameters of the algorithm are fine-tuned so as to get satisfactory results. Note
that these training times are: (i) always inferior to those required by the NN-
based method proposed in [5, 6], (ii) always greatly inferior to the time required to
heuristically adjust the parameters, (iii) only required when the scanning conditions
vary.
10.5 Conclusions
The work here presented expands upon previous work of the author on fiber tracking
algorithms to be used with DT-MR images. Previous papers presented and improved
the basic probabilistic tracking algorithm [4] and developed a novel multilayer
neural network that helps to tune-up the tracking method, [5]. In this paper we
have presented an Evolutionary Computation-based algorithm that outperforms the
neural network approach.
Numerical simulations have shown that the tracking algorithm that has been
tuned-up using the proposed GA-based method is capable of estimating fiber tracts
both in synthetic and real images. The robustness and convergence have been stud-
ied for different image qualities (SNRs). Results show a convergence gain of about
3–6% with respect to our previous work in [5, 6].
The experiments carried out show that an efficient parameter adjustment in con-
junction with precise rules to manage and update the pool of future seeds lead to:
(i) a better use of computational resources, (ii) a better performance in regions with
crossing or branching fibers, and (iii) a minimization of the required human inter-
vention time. The method has been tested with synthetic and real DT-MR images
with satisfactory results, showing better computational and convergence properties
than already existing Bayesian methods.
References
1. Bjornemo, M. and Brun, A. (2002). White matter fiber tracking diffusion tensor MRI. Master’s
Thesis, Linkoping University.
2. Ehricke, H. H., Klose, U. and Grodd, U. (2006). Visualizing MR diffusion tensor fields by
dynamic fiber tracking and uncertainty mapping. Computers & Graphics, 30:255–264.
3. Mori, S., van Zijl, P.C.M. (2002). Fiber tracking: principles and strategies – a technical review.
Nuclear Magnetic Resonance in Biomedicine, 15:468–480.
4. San-José-Revuelta, L. M., Martı́n-Fernández, M., and Alberola-López, C. (2007). A new
proposal for 3D fiber tracking in synthetic diffusion tensor magnetic resonance images. Pro-
ceedings IEEE International Symposium on Signal Processing and Its Applications, Sharjah,
United Arab Emirates.
5. San-José-Revuelta, L. M., Martı́n-Fernández, M., and Alberola-López, C. (2007). Neural-
network assisted fiber tracking of synthetic and white matter DT-MR images. Proceedings
International Conference on Signal and Image Engineering, ICSIE 2007, London, United
Kingdom I:618–623.
6. San-José-Revuelta, L. M., Martı́n-Fernández, M., and Alberola-López, C. (2008). Efficient
tracking of MR tensor fields using a multilayer neural network. IAENG International Journal
of Computer Science 35:129–139.
7. Kang, N. et al. (2005). White matter fiber tractography via anisotropic diffusion simulation in
the human brain. IEEE Trans. on Medical Imaging, 24:1127–1137.
8. Mitchell, M. (1996). An Introduction to Genetic Algorithms. MIT, Cambridge, MA.
9. Bhandari, D., Murthy, C.A, and Pal, S.K. Genetic algorithm with elitist model and its conver-
gence. International Journal on Pattern Recognition and Artificial Intelligence, 10:731–747.
10. San-José-Revuelta, L. M. (2005). Entropy-guided micro-genetic algorithm for multiuser
detection in CDMA communications. Signal Processing, 85:1572–1587.
11. Gudbjartsson, H., Patz, S. (1995). The Rician Distribution of Noisy MRI Data. Magnetic
Resonance in Medicine, 34:910–914.
12. Westin, C.-F. et al. (2002). Processing and visualization for diffusion tensor MRI. Medical
Image Analysis, 6:93–108.
13. Friman, O., Westin, C.-F. (2005). Uncertainty in white matter fiber tractography Proceedings
of the MICCAI 2005, LNCS 3749, 107–114.
Chapter 11
A Bridge-Ship Collision Avoidance System
Based on FLIR Image Sequences
11.1 Introduction
The fact that more and more ships are being built while their size keeps increasing
introduces a high risk of collision between bridges and ships in inland
waterways. Ship-bridge collisions mainly cause six types of consequences,
i.e. damage to the bridge, casualties, damage to the ship and its goods, economic loss,
social loss and environmental loss. A large amount of statistical analysis indicates
that one of the main reasons for ship-bridge collisions is an adverse natural
environment, such as poor visibility conditions, floods, etc. [1, 2].
J. Liu (B)
Navigation & Guidance Lab, College of Automatic, University of Chongqing,
Chongqing, 400030, China,
E-mail: h.wei@reading.ac.uk
Mainly, there are two existing strategies to avoid bridge-ship collision at present
[3, 4]. One is a passive strategy in which fixed islands or safeguard surroundings
are built around bridge piers. The shortcomings of the passive method are: it cannot
avoid ship damage from a collision; the costs are normally high; and it becomes
less effective with the constant increase of ship size. The other is an active strategy
that uses radar or video images to monitor moving ships by measuring their course
for collision estimation. Compared with the passive method, the active method
avoids damage to both bridge and ship and its costs are low. However, radar has
difficulty detecting course changes immediately due to its low short-term accuracy,
and the high noise level sometimes prevents radar from detecting any objects against a
cluttered background. Sensors for visible light do not work well under poorly illu-
minated conditions such as fog, mist and night. In contrast, infrared sensors are
capable of adapting to weather and light changes during the day. Moreover, FLIR
images overcome the problems that radar has, i.e. they have high short-term angle
accuracy.
In design, the first consideration of the FLIR surveillance system is its robust-
ness for detecting moving ships. The main difficulties are: (1) low thermal contrast
between the detected object and its surroundings; (2) relatively low signal to noise
ratio (SNR) under the weak thermal contrast; and (3) insufficient geometric, spatial
distribution and statistical information for small targets [5].
Motion detection in a surveillance video sequence captured by a fixed camera can
be achieved by many existing algorithms, e.g. frame difference, background estima-
tion, optical flow method, and statistical learning method [6]. The most common
method is the frame difference method for the reason that it has a great detec-
tion speed and low computation cost. However the detection accuracy by using
this method is strongly affected by background lighting variation between frames.
The most complex algorithm is the optical flow method, by which the computa-
tion cost is high. The statistical learning method needs training samples which may
not be available in most cases, and its computation cost is also relatively high. The
background estimation method is extremely sensitive to the changes of the lighting
condition in which the background is established.
In the FLIR surveillance video sequences used for moving ship detection, the back-
ground normally consists of various elements, such as the sky, the surface of the river,
water waves, large floating objects (non-detected objects) in a flooding season, etc.
In many cases, ships and background in FLIR images are visually merged together.
It is very difficult to detect the targets (moving ships) using normal methods
mentioned above.
The new FLIR video surveillance system for bridge-ship collision avoidance
is proposed in Section 11.2. Section 11.3 presents the novel infrared image pre-
processing algorithm using multi-scale fractal analysis based on the blanket method.
The moving ship detection algorithm in the region of surveillance is also developed
in Section 11.3. Section 11.4 demonstrates the experimental results with detailed
discussion and analysis. Finally, conclusions and future work are presented in
Section 11.5.
In the system, a pan-tilt unit is fixed on a bridge pier, and the FLIR camera is installed on the
pan-tilt. The visual region of the FLIR camera, i.e. the region of surveillance (ROS), can be adjusted by
the pan-tilt and is configured according to real conditions. The FLIR camera links
to a personal computer (PC) through a frame grabber. When images are captured,
the image processing program in the PC's memory is used to detect the moving ships.
When moving ships are detected in the region of surveillance, the device for
safety alert is started. The ship driver can be alarmed if necessary, and he/she
would take maneuvers to avoid ship-bridge collision. The flowchart of the system
and a sketch map of the installation are depicted in Fig. 11.1.
A large number of experiments carried out on the Yangtse River have proved that
the minimum pre-warning distance between bridge pier and ship to avoid collision
is 500 m in an inland waterway, and the valid distance for moving ship detection is
from 800 to 2,000 m when an uncooled infrared FPA (focal plane array) ther-
mal imaging camera is used. Therefore, this type of camera is suitable for the
application. The camera resolution is 320×240 pixels. There are three ways
designed to trigger the pre-warning signal, i.e. automatically broadcasting a pre-
recorded voice message through very high frequency (VHF) radio, automatically broadcasting the
pre-recorded voice message through a loudspeaker, and automatically turning on the assistant
lighting system.
Fig. 11.1 Flowchart of the system and sketch map of the installation: the FLIR infrared camera on
a pan-tilt mounted on the bridge pier, a frame grabber for data acquisition to the PC, the ship
detection algorithms, and the safety alert issued when ships are detected in the field of view (FOV)
The ROS is defined based on various conditions in the real parts of the inland waterway.
Consequently, the ROS appears as a sub-region of the original image. Image
analysis and processing are focused on this region only, which excludes unwanted
regions and reduces the computation cost of further processing.
Fig. 11.2 Flowchart of the moving ship detection algorithm: compute the difference image of the
ROS; if the number of changed pixels Num exceeds the threshold Th, increment the difference
number N_dif; when N_dif exceeds N_set, moving ships are declared present in the ROS, N_dif is
reset to 0 and the safety alert is issued
$b(x, y, \varepsilon) = \min\left\{ b(x, y, \varepsilon - 1) - 1,\ \min_{|(m,n)-(x,y)| \le 1} b(m, n, \varepsilon - 1) \right\}$   (11.4)

where $\varepsilon = 0, 1, \ldots, \varepsilon_{\max}$, and the image points (m, n) with distance not greater than one from
(x, y) are the four neighbors of (x, y). To ensure that the blanket of the surface for scale
$\varepsilon$ includes all the points of the blanket for scale $\varepsilon - 1$, the points (m, n) are chosen to be the
eight neighbors of pixel (x, y) in the experiments.
"max is the maximum scale when the fractal features are calculated. "max 2 N;
"max 2:
At scale $\varepsilon$, the volume between the upper surface and the lower surface is
calculated by Eq. (11.5):

$V(x, y, \varepsilon) = \sum_{k=x-\varepsilon}^{x+\varepsilon} \sum_{m=y-\varepsilon}^{y+\varepsilon} \left[ u(k, m, \varepsilon) - b(k, m, \varepsilon) \right]$   (11.5)
Then, the estimate of the surface area at (x, y) and $\varepsilon$ can be obtained by Eq. (11.6):

$A(x, y, \varepsilon) = \frac{V(x, y, \varepsilon)}{2\varepsilon}$   (11.6)
Taking the logarithm of both sides in Eq. (11.1), we have

$\log A(x, y, \varepsilon) = (2 - D)\log \varepsilon + \log K$   (11.7)

Linearly fitting $\log A(x, y, \varepsilon)$ against scale $\varepsilon$ in Eq. (11.7), the fractal dimension D can
be obtained as a constant for all scales $\varepsilon$.
The constant K in Eq. (11.1), also named the D-dimensional area [12], characterizes
the roughness of the surface, i.e. different surfaces have different K values.
From this point of view, K is not constant under variation of the scale $\varepsilon$. When
two scales $\varepsilon_1$, $\varepsilon_2$ are used in Eq. (11.7), we have

$\log A(x, y, \varepsilon_1) = (2 - D)\log \varepsilon_1 + \log K$   (11.8)

$\log A(x, y, \varepsilon_2) = (2 - D)\log \varepsilon_2 + \log K$   (11.9)

In Eqs. (11.8) and (11.9), the fractal dimension D is a constant for scales $\varepsilon_1$ and $\varepsilon_2$. Letting
$\varepsilon_1 = \varepsilon$, $\varepsilon_2 = \varepsilon + 1$, $\varepsilon = 0, 1, 2, \ldots, \varepsilon_{\max}$, $K(x, y, \varepsilon)$ is derived as

$K(x, y, \varepsilon) = \exp\left[ \frac{\log A(x, y, \varepsilon)\, \log(\varepsilon + 1) - \log A(x, y, \varepsilon + 1)\, \log(\varepsilon)}{\log(\varepsilon + 1) - \log(\varepsilon)} \right]$   (11.10)
From Eq. (11.10), the D-dimensional area $K(x, y, \varepsilon)$ can be calculated from the surface
area $A(x, y, \varepsilon)$ at point (x, y) along with the scale $\varepsilon$. We use a new function C(x, y) to
measure the deviation of $K(x, y, \varepsilon)$ against scale $\varepsilon$, as presented in Eq. (11.11):

$C(x, y) = \sum_{\varepsilon=2}^{\varepsilon_{\max}} \left[ K(x, y, \varepsilon) - \frac{1}{\varepsilon_{\max} - 1} \sum_{\varepsilon=2}^{\varepsilon_{\max}} K(x, y, \varepsilon) \right]^2$   (11.11)

C(x, y) is the fractal feature used for ship detection in the algorithm. C(x, y)
takes high values for man-made objects, and much lower values for natural back-
ground.
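A compact sketch of Eqs. (11.4)–(11.11): the blankets are grown with grayscale dilation/erosion, the blanket volume gives the multi-scale area A(x, y, ε), K(x, y, ε) follows from Eq. (11.10), and C(x, y) is the squared deviation of K across scales. The 3×3 neighbourhood and the box-filter implementation of Eq. (11.5) mirror the text; everything else (function names, the small epsilon guard in the logarithms) is an illustrative assumption.

import numpy as np
from scipy import ndimage

def fractal_feature(gray, eps_max=4):
    """Multi-scale fractal feature C(x, y) via the blanket method (cf. Eqs. 11.4-11.11)."""
    g = gray.astype(np.float64)
    u, b = g.copy(), g.copy()                      # upper/lower blankets at scale 0
    areas = []                                     # A(x, y, eps) for eps = 1 .. eps_max + 1
    for eps in range(1, eps_max + 2):
        u = np.maximum(u + 1, ndimage.maximum_filter(u, size=3))   # upper blanket (dilation)
        b = np.minimum(b - 1, ndimage.minimum_filter(b, size=3))   # lower blanket, Eq. (11.4)
        # Eq. (11.5): blanket volume over a (2*eps+1)^2 window; Eq. (11.6): area estimate.
        vol = ndimage.uniform_filter(u - b, size=2 * eps + 1) * (2 * eps + 1) ** 2
        areas.append(vol / (2.0 * eps))
    ks = []                                        # K(x, y, eps) from Eq. (11.10), eps = 2 .. eps_max
    for eps in range(2, eps_max + 1):
        la, la1 = np.log(areas[eps - 1] + 1e-12), np.log(areas[eps] + 1e-12)
        ks.append(np.exp((la * np.log(eps + 1) - la1 * np.log(eps))
                         / (np.log(eps + 1) - np.log(eps))))
    ks = np.stack(ks)
    return np.sum((ks - ks.mean(axis=0)) ** 2, axis=0)             # Eq. (11.11)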
In the process of moving ship detection, the difference between two binary images
generated from segmentation of C(x, y) is used. A group of pixels with non-zero
values represents the difference. Based on the fact that the FLIR camera is fixed on a
bridge pier, the process is summarized as follows.
1. Generate two binary images C_i(x, y) and C_{i+s}(x, y) by segmentation of two
frames f_i(x, y) and f_{i+s}(x, y) with an interval of s. In practice, the interval s is
chosen as five to ten frames.
2. Calculate the difference between the images C_i(x, y) and C_{i+s}(x, y), and obtain
D(x, y).
3. Count the number of non-zero pixels in D(x, y), and record it as Num.
4. If Num is larger than the threshold (th), which is experimentally obtained, the ROS is
considered to have changed once, and the difference number of the
ROS (N_dif) is incremented by one.
5. If N_dif is larger than a pre-set value N_set obtained from experiments, moving
ships are considered detected in the ROS. In practice, the value N_set is set to 2 or 3, which
is effective in reducing the false alarm ratio.
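The five steps translate into a short routine; a minimal sketch follows. The segmentation threshold for C(x, y) and the helper names (including the fractal-feature callable from the sketch above) are assumptions, while s, th and N_set follow the values quoted in the chapter.

import numpy as np

def detect_moving_ships(frames_ros, feature, seg_thresh, s=10, th=5, n_set=2):
    """Steps 1-5 on a list of ROS frames. feature(frame) returns C(x, y) (see the sketch above);
    seg_thresh binarizes C(x, y); s, th and n_set follow the values quoted in the chapter."""
    frames = list(frames_ros)
    n_dif = 0
    for i in range(0, len(frames) - s, s):
        c_i = feature(frames[i]) > seg_thresh          # step 1: binary images C_i and C_{i+s}
        c_is = feature(frames[i + s]) > seg_thresh
        num = int(np.logical_xor(c_i, c_is).sum())     # steps 2-3: difference image and Num
        if num > th:                                   # step 4: count one change of the ROS
            n_dif += 1
        if n_dif > n_set:                              # step 5: moving ships detected, raise the alert
            return True
    return False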
The testing experiments were carried out on the Yangtse River in Chongqing city,
China. An FLIR camera was mounted on a bridge pier. A Celeron 1.5 GHz PC was
connected with the camera through a frame grabber; the frame size was 320×240,
and the frame rate was 30 fps. The parameter settings in the algorithm were: a
frame interval of ten frames, a threshold (th) value of 5, an N_set value of 2,
and an ε_max value of 4. A group of testing results is demonstrated in Fig. 11.3.
The average processing time for each step in the algorithm is shown in Table 11.1.
Observations have indicated that the speed of moving ships is from 20 to 30
km/h. The ROS defines the distance between moving ships and the bridge pier as 800–
2,000 m. Therefore the time during which a ship driver can take action to avoid collision
with a bridge pier after being alerted is between 96 and 360 s. From Table 11.1, it is clearly
shown that the FLIR surveillance system takes about one second to complete a process,
due to the value of N_set being 2 or 3. This is satisfactory for a real-time
application.
Comparative experiments were also carried out for system performance analysis
in terms of reliability and effectiveness. The frame difference method was imple-
mented to be compared with the proposed method. FLIR frames were carefully
selected for this comparison. Weather conditions and time of day were taken into
account when 400 frames with 286 moving ships involved were chosen as the testing
set. Two parameters were introduced as the criterion for the performance, i.e. false
alarm ratio (FAR) and missed alarm ratio (MAR). The comparative results are shown
in Table 11.2. Some typical experimental results are demonstrated in Fig. 11.4.
From the results in Table 11.2, it can be seen that the proposed method for bridge-
ship collision avoidance is superior to the frame difference method in terms
of both false alarm ratio and missed alarm ratio. From the results of the testing set
and Fig. 11.4, the system is capable of adapting to weather and light changes during the
day.
It is worth mentioning that when the FLIR system is mounted on the bridge deck,
the performance of the surveillance system is impaired by vibration caused by moving
vehicles on the bridge. Therefore, the FLIR system is mounted on a bridge pier in
practice.
11.5 Conclusion
This paper presented a novel FLIR video surveillance system for bridge-ship col-
lision avoidance by using the multi-scale fractal feature, by which moving ships
have successfully been separated from the complex background in inland waterway
images. The proposed algorithm for moving ship detection has achieved the real-
time performance within the ROS in FLIR video sequences. Experimental results
have proved that the developed FLIR video surveillance system is efficient in detect-
ing moving ships to alert possible bridge-ship collisions. Its wide adaptability and
Fig. 11.3 Testing results. (a) The 1st original frame. (b) The 11th original frame. (c) The ROS
image extracted from (a). (d) The ROS image extracted from (b). (e) C(x, y) of the 1st frame. (f)
C(x, y) of the 11th frame. (g) Segmented binary image from (e). (h) Segmented binary image from
(f). (i) D(x, y) between the 11th and the 1st frames, Num = 32. (j) D(x, y) between the 21st and the
11th frames, Num = 25
high reliability for weather and light changes have been proved in long-term testing
on the Yangtse River, Chongqing city, China.
Our future work will be extended in two directions. First, the investigation of track-
ing multiple ships will be carried out based on the ship detection results. Relevant
tracking algorithms will be developed to address the issue in a cluttered background.
Second, a data fusion of radar and infrared sensors may be explored to improve
system performance by making use of the complementary data from different
sensors.
Acknowledgement The authors would like to thank the Maritime Safety Administration of
Chongqing for providing experimental sites. This research is partially supported by the China
Scholarship Council (CSC).
References
1. Dai, T. D., Lie, W., and Liu, W. L., 1993, The analysis of ship-bridge collision in main
waterway of the Yangtze River, Navigation of China, 4:44–47.
2. Van Manen, S. E., 2001, Ship collisions due to the presence of bridges, International Navigation
Association (PIANC), Brussels, Report of WG 19.
3. Zhu, Q. Y., 2006, Pier anticollision system based on image processing, Master’s thesis,
Department of Information Engineering, Wuhan University of Technology, Wuhan.
4. Wu, J., 2004, Development of ship-bridge collision analysis, Journal of Guangdong Commu-
nication Polytechnic, 4:60–64.
5. Liu, J., Huang, X. Y., Chen, Y., and He, N. S., 2007, Target recognition of FLIR images on
radial basis function neural network, in Proceedings of Advances in Neural Networks, ISNN
2007, 4th International Symposium on Neural Networks, Nanjing, pp. 772–777.
6. Zhan, C. H., Duan, X. H., Xu, S. Y., Song, Z., and Luo, M., 2007, An improved moving
object detection algorithm based on frame difference and edge detection, in Proceedings of
the Fourth International Conference on Image and Graphics, Chengdu, pp. 519–523.
7. Mandelbrot, B. B., 1982, The fractal geometry of nature. New York: W.H. Freeman, pp. 1–24.
8. Pentland, A., 1984, Fractal-based description of natural scenes, IEEE Transactions on Pattern
Analysis and Machine Intelligence, 6:661–674.
9. Peleg, S., Naor, J., Hartley, R., and Avnir, D., 1984, Multiple resolution texture analysis and
classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:518–523.
10. Zhang, H., Liu, X. L., Li, J. W., and Zhu, Z. F., 2006, The study of detecting for IR weak and
small targets based on fractal features, In Lecture Notes in Computer Science, Advances in
Multimedia Modeling. Springer, Heidelberg/Berlin, pp. 296–303.
11. Otsu, N., 1979, A threshold selection method from gray-level histograms, IEEE Transactions
on Systems, Man, and Cybernetics, SMC-9:62–66.
12. Li, J. and Zhang, T. X., 1996, Target detection based on multiscale fractal parameter change,
Journal of Data Acquisition & Processing, 11(3):218–221.
Chapter 12
A Gray Level Feature Detector and Its
Hardware Architecture
Abstract This chapter describes a fast, real-time gray-scale feature point detector
and its hardware architecture for FPGA-based realization. The implementation is
based on a new, efficient technique that combines affine transformation invariance
with robustness to noise. The novelty of the proposed approach lies in its highly
accurate localization and in a realization that uses only addition, subtraction and
logic operations. The algorithm is designed to meet the high throughput requirements
of today's feature point detectors and their applications in silicon. The proposed
implementation is highly modular, with custom scalability to fit devices such as
FPGAs with different resource capacities. The implementation can be ported to any
real-time vision processing system where power and speed are of utmost concern.
12.1 Introduction
N. Nain (B)
Associate Professor, Department of Computer Engineering,
Malaviya National Institute of Technology Jaipur-302017, India,
E-mail: neetanain@yahoo.com
significantly. A large number of feature point detectors [1–7] have been reported
in the literature; they are mainly classified into two categories: template based and
geometry based. The template-based approach builds a set of corner templates and
determines the similarity between the templates and all sub-windows of the gray-
level image. Though simple, these techniques are time consuming due to the multiple
iterations required. Geometry-based approaches rely on measuring differential
geometry features of corners and are of two kinds: boundary based and gray-level
based. Boundary-based methods first extract the boundary as a chain code and then
search for significant turnings along the boundary. These techniques suffer from
high algorithmic complexity, as multiple steps are needed, and the corner accuracy
depends on the boundary extraction technique used. Gray-level-based methods
operate directly on the gray-level image. The feature detector proposed in this work
belongs to this last category.
The chapter is organized as follows: Section 12.2 describes the proposed feature point
detection algorithm, its hardware realization is explained in Section 12.3, the syn-
thesis results and their analysis are given in Section 12.4, followed by related discussion
and conclusions in Section 12.5.
where I(i, j) is the pixel under consideration. The response of the 2 × 2 difference
operator can be classified as follows. If (H1 = H2 = V1 = V2 = 0), the region is
constant; if ((H1 AND H2) > P1) OR ((V1 AND V2) > P1), the region contains
only a 1D change; if ((H1 OR H2) AND (V1 OR V2)) > P1, the region contains
FPs (detected as a 2D change). Here P1 is the threshold value for FP detection,
which can be varied according to the features of interest. For example, P1 = 1 will
detect all intensity variations as FPs, and P1 = 255 will detect only step changes
(black-to-white or white-to-black) as FPs. The difference operator detects candidate
points for feature detection. As we are interested only in the third category of
responses for FP detection, each pixel satisfying this criterion is further processed
to determine whether it is a true FP or not.
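As a rough illustration of this classification, the Python sketch below evaluates the three conditions on one 2 × 2 neighbourhood. The exact definitions of H1, H2, V1 and V2 and the precedence of the 2D test over the 1D test are assumptions of the sketch; the operator itself is defined by the equations earlier in the chapter.

def classify_2x2(img, i, j, P1):
    # H1, H2: absolute horizontal differences of the top and bottom rows of the
    # 2x2 block; V1, V2: absolute vertical differences of its left and right columns.
    a, b = int(img[i, j]), int(img[i, j + 1])
    c, d = int(img[i + 1, j]), int(img[i + 1, j + 1])
    H1, H2 = abs(a - b), abs(c - d)
    V1, V2 = abs(a - c), abs(b - d)
    if H1 == H2 == V1 == V2 == 0:
        return "constant region"
    if (H1 > P1 or H2 > P1) and (V1 > P1 or V2 > P1):
        return "candidate FP (2D change)"   # third category, processed further
    if (H1 > P1 and H2 > P1) or (V1 > P1 and V2 > P1):
        return "1D change"
    return "below threshold"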
2. Apply the Pseudo-Gaussian Mask to the Points Satisfying the First Differ-
ence Operator – In real scenarios the received images are blurred by several types
of noise, so FPs, and also edges, are not well defined. To overcome these problems
and increase the noise immunity of our algorithm, we propose a Pseudo-Gaussian
kernel derived from a 5 × 5 Gaussian kernel with σ = 1.3. It is further modified and
normalized so that all factors are powers of 2, allowing the multiplications and
divisions (in the Gaussian) to be performed by simple bit shifts during calculation.
Because of the tight timing and performance requirements of real-time applications,
the Gaussian smoothing is applied only around those pixels which are part of a
region containing FPs. Additionally, due to the nature of our difference kernel, we
apply only a part of the entire Gaussian kernel. This partial implementation, shown
in Fig. 12.2, reduces the Gaussian averaging overhead by 75% but produces the
desired noise smoothing on the set of 2 × 2 pixels under consideration. The new
Gaussian-averaged values of the four pixels under consideration (where we get
the third category of responses) are calculated as:
I'_{(i,j)} = G_k \, I_{(i,j)}, \quad I'_{(i,j+1)} = G_l \, I_{(i,j+1)}   (12.5)
I'_{(i+1,j)} = G_m \, I_{(i+1,j)}, \quad I'_{(i+1,j+1)} = G_n \, I_{(i+1,j+1)}   (12.6)
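A shift-and-add sketch of this smoothing step is given below; the quadrant offsets, the weights and the normalising shift are hypothetical stand-ins for the actual mask of Fig. 12.2, and are chosen only so that every multiplication and the final division reduce to bit shifts.

# Hypothetical power-of-two weights for a partial (one-quadrant) neighbourhood of
# pixel (i, j); the actual weights of the pseudo-Gaussian mask in Fig. 12.2 may differ.
QUAD_TAPS = [((0, 0), 3), ((0, 1), 1), ((1, 0), 1),   # (offset, shift): weight = 2**shift
             ((1, 1), 1), ((0, 2), 0), ((2, 0), 0)]   # weights 8,2,2,2,1,1 sum to 16
NORM_SHIFT = 4                                        # dividing by 16 is a right shift by 4

def pseudo_gaussian(img, i, j):
    # Weighted average computed with shifts and additions only, as in the hardware.
    acc = 0
    for (di, dj), s in QUAD_TAPS:
        acc += int(img[i + di, j + dj]) << s
    return acc >> NORM_SHIFT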
The overall architecture is depicted in the block diagram shown in Fig. 12.4. Two
RAM blocks are used: RAM block 1 stores the original image, with each pixel
represented as an 8-bit gray-scale value, whereas RAM block 2 stores the resultant
image after false-positive removal together with the IC of each pixel, and is therefore
twice the size of the input image. RAM block 2 also feeds the Non-Maxima
Suppression module with the resultant IC for suppression of weak features. It is
best to have the fastest possible memory for reducing the processing time of the
algorithm as reading and writing from the RAM is the most expensive operation in
our entire algorithm.
The Address Generator module generates the memory addressing sequence for both
RAM modules. It is capable of generating individual addresses depending upon
the control signals from the Control Block.
The Control Block is the most important module, as it synchronizes the various blocks.
It is built from a state machine converted to its VHDL entity. It controls the read/write
sequences necessary to proceed further, takes the results of each block as input,
compares them with its present state and decides how to proceed. It is also capable
of clearing the module data and assigning the threshold values P1 and P2 for the
comparison operations.
Gradient Module 1 uses 4 comparators and 4 subtractors along with other logic
elements to calculate the gradient mask. This module is reused as Gradient
Module 2, the only difference being the threshold value, which is P2 instead of the P1
used in the first module. It requires 4 read cycles to obtain the pixel information
necessary for its calculations.
The Pseudo-Gaussian Kernel Module is called only if a pixel satisfies Gradient
Module 1; otherwise the processing of all following modules is skipped and
the address generator moves to the next pixel. This step requires another 20 read
cycles to obtain the total of 24 pixel values needed for its mask calculation and
for obtaining the updated values for the next gradient module. It has been determined
experimentally (on real images) that on average only 40% of the image
pixels actually enter this module. The module uses only left-shift operations and
20 adders (8 bit) to approximate the convolution and multiplication.
This is followed by Gradient Module 2, in which, apart from the mask calculations,
the IC is also computed. It is a background module which adds a specific value to the
IC map and writes it into RAM block 2, taking a total of 9 write cycles.
All pixels that satisfy Gradient Module 2 are then processed by the False
Positive Removal Module, which requires another 7 read cycles to obtain the unread
pixels needed for its processing. It further requires around 16 comparators and 16
subtractors to formulate the input for its Conditional Block.
Once the entire image has been processed, the Control Unit starts the Non-Maxima
Suppression module, which reads the IC of each pixel and determines its significance.
It uses 3 comparators which compare the IC of the current pixel with 3 constant
values in a stepwise manner and colour the pixel accordingly. It uses 2 latches (8 bit
wide) that prevent unwanted comparison operations. The module first works on all
the column pixels of the current row and then increments the row number for further
processing. The maximum column and row numbers for a module can be specified
in the Control Unit registers. We have used a common shared register holding 31
8-bit values; this register stores the pixel values as they are read from the RAM
for the current iteration. Once the iteration is completed, instead of clearing all the
values, we shift them to new positions for the next iteration. This saves the commonly
read values for the next iteration and reduces the memory overhead by 50% on
average. Figure 12.5 shows the low-level block diagram of two of the various
modules, depicting the actual mapping of the VHDL into schematic entities. The
top-level module is a schematic, and hence each VHDL module was converted into
its schematic before being added to the top-level module.
The complete hardwired approach is suitable for use in an ASIC design or a system
with a vision co-processor. It would, however, be difficult to use in existing embedded
Fig. 12.5 Internal structure of module 1 (a) Gradient module, (b) partial non-maxima suppression
module
applications. To fulfil this requirement we took another approach: the system was
designed as an SOPC with a NIOS II soft core as the processor, and the algorithm's
logic was then added by means of Custom Instructions. The NIOS II processor was
then programmed in C to execute on the [11] and [12] systems.
Compilation, verification and simulation of the architecture were done for Altera's
Cyclone II FPGA device (EP2C20). The entire verification process was carried out
on 480 × 640 images. The timing analysis after synthesis and simulation resulted in
a throughput of about 500 images/s (assuming a minimum average delay of 6.69 ns).
The worst-case scenario was evaluated by testing an image with alternating black
and white pixels and setting both P1 and P2 to 1, thus ensuring that every pixel
enters all the modules and the time taken is maximal. This resulted in a worst-case
timing delay of 19 ns, making our implementation execute 200 images/s. Further,
the complete implementation requires a tightly coupled (or dedicated) RAM with a
total of 1,500 KB of free space. Its synthesis resulted in the implementation summary
listed in Table 12.1, which also gives the minimum timing latency of each module.
Assuming the serial nature of the iteration for a single feature detection module, the
minimum latency after synthesis turns out to be 18.2 ns; a 480 × 640 image would
thus take about 5.3 ms to process. In real-time applications it never takes that long,
as only 40% of the total pixels reach the second module, only 20% reach Gradient
Module 2, and only 10% of the total image pixels actually reach the False Positive
Removal module. This results in only 10% of all the pixels being finally fed into the
Non-Maxima Suppression module. Re-calculating the minimum delay with these
fractions gives the modified effective per-pixel latency.
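A plausible form of this recalculation, assuming it simply weights each module's latency from Table 12.1 (denoted here t_{G1}, t_{PG}, t_{G2}, t_{FPR} and t_{NMS}, labels introduced only for this sketch) by the fraction of pixels that reaches the module, is

t_{\mathrm{eff}} \approx t_{G1} + 0.4\,t_{PG} + 0.2\,t_{G2} + 0.1\,t_{FPR} + 0.1\,t_{NMS}.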
Table 12.1 Resource utilization summary and timing analysis for each of the individual modules
Module name   Utilization (slices, out of 18,742)   Utilization percentage   Latency (ns)
Fig. 12.6 Output on two of the test images after (a) the first difference operator, (b) the second
difference operator, and (c) false positive removal
The SOPC system with the updated NIOS II core (supplemented with our Custom
Instructions) was compiled and executed on the DE1 board. Table 12.2 gives a
detailed description of the system's resource utilization. The simulated result of the
architecture on two of the test images is shown in Fig. 12.6. Comparing our imple-
mentation results with two of the most popular FP detection algorithms [5, 8],
as shown in Table 12.3 (the implementation of [9] is yet to be completed, so only
its estimated resource utilization is included), we can conclude that our
architecture can be regarded as one of the most resource-effective and highest-
performing implementations of a feature detector in hardware. Further, a clock
frequency of 50 MHz ensures that the power requirement of this implementation is
very competitive. Thus, for real-time applications, Architecture 1 can be used for
high-throughput requirements, as it is suitable for the highest-resolution video and
vision processing systems, whereas Architecture 2 can be added to an existing
embedded system and can process QCIF and CIF frames at a rate of up to 25 frames/s.
Table 12.3 Comparison with two other popular feature detectors implemented in hardware
Implementation name              Resource utilization (slices)   Maximum frequency (MHz)   Throughput (images/s)
[9]                              78,728                          —                         —
[10]                             685                             58.72                     120
Our Algorithm - Architecture 1   493                             50                        500
Our Algorithm - Architecture 2   5,278                           100                       30
12.5 Conclusions
We proposed a true real-time FP detector and demonstrated the possibility
of embedding a high-performance feature detector in an embedded vision system.
The proposed architecture effectively shows the hardware porting of our own novel
algorithm for feature detection. Our architecture uses only integer add, subtract and
shift operations, leading to high throughput. It is also modular in its entirety and
can be scaled depending upon the resource availability of the FPGA device; the
resource utilization numbers above can be used to estimate the utilization of a scaled
version of the algorithm. The algorithm incurs at most 28 subtractions (or additions)
at any pixel, and even assuming the worst-case scenario of the highest noise content
in the images, only 80% of all pixels will pass the first step for further processing,
giving an upper limit on the average number of subtractions per pixel of 22. No
expensive multiplication or division operations are involved in our algorithm,
making it one of the most suitable contenders for real-time implementations and an
excellent candidate for use in hardware implementations and embedded systems
alike. Comparing code sizes with [7], our algorithm is a mere 5 KB (100 lines of
code) instead of the 145 KB (3,500 lines of code) of Rosten's. Thus it will find direct
use in embedded systems with a very low memory footprint. Apart from the
performance gain, as every pixel operation depends only on its surrounding 5 × 5
pixels, independent of the rest of the image, coarse-level parallelization is very much
possible. As future work, the proposed technique can be parallelized and pipelined
to obtain higher throughput.
References
1. Moravec, H P (1977). In Towards Automatic Visual Obstacle Avoidance, page 584. Proceed-
ings of the 5th International Joint Conference on Artificial Intelligence.
2. Harris, C. and Stephens, M. (1988). In A Combined Corner and Edge Detector, volume 23,
pages 147–151. Proceedings of 4th Alvey Vision Conference, Manchester.
3. Mokhtarian, F. and Suomela, R. (1998). In Robust Image Corner Detection Through Curva-
ture Scale Space, volume 20, pages 1376–1381. IEEE Transactions on Pattern Analysis and
Machine Intelligence.
4. Lowe, D G (2004). In Distinctive Image Features from Scale-Invariant Keypoints, volume 60,
pages 91–110. International Journal of Computer Vision.
5. Smith, Stephen M. and Brady, J. Michael (1997). Susan – a new approach to low level image
processing. International Journal for Computer Vision, 23(1):45–78.
6. Trajkovic, M and Hedley, M (1998). Fast corner detection. International Journal for Image
and Vision, 16(2):75–87.
7. Rosten, Edward and Drummond, Tom (2006). In Machine Learning for high-speed Corner
Detection, volume 17, pages 211–224. ECCV.
8. C Schmid, Roger Mohr and Bauckhage, C (2004). In Comparing and Evaluating Interest
Points, volume 655, pages 63–86. INRIA Rhone, Montbonnot, France, Europe.
9. Cabani, Cristina and Maclean, W. James (2006). In A Proposed Pipelined Architecture
for FPGA Based Affine Invariant Feature Detectors, pages 112–116. IEEE Conference on
Computer Vision and Pattern Recognition.
10. Torres-Huitzil, Cesar and Arias-Estrada, Miguel (2000). In An FPGA Architecture for High Speed
Edge and Corner Detection, pages 112–116. IEEE Transactions in Computing.
Chapter 13
SVD and DWT-SVD Domain Robust
Watermarking using Differential Evolution
Algorithm
Veysel Aslantas
Abstract This study aims to develop two optimal watermarking techniques based
on SVD and DWT-SVD domain for grey-scale images. The first one embeds the
watermark by modifying the singular values of the host image with multiple SFs.
In the second approach, a hybrid algorithm is proposed based on DWT and SVD.
Having decomposed the host image into four bands, SVD is applied to each band.
Then, the same watermark is embedded by modifying the singular values of each
band with different SFs. Various combinations of SFs are possible, and it is dif-
ficult to obtain optimal solutions by trial and error. Thus, in order to achieve the
highest possible transparency and robustness, optimization of the scaling factors
is necessary. This work employs DE to obtain optimum SFs. DE can search for
multiple solutions simultaneously over a wide range, and an optimum solution can
be gained by combining the obtained results appropriately. Experimental results of
the developed techniques show significant improvements in both transparency and
robustness under attacks.
13.1 Introduction
In the past decade, there have been great developments in computer and communica-
tion technology that have enabled digital multimedia content such as audio, images
and video to be reproduced and distributed with ease. Besides the advantages
provided by the technology, it also enables illegal operations on these materials, such
as duplication, modification and forgery. Because of this, protection of multimedia data
V. Aslantas
Erciyes University, Department of Computer Engineering, 38039 Melikgazi, Kayseri, Turkey,
E-mail: aslantas@erciyes.edu.tr
has become a very crucial and urgent matter for data owners and service providers.
Several methods have been proposed in order to provide protection against copyright
forgery, misuse or violation. Among the methods, digital watermarking techniques
are the most commonly used.
The principal objective of digital watermarking is to embed data called a water-
mark (tag, label or digital signal) into a multimedia object with the aim of broadcast
monitoring, access control, copyright protection etc. The object may be an image
or video or audio. Watermarking techniques can be divided into various categories
according to visibility, permanency and domain [1, 2].
In relation to the visibility, digital watermarks can be of two types: visible or
invisible watermarks. Visible watermarks can easily be detected by observation such
as an embossed logo superimposed upon the image indicating the ownership of
the image. Although the owner of the multimedia object can be identified with no
computation, such watermarks can be removed or destroyed with ease. In contrast,
invisible watermarks are designed to be imperceptible (transparent) and are inserted
at places in the host data unknown to others. The watermarked data should look similar
to the original and should not arouse suspicion. If the object is used in an
illegal way, the embedded watermark can then be utilized as proof of ownership
of the multimedia object. Invisible watermarks are more secure and
robust than visible watermarks and are the subject of this chapter.
In the class of invisible watermarks, one may further categorise techniques
according to permanency as fragile, semi-fragile and robust. Fragile watermarks
are embedded in such a way that slight changes to the watermarked image would
destroy the watermark. Hence, they are mainly used with the purpose of authentica-
tion. Semi-fragile watermarks are designed to tolerate some degree of modification
to the watermarked image, for instance, the addition of quantization noise from
lossy compression. Robust watermarks are designed to resist intentional or unin-
tentional image modifications called attacks that attempt to destroy or remove the
watermark. Such attacks include filtering, geometric transformations, noise addi-
tion, lossy compression, etc. In general, this kind of watermarks is employed for
copyrights protection and ownership verification.
Depending upon the domain in which the watermark is embedded, the techniques
can be grouped into two classes: spatial and frequency-domain techniques. In spa-
tial domain, the pixel values of the cover image are modified for embedding the
watermark. On the other hand, the watermark is embedded in transform domain by
means of modulating the coefficients of the transformed host image according to
watermark. Most of the transform domain watermarking techniques developed with
the use of Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT),
and Discrete Wavelet Transform (DWT). Although spatial domain methods are less
complex as no transform is used, transform domain watermarking techniques are
more robust in comparison to spatial domain methods against attacks.
Two important and conflicting requirements in watermarking are perceptual
transparency and robustness. If significant modifications are introduced to the host
image either in spatial or transform domain, greater robustness, in general, can
be obtained. However, such modifications are distinguishable and hence do not
Embedding (continued from the singular values λ_ik of each S_k):
3. Apply SVD to the watermark: W = U_W S_W V_W^T, where λ_Wi (i = 1, ..., N) are the singular values of S_W.
4. Modify the SVs of each subband with the SVs of the watermark: λ*_ik = λ_ik + α_k λ_Wi, k = 1, 2, 3, 4.
5. Obtain the modified coefficients of each subband: I*_k = U_k S*_k V_k^T.
6. Compute the watermarked image by applying the inverse DWT with the modified coefficients.
Extraction (continued):
3. Obtain the possibly corrupted singular values of the watermark from each subband: λ*_Wik = (λ*_ik − λ_ik)/α_k.
4. Using these SVs, compute the four (possibly distorted) watermarks: W*_k = U_W S*_Wk V_W^T, k = 1, 2, 3, 4.
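To make these steps concrete, the NumPy sketch below applies the same idea to a whole square image with a single scaling factor alpha; in the chapter a separate SF per singular value (SVD method) or per subband (DWT-SVD method) is used and tuned by DE. The stored decompositions act as the extraction key, so the scheme is not blind.

import numpy as np

def embed_svd(host, watermark, alpha):
    # host and watermark are assumed to be equally sized square arrays.
    U, s, Vt = np.linalg.svd(host, full_matrices=False)
    Uw, sw, Vwt = np.linalg.svd(watermark, full_matrices=False)
    s_marked = s + alpha * sw                 # lambda* = lambda + alpha * lambda_W
    return (U * s_marked) @ Vt, (s, Uw, Vwt)  # the tuple is the extraction key

def extract_svd(attacked, key, alpha):
    s, Uw, Vwt = key
    _, s_attacked, _ = np.linalg.svd(attacked, full_matrices=False)
    sw_est = (s_attacked - s) / alpha         # lambda*_W = (lambda* - lambda)/alpha
    return (Uw * sw_est) @ Vwt                # possibly distorted watermark W*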
The remainder of this section explains how DE is employed to optimize the
watermark embedding process with respect to the two conflicting requirements:
transparency and robustness to different attacks.
An N × N host image (or subband) has N SVs that may show different
tolerance to modification. Prior knowledge may not be available about the sensi-
tivity of the image with respect to various values of the SFs, so an algorithm needs
to be employed to compute the optimum scaling factors that produce maximum
robustness and transparency. Thereby, DE is employed in this work to achieve this
objective.
where x_i^{(L)} and x_i^{(H)} are the lower and upper boundaries of the D-dimensional
vector x_i = {x_{1,i}, x_{2,i}, ..., x_{D,i}}^T, respectively.
Mutation: The main operator of DE is the mutation operator that is employed to
expand the search space. For each target vector x_{i,G}, a mutant vector v_{i,G+1} is gen-
erated by the combination of vectors randomly chosen from the current population
at generation G as:
v_{i,G+1} = x_{r1,G} + F \, (x_{r2,G} - x_{r3,G})   (13.2)
where i, r1, r2 and r3 are mutually different indexes selected from the current
generation {1, 2, ..., NP}. As a result, the population size must be greater than 3.
F ∈ [0, 2] is a user-defined real constant called the step size, which controls the
perturbation and improves convergence by scaling the difference vector. Smaller
values of F result in faster convergence of the generated population, and larger
values in higher diversity.
Crossover: This operator combines successful solutions of the previous genera-
tion in order to maintain diversity in the population. By mixing the elements of the
mutated vector v_{i,G+1} and the elements of the target vector x_{i,G} according to the
crossover rate CR, the trial vector u_{i,G+1} is produced.
(Block diagram of the DE-based optimization: the DE algorithm (mutation, crossover, selection) generates candidate SFs for watermark embedding into the host image; the watermarked image I_W is subjected to attacks, the watermark W* is extracted, and correlation-based fitness evaluation drives the next generation until the final generation is reached.)
where f_i and t are the fitness value of the i-th solution and the number of attack-
ing methods, respectively. corr_I() and corr_W() are related to transparency and
robustness, respectively.
1. Define the fitness function, numbers of variables (D) and the values for population
size (NP), control parameters (F and CR), and number of generations (or any other
terminating criteria)
2. Generate an initial population of potential solutions at random
3. Calculate watermarked images using the solutions in the population by means of
embedding process (Tables 13.1 and 13.2)
4. Apply the attack functions upon the watermarked images one by one
5. Extract out the watermarks from the attacked images using the extraction procedure
(Tables 13.1 and 13.2)
6. Obtain the NC values between the host image and each watermarked image
7. Compute the NC values between the watermark and the extracted ones
8. Evaluate the fitness value for each corresponding solution
9. Apply mutation and crossover operators
10. Apply the selection process
Repeat Steps 3–10 until the predefined termination criteria are satisfied
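A compact sketch of this loop is given below, assuming a simple DE/rand/1 mutation with binomial crossover (the chapter's experiments use a DE/best/1/exp strategy, see Table 13.4) and a user-supplied fitness function that wraps steps 3–8 (embedding, attacks, extraction and correlation):

import numpy as np

def de_optimize(fitness, D, NP=16, F=0.5, CR=0.6, generations=200,
                lower=0.0, upper=1.0, seed=0):
    # Minimal DE/rand/1/bin loop for tuning D scaling factors. `fitness` is
    # assumed to return a value that is minimised.
    rng = np.random.default_rng(seed)
    pop = rng.uniform(lower, upper, size=(NP, D))          # step 2: initial population
    fit = np.array([fitness(x) for x in pop])
    for _ in range(generations):
        for i in range(NP):
            r1, r2, r3 = rng.choice([k for k in range(NP) if k != i],
                                    size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])           # mutation, Eq. (13.2)
            v = np.clip(v, lower, upper)
            mask = rng.random(D) < CR                       # crossover
            mask[rng.integers(D)] = True                    # keep at least one mutant gene
            u = np.where(mask, v, pop[i])
            fu = fitness(u)
            if fu <= fit[i]:                                # selection
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], fit[best]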
Fig. 13.2 (a) Host image, (b) watermark, (c) SVD domain watermarked image, (d) DWT-SVD
domain watermarked image
13.4 Results
Table 13.4 Control parameters used for SVD and DWT-SVD domain methods
Control parameters SVD domain DWT-SVD domain
NP 150 16
F 0.6 0.5
CR 0.8 0.6
Number of parameters (D = number of SFs) 32 4
Maximum generation number 400 200
Mutation method DE/best/1/exp DE/best/1/exp
13.5 Conclusions
In this chapter, two optimal watermarking schemes based on SVD and DWT-SVD
domain are presented. As different images (or subbands) may have different sin-
gular values, the proposed techniques provide a general solution for determining
the optimum values of the scaling factors for each image/subband. Differential
(Extracted watermark results for the SVD domain and DWT-SVD domain methods under the RT, RS, GN and TR attacks.)
Table 13.6 SVD domain correlation values (corr_I(I, I*) and corr_W(W, W*))
              DE       Single SF
SFs                    0.1      0.3      0.5      0.7      0.9
Table 13.7 DWT-SVD domain correlation values (corr_I(I, I*) and corr_W(W, W*))
                      DE       Single SF
SFs                            0.005    0.01     0.05     0.1      0.5
corr_I                0.9990   0.9999   0.9999   0.9996   0.9985   0.9750
corr_W   RT           0.9910   0.2512   0.2693   0.3551   0.4121   0.6851
         RS           0.9815   0.2916   0.3040   0.4574   0.6227   0.9619
         AV           0.9868   0.5281   0.5154   0.4353   0.6176   0.9780
         SH           0.9840   0.8213   0.8332   0.8859   0.9227   0.9761
         GN           0.9982   0.7527   0.7929   0.9859   0.9950   0.9933
         TR           0.9896   0.5733   0.9298   0.9978   0.9994   0.9996
         CR           0.8339   0.6942   0.6999   0.7297   0.7998   0.9755
         JPEG         0.9974   0.6320   0.7105   0.9315   0.9634   0.9971
References
14.1 Introduction
subjects will not be suitable for large animals such as horses. The macroPET sys-
tem described here was constructed as a prototype to demonstrate the feasibility of
performing PET scans on a large scale. It was created by reconfiguring components
from an original ECAT 951 system [8], which had 32 detector modules ("buckets")
mounted in two rings with an inner diameter of 100 cm and a 10.8 cm axial field of
view. In the macroPET system the 32 buckets are mounted in a single ring with
an inner diameter of 2.34 m. This paper presents the macroPET design and initial
measurements of its characteristics. To the best of our knowledge, this is the first
attempt to construct and test a PET scanner with such a large diameter.
The ECAT 951 scanner, manufactured by CTI Inc., is based on bismuth germanate
(BGO) block detectors. Each block consists of a BGO crystal of 50 × 60 × 30 mm³,
backed by four photomultiplier tubes (PMTs). The front of the crystal is cut into an
8 × 8 array of crystal elements, each 6.25 × 6.75 mm². By comparing the light inten-
sity detected in the four PMTs, a γ-ray interaction can be unambiguously assigned
to an individual crystal element. The blocks are grouped into buckets, each com-
prising four detector blocks together with their associated electronics (preamplifiers
and discriminators) under the control of a microprocessor. Data from all the buckets
are fed to a set of coincidence processors which identify events in which a pair of
511 keV γ-rays have been detected in two buckets within the resolving time of the
system (6 ns).
In the original ECAT 951 the 32 buckets were mounted in two adjacent rings,
each with an inner diameter of 100 cm. The 8,192 individual detector elements
were thus arranged in 16 rings, each of 512 elements. For each bucket, coincidences
could be detected with any of the seven opposing buckets in the same ring or in
the other ring. The resulting field of view was a cylinder approximately 60 cm in
diameter.
For macroPET (Fig. 14.1), the detector blocks have been remounted in a single
horizontal ring of 128 blocks. Because the blocks are tapered to fit tightly together
in the original 64 block rings, in the new arrangement there are gaps of approxi-
mately 7.5 mm between the front faces of adjacent blocks. The inner diameter of the
single ring is 234 cm. For convenience, the blocks are mounted on eight separate
aluminium base plates, each corresponding to a 45° segment of the ring. The blocks are
connected in fours to the original sets of bucket electronics, with bucket controllers
from the two original rings alternating around the new single ring. Overlapping of
controllers is made possible by displacing alternate controllers vertically.
By alternating the buckets from the original rings, the original coincidence com-
binations of buckets are appropriate in the new geometry, and enable imaging to
be performed over a field of view approximately 125 cm in diameter. The eight
rings of detector elements span an axial (vertical) field of view of 5.4 cm, which
is divided into 15 image planes (eight direct planes and seven cross planes) with a
plane separation of 3.375 mm.
The initial results reported here were obtained by accepting data with a maxi-
mum ring difference of 3, as is generally the case in acquisition of “2D” PET data.
Purpose-built septa were not constructed for macroPET, but in an attempt to inves-
tigate whether such septa would be valuable some studies were performed with the
original ECAT 951 septa mounted inside the macroPET ring. This limits the FOV
to 82 cm diameter. Apart from these septa, macroPET has no shielding against out
of field activity.
Events within the energy window from 250 to 850 keV were accepted from the
system. A delayed timing window was used to record the distribution of random
coincidences. Data were initially recorded in histograms (prompt minus delayed
events) appropriate to the image planes of the original geometry and were then
rebinned into the correct sinograms for the new geometry. In this rebinning the gaps
between blocks were allowed for by treating each large ring as if it consisted of
1,152 (128 × 9) detector elements with every ninth detector absent, and interpo-
lation was used to complete the missing sinogram elements. Arc correction of the
projection data was achieved by performing a further rebinning with linear weight-
ing. To compensate for the difference in efficiency of different detector elements, a
normalisation was applied based on the response to a central uniform phantom.
Simple 2D filtered backprojection has been used to reconstruct all the images pre-
sented here, using a Hamming filter of cutoff frequency 0.4 and a 256 by 256 matrix
with zoom factor from 1 to 4. Results measured with septa in place are reported in
brackets.
The transaxial spatial resolution of the system, with or without septa, was measured
using data from one or more 68-Ge line sources mounted parallel to the scanner axis.
In each case, a Gaussian fit was performed on the profile across the central plane of
the reconstructed images (Fig. 14.2), and the full-width at half-maximum (FWHM)
was calculated as FWHM = 2\sigma\sqrt{2\ln 2}, where \sigma is the standard deviation of the
fitted Gaussian function. The pixel size in the reconstructed images was determined
experimentally from the image obtained using two 68-Ge line sources separated by
a known distance (40 cm apart).
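As an illustration of this measurement, the sketch below fits a Gaussian to a measured line-source profile and converts the fitted σ to an FWHM in millimetres; the profile arrays and the calibrated pixel size are assumed inputs.

import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, mu, sigma, offset):
    return a * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) + offset

def fwhm_mm(pixels, counts, pixel_size_mm):
    # Fit a Gaussian to the measured profile and convert sigma to FWHM in mm;
    # pixel_size_mm is assumed to have been calibrated from the two line
    # sources mounted 40 cm apart.
    p0 = [counts.max(), float(pixels[np.argmax(counts)]), 2.0, 0.0]
    popt, _ = curve_fit(gaussian, pixels, counts, p0=p0)
    sigma = abs(popt[2])
    return 2.0 * sigma * np.sqrt(2.0 * np.log(2.0)) * pixel_size_mm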
On the central axis, the transaxial resolution (FWHM) was determined as
11.2 mm (11.7 mm). The resolution degrades slightly as the source is moved off
axis. At 40 cm off-centre, values of 12.0 mm (11.4 mm) were measured, and at 50 cm
off-centre the no-septa resolution was measured as 12.1 mm.
The axial resolution of the system was measured using data from an 18-F point
source positioned at the scanner centre (no-septa mode), and fitting a Gaussian func-
tion to the profile across the 15 image planes (Fig. 14.3). This central point source
gave a transaxial resolution of 10.8 mm and an axial resolution of 9.3 mm.
Fig. 14.2 Reconstructed image for two 68-Ge line sources and its profile
Fig. 14.3 Axial profile (activity, a.u.) across the image planes for the central 18-F point source
These resolution values are all significantly larger than the spatial resolution
quoted for the original ECAT 951 system (around 6 mm). Much of the difference
can be attributed to the effect of acollinearity of the two photons emitted in elec-
tron/positron annihilation. The mean angular deviation from collinearity is around
0.4° [9] and the resultant effect on resolution is 0.0022 D, where D is the ring diam-
eter [10]. This is expected to contribute around 5.1 mm to the FWHM of images
measured using a detector ring of diameter 2.34 m. It is also possible that errors in
positioning individual blocks in the large ring contribute to the poor spatial resolu-
tion. Nevertheless, the scanner has been shown to give acceptable spatial resolution
over a FOV of at least 1 m diameter.
14.3.2 Sensitivity
The sensitivity of the scanner was determined from the count rate measured using
bare sources of known activity in both acquisition modes. To avoid problems of
dead-time etc., these measurements were performed using sources of relatively low
activity. A 14 cm long 68-Ge line source of total activity 200 kBq was mounted
along the scanner axis, so that the activity in each 3.375 mm slice was approxi-
mately 4.8 kBq, and the count rate in each image plane was determined. In another
measurement, an 18-F point source with an activity of 15 MBq was mounted at the
centre of the field of view, and the total count rate (all planes) was found. The sen-
sitivity calculations were corrected for the positron branching ratios of 18-F (0.967)
and 68-Ge (0.89).
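A minimal sketch of this calculation, assuming the sensitivity is simply the measured (prompts-minus-randoms) count rate R per unit activity A in the plane, corrected for the positron branching ratio B:

S = \frac{R}{B \cdot A}

For example, the quoted cross-plane value of about 1.87 cps/kBq for a 4.8 kBq slice of the 68-Ge source (B = 0.89) corresponds to a measured rate of roughly R ≈ 1.87 × 0.89 × 4.8 ≈ 8 cps in that plane.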
Figure 14.4 shows the axial sensitivity profiles determined using a 68-Ge line
source on the scanner axis. The profiles are approximately as expected based on
the number of detector rings which can contribute to each image plane, using a
maximum ring difference of 3 (four combinations in each cross plane from 4 to 12,
three combinations in each direct plane from 3 to 13, and fewer combinations in the
end planes). The septa reduce the sensitivity by approximately 55% in each plane.
The absolute values of sensitivity without septa (approximately 1.87 and
1.39 cps/kBq for cross and direct plane respectively) are significantly lower (by
Fig. 14.4 Axial sensitivity profile for a line source positioned along the scanner axis
Fig. 14.5 Profile across the sinogram measured for a 68-Ge line source at the centre of a 20 cm
diameter cylinder of water without septa (left) and with septa (right)
Fig. 14.6 Central plane (plane 8) sinogram measured from three 68-Ge line sources inside a large
(62 × 38 cm²) water tank (no-septa mode)
tank, and coincidence events were acquired for 66 h. Figure 14.6 shows the resulting
sinogram.
Values for scatter fraction were obtained by analysing the profiles at angles A
(looking directly along the long axis of the water tank, with the three sources in line)
and B (looking along the short axis). SF values of approximately 63% and 38% were
obtained along A (62 cm long) and B (38 cm wide), respectively. This measurement
was repeated with septa in place, but the results were inconclusive because the count
rate was very low and there was significant influence from out-of-field activity.
Count rate performance is evaluated by calculating the noise equivalent count rate
(NEC), which takes into account the statistical noise due to scatter and random
events. The general formula [14] for the NEC rate is NEC = T^2/(T + S + kR),
where T, S and R are the true, scatter and random rates, respectively. The
parameter k is the randoms correction factor, with a value of 1 or 2 depending on
the randoms correction method. A value of 2 is used when randoms are measured
using a delayed coincidence window, as in the present work.
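A small sketch of this calculation, assuming the prompts-minus-delayed rate represents trues plus scatter and that the scatter fraction is defined as S/(T + S), as used below for the phantom measurements:

def nec_rate(prompts, randoms, scatter_fraction, k=2):
    # prompts - randoms is taken as trues + scatter; the scatter fraction is
    # assumed to be defined as S / (T + S).
    trues_plus_scatter = prompts - randoms
    scatter = scatter_fraction * trues_plus_scatter
    trues = trues_plus_scatter - scatter
    return trues ** 2 / (trues + scatter + k * randoms)

For instance, nec_rate(prompts=100e3, randoms=40e3, scatter_fraction=0.26) would give the no-septa NEC for a hypothetical set of measured rates.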
NEC rates are usually quoted using a 20 cm diameter cylindrical phantom with
an axial length of 20 cm. For macroPET, a similar phantom was used but it was only
filled to a depth of around 5 cm to avoid the effect of out-of-field of activity due to
the lack of any shielding on the current scanner. NEC rates were measured in both
no-septa and septa acquisition mode. The phantom was filled with 1,500 ml water
containing an initial activity of 425 kBq/ml (1.1 MBq/ml) of 18-F, and was then
placed in the centre of the scanner. A set of 10 min scans was acquired every half
hour for almost 12 h. The contribution of scattered events was estimated assuming
a scatter fraction of 26% (17%). The noise equivalent count rates were calculated
from the NEC formula.
Figure 14.7 shows the various components of the count rate in both acquisition
modes, as a function of source activity, and also the resulting NEC rate. Although
the scanner delivers coincidence rates of up to 100 kcps, the NEC rate peaks at
around 30 kcps in the no-septa case due to the significant contributions of random
and scattered events. The peak NEC rate was also measured as 28 kcps in septa con-
figuration. The NEC performance would be even poorer in both acquisition modes
if out of field activity were present.
Fig. 14.7 Count rates for a 20 cm diameter phantom: without septa (left), with septa (right)
The Jaszczak SPECT phantom is often used for assessing the image quality in
nuclear medicine images. It consists of a cylinder with perspex inserts which cre-
ate cold lesions when the phantom is filled with radioactive solution. In the present
work a Jaszczak phantom with internal diameter 18.6 cm was used, containing six
different sets of cylindrical rods with diameters 4.8, 6.4, 7.9, 9.5, 11.1, 12.7 mm.
The images reported here were obtained in no-septa mode by filling only the
lower half of the phantom with approximately 2.5 l of water, and adding approxi-
mately 200 MBq of 18-F. Data were acquired for 600 s per image, with the phantom
mounted at the centre of the field of view and at different radial offsets. Each image
was reconstructed using filtered backprojection and incorporating the calculated
attenuation correction for a water-filled cylinder.
Due to the curved nature of the PET geometry, lines of response (LORs) near the
periphery of the field of view are more closely spaced than those near the centre. To
avoid distortion due to this effect, the sinogram data were rebinned before recon-
struction (a process referred to as “arc correction”). Figure 14.8 shows the centre
plane image from the phantom mounted with its axis displaced by 45 cm from the
centre of the field of view, (a) before arc correction and attenuation correction, and
(b) after applying these corrections. It can be seen that the distortion due to chang-
ing LOR separation is removed by the arc correction process, and a reliable image is
obtained out to a radius of at least 55 cm. In this image, only the two largest sets of
cold rods (12.7 and 11.1 mm diameter) are clearly visible. In other images measured
with the phantom at the centre of the FOV the 9.5 mm rods can also be seen.
Fig. 14.8 Forty-five centimeters off-axis Jaszczak phantom images: before and after attenuation
and arc correction
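A simplified sketch of such a rebinning for one projection is shown below; the assumed fan half-angle and the use of simple linear interpolation stand in for the actual rebinning with linear weighting described earlier.

import numpy as np

def arc_correct(projection, ring_radius_cm, fan_half_angle=np.pi / 6):
    # Rebin one projection from equal-angle LOR spacing to equal radial spacing
    # by linear interpolation. The fan half-angle (~30 degrees, roughly matching
    # the 125 cm FOV of the 2.34 m ring) and the sin() mapping are assumptions
    # of this sketch.
    n = projection.size
    phi = np.linspace(-fan_half_angle, fan_half_angle, n)
    s_measured = ring_radius_cm * np.sin(phi)         # non-uniform radial offsets
    s_uniform = np.linspace(s_measured[0], s_measured[-1], n)
    return np.interp(s_uniform, s_measured, projection)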
Fig. 14.9 Uniform phantom image (slice 3) and the corresponding horizontal profile across the
centre of the image
Fig. 14.10 Contrast phantom image (central slice): no-septa (left) and septa (right)
in no-septa mode. Figure 14.9 shows the image (slice 3) for a measurement in which
150 M counts were obtained in 600 s. Also shown is the horizontal profile across this
image. From the ratio of the standard deviation (σ) to the mean pixel count (μ) within
a circular region of interest (ROI) covering the central part of the phantom in the
reconstructed image, the coefficient of variation (CV) was calculated as 5.9%.
in each case. Contrast recovery coefficient (CRC) [15] values close to unity can
be achieved for the 2 and 4 cm hot lesions and for the 3 cm cold lesion. For both
the 1 cm hot and cold lesions the CRC is around 0.3. The 1 cm hot lesion is more
prominent than the 1 cm cold lesion.
14.4 Conclusions
After design, setup and calibration of the macroPET system, initial results of its
performance characteristics were obtained. The spatial resolution is around 11 mm,
significantly poorer than for a standard scanner, probably because of the significant
contribution from acollinearity. Without septa, the sensitivity measured for a 20 cm
diameter phantom is 900 cps/kBq/ml and the peak NEC rate is 30 kcps. The septa
used (which were not designed specifically for this scanner) reduce the scatter
fraction by around 35% at the cost of reducing the sensitivity by 55%.
The main weakness of the prototype scanner is the lack of side shielding, so
that it is sensitive to out-of-field activity. Most of the measurements reported here
were obtained with the activity confined axially to the field of view. Out of field
activity will contribute significantly to the randoms rate. Also, the scatter fraction
grows dramatically as larger objects are imaged. The septa used have only a limited
effect in blocking out-of-field activity (and can actually make the situation worse
as they reduce the true rate more than the background rate). To date all the images
in this work have been reconstructed using simple 2D FBP. In the future it may be
appropriate to use 3D image reconstruction and/or iterative techniques.
With these improvements, the results so far indicate that a geometry like
macroPET offers a practical solution for large animal imaging.
References
1. M. E. Phelps. Molecular imaging with positron emission tomography, Annu. Rev. Nucl. Part.
Sci. 52, 303–338 (2002).
2. C. Das, R. Kumar, Balakrishnan, B. Vijay, M. Chawla, and A. Malhotra. Disseminated tuber-
culosis masquerading as metastatic breast carcinoma on PET-CT, Clin. Nucl. Med. 33(5),
359–361 (2008).
3. R. Myers, S. Hume, P. Bloomfield, and T. Jones. Radio-imaging in small animals, Psychophar-
macol 13(4), 352–357 (1999).
4. A. P. Jeavons, R. A. Chandler, C. A. R. Dettmar. A 3D HIDAC-PET camera with sub-
millimetre resolution for imaging small animals, IEEE Trans. Nucl. Sci. 46, 468–473
(1999).
5. J. Missimer, Z. Madi, M. Honer, C. Keller, A. Schubiger, and S. M. Ametamey. Performance
evaluation of the 16-module quad-HIDAC small animal PET camera, Phy. Med. Biol. 49(10),
2069–2081 (2004).
6. W. W. Moses, P. R. G. Virador, S. E. Derenzo, R. H. Huesman, and T. F. Budinger. Design
of a high-resolution, high-sensitivity PET camera for human brains and small animals, IEEE
Trans. Nucl. Sci. 44(4), 1487–1491 (1997).
Keywords video watermarking, wavelet, edge detection, low bitrate lossy compres-
sion
15.1 Introduction
Digital video streaming is an increasingly popular service among content providers.
These streams can be displayed on various types of devices: computers, PDAs etc. The
content can easily be distributed through the internet, but digital content carries a big
security risk, i.e. copying and reproduction of the content is very easy and effort-
less. Even users without special knowledge can easily share downloaded content
with other people.
To avoid illegitimate access, digital content is often encrypted, and travels in
encrypted form to the consumer. Although encryption secures the content on
the way to the consumer, during playback it must be decrypted, and this is the
point where it is exposed to illegal copying. In these cases watermarks can be used to
T. Polyák (B)
Budapest University of Technology and Economics, Department of Telecommunications and
Media Informatics, Magyar tudósok krt. 2., 1117 Budapest, Hungary, BME TMIT,
E-mail: tpolyak@tmit.bme.hu
give some extra protection to the content, since watermarking can embed additional
information in the digital content.
Watermarks are embedded in the digital content in an imperceptible way. They can
carry information about the content: the owner of the content, metadata, QoS
parameters etc. This protection does not eliminate the need for encryption; it is
a supplemental technique to secure the content or to store additional data. Such
watermarks are called robust watermarks because they must survive transformations
of the underlying content, e.g. lossy compression.
Digital watermarking algorithms are generally divided into two groups: algo-
rithms that hide data in the spatial domain, where information is embedded by
modifying the pixel values directly [1–3], and algorithms that use a transform
domain for data hiding. The discrete cosine transform (DCT) and the discrete wavelet
transform (DWT) are often used in transform domain watermarking [4–7]. These water-
marking techniques modify coefficients of the given transform domain. The wavelet
transform is commonly used because it has many advantages over the DCT: it
is closer to the human visual system (HVS), and instead of processing 8 × 8 pixel blocks
it processes the whole frame. It divides the frame into four parts with different
frequencies (LL, LH, HL and HH) and directions.
Some watermarking techniques embed data in object borders and other percep-
tually important areas. This has several advantages: the HVS is less sensitive to changes
made in these components, and modern lossy compression algorithms leave them
relatively untouched to maintain high video quality. These properties make these
regions ideal for watermarking. The detection of suitable regions can be realized in
the uncompressed domain. Pröfrock, Schlauweg and Müller use the Normed Centre
of Gravity (NCG) algorithm to find the blocks that contain object borders or other
significant changes in the video [8]. Ellinas presents an algorithm that embeds data
in images using a four-level DWT and edge detection. This algorithm also modifies
the surroundings of the edges, which is accomplished using a morphological dilation
with a 9 × 9 structuring element.
The proposed algorithm is designed for watermarking low resolution (CIF and
QCIF) video streams. During the design of a video watermarking algorithm, com-
plexity has to be taken into account: a trade-off has to be made between complexity,
quality loss and robustness. The algorithm uses the wavelet domain for data hiding.
It modifies the wavelet coefficients that belong to object borders, so visible artifacts
will not appear on the source video. The suitable coefficients are selected by an edge
detection algorithm. The watermark is inserted in an additive way: a spread spec-
trum pseudorandom noise is added to the luminance plane of the middle frequency
components.
Figure 15.1 shows the process of watermark embedding. First the input video signal
(X_{i,j}) is transformed using a fast n-level (n = 1, 2 or 3) DWT with the Haar
wavelet [9].
Fig. 15.3 Video frame with Sobel edge detection in the middle frequency DWT coefficients
Figure 15.3 shows the frame after edge detection in the middle frequencies.
During watermark insertion the edges with value greater than a given threshold
(Te ), and the pixels around them in a given radius (R) are selected. The value of
the radius is different at the different levels of the transformed image. Radius of
two pixels is used at the first level, and radius of one pixel is used at the second
and third level considering that the higher level coefficients contain lower frequency
components, which affect the quality of the video.
Data are embedded into the 2n selected middle-frequency areas by adding a
pseudorandom spread spectrum noise.
First, the data are extended with a chip rate cr to improve robustness.
The pseudorandom noise Ni is calculated from a seed, and from the u and v coor-
dinates of the selected coefficients. This seed is used as the key for the watermarking
method.
The same data are embedded into consecutive frames to improve robustness.
The watermark detection process is also performed in the wavelet domain: after
the n-level DWT, the middle frequency blocks are passed through edge detection.
Figure 15.4 shows the block scheme of the watermark detection process.
The embedded data are extracted by correlating the pseudorandom noise N_i with
the area containing the detected edges and their surroundings. The detection is blind.
Assuming that the values of the Y_{u,v} coefficients are almost equal,

d_j = \frac{1}{K} \sum_{u,v} Y_{u,v} N_i,   (15.6)

can be used, where K is the number of suitable coefficients and d_j is the extracted
value. The embedded bit is then calculated as

\delta = 0 \text{ if } d_j < T_w; \qquad \delta = 1 \text{ if } d_j > T_w.   (15.7)
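A one-bit-per-frame NumPy sketch of this additive embedding and correlation-based blind detection is given below; the bipolar ±1 noise, the boolean mask of selected edge coefficients, and the zero detection threshold are assumptions of the sketch rather than details fixed by the chapter (which spreads 8 bits per frame with chip rate cr).

import numpy as np

def embed_bit(coeffs, mask, bit, alpha, seed):
    # Additive spread-spectrum embedding of one bit into the DWT coefficients
    # selected by the edge detector (mask = edges above Te plus their R-neighbourhood).
    rng = np.random.default_rng(seed)                 # the seed acts as the key
    u, v = np.nonzero(mask)
    noise = rng.choice([-1.0, 1.0], size=u.size)      # pseudorandom N_i (assumed bipolar)
    b = 1.0 if bit else -1.0
    out = coeffs.astype(float).copy()
    out[u, v] += alpha * b * noise                    # Y_uv = X_uv + alpha * b * N_i
    return out

def detect_bit(coeffs, mask, seed, Tw=0.0):
    # Blind detection, Eqs. (15.6)-(15.7): correlate with the regenerated noise.
    rng = np.random.default_rng(seed)
    u, v = np.nonzero(mask)
    noise = rng.choice([-1.0, 1.0], size=u.size)
    d = np.mean(coeffs[u, v] * noise)                 # Eq. (15.6)
    return 1 if d > Tw else 0                         # Eq. (15.7)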
The watermarking algorithm was tested on four video streams: "Mobile", "Bus" and
"Flower", which are highly detailed, and "Foreman", which contains more smooth
regions. The size of all video streams is 352 × 288 pixels. Video quality was
evaluated using two metrics: PSNR and SSIM [11, 12].
Figure 15.5 shows the original and the watermarked version of the same video
frame. For better visibility it also contains cropped 20 × 20 pixel blocks of the
two frames, which contain significant edges (situated at the top of the bus on the
right side of the frame). It can be seen that the noise caused by the watermark signal
is imperceptible.
Figure 15.6 shows the PSNR and SSIM error maps between the original and the
watermarked frame. The error maps are converted to be negative for better visibility.
It can be seen that watermark is inserted around the edges. The PSNR error map
shows that numerous pixels have been altered (PSNR = 40.31 dB), but the SSIM
error map shows a better quality (SSIM = 0.9952).
Table 15.1 presents the objective quality values of the tested videos using two-level
DWT.
It can be seen that the video quality depends on the content. If the content con-
tains more edges or textured components, then more noise is added to it; although
this improves the robustness of the watermark, the objective quality of the video
becomes worse.
Table 15.2 shows the objective quality values of the test sequence "Bus" after
embedding the watermark into different numbers of DWT levels. At the first level
R = 2, and at the second and third levels R = 1, was used as the radius to select the
pixels to modify.
Fig. 15.5 Original (a) and watermarked (b) video frame, and 20 × 20 pixel blocks from the
original (c) and watermarked (d) video frame, using two-level DWT
Fig. 15.6 Error maps between original and watermarked frame using two-level DWT (converted
to negative for better visibility): (a) PSNR; (b) SSIM
Table 15.1 Average PSNR and SSIM values of watermarked test sequences
Videos Mobile Foreman Bus Flower
PSNR [dB] 35.29 45.17 37.37 35.27
SSIM 0.9842 0.9969 0.9905 0.9905
Table 15.2 Average PSNR and SSIM values after embedding data using different levels of DWT
DWT level 1 2 3
PSNR [dB] 40.90 37.37 35.88
SSIM 0.9967 0.9905 0.9818
As expected, the more DWT levels are used, the more pixels are modified, which
results in worse video quality.
The robustness of the algorithm has been tested against lossy compression. H.264/
AVC with the Baseline profile was used; this codec is widely used for low bitrate
video coding.
First, test data of 128 bits were embedded in the video streams; 8 bits were
embedded in every frame using a watermarking strength (α) of 5. Then the water-
marked videos were compressed using the H.264/AVC codec with different bitrates.
Finally, the accordance between the original and the extracted data produced by the
detector was measured.
An accordance of 100% means that every embedded bit can be correctly identified,
while 50% means that the watermark signal cannot be retrieved, i.e. the embedded
and the extracted data are uncorrelated.
The algorithm was tested using one-, two- and three-level DWT.
Fig. 15.7 Robustness results against H.264/AVC lossy compression, 1-level DWT
Figure 15.7 shows the test results of the watermarking algorithm with one-level
DWT after H.264/AVC compression.
"Correct bits" means the number of bits that can be detected correctly.
"Erroneous bits" is the number of false bits, i.e. the embedded bit was 0 and the
detected bit was 1, or vice versa.
"Unrecognized bits" is the number of bits for which the detecting algorithm cannot
decide whether the embedded bit was 0 or 1. The results show that the watermark can
be extracted even from videos compressed using low bitrate H.264 compression.
Figure 15.8 shows the test results of the watermarking algorithm with two-level
DWT after H.264/AVC compression.
“Correct bits”, “Erroneous bits” and “Unrecognized bits” have the same meaning as above.
Using two levels produces better results: even at a bitrate of 384 kbps, 94% of
the embedded data can be extracted.
The number of false bits is also reduced.
Figure 15.9 shows the test results of the watermarking algorithm with three-level
DWT after H.264/AVC compression.
Fig. 15.8 Robustness results against H.264/AVC lossy compression, 2-level DWT
Fig. 15.9 Robustness results against H.264/AVC lossy compression, 3-level DWT
“Correct bits”, “Erroneous bits” and “Unrecognized bits” again have the same meaning.
It can be seen that using the third DWT level does not improve the robustness of the
algorithm.
15.5 Conclusion
frames and their surroundings. The HVS is less sensitive to modifications at middle
and high frequencies, and compression algorithms make only minor changes to these
areas. The watermark itself is a pseudo-random noise, which is calculated from the
input data and a seed value. The watermark is detected by correlating the pixel values
of the selected area with the pseudo-random noise.
Test results show that the embedded watermark can be imperceptible to the HVS,
and the method is resistant to the H.264/AVC lossy compression algorithm.
References
Keywords Edge detection · image · soft computing · fuzzy logic · Lukasiewicz algebra operator
16.1 Introduction
The process of edge detection in an image consists of a sequence of stages. The first
stage receives the input image and applies a filter in order to eliminate noise. Then
a binary image is obtained by applying a threshold in order to classify the pixels of
the image into two categories, black and white. Finally, in the last stage the detection
of edges is performed.
The filter stage eliminates noise patterns. The target of the filter step is to eliminate
all those points that do not provide any type of information of interest. The noise
corresponds to unwanted information that appears in the image. It comes principally
from the capture sensor (quantisation noise) and from the transmission of the image
(faults in transmitting the information bits). Basically we consider two types of noise:
Gaussian and impulsive (salt & pepper). The Gaussian noise has its origin in
differences of gains of the sensor, noise in the digitalization, etc. The impulsive noise
is characterized by arbitrary values of pixels that are detectable because they are very
different from their neighbours. A way to eliminate these types of noise is by means
of a low-pass filter. This filter smooths the image, replacing high and low values by
average values.
The filter used in the proposed edge detection system is based on the bounded-sum
Lukasiewicz operator. This operator comes from multi-valued Lukasiewicz algebra
and is defined as

x \oplus y = \min(1, x + y)   (16.1)

where x, y \in [0, 1]. The main advantage of applying this operator comes from the
simplicity of its hardware realisation, as is seen in Section 16.3.
The Lukasiewicz bounded-sum filter performs the smoothing of the image and is
suitable for salt & pepper noise as well as Gaussian noise. Figure 16.1 shows the
effect of applying this type of filter.
Fig. 16.1 (a) Input image with salt & pepper noise, (b) Lukasiewicz bounded-sum filter output
The filter has been applied using a mask based on a 3 × 3 array. For the pixel x_{ij}
the weighted mask is applied to obtain the new value y_{ij} as shown in the following
expression:

y_{ij} = \min\Bigl(1, \tfrac{1}{8} \sum_{k=-1}^{1} \sum_{l=-1}^{1} x_{i+k,\,j+l}\Bigr)   (16.2)
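As a software illustration only (not the hardware filter described in Section 16.3), a minimal NumPy sketch of Eq. (16.2), assuming pixel values normalised to [0, 1], is:

import numpy as np
from scipy.ndimage import uniform_filter

def lukasiewicz_filter(img: np.ndarray) -> np.ndarray:
    # uniform_filter returns the 3x3 mean; multiplying by 9 gives the 3x3 sum.
    neighbourhood_sum = uniform_filter(img, size=3, mode='nearest') * 9.0
    # Bounded sum: y_ij = min(1, sum / 8), as in Eq. (16.2).
    return np.minimum(1.0, neighbourhood_sum / 8.0)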
The techniques based on thresholding an image classify the pixels into two
categories (black and white). This transformation is made to establish a distinction
between the objects of the image and the background. The binary image is generated
by comparing the values of the pixels with a threshold T:

y_{i,j} = \begin{cases} 0 & \text{if } x_{i,j} < T \\ 255 & \text{if } x_{i,j} > T \end{cases}   (16.3)

where x_{i,j} is a pixel of the original image and y_{i,j} is the corresponding pixel of
the binary image. In the case of a monochrome image in which the pixels are coded
with 8 bits, the values that the pixels can take lie in the range between 0 and 255. It
is usual to express this range with normalized values between 0 and 1.
A basic technique for threshold calculation is based on the frequency of grey
levels. In this case, for an image with n pixels and n_i pixels with the grey level i, the
threshold T is calculated by means of the following expression:

T = \sum_{i=1}^{255} p_i \, i, \quad \text{with } p_i = n_i / n \text{ and } \sum_{i=1}^{255} p_i = 1   (16.4)
where pi represents the grey level frequency (also known as the probability of the
grey level).
In 1965 Zadeh [6] proposed fuzzy logic as a reasoning mechanism that uses
linguistic terms. Fuzzy logic is based on fuzzy set theory, in which an element
can belong to several sets with different membership degrees. This is opposed to
classic set theory, in which an element either belongs or does not belong to a certain
set. Thus a fuzzy set A is defined as

A = \{(x, \mu(x)) \mid x \in X\}   (16.5)

where x is an object of the set of objects X and \mu(x) is the membership degree of
the element x in the set A. In classic set theory \mu(x) takes the values 0 or 1, whereas
in fuzzy set theory \mu(x) lies in the range of values between 0 and 1.
An advantageous aspect of applying fuzzy logic to the calculation of the threshold
T is that the calculation mechanism improves the processing time, since it only
requires processing the image once and allows the value of the threshold to be
calculated in a direct way.
The fuzzy system has an input that receives the pixel that is going to be evaluated
and an output that corresponds to the result of the fuzzy inference. Once the image
has been read, the output gives the value of the threshold T. Basically, the operation
performed by the fuzzy system corresponds to the calculation of the centre of gravity
of the histogram of the image with the following expression:
T = \sum_{i=1}^{M} \sum_{j=1}^{R} \alpha_{ij} c_{ij} \Big/ \sum_{i=1}^{M} \sum_{j=1}^{R} \alpha_{ij}   (16.6)
where T is the threshold, M is the number of pixels of the image, R is the number
of rules of the fuzzy system, c is the consequent of each rule and α is the activation
degree of the rule.
The knowledge base of the fuzzy system contains both the membership functions
and the rule base. The universe of discourse of the histogram is partitioned into a set
of N equally distributed membership functions. Figure 16.2a shows a partition
example for N = 9. Triangular membership functions have been used since they are
easier for hardware implementation. These functions have an overlapping degree of
2, which limits the number of active rules. The membership functions of the
consequent are singletons equally distributed in the universe of discourse of the
histogram. The use of singleton-type membership functions for the consequent
allows simplified defuzzification methods such as the Fuzzy Mean to be applied.
This defuzzification method can be interpreted as one in which each rule proposes
a conclusion with a "strength" defined by its grade of activation. The overall action
of several rules is obtained by calculating the average of the different conclusions
weighted by their grades of activation. These characteristics of processing based on
active rules and a simplified defuzzification method allow a low-cost and high-speed
hardware implementation.
(a) Antecedent labels L1–L9 equally distributed over [0, 255]; (b) consequent singletons
C1–C9 over [0, 255]; (c) rule base: if x is Li then c is Ci, for i = 1, ..., 9.
Fig. 16.2 Membership functions for N = 9, (a) antecedent, (b) consequent, (c) rulebase
The rule base of the system in Fig. 16.2c uses the membership functions previously
defined. The knowledge base (membership functions and the rule base) is common
to all images, so its values can be stored in a ROM memory.
It is possible to optimize the expression shown in Eq. (16.6) if the system is
normalized. In this case the sum over the rule base of the grades of activation of the
consequent takes the value 1 (\sum_{j=1}^{R} \alpha_{ij} = 1). Then Eq. (16.6) transforms into:

T = \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{R} \alpha_{ij} c_{ij}   (16.7)
For each pixel the system makes the inference in agreement with the rule base of
Fig. 16.2c. The output of the system accumulates the result corresponding to the
numerator of Eq. (16.7). The final output is generated with the last pixel of the
image.
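A behavioural software sketch of this threshold computation is given below; it assumes the nine triangular antecedent labels with overlap 2 and singleton consequents at the same positions on [0, 255], and accumulates the normalised rule outputs pixel by pixel as in Eq. (16.7). It is only an illustration, not the synthesised circuit.

import numpy as np

CENTERS = np.linspace(0.0, 255.0, 9)      # positions of L1..L9 and C1..C9 (assumed)
STEP = CENTERS[1] - CENTERS[0]

def memberships(x: float) -> np.ndarray:
    # Triangular membership degrees of pixel value x in labels L1..L9
    # (overlap 2, so at most two labels are active and the degrees sum to 1).
    return np.clip(1.0 - np.abs(x - CENTERS) / STEP, 0.0, 1.0)

def fuzzy_threshold(image: np.ndarray) -> float:
    acc = 0.0
    for x in image.ravel().astype(float):
        alpha = memberships(x)
        acc += np.dot(alpha, CENTERS)      # rule i: 'if x is Li then c is Ci'
    return acc / image.size                # division by M, Eq. (16.7)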
The input image for the edge detection is a binary image in which pixels take the
value black (0) or white (255). In this case the edges appear when a change between
black and white takes place between two consecutive pixels. The edge generation
consists of determining whether each pixel has neighbours with different values.
Since the image is binary, every pixel is coded with one bit (black = 0 and white = 1).
This edge detection operation is obtained by calculating the xor logic operation
between neighbouring pixels using a 3 × 3 mask. Using the 3 × 3 mask it is possible
to refine the edge generation by detecting the orientation of the edges. Figure 16.3c
shows an example of applying the xor operator on the binary image. Figure 16.4
shows the results obtained when performing the edge detection on a set of test images.
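In software this XOR-based edge generation can be expressed compactly, as in the sketch below: a pixel whose 3 × 3 neighbourhood is not uniform has at least one XOR with a neighbour equal to 1 and is therefore marked as an edge (0 for an edge, 1 for background, following the convention used later for the hardware).

import numpy as np
from scipy.ndimage import minimum_filter, maximum_filter

def xor_edges(binary: np.ndarray) -> np.ndarray:
    # binary: array of 0/1 pixels. The 3x3 neighbourhood is uniform exactly when
    # its minimum equals its maximum; otherwise some neighbouring pair differs.
    uniform = minimum_filter(binary, size=3) == maximum_filter(binary, size=3)
    return uniform.astype(np.uint8)        # 0 marks an edge, 1 marks background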
Fig. 16.3 (a) Lena's image, (b) binary image, (c) edge detection
The circuit that realizes the edge detection has been implemented on a low-cost
FPGA device of the Xilinx Spartan-3 family. This circuit receives the image from a
vision sensor based on a CMOS chip that gives a CIF resolution (352 × 288). The
image is stored in a double-port RAM memory. The data memory width is 32 bits,
which allows two words to be read simultaneously.
Figure 16.5 shows the block diagram of the system [5]. The image is captured by
the camera, which provides an 8-bit pixel in each clock cycle. The pixel is then
stored in the RAM memory under the control of the memory control circuit. The
threshold is calculated while the image is being read, since the threshold generation
circuit receives the pixels coming from the camera. As soon as the image is stored
in the memory, the threshold generation circuit produces the threshold value of the
image.
Next, the edge detection circuit starts its operation, reading eight pixels from the
memory in each clock cycle (two words of 32 bits). In this way the edge detection
circuit is able to provide four output data in parallel, which are stored in the external
memory. Each data item corresponds to a pixel of the edge image.
Fig. 16.5 Block diagram of the system: camera (8-bit pixel stream), memory control circuit and
double-port RAM (32-bit data paths)
Since the edge image is binary, only one bit is needed to represent the value of each
pixel (0 if edge, 1 if background). The new edge image is stored in the above-mentioned memory.
In order to implement the fuzzy system circuit that generates the threshold, the
design methodology is based on the hardware description language VHDL as a way
to describe and model the system at a high level [7]. A behavioural VHDL
description style is used to model the fuzzy inference module. In this description the
system structure (fuzzy sets, rule base) and the operator description (connectives,
fuzzy operations) are defined separately. This makes it possible to describe the fuzzy
system structure and the processing algorithm independently. The fuzzy system
description must be synthesizable in order to generate the hardware realizations.
Figure 16.6 shows the VHDL architecture of the fuzzy system. The rule base is
described in the architecture body. It contains a rule base structure with the nine
rules of Fig. 16.2. Each rule can be divided into two components: the antecedent of
the rule and the consequent. The antecedent is an expression of the input variables
related to their linguistic values. The consequent sets the linguistic value of the rule
output.
The processing mechanisms of the fuzzy operation 'is' (=) and of the inference 'then'
(rule(,)) are not defined in the VHDL description; only the structure of the rule base
is defined. Such a description is a high-level description because it does not assume
any specific implementation criteria. It only describes the knowledge base in terms
of a behavioral rule base.
Linguistic labels represent a range of values within the universe of discourse of
input and output variables. These labels can be described by functions in order to
compute the membership degree of a certain input value. Membership functions
The edge detection algorithm basically consists of three stages. In the first stage
the Lukasiewicz bounded-sum filter is applied. After the filter stage a thresholding
step is applied, which gives rise to a black and white monochrome image. In the
third stage the edges of the image are obtained.
Fig. 16.7 Block diagram of the (a) 3 × 3 architecture and (b) 8 × 3 architecture
For this, the final value of each pixel is evaluated. Only the pixels that surround the
target pixel are of interest (a 3 × 3 mask). For this reason, if the values in the
surroundings of a pixel are all the same (all white or all black), there is no edge, and
the output value associates that pixel with the background of the image. When some
value in the surroundings of the pixel is detected to change, the pixel at issue lies on
an edge, and the black value is therefore assigned to it.
Figure 16.7a shows the basic processing scheme to calculate the value of one
pixel. Pixels 1–9 correspond to the 3 × 3 mask register that moves through the
image. The Functional Unit (FU) processes the data stored in the mask registers.
In order to improve the image processing time, the mask was extended to an 8 × 3
matrix as shown in Fig. 16.7b. Each Functional Unit (FU) operates on a 3 × 3
mask in agreement with the scheme of Fig. 16.7a. The data are stored in the input
registers (R3, R6, R9, ...) and move in each clock cycle to their interconnected
neighbouring registers. In the third clock cycle the mask registers contain the data of
the corresponding image pixels. Then the functional units operate on the mask
data and generate the outputs. In each clock cycle the mask advances one column in
the image. Pixels enter on the right and shift from one stage to another, leaving at
the left side. It is a systolic architecture that follows a linear topology and allows
several pixels to be processed in parallel.
The system receives two input data of 32 bits (that is, eight pixels). These data come
from a double-port memory that stores the image. The circuit also receives as input
the threshold (T) that has been calculated previously.
Fig. 16.8 (a) Functional Unit (FU) circuit schematic, (b) Lukasiewicz filter, (c) thresholding logic
circuit, (d) edge detection circuit
The circuit generates as output the 4 bits corresponding to the output values of the
processed pixels stored in R5, R8, R11 and R14.
The functional unit operates on the 3 × 3 mask and generates the output value
corresponding to the evaluated element of the mask. A block diagram of a functional
unit is shown in Fig. 16.8a. The circuit consists of two pipeline stages, so the data
have a latency of two clock cycles. The first stage is the image filter. Then the
threshold T is applied. The edge detector, in the output stage, operates on the binary
mask (black and white image).
Figure 16.8 shows the circuits corresponding to the different blocks of the
functional unit (FU). As can be observed in Fig. 16.8b, the filter based on the
Lukasiewicz bounded sum receives the data stored in registers R1 to R9. These data
are scaled by the factor 0.125, i.e. divided by 8, which is implemented as a three-bit
shift. The sum of the data is compared (using the carry as control signal) with the
value 1. The threshold logic (Fig. 16.8c) compares the pixel with the threshold value.
The output is a binary image (black and white) and therefore requires only one bit.
Finally, the edge detection circuit receives a 3 × 3 binary mask. It carries out the
xor operation of the bits: if all the bits of the mask are equal, the output pixel belongs
to the background, whereas if some bit is different the output is an edge pixel.
Figure 16.9 shows the chronogram of the circuit. It can be observed that the
operation of the system begins with the falling edge of the signal CS. Whenever a row
begins, two clock cycles are needed to initialize the mask registers. In the third clock
cycle the Dvalid signal takes the value 1, indicating a valid output. Input data are
provided in each clock cycle. Once Dvalid has been activated, the output data in the
following cycles are also valid, since Dvalid = 1 (data of the following columns are
processed in successive cycles).
The circuit occupies an area of 293 slices on an FPGA of the Xilinx Spartan-3
family, which corresponds to a cost of 5,361 equivalent gates. It can operate at a
frequency of 77 MHz, although the development board used has a 50 MHz clock.
This means that the time required to process an image is 0.335 ms, which allows
2,985 images per second to be processed.
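As a rough consistency check of these figures, the architecture delivers four edge pixels per clock cycle, so a CIF frame of 352 × 288 = 101,376 pixels needs about 101,376/4 = 25,344 cycles; at 77 MHz this takes 25,344/(77 × 10^6) ≈ 0.33 ms per image, i.e. roughly 3,000 images per second, in agreement with the 0.335 ms and 2,985 images per second quoted above.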
16.4 Conclusion
The hardware implementation of an edge extraction system has been described. The
edge detection algorithm is based on soft computing techniques: the calculation of
the threshold that allows a binary image to be obtained is realized by means of a
fuzzy logic inference mechanism, and the conditioning of the image for its later
processing is based on a filter that uses the Lukasiewicz bounded sum. The main
advantage of this edge detection technique is that it allows a very efficient hardware
implementation in terms of cost and speed. This makes it especially suitable for
applications that require real-time processing.
References
Abstract In this chapter we concern ourselves only with the two most widely
used algorithms in SPECT reconstruction, namely the FBP and Ordered-Subsets
Expectation Maximization (OSEM) algorithms. This chapter describes the basic
principles of these two algorithms and summarises SPECT image attenuation and
scatter correction methods. Finally, it presents a study evaluating the volumetric
iterative reconstruction algorithm OSEM 3-D and compares its performance with the
FBP algorithm.
17.1 Introduction
17.2 Background
photons that are detected in the out-of-slice projection pixels to be accounted for [9].
Although the computational burden of full 3-D iterative reconstruction is somewhat
high, the accuracy is significantly improved and there are noteworthy improvements
of signal-to-noise ratio (SNR) in comparison to 2D reconstruction [10]. A detailed
description of the 3-D reconstruction problem is given in [9, 11, 12].
In general, image reconstruction in SPECT can be divided into two main
approaches. The first approach is based on direct analytical methods, while the
second is based on algebraic and statistical criteria and iterative algorithms.
Filtered back projection (FBP) is the most commonly used reconstruction algorithm
in SPECT imaging. This is principally due to the simple concept of the algorithm
and relatively quick processing time. FBP can be viewed as a two-step process:
filtering of the data and back projection of the filtered projection data. There is
extensive literature on these reconstruction algorithms and reviews of FBP and its
applications to SPECT [1, 4, 13, 14].
17.2.1.1 Filtering
3-D SPECT images are created by back projecting planar 2-D views of the object
into image space along the line of response [15]. This process results in blurred
images and star-like reconstruction artefacts. To overcome this problem, a ramp
filter is applied to the projection data, which in turn increases the magnitude of
the high-frequency components of the image, including statistical noise. Further
filtration is therefore required to temper this effect. This can be done by applying
an image smoothing kernel, but because convolution is a computationally intensive
task, the data are first transformed to the frequency domain and the projection data
filtration is performed in the frequency (Fourier) domain [1, 2].
There are basically three types of reconstruction filters; these can be categorised
as low-pass, high-pass and band-pass filters. High-pass filters allow high-frequency
information, including image noise, to pass and are usually referred to as image
enhancement filters. A band-pass filter allows only data with frequencies within the
pass band to be retained, while suppressing or eliminating all other frequencies
[15, 16].
Reconstruction filters can be described by two important parameters: the "cut-off
frequency" and the "order" (or power) of the filter function. The cut-off frequency
determines where the filter rolls off to an infinitely low gain, whereas the shape of the
curve is determined by the order of the filter function. The location of the cut-off
frequency determines how the filter will affect both image noise and resolution.
The main reconstruction step involves the backprojection of the acquired data into
the image domain. For simplicity, the position of photon detection within the gamma
camera head is assumed to be perpendicular to the location of photon emission. By
smearing back the number of counts at each point of the projection profiles into the
image space, a 3-D image of the radioactive tracer distribution can be built. Regions
with a higher concentration of back projected lines (ray sums) and a greater number
of counts form a reconstructed image of the radiotracer concentration within the
object. This process is done in the spatial (image) domain, unlike image filtration,
which is normally carried out in the frequency domain [1, 13, 14].
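As an illustration of these two steps (filtering and backprojection), the short sketch below uses scikit-image, an assumed library choice rather than any clinical reconstruction software, to reconstruct a slice from its sinogram:

import numpy as np
from skimage.transform import iradon

def fbp_reconstruct(sinogram: np.ndarray, angles_deg: np.ndarray) -> np.ndarray:
    # sinogram: detector bins x projection angles, as acquired around the object.
    # filter_name='ramp' applies the ramp filter in the Fourier domain; smoother
    # windows such as 'hann' temper the amplified high-frequency noise.
    return iradon(sinogram, theta=angles_deg, filter_name='ramp', circle=True)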
The ML-EM algorithm has proven to be effective, but also to be very slow for daily
use. Splitting up the measured datasets into different subsets and using only one
subset for each iteration speeds up the algorithm by a factor equal to the number
of subsets. The method is known as Ordered Subsets Expectation Maximization
(OSEM) [6].
In SPECT, the sequential processing of ordered subsets is natural, as projection
data is collected separately for each projection angle. In other words, counts on
single projections can form successive subsets. With OSEM, the careful selection
of subset ordering can greatly accelerate convergence. However, the lack of con-
vergence of OSEM algorithms is of theoretical importance since in practice, the
algorithm is terminated after only a few iterations [20].
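A schematic form of the OSEM update is sketched below for a small, dense system matrix A (A[i, j] being the probability that activity in voxel j is detected in projection bin i). Real SPECT implementations use projector and backprojector operators with the corrections discussed below rather than an explicit matrix, so this is only an illustration of the subset-by-subset update.

import numpy as np

def osem(A: np.ndarray, y: np.ndarray, n_subsets: int, n_iter: int) -> np.ndarray:
    n_bins, n_vox = A.shape
    x = np.ones(n_vox)                              # initial activity estimate
    subsets = np.array_split(np.arange(n_bins), n_subsets)
    for _ in range(n_iter):
        for s in subsets:                           # one sub-iteration per subset
            As = A[s]
            ratio = y[s] / np.maximum(As @ x, 1e-12)
            sens = np.maximum(As.T @ np.ones(len(s)), 1e-12)
            x = x * (As.T @ ratio) / sens           # multiplicative EM update
    return x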
In recent years many attempts have been made to improve the quality of the
3D OSEM algorithms by incorporating corrections for the major image-degrading
factors in the projection and backprojection operations of the iterative steps. Specif-
ically, several commercial algorithms (such as Flash 3-D by Siemens [21], Astonish
3-D by Philips [22], HOSEM by HERMES [23] and Wide-Beam Reconstruction
by UltraSPECT [24]) have been developed to improve the SNR of the image by
modelling Poisson noise that results from low counts as well as implementing reso-
lution recovery algorithms to restore resolution degradation due to collimator spread
function and source detector distance in the case of WBR.
Several physical factors, the most important being photon attenuation, Comp-
ton scatter and spatially varying collimator response, degrade the qualitative and
quantitative accuracy of SPECT images. Theoretically, the number of counts in
reconstructed SPECT images should be directly proportional to the absolute con-
centration and distribution of the radiotracer within the imaged object. It has been
reported that the presence of scatter and attenuation in the images limits the accu-
racy of quantification of activity. The uncertainty could be as high as 50–100% if
the scatter and attenuation are not corrected [25].
Attenuation causes inconsistent projection data that can increase or decrease counts
in the image, especially near hot regions. It also causes shadows, or regions of
diminished activity, within the patient. Thus, attenuation may potentially affect
tumour detectability or artificially enhance a noise blob [26].
AC methods can be classified as: (a) constant attenuation coefficient (μ), known
as the Chang method [27]; or (b) variable μ, or transmission source methods. The
most widely used AC method is the Chang method, but it can only be used in the
brain or abdomen regions, as these can be considered essentially uniform. Obviously,
the Chang method will only work well where μ is, in fact, approximately constant.
However, this is not the case when the attenuating medium is non-uniform (e.g.
cardiac studies). Therefore, AC requires a variable attenuation coefficient dependent
on the spatial location of the pixel in the object (i.e. patient or phantom) [28].
Scattered photons limit resolution and act as a modifier to the effective attenuation
coefficients in a depth dependent way. Moreover, the presence of scattered photons
in SPECT projection data leads to reduced contrast and loss of the accuracy of
quantification in reconstructed images [25].
In order to compensate for the effects of scatter, several scatter compensation
techniques have been developed. These techniques can be categorised into two
main groups; subtraction based and reconstruction based scatter correction. In the
subtraction-based approach the scatter component is estimated and subtracted from
the projection data prior to reconstruction. Typically the scatter estimate is produced
from multiple energy window acquisition. Kadrmas et al. have illustrated the advan-
tages of using multiple energy windows when imaging isotopes that have multiple
emission peaks [29].
In the reconstruction-based approach, the scatter is incorporated in the algorithm
model. The main advantage of this approach is that the noise increase associated
with the scatter subtraction method is avoided, because there is no explicit
subtraction of scatter counts. A number of studies have shown that iterative
reconstruction with accurate modelling of scatter is superior to pre-reconstruction
scatter subtraction [25]. However, the major drawback of model-based reconstruction
is that it is not obvious how scatter originating from radionuclides in object regions
outside the field of view (FOV) is compensated for [10].
process. This modelling may be made in two dimensions, by ignoring the interslice
blur. However, for best results calculations should be done in 3-D [30, 31].
Both deconvolution and modelling methods improve contrast of the recon-
structed images. However, the main drawback of deconvolution methods is that
they may also substantially increase high frequency noise. To compensate for this
consequence one needs to apply a low pass filter, which, in turn, degrades image
resolution. The modelling method on the other hand has a regularising effect on the
reconstructed image. Compared to reconstruction without blurring compensation,
high frequency noise is reduced, while contrast and resolution are improved [30,31].
17.3.1 Methods
The performance of 3-D OSEM, with and without AC, was characterized with
respect to subset and iteration number for a constant pixel size and the same amount
of post reconstruction filtering. Due to space constraints, more details about methods
and data analysis are given in [32].
As shown in Fig. 17.1, for the same number of subsets, there is a linear relationship
between the noise (expressed as the coefficient of variation) and the number of
iterations. The graph also shows a linear relationship between noise and the number
of subsets for the same number of iterations.
This linear relationship (R² ≈ 1) leads to a predictable and accurate characterization
of the noise. Also, Fig. 17.1 shows that the effect of subset and iteration number on
noise is additive, in accordance with OSEM theory [33].
As shown in Fig. 17.2, the number of iterations needed to reach a constant
FWHM was approximately 17 for the periphery and 22 for the centre of the FOV.
These results agree with those reported by Yokoi et al. and Kohli et al. [34, 35].
Kohli et al. concluded that the reason for the slow convergence in the central region
was the effect of attenuation [35]. However, this cannot be the whole explanation, as
the simulations by Yokoi et al. assumed no attenuating medium for a point source
[34]. According to Yokoi et al., the reason for the slow convergence is that the SPECT
resolution is originally better at the periphery than at the centre [34]. Another
reasonable explanation of the slow convergence could be the fact that many more ray
sums contribute to the formation of objects located at the centre of the image than at
the periphery [1].
Fig. 17.1 Relationship between noise and the number of iterations at different numbers of subsets
(1, 4, 8, 16 and 32 subsets; linear fit R² = 0.9146)
Fig. 17.2 Variation of the spatial resolution (FWHM, in mm) with the number of iterations for a
range of subsets at the centre (C) and the periphery (P)
Fig. 17.3 Spatial resolution versus number of iterations in 3D OSEM with eight subsets and versus
cut-off frequency in FBP with a Wiener filter of order 20
Fig. 17.4 Estimated noise versus number of iterations in 3D OSEM and cut-off frequency in FBP
with a Metz filter of order 15, for 10 s per projection angle
Fig. 17.5 Measured contrast with and without AC versus number of iterations in 3D OSEM and
cut-off frequency in FBP with a Butterworth filter of order 15
Fig. 17.6 Reconstructed images of bone scan (above): (a) using FBP – Wiener filter (Cut-off:
0.7/Order: 15); and (b) using 3D OSEM (Subsets: 8/Iterations: 16) and reconstructed images of
adrenal scan (bottom); (c) using FBP – Metz filter (Cut-off: 0.7/Order: 15); and (d) using 3D
OSEM (Subsets: 8/Iterations: 12)
17.4 Summary
We have described the basic principles of the two most commonly used reconstruc-
tion algorithms in SPECT imaging: FBP and OSEM, and presented a brief review
of current compensation methods for the major image-degrading effects. Iterative
image reconstruction methods allow the incorporation of more accurate imaging
models rather than the simplistic Radon model assumed in the FBP algorithm.
These include scatter and attenuation corrections as well as collimator and distance
response and more realistic statistical noise models. In our study we have demon-
strated the superior performance of 3-D OSEM compared to FBP particularly for
low count statistics studies, including improved image contrast and spatial resolu-
tion. Using 3-D OSEM with suitable AC may improve lesion detectability due to
the significant improvement of image contrast. Indeed, 3-D iterative reconstruction
algorithms are likely to replace the FBP technique for most SPECT and PET clinical
applications. However, for the full potential of these methods to be realised, more
exact image compensation methods need to be developed and optimal image
reconstruction parameters need to be used. The full impact of these methods on
quantitative SPECT imaging is yet to be assessed. Finally, with the development of
new, faster and more accurate 3-D and possibly 4-D iterative algorithms, the future
of SPECT image reconstruction is certain to be based on iterative techniques rather
than on analytical methods.
References
20. R. L. Byrne, Editorial: Recent developments in iterative image reconstruction for PET and
SPECT, IEEE Transactions on Medical Imaging 19(4), 257–260 (2000).
21. Flash 3D and e-soft, 2008, Siemens Medical Solutions (September 4, 2008); http://www.
medical.siemens.com/siemens
22. Astonish 3-D, 2008, Philips (September 4, 2008); http://www.medical.philips.com
23. HOSEM, 2008, HERMES (September 4, 2008); http://www.hermesmedical.com
24. WBR, 2008, UltraSPECT (September 4, 2008); http://www.ultraspect.com
25. H. Zaidi, Scatter modeling and compensation in emission tomography, European Journal of
Nuclear Medicine and Molecular Imaging 31(5), 761–782 (2004).
26. T. Kauppinen, M. O. Koskinen, S. Alenius, E. Vanninen and J. T. Kuikka, Improvement of
brain perfusion SPET using iterative reconstruction with scatter and non-uniform attenuation
correction, European Journal of Nuclear Medicine 27(9), 1380–1386 (2000).
27. L. T. Chang, A method for attenuation correction in radionuclide computed tomography, IEEE
Transactions on Nuclear Science NS-25, 638–643 (1978).
28. M. G. Erwin, SPECT in the year 2000: basic principles, Journal of Nuclear Medicine 28(4),
233–244 (2000).
29. D. J. Kadrmas, E. C. Frey, and B. M. W. Tsui, Application of reconstruction-based scatter
compensation to thallium-201 SPECT: implementations for reduced reconstructed image
noise. IEEE Transactions on Medical Imaging 17(3), 325–333 (1998).
30. B. F. Hutton and Y. H. Lau, Application of distance-dependent resolution compensation and
post-reconstruction filtering for myocardial SPECT, Physics in Medicine and Biology 43,
1679–1693 (1998).
31. P. Gantet, P. Payoux, P. Celler, P. Majorel, D. Gourion, D. Noll and J. P. Esquerré, Iterative
three-dimensional expectation maximization restoration of single photon emission computed
tomography images: application in striatal imaging. Medical Physics 33(1), 52–59 (2006).
32. K. Alzimami, S. Sassi, and N. M. Spyrou, Optimization and comparison of 3D-OSEM with
FBP SPECT imaging, The 2008 International Conference of Signal and Image Engineering,
Proceedings I, London, July 2008, 632–636.
33. M. Brambilla, B. Cannillo, M. Dominietto, L. Leva, C. Secco and E. Inglese, Characterization
of ordered-subsets expectation maximization with 3D post-reconstruction Gauss filtering and
comparison with filtered backprojection in 99m Tc SPECT, Annals of Nuclear Medicine 19(2),
75–82 (2005).
34. T. Yokoi, H. Shinohara and H. Onishi, Performance evaluation of OSEM reconstruction algo-
rithm incorporating three-dimensional distance-dependent resolution compensation for brain
SPECT: A simulation study, Annals of Nuclear Medicine 16(1), 11–18 (2002).
35. V. Kohli, M. King, S. J. Glick and T. S. Pan, Comparison of frequency-distance relationship and
Gaussian-diffusion based methods of compensation for distance-dependent spatial resolution in
SPECT imaging, Physics in Medicine and Biology 43, 1025–1037 (1998).
Chapter 18
Performance Improvement of Wireless MAC
Using Non-Cooperative Games
Abstract This chapter illustrates the use of non-cooperative games to optimize the
performance of protocols for contention-prone shared medium access in wireless
networks. Specifically, it proposes and analyses a set of new utility functions con-
forming to a recently proposed generic game-theoretic model of contention control
in wireless networks [5]. The functions ensure that the game has a unique non-trivial
Nash equilibrium. Simulations carried out for IEEE 802.11 style networks indicate
that the aggregate throughput of the network at the Medium Access Control (MAC)
layer has improved while the collision overhead is reduced.
Keywords Wireless networks · medium access control · game theory · Nash equilibrium · throughput
18.1 Introduction
In most wireless networks, including ad-hoc and sensor networks, access to the
shared wireless medium is random-access based. In the absence of a central regulat-
ing authority, it is only natural that multiple wireless nodes will attempt to access
the medium simultaneously, resulting in packet collisions. IEEE 802.11 (here-
after briefly called 802.11) is the most commonly followed standard in wireless
networks. It defines a distributed mechanism called the distributed coordination
function (DCF) to resolve contention among the contending nodes. It involves chan-
nel sensing to assess the state of the shared medium and adjusting the channel access
probability accordingly to minimize chances of collision. In 802.11, the channel
18.2 Background
Game theory [9] provides a tool to study situations in which there is a set of rational
decision makers who take specific actions that have mutual, possibly conflicting,
consequences. A game models an interaction among parties (called the players)
who are rational decision makers and have possibly conflicting objectives. Each
player has a set of actions called its strategies and with each strategy is associated a
payoff function. Rationality demands each player maximize its own payoff function.
Non-cooperative games are commonly used to model problems in medium access
control in telecommunication networks. In such a game, the solution concept is a
notion called a stable set or Nash equilibrium that identifies a set of strategies for
all the participating players, from which no player has an incentive to unilaterally
deviate as any unilateral deviation will not result in an increase in payoff of the
deviating player.
In [12] the authors model the nodes in an Aloha network as selfish players in
a non-cooperative Aloha game. They assume that the number of backlogged users
is always globally known. This limitation is addressed in [13] where only the total
number of users is known. More recent research includes modeling the DCF in
802.11 in a game-theoretic framework [11], and reverse engineering exponential
backoff as a strategy [10] of non-cooperative games.
In contradistinction to these works, we do not explicitly reverse-engineer 802.11
or DCF into a game model but use ideas from non-cooperative game theory to
optimize the performance of 802.11. We consider each node in a wireless net-
work as having a set of strategies which are its channel access probabilities. Unlike
common practices in optimization of 802.11 like [4], we do not assume that each
node knows the number of users in the network. This is more reflective of prac-
tical situations where such knowledge is difficult to acquire. Non game-theoretic
approaches that do not depend on the nodes’ knowing the network size include the
ones presented in [2] and [3]. We use selfishness to achieve optimized system-wide
performance. In [5, 6, 8] the authors use game theory to study and optimize 802.11.
Although they remain our motivation, their illustrative utility functions are differ-
ent from ours. Their utility functions are well-behaved in a much more restricted
domain than ours. Moreover, as we show, our utility functions give far better results
for large network sizes. Supermodular games for contention control are presented
in [7].
The system we consider is a network of N nodes that are all able to hear one another.
To facilitate analysis we will adopt a description of our access mechanism in terms
of channel access probabilities. It has been shown in [1] that in a saturated regime
the constant channel access probability p relates to the corresponding contention
window cw according to

p = \frac{2}{cw + 1}   (18.1)
Now we turn to the game model. We define the game G as a 3-tuple
G = ⟨N, {S_i}_{i∈N}, {u_i}_{i∈N}⟩, where N is a set of players (wireless nodes), each
player i ∈ N having a strategy set S_i = {p_i | p_i ∈ [v_i, w_i]} with 0 < v_i < w_i < 1
and a payoff u_i. The strategy set of each player is the set of its channel access
probabilities. Note that it is a generalization to a continuous space of the simpler
strategy set {wait, transmit} denoting the two deterministic actions that a node can
perform. Denote the strategy profile of all nodes by p = (p_1, p_2, ..., p_{|N|}). The
payoff is naturally of the form u_i = U_i(p_i) - p_i q_i(p). Here U_i(p_i) is the utility
function denoting the gain from channel access and q_i(p) is the conditional collision
probability, given by

q_i(p) = 1 - \prod_{j \in N \setminus \{i\}} (1 - p_j)   (18.2)
Since

U_i'(p_i) = \frac{\ln(w_i / p_i)}{\ln r_i}   (18.6)

is one-to-one in S_i, the inverse function

(U_i')^{-1}(q_i) = w_i \, r_i^{-q_i}

exists, which is continuous and decreasing in q_i. Now observe that (U_i')^{-1}(0) = w_i
and (U_i')^{-1}(1) = v_i. Thus, f_i(p) maps any p ∈ S_1 × S_2 × ... × S_{|N|} into a point
p_i ∈ [v_i, w_i]. Hence the vector function f(p) = (f_1(p), f_2(p), ..., f_{|N|}(p)) maps the
set S_1 × S_2 × ... × S_{|N|} into itself. Hence, by Brouwer's fixed point theorem, there
is a fixed point p* of f(p) in \prod_{i∈N} S_i, that is, f_i(p*) = p_i* for each i ∈ N. It
immediately follows from the definition of f_i(p) in (18.5) above that at this fixed
point, the condition (18.4) for a non-trivial NE holds. In other words, G has a
non-trivial NE.
The existence of non-trivial NE signifies that the channel access probability of
any node in NE is not at the boundary of strategy space. This reduces unfair sharing
of the wireless channel among the contending nodes.
Theorem 18.3. The NE is unique for all p_i in [v_i, w_i] for all i ∈ N if v_i > \dfrac{w_i}{e^{1/w_i - 1}}.

Proof. Let φ_i(p_i) = (1 - U_i'(p_i))(1 - p_i). Then

\varphi_i'(p_i) = \frac{1}{\ln r_i} \left( \frac{1}{p_i} - 1 - \ln\frac{p_i}{v_i} \right)

If v_i > \dfrac{p_i}{e^{1/p_i - 1}}, then \ln v_i + (\tfrac{1}{p_i} - 1) > \ln p_i, which means φ_i'(p_i) > 0. Since
\dfrac{p_i}{e^{1/p_i - 1}} is strictly increasing in p_i, if v_i > \dfrac{w_i}{e^{1/w_i - 1}}, we have φ_i'(p_i) > 0. For the
rest of the proof, assume that the above relation between v_i and w_i holds. Recollect
from condition (18.4) that at a non-trivial NE,

U_i'(p_i^*) = q_i(p^*) = 1 - \prod_{j \in N \setminus \{i\}} (1 - p_j^*)

making φ_i(p_i*) identical for all nodes i. Now assume, for the sake of contradiction,
that there exist at least two distinct non-trivial NE x* and y* of the game G. Let
φ_i(x_i*) = λ_1 and φ_i(y_i*) = λ_2 for some node i. Since φ_i is strictly increasing,
λ_1 ≠ λ_2. Assume, without loss of generality, λ_1 > λ_2. Then x_i* > y_i* for all
i ∈ N. From the definition of q_i(p), since q_i(p) is increasing in p, U_i'(x_i*) = q_i(x*) >
q_i(y*) = U_i'(y_i*). But U_i''(p_i) = -1/(p_i ln r_i), showing that U_i'(p_i) is decreasing,
thus a contradiction. Hence x* = y*, proving the uniqueness of the NE.
Theorem 18.4. Suppose all the players in G have identical strategy sets. Then at
the unique non-trivial NE of G, all the players choose the same strategies.

Proof. Since all players have the same strategy sets, for any two players i and j,
w_i = w_j and v_i = v_j. Now suppose that the unique non-trivial NE is p*. Let
there exist p_i* and p_j* in p* corresponding to nodes i and j such that p_i* ≠ p_j*.
Denote the set of strategies, in some order, of all nodes except i and j by p_{-ij}*, so
that the NE can be described as an ordered set (p_{-ij}*, p_i*, p_j*). By symmetry, every
permutation of the NE must be an NE. Interchange the strategies of nodes i and
j to get a new NE (p_{-ij}*, p_j*, p_i*). But this contradicts the uniqueness of the NE,
violating Theorem 18.3. Hence, p_i* = p_j*. Since i and j are arbitrary, it follows that
p_i* = p_j* for all i, j ∈ N.
This has a very important and useful consequence: the wireless channel is
accessed by the nodes with identical access probabilities at the equilibrium point
thus resulting in fair sharing of the resource among the contenders, and consequently
short-term fairness.
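Assuming the marginal utility of Eq. (18.6) with r_i = w_i/v_i (an assumption of this sketch), the common access probability at the symmetric NE of an N-node network can be obtained numerically by solving U'(p) = 1 - (1 - p)^(N-1) on [v, w], for instance by bisection:

import math

def symmetric_ne(n_nodes: int, v: float = 2 / 1025, w: float = 2 / 33) -> float:
    ln_r = math.log(w / v)                 # assumes r = w / v
    def g(p: float) -> float:
        # U'(p) - q(p) at the symmetric profile; positive at p = v, negative at p = w.
        return math.log(w / p) / ln_r - (1.0 - (1.0 - p) ** (n_nodes - 1))
    lo, hi = v, w
    for _ in range(100):                   # bisection on [v, w]
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: common NE access probability for a 50-node network
# print(symmetric_ne(50))

The bounds v = 2/1025 and w = 2/33 used as defaults are the values derived later from contention windows of 1,024 and 32.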
The non-cooperative game G is played repeatedly by the nodes, that is, it is a multi-
stage game. A stage may be a single transmission or a sequence of K transmissions
for a fixed K > 1.
Suppose that each node, after playing a round, observes the cumulative effect (in
the sense of the conditional collision probability) of the actions of all players in the
previous round. This knowledge may now be used by the node to select its strategy
for the next round. Based on how this knowledge is used, there are many common
techniques to push the system to its Nash equilibrium. Two of the more popular ones are:
Best response: This is the most obvious mechanism, where at each stage every
player chooses the strategy that maximizes its payoff given the actions of all
other players in the previous round.
Gradient play: Each player adjusts its persistence probability in the direction of its
payoff gradient,

p_i(t+1) = p_i(t) + f_i(p_i(t)) \bigl( U_i'(p_i(t)) - q_i(p(t)) \bigr),

projected onto S_i, for each node i ∈ N, where the step size f_i(·) > 0 can be a function
of the strategy of player i. It has an intuitively appealing interpretation: if at any
stage of the game the marginal utility U_i'(p_i(t)) exceeds the "price of contention"
q_i(p(t)), the persistence probability is increased, but if the price is greater, the
persistence probability is reduced.
To eliminate the need to know the current strategies of all other players, a
node needs to estimate q_i(p) in an indirect way. A simple method is to track the
number of packet transmissions (nt) and the number of packet losses (nl) over
a period of time and calculate q_i = nl/nt. A more sophisticated approach [3]
involves observing the number of idle slots, which follows a geometric distribution
with mean \bar{I} = p^{idle}/(1 - p^{idle}), where p^{idle} is the probability of a slot
being idle, p^{idle} = \prod_{i \in N} (1 - p_i). Thus, by observing \bar{I}, q_i can be estimated as
q_i = 1 - (\bar{I}/(\bar{I} + 1))/(1 - p_i), allowing a completely distributed update mechanism.
See [5] for a thorough discussion of the convergence of these procedures.
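A minimal sketch of one distributed update step, combining a gradient-style adjustment with the idle-slot estimate of q_i just described, is given below; the step size, the form of U_i' taken from (18.6) and r_i = w_i/v_i are illustrative choices rather than the authors' exact scheme.

import math

def update_p(p_i: float, mean_idle: float, v: float = 2 / 1025, w: float = 2 / 33,
             step: float = 0.01) -> float:
    ln_r = math.log(w / v)                                    # assumes r_i = w_i / v_i
    q_i = 1.0 - (mean_idle / (mean_idle + 1.0)) / (1.0 - p_i) # estimated price of contention
    marginal_utility = math.log(w / p_i) / ln_r               # U_i'(p_i) as in Eq. (18.6)
    p_new = p_i + step * (marginal_utility - q_i)             # move along the payoff gradient
    return min(w, max(v, p_new))                              # stay within the strategy set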
S_i = \frac{p_i^{suc} \, p^{trans} \, \ell}{p^{idle} \, \sigma + p^{trans} \, p_i^{suc} \, T_{suc} + p^{col} \, T_{col}}

where \ell is the packet payload, \sigma is the slot time, T_{suc} is the average duration of a slot
in which a successful transmission occurs and T_{col} is the average duration of a slot in
which a collision occurs.
In our experiments, the slot time is 20 μs, SIFS = 10 μs, DIFS = 20 μs, basic
rate = 1 Mbps, data rate = 11 Mbps, propagation delay = 1 μs,
PHY header = 24 bytes, MAC header = 34 bytes, ACK = 14 bytes and the packet
payload is fixed at 1,500 bytes. We assume all nodes use the same parameters for
the utility function. To compare with DCF, we set the bounds of the contention window
to powers of 2 (specifically 32 and 1,024) and derive the bounds of the strategy sets
using (18.1). This gives w_i = 2/33 = 0.06061 and v_i = 2/1025 = 0.00195. Note
that in this range the condition of Theorem 18.3 is satisfied, since
v_i > w_i / e^{1/w_i - 1} \approx 1 \times 10^{-8}. In the following sub-sections, we call our
game-theoretic distributed coordination function GDCF.
Figure 18.1 shows the variation of the probability with which a node accesses the
wireless channel in the saturation regime in GDCF and DCF as the network size
increases. In both protocols, the probability lies in the interval [0.00195, 0.06061],
since the contention window varies from 32 to 1,024. Recollect from Section 18.3
that the probability with which any given node in GDCF accesses the channel at
the NE is identical for all nodes. We observe that the channel access probabilities
are always higher in DCF than in GDCF. This has an important consequence, as
will be evident in the ensuing discussion.
The aggregate throughput (see Fig. 18.2), which measures the total throughput of all
nodes under saturation conditions, is higher in DCF when the network size is small,
but it falls far below that of GDCF as the network size grows. This is because the
channel access probabilities are higher in DCF. In a small network, collisions are few and
We observe from Fig. 18.3 that the conditional collision probability is higher in case
of DCF than in GDCF. This is expected as the channel access probabilities are lower
in case of GDCF resulting in reduced medium contention even when the number of
nodes increases. Indeed GDCF is designed with the aim of keeping the price of
contention low.
In 802.11, a large number of slots are wasted due to collisions. This seriously affects
the performance and lifetime of the power-constrained devices in ad-hoc and sensor
networks. Figure 18.4 makes clear that the number of slots wasted in collisions for
every successful transmission is much higher in DCF than in GDCF, making the
latter attractive in these networks.
18.6 Discussion
The motivation for this work is [5], so we make a brief comparison with it in this
section. Our utility function produces a unique non-trivial NE for a wide range of v_i
and w_i, while the one in [5] has stricter constraints on v_i once w_i is specified.
This means our utility functions admit a much larger game space and hence are
superior for widely varying network sizes, since the channel access probability should
be very small for very large network sizes and high for small network sizes. Table 18.1
presents a performance comparison as regards throughput and collisions for large
networks. Here, as in our experiments, we vary the contention window from 32 to
1,024, as is the default for the DSSS PHY in the 802.11 standard, while in [5] (call it
U Chen) the window ceiling is limited to 256 if the base is 32 and only powers of 2
are chosen. As expected, with these settings, our utility function exhibits particularly
better performance for large network sizes.
18.7 Conclusion
References
19.1 Introduction
N. Qasim (B)
The Centre for Telecommunications Research, King’s College London, Strand,
WC2R 2LS, London, UK,
E-mail: nadia.qasim@kcl.ac.uk
routers, which are free to move arbitrarily and perform management; a MANET is
shown schematically in Fig. 19.1. Its network topology changes very rapidly and
unpredictably: many mobile nodes (MNs) can move to and from a wireless network
without any fixed access point, so routers and hosts move and the topology is
dynamic.
MANETs have to support multi-hop paths for mobile nodes to communicate with
each other and can have multiple hops over wireless links; therefore their connection
point to the internet may also change. If mobile nodes are within communication
range of each other, the source node can send a message to the destination node
directly; otherwise it can send it through an intermediate node. Nowadays MANETs
provide robust and efficient operation in mobile wireless networks, since routing
functionality can be included in the MNs, which are then more than just mobile
hosts; this reduces the routing overhead and saves energy for other nodes. Hence,
MANETs are very useful when infrastructure is not available [1], unfeasible, or
expensive, because they can be rapidly deployed without prior planning. Mobile ad
hoc networks are mostly used in military communication by soldiers, automated
battlefields, emergency management teams for rescue [1], search by police or fire
fighters, replacement of fixed infrastructure in case of earthquake, floods, fire, etc.,
quicker access to patient data from hospital databases (record, status, diagnosis)
during emergency situations, remote sensors for weather, sports stadiums, mobile
offices, electronic payments from anywhere, voting systems [2], vehicular computing,
education systems with set-up of virtual classrooms, conference meetings, and
peer-to-peer file sharing systems [2].
Major challenges in mobile ad hoc networks are the routing of packets under
frequent MN movement, resource issues such as power and storage, and wireless
communication issues. A mobile ad hoc network consists of wireless hosts that may
move often, and the movement of hosts results in a change of routes. In this paper
we have used routing protocols from the reactive, proactive and hybrid categories
for the evaluation.
when it was demanded, and has a bi-directional route from the source to the
destination. When it has packets to send from a source to a destination MN, it floods
the network with route request (RREQ) packets. Every MN that receives the RREQ
checks its routing table to find out whether it is the destination node or whether it
has a fresh route to the destination; if so, it unicasts a route reply (RREP), which is
routed back on a temporary reverse route generated by the RREQ from the source
node, or else it re-broadcasts the RREQ.
Optimized link state routing (OLSR) is a proactive routing protocol [4] in which
each node periodically broadcasts its routing table, allowing each node to build a
global view of the network topology. The periodic nature of the protocol creates a
large amount of overhead. In order to reduce the overhead, OLSR limits the number
of MNs that can forward network-wide traffic; for this purpose it uses multipoint
relays (MPRs), which are responsible for forwarding routing messages and for
optimization of controlled flooding and operations. Each node independently elects
a group of MPRs from its one-hop neighbors. All MNs periodically broadcast a
list of their MPR selectors instead of the whole list of neighbors. MPRs are also
used to form a route from an MN to the destination node and perform route
calculation. All MNs maintain a routing table that contains routes to all reachable
destination nodes. OLSR does not notify the source immediately after detecting a
broken link.
The temporally ordered routing algorithm (TORA) is a hybrid protocol, which is
distributed, and routers only maintain information about adjacent routers [5]. During
reactive operation, sources initiate the establishment of routes to a given destination
on demand. In dynamic networks it is efficient with relatively sparse traffic patterns,
as it does not have to maintain routes all the time. It does not continuously execute
a shortest-path computation, and the metric used to establish the routing structure
does not represent a distance. TORA maintains multiple routes to the destination
when the topology changes frequently. It is based on link reversal of a directed
acyclic graph (DAG). It uses the internet MANET encapsulation protocol (IMEP)
for link status and neighbor connectivity sensing. IMEP provides reliable, in-order
delivery of all routing control messages from a node to all of its neighbors, and
notification to the routing protocol whenever a link to a neighbor is created or
broken. TORA is designed for multihop networks and aims to minimize the
communication overhead associated with adapting to network topological changes
by localizing the algorithmic reaction. Moreover, it is bandwidth efficient, highly
adaptive and quick in route repair during link failure, and it provides multiple routes
to the destination node in wireless networks.
[6], which is used for network modeling and the simulation results, as it has the
fastest event simulation engine. In our mobility model, MNs in the simulation area
move according to the random waypoint model [1]. For the radio network interface,
the physical radio characteristics of each mobile node's network interface, such as
the antenna gain, transmit power, and receiver sensitivity, were chosen to
approximate a direct sequence spread spectrum radio [2]. For medium access control,
the distributed coordination function (DCF) of IEEE 802.11b was used as the
underlying MAC layer [2]; default values are used for the MAC layer parameters.
For the network traffic, in order to compare the simulation results for the performance
of each routing protocol, the communication model used for the network traffic
sources is FTP. For the traffic configuration, all experiments have one data flow
between a source node and a sink node consisting of a TCP file transfer session, and
TCP transmits at the highest achievable rate. TCP is used to study the effect of
congestion control and reliable delivery [7].
The scenario consists of 50 wireless nodes placed uniformly, forming a mobile ad hoc network and moving about over a 1,000 × 1,000 m area for 900 s of simulation time [8].
The AODV simulation parameters used are the same as in [8], except the active route timeout, which was set to 30 s; the TORA parameters are similar to those in [9]; and the OLSR parameters are similar to those in [8]. In OLSR, the Hello interval and TC interval were set to 2 and 5 s respectively, the neighbour hold time was 6 s, and entries in the topology table expire after 15 s. In the OLSR scenario the MNs in the network are grouped into clusters and each cluster has an MPR. The transmission range is 300 m. The MPRs selected in each cluster are the MNs with the highest willingness parameter. We have five clusters, each with its own five MPRs that move towards a stable state. The MPRs in the network send topology control (TC) messages periodically; the number of MPRs in the network is directly proportional to the amount of TC traffic sent. Each MN periodically sends Hello messages that contain its list of neighbours and node movement changes [6] (Tables 19.1–19.3).
When the mobile ad hoc network simulations were run, the results showed that all MNs were in range of each other, no data packets experienced collisions in the presence of the FTP traffic load, and all MNs were capable of sending packets. Hence the carrier-sense and backoff mechanisms of 802.11b were working correctly. All results were obtained by averaging over ten random mobility scenarios of mobile ad hoc networks.
Most events were simulated by OLSR, about 14,357,100, whereas the average numbers of events simulated by TORA and AODV were 1,993,545 and 229,537 respectively. On the other hand, the highest simulation speed was observed in the TORA simulation runs, at 544,829 events per second, compared with about 398,557 and 232,943 events per second for AODV and OLSR. These statistics show that the proactive protocol generates millions more events to simulate than the reactive and hybrid protocols.
19.5.1 Throughput
Figure 19.3c shows that OLSR has the lowest and steadiest end-to-end delay, about 0.0004 s. The end-to-end delay rises and falls abruptly in AODV and TORA, ending up lower in AODV than in TORA, around 0.0015 s and 0.0032 s on average respectively. TORA had higher delays because of network congestion: routing loops were created in which the number of routing packets sent caused MAC-layer collisions, and data, Hello and ACK packets were lost, which led IMEP to assume that links to neighbours were broken. TORA therefore reacted to these link failures by sending more UPDATE packets, which in turn created further congestion.
Fig. 19.3 Comparison between the MANETs routing protocol’s simulation results: (a) Through-
put, (b) Throughput for TORA and AODV, (c) End to End Delay, (d) Media Access Delay, (e)
Packet Delivery Ratio, (f) Packet Delivery Ratio for AODV and TORA
In Fig. 19.3d we plot the media access delay, which is very important for multimedia and real-time traffic and vital for any application where data are processed online. The media access delay is low for OLSR, around 0.0001 s. The media access delay for AODV and TORA fluctuates much more frequently; AODV fluctuates above and below its mean while TORA stays mainly around its mean. In both cases the fluctuation is higher and more frequent than for OLSR, which remains steady over the 900 s of simulation time (Table 19.4).
Figure 19.3e shows the fraction of the originated application data packets that each protocol was able to deliver over time. The packet delivery ratio reflects both the completeness and the correctness of the routing protocol and is also a measure of efficiency. We define the packet delivery ratio as the ratio of the number of data packets received at the sink node to the number of data packets transmitted by source nodes that have a route in their routing table after a successful route discovery. For all protocols the packet delivery ratio is independent of the offered traffic load, with OLSR, AODV and TORA delivering about 81%, 53.6% and 53.1% of the packets respectively in all cases. OLSR provides a better packet delivery rate than the other routing protocols, and AODV has a higher delivery ratio than TORA. AODV and TORA take time to compute a route before sending data packets, as these protocols construct routes on demand, whereas OLSR is a proactive routing protocol and uses its routing table to send data packets at once. The packet delivery ratio indicates the loss rate seen by the transport protocols, which in turn affects the maximum throughput that the network can handle. OLSR has MPRs for each cluster which maintain routes for a group of destinations; packets that the MAC layer is unable to deliver are dropped, since there are no alternate routes. In Fig. 19.3f we use different axis scales to show the packet delivery ratio results clearly for the reactive and hybrid protocols. The AODV and TORA curves start after 100 s because these protocols
take time to compute a route before data packets can be received at the destination, since they construct routes on demand, whereas OLSR, as a proactive routing protocol, uses its routing table to send data packets at once. TORA in the 50-MN wireless network delivered around 53% of the data packets over the simulation time; it fell short of converging because of the increased congestion. In TORA, data packets were mainly dropped because of short-lived routing loops, which are part of its link-reversal process. When the next hop and the previous hop of a packet are the same, more data packets are dropped because the packets loop until the time-to-live expires or the loop is exited; moreover, data packets caught in loops were interfered with by broadcast UPDATE packets from neighbouring MNs, which in turn can resolve the routing loop. The packet delivery ratio was thus observed to be lower in TORA than in AODV. Overall, the routing protocols differed in how many packets they could deliver to the destination MN.
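For reference, the two headline metrics discussed above can be computed from per-packet logs as in the following sketch; the log format (packet id mapped to send and receive timestamps) is an assumption, and the toy numbers are illustrative only.

```python
# Hedged sketch of how the reported metrics could be derived from per-packet logs.
def packet_delivery_ratio(sent, received):
    """sent/received: dicts of packet_id -> timestamp (seconds)."""
    return len(received) / len(sent) if sent else 0.0

def average_end_to_end_delay(sent, received):
    delays = [received[p] - sent[p] for p in received if p in sent]
    return sum(delays) / len(delays) if delays else 0.0

# Toy example: 4 packets sent, 3 delivered.
sent = {1: 0.00, 2: 0.10, 3: 0.20, 4: 0.30}
recv = {1: 0.0015, 2: 0.1012, 3: 0.2030}
print(packet_delivery_ratio(sent, recv))        # 0.75
print(average_end_to_end_delay(sent, recv))     # ~0.0019 s
```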
19.6 Conclusion
Mobility management of the mobile nodes is a key area, since mobility causes route changes and frequent changes in network topology, so effective routing has to be performed immediately. This paper makes contributions in two areas. First, it compares the performance of the reactive ad hoc on-demand distance vector protocol, the proactive optimized link state routing protocol and the hybrid temporally ordered routing algorithm in mobile ad hoc networks under FTP traffic. Second, we have presented comprehensive results for packet delivery ratio, throughput, media access delay and end-to-end delay over mobile ad hoc networks of 50 MNs moving about and communicating with each other. The simulation results were presented for a range of node mobility over time. OLSR performs quite predictably, delivering most data packets under node mobility; [8] also shows that OLSR gives the best performance in terms of data delivery ratio and end-to-end delay. TORA, although it did not perform adequately in our simulation runs in terms of routing packet delivery ratio, delivered over fifty-three percent of the packets; the network was unable to handle all of the traffic generated by the routing protocol and a significant fraction of data packets were dropped. Likewise, [9] shows that the relative performance of TORA depends decisively on the network size and the average rate of topological change; TORA can perform well in small networks, but its performance decreases when the network size increases to 50 nodes. On the other hand, AODV performed better than TORA in most performance metrics in response to frequent topology changes. Finally, the overall performance of OLSR was very good while the MNs' movement changed over time. We have observed that all routing protocols successfully deliver data when subjected to different network stresses and topology changes. Moreover, both the mathematical analysis and the simulation results show that the optimized link state routing protocol, from the proactive category, is a very effective and efficient route-discovery protocol for MANETs.
References
20.1 Introduction
Recent years have seen an immense growth in the popularity of wireless applications
that require high throughput. To support such growth, standardization bodies such
as the IEEE 802.11 have formed task groups to investigate and standardize features
providing increased quality of service (QoS) and higher throughputs. One of these
extensions is the acknowledgement (ACK) policy feature included in the ratified
IEEE 802.11e amendment for QoS support [3], which is the focus of this work. In
O. Cabral (B)
Instituto de Telecomunicações, DEM-UBI, Calçada Fonte do Lameiro, 6201-001 Covilhã, Portugal,
E-mail: orlando@ubi.pt
particular, we investigate the policy regarding the block size for a video application.
The Block Acknowledgement (BA) procedure improves system throughput results
by reducing the amount of overhead required by a station to acknowledge a burst of
received traffic [1,2]. It acknowledges a block of packets by a single ACK, instead of
using several ACKs, one for each packet, saving the Arbitration Inter-frame Spacing
(AIFS) period, the backoff counter time, and the acknowledgement time. The num-
ber of frames that can be transmitted within a block is called the block size; it is limited and is not specified in the standard. In this chapter, to find the most suitable block size, we try several block sizes in several loaded scenarios, with and without a mixture of services. The chapter is organised as follows. Section 20.2 gives a brief introduction to the IEEE 802.11e standard along with the main lines of the BA procedure. Section 20.3 describes the state of the art. Section 20.4 defines the problem and the scenario, including details on the traffic parameters. Section 20.5 gives the simulation results obtained for several scenarios with and without the use of the BA procedure, and with and without mixtures of traffic. Conclusions are given in Section 20.6, together with suggestions for further work.
[Figures (EDCA timing and BA setup/tear-down): defer access, then select a slot and decrement the backoff as long as the medium stays idle; the BA exchange between originator and recipient comprises ADDBA Request/ACK for setup, Data/BlockACK for transfer, and DELBA Request/ACK for tear down.]
In the BA procedure the transmitter of the data is referred to as the originator, and the receiver of that data as the recipient. The BA mechanism is initialized by an exchange of Add Block Acknowledgment (ADDBA) Request/Response frames. After initialization, blocks of QoS data frames can be transmitted from the originator to the recipient. A block may be started within a polled TXOP or by winning EDCA contention. The number of frames in the block is limited, and the amount of state that has to be kept by the recipient is bounded. The MAC Protocol Data Units (MPDUs) within the block of frames are acknowledged by a BA control frame, which is requested by a BlockACKReq control frame. The response to the BlockACKReq acknowledges all the correctly received frames and requests retransmission of those received in error.
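A back-of-the-envelope view of why acknowledging a whole block with one frame helps is sketched below. The timing constants are assumptions chosen only to make the arithmetic concrete; they are not taken from the chapter or the standard.

```python
# Rough estimate of the overhead saved by the BA procedure.
SIFS_US = 16            # assumed short inter-frame space, microseconds
AIFS_US = 34            # assumed arbitration inter-frame space
AVG_BACKOFF_US = 67     # assumed mean backoff duration
ACK_US = 44             # assumed duration of a normal ACK frame
BLOCK_ACK_US = 68       # assumed duration of a BlockACKReq/BlockACK exchange

def per_frame_ack_overhead(n_frames):
    # Each frame pays AIFS + backoff before transmission and SIFS + ACK after it.
    return n_frames * (AIFS_US + AVG_BACKOFF_US + SIFS_US + ACK_US)

def block_ack_overhead(n_frames):
    # One contention for the block, frames separated only by SIFS,
    # and a single BlockACKReq/BlockACK exchange at the end.
    return (AIFS_US + AVG_BACKOFF_US) + n_frames * SIFS_US + BLOCK_ACK_US

for block_size in (4, 8, 12, 16):
    saved = per_frame_ack_overhead(block_size) - block_ack_overhead(block_size)
    print(f"block size {block_size:2d}: ~{saved} us of overhead saved")
```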
Analytical frameworks to model the BA have been published [6-8], but their results are not based on a realistic approach to the problem, nor do they account for the achievable QoS, because the use of several service classes with different priorities (the basis of EDCA) is not considered at all. The existing theoretical approaches [6-8] do not consider the hidden terminal problem, assume that the buffer is always full, and do not assume a multi-rate scenario. In [6, 7], an analytical framework was presented to model an ad-hoc network under the IEEE 802.11e standard with the BA procedure for a highly simplified scenario. The hidden/exposed terminal problem is a fundamental issue in WLANs, but most of the existing analytical models either assume it does not exist, or do not consider the EDCA features of IEEE 802.11e, or do not account for the delay or any other QoS metric [6, 7]. The results in [6, 7] express the block size as a function of the goodput in saturation conditions and show that the block size should be as high as possible, which can be misleading when QoS is considered. In [8] an analytical approach to model the BA in IEEE 802.11e was proposed without accounting for the hidden terminal problem; the multi-rate feature in the same environment was also not considered, the packet loss due to channel errors was not taken into consideration, and the use of EDCA procedures, such as several virtual queues for the several classes of service, was not considered either, i.e., this work does not consider the IEEE 802.11e standard at all. Among the simulation approaches presented in the literature, the one closest to the work proposed here is [9], where several combinations of the block size are presented and a scheduler based on the delay and the amount of data in the buffer is proposed. The work presented here is an improvement of that approach and provides a more extensive study on the block size while considering user-perceived QoS.
A cellular Wi-Fi system was considered in which each cell has a set of N + 1 IEEE 802.11e stations communicating through the same wireless channel. Station 0 is the QoS Access Point (QAP) and the other N are QoS stations (QSTAs). Each station has four buffers whose size depends on the kind of service being handled, in order to guarantee a given value for the goodput (the payload of the packet at MAC level). These buffers are filled with generated MAC Service Data Units (MSDUs) that characterise the service handled by the given buffer. If an MSDU is bigger than the fragmentation threshold, it is fragmented. To cope with service quality, packet transmission follows the Enhanced Distributed Channel Access (EDCA) IEEE 802.11e MAC procedure. Due to collisions or interference a packet may not be correctly received; interference is addressed by using a radio propagation model. Each packet leaves the buffer only after the reception of an acknowledgement, or if it has exceeded a given collision threshold. The users are assumed to be static and are distributed uniformly in a square area of 2,500 square metres. The physical layer assumed in this work is the IEEE 802.11a standard [10], which defines an Orthogonal Frequency Division Multiplexing (OFDM) based PHY layer operating in the 5 GHz frequency bands and able to achieve bit rates as high as 54 Mbps. The physical
and MAC parameters are presented in Table 20.1. The Request/Clear-to-Send (RTS/CTS) procedure is used only if the packet size is larger than a given threshold, the RTS threshold in Table 20.1. In the algorithm proposed in [10], the sender chooses the bit rate that achieves the highest throughput, taking the SINR estimate into account. More details on the physical-layer implementation used in the simulator can be found in [11]. Three types of traffic sources were chosen, namely high-priority voice (VO), medium-priority video (VI), and low-priority FTP data as background traffic (BK). The traffic source parameters are presented in Table 20.2, together with the Access Category (AC) of each type of traffic.
We implemented the BA procedure with minor modifications to existing features such as TXOP and RTS/CTS, without disturbing the overall TXOP and RTS/CTS procedures. When a TXOP is gained, transmission starts and the originator knows whether a BA procedure has been set up with this destination. If so, it does not wait for an acknowledgement but only for a SIFS, and starts the next transmission, for the same destination or not depending on the next packet in the buffer. Figure 20.3 illustrates this procedure for destination 1, which has the BA procedure established, and for destination 2, which does not. At the beginning of a transmission for a user with an active BA procedure, if the block-size threshold is reached, a BlockACKReq packet is added at the head of the buffer so that it is transmitted as the next-in-line packet.
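The queue manipulation described in the last sentence can be pictured as follows; the buffer representation and field names are assumptions made for illustration.

```python
# Sketch: once the block-size threshold is reached, a BlockACKReq is placed at
# the head of the buffer so it goes out as the next-in-line packet.
from collections import deque

def maybe_insert_block_ack_req(buffer, destination, frames_in_block, block_size):
    """buffer: deque of (packet_type, destination) tuples, head at index 0."""
    if frames_in_block >= block_size:
        buffer.appendleft(("BlockACKReq", destination))
        return 0          # the counter restarts once the block is acknowledged
    return frames_in_block

buf = deque([("MPDU", 1), ("MPDU", 1), ("MPDU", 2)])
frames = maybe_insert_block_ack_req(buf, destination=1, frames_in_block=16, block_size=16)
print(buf[0], frames)   # ('BlockACKReq', 1) 0
```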
[Fig. 20.3: BA procedure within a TXOP — MPDUs to destination 1 are sent back-to-back under the BA agreement, while each MPDU to destination 2 is individually acknowledged with an ACK, all within the TXOP limit.]
20.5 Results
The objective of the simulations was to investigate the gain of the BA procedure for a single service and for a mixture of services. In terms of grade of service, the voice application supports delays up to 30 ms, the video application supports delays up to 300 ms, while the delay for background applications can be up to 500 ms [13]. As previously mentioned, the BA mechanism improves channel efficiency by aggregating several acknowledgments into one frame. To investigate the BA procedure for a stand-alone service, a block size of 16 fragments was adopted. This procedure was simulated only for BK and VI traffic; for the VO application BA is clearly not a solution because of the large delays that occur while the buffer fills with 16 packets (the delay of the first one would be about 320 ms, far beyond what the application tolerates). Figure 20.4 presents the delay for BK and VI traffic. It starts to increase at around 12 stations for BK and at 16 stations for VI, and increases further for a higher number of stations. As expected, the delay is lower when the BA procedure is used. The improvement (reduction) for BK traffic is 300 ms on average after 12 stations, while for VI traffic it is 420 ms on average after 16 stations. The improvement in goodput is 2 and 2.2 Mb/s on average, after 12 stations for BK traffic and after 16 stations for VI, respectively [12].
This part of the work explores what the policy regarding the block size should be with mixtures of applications. For this purpose we investigated the BA policy for a video service, an application that is delay-sensitive, and additionally the BA block size for the background service, which is not delay-sensitive. Simulations were performed for 100 s and repeated around 40 times for each number of stations (15, 20, 25, 30, 35, 40, 45) and each block size (4, 8, 12, 16). The users started the negotiation to initiate the BA procedure at the beginning of the simulation.
Fig. 20.4 Delay for BK and VI with and without Block ACK
[Fig. 20.5: Delay accounting for a block after a bad transmission of packet 2 (packet sequence 1 2 3 4 2 5 6): Delay = actual_delay + (clock − time_tx_packet(3)) + (clock − time_tx_packet(4)).]
All the packets of video and background traffic were thus transmitted within the BA procedure. The use of the BA procedure in a scenario with a mixture of applications running at the same time was further studied. On the one hand, the overhead caused by the negotiation to establish and maintain the BA causes poor performance when a small block size is used. On the other hand, the packet losses caused by the voice users, which have higher priority in the system, influence the overall QoS of the system, and mainly the video service, since we considered in these simulations that if a packet is not correctly received at a given time, the packets sent after it within the same block have the delay of the badly sent packet, once retransmitted, added to their own delay (Fig. 20.5).
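The delay-accounting rule of Fig. 20.5 can be expressed as a small sketch: every packet transmitted after the badly received one, within the same block, accumulates the waiting time until that packet is finally delivered. The timestamps below are made up for illustration.

```python
# Sketch of the per-packet delay accounting illustrated in Fig. 20.5.
def block_delays(tx_times, retx_done_time, bad_index):
    """tx_times: original transmission instant of each packet in the block.
    retx_done_time: clock instant when the bad packet was finally delivered.
    bad_index: position of the badly received packet within the block."""
    delays = []
    for i, t in enumerate(tx_times):
        # Packets from the bad one onwards wait until its retransmission succeeds.
        delay = retx_done_time - t if i >= bad_index else 0.0
        delays.append(delay)
    return delays

# Packet 2 (index 1) of a 4-packet block is lost and retransmitted at t = 9 ms.
print(block_delays(tx_times=[1.0, 2.0, 3.0, 4.0], retx_done_time=9.0, bad_index=1))
# -> [0.0, 7.0, 6.0, 5.0]  (extra delay added to packets 2, 3 and 4)
```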
The impact of packet retransmissions on the delay therefore increases. Figure 20.6 presents the delay for each access category. For BK traffic the delay starts to increase considerably with more than 20 stations (12 in the standard procedure); this service class transmits very rarely, since voice and video traffic are always scheduled first. For video traffic the delay starts to increase at 35 stations. In contrast with BK and VI, VO applications present a negligible delay.
The delay impacts the video application the most. Figure 20.7 shows the results for various block sizes for the transmitted video application. Regardless of the number of stations, the BA procedure reduces the delay. A certain sensitivity is observed for a block size of 16, which gives the highest delay for up to 25 stations; for a larger number of stations the best block size is 16, as smaller block sizes are not as efficient in decreasing the delays.
Fig. 20.6 Delay for all services with Block ACK implemented for VI and BK traffic
[Fig. 20.7: Delay (ms) versus number of stations (15–45) for block sizes 4, 8, 12 and 16 and for the standard ACK.]
When 30 stations are being served in the system, the delay with BA is near 5 ms for all block sizes, while without BA the delay goes up to 80 ms. For 35 stations the difference is even higher: a minimum near 5 ms is obtained for block size 12, while without BA the delay is near 2.3 s. The results for 40 and 45 stations advise the use of block size 16, although the network is already overloaded (and the delay will just keep growing towards infinity). When fewer than 40 stations are present the confidence intervals are very small for all the buffers; with 40 or more stations the confidence intervals increase. One can therefore infer that some stations manage to transmit with a small delay while others transmit only from time to time, causing very high delays. The behaviour observed in Fig. 20.7 occurs because the overhead caused by the BlockACK request and the delays caused by bad receptions affect block size 16 the most, while it still provides lower delays in more heavily loaded scenarios. The solution is neither to use a large block size (as large as 16) nor a small block size (as low as 4).
Fig. 20.8 Total goodput with BA implemented for video and background traffic
Fig. 20.9 Video goodput with Block ACK implemented for video traffic
The former increases the delay, causing problems to the application using these packets (for more than 35 stations), while the latter causes unnecessary overhead by issuing block ACK requests very often (for fewer than 35 stations). The advised block size is therefore 12, since it is the one that provides the lowest delay in a scenario where the load is still manageable, at least when BA is used.
Figure 20.8 presents the goodput of the system in the downlink. Without BA the maximum goodput achieved is near 11 Mbit/s, while with BA it is near 14 Mbit/s for block sizes 12 and 16. The decreasing behaviour after 25 stations is due to the high number of collisions; as the background traffic has low priority it ends up not being transmitted, giving its turn to voice and video users, and since voice traffic carries higher overhead the resulting goodput is lower.
Figure 20.9 presents the goodput for the video service class only. Only after 30 stations does the achieved throughput differ between using and not using BA. The highest goodput is found for block size 16, and is more than 10% higher relative to the case of standard ACK. The increasing behaviour of the goodput is verified up to 40 stations.
20.6 Conclusions
This work investigated the BA block size as a policy to decrease delays and ensure a stable system. The investigation was based on tuning several parameters and examining the effect with and without BA, and several policies were investigated. Future work will test policies for granting access to the medium that ensure some degree of service based on the channel SINR, delays and bit error rate. Although, for lower numbers of stations, the use of BA leads to slightly worse system performance, the BA procedure provides an improvement in highly loaded scenarios; the improvement is 2 and 2.2 Mb/s on average for BK and VI traffic, respectively. In a scenario with a mixture of services the most advisable block size is 12, since it is the one that provides the lowest delays in a highly loaded scenario while the users are still within the capacity of the AP. The number of supported users increases from 30 to 35; note that 35 stations is the total number of VO, VI and BK users.
References
1. A. R. Prasad and N. R. Prasad (2005). 802.11 WLANs and IP Networking. Artech House, Boston, MA.
2. J. L. Mukerjee, R. Dillinger, M. Mohyeldin, and E. Schulz (2003). Investigation of radio resource scheduling in WLANs coupled with 3G cellular network. IEEE Communications Magazine, 41(6):108–115.
3. T. Li, Q. Ni, T. Turletti, and Y. Xiao (2005). Performance analysis of the IEEE 802.11e block ACK scheme in a noisy channel. In IEEE BroadNets 2005 – The Second International Conference on Broadband Networks, Boston, MA.
4. T. Li, Q. Ni, and Y. Xiao (2006). Investigation of the block ACK scheme in wireless ad-hoc networks. Wiley Journal of Wireless Communications and Mobile Computing, 6(6):877–888.
5. I. Tinnirello and S. Choi (2005). Efficiency analysis of burst transmissions with block ACK in contention-based 802.11e WLANs. In ICC 2005 – IEEE International Conference on Communications, Seoul, Korea.
6. V. Scarpa, G. Convertino, S. Oliva, and C. Parata (2005). Advanced scheduling and link adaptation techniques for block acknowledgement. In 7th IFIP International Conference on Mobile and Wireless Communications Networks (MWCN 2005), Marrakech, Morocco.
7. Q. Ni (2005). Performance analysis and enhancements for IEEE 802.11e wireless networks. IEEE Network, 19(4):21–27.
8. A. Grilo (2004). Quality of Service in IP-based WLANs. Ph.D. thesis, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal.
9. O. Cabral, A. Segarra, and F. J. Velez (2008). Event-driven simulation for IEEE 802.11e optimization. IAENG International Journal of Computer Science, 35(1):161–173.
Chapter 21
Routing in a Custom-Made IEEE 802.11E
Simulator
21.1 Introduction
Wireless networks are gaining more and more importance in our world. Cellular
phones with GPRS/UMTS, Wi-Fi, and WiMAX networks are very common these
days, and they share a common feature: they require some sort of backbone infras-
tructure in order to allow for packets from different communication peers to reach
each other. For example, if someone makes a phone call, the conversation will
always pass from the cell phone to the operators’ infrastructure, and then to the
J. M. Ferro (B)
Instituto de Telecomunicações, DEM-UBI, Portugal,
E-mail: ferro@lx.it.pt
receiver's phone, even if both are in the same building. Resources would certainly be saved if the cell phones could somehow connect directly to each other.
In an ad-hoc network, all the participants (also called nodes) can communicate directly with their neighbours; two nodes are considered neighbours if their communication devices can reach each other. Nodes wanting to communicate with others that are not neighbours simply send the message to another node located nearer the destination. A centralised infrastructure to establish connectivity is therefore not required, since each node determines by itself where it should forward its data. The IEEE 802.11 specification refers to this type of network as an Independent Basic Service Set (IBSS).
This chapter is organized as follows. Section 21.2 presents some background
information about the previous simulator and about IEEE 802.11e. Section 21.3
describes the features of the new simulator highlighting the modifications we per-
formed. Section 21.4 presents the results of the initial simulations, while
Section 21.5 presents conclusions and suggestions for future work.
IEEE 802.11 defines two access mechanisms, the Distributed Coordination Function (DCF) and the Point Coordination Function (PCF), the latter with the access point acting as the coordinator. PCF provides some basic QoS support, but since it is defined as optional and is more complex and costly to implement, only a few APs support it; DCF does not provide QoS guarantees at all. To ensure QoS performance, IEEE 802.11e introduced the Hybrid Coordination Function (HCF), which defines HCF Controlled Channel Access (HCCA) and Enhanced Distributed Channel Access (EDCA). In EDCA, a Transmission Opportunity (TXOP) is won according to the traffic type, scheduling higher-priority traffic such as voice and video first. HCCA is more complex, but allows greater precision in terms of QoS. All these enhancements are fully explained by Prasad [2].
In the EDCA simulator, each station (and the AP) has four buffers, one for each traffic type. With an event-driven approach, a new event is generated for each new packet arriving at a machine. The simulator engine uses its internal clock to pick up the next event and pass the corresponding packet to the developed MAC + PHY layers. A more comprehensive description of the lower layers of the simulator can be found in [3].
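A skeletal version of such an event-driven engine is sketched below; it is not the authors' simulator, and the four access-category names and the event payload format are assumptions.

```python
# Skeleton of an event-driven engine with per-access-category buffers.
import heapq
from collections import deque

ACCESS_CATEGORIES = ("VO", "VI", "BE", "BK")

class Engine:
    def __init__(self, n_stations):
        self.clock = 0.0
        self.events = []                       # heap of (time, seq, handler, payload)
        self.seq = 0
        self.buffers = [{ac: deque() for ac in ACCESS_CATEGORIES}
                        for _ in range(n_stations)]

    def schedule(self, delay, handler, payload):
        heapq.heappush(self.events, (self.clock + delay, self.seq, handler, payload))
        self.seq += 1

    def run(self, until):
        # Advance the internal clock to the next event until the time limit.
        while self.events and self.events[0][0] <= until:
            self.clock, _, handler, payload = heapq.heappop(self.events)
            handler(self, payload)

def packet_arrival(engine, packet):
    station, ac = packet["station"], packet["ac"]
    engine.buffers[station][ac].append(packet)      # hand over to MAC + PHY here
    print(f"t={engine.clock:.3f}s station {station} queued a {ac} packet")

eng = Engine(n_stations=2)
eng.schedule(0.010, packet_arrival, {"station": 0, "ac": "VI"})
eng.schedule(0.020, packet_arrival, {"station": 1, "ac": "BK"})
eng.run(until=1.0)
```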
21.3 Overview
The current simulator is an evolution of the previous one. To support a multi-hop environment it was necessary to implement a routing algorithm above the already existing Medium Access Control (MAC) and Physical (PHY) layers, as presented in Fig. 21.1. After the random placement of all the stations in the field, the simulator determines which of them can communicate directly. With this data, the selected routing algorithm determines the shortest path from each station to all the others. For these initial tests we chose the well-known Dijkstra algorithm [4], which calculates the least-cost path from one node to all the remaining ones. This characteristic allows the routing calculation to take place before the simulation starts, making the routing table available to all nodes from time instant t = 0 s. Besides this initial calculation, the routing table can be accessed and modified at any time, allowing the use of cross-layer design (i.e., gathering information from the physical and MAC layers and using it in the routing algorithm) for optimisation purposes. In this work, however, we use no cross-layer design at all, and the routing table is not modified during the simulations.
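As a sketch of this pre-simulation step, the following self-contained Dijkstra routine builds a next-hop table from an assumed adjacency map; the link costs and the example topology are illustrative only.

```python
# Minimal Dijkstra next-hop computation for the pre-simulation routing tables.
import heapq

def dijkstra_next_hops(adj, source):
    """adj: dict node -> {neighbour: cost}. Returns dict destination -> next hop
    on the least-cost path from `source`."""
    dist = {source: 0.0}
    next_hop = {}
    heap = [(0.0, source, source)]             # (cost, node, first hop used)
    while heap:
        cost, node, first = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue                            # stale entry
        if node != source:
            next_hop[node] = first
        for nbr, w in adj.get(node, {}).items():
            new_cost = cost + w
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr, nbr if node == source else first))
    return next_hop

# Example: station 1 reaches station 3 through 9 (unit-cost links).
adj = {1: {9: 1}, 9: {1: 1, 13: 1}, 13: {9: 1, 14: 1},
       14: {13: 1, 10: 1}, 10: {14: 1, 3: 1}, 3: {10: 1}}
print(dijkstra_next_hops(adj, source=1))   # {9: 9, 13: 9, 14: 9, 10: 9, 3: 9}
```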
The engine starts collecting events at the beginning of the simulation. When a packet arrives at a wireless machine, a new simulation module verifies whether the packet has another destination by checking the two additional fields now included in the packet header: besides the "payload", "origin station", and "destination station" fields, among others, each packet now includes "initial origin" and "final destination" fields. If the destination is found to be another node, the routing table is accessed to determine the next hop, and the packet is added to the machine's buffer with the new destination stamped.
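The forwarding check itself can be summarised as below; the packet field names mirror the header additions described above, while the routing-table layout is an assumption for illustration.

```python
# Sketch of the per-hop forwarding check using the extended packet header.
from dataclasses import dataclass

@dataclass
class Packet:
    origin: int              # current-hop sender
    destination: int         # current-hop receiver
    initial_origin: int
    final_destination: int
    payload: bytes = b""

def on_packet_arrival(station, packet, routing_table, buffer):
    """routing_table maps (station, final destination) -> next hop."""
    if packet.final_destination == station:
        return "delivered"
    next_hop = routing_table[(station, packet.final_destination)]
    # Re-stamp the per-hop addresses and queue the packet for transmission.
    packet.origin, packet.destination = station, next_hop
    buffer.append(packet)
    return f"forwarded to {next_hop}"

table = {(9, 3): 13, (13, 3): 14, (14, 3): 10, (10, 3): 3}
pkt = Packet(origin=1, destination=9, initial_origin=1, final_destination=3)
print(on_packet_arrival(9, pkt, table, buffer=[]))   # forwarded to 13
```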
Fig. 21.1 Multi-hop simulator flowchart, modules added to the previous version are displayed in
grey
The simulation runs until the time limit is reached, and is repeated a pre-determined number of times; the final output of the simulator is the average of the results of these runs. For the end user, this simulator makes it possible to simulate a network with an unrestricted number of nodes, in any configuration. By using the chosen routing protocol, it supports connections to every station reachable from its neighbours, allowing a message to reach any destination. Stations are randomly placed in a field, and even if a station is isolated the simulator will still run (unless this station is the source or final destination of a packet, which terminates the simulation). Several parameters can be tuned by the user as needed; some of them are presented in Table 21.1.
21.4 Results
In order to verify that the modified simulator was able to simulate a multi-hop environment, we ran a few tests. In the first one we checked whether the simulation time would affect the end-to-end delay; for this purpose we considered a fixed deployment of stations. After this initial test, 10 s was assumed to be a reasonable simulation time, and the subsequent tests were made with this value for the duration.
Besides this, further tests were made to evaluate the influence of multiple streams on the delay and packet loss. For these tests we used the same deployment of stations as in the previous test, Fig. 21.4. The inner circle has a radius of 15 m, the next one 30 m, and the outer one 45 m; the square is 120 × 120 m². In the next set of tests we sequentially added video streams between the following stations:
A. From station 1 to 3
B. From station 4 to 2
C. From station 5 to 6
D. From station 8 to 9
For each of these streams the routing algorithm established the following paths:
A. 1 → 9 → 13 → 14 → 10 → 3
B. 4 → 11 → 14 → 7 → 2
C. 5 → 6
D. 8 → 14 → 13 → 9
Figure 21.5 presents the results obtained for the delay of the video stream from station 1 to station 3. As the number of stations that generate packets increases, the end-to-end delay (latency) increases. Another interesting metric is the number of lost packets, which is presented as the darker areas in Fig. 21.6: when the number of packets to be transmitted increases, there are more collisions and packet losses.
One interesting characteristic of this simulator is the possibility of simulating four different types of traffic and their mixtures. Using this feature we produced a new set of tests, keeping the same topology but modifying the type of packets in each stream:
A. Video stream from station 1 to 3
B. Background traffic from station 4 to 2
C. Voice traffic from station 5 to 6
The details of the configuration of each traffic type are presented in Table 21.2 and were taken from Qiang [5], while the results for the number of packets successfully delivered (displayed in light grey) or lost (displayed darker) are presented in Figs. 21.7 and 21.8. While Fig. 21.7 presents the results per packet, Fig. 21.8 addresses each stream, i.e., the latter only counts a packet as successful when it arrives at the final destination, while the former accounts for packets arriving at any station. For this reason, looking at the results for the video stream alone (VI), in the former case 5,000 packets were successfully delivered, while in the latter only 1,000 packets arrived at the final destination. Recall that this stream is transmitted over a 5-hop path, and that the most relevant result is the latter.
With only one stream, the system behaves perfectly, and all of the packets are correctly delivered. This happens because there are no collisions, since the packet generation period is longer than the end-to-end delay for every stream. However, when more than one type of traffic is generated simultaneously, the packets start to collide and some of them are lost (the current policy establishes that a station drops
a packet after it experiences two collisions). The extent of this problem is larger for background traffic, which can be explained by the larger size of each packet, requiring fragmentation into four smaller packets. For voice traffic the delay is very low, and it does not seem to affect the remaining traffic at all; this is due to the small packets generated by this traffic type (no fragmentation required), and because these stations are in line of sight, so no multi-hop is needed.
For the next tests, a new deployment was made by changing the position of station 6, while keeping all the others in the same positions (Fig. 21.9).
The traffic is the same as generated before, and so the routing algorithm computed the following paths:
A. 1 → 9 → 13 → 14 → 10 → 3
B. 4 → 11 → 14 → 7 → 2
C. 5 → 13 → 14 → 6
D. 8 → 14 → 13 → 9
To obtain results faster, the simulation time was cut to 1 s (instead of 10 s). The results of the new simulation, presented in Fig. 21.10, also differ from the previous ones because a new policy establishes that a packet is discarded after experiencing eight collisions (previously it was discarded after just two). By increasing the collision limit, one should expect the number of lost packets to be reduced (each packet is given eight chances to be transmitted, four times more than previously). Comparing the new results with the previous ones (Fig. 21.8), one can see that the number of packets lost was indeed reduced, while the number of packets successfully delivered increased.
21.5 Conclusions
This chapter presented the extension of a custom-made IEEE 802.11e simulator with a routing algorithm in order to allow the simulation of traffic among stations that cannot communicate directly. We performed some initial tests to verify that the simulator was performing as expected, and the results were encouraging. A few improvements to this simulator are being considered, as well as the release of its source code to the scientific community for research purposes. Another possibility currently under study is to adapt the simulator to allow simulations in the field of Wireless Sensor Networks (WSNs), another research area the authors of this chapter are interested in. The simulator was built using the MAC and PHY layers of the IEEE 802.11e standard, and the intention is to replace them with a MAC layer specification developed specifically for WSNs.
References
1. H. Holma and A. Toskala, in: WCDMA for UMTS: HSPA Evolution and LTE, edited by Harri
Holma and Antti Toskala (Wiley, Chichester, 2007).
2. A. R. Prasad and N. R. Prasad, in 802.11 WLANs and IP Networking, edited by Anand R. Prasad and Neeli R. Prasad (Artech House, London, 2005).
3. O. Cabral and F. J. Velez, Event-Driven Simulation for IEEE 802.11e Optimization, IAENG
International Journal of Computer Science 35(1), 161–173 (2008).
4. E. W. Dijkstra, A note on two problems in connexion with graphs, Numerische Mathematik 1,
269–271 (1959).
5. N. Qiang, Performance analysis and enhancements for IEEE 802.11e wireless networks, IEEE
Network 19(4), 21–27 (2005).
Chapter 22
Web-Based Management of Distributed Services
Abstract This paper presents WebDMF, a Web-based Framework for the Manage-
ment of Distributed services. It is based on the Web-based Enterprise Management
(WBEM) family of standards and introduces a middleware layer of entities called
“Representatives”. WebDMF can be integrated with existing WBEM infrastructures
and is not limited to monitoring. On the contrary, it is capable of actively mod-
ifying the run-time parameters of a managed service. Due to its abstract design,
it is suitable for the management of a variety of distributed services, such as
grids and content delivery networks. The paper includes a discussion on Web-
DMF’s design, implementation and advantages. We also present experiments on
an emulated network topology as an indication of the framework’s viability.
22.1 Introduction
G. Oikonomou (B)
Department of Informatics, Athens University of Economics and Business, Athens, Greece
E-mail: geo@aueb.gr
Existing research and development efforts investigate techniques for the manage-
ment of distributed applications and services. The Open Grid Forum’s GMA [8]
and the Relational GMA (R-GMA) [9] focus on monitoring grids. MonALISA [10]
and the CODE toolkit [11] have wider scope but still only perform monitoring.
There are some proposals that can go beyond monitoring. The Unified Grid
Management and Data Architecture (UGanDA) [12] contains an infrastructure man-
ager called MAGI. MAGI has many features but is limited to the management of
UGanDA deployments. MRF is a Multi-layer resource Reconfiguration Framework
for grid computing [13]. It has been implemented on a grid-enabled Distributed
Shared Memory (DSM) system called Teamster-G [14].
Ganglia is another noteworthy proposal [15]. It monitors distributed environments (mainly computer clusters) by adopting a hierarchical approach and breaking a cluster down into "federations". Astrolabe [16] partitions a distributed system into non-overlapping zones; each zone has a "representative" that is chosen automatically among the zone's nodes through a gossip protocol, and users can monitor a zone's operational parameters by specifying aggregation functions. Lastly, UPGRADE-CDN [17] introduces "AMonitor", a monitoring framework for content delivery networks; AMonitor uses state-of-the-art technologies but is limited to monitoring.
Thorough research reveals that existing management systems with many features tend to target specific distributed applications. On the other hand, generic approaches with wider scope offer fewer features and are often limited to monitoring. Furthermore, many current research efforts are based on proprietary technologies and are limited in terms of interoperability and ease of integration with existing infrastructures. Lastly, there are questions about the scalability of some approaches. Compared to those efforts, the framework discussed in this paper has wide scope while maintaining a rich set of features, including the ability to perform modifications on the managed service. It is based on open standards and is easy to integrate with existing WBEM infrastructures.
Fig. 22.2 Two methods of communication between the WebDMF representative and service nodes: CIM-XML towards a service node running a managed-service provider with a CIM schema, or SNMP towards a legacy service node exposing a MIB.
[Figure: an initial request from the management node's WBEM client reaches a WebDMF representative's CIMOM and provider, and may be relayed to another representative or to a service node's CIMOM.]
Requests arriving at a representative are delegated to the WebDMF provider module for further processing. The module performs the following functions:
– It determines whether the request can be served locally.
– If the node cannot directly serve the request, it selects the appropriate representative and forwards the request to it.
– If the request can be served locally, the representative creates a list of service nodes that should be contacted and issues intermediate requests.
– It processes the intermediate responses and generates the final response.
– It maintains information about the distributed system's topology.
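A highly simplified sketch of this delegation logic is given below; the domain membership tables and the way a peer representative is chosen are modelled abstractly, since the chapter does not specify the concrete data structures.

```python
# Abstract sketch of the representative's "serve locally or forward" decision.
def handle_request(request, local_domain, domain_of, representative_of, service_nodes_of):
    """domain_of: target -> domain; representative_of: domain -> peer
    representative; service_nodes_of: domain -> list of service nodes."""
    target_domain = domain_of[request["target"]]
    if target_domain != local_domain:
        # Cannot be served locally: delegate to the appropriate representative.
        peer = representative_of[target_domain]
        return ("forward", peer)
    # Served locally: fan out intermediate requests to the right service nodes.
    nodes = service_nodes_of[local_domain]
    intermediate = [{"node": n, "operation": request["operation"]} for n in nodes]
    return ("issue", intermediate)

req = {"target": "grid-A", "operation": "GetInstance"}
print(handle_request(req,
                     local_domain="eu",
                     domain_of={"grid-A": "us"},
                     representative_of={"us": "rep-us-1"},
                     service_nodes_of={"eu": ["n1", "n2"]}))
```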
In some situations a service node does not support WBEM and is only manageable through SNMP. In this case the representative attempts to perform the operation using SNMP methods, based on a set of WBEM-to-SNMP mapping rules. There are limitations, since not all methods can be mapped; even so, the legacy service node can still participate in the deployment.
22.3.3 Domains
[Figure: CIM class diagram of the domain model, showing the classes WebDMF_DomainHierarchy, WebDMF_Domain, WebDMF_Node, WebDMF_DomainRepresentativeNodes, WebDMF_DomainServiceNodes, WebDMF_NodeService and WebDMF_NodeRepresentative and their associations.]
Using WebDMF's request factory classes, users describe the management operations that they wish to perform on the target application. Each request towards the distributed deployment is treated as a managed resource itself: for example, users can create a new request, execute it periodically and read the results, modify it, re-execute it, and finally delete it.
In order to complete a vertical operation, the following message exchange takes place:
– The management node sends a CreateInstance() WBEM message to any representative, requesting the creation of a new instance of class WebDMF_RequestWBEM. This instance defines the management operation that needs to be performed on the service nodes.
– The representative determines whether the request can be served locally. If not, it chooses the appropriate representative and issues a new CreateInstance() request.
– If the representative can serve the request, it generates a list of service nodes that must be contacted, based on the values of properties of the newly created instance.
– The representative sends the appropriate requests to the service nodes. The type of CIM operation used for those requests is also based on values of properties in the instance; it is usually a WBEM GetInstance() or a ModifyInstance().
– After all service nodes have been contacted and responses have been received, the instance on the first representative contains the results. It remains available to the user for potential modification and/or re-execution; all other intermediate instances are deleted.
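As a hedged illustration of the first step of this exchange, the snippet below uses the pywbem client library to create such an instance. Only the class name WebDMF_RequestWBEM comes from the text; the connection URL, credentials, namespace and property names are assumptions made purely for illustration.

```python
# Hypothetical client-side sketch of triggering a vertical operation with pywbem.
import pywbem

conn = pywbem.WBEMConnection("https://representative.example.org:5989",
                             ("operator", "secret"),
                             default_namespace="root/webdmf")   # assumed namespace

request = pywbem.CIMInstance(
    "WebDMF_RequestWBEM",
    properties={
        "TargetDomain": "grid-cluster-1",      # assumed property names
        "TargetClass": "Example_ServiceSetting",
        "Operation": "GetInstance",
    })

# The representative creates the instance, fans the request out to the service
# nodes (or to another representative) and stores the results in the instance.
path = conn.CreateInstance(request)
result = conn.GetInstance(path)
print(result.properties.keys())
```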
The request factory classes are generic: they are not related in any way to the CIM schema of the managed application. This makes WebDMF appropriate for the management of a wide variety of services; furthermore, no re-configuration is needed with each new release of the target service.
[Figure: emulated test topology with a WBEM client (management node) and two representatives, R1 and R2.]
The above experiment was repeated 200 times. Table 22.1 summarizes the results, with times measured in seconds. Each repetition involves 204 request–response exchanges among the various nodes, and the packets crossing the network are small (a few bytes). The total execution time includes the following:
– Communication delays during the request–response exchanges, including TCP connection setup for all WBEM message exchanges.
– Processing overheads on R1 and R2, imposed by WebDMF's functionality.
– Processing at the service nodes to calculate the requested value and generate a response.
The absolute value of the average completion time may seem rather high. However, processing times are generally minimal compared to TCP connection setup and message exchange; with that in mind, each of the 204 request–responses completes in less than 20 ms on average, which is reasonable.
After 200 repetitions we observe low statistical dispersion (variance and standard deviation), indicating that the measured values are not widely spread around the mean. We draw the same conclusion by estimating a 95% confidence interval for the mean. This indicates that the same experiment will complete in a similar time under similar network-load conditions.
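The dispersion statistics mentioned above (variance, standard deviation and a 95% confidence interval for the mean) can be reproduced with a few lines; the sample values below are made up, not the measured completion times.

```python
# Sketch of the dispersion statistics reported for the repeated experiment.
import math
import statistics

samples = [3.9, 4.1, 4.0, 4.2, 3.8, 4.05, 3.95, 4.1]     # completion times, seconds
mean = statistics.mean(samples)
var = statistics.variance(samples)            # sample variance
std = math.sqrt(var)
# Normal approximation for the 95% CI of the mean (adequate for n = 200).
half_width = 1.96 * std / math.sqrt(len(samples))
print(f"mean={mean:.3f}s  var={var:.4f}  std={std:.3f}  "
      f"95% CI=({mean - half_width:.3f}, {mean + half_width:.3f})")
```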
22.5 Conclusions
Existing monitoring solutions are generic, but the ones that offer more features and the ability to "set" parameters tend to have a narrow scope. We wanted to design a framework that would be generic enough to suit a wide variety of services while not being limited to monitoring. WebDMF achieves this by detaching the details of the managed service from the representative logic: management functions for a specific service are realised by WBEM providers on the service nodes, and the representatives unify them on a deployment scale. WebDMF has some other noteworthy features:
– It is based on WBEM, a family of open standards. WBEM has been considered adequate for the management of applications, as opposed to other approaches (e.g. SNMP) that focus on the management of devices.
– It provides interoperability with existing WBEM-based management infrastructures without the need for modifications.
References
1. W. Stallings, SNMP, SNMPv2, SNMPv3, RMON 1 and 2. Addison Wesley, Redwood City, CA,
1999.
2. CIM Infrastructure Specification, DMTF Standard DSP0004, 2005.
3. Representation of CIM in XML, DMTF Standard DSP0201, 2007.
4. CIM Operations over HTTP, DMTF Standard DSP0200, 2007.
5. G. Oikonomou, and T. Apostolopoulos, Using a web-based framework to manage grid
deployments, in Proceedings of The 2008 International Conference on Grid Computing and
Applications (part of WORLDCOMP 08), Las Vegas, 2008, pp. 10–16.
6. G. Oikonomou, and T. Apostolopoulos, Web-based management of content delivery networks,
in Proceedings of 19th IFIP/IEEE International Workshop on Distributed Systems: Opera-
tions and Management (DSOM). Managing Large Scale Service Deployment (MANWEEK
08), Samos, Greece, 2008, pp. 42–54.
7. G. Oikonomou, and T. Apostolopoulos, WebDMF: A web-based management framework for
distributed services, in Proceedings of The 2008 International Conference of Parallel and
Distributed Computing (part of WCE 08), Vol. I, London, 2008, pp. 593–598.
8. A Grid Monitoring Architecture, Open grid Forum GFD.7, 2002.
9. W. Cooke, et al., The Relational Grid Monitoring Architecture: Mediating Information about
the Grid, Journal of Grid Computing, 2(4), 2004, 323–339.
10. I. C. Legrand, H. B. Newman, R. Voicu, C. Cirstoiu, C. Grigoras, M. Toarta, and C. Dobre,
MonALISA: An Agent based, Dynamic Service System to Monitor, Control and Optimize
Grid based Applications, in Proceedings of Computing in High Energy and Nuclear Physics
(CHEP), Interlaken, Switzerland, 2004.
11. W. Smith, A System for Monitoring and Management of Computational Grids, in Proceedings
of International Conference on Parallel Processing (ICPP ’02), 2002, p. 55.
12. K. Gor, D. Ra, S. Ali, L. Alves, N. Arurkar, I. Gupta, A. Chakrabarti, A. Sharma, and
S. Sengupta, Scalable enterprise level workflow and infrastructure management in a grid
computing environment, in Proceedings of Fifth IEEE International Symposium on Cluster
Computing and the Grid (CCGrid ’05), Cardiff, UK, 2005, pp. 661–667.
13. P.-C. Chen, J.-B. Chang, T.-Y. Liang, C.-K. Shieh, and Y.-C. Zhuang, A multi-layer resource
reconfiguration framework for grid computing, in Proceedings of 4th International Workshop
on Middleware for Grid Computing (MGC ’06), Melbourne, Australia, 2006, p. 13.
14. T.-Y. Liang, C.-Y. Wu, J.-B. Chang, and C.-K. Shieh, Teamster-G: A grid-enabled software
DSM system, in Proceedings of Fifth IEEE International Symposium on Cluster Computing
and the Grid (CCGrid ’05), Cardiff, UK, 2005, pp. 905–912.
15. M. L. Massie, B. N. Chun, and D. E. Culler, The ganglia distributed monitoring system:
Design, implementation, and experience, Parallel Computing, 30, 817–840, 2004.
16. K. P. Birman, R. V. Renesse, and W. Vogels, Navigating in the storm: Using astrolabe to
adaptively configure web services and their clients, Cluster Computing, 9, 127–139, 2006.
17. G. Fortino, and W. Russo, Using p2p, grid and agent technologies for the development of
content distribution networks, Future Generation Computer Systems, 24, 180–190, 2008.
18. M. Kahani, and P. H. W. Beadle, Decentralised approaches for network management, ACM
SIGCOMM Computer Communication Review, 27(3), 36–47, 1997.
19. Common Manageability Programming Interface, The Open Group, C061, 2006.
20. A. Vahdat, K. Yocum, K. Walsh, P. Mahadevan, D. Kostic, J. Chase, and D. Becker, Scalabil-
ity and Accuracy in a Large-Scale Network Emulator, in Proceedings of 5th Symposium on
Operating Systems Design and Implementation (OSDI), December 2002.
21. OpenCDN Project [Online]. Available: http://labtel.ing.uniroma1.it/opencdn/
Chapter 23
Image Index Based Digital Watermarking
Technique for Ownership Claim and Buyer
Fingerprinting
23.1 Introduction
S. S. Bedi (B)
MJP Rohilkhand University, Bareilly (UP), India-243006
E-mail: dearbedi@gmail.com
Information security is concerned with confidentiality, integrity, access control, and availability; its requirements are commonly described in terms of security attacks, mechanisms, and services [1]. Various data-hiding techniques, such as cryptography, steganography, digital signatures and fingerprinting, have been developed to address information security issues, but they fail to provide complete solutions for protecting the intellectual property rights of digital multimedia data. The existing technologies, such as cryptography, secure the multimedia data only during storage or transmission and not while it is being consumed [2]. Digital watermarking provides an answer to this limitation, as the watermark remains in the data throughout its usage.
Digital watermarking (DWM) is the process of embedding information into digital multimedia content such that the embedded information (the watermark) can be extracted later [3]. The extracted information can be used for the protection of intellectual property rights, i.e. for establishing ownership, ensuring authorized access, and content authentication. A watermarking system can be implemented using either of two general approaches: one is to transform the original image into its frequency-domain representation and embed the watermark data there; the second is to treat the spatial-domain data of the host image directly to embed the watermark.
According to Hartung [4], most proposed watermarking methods use the spatial domain, probably because of its simplicity and efficiency. Spatial-domain methods embed the watermark information directly into the image pixels, as proposed in [5]. These techniques embed the watermark in the LSB plane for perceptual transparency, which is relatively easy to implement, but their significant disadvantages include the ease of bypassing the security they provide [5, 6] and the inability to compress the image lossily without damaging the watermark.
The methods in [6, 7] extended this work to improve robustness and localization: the watermark is embedded by adding a bipolar M-sequence in the spatial domain, and detection is via a modified correlation detector. However, these schemes were not very capable of protecting the watermark and did not resist lossy compression either. Regarding security and content authentication, a newer method [8] introduced the concept of a hash function, in which the author inserts a binary watermark into the LSB of the original image using a one-way hash function. The technique is too sensitive, since the watermark is embedded into the LSB plane of the image, and the algorithm also does not resist lossy compression well. Thus the limitations of spatial-domain methods are that, in general, they are not robust to common geometric distortions and have low resistance to JPEG lossy compression. A scheme is therefore required to fill the existing gap by using watermarking and cryptographic techniques together.
In this paper, a robust and secure invisible digital watermarking scheme in the spatial domain is proposed. The proposed scheme combines the advantages of cryptographic concepts with the imperceptibility feature of digital image watermarking. Security and perceptual transparency are achieved by using a cryptographic one-way hash function and the computation of a threshold value, respectively. For robustness, the proposed technique does not depend upon perceptually significant regions; rather it uses the concepts of an image key and a buyer fingerprint generator. The unique binary sequence serves as the buyer authenticator of a particular image, and the recovery of the watermark also protects the copyright.
The rest of the paper is organized as follows. Section 23.2 describes the concept
of generation of image key and buyer fingerprint. Section 23.3 explains the proposed
watermarking scheme with watermark insertion and extraction process. Experimen-
tal results with discussion and conclusion are given in Section 23.4 and Section 23.5
respectively.
The image key and the generation of the buyer fingerprint used in the proposed scheme are described below. A cryptographic one-way hash function has two relevant properties: first, it is computationally infeasible to invert; second, it is computationally infeasible to find any two distinct inputs that map to the same output (collision resistance). The generation of the buyer fingerprint is discussed in Section 23.3.
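A minimal sketch of how such a fingerprint could be derived with MD5 is shown below; the exact byte layout fed to the hash (image-key bytes followed by the packed MSB array) is an assumption, since the paper does not spell it out.

```python
# Sketch of generating a 128-bit buyer fingerprint with MD5 (layout assumed).
import hashlib

def buyer_fingerprint(image_key_bytes, msb_bits):
    """image_key_bytes: raw bytes of the image key; msb_bits: list of 0/1."""
    msb_bytes = bytes(int("".join(map(str, msb_bits[i:i + 8])), 2)
                      for i in range(0, len(msb_bits), 8))
    digest = hashlib.md5(image_key_bytes + msb_bytes).digest()   # 128-bit output
    # Expand the digest into the 128-bit binary watermark string F(k).
    return [(byte >> bit) & 1 for byte in digest for bit in range(7, -1, -1)]

fingerprint = buyer_fingerprint(b"\x00" * (128 * 128), [1, 0] * 64)
print(len(fingerprint), fingerprint[:16])
```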
In this watermarking scheme, the original image I_m is divided into blocks based on the image key. The intensity values of the pixels in each block are modified depending on the corresponding bit of the watermark to obtain the watermarked image. Only those pixels are modulated whose value is greater than the threshold T(k), for 1 ≤ k ≤ 2^n. The motivation for using the threshold is to increase perceptual transparency, as it filters out the pixels with intensity values below the threshold. The threshold is calculated for each block as the arithmetic mean of the pixels of that block. In the extraction phase the watermarked blocks are compared with the original image blocks and the buyer fingerprint is recovered.
for the watermark bit '0', no modification is recommended. The detailed watermark
insertion algorithm is given in the following steps:
1. For 1 ≤ k ≤ 2^n:
   (a) Let T(k) be the threshold of the block with index k.
   (b) Let F(k) be the k-th bit of the watermark.
2. For 1 ≤ i ≤ 2^x, 1 ≤ j ≤ 2^y:
   (a) Let I_m(i, j) be the pixel intensity of the original image at location (i, j).
   (b) Assume that (i, j) belongs to block B(k).
   (c) If F(k) = 0, then W_m(i, j) = I_m(i, j).
   (d) If F(k) = 1, then
   (e) if I_m(i, j) > T(k), then W_m(i, j) = I_m(i, j) + α and d(k) = d(k) + α.
The factor α is taken as positive throughout the insertion. Its value is chosen so as
to maintain fidelity; larger values degrade the quality of the image. From step 2(e)
of the algorithm it is clear that the factor d(k) records the increase in intensity
values for each block.
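The following minimal Python sketch restates steps 1–2 above. It assumes the image key has already been used to partition the image into indexed blocks (here a mapping from block index k to pixel coordinates) and that the fingerprint F is the bit sequence to embed; the names and the default α are illustrative.

```python
import numpy as np

def insert_watermark(image, blocks, fingerprint, alpha=2):
    """Sketch of the insertion steps above: for each block k, if the watermark
    bit F(k) is 1, add alpha to every pixel above the block threshold T(k);
    if F(k) is 0 the block is left unchanged. d(k) records the total increase."""
    img = np.asarray(image, dtype=np.int32)
    wm = img.copy()
    d = {}
    for k, coords in blocks.items():
        t_k = np.mean([img[i, j] for (i, j) in coords])   # threshold T(k): block mean
        d[k] = 0
        if fingerprint[k] == 1:
            for (i, j) in coords:
                if img[i, j] > t_k:
                    wm[i, j] += alpha                      # W_m(i,j) = I_m(i,j) + alpha
                    d[k] += alpha                          # d(k) = d(k) + alpha
    return np.clip(wm, 0, 255).astype(np.uint8), d
```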
The watermark insertion process is described in the block diagram shown in
Fig. 23.1. The buyer signature, the threshold, and the image key are the required
inputs for watermark insertion.
[Fig. 23.1: block diagram of the watermark insertion process, with the image key, threshold, and buyer signature (watermark) as inputs and the watermarked image as output. Fig. 23.2: block diagram of the watermark extraction process, taking the watermarked image, the claimed original image, the image key, and the threshold, and recovering the buyer signature for the owner.]
Table 23.1 NCC values of images for some common image processing operations

  Attack    Scaling   Scaling + rotation   Compression   MPF       LPF       Modify    Cropping
  NCC       0.9999    0.9878               0.9843        0.98461   0.96294   0.68474   0.66389
The simulation results have been produced on various sets of images. The original
gray-scale "Lena" image of size 128 × 128 pixels is shown in Fig. 23.3a. A unique
image key of size 128 × 128 and an MSB array of 128 bits are taken as inputs to
the MD5 hash function. The 128-bit buyer fingerprint is generated and inserted into
the original image, producing the 128 × 128 watermarked image shown in
Fig. 23.3b. The result shows that the watermark is invisible: the NCC value between
the original image and the watermarked image is 0.99998. The watermark is then
extracted from the watermarked image, and the exact bit-wise match between the
extracted watermark and the inserted buyer fingerprint identifies the true buyer of
the original image.
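For reference, NCC here denotes the normalized cross-correlation between the original and the processed image; the chapter does not spell out its exact formula, so the short sketch below uses one common definition.

```python
import numpy as np

def ncc(original, modified):
    """Normalized cross-correlation between two equally sized images
    (one common definition; the chapter's exact formula is not given)."""
    a = np.asarray(original, dtype=np.float64).ravel()
    b = np.asarray(modified, dtype=np.float64).ravel()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))
```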
The effect of some attacks on the watermarked image is also shown in Table 23.1
and Fig. 23.3. Table 23.2 shows the bit-wise match between the inserted and
extracted buyer fingerprints, and the resultant values are plotted in Fig. 23.4.
In the low-pass filter attack, a 3 × 3 mask is used. An NCC value of 0.96294 is
obtained between the original and the modified watermarked image, whereas it is
0.98461 for the median-pass filter attack. The modified watermarked image is shown
in Fig. 23.3c. An exact match of 128 bits is obtained for both filtering operations,
as illustrated in Table 23.2. The watermarked image scaled to twice its size is shown
in Fig. 23.3d, and the measured NCC value is 0.9999. For the combined attack of a
17° rotation followed by resizing to 128 × 128 pixels, the NCC value between the
original and the modified watermarked image is 0.9878 (Fig. 23.3e). 126–127 bits
are recovered for the scaling and combined attacks. The cropped and randomly
modified images are shown in Fig. 23.3f–g. In the case of modification, the
watermarked image has been tampered with at specific locations by changing the
pixel values. In the case of severe manipulation of the pixel intensities, a bit-wise
match of 120–126 bits is obtained; with a damping factor of 0.9, an exact match of
all 128 bits of the buyer fingerprint is obtained, and the NCC value is 0.68474. In
the case of cropping, the NCC value drops to 0.66389. In a rigorously cropped
image, 3–4 bits of the inserted buyer fingerprint are lost; these can be recovered by
using a damping factor of 0.9.

Table 23.2 Bit-wise match between the inserted and extracted buyer fingerprint (Lena image)

  Attack    Scaling   Scaling + rotation   Compression   MPF   LPF   Modify   Cropping

Fig. 23.4 Bit-wise match between the inserted and extracted buyer signature for the image "Lena"
Robustness against lossy JPEG compression with a quality factor of 90 is
demonstrated in Fig. 23.3h; an NCC value of 0.9843 is obtained and all 128 bits of
the buyer fingerprint are recovered.
The NCC values in Table 23.1 show that, for the scaling attack and the combined
scaling-and-rotation attack, the pixel values are not changed strongly at the center
of the image, so there is not much distortion in the NCC values obtained. In the
low-pass filter (LPF) attack only the pixels with lower intensities are retrievable;
depending on the image, if the LPF's threshold for a block is higher than the
threshold assumed in the watermark insertion algorithm, the NCC value is higher
than when the LPF's threshold is lower than the assumed one. In the medium-pass
filter (MPF) attack only the pixels with medium intensities are retrievable;
depending on the image, if the threshold assumed in the watermark insertion
algorithm lies towards the lower end of the MPF's threshold range for a block, the
NCC value is higher than when the assumed threshold lies towards the higher end.
In the modification and cropping attacks some of the pixels are tampered with so
badly that the original pixel values cannot be identified. The results therefore
demonstrate that the proposed scheme is highly robust to geometric attacks and
compression, and reasonably robust to modification and cropping.
23.5 Conclusion
The proposed watermarking technique is intended for copyright protection and buyer
fingerprinting. The image key and the unique watermark are the key features
sustaining the security of the algorithm. The generated watermark relies on the
properties of the hash function, which is an important feature in making the
watermark cryptographically secure. When a positive value of α is used, the
watermark increases the intensity of an image block. An attacker who knows the
watermarking process might intentionally exploit this fact and try to remove the
watermark; to do so, the attacker must know the image key. The attacker cannot
guess the secret image key, as it has been kept secret and the indexes have been
generated randomly. The watermark extraction algorithm rectifies the pixel values
in case the attacker increases or decreases them, and an invalid image key is
unsuccessful in forging the watermark. The technique survives common image
transformations as well as intentional attacks, meeting the objectives of buyer
fingerprinting and ownership claims.
References
1. W. Stallings, “Cryptography and Network Security: Principles and Practices,” Pearson Educa-
tion, Inc., NJ, 3rd Ed., 2005, ISBN 81–7808–902–5.
2. S. S. Bedi and S. Verma, “Digital Watermarking Technology: A Demiurgic Wisecrack Towards
Information Security Issues,” Invertis Journal of Science and Technology, vol. 1, no. 1,
pp. 32–42, 2007.
3. A. Kejariwal, “Watermarking,” Magazine of IEEE Potentials, October/November, 2003,
pp. 37–40.
4. F. Hartung and M. Kutter, “Multimedia Watermarking Techniques,” Proceedings of IEEE,
vol. 87, no. 7, pp. 1079–1106, July 1999.
5. M. Yeung and F. Mintzer, "Invisible watermarking for image verification," Journal of Electronic
Imaging, vol. 7, no. 3, pp. 578–591, July 1998.
6. R. Wolfgang and E. Delp, “Fragile watermarking using the VW2D watermark,” Proceedings of
the IS & T/SPIE Conference on Security and Watermarking of Multimedia Contents, pp. 204–
213, San Jose, CA, January 1999.
7. J. Fridrich, "Image watermarking for tamper detection," Proceedings of the IEEE International
Conference on Image Processing, vol. 2, pp. 404–408, Chicago, IL, October 1998.
8. P. W. Wong and N. Memon, “Secret and Public key Image Watermarking Schemes for Image
Authentication and Ownership Verification,” IEEE Transaction on Image Processing, vol. 10,
no. 10, October 2001.
Chapter 24
Reverse Engineering: EDOWA Worm Analysis
and Classification
Abstract Worms have become a real threat to computer users over the past few years.
Worms are more prevalent today than ever before, and both home users and system
administrators need to be on the alert to protect their networks or companies against
attacks. New worms appear so quickly these days that even the most accurate
scanners cannot track all of them, and until now there has been no specific way to
classify worms. This research was carried out to understand the threats posed by
worms. In this paper the researchers propose a new way to classify worms, which
is later used as the basis for a system, called the EDOWA system, that detects worm
attacks. Details of how the new worm classification, called the EDOWA worm
classification, is produced are explained in this paper. This new worm classification
can hopefully be used as the base model for a system to detect worms or to defend
an organization from worm attacks.
24.1 Introduction
A computer worm can cause millions of dollars of damage by infecting hundreds of
thousands of hosts in a very short period of time. A computer worm is a computer
program, or a small piece of software, that has the ability to copy itself from machine
to machine; it uses computer networks and security holes to replicate itself.
Computer worms are classified as a serious threat to the information technology
world. McCarthy [1] defines a computer worm as a standalone malicious code
program that copies itself across networks, while Nachenberg [2] states that a
computer worm is a program designed to copy itself from one computer to another
over some network medium, such as email. The computer worm would
M. M. Saudi (B)
Faculty Science & Technology, Universiti Sains Islam Malaysia (USIM), Nilai, Malaysia
E-mail: madihah@usim.edu.my
For a new computer worm, the main purpose of analysis is to understand the intention
of the code. For a computer worm that has already been released in the wild, the
analysis process verifies what the worm source code intends to do and whether it
matches what has been published on antivirus or CERT websites, or whether it
exhibits new worm features. The analysis techniques can be divided into two kinds:
static analysis and dynamic analysis. Before loading the computer worm specimen
onto the machine, the researchers make sure all preparation and verification has been
done. While conducting the analysis, the whole process is documented in writing: a
written record of the analytic techniques and the computer worm's actions is useful
for understanding how the worm works, tracing through its functions in a repeatable
fashion, and improving the analyst's skills.
When all the preparation and verification have been done, the lab is disconnected
from any production network, and a USB thumb drive is used to transfer the
computer worm specimen onto the lab system.
Once the specimen is in place on the lab system, the analysis can be carried out. To
determine the purpose and capabilities of the piece of code, researchers can use
either static or dynamic analysis.
Static analysis is also known as white-box analysis. It involves analyzing and
understanding source code; if only binary code is available, the binary has to be
decompiled to recover source code. White-box analysis is very effective for finding
programming and implementation errors in software and for understanding the flow
of the program. Static analysis looks at the files associated with the computer worm
on the hard drive without running the program, giving a general idea of the worm's
characteristics and purpose. The static analysis phase involves antivirus checking
with research, analyzing the strings, looking for scripts, conducting binary analysis,
disassembling, and reverse compiling.
Once the computer worm specimen has been copied to the testing machine, the
researchers run the installed antivirus software to check whether it detects anything.
If the antivirus detects the computer worm, its reported name is looked up on
antivirus websites for further information. If the computer worm is in compressed
or archived form, the researchers open the archive to get its contents. The researchers
then need to verify whether the information available from the antivirus website is
correct.
String Analysis
The strings extracted from the computer worm can help the researchers learn more
about its characteristics. A few tools, such as TDS-3 and Strings.exe (from
Sysinternals), were used to extract the strings. The information that can be retrieved
from the extracted strings includes the worm specimen's name, user dialogs,
passwords for backdoors, URLs associated with the malware, the attacker's email
address, help or command-line options, libraries, function calls, and other
executables used by the malware.
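As a rough stand-in for tools such as Strings.exe, the following Python sketch pulls runs of printable ASCII characters out of a binary specimen; the minimum string length and the specimen path are placeholders.

```python
import re

def extract_strings(path, min_len=5):
    """Pull runs of printable ASCII characters out of a binary specimen,
    similar in spirit to Strings.exe; min_len controls how short a run may be."""
    with open(path, "rb") as f:
        data = f.read()
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

# Hypothetical usage with an isolated specimen:
# for s in extract_strings("specimen/worm.exe"):
#     print(s)
```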
The language in which the computer worm was written can also be identified from
the extracted strings; Table 24.1 gives some clues.
Disassemble Code
A disassembler and debugger are used to convert a raw binary executable into
assembly language for further analysis. The researchers use the tools listed in
Appendix A to disassemble and debug the computer worm.
Dynamic analysis involves executing the computer worm and watching its actions;
the worm is activated on a controlled laboratory system.
Most computer worms read from or write to the file system: a worm might attempt
to write files, alter existing programs, add new files, or append itself to the file
system. Using a tool such as Filemon, all actions associated with opening, reading,
writing, closing, and deleting files can be monitored.
Monitoring Process
A monitoring tool such as PrcView v3.7.3.1 or Process Explorer displays each
running program on a machine, showing the details of what each process is doing.
With this kind of tool, the files, registry keys, and all of the DLLs that each process
has loaded can be monitored. For each running process, the tool displays its owner,
its individual privileges, its priority, and its environment variables.
On a remote machine on the same LAN as the infected testing machine, a port
scanner, the Nmap program, and a sniffer are installed. The port scanner and Nmap
are used to monitor the listening ports, and the sniffer captures the worm traffic;
related tools such as Ethereal, NeWT, and TDS-3 make use of the sniffer. Using the
sniffer, the details of individual packets and all packets transmitted across the LAN
can be monitored. The local network monitoring tool (TDIMon) monitors and
records all requests to use the network interface, showing how the worm grabbed
network resources and used them.
The computer worm might also have placed the network interface in promiscuous
mode, allowing it to sniff all packets on the LAN. To determine whether the infected
machine's interface is in promiscuous mode, the Promiscdetect.exe tool is run.
The registry needs to be monitored as well, since it is the hierarchical database
containing the configuration of the operating system and of most programs installed
on the machine. Registry access can be monitored using Regmon.
The analysis done in the laboratory led the researchers to produce a new
classification for the EDOWA system. A classification of worms is proposed, based
on several factors: infection, activation, payload, operating algorithms, and
propagation.
24.3.1 Infection
Infection is the phase describing how a computer gets infected by a worm. From the
eight available classifications we studied, we found that only one made infection the
first phase. Albanese et al. [6] say that infection refers to how a worm gains initial
control of a system. Worms rely on two general methods to infect a host: either they
exploit an error in software running on the system, or they are the result of some
action taken by a user. Here we propose two techniques:
24.3.1.1 Host
A host is a mechanism the worm needs in order to copy itself to new systems that
have not yet been infected; it cannot propagate autonomously across the network.
In host computer worms, the original terminates itself after launching a copy on
another host, so there is only one copy of the worm running somewhere on the
network at any given moment. Such a worm requires human help to move from one
machine to another. CDs, floppy disks, USB devices (thumb drives and external
hard disks), and files are the most common hosts available now.
24.3.1.2 Network
The network is the fastest way for a worm to move. A network worm consists of
multiple parts, each running on different machines and possibly performing different
actions, and it uses the network for several communication purposes; propagating
from one machine to another is only one of them. It can infect a computer without
human interaction. Most such worms simply copy themselves to every computer
with which the host computer can share data, and most Windows networks allow
machines within defined subgroups to exchange data freely, making it easier for a
worm to propagate itself.
24.3.2 Activation
24.3.2.1 No Activation
A worm with no activation just stays in the computer doing nothing; it merely uses
up some hard disk space.
Human triggering is the slowest activation mechanism. This approach is usually used
by worms that propagate through email; social engineering is used to entice the user
into clicking on the file that activates the worm [7]. According to Christoffersen
et al. [8], some worms are activated when the user performs some activity, like
resetting the machine, logging onto the system and thereby running the login scripts,
or executing a remotely infected file. Evidently, such worms do not spread very rapidly.
According to Weaver et al. [4], the second-fastest way for worms to activate is
through scheduled system processes. A scheduled process is an activation based on
a specific time and date; many desktop operating systems and applications include
auto-updater programs that periodically download, install, and run software updates.
The fastest-activating worms are able to initiate their own execution by exploiting
vulnerabilities in services that are always on and available (e.g., Code Red [9]
exploiting IIS web servers) or in the libraries the services use (e.g., XDR [10]).
Such worms either attach themselves to running services or execute other commands
using the permissions associated with the attacked service.
Hybrid launch uses a combination of two or more activation mechanisms to launch
a worm. ExploreZip [2] is an example of a hybrid-launch worm: it sends an e-mail
that requires a user to launch the infected attachment to gain control of the system.
Once running on the computer system, ExploreZip would automatically spread itself
to other computers over the peer-to-peer network, and these targeted machines would
then become infected on the next reboot without any known user intervention.
24.3.3 Payload
24.3.3.1 No Payload
A worm with no payload does not do any harm to the computer system; it just
propagates without inflicting any destructive mechanism on the computer.
24.3.3.4 Destructive
A destructive payload does harm to the machine or host. According to Shannon
et al. [12], the Witty worm deletes a randomly chosen section of the hard drive,
which, over time, renders the machine unusable.
24.3.3.5 Phishing
Only one research includes operating algorithms in its classification; Albanese et al.
[6] classify this aspect as survival. Operating algorithms are the mathematical and
logical ways in which a worm attempts to avoid detection. They can be categorized as:
24.3.4.1 Polymorphic
A polymorphic worm changes part or all of its code each time it replicates, which
helps it evade scanning software. Kruegel et al. [14] define polymorphic worms as
worms that are able to change their binary representation as part of the spreading
process. This can be achieved by using self-encryption mechanisms or
semantics-preserving code manipulation techniques. As a consequence, copies of a
polymorphic worm might no longer share a common invariant substring of sufficient
length, and existing systems will not recognize the network streams containing the
worm copies as the manifestation of a worm outbreak.
24.3.4.2 Stealth
Terminate-and-stay-resident (TSR) worms exploit a variety of techniques to remain
resident in memory once their code has been executed and their host program has
terminated. These worms are called resident or indirect worms because they stay
resident in memory and indirectly find files to infect as those files are referenced by
the user.
Anti-anti-virus code corrupts the anti-virus software by trying to delete or change
the anti-virus programs and data files so that the anti-virus no longer functions
properly. According to Nachenberg [16], anti-anti-virus programs, usually called
retroviruses, are computer viruses that attack anti-virus software to prevent
themselves from being detected. Retroviruses delete anti-virus definition files,
disable memory-resident anti-virus protection, and attempt to disable anti-virus
software in any number of ways.
24.3.5 Propagation
Propagation is defined as the way a worm spreads itself to another host or network.
After researching the propagation issue, we strongly believe that there are two ways
for a worm to reproduce itself: scanning and passive.
24.3.5.1 Scanning
Scanning is a method used by worms to find their victims. We strongly agree with
the approach proposed by Weaver et al. [4]: there are two possible scanning methods,
random scanning and sequential scanning.
Random Scanning
Random scanning is the most popular method: the worm simply picks a random IP
address somewhere in the Internet address space and then tries to connect to it and
infect it. An example of a random-scanning worm is Blaster [11], which picks a
random number to determine whether to use the local address it just generated or a
completely random one.
The worm releaser scans the network in advance and develops a complete hit list of
all vulnerable systems on the network. According to Staniford [17], the worm carries
this address list with it and spreads out through the list.
Passive
Worms using a passive monitoring technique are not actively searching for new
victims; instead, they wait for new targets to contact them or rely on the user to
discover new targets. Christoffersen et al. [8] note that although passive worms tend
to have a slow propagation rate, they are often difficult to detect because they
generate only modest anomalous reconnaissance traffic.
24.4 Conclusion
This new classification was produced based on the research and testing done in the
laboratory. The classification is divided into five main categories: infection,
activation, payload, operating algorithms, and propagation. Efficient
References
1. L. McCarthy, “Own Your Space: Keep Yourself and Your Stuff Safe Online (Book),” Addison-
Wesley, Boston, MA, 2006.
2. C. Nachenberg, “Computer Parasitology,” Proceedings of the Ninth International Virus
Bulletin Conference, September/October 1999, pp. 1–25.
3. J. Nazario, "Defense and Detection Strategies against Internet Worms" (Book), Artech House
Inc., Norwood, USA, 2003; or the paper entitled "The Future of Internet Worms" by J. Nazario,
J. Anderson, R. Wash, and C. Connelly, Crimelabs Research, 2001.
4. N. Weaver, V. Paxson, S. Staniford and R. Cunningham, “A Taxonomy of Computer
Worms,” Proceedings of the ACM CCS Workshop on Rapid Malcode (WORM), pp. 11–18,
2003.
5. D.M. Kienzle and M.C. Elder, “Recent Worms: A Survey and Trends,” Proceedings of the
ACM CCS Workshop on Rapid Malcode (WORM), pp. 1–10, 2003.
6. D.J. Albanese, M.J. Wiacek, C.M. Salter and J.A. Six, “The Case for Using Layered Defenses
to Stop Worms (Report style),” UNCLASSIFIED-NSA Report, pp. 10–22, 2004.
7. C.C. Zou, D. Towsley and W. Gong, “Email worm modeling and defense,” Computer
Communications and Networks, ICCCN 2004, pp. 409–414, 2004.
8. D. Christoffersen and B.J. Mauland, “Worm Detection Using Honeypots (Thesis or Disser-
tation style),” Master dissertation, Norwegian University of Science and Technology, June
2006.
9. H. Berghel, “The Code Red Worm: Malicious software knows no bounds,” Communication of
the ACM, vol. 44, no. 12, pp. 15–19, 2001.
10. CERT, CERT Advisory CA-2002-25: Integer Overflow in XDR Library,
http://www.cert.org/advisories/ca-2002-25.html
11. M. Bailey, E. Cooke, F. Jahanian, D. Watson and J. Nazario, “The Blaster Worm: Then and
Now,” IEEE Security & Privacy, vol. 3, no. 4, pp. 26–31, 2005.
12. C. Shannon and D. Moore, “The Spread of the Witty Worm,” IEEE Security & Privacy, vol. 2,
no. 4, pp. 36–50, 2004.
13. A. Tsow, “Phishing With Consumer Electronics: Malicious Home Routers,” 15th International
World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 2006.
14. C. Kruegel, E. Kirda, D. Mutz, W. Robertson and G. Vigna, “Polymorphic Worm detection
using structural information of executables,” 8th International Symposium on Recent Advances
in Intrusion Detection (RAID), 2005.
15. S.G. Cheetancheri, “Modeling a computer worm defense system (Thesis or Dissertation
style),” Master dissertation, University of California, 1998.
16. C. Nachenberg, “The Evolving Virus Threat,” 23rd NISSC Proceedings, Baltimore, Maryland,
2000.
17. S. Staniford, President of Silicon Defense. “The Worm FAQ: Frequently Asked Questions on
Worms and Worm Containment,” The Worm Information Center, 2003
Chapter 25
Reconfigurable Hardware Implementation
of a GPS-Based Vehicle Tracking System
Abstract In this chapter, we build on a recently produced VTS (the Aram Locator),
offering a SOC replacement for the microcontroller-based implementation. Although
the microcontroller-based system has acceptable performance and cost, an FPGA-
based system promises lower cost and a more cohesive architecture that saves
processing time and speeds up system interaction. The performance of the proposed
implementations is evaluated on different FPGAs. The suggested designs show
enhancements in terms of speed, functionality, and cost.
25.1 Introduction
After considerable evolution, reconfigurable systems fill the flexibility, performance,
and power-dissipation gap between application-specific systems implemented with
hardwired Application Specific Integrated Circuits (ASICs) and systems based on
standard programmable microprocessors. Reconfigurable systems enable extensive
exploitation of computing resources: reconfiguring resources in different parallel
topologies allows a good match with the intrinsic parallelism of an algorithm or a
specific operation. Reconfigurable systems are thus very well suited to implementing
data-stream, data-parallel, and other applications.
The introduction of a new paradigm in hardware design, called Reconfigurable
Computing (RC), aims to solve problems by changing hardware configurations so
as to offer the performance of dedicated circuits. Reconfigurable computing
I. Damaj (B)
Electrical and Computer Eng’g Dept, Dhofar University, P.O. Box 2509, 211 Salalah, Oman
E-mail: i damaj@du.edu.om
enables mapping software into hardware, with the ability to reconfigure its
connections to reflect the software being run. The ability to completely reprogram
the computer's hardware implies that this new architecture provides immense scope
for emulating different computer architectures [1, 2].
Field programmable gate array (FPGA) based RCs have evolved to a point where
SOC designs can be built on a single device. The number of gates and features has
increased dramatically to compete with capabilities that have traditionally been
offered through ASIC devices only, and FPGA devices have made a significant move
in terms of resources and performance. Contemporary FPGAs provide platform
solutions that are easily customizable for system connectivity, digital signal
processing (DSP), and data processing applications. Due to the importance of
platform solutions, leading FPGA vendors are coming up with easy-to-use design
development tools [3, 4].
As the complexity of FPGA-based designs grows, a more efficient and flexible
design methodology is required. Hardware implementations have become more
challenging, and device densities have increased at such a pace that traditional flows
have become cumbersome and outdated. A more innovative, higher-level design
flow that directly incorporates model simulation with hardware implementation is
needed. One of the modern tools (also used in the proposed research) is Quartus II
from Altera. Quartus II is a compiler, simulator, analyzer, and synthesizer with strong
verification capabilities, and it was chosen for this implementation; it can build the
verification file from the input/output specification provided by the user. The
Quartus II design software provides a complete, multi-platform design environment
that easily adapts to specific design needs and is a comprehensive environment for
system-on-a-programmable-chip (SOPC) design [5, 6].
An application that needs real-time, fast, and reliable data processing is GPS-based
vehicle tracking. In this chapter, we build on a recently produced VTS (the Aram
Locator), offering a SOC replacement for the microcontroller-based implementation.
Although the microcontroller-based system has acceptable performance and cost,
an FPGA-based system promises lower cost and a more cohesive architecture that
saves processing time and speeds up system interaction.
This chapter is organized as follows. Section 25.2 presents the existing
microprocessor-based VTS and its proposed update. Section 25.3 presents different
designs and implementations with different levels of integration. In Section 25.4,
performance analysis and an evaluation of the results are presented. Section 25.5
concludes the chapter by summarizing the achievements described and suggesting
future research.
One recently implemented VTS is the Aram Locator [5, 7]. It consists of two main
parts, the Base Station (BS) and the Mobile Unit (MU). The BS consists of
PIC-microcontroller-based hardware connected to the serial port of a computer. The MU
The microcontrollers share the same memory using a specific protocol. The system
performs properly and is in high demand in the market. However, FPGAs promise
a better design, with a more cohesive architecture that saves processing time and
speeds up system interaction.
The two microcontrollers, along with the memory, would be incorporated into, or
better supported by, a high-density PLD. This transforms the rigid, slow interface
between them into faster and more reliable programmable interconnects, and
therefore makes future updates simpler. This design is estimated to save a
considerable percentage of the overall cost of one working unit; for large order
volumes, there would be a significant impact on production and profit. Hence, PLDs
such as FPGAs are the way to a better design in terms of upgradeability and speed, and it
is a promising advancement for the production cost and revenue. The block diagram
of the FPGA system is depicted in Fig. 25.3.
Hiding the detailed architecture of the underlying FPGA, the proposed system
consists of two communicating processes, P1 and P2, along with a shared memory.
In addition to the FPGA-based system, the GPS antenna and the mobile unit play
significant roles. The memory block of the microcontroller-based design is replaced
by a hardware entity controlled by the I2C bus.
This process has to deal with the message received from the GPS. The default
communication parameters for the NMEA output (the protocol used) are a 9,600 bps
baud rate, eight data bits, one stop bit, and no parity. The message includes
information sentences as shown in Table 25.1.
$GPGGA,161229.487,3723.2475,N,12158.3416,W,1,07,1.0,9.0,M,,,,0000*18
$GPGLL,. . . $GPGSA,. . . $GPGSV,. . . $GPGSV,. . .
$GPRMC,161229.487,A,3723.2475,N,12158.3416,W,0.13,309.62,120598,*10,
$GPVTG,. . . $GPMSS,. . . $GPZDA,. . .
From these GPS sentences, only the necessary information is selected (i.e., longitude,
latitude, speed, date, and time). The data needed is found within the RMC and GGA
sentences; the others are of minor importance to the FPGA. The needed information
is positioned as follows:

$GPRMC: <time>, <validity>, <latitude>, latitude hemisphere, <longitude>,
longitude hemisphere, <speed>, <course over ground>, <date>, magnetic variation,
checksum [7]

$GPGGA: <time>, <latitude>, latitude hemisphere, <longitude>, longitude
hemisphere, <GPS quality>, <# of satellites>, horizontal dilution, <altitude>,
geoidal height, DGPS data age, differential reference station identity (ID), and
checksum.

This information is stored in memory for every position traversed.
Finally, when the VTU reaches its base station (BS), a large number of positions is
downloaded, at a certain download speed, to indicate the route covered by the vehicle
during the time period.
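The field positions listed above can be illustrated with a small parsing sketch; it assumes well-formed $GPRMC/$GPGGA sentences and omits checksum verification, so it is indicative rather than a description of the actual FPGA logic.

```python
def parse_nmea(sentence):
    """Extract the fields the VTS stores (time, latitude, longitude, speed, date)
    from a $GPRMC or $GPGGA sentence. Checksum verification is omitted."""
    fields = sentence.strip().split(",")
    if fields[0] == "$GPRMC":
        return {"time": fields[1], "valid": fields[2] == "A",
                "lat": fields[3] + fields[4], "lon": fields[5] + fields[6],
                "speed": fields[7], "date": fields[9]}
    if fields[0] == "$GPGGA":
        return {"time": fields[1],
                "lat": fields[2] + fields[3], "lon": fields[4] + fields[5],
                "fix_quality": fields[6], "satellites": fields[7],
                "altitude": fields[9]}
    return None  # other sentences are of minor importance here

print(parse_nmea("$GPRMC,161229.487,A,3723.2475,N,12158.3416,W,0.13,309.62,120598,*10"))
```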
Initially, a flag C is cleared to indicate that no correct data has yet been received.
The first state is "Wait for GPS Parameters"; as shown in the flow chart, reception
continues until the ASCII codes of "RMC" or "GGA" appear consecutively in the
sequence. On correct reception of data, C is set (i.e., C = '1'), and the corresponding
parameters are selected and saved in memory. When data storing ends, there is a
wait state for the I2C interrupt to stop P1 and start P2; P2 downloads the saved data
to the base station (BS). Note that a large number of vehicles might be in the
coverage area, and all could ask to reserve the channel with the base station;
however, predefined priorities are distributed among the vehicles, which assures an
organized way of communicating. This is simply achieved by adjusting the time
after which a unit sends its ID once it receives the word "free".
The base station continuously sends the word "free", and all units within range wait
to receive it and acquire communication with the transceiver. If a unit receives the
word "free", it sends its ID number; otherwise it resumes waiting. It then waits for
an acknowledge; if no acknowledge is received, the unit sends its ID number again
and waits for feedback. If there is still no acknowledgement, the communication
process terminates and goes back to the first step. If an acknowledge is received,
process 2 sends an interrupt to process 1, which responds and stops writing to memory.
Process 2 is then able to download information to the base station. After the data is
transmitted, the unit sends the number of points transmitted, to be compared with
the number received by the base station. If they do not match, the unit repeats the
download of its information all over again. Otherwise, once the unit receives
"successful download", it terminates the process and turns off.
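A rough software sketch of the mobile-unit side of this handshake is given below; the channel object with send/receive calls, the message strings, and the priority delay are assumptions used only to make the state sequence concrete.

```python
import time

def download_to_base_station(channel, unit_id, points, priority_delay):
    """Mobile-unit side of the handshake: wait for 'free', send the unit ID after a
    priority-dependent delay, expect an acknowledge (one retry), then download the
    stored points and report their count so the base station can verify it."""
    if channel.receive() != "free":
        return False                      # keep waiting on the next cycle
    time.sleep(priority_delay)            # predefined per-vehicle priority
    channel.send(unit_id)
    if channel.receive() != "ack":
        channel.send(unit_id)             # retry once
        if channel.receive() != "ack":
            return False                  # terminate, go back to waiting
    while True:                           # process 2 downloads the saved data
        for p in points:
            channel.send(p)
        channel.send(len(points))         # BS compares with the count it received
        if channel.receive() == "successful download":
            return True                   # terminate and power down
```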
Initially, the circuit shown in Fig. 25.4 is off. After car ignition, current passes
through D1 and continues towards the transistor. This causes the relay to switch and
supplies an output voltage of 12 V. The circuit (C) is now powered and can start
functioning. Using the two regulators, it becomes feasible to provide an adequate
voltage to the FPGA, which in turn governs the switching scheme of the whole
system. In other words, the FPGA adapts itself so that it can put either zero or 5 V
on the side connected to D2. At 5 V, the circuit is fully on and the vehicle functions
normally. When the data download ends, the FPGA detects this, switches the whole
circuit into an idle state, and waits for another car ignition. The FPGA is thus the
controller of the behavior of the VTS system.
The suggested memory blocks are addressed by a 12-bit address bus and store 8-bit
data elements, which means the memory can store up to 4 KB of data. The memory
controller handles the memory addressing, and multiplexers are distributed along
with the controller to select the addressed memory location and perform the
corresponding operation (see Fig. 25.5).
The I2C bus is a serial, two-wire interface, popular in many systems because of its
low overhead. It serves as the interface between process 1, process 2, and the shared
memory, ensuring that only one process is active at a time, with good communication
reliability. It therefore writes the data read from the GPS during process 1, and reads
from memory to output the traversed positions to the base station. The Universal
Asynchronous Receiver Transmitter (UART) is the most widely used serial data
communication circuit; it allows full-duplex communication over serial links such
as RS-232. The UART is used to interface process 1 with the GPS module on one
side, and process 2 with the Base Station (BS) on the other.
Three different systems are to be tested for the FPGA implementation. The suggested
systems gradually add more parts to the designed FPGA implementation until a
complete stand-alone system is reached. The three suggested integrations are as
follows:
According to local market costs, around 9.6% could be saved per unit if the
FPGA-based all-in-one system is adopted. With the kind of memory implemented
in the system, the vehicle cannot store many locations, so the Vehicle Tracking
System (VTS) is to be used within a small city. If the sampling rate is set to one
reading every 2 min, one gets a general but not very specific overview of the tracks
traversed.
Vehicle tracking needs 344 bits of data to store the five important parameters
(longitude, latitude, speed, date, and time). Consequently, this requires 43 memory
locations; in other words, the system needs 43 locations for each reading, taken
every 2 min.
With 4,096 memory locations, the system can hold 95.25 readings, meaning that
the vehicle should come back to its base station every 3 h and 10 min to download
its information. This becomes 4 h and 45 min if the rate is one reading every 3 min.
This is not very satisfactory but is tolerable as long as the intended area has a small
range. However, this is one problem open to upgrading; one solution is to use an
FPGA with a larger memory.
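The figures above can be checked with a few lines of arithmetic (344 bits per reading, 4,096 byte-wide memory locations, one reading every 2 or 3 minutes):

```python
bits_per_reading = 344                        # longitude, latitude, speed, date, time
locations_per_reading = bits_per_reading / 8  # 43 byte-wide memory locations
memory_locations = 4096

readings = memory_locations / locations_per_reading   # about 95.25 readings
print(readings)

for minutes_per_reading in (2, 3):
    total_min = readings * minutes_per_reading
    print(divmod(int(total_min), 60))   # (3, 10) for 2 min, (4, 45) for 3 min
```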
Table 25.2 shows the results of simulations done on the integration of the parts
(modules) forming the FPGA-based system. Each integrated system is tested at the
component, integration, and system levels.
Table 25.2 Results summary taken from the syntheses of integrations on STRATIX EP1S10F484C5.
The execution time in some cases varies according to "RMC", "GGA", or "free"

  Integration   % Area in logic elem.   Prop. delay (ns)   Execution time (ns)   Max. op. freq. (MHz)
  First         18                      118.77             Varies                8.149
  Second        50                      116.00             Varies                8.62
  Third         50                      94.47              Varies                10.58
Table 25.3 The readings obtained from the integrations compiled on STRATIX EP1S10F484C5

  Integration   Size (bits)   Number of clock cycles   Prop. delay (ns)   Speed of processing (μs)
  First         1,448         181                      118.773            21.497
  Second        1,448         181                      116.003            20.996
  Third         1,448         181                      94.474             17.099
The first design used 1,910 logic elements out of 10,570 (STRATIX EP1S10F484C5,
175.47 MHz) and achieved a maximum operating frequency of 8.149 MHz, leaving
82% of the FPGA capacity free. After adding memory to the integration, the number
of logic elements increased to 5,303, with 50% usage of the capacity. The
propagation delay decreased slightly inside the FPGA; the decrease means that the
optimizer found a better way to reduce the critical path.
Similar results are obtained when the UART part is added (stand-alone FPGA), with
a further improvement in propagation delay. Although the number of logic elements
increased, this contributed to better interaction among the parts and raised the
operating frequency to 10.58 MHz. Integration of the parts has therefore reduced
the delay: the expected increase in the number of logic elements positively affects
the processing speed as propagation finds its paths among the combinations of gates.
Suppose that the GPS message "M" received by the UART comes in the following
sequence:
$GPGGA,161229.487,3723.2475,N,12158.3416,W,1,07,1.0,9.0,M,,,,0000*18
$GPGLL,3723.2475,N,12158.3416,W,161229.487,A*2C
$GPRMC,161229.487,A,3723.2475,N,12158.3416,W,0.13,309.62,120598,*10
"M" is tested on the three obtained integrations (Table 25.3), taking into account
that the GPS parameters needed for the application come as represented in
Section 25.3. The system takes selected parameters according to their position in
the sequence and checks whether they correspond to the desired information. Every
character is represented by its 8-bit ASCII code. Table 25.3 shows an exact
interpretation of the data as received from the GPS and processed via the several
modules of the integration. After the integrations were inspected, the proposed
system was synthesized on different FPGAs, and the results appear in Table 25.4 [7, 8].
Table 25.4 Syntheses of the VTS on different FPGAs. The execution time in some cases varies
according to "RMC", "GGA", or "free"

  FPGA                        Logic area in logic elements   Prop. delay (ns)   Exec. time (ns)   Max. operating frequency (MHz)
  STRATIX EP1S10F484C5        50%                            94.474             Varies            10.58
  STRATIX-II EP2S15F484C3     36%                            151.296            Varies            6.609
  MAX3000A EPM3032ALC44-4     Doesn't fit                    NA                 NA                NA
  Cyclone EP1C6F256C6         90%                            90.371             Varies            11.06
  FLEX6000 EPF6016TI144-3     Doesn't fit                    NA                 NA                NA
  APEXII EP2A15B724C7         30%                            181.418            Varies            5.512
From the readings in Table 25.4, the following can be concluded for the all-in-one
system:
The STRATIX EP1S10F484C5 (175.47 MHz) has a sufficient number of logic
elements, and the project takes one half of its total capacity. The propagation delay
is 94.474 ns; thus, the system runs at a frequency of 10.58 MHz.
The STRATIX-II EP2S15F484C3 (420 MHz) has a larger number of logic elements,
so the project occupies less capacity (36%) but incurs a longer propagation delay.
The project fills 90% of the Cyclone EP1C6F256C6 (405 MHz), but with the
minimum propagation delay of 90.371 ns and thus an 11.065 MHz operating frequency.
The APEXII EP2A15B724C7 (150 MHz) has the largest capacity among the listed
devices, allocating only 30% to the ARAM project, with the largest propagation
delay (181.418 ns) and the minimum frequency (5.512 MHz).
25.5 Conclusion
References
26.1 Introduction
order to detect those same instances in the future [1]. Security products such as
virus scanners are examples of this kind of application. While this method yields
excellent detection rates for existing and previously encountered malicious
executables, it lacks the capacity to efficiently detect new, unseen instances or
variants. Because detecting malicious code accurately is an NP problem [2, 3],
heuristic scanners attempt to compensate for this lacuna by using more general
features of viral code, such as structural or behavioral patterns [4]. Although proved
to be highly effective in detecting unknown malicious executables, this process still
requires human intervention.
Recently, attempts to use machine learning and data mining to identify new or
unknown malicious executables have emerged. Schultz et al. examined how data
mining methods can be applied to malicious executable detection [5] and built a
binary filter that can be integrated with email servers. Kolter et al. used data mining
methods such as Naïve Bayes, J48, and SVM to detect malicious executables [6],
and their results improved the performance of these methods. The Bayes algorithm
and its improved variants are capable of detecting unknown malicious code, but they
spend a long time learning. A new improved algorithm, the half-increment Bayes
algorithm, is proposed in this paper.
In this paper, we are interested in applying data mining methods to malicious
executable detection, and in particular to the problem of feature selection. Two main
contributions are made in this paper: we show how to choose the features that are
the most representative properties, and we propose a new improved algorithm and
show that our method achieves high learning speed and high detection rates, even
on completely new, previously unseen malicious executables.
The rest of this paper is organized as follows. Section 26.2 is a brief discussion of
related work. Section 26.3 gives a brief description of the Bayesian algorithm.
Section 26.4 presents the details of our methods for obtaining high learning speed.
Section 26.5 shows the experimental results. Lastly, we state our conclusions in
Section 26.6.
At IBM, Kephart et al. [7] proposed the use of neural networks to detect malicious
boot-sector binaries. Using a neural network classifier with all bytes of the
boot-sector malicious code as input, they showed that 80–85% of unknown
boot-sector malicious programs can be successfully identified with a low false
positive rate (<1%). This approach to detecting boot-sector viruses was incorporated
into IBM's Anti-Virus software. Later, Arnold et al. [8, 9] applied the same
techniques to Win32 binaries. Motivated by the success of data mining techniques
in network intrusion systems [10, 11], Schultz et al. [5] proposed several data mining
techniques, such as RIPPER, Naïve Bayes, and Multi-Naïve Bayes, to detect different
types of malicious programs. The authors collected 4,301 programs for the Windows
operating system and used McAfee VirusScan to label each as either
malicious or benign. The authors concluded that the voting Naïve Bayes classifier
outperformed all other methods. In a companion paper [12], the authors developed
a Unix mail filter that detects malicious Windows executables based on the above
work. Kolter et al. [6] also used data mining methods, such as Naïve Bayes, J48,
and SVM, to detect malicious code. The authors gathered 1,971 benign and 1,651
malicious programs and encoded each as a training example using n-grams of byte
codes as features; boosted decision trees outperformed the other methods, with an
area under the ROC curve of 0.996 (when the detectors were applied to 291 newly
discovered malicious executables, boosted decision trees achieved a TP rate of 0.98
at a desired FP rate of 0.05). Zhang et al. [13] applied SVM and a BP neural network
to virus detection, using the D-S theory of evidence to combine the contributions of
the individual classifiers into the final decision; they showed that the combination
approach improves the performance of the individual classifiers significantly. Zhang
et al. [14, 15] were the first to establish methods based on fuzzy pattern recognition
and K-nearest neighbors for detecting malicious executables.
The goal of our work was to improve a standard data mining technique so as to
compute accurate detectors for new binaries. We gathered a large set of programs
from public sources and separated the problem into two classes: malicious and
benign executables. We split the dataset into two subsets, the training set and the
test set. The data mining algorithms used the training set while generating the rule
sets, and we used the test set to check the accuracy of the classifiers on unseen
examples.
In a data mining framework, features are properties extracted from each example in
the data set, such as strings or byte sequences. A data mining classifier trained with
such features can be used to distinguish between benign and malicious programs.
We used the strings extracted from the malicious and benign executables in the data
set as features.
We propose an exhaustive search for strings. Typically there are many "disordered
words" when files are read as ASCII code, like "autoupdate.exe ?soft-
ware*abbbbbb download", etc. The string selection program extracts consecutive
printable characters from the files. To avoid yielding a large number of unwanted
features, the characters are filtered before they are recorded. Our feature selection
thus involves an extraction step followed by an elimination step.
In the string extraction step, we scan the files and read them character by character,
record runs of consecutive printable characters (English letters, digits, symbols, etc.),
and construct lists of all the strings. The length of a string is specified as a number
of characters. The shorter the length, the more likely the feature is to have general
relevance in the dataset, but a short length also yields a larger number of features.
Strings extracted from an executable are not very robust as features because some
strings are meaningless, like "abbbbbb", so we select a subset of strings as the
feature set in an elimination step.
Many have noted the need for a feature filter in order to make use of conventional
learning methods [13, 14], to improve generalization performance, and to avoid
overfitting. Following those recommendations, a glossary filter criterion is used in
this paper to select a subset of the strings; a sketch of this filtering step is given
after the list below.
The glossary is a computer glossary that includes 7,336 items:
- Computer words, such as function names and API names
- Abbreviations used on computer networks, like "QQ", "msn", etc.
- Postfixes, like ".dll", denoting dynamic link libraries
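A minimal sketch of this extraction-plus-elimination step is shown below; the glossary contents and the file path are placeholders, and exact string matching against the glossary is a simplification of the filtering actually described.

```python
import re

def extract_features(path, glossary, min_len=3):
    """Extract consecutive printable-character strings from an executable and keep
    only those that appear in the computer glossary (the elimination step).
    Exact matching is a simplification; the real glossary holds 7,336 items."""
    with open(path, "rb") as f:
        data = f.read()
    strings = {s.decode("ascii").lower()
               for s in re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)}
    return strings & glossary          # surviving strings form the feature set

# Hypothetical usage with a toy glossary:
glossary = {"autoupdate.exe", "download", "createfile", ".dll", "msn", "qq"}
# features = extract_features("samples/specimen.exe", glossary)
```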
The naive Bayes classifier computes the likelihood that a program is malicious given
the features it contains. We treat each executable's features as a text document and
classify it on that basis. Specifically, we want to compute the class of a program
given that the program contains a set of features F. We define C to be a random
variable over the set of classes: benign and malicious executables. That is, we want
to compute P(C | F), the probability that a program is in a certain class given that
it contains the set of features F. We apply Bayes' rule and express the probability as:
express the probability as:
P . F jC / P .C /
P . C jF / D (26.1)
P .F /
To use the naïve Bayes rule we assume that the features occur independently of one
another. If the features F of a program include the features F_1, F_2, F_3, ..., F_n,
then Eq. (26.1) becomes:
$$P(C \mid F) = \frac{\prod_{i=1}^{n} P(F_i \mid C)\,P(C)}{\prod_{j=1}^{n} P(F_j)} \qquad (26.2)$$
In Eq. (26.3), we use max_C to denote the function that returns the class with the
highest probability. The Most Likely Class is the class in C with the highest
probability and hence the most likely classification of the example with features F.
To train the classifier, we record how many programs in each class contain each
unique feature. We use this information to classify a new program into the
appropriate class: we first use feature extraction to determine the features contained
in the program, and then apply Eq. (26.3) to compute the most likely class for the program.
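A minimal sketch of such a naive Bayes classifier over binary string features is given below; Laplace smoothing and the log-space computation are implementation details added here to keep the example numerically sound, not steps taken from the chapter.

```python
import math
from collections import defaultdict

class NaiveBayes:
    def __init__(self):
        self.class_count = defaultdict(int)                          # programs per class
        self.feature_count = defaultdict(lambda: defaultdict(int))   # class -> feature -> count

    def train(self, samples):
        """samples: iterable of (feature_set, label), label in {'malicious', 'benign'}."""
        for features, label in samples:
            self.class_count[label] += 1
            for f in features:
                self.feature_count[label][f] += 1

    def classify(self, features):
        """Return the most likely class, computed in log space.
        Add-one (Laplace) smoothing avoids zero class-conditional probabilities."""
        total = sum(self.class_count.values())
        best, best_score = None, float("-inf")
        for c, n_c in self.class_count.items():
            score = math.log(n_c / total)                            # log P(C)
            for f in features:
                score += math.log((self.feature_count[c][f] + 1) / (n_c + 2))
            if score > best_score:
                best, best_score = c, score
        return best
```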
The Naïve Bayes algorithm requires a table of all features to compute its
probabilities. This method requires a machine with one gigabyte of RAM, because
the size of the binary data is otherwise too large to fit into memory.
To update the classifier when new programs are added to the training set, we first
update the feature set and then apply Eq. (26.3) again to compute the most likely
class for a program. Updating the classifier with the NB algorithm is therefore
time-consuming.
To correct this problem of the NB algorithm, the training set is divided into smaller
pieces that fit in memory, and for each piece a Naïve Bayes classifier is trained. Each
classifier gives a probability of a class C given a set of strings F, which the
Multi-Naïve Bayes uses to generate a probability for class C given F over all the
classifiers.
The probabilities in the rules of the different classifiers may differ, because the
underlying data each classifier is trained on is different. The prediction of the
Multi-Naïve Bayes algorithm is the product of the predictions of the underlying
Naïve Bayes classifiers:
$$P(C \mid F) = \prod_{k=1}^{n} P_k(C \mid F) \qquad (26.4)$$
When new programs are added to the training set, these new programs form a new
subset and a Naïve Bayes classifier is trained over that subproblem; based on
Eq. (26.4), we then update the probability. The Multi-Naïve Bayes algorithm does
not need to recompute its probabilities over the whole training set, but the accuracy
of the classifier worsens.
The NB and MNB algorithms above first obtain the feature set over the whole
training set or a subset of it. Once the final feature set is obtained, we represent our
data in the feature space by using, for each feature, "1" or "0" to indicate whether
or not the feature is present in a given executable file. The probability of a string
occurring in a class is the total number of times it occurred in that class's training
set divided by the total number of times the string occurred over the entire training set.
We can derive a method for incremental updating purely from the NB algorithm. In
our method, the feature set grows as the classifier learns. That is, suppose k1 string
features are extracted from the first sample, so that k1 strings are elements of the
set F. If S2 strings are extracted from the second sample and k2 of their elements
are not found in F, these elements are added to the feature set F, which will then
include k1 + k2 elements. The classifier is trained on this evolving feature set.
Claim 1 We can obtain the class-conditional probability P(F^(n+1) | C) over n + 1
samples from that of the former n samples and the (n + 1)th sample, that is,

$$P\left(F_i^{(n+1)} \mid C_j\right) = \frac{P\left(F_i^{(n)} \mid C_j\right) n + P\left(x^{(n+1)} \mid C_j\right)}{n+1},$$

where P(F^(n+1) | C) is the class-conditional probability over n + 1 samples,
P(F^(n) | C) is the class-conditional probability over n samples, and P(x^(n+1) | C)
is the class-conditional probability over the (n + 1)th sample.
Proof. Suppose there are a features obtained from n samples, that is,
F^(n) = {F_1, F_2, ..., F_a}; then

$$P\left(F^{(n)} \mid C\right) = \prod_{i=1}^{a} P\left(F_i \mid C_j\right), \quad j = 1, 2, \qquad (26.5)$$

$$P\left(F_i \mid C_j\right) = \frac{P\left(F_i\, C_j\right)}{P\left(C_j\right)} = \frac{\operatorname{count}(F = F_i \wedge C = C_j)}{\operatorname{count}(C = C_j)}, \qquad (26.6)$$

$$P\left(F_i^{(n)} \mid C_j\right) = \frac{\operatorname{count}\left((F = F_i^{(n)}) \wedge (C = C_j)\right)}{n}. \qquad (26.7)$$
If the (n + 1)th sample for class C = C_j is added, there are two cases.

Case 1. The strings in the (n + 1)th sample are all found in F; then

$$P\left(F_i^{(n+1)} \mid C_j\right) = \frac{\operatorname{count}\left((F = F_i^{(n)}) \wedge (C = C_j)\right)}{n+1}. \qquad (26.8)$$

Case 2. There are b strings in the (n + 1)th sample that are not found in F; these
strings are added to F, so that F^(n+1) = {F_1, F_2, ..., F_a, F_{a+1}, ..., F_{a+b}}.
For the new b probabilities, since P(F_i^(n) | C_j) = 0 for a < i ≤ a + b,

$$P\left(F_i^{(n+1)} \mid C_j\right) = \frac{1}{n+1}, \quad a < i \le a + b. \qquad (26.9)$$

We can rewrite Eqs. (26.8) and (26.9) as Eq. (26.10):

$$P\left(F_i^{(n+1)} \mid C_j\right) = \frac{P\left(F_i^{(n)} \mid C_j\right) n + P\left(x^{(n+1)} \mid C_j\right)}{n+1}. \qquad (26.10)$$

Therefore, the information of n + 1 samples can be obtained from that of the former
n samples and the (n + 1)th sample.
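A minimal sketch of this half-increment update is given below; it keeps per-class sample counts and applies Eq. (26.10) directly, with the class and feature bookkeeping chosen only for illustration.

```python
from collections import defaultdict

class HalfIncrementBayes:
    """Sketch of the half-increment update: the feature set F grows as samples
    arrive, and P(F_i|C_j) over n+1 samples is obtained from the value over the
    first n samples plus the (n+1)th sample alone (Eq. 26.10)."""
    def __init__(self):
        self.features = set()          # evolving feature set F
        self.n = defaultdict(int)      # samples seen so far per class C_j
        self.prob = defaultdict(dict)  # class -> feature -> P(F_i | C_j)

    def add_sample(self, strings, label):
        n = self.n[label]
        self.features |= strings                     # unseen strings join F
        for f in self.features:
            p_old = self.prob[label].get(f, 0.0)     # P(F_i^(n) | C_j), 0 if new
            x = 1.0 if f in strings else 0.0         # contribution of the new sample
            self.prob[label][f] = (p_old * n + x) / (n + 1)   # Eq. (26.10)
        self.n[label] = n + 1
```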
Claim 2. For NB and HIB, based on the same training set, the same feature sets are obtained by the same string extraction and elimination methods, that is F_N = F_H, where F_H is the feature set obtained by the half-increment algorithm and F_N is the feature set obtained by the Naïve Bayes algorithm.
C_1 = \arg\max_C P(C) \prod_{i=1}^{a+b} P(F_i \mid C)   (26.12)

C_2 = \arg\max_C P(C) \prod_{i=1}^{a+b} P(F_i \mid C)   (26.13)

Therefore, C_1 = C_2.
26.4.4 Complexity
Based on the algorithms above, the time consumption of NB and HIB is made up of two parts:
1. Extracting unrepeated strings from the samples
2. Fixing the feature set and training set, and building the classifier
For step (1) the time consumption of the two algorithms is the same, but for step (2) they differ.
For HIB, the feature set grows as the classifier learns. That is, suppose there are S_1 strings in the first sample, of which k_1 are unrepeated, so k_1 strings become elements of the set F. If there are S_2 strings in the second sample, the time needed to check whether the k_1 elements are found in S_2 is T_{H2} = O(S_2 k_1). If k_2 elements are not found in F, these elements are added to the feature set F, which will then contain k_1 + k_2 elements. For n samples, the total time needed to fix the set F is

T_H = O(S_2 k_1 + S_3 (k_1 + k_2) + \cdots + S_n (k_1 + k_2 + \cdots + k_r))   (26.14)
For NB, the set F is obtained before the classifier is trained. Suppose there are K elements in F; the time needed to indicate whether or not each feature is present in a given executable file is

T_N = O(K (S_2 + S_3 + \cdots + S_n))   (26.15)
Based on Claim 2, F_N = F_H and K = k_1 + k_2 + \cdots + k_r. Based on Eq. (26.14),

T_H = O(K (S_2 + S_3 + \cdots + S_n) - S_2 (k_2 + \cdots + k_r) - \cdots - S_{n-1} k_r)   (26.16)

If k_2 = k_3 = \cdots = k_r = 0, i.e. K = k_1, then T_H = T_N and the time consumption of the two algorithms is the same. But malicious executables include viruses, Trojan horses, worms, back doors, spyware, Java attack applets, dangerous ActiveX and attack scripts, so a single sample cannot include all feature elements.
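As a rough illustration of why T_H is usually smaller than T_N, the short Python calculation below (with assumed string and feature counts) simply counts the comparisons described above: NB checks every sample against the full feature set of K strings, while HIB checks each sample only against the features accumulated so far.

# Assumed counts, for illustration only
S = [0, 500, 480, 520, 510, 490]   # S[i]: strings extracted from sample i
k = [0, 200,  60,  30,  10,   5]   # k[i]: new (previously unseen) feature strings in sample i
n = len(S) - 1
K = sum(k)                         # final feature-set size (Claim 2)

T_N = K * sum(S[2:])                                       # NB: full set for every sample
T_H = sum(S[i] * sum(k[1:i]) for i in range(2, n + 1))     # HIB: only features seen so far
print(K, T_N, T_H)                 # 305 610000 526100 -> HIB needs fewer comparisons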
Our experiments are carried out on a dataset of 2,995 examples consisting of 995 previously labeled malicious executables and 2,000 benign executables, collected from desktop computers running various versions of the Windows operating system. The malicious executables were obtained from an anti-virus software company.
For each run, we extract strings from the executables in the training and testing sets, select the most relevant features from the training data, apply the elimination method, and use the resulting classifier to rate the examples in the test set.
26.5.2.1 Time-Consuming
Fig. 26.1 Curve between the number of feature elements and the number of samples
Fig. 26.2 Time consumption (in hours) of NB, MNB and HIB versus the number of samples

Fig. 26.3 True positive rate versus false positive rate for NB, MNB and HIB
Table 26.1 Experimental results using our method and traditional methods
Method Feature Classifier Accuracy (%)
We compare our method with a model used in previous research [7, 15]. The
results, displayed in Table 26.1, indicate that a virus classifier can be made more
accurate by using features representative of general viral properties, as generated by
our feature search method. With up to 97.9% overall accuracy, our system outper-
forms NB and MNB algorithms and achieves better results than some of the leading
research in the field, which performs at 97.11%.
26.6 Conclusion
The naïve Bayes classifier is widely used in many classification tasks because its performance is competitive with state-of-the-art classifiers, it is simple to implement, and it executes quickly. In this paper, we discussed the problem of how to classify a set of query vectors from the same unknown class with the naïve Bayes classifier. We then proposed the HIB algorithm and compared it with naïve Bayes and multi-naïve Bayes. The experimental results show that the HIB algorithm can take advantage of prior information and works well on this task. Finally, the HIB algorithm was compared with a model used in previous research [7, 15]. The experimental results reveal that HIB can reach an accuracy as high as 97.9%, its execution speed is much faster than that of MNB and NB, and it has a low implementation cost. Hence, we suggest that HIB is useful in the domain of unknown malicious code recognition and may be applied to other applications.
References
1. G. McGraw, G. Morrisett, Attacking malicious code: a report to the infosec research council,
IEEE Transactions on Software, vol. 2, Aug. 1987, pp. 740–741.
2. F. Cohen, Computer Viruses Theory and Experiments, Computers & Security, vol. 6, Jan.
1987, pp. 22–35.
3. D. Spinellis, Reliable Identification of Bounded-Length Viruses Is NP Complete, IEEE
Transactions on Information Theory, vol. 49, Jan. 2003, pp. 280–284.
4. MacAfee. Homepage-MacAfee.com. Online Publication, 2000. http://www.macafee.com
5. M.G. Schultz, E. Eskin, E. Zadok, S.J. Stolfo, Data mining methods for detection of new malicious executables, In: Proceedings of the IEEE Symposium on Security and Privacy, 2001.
6. J.Z. Kolter, M.A. Maloof, Learning to detect malicious executables in the wild, In: Proceed-
ings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, ACM Press, New York, 2004, pp. 470–478.
7. J.O. Kephart, G.B. Sorkin, W.C. Arnold, D.M. Chess, G.J. Tesauro, S.R. White, Biologically
inspired defenses against computer viruses. In: Proceedings of IJCAI’95, Montreal, 1995.
pp. 985–996.
8. W. Arnold, G. Tesauro, Automatically generated Win32 heuristic virus detection, In: Proceed-
ings of the 2000 International Virus Bulletin Conference, 2000.
9. G. Tesauro, J.O. Kephart and G.B. Sorkin, Neural networks for computer virus recognition,
IEEE Expert, vol. 11, Apr. 1996, pp. 5–6.
10. W. Lee, S. Stolfo, K. Mok, A Data Mining Framework for Building Intrusion Detection
Models, IEEE Symposium on Security and Privacy, 1999.
11. W. Lee, S.J. Stolfo, P.K. Chan, Learning patterns from UNIX processes execution traces
for intrusion detection, AAAI Workshop on AI Approaches to Fraud Detection and Risk
Management, AAAI Press, 1997, pp. 50–56.
12. M.G. Schultz, E. Eskin, E. Zadok, M. Bhattacharyya, S.J. Stolfo, MEF: Malicious Email filter,
A Unix mail filter that detects malicious windows executables, In: Proceeding of USENIX
Annual Technical Conference, 2001.
13. B. Zhang, J. Yin, J. Hao, Unknown computer virus detection based on multi-naive Bayes
algorithm. Computer Engineering, vol. 32, Oct. 2006, pp. 18–21.
14. B. Zhang, J. Yin, J. Hao, Using fuzzy pattern recognition to detect unknown malicious
executables code, Lecture Notes in Computer Science, vol. 36, Mar. 2005, pp. 629–634.
15. B. Zhang, J. Yin, D. Zhang, J. Hao, Unknown computer virus detection based on K-nearest
neighbor algorithm, Computer Engineering and Applications, vol. 6, 2005, pp. 7–10.
Chapter 27
Understanding Programming Language
Semantics for the Real World
Trong Wu
27.1 Introduction
T. Wu
Department of Computer Science at Southern Illinois University Edwardsville, IL 62026, USA
e-mail: twu@siue.edu
to group data objects into types. Types are distinct from one another unless a type conversion function is applied that converts values from one type to another. Data objects in a computer system are analogous to merchandise in a department store or to devices, parts, and equipment in an engineering project. The rules used to manage merchandise, parts, devices, and equipment are similar to the respective operations on types in a computer system. To create a new department in a department store is to expand its business, much as creating a new type in a computer system enlarges its application domain [4] to multidisciplinary areas and to the control of more complex physical systems. These new types are called user-defined types.
Programming languages like FORTRAN IV have only basic types and do not allow users to define their own types; therefore, their application domain is quite limited. However, the Ada and C++ programming languages do provide user-defined types, and their application domains are not so limited [4]. Today, these two languages are considered general-purpose programming languages. Hence, user-defined types are vital for this complex and chaotic world of computing.
a value by finding the location associated with the name and extracting the value from the location. Let names, locations, and values be three sets; we define two mappings, σ from names to locations and ρ from locations to values.
Differences in the parameter-passing mechanisms are defined by when σ and ρ are applied.
1. Parameter passing by value
σ and ρ are both applied at the point of call; the argument is completely evaluated at the point of call.
2. Parameter passing by reference (location)
The location is determined at the point of call and is bound as the value of the parameter; σ is applied at the point of call and ρ is applied with every reference to the parameter.
3. Parameter passing by name
At the time of call, neither σ nor ρ is applied; σ and ρ are applied with every reference to the parameter.
These three parameter-passing mechanisms form a hierarchical structure; the diagram is given in Fig. 27.1. For parameter passing by value, both σ and ρ are applied at calling time; for parameter passing by reference, σ is applied at calling time and ρ is applied at reference time; for parameter passing by name, both σ and ρ are applied with every reference to the parameter.
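As a small illustration (a Python sketch of the semantics, not code from this chapter; the names and values are made up), the three mechanisms can be simulated with an explicit name-to-location mapping (σ) and a location-to-value store (ρ):

store = {}            # location -> value  (the rho mapping)
next_loc = [0]

def alloc(value):
    loc = next_loc[0]
    next_loc[0] += 1
    store[loc] = value
    return loc

# Caller's environment (the sigma mapping): x lives at a location holding 1
env = {"x": alloc(1)}

# --- by value: sigma and rho applied at the point of call ---------------
def call_by_value(arg_name, caller_env):
    arg_value = store[caller_env[arg_name]]   # evaluate argument now
    formal_loc = alloc(arg_value)             # copy into a fresh location
    store[formal_loc] += 10                   # callee changes its copy only
    return store[formal_loc]

# --- by reference: sigma applied at call, rho at every reference --------
def call_by_reference(arg_name, caller_env):
    formal_loc = caller_env[arg_name]         # share the caller's location
    store[formal_loc] += 10                   # visible to the caller (alias)
    return store[formal_loc]

# --- by name: neither applied at call; both at every reference ----------
def call_by_name(arg_thunk):
    return arg_thunk() + arg_thunk()          # argument re-evaluated each use

print(call_by_value("x", env), store[env["x"]])      # 11 1  (caller unchanged)
print(call_by_reference("x", env), store[env["x"]])  # 11 11 (caller updated)
print(call_by_name(lambda: store[env["x"]]))         # 22    (re-read each time)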
The ALGOL, C, C++, and Pascal languages provide the parameter passing by value mechanism, which is a convenient and effective method for enforcing write protection. These languages also implement parameter passing by reference, which eliminates duplication of memory. But there are disadvantages to parameter passing by reference. First, it will likely be slower, because one additional level of memory addressing is needed compared to parameter passing by value. Second, if only one-way communication to the called subprogram is required, unexpected and erroneous changes may occur in the actual parameter.
Fig. 27.1 Hierarchical structure of parameter passing by value, reference, and name
Finally, parameter passing by reference can create aliases, which are harmful to
readability and reliability. They also make program verification difficult.
The ALGOL programming language by default provides parameter passing by
name. When implementing parameter passing by name, the system will create
a run-time subprogram to evaluate the expression in the calling unit of the pro-
gram and return the result to the called unit of the program. Therefore, it requires
some additional overhead to implement such a run-time subprogram. In the ALGOL
programming language, if one wants parameter passing by value, he or she must specify the word "value" for that parameter. ALGOL treats parameter passing by reference as a special case of parameter passing by name; therefore, the programmer does not need to specify anything. We can define
a hierarchical structure for parameter passing by value, reference, and name in the
ALGOL language.
In the programming language PL/1, an actual parameter that is a single variable is passed by reference. However, for a constant or an expression as an argument in the calling statement, PL/1 creates a dummy formal parameter in the called subprogram, which in effect implements parameter passing by value.
In most FORTRAN implementations before FORTRAN 77, parameters were
passed by reference. In later implementations parameter passing by value-result
has been used commonly.
For the Ada language, parameter passing has three modes: mode in, mode out,
and mode in out. These are different from parameter passing by value and by
reference [10–13].
Mode in
This is the default mode (i.e., in may be omitted).
The actual parameter is copied into a local variable. The actual parameter must
have a defined value at the point of call.
The actual parameter may be an expression of compatible type.
Mode out
The result is copied into the actual parameter upon exit.
The actual parameter may be a variable of compatible type.
The actual parameter need not have a value upon entry.
Mode in out
The value of the actual parameter is copied into a local variable upon entry.
The value of the local parameter is copied into the actual parameter upon exit.
The actual parameter must be a variable with a defined value upon entry.
These parameter-passing mechanisms serve as communication facilities between a main program and its subprograms or between one subprogram and another. The purpose of engineering computing is to solve problems in the complicated world by means of granulation [14, 15], organization, and causation. Granulation subdivides the problem into a set of more manageable sub-problems; it is an effective tool for modularizing the original problem and writing it as subprograms. From a programmer's viewpoint, the communication between the main program and subprograms and the communication from one subprogram to another is the causation within the program. This paper emphasizes using features of programming languages for engineering computation. It is worth noting that the Ada programming language is designed for embedded systems, safety-critical software, and large projects that require high portability, reliability, and maintainability. For example, over 99% of the aviation software in the Boeing 777 airplane uses the Ada language [16]. It is not surprising that the Ada language was the first object-oriented programming language to be accepted as an international standard.
Today, we design software systems to meet complex application requirements.
Almost all the activities in human society, the biological world, physical systems,
and engineering projects are concurrent or parallel; and purely sequential activities
are special cases. Therefore, concurrency reflects the nature of designing software
projects. In the next section, we will address the multitasking features in the Ada
programming language [17, 18].
Among all commonly used programming languages, the Ada language has the most complete and best features for multitasking. Multitasking permits a programmer to partition a big job into many parallel tasks [1, 13]. Other programming languages, such as C and C++, can only call some predefined functions for this purpose; thus they lack flexibility, which limits their applicability.
A task is a unit of computation that can be scheduled independently and in par-
allel with other such units. An ordinary Ada program can be thought of as a single
task; in fact, it would be called the main task. Other tasks must be declared in the
main task (as subtasks) or be defined in a package [7, 10, 13, 19]. Several indepen-
dent tasks are often to be executed simultaneously in an application. Ada tasks can be
executed in true parallelism or with apparent concurrency simulated by interleaved
execution. Ada tasks can be assigned relative priorities and the underlying operat-
ing system can schedule them accordingly. A task is terminated when its execution
ends. A task can be declared in packages, subprograms, blocks, or other tasks. All
tasks or sub-tasks must terminate before the declaring subprogram, block, or task
can be terminated.
A task may want to communicate with other tasks. Because the execution speed
of tasks cannot be guaranteed, a method for synchronization is needed. To do
this, the Ada language requires the user to declare entry and accept statements in
two respective tasks engaged in communication. This mechanism provides for task
interaction and is called a rendezvous in the Ada language [19].
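A rough Python analogy of the rendezvous (not Ada, and not code from this chapter) is a calling task that blocks on an "entry" until the accepting task reaches its "accept" and replies:

import queue, threading

entry = queue.Queue()          # entry queue of the accepting task

def server():                  # accepting task
    item, reply = entry.get()  # "accept": wait for an entry call
    reply.put(item * 2)        # body of the accept statement

def client():                  # calling task
    reply = queue.Queue(maxsize=1)
    entry.put((21, reply))     # "entry call"
    print(reply.get())         # blocked until the rendezvous completes -> 42

t1 = threading.Thread(target=server); t2 = threading.Thread(target=client)
t1.start(); t2.start(); t1.join(); t2.join()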
The Ada language also provides an optional scheduling attribute called a priority that is associated with a given task. A priority expresses the relative urgency of the task's execution and is given as an integer in a defined range. A numerically smaller value indicates a lower level of urgency. The priority of a task, if defined, must be static. If two tasks have no priorities, or have equal priority, they will be scheduled in an arbitrary order. If two tasks of different priorities are both eligible for execution and could sensibly be executed on the same processor, the lower-priority task cannot execute while the higher-priority task waits. The Ada language forbids time-sliced execution scheduling for tasks with explicitly specified priorities. If two tasks of prescribed priorities are engaged in a rendezvous, the rendezvous is executed with the higher of the two priorities. If only one task has a defined priority, the rendezvous is executed at least at that priority.
A task may delay its own execution, putting itself to sleep so that it does not use processing resources while waiting for an event to occur; the delay statement is employed for this purpose. Zero or negative values have no effect. The smallest delay time is 20 ms (0.020 s), and the maximum delay duration is 86,400 s, or 24 h. The duration only specifies a minimum delay; the task may be executed any time thereafter, if the processor is available at that time.
The Ada language also provides a select statement. There are three forms of select statements: the selective wait, the conditional entry call, and the timed entry call. A selective wait may include (1) a terminate alternative, (2) one or more delay alternatives, or (3) an else part, but only one of these possibilities is legal. A task may designate a family (an array) of entries by a single name. They can be declared as:
entry Request(0 .. 10)(Reqcode : Integer);
entry Alarm(Level);   -- where type Level must be discrete
An accept statement may name an indexed entry:
accept Request(0)(Reqcode : Integer) do ...
accept Request(1)(Reqcode : Integer) do ...
accept Alarm(Level);
An entry family allows an accepting task to select entry calls to the same function deterministically.
Among all commonly used programming languages, the Ada language is the unique one that provides multitasking features at the programming level, and this is a very important and useful feature for modeling and simulating real-time and concurrent events in programming.
The goals of computing are reliability, efficiency, accuracy, and ease of use.
From the programming point of view, to provide reliable computation is to prevent
or eliminate overflow, underflow, and other unexpected conditions so that a program
can be executed safely, completely, and efficiently. Efficient computation requires
an effective computational algorithm for the given problem using proper program-
ming language features for that computation. For accurate computation, one should
consider problem solving capability, accuracy features, and parallel computation
abilities in a given programming language. For ease of use, the software engineer
should put himself in the situation of the user. A software engineer should remember
that users have a job to be done, and they want the computer system to do the job
with a minimum of effort. In the next section, we will discuss issues of accuracy
and efficiency of the Ada language in numerical computation capability [8].
The area of numerical computation is the backbone of computer science and all
engineering disciplines. Numerical computation is critical to real world engineering
applications. For example, on November 10, 1999, the U.S. National Aeronautics
and Space Administration (NASA) reported that the Mars Climate Orbiter Team
found:
The ‘root cause’ of the loss of the spacecraft was the failed translation of English units into
metric units in a segment of ground-based, navigation-related mission software as NASA
previously announced [20].
This example indicates that numerical computation and software design are crucial
tasks in an engineering project. The goal of numerical computation is to reach to
a sufficient level of accuracy for a particular application. Designing software for
efficient computation is another challenge.
For engineering applications, we need to deal with many numerical computations. Among the most commonly used programming languages, the Ada language has the best numerical computation capability. From the precision aspect, the Ada language allows a user to define his own accuracy requirement. This section addresses the Ada language's numerical computation capability [8]. To do so, we should consider the following four criteria: problem solving capability, accuracy of computation, execution time for solving problems, and the capability of parallelism.
1. Problem solving capability: The Ada language provides user-defined types and
separate compilation. The former supports programmers solving a wide range
of engineering problems and the latter permits development of large software
systems. The Ada language provides data abstraction and exception handling
that support information hiding and encapsulation for writing a reliable program.
In the real world, many engineering projects consist of concurrent or parallel
activities in their physical entities. Ada multitasking meets this requirement [21,
22]. In fact, multiprocessor computer systems are now available, thus simulating
a truly parallel system becomes possible.
2. Precision and accuracy: The Ada language's real number types are subdivided into floating-point types and fixed-point types. Floating-point types have values of the form ±.dd...d × 10^{±dd}; fixed-point types have values with formats such as ±dd.ddd, ±dddd.0 or ±0.00ddd [1, 11, 13].
For the floating-point number types, the model numbers other than zero, i.e. the numbers that can be represented exactly by a given computer, are of the form

sign · mantissa · radix^{exponent}

In this form, sign is either +1 or -1, mantissa is expressed in a number base given by radix, and exponent is an integer. The Ada language allows the user to specify the number of significant decimal digits needed in a floating-point type declaration, with or without an optional range constraint.
In addition, most Ada compilers provide the types long_float and long_long_float (used in package Standard) and f_float, d_float, g_float, and h_float (used in package System) [22]. The size and the precision of each of the Ada floating-point types are given as follows:
The goal of computation is accuracy. Higher accuracy provides more reliability in a real-time environment. Sometimes a single precision or a double precision floating-point number in FORTRAN 77 [21] is not enough for solving some critical problems. In the Ada language one may use the floating-point number type long_long_float (h_float), by declaring digits 33, to use 128 bits for floating-point numbers as provided by Vax Ada [23]; this gives a precision of 33 decimal digits, and the range of the exponent is about 10^{-134} to 10^{+134}, or -448 to +448 in base 2 [1, 11, 13, 24]. The author has employed this special accuracy feature in the computation of the hypergeometric distribution function [25, 26].
For the fixed-point types, the model numbers are of the form

sign · mantissa · small

The sign is either +1 or -1, mantissa is a positive integer, and small is a certain positive real number. Model numbers are defined by a fixed-point constraint; the number small is chosen as the largest power of two that is not greater than the delta of a fixed accuracy definition. The Ada language permits the user to determine a possible range and an error bound, called delta, for computational needs. For example, with a declared delta of, say, 0.01 (any delta between 2^{-7} and 2^{-6}) and a range of -100,000.0 to +100,000.0, small is 0.0078125, which is 2^{-7}, and the model numbers run from -12,800,000 · small to +12,800,000 · small. The predetermined range provides a reliable programming environment, and the user-assigned error bound delta guarantees an accurate computation. These floating-point and fixed-point number
types not only provide good features for real-time critical computations, but also
give extra reliability and accuracy for general numerical computations.
3. Ada for parallel computation: The author has used exception handling and tasks [17, 19] to compute the quotient of one product of factorials by another product of factorials in the computation of the hypergeometric distribution function [26, 27]. Exception handling is used to prevent an overflow or
underflow of multiplications and divisions, respectively. The tasks are used to
compute the numerator and denominator concurrently. In addition, tasks and
exception handling working together can minimize the number of divisions
and maximize the number of integer multiplications in both of the numerator
and denominator, reduce round off errors, and obtain the maximum accuracy.
When both products in the numerator and denominator have reached a maxi-
mum before an overflow occurs, both task one and task two stop temporarily and
invoke task three to perform a division of the products that have been obtained
in the numerator and denominator before an overflow occurs. After task three
completes its job, task one and task two resume their computation and repeat this
procedure until the final result is obtained. These tasks work together and guar-
antee that the result of this computation will be the most accurate and the time
for the computation is reasonable. The author has performed these computations
on a single-processor machine, so the parallelism is logical parallelism; with a multiprocessor machine, actual parallelism can be achieved. Tasking and exception handling can easily be employed in the computation of the hypergeometric distribution function, and some computation results and the required time for this problem are given in [19], along with those for the computation of the multinomial distribution function, the multivariate hypergeometric distribution function, and other comparable functions. We conclude here that it is not possible to carry out the computation without using these special Ada features (a small illustrative sketch of the interleaving idea follows this list).
4. Execution time: In the 1990s, compiler technology was inadequate to support many Ada features. In a real-time system, the response time of the multitasking features seemed not fast enough; therefore, speed was an important criterion for choosing an Ada compiler for real-time applications. However, the second generation of Ada compilers doubled the speed of the first-generation compilers. Today, compilers are fast enough to support all Ada features, and Ada compilers are available for supercomputers, mainframe computers, minicomputers, and personal computers at reasonable prices.
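The following Python sketch illustrates the interleaving idea of item 3 above; it is not the author's Ada program, and the bound and inputs are made up. A division is performed whenever the partial numerator or denominator would otherwise exceed a chosen bound, so the quotient of two products of factorials stays within range throughout.

def factorial_factors(ns):
    for n in ns:
        for f in range(2, n + 1):
            yield f

def ratio_of_factorials(num, den, bound=1e12):
    """Compute (prod of num factorials) / (prod of den factorials) without
    letting either partial product grow past `bound`."""
    nf, df = list(factorial_factors(num)), list(factorial_factors(den))
    result, num_prod, den_prod = 1.0, 1.0, 1.0
    i = j = 0
    while i < len(nf) or j < len(df):
        # grow whichever partial product is currently smaller
        if i < len(nf) and (num_prod <= den_prod or j >= len(df)):
            num_prod *= nf[i]; i += 1
        else:
            den_prod *= df[j]; j += 1
        if num_prod > bound or den_prod > bound:
            result *= num_prod / den_prod      # interleaved division step
            num_prod = den_prod = 1.0
    return result * num_prod / den_prod

print(ratio_of_factorials([10], [3, 7]))      # 120.0 -> equals C(10, 3)
print(ratio_of_factorials([200], [120, 80]))  # C(200, 120); 200! alone would overflow a double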
In running an application, a program crash is a disaster. If it is not possible to prevent
a crash from occurring, the programmer should provide some mechanisms to handle
it, to eliminate it, or to minimize its damage. In the next section, we will address
exception-handling features for these purposes.
Numeric_error
Raised when an arithmetic operation cannot deliver the correct result, e.g. when an overflow or underflow condition occurs
Storage_error
Raised if memory is exhausted during the elaboration of a data object, or during the creation of a new access-type object in a subprogram execution
Tasking_error
Raised during inter-task communication
Program_error
Raised when a program attempts to enter an unelaborated procedure, e.g. when a forward declaration is missing
The Ada language encourages its user to define and raise his exceptions in order
to meet the specific purposes needed. An exception handler is a segment of
subprogram or block that is entered when the exception is raised. It is declared
at the end of a block or subprogram body. If an exception is raised and there is
no handler for it in the unit, then the execution of the unit is abandoned and the
exception is propagated dynamically, as follows:
If the unit is a block, the exception is raised at the end of the block in the
containing unit.
If the unit is a subprogram, the exception is raised at the point of call rather
than the statically surrounding units; hence the term dynamic is applied to the
propagation rule.
If the unit is the main program, program execution is terminated with an appro-
priate (and nasty) diagnostic message from the run time support environment.
The Ada language strongly encourages its user to define and use his own excep-
tions even if they manifest in the form of predefined exceptions at the outset, rather
than depend on predefined ones. The following is an example that shows the excep-
tion, handler, and propagation working together for the computation of factorial
function.
function Power (n, k: natural) return natural is
  function Power (n, k: natural) return natural is
  begin -- inner
    if k < 1 then
      return 1;
    else
      return n * Power(n, k - 1);
    end if;
  end Power; -- inner
begin -- outer
  return Power(n, k);
exception
  when numeric_error =>
    return natural'last;
end Power; -- outer
The advantage of this segment of code is that when an overflow condition occurs, the inner Power function exits. Once the outer function returns natural'last, it is not possible to get back to the exact point of the exception.
function Power (n, k: natural) return natural is
begin
  if k < 1 then
    return 1;
  else
    return n * Power(n, k - 1);
  end if;
exception
  when numeric_error =>
    return natural'last;
end Power;
If the function had been written as above, each execution of the function would encounter an exception once an overflow occurs; this undesired version is shown for comparison. The function raises exceptions repeatedly, when the first overflow is detected and again at the end of each of the recursive calls.
27.7 Conclusion
References
1. Demillo, R. and Rice, J. eds. Studies in Computer Science: In Honor of Samuel D. Conte,
Plenum, New York, 1994.
2. Myers, G. J. Software Reliability, Wiley, New York, 1976.
3. Pfleeger, S. L. Software Engineering, the Production of Quality Software, 2nd edition,
Macmillan, New York, 1991.
4. Wu, T. Some tips in computer science, a talk given at University of Illinois at Springfield, Dec.
2, 2004.
5. China Daily, CHINAdaily.com.cn, October 28, 2006.
6. The Aviation Book, A visual encyclopedia of the history of aircraft, www. Chronicale-
Books.com
7. Boeing 777, globalsecurity.org/military/systems/ aircraft/b777.html
8. Pratt, T. W. and Zelkowitz, M. V. Programming Languages, Design and Implementation, 3rd
edition, Prentice-Hall, Englewood Cliffs, NJ, 1996.
9. Sebesta, R. W. Concepts of Programming Languages, 6th edition, Addison-Wesley, Reading,
MA, 2003.
10. Barnes, J. G. P. Programming in Ada, 3rd edition, Addison-Wesley, Reading, MA, 1989.
11. Barnes, J. Programming in Ada 95, Addison-Wesley, Reading, MA, 1995.
12. Smedema, C. H., et al. The Programming languages Pascal, Modula, Chill, and Ada, Prentice-
Hall, Englewood Cliffs, NJ, 1983.
13. Watt, D. A., et al. Ada Language and Methodology, Prentice-Hall, Englewood Cliffs, NJ, 1987.
14. Zadeh, L. A. Fuzzy sets and information granularity, In M. M. Gupta, P. K. Ragade, R. R.
Yager, eds., Advances in Fuzzy Set Theory and Applications, North-Holland, Amsterdam, pp.
3–18, 1979.
15. Zadeh, L. A. Towards a theory of fuzzy information granulation and its centrality in human
reasoning and fuzzy logic, Fuzzy Sets and Systems. 19, 111–117, 1997.
16. Ada information clearance house, the web site: www.adaic.org/atwork/boeing.html
17. Booch, G. Software engineering with Ada, 2nd edition, Benjamin/Cummings Publishing,
Reading, MA, 1987.
18. Habermann, A. N. and Perry, D. E. Ada for Experienced Programmers, Addison-Wesley,
Reading, MA, 1983.
19. Welsh, J. and Lister, A. A Comparative Study of Task Communication in Ada, Software
Practice and Experience. 11, 257–290, 1981.
20. Mas project http://mars.jpl.nasa
21. Parnas, D. L. Is Ada Too Big? (letter) Communications of ACM. 29, 1155–1155, 1984.
22. Wichmann, B. A. Is Ada Too Big? A Designer Answers the Critics, Communications of ACM.
29, 1155–1156, 1984.
23. Struble, G. Assembler Language Programming, The IBM System/370 Family, Addison-
Wesley, Reading, MA, 1984.
24. Vax Ada Language Reference Manual, Digital Equipment Corporation, Maynard, MA, 1985.
25. Wu, T. Built-in reliability in the Ada programming language, Proceedings of IEEE 1990
National Aerospace and Electronics Conference, pp. 599–602, 1990.
26. Wu, T. An Accurate Computation of the Hypergeometric Distribution Function, ACM
Transactions on Mathematical Software. 19 (1), 33–43, 1993.
27. Wu, T. Ada programming language for Numerical Computation, Proceedings of IEEE 1995
National Aerospace and Electronics Conference, pp. 853–859, 1995.
28. Joint Task Force for Computing Curricula 2004, Overview Report. ACM and IEEE-CS, 2004.
Chapter 28
Analysis of Overhead Control Mechanisms
in Mobile AD HOC Networks
Abstract In this chapter several techniques that have been proposed for minimizing the routing overhead in ad hoc routing protocols are discussed. The different algorithms are classified into several categories, such as clustering, hierarchical routing, header compression and Internet connectivity for mobile ad hoc networks, based on their main objective of minimizing the routing overhead. Clearly, the selection of this area in this chapter is highly subjective. Besides the routing-overhead-minimizing schemes discussed in this chapter, there are dozens of research schemes that are currently the focus of the research community. With this tutorial, readers can gain a comprehensive understanding of the different schemes that are employed to reduce the routing overhead.
28.1 Introduction
Mobile Ad hoc networks are autonomous systems formed by mobile nodes with-
out any infrastructure support. Routing in MANET is challenging because of the
dynamic nature of the network topology. Fixed network routing protocols can
assume that all the routers have a sufficient description of the underlying network,
either through global or decentralized routing tables. However, dynamic wireless
networks do not easily admit such topology state knowledge. The inherent random-
ness of the topology ensures that a minimum overhead is associated with the routing
in MANET and is greater than that of fixed networks. It is therefore of interest to
S. Gowrishankar (B)
Department of Electronics and Telecommunication Engineering, Jadavpur University,
Kolkata 700032, West Bengal, India,
E-mail: gowrishankarsnath@acm.org
know how small the routing overhead can be made for a given routing strategy and
random topology [1].
To evaluate the performance of routing protocols in MANET, several performance metrics such as packet delivery ratio, average end-to-end delay and routing overhead are commonly used. Among these metrics, routing overhead is the important one, as it determines the scalability of a routing protocol. Routing overhead refers to how many extra messages are used to achieve a given level of delivery performance.
To evaluate the routing overhead in mobile ad hoc routing protocols different
methods are followed. They are (a) simulations, (b) physical experiments, and (c)
theoretical analysis [2].
In simulations a controlled environment is provided to test and debug many of
the routing protocols. Therefore, most of the literature [3–5] evaluates the control
overhead in routing protocols using software simulators like NS-2 [6], Glomosim
[7], Qualnet [8] and OPNET [9]. However, simulation is not a foolproof method and may fail to accurately reveal some critical behaviors of routing protocols, as most simulation experiments are based on simplified assumptions.
Physical experiments evaluate the performance of routing protocols by imple-
menting them in real environment. Some of the papers in the literature evaluate
routing overhead in a real physical environment [10, 11]. But physical experiments are much more difficult and time-consuming to carry out.
Analysis of routing overhead from a theoretical point of view provides a deeper
understanding of advantages, limitations and tradeoffs found in the routing proto-
cols in MANET. Some of the papers in literature [12–15] have evaluated routing
protocols in MANET from theoretical analysis perspective.
Traditionally the routing schemes for ad hoc networks are classified into proactive
and reactive routing protocols. Proactive protocols like DSDV [16] and OLSR [17]
maintain routing information about the available paths in the network even if these paths are not currently used. The drawback of this approach is that maintaining such paths may occupy a significant part of the available bandwidth. Reactive routing protocols like DSR, TORA [5] and AODV [18] maintain only the routes that are currently in use. However, when the topology of the network changes frequently, they still generate a large amount of control traffic.
Therefore, the properties of frequent route breakage and unpredictable topology
changes in MANET make many of the routing protocols inherently not scalable with
respect to the number of nodes and the control overhead. In order to provide routing
scalability a hierarchy of layers is usually imposed on the network. Scalability issues
are handled hierarchically in ad hoc networks. Many hierarchical routing algorithms
are adopted for routing in ad hoc wireless networks, for example, cluster-based routing and dominating-set-based routing.
Sucec and Marsic provide a formal analysis of the routing overhead i.e., they
provide a theoretical upper bound on the communication overhead incurred by the
clustering algorithms that adopt the hierarchical routing schemes. There can be
many scalability performance metrics like hierarchical path length, least hop path
length and routing table storage overhead. Among these metrics, control overhead per node (Ψ) is the most important one because of the scarce wireless link capacity, which imposes a severe performance limitation. The control overhead Ψ is expressed as a function of |V|, where V is the set of network nodes. It is shown that, with reasonable assumptions, the average overhead generated per node per second is only polylogarithmic in the node count, i.e., Ψ = O(log² |V|) bits per second per node [19].
Communication overhead in hierarchically organized networks may result from the following phenomena: (a) hello protocols; (b) level-k cluster formation and cluster maintenance messaging, k ∈ {1, 2, ..., L}, where L is the number of levels in the clustered hierarchy; (c) flooding of the cluster topology updates to cluster members; (d) addressing information required in datagram headers; (e) location management events due to changes in the clustered hierarchy and due to node mobility between clusters; (f) hand-off or transfer of location management data; (g) location query events. The total communication overhead per node in hierarchically organized networks is the sum of the above contributing elements.
The control overhead and network throughput under a cluster based hierarchi-
cal routing scheme is discussed in [20]. The authors claim that when the routing
overhead is minimized in a hierarchical design then there is a loss in the throughput
from the same hierarchical design. A strict hierarchical routing is assumed which is
not based on any specific routing protocol. In MANET, hierarchical routing proto-
cols do not require every node to know the entire topology information. Only a few
nodes called the cluster head nodes need to know about the topology information
and all the other nodes can simply send their packets to these cluster heads.
Hierarchical routing protocols reduce the routing overhead, as fewer nodes need to know the topology information of an ad hoc network. The throughput of an ad hoc network with a hierarchical routing scheme is smaller by a factor of O(N_2 / N_1), where N_2 is the number of cluster head nodes and N_1 is the number of all the nodes in the network. Hence, the authors claim that there is a tradeoff between the gain in routing overhead and the loss in throughput caused by the hierarchical design of the ad hoc routing protocols.
The control overhead in a hierarchical routing scheme, measured in packet transmissions per node per second, can be due to the maintenance of routing tables as well as to address management or location management. The overhead Φ required by hierarchical routing is a polylogarithmic function of the network node count N, i.e., Φ = Θ(log² N) packet transmissions per node per second. In this expression, the overheads due to hierarchical cluster formation and location management are identified [15].
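To get a feel for how slowly a polylogarithmic per-node overhead grows with network size, the throwaway calculation below (Python, arbitrary units) evaluates log² of the node count:

import math
for V in (100, 1_000, 10_000, 100_000):
    print(V, round(math.log2(V) ** 2, 1))
# 100 44.1 | 1000 99.3 | 10000 176.6 | 100000 275.9
# a 1000-fold increase in nodes raises the per-node overhead only about 6-fold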
The concept of dividing the geographical regions into small zones has been pre-
sented as clustering in the literature.
Clustering basically transforms a physical network into a virtual network of inter-
connected clusters or group of nodes. These clusters are dominated by clusterheads
(CH) and connected by gateways or border terminals as shown in Fig. 28.1. Any
node can be CH, if it has the necessary functionality such as the processing and
transmission power. The node registered with the nearest CH becomes the member
of that cluster. By partitioning a network into different clusters both the storage and
communication overhead can be reduced significantly.
Different clustering algorithms may use different clustering schemes but gener-
ally three different types of control messages are needed: (a) Beacon messages also
known as Hello messages are used to learn about the identities of the neighboring
nodes, (b) cluster messages are used to adapt to cluster changes and to update the
role of a node, (c) route messages are used to learn about the possible route changes
in the network [21].
The various types of control message overhead are as follows. (a) Hello overhead: for a node to learn about its neighboring node when a new link is formed, the frequency of the hello messages it generates should be at least equal to the link generation rate. The generation of a link between any two nodes is noticed through the hello messages, since each of the nodes can hear the hello message sent by the other node. (b) Cluster message overhead due to link
break between cluster members and their cluster heads: This event causes the node
to change its cluster or become a cluster head when it has no neighboring cluster-
ing heads. The cluster members send the cluster messages due to this type of link
changes. To minimize the control message overhead the ratio of such link breaks
to total link breaks should be equal to the ratio of links between the cluster mem-
bers and cluster heads divided by the total number of links in the entire network. (c)
Cluster message overhead due to link generation between two cluster heads: When
a link is generated between two cluster heads, one of the cluster heads needs to give
up its cluster head role, which is decided by the clustering algorithm. Every time a
link between two cluster heads appears, the number of cluster messages generated
is same as the number of nodes in the cluster that needs to undergo reclustering. (d )
Routing overhead: When a particular node in the cluster should be updated with the
route to other nodes in the cluster, the routing storage overhead is proportional to
the size of the cluster.
An analytical study of the routing overhead of two-level cluster-based routing protocols for mobile ad hoc networks is done in [2]. Routing protocols can be summarized as a generic proactive routing protocol and a generic reactive routing protocol; they are generic because, although a different strategy may be employed within each group, the underlying nature of the routing is similar.
In two level cluster based routing scheme, the routing is divided into two separate
parts, i.e., routing among different clusters (i.e., intercluster routing) and routing
within a cluster (i.e., intracluster routing). Since there are two types of routing
schemes i.e., proactive and reactive which can be employed in intercluster rout-
ing and intracluster routing, there are totally four types of two level hierarchical
routing schemes with different combinations of them. Hence we have proactive to
proactive, reactive to reactive, proactive to reactive and reactive to proactive routing
scheme.
When a proactive scheme is adapted for intercluster routing each cluster head
periodically collects its local cluster topology and then broadcasts it to its direct
neighbor cluster head via gateways. When a reactive routing scheme is used for
inter cluster routing, a route request for a route to the destination node that is in
another cluster will be broadcasted among cluster heads. When a proactive rout-
ing scheme is utilized for intracluster routing, each node broadcasts its local node
topology information, so the route to the destination within the same cluster will be
available when needed. When a reactive routing scheme is employed for intracluster
routing, a route request to the destination will be flooded within the same cluster.
Thus a proactive to proactive routing scheme works as a proactive routing protocol with a hierarchical structure. The proactive to proactive routing scheme produces an overhead due to periodical topology maintenance of

O\left(\frac{1}{n} N^2 + \frac{1}{k N_c} N^2\right)

where n is the total number of clusters in the network, N is the network size, k is the cluster radius, and N_c is the cluster size.
A reactive to reactive routing protocol operates as a purely reactive routing protocol with a hierarchical structure. The reactive to reactive routing protocol yields a routing overhead due to route discovery of

O\left(\frac{1}{k} N^2\right).
In a reactive to proactive routing scheme, each node in the cluster periodically broadcasts local node topology information within the cluster. Thus, when the destination is within the cluster, the route is immediately available; otherwise, the node sends a route request packet to its cluster head, which is broadcast among the cluster heads. The reactive to proactive routing protocol has a basic routing overhead due to cluster maintenance and route discovery that is approximately equal to

O\left(\frac{1}{n} N^2 + \frac{1}{k} N^2\right).
A mathematical framework for quantifying the overhead of a cluster based rout-
ing protocol (D-hop max min) is investigated by Wu and Abouzeid [13]. The authors
provide a relationship between routing overhead and route request traffic pattern.
From a routing protocol perspective, ‘traffic’, could be defined as the pattern by
which the source destination pairs are chosen. The choice of a source destination
pair depends on the number of hops along the shortest path between them. Also the
network topology is modeled as a regular two-dimensional grid of unreliable nodes.
It is assumed that an infinite number of nodes are located at the intersections of a
regular grid. The transmission range of each node is limited such that a node can
directly communicate with its four immediate neighbors only. It is reported that clustering does not change the traffic requirement for infinite scalability compared to flat protocols, but reduces the overhead by a factor of o(1/M), where M is the cluster size.
Wu and Abouzeid show that the routing overhead can be attributed to events like
(a) route discovery, (b) route maintenance, and (c) cluster maintenance.
Route discovery is the mechanism where by a node i wishing to send a packet to
the destination j obtains a route to j. When a source node i wants to send a packet
to the destination node j, it first sends a route request (RREQ) packet to its clus-
ter head along the shortest path. The route reply (RREP) packet travels across the
shortest path back to the cluster head that initiated the RREQ packet. So the route
discovery event involves the RREQ and RREP processes. The overhead for the RREQ is generally higher than that for the RREP, since it may involve flooding at the cluster head:

\frac{4 (M - 1) M (M + 1)}{3 (2M^2 + 2M + 1)} + 2M

where M is the radius of the cluster.
The work done in [14] by Zhou provides a scalability analysis of the routing overheads with regard to the number of nodes and the cluster size. Both the interior routing overhead within the cluster and the exterior routing overhead across the clusters are considered. The routing protocol includes a mechanism for detecting,
collecting and distributing the network topology changes. The process of detecting,
collecting and distributing the network topology changes contribute to a total routing
overhead Rt . The routing overhead can be separated into interior routing overhead
(Ri ) and the exterior routing overhead (Re ). The interior routing overhead Ri is the
bit rate needed to maintain the local detailed topology. This includes the overhead of
detecting the link status changes by sending “HELLO” message, updating the clus-
ter head about the changes in link status and maintaining the shortest path between
the regular nodes to their cluster head.
The exterior routing overhead (R_e) is the bit rate needed to maintain the global ownership topology, which includes the overhead of distributing the local ownership topologies among the cluster heads. Hence R_t = R_i + R_e.
It has been observed in the literature [22] that approximately half of the packets sent across the Internet are 80 bytes long or less, and this percentage has increased over the last few years, in part due to the widespread use of real-time multimedia applications. Multimedia packets are usually small, and many protocol headers must be added to these small packets as they travel through the networks. In IPv4 networks there are at least 28 bytes (UDP) or 40 bytes (TCP) of overhead per packet. These overheads consume much of the bandwidth, which is very limited on wireless links. Small packets and a relatively large header size translate into poor line efficiency. Line efficiency can be defined as the fraction of the transmitted data that is not considered overhead. Figure 28.2 shows some of the common header chains and the size of each component within the chain.
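As a quick worked example (with assumed packet sizes, not figures taken from this chapter), line efficiency is simply the payload divided by the payload plus headers:

payload = 60                     # bytes of application data (assumed)
header  = 40                     # IPv4 + TCP header chain (20 + 20 bytes)
print(payload / (payload + header))             # 0.60  -> 60% line efficiency
compressed_header = 4            # hypothetical compressed header size
print(payload / (payload + compressed_header))  # ~0.94 -> ~94% line efficiency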
Ad hoc networks create additional challenges such as context initialization over-
head and packet reordering issues associated with node mobility. The dynamic
nature of ad hoc networks has a negative impact on header compression efficiency.
A context is established by first sending a packet with full uncompressed header
that provides a common knowledge between the sender and receiver about the static
field values as well as the initial values for dynamic fields. This stage is known as
context initialization. Then the subsequent compressed headers are interpreted and
decompressed according to a previously established context. Every packet contains
a context label. Here the context label indicates the context in which the headers are
compressed or decompressed.
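A toy sketch of this idea (illustrative only, not the scheme proposed in [23]; the header fields are made up) is shown below: the first packet carries the full header and establishes a context, and later packets carry only the context label and the dynamic fields that changed.

contexts_tx, contexts_rx = {}, {}

def compress(label, header):
    if label not in contexts_tx:                       # context initialization
        contexts_tx[label] = dict(header)
        return ("FULL", label, dict(header))
    delta = {k: v for k, v in header.items() if contexts_tx[label].get(k) != v}
    contexts_tx[label].update(delta)
    return ("COMP", label, delta)                      # compressed header

def decompress(packet):
    kind, label, fields = packet
    if kind == "FULL":
        contexts_rx[label] = dict(fields)
    else:
        contexts_rx[label].update(fields)              # interpret via stored context
    return dict(contexts_rx[label])

h1 = {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": "UDP", "seq": 1}
h2 = dict(h1, seq=2)
print(decompress(compress(0, h1)))   # full header sent, context 0 established
print(decompress(compress(0, h2)))   # only {"seq": 2} sent with context label 0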
A novel hop-by-hop context initialization algorithm is proposed in [23] that
depends on the routing information to reduce the overhead associated with the con-
text initialization of IP headers and uses a stateless compression method to reduce
the overhead associated with the control messages. Context initialization of IP headers is done on a hop-by-hop basis because the headers need to be examined in an uncompressed state at each of the intermediate nodes.
Fig. 28.2 Common header chains and the size of each component (20, 28, 40, 48 and 60 bytes)
Today the Internet has become the backbone of wired and wireless communication, and mobile computing is gaining in popularity. In order to meet the rapidly growing demand for mobile computing, many researchers are interested in the integration of MANETs with the Internet.
When a mobile node in a MANET wants to exchange packets with the Internet, the node must first be assigned a global IP address, and then the available Internet gateways have to be discovered to connect to the Internet, as shown in Fig. 28.3. But this is achieved at the cost of higher control overhead.
For gateway discovery, a node depends on periodic gateway advertisement. To make efficient use of this periodic advertisement, it is necessary to limit the advertisement flooding area. A complete adaptive scheme to discover IG in an efficient manner for AODV is given in [13]. In this approach both the periodic
advertisement and adaptive advertisement schemes are used. At a relatively long
interval each gateway sends the periodic advertisement messages. Periodic adver-
tisements performed at a widely spaced interval do not generate a great deal of
overhead but still provide the mobile nodes with a good chance of finding the shortest path to a previously used gateway.
Fig. 28.3 Mobile nodes (S) connected to a destination (D) on the Internet through Internet gateways
The TTL of the periodic gateway message is used as a parameter to adapt to the network conditions. A heuristic algorithm called
“Minimal Benefit Average” [25] decides the next TTL to be used for the periodic
gateway advertisement messages.
The goal of the adaptive advertisement scheme is to send advertisement pack-
ets only when the gateway detects the movement of nodes, which would result
in the paths used by the source mobile nodes communicating with the gateway
to be changed. Adaptive advertisement is performed when needed, regardless of
the time interval used for periodic advertisement. [26]. In this approach there is
reduction in overhead messages since the periodic advertisements are sent at a long
time interval and perform adaptive advertisement only if there is mobility in the
network.
The various parameters that affect the control overhead created by interoperat-
ing the ad hoc routing protocols and IP-based mobility management protocols are
addressed in [27]. Mobile IP is used as the baseline mobility management proto-
col and AODV is chosen as the ad hoc routing protocol. IP tunneling is used to
separate the ad hoc network from the fixed network. In mobile IP, a mobile node
can tell which router is available by listening to router advertisements, which are
periodically broadcasted by the routers.
A fixed access router is assigned the role of mobility agent and has connection
to at least one of the MANET nodes. Such router is referred to as Ad hoc Internet
Access Router (AIAR) and it maintains a list called ad hoc list, which keeps a list
of IP address of the mobile nodes that wish to have Internet connectivity. In an
integrated network the control overhead comprises of AIAR registration packets,
routing protocol control packets, mobile IP registration packets and mobile IP router
advertisement.
In Mobile IP, the majority of the overhead is due to the router advertisement packets
that are being repeatedly and periodically forwarded among the mobile nodes. Also,
the router advertisement used by the mobility management protocol to carry network
References
1. R. C. Timo and L. W. Hanlen, “MANETs: Routing Overhead and Reliability”, IEEE, 2006.
2. Z.-H. Tao, “An Analytical Study on Routing Overhead of Two level Cluster Based Routing
Protocols for Mobile Ad hoc Networks”, IEEE, 2006.
3. J. Broch, D. A. Maltz, D. B. Johnson, Y.-C. Hu, and J. Jetcheva, “A Performance Compar-
ison of Multi-hop Wireless Ad Hoc Network Routing Protocols,” in Proceedings of ACM
MobiCom, Oct. 1998.
4. S. R. Das, R. Castaneda, and J. Yan, “Simulation Based Performance Evaluation of Routing
Protocols for Mobile Ad Hoc Networks,” Mobile Networks and Applications, vol. 5, no. 3,
pp. 179–189, Sept. 2000.
30. A. M. Abbas and B. N. Jain, “Mitigating Overheads and Path Correlation in Node-Disjoing
Multipath Routing for Mobile Ad Hoc Networks”, IEEE, 2006.
31. R. C. Timo and L. W. Hanlen,” Routing Overhead and Reliability”, IEEE, 2006.
32. P. M. Ruiz and A. F. Gomez Skarmet, “Reducing Data Overhead of Mesh Based Ad Hoc
Multicast Routing Protocols by Steiner Trees”, IEEE, 2004.
33. S. Bansal, R. Shorey, and A. Misra, “Comparing the Routing Energy Overheads of Ad Hoc
Routing Protocols”, IEEE, 2003.
34. R. Teng, H. Morikawa, and T. Aoyama, “A Low Overhead Routing Protocol for Ad Hoc
Networks with Global Connectivity”, IEEE, 2005.
35. E. R. Inn Inn and K. G. S. Winston, “Clustering Overhead and Convergence Time Analy-
sis of the Mobility Based Multi hop Clustering Algorithm for Mobile Ad Hoc Networks”,
Proceedings of the 11th International Conference on Parallel and Distributed Systems, IEEE,
2005.
36. S.-H. Wang, M.-C. Chan, and T.-C. Hou, "Zone Based Controlled Flooding in Mobile Ad Hoc
Networks”, International Conference on Wireless Networks, Communications and Mobile
Computing, IEEE, 2005.
37. www.scalable-networks.com
Chapter 29
Context Aware In-Vehicle Infotainment
Interfaces
Today’s cars provide an increasing number of new functionalities that enhance the
safety and driving performance of drivers or raise their level of comfort. In addi-
tion, infotainment systems are innovating themselves, providing luxury facilities or
enabling modern communication mechanisms. With every new generation of cars,
there are more built-in software functions.
The current state-of-practice for developing automotive software for Infotain-
ment and Telematics systems offers little flexibility to accommodate such hetero-
geneity and variation. Currently, application developers have to decide at design
time what possible uses their applications will have and the applications do not
change or adapt once they are deployed on an infotainment platform. In fact,
in-vehicle infotainment applications are currently developed with monolithic archi-
tectures, which are more suitable for a fixed execution context.
This chapter presents a framework that enables development of context aware
interaction interfaces for in-vehicle infotainment applications. Essential framework
concepts are presented with development and discussion of a lightweight component
model. Finally, the design of an example application is presented and evaluated.
H. Sharma (B)
Software Engineer at Delphi Delco Electronics Europe GmbH, Bad Salzdetfurth, Germany,
E-mail: hemant.sharma@delphi.com
With increased commuting times, vehicles are becoming a temporary mobile work-
place and home place for both drivers and passengers, implying significant addi-
tional content and tools related to depiction and interaction with information about
one’s own identity, entertainment, relationship-building/maintenance (e.g., com-
munication with others), self-enhancement (including education and health mon-
itoring), access to information resources (e.g., travel information, references, and
other databases), and commerce. Vehicles as communication, entertainment, and
information environments may involve distributed displays that depict virtual pres-
ence in other spaces, games played with other vehicles’ drivers and passengers,
and status or instruction concerning driving/parking maneuvers, repairs, parts re-
ordering, etc.
In-vehicle infotainment systems offer a new generation of ultra-compact embed-
ded computing and communication platforms, providing occupants of a vehicle with
the same degree of connectivity and the same access to digital media and data that
they currently enjoy in their home or office. These systems are also designed with
an all-important difference in mind: they are designed to provide an integrated,
upgradeable control point that serves not only the driver, but the individual needs
of all vehicle occupants.
Infotainment system manufacturers seeking to serve worldwide markets face a
complex matrix of 15 major OEMs (car companies) with 78 major marks (brands)
and perhaps as many as 600 vehicle models. Each model may have a number of trim
levels and must serve multiple languages and regulatory regimes (countries). To compete,
suppliers also need to deliver new functionality, features and a fresh look on an annual
basis. Building the Human Machine Interface (HMI) for these complex applications is
difficult and code-intensive, consuming a disproportionate amount of resources and
adding considerably to project risk.
The component metamodel, as shown in Fig. 29.2, is a Meta Object Facility (MOF)
compliant extension of the UML metamodel.
It builds upon and extends the UML concepts of Classifier, Node, Class, Inter-
face, Data Type, and Instance. The most novel aspect of the component model is the
way in which it offers distribution services to local components, allowing instances
to dynamically send and receive components at runtime.
The component metamodel is a local, or in process, reflective component meta-
model for HMI applications hosted on infotainment platforms. The model uses
logical mobility primitives to provide distribution services and offers the flexible
use of those primitives to the applications, instead of relying on the invocation of
remote infotainment services via the vehicle network. The HMI framework com-
ponents are collocated on the same address space. The model supports the remote
cloning of components between hosts, providing for system autonomy when appli-
cation service connectivity is missing or is unreliable. As such, an instance of HMI
framework, as part of HMI application, is represented as a collection of local com-
ponents, interconnected using local references and well-defined interfaces, deployed
on a single host. The model also offers support for structural reflection so that appli-
cations can introspect which components are available locally, choose components
to perform a particular task, and dynamically change the system configuration by
adding or removing components.
Components
Containers
The central component of every HMI application is the container component. A con-
tainer is a component specialization that acts as a registry of components installed
on the system. As such, a reference to each component is available via the container.
The container component implements a specialization of the component facade that
exports functionality for searching components that match a given set of attributes.
An adaptive system must also be able to react to changes in component availabil-
ity. For example, a media player interface for iPOD must be able to reason about
which streams it can decode. Hence, the container permits the registration of listen-
ers (represented by components that implement the ComponentListener facade) to
be notified when components matching a set of attributes given by the listener are
added or removed.
To allow for dynamic adaptation, the container can dynamically add or drop
components to and from the system. Registration and removal of components is del-
egated to one or more registrars. A registrar is a component that implements a facade
that defines primitives for loading and removing components, validating dependen-
cies, executing component constructors, and adding components to the registry.
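A minimal sketch of this container/registrar pattern is given below, in Python for brevity (the chapter does not tie the framework to a particular language); the class names, the attribute-matching rule and the listener callback signature are illustrative assumptions rather than the framework's real API.

```python
class Component:
    """Minimal component with a set of descriptive attributes."""
    def __init__(self, name, **attributes):
        self.name = name
        self.attributes = attributes

class Container(Component):
    """Acts as the registry of installed components and notifies listeners."""
    def __init__(self):
        super().__init__("container")
        self.registry = {}
        self.listeners = []   # (attribute filter, callback) pairs

    def add_listener(self, attr_filter, callback):
        self.listeners.append((attr_filter, callback))

    def find(self, **attrs):
        # Search for components matching a given set of attributes.
        return [c for c in self.registry.values()
                if all(c.attributes.get(k) == v for k, v in attrs.items())]

    def _notify(self, component, event):
        for attr_filter, callback in self.listeners:
            if all(component.attributes.get(k) == v for k, v in attr_filter.items()):
                callback(component, event)

class Registrar:
    """Validates dependencies, instantiates and registers components."""
    def __init__(self, container):
        self.container = container

    def register(self, component, dependencies=()):
        missing = [d for d in dependencies if d not in self.container.registry]
        if missing:
            raise RuntimeError(f"unsatisfied dependencies: {missing}")
        self.container.registry[component.name] = component
        self.container._notify(component, "added")

    def remove(self, name):
        component = self.container.registry.pop(name)
        self.container._notify(component, "removed")

# Example: a media-player interface reacting to decoder availability.
container = Container()
registrar = Registrar(container)
container.add_listener({"type": "decoder"},
                       lambda c, e: print(f"{e}: {c.name}"))
registrar.register(Component("aac_decoder", type="decoder", stream="aac"))
```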
The HMI framework supports a very simple and lightweight component life cycle.
When a component is passed on to the container for registration by loading it from
persistent storage, using a Deployer, etc., the container delegates registration to a
registrar component. The registrar is responsible for checking that the dependencies
of the component are satisfied, instantiating the component using its constructor,
and adding it to the registry. Note that the component facade prescribes a single con-
structor. An instantiated component can use the container facade to get references
to any other components that it may require. A component deployed and instanti-
ated in the container may be either enabled or disabled. The semantics of those and
the initial state of the component depend on the component implementation. The
functionality needed to manipulate the state of the component is exported by the
component facade.
The aim of HMI Framework core in general is to provide higher level inter-
action primitives than those provided by the vehicle infotainment network and
infotainment service system as a layer upon which HMI applications are then
constructed. In doing so, the framework hides the complexities of addressing
distribution, heterogeneity, and failures (Fig. 29.3).
The framework core along with underlying middleware provides a number of
services to applications. The services themselves are seen as regular components
built on top of the container. As such, they can be dynamically added and removed.
An HMI application built using the framework can reconfigure itself by using logical
mobility primitives. As different paradigms can be applied to different scenarios,
our metamodel does not build distribution into the components themselves, but it
provides it as a service; implementations of the framework metamodel can, in fact,
dynamically send and receive components and employ any of the above logical
mobility paradigms.
The framework considers four aspects of Logical Mobility: Components, Classes,
Instances, and Data Types; the last is defined as a bit stream that is not directly
executable by the underlying architecture. The Logical Mobility Entity (LME) is
defined as an abstract generalization of a Class, Instance, or Data Type.
In the framework component metamodel, an LMU is always deployed in a
Reflective component. A Reflective component is a component specialization that
can be adapted at runtime by receiving LMUs from the framework migration ser-
vices. By definition, the container is always a reflective component, as it can receive
and host new components at runtime.
The Application Model (Fig. 29.4) of HMI framework provides support for building
and running context-aware infotainment HMI applications on top of the framework
context model.
HMI applications access the framework functionality through an Application-
Interface: each time an application is started, an ApplicationSkeleton is created to
allow interactions between the framework and the application itself. In particular,
application interfaces allow applications to request services from the underlying
framework, and to access their own application context profile through a well-
defined reflective meta-interface. The HMI application is realized as composition
of components based on the component model of the HMIFramework.
The application model shall guide modeling of infotainment HMI application
components with the use of the framework component interfaces in an efficient
manner. The application model is also expected to ease the integration of context
awareness to applications by proper use of the framework interfaces.
The TMC HMI application uses the container notification service to be notified whenever
a tuner facade component that implements the Traffic Message attribute is registered.
Moreover, it uses the deployer and the discovery components to deploy premium Traffic
Message service components that are found remotely.
Figure 29.5 presents the platform independent application level model for TMC
HMI application.
The Traffic Message HMI demonstrates an application that uses the container
to listen to the arrival of new components, adapting its interface and functionality
upon new TMC service component arrival. It also demonstrates reaction to con-
text changes, as the application monitors the discovery services for new service
components and schedules them for use as soon as they appear.
29.4.2 Evaluation
The TMC HMI application demonstrates an application that uses the container to
listen to the arrival of new components and then adapts its interface and function-
ality to reflect the arrival of a new component. It also demonstrates reaction to
context changes as the application monitors the TMC services for new messages
and schedules them for navigation reroute as soon as they appear. The operation is
transparent to the end user.
Finally, it illustrates the ease with which existing code can be adapted to run
under the framework environment. The adapted code gains the ability to be deployed
dynamically at runtime and to be used as part of a component-based application,
such as the TMC HMI application. Hence, it can express dependencies on other
components or platforms, it can be required by other components, and a component-
based application can be built with it.
29.5 Summary
This chapter presented an approach for building context aware HMIs for in-vehicle
infotainment software applications by means of components. The HMI framework
enables this by using a logical mobility based component model. The lightweight
component metamodel is instantiated as a framework for adaptable HMI
applications and systems. The framework offers logical mobility primitives as first-
class citizens to achieve context awareness.
Chapter 30
Facial Expression Analysis Using PCA
Abstract Face recognition has drawn considerable interest and attention from many
researchers. Generally, pattern recognition problems rely upon the features inherent
in the pattern for an efficient solution. Conversations are usually dominated by facial
expressions. A baby can communicate with its mother through the expressions on
its face. But there are several problems in analyzing communication between human
beings through non-verbal communication such as facial expressions by a computer.
In this chapter, video frames are extracted from image sequences. Using skin color
detection techniques, face regions are identified. PCA is used to recognize faces.
The feature points are located and their coordinates are extracted. Gabor wavelets
are applied to these coordinates and to the images as a whole.
Keywords Facial expression · PCA · Face recognition · Video frame · Skin color
detection · Gabor wavelet
30.1 Introduction
Face recognition has drawn considerable interest and attention from many resear-
chers for the last two decades because of its potential applications, such as in the areas
of surveillance, secure trading terminals, control and user authentication. The prob-
lem of face recognition is either to identify a facial image from a picture or a facial
region from an image, and to recognize a person from a set of images. A robust face
recognition system has the capability of distinguishing among hundreds, maybe
even thousands, of faces. For automated access control, the most common and accepted
method is face recognition. A number of face recognition methods have been
proposed.
V. P. Lekshmi (B)
College of Engineering, Kidangoor, Kottayam, Kerala, India
E-mail: vplekshmi@yahoo.com
Generally, pattern recognition problems rely upon the features inherent in the
pattern for an efficient solution. The challenges associated with face detection and
recognition problem are pose, occlusion, expression, varying lighting conditions,
etc. Facial expression analysis has wide range of applications in areas such as
Human Computer Interaction, Image retrieval, Psychological area, Image under-
standing, Face animation etc. Humans interact with each other both verbally and
non-verbally.
Conversations are usually dominated by facial expressions. A baby can com-
municate with its mother through the expressions on its face. But there are several
problems in analyzing communication between human beings through non-verbal
communication such as facial expressions by a computer because expressions are
not always universal; they vary with ethnicity. Further, facial expressions can be
ambiguous, with several possible interpretations. To analyze the facial expres-
sion, face regions have to be detected first. Next step is to extract and represent the
facial changes caused by facial expressions. In facial feature extraction for expres-
sion analysis, there are two types of approaches, namely Geometric based methods
and Appearance based methods. In Geometric based method, the shape and loca-
tion of facial features are extracted as feature vectors. In Appearance based method,
image filters are applied either to whole face or to specific regions of facial image
to extract facial features.
In this chapter, video frames are extracted from image sequences. Using skin color
detection techniques, face regions are identified. PCA is used to recognize faces.
The feature points are located and their coordinates are extracted. Gabor wavelets
are applied to these coordinates and to the images as a whole.
The rest of the chapter is organized as follows. Section 30.2 gives background and
related work. Section 30.3 discusses the proposed method. Results are given in
Section 30.4. Conclusions and future work are given in Section 30.5.
Most face recognition methods fall into two categories: Feature based and Holistic
[1]. In feature-based method, face recognition relies on localization and detection
of facial features such as eyes, nose, mouth and their geometrical relationships [2].
In holistic approach, entire facial image is encoded into a point on high dimensional
space. Images are represented as Eigen images. A method based on Eigen faces
is given in [3]. PCA [4] and Active Appearance Model (AAM) for recognizing
faces are based on holistic approaches. In another approach, fast and accurate face
detection is performed by skin color learning by neural network and segmentation
technique [5]. Facial asymmetry information can be used for face recognition [7].
In [8], ICA was performed on face images under two different conditions: in the first,
the image is treated as a random variable and the pixels as outcomes; in the second,
the pixels are treated as random variables and the image as the outcome. A method
that extracts facial expressions from a detailed analysis of eye region
images is given in [9]. In the appearance based approaches given in [10], facial
images are recognized by warping of face images. Warping is obtained by automatic
AAM processing.
Another method of classification of facial expression is explained in [11] in
which the geometry and texture information are projected in to separate PCA spaces
and then recombined in a classifier which is capable of recognizing faces with differ-
ent expressions. Kernel Eigen Space method based on class features for expression
analysis is explained in [12]. In [13] facial expressions are classified using LDA, in
which the gabor features extracted using Gabor filter banks are compressed by two
stage PCA method.
30.3 Method
Our proposed method consists of training stage, face recognition and expression
classification stages.
Face image sequences with a certain degree of orientation, subjects wearing glasses, and
large variations in facial expressions are considered. As a first step, frames
with peak expression called key frames are identified. Face regions from these key
frames are extracted using skin color detection method. Skin regions are identi-
fied and connected component labeling is done to classify the sub regions in the
image.
Faces are detected from these detected skin regions. Figure 30.1 shows detected
face regions from skin areas. Frontal looking faces with neutral expressions called
normal faces and faces with set of non-neutral expressions form the database. There
were ‘K’ frames with ‘N’ expressions for each face so that ‘K x N’ face images
were used as the database. Normalization is done to make the frames with uniform
scale. The normalized faces are shown in Fig. 30.2.
Facial expressions are highly reflected in the eye and mouth regions. Fourteen mark-
ers, as mentioned in [11], are automatically selected for registering important
facial features. A triangulation method is applied to fit the mask on the faces. The
marking points represented as white dots are shown in Fig. 30.3. The coordinates
are used for further verification.
Fig. 30.1 (a) Skin region identified, (b) connected component labeling, (c) detected face regions
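The sketch below illustrates one plausible way to implement the skin-colour detection and connected component labelling step; the explicit RGB thresholds and the minimum-area filter are illustrative assumptions, not the exact rule used by the authors.

```python
import numpy as np
from scipy import ndimage

def detect_face_regions(rgb_frame, min_area=500):
    """Rough sketch: RGB skin-colour thresholding followed by connected
    component labelling; the thresholds are illustrative, not from the chapter."""
    r = rgb_frame[..., 0].astype(float)
    g = rgb_frame[..., 1].astype(float)
    b = rgb_frame[..., 2].astype(float)
    # A commonly used explicit RGB skin rule (one of several possibilities).
    skin = (r > 95) & (g > 40) & (b > 20) & \
           (r - np.minimum(g, b) > 15) & (np.abs(r - g) > 15) & (r > g) & (r > b)
    labels, n = ndimage.label(skin)           # connected component labelling
    boxes = []
    for obj in ndimage.find_objects(labels):
        region = skin[obj]
        if region.sum() >= min_area:          # keep sufficiently large skin blobs
            boxes.append(obj)                 # candidate face bounding slices
    return boxes

frame = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)
print(len(detect_face_regions(frame)))
```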
Face recognition has long been a primary goal of computer vision, and it has turned
out to be a challenging task. The primary difficulty in attempting to recognize faces
from an image gallery comes from the immense variability of face appearance due to sev-
eral factors including illumination, facial expressions, camera view points,
poses, and occlusions. This method treated face recognition as a 2-D recognition
problem.
PCA is a useful statistical technique that has found applications in fields such
as face recognition and image compression. It is a common technique for finding
patterns in the data of high dimensions. In PCA, face images are projected into
feature space or face space. Weight vector comparison was done to get the best
match.
Let the training set of face images be X_1, X_2, X_3, \ldots, X_n; then the mean face is defined as

m = \frac{1}{n} \sum_{i=1}^{n} X_i    (30.1)
Notice that the symbol m indicates the mean of the set X. The difference of each face
from the mean of the data set is given by

Q_1 = X_1 - m; \; Q_2 = X_2 - m; \; \ldots; \; Q_n = X_n - m    (30.2)
The columns Q_i are assembled into a matrix A = [Q_1 \; Q_2 \; \ldots \; Q_n], from which the covariance matrix is obtained as

C = A A^T    (30.3)
Eigen values and Eigen vectors are calculated for the covariance matrix. All the face
images in the database are projected into the Eigen space and the weight for each image
is calculated.
Then the image vector for each face image is obtained as

\text{Image Vector} = \sum_{i=1}^{10} \text{weight}(i) \cdot \text{Eigenvector}(i)    (30.4)
Usually, while using PCA, normal images are used as reference faces. To overcome
the large variations in transformations, the mean image is used here as the reference face.
Ten face image sequences are used here, each with five facial expressions as the
database. Mean faces of five key frames with peak expressions and Eigen faces are
shown in Figs. 30.4 and 30.5.
The probe image is also subjected to preprocessing steps before projecting it
into feature space. The weight vector is calculated to identify the image from the
database with closest weighting vector.
So far a reference face for each testing face is identified. After recognizing the
face, the coordinates of the facial features are extracted as explained earlier.
The coordinates of each testing face are compared with those of its reference face by calcu-
lating the mean square error between the testing face and all the prototypes of same
individual. This mean square error tells us how far the expression on the testing face
is from each type of expressions and thus can be used to classify expressions.
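A compact sketch of the eigenface pipeline implied by Eqs. (30.1)–(30.4) is shown below; it uses the usual trick of diagonalizing the smaller matrix A^T A, and the array shapes, the choice of ten eigenfaces and the random stand-in data are assumptions for illustration only.

```python
import numpy as np

def train_eigenfaces(faces, k=10):
    """faces: (n_images, n_pixels) array. Follows Eqs. (30.1)-(30.4) in spirit."""
    m = faces.mean(axis=0)                     # Eq. (30.1): mean face
    A = (faces - m).T                          # columns Q_i = X_i - m, Eq. (30.2)
    # Eigenvectors of C = A A^T (Eq. 30.3), obtained via the smaller matrix A^T A.
    vals, vecs = np.linalg.eigh(A.T @ A)
    order = np.argsort(vals)[::-1][:k]
    eigenfaces = A @ vecs[:, order]            # map back to image space
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)
    weights = A.T @ eigenfaces                 # Eq. (30.4): per-image weight vectors
    return m, eigenfaces, weights

def recognize(probe, m, eigenfaces, weights):
    w = (probe - m) @ eigenfaces               # project the probe into face space
    errors = ((weights - w) ** 2).mean(axis=1) # mean square error to each reference
    return int(np.argmin(errors))              # index of the closest database face

faces = np.random.rand(50, 192 * 192)          # stand-in for the K x N normalized faces
m, E, W = train_eigenfaces(faces)
print(recognize(faces[7], m, E, W))
```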
30.4 Results
Ten face image sequences from the FG-NET consortium, with variations in lighting con-
ditions, small orientations, glasses, heavy expressions, etc., were selected. The
expressions used were ‘happy’, ‘sad’, ‘normal’, ‘surprise’, and ‘anger’. Frames with
peak expressions from color face image sequences were extracted. Face regions
were identified using skin color. Eigen faces were used for recognizing faces. These
faces were converted to gray scale and normalized to a size of 192 × 192. Fourteen
points were marked at the face regions where expression is most strongly reflected. The face images
in the database and the test image were projected to face space for face recognition
and their coordinates were verified to identify which expression belongs to the test
face. The performance ratios are 100% for expression recognition from extracted
faces, 88% for expression recognition from frames and 88% for the combined
recognition. As a comparison two experiments were conducted on theses face
images. Gabor wavelet transformation is applied to all these coordinates and the
resultant Gabor coefficients were projected to feature space or PCA space.
As the second part, the whole face images were subjected to Gabor wavelet
transformation. Figure 30.6 shows Gabor faces. The high dimensional Gabor coef-
ficients were projected to PCA space to reduce the dimension. Mean square error
Expressions I1 I2 I3 I4 I5 I6 I7 I8 I9 I10
Anger Y Y Y Y Y Y Y Y Y Y
Happy Y Y Y Y Y Y Y Y Y Y
Neutral Y Y Y Y Y Y Y Y Y Y
Sad Y Y Y Y Y Y Y Y Y Y
Surprise Y Y Y Y Y Y Y Y Y Y
is calculated between feature vectors of test image and all the reference faces. The
performance ratio for the application of Gabor wavelets on extracted features is 86%
and to the whole image is 84%.
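The following sketch shows one way the Gabor responses at the marked fiducial points could be collected into a feature vector before PCA projection; the kernel parameters, number of orientations and the three sample points are illustrative, not the values used in the experiments.

```python
import numpy as np

def gabor_kernel(freq, theta, sigma=4.0, size=15):
    """Simple real Gabor kernel (illustrative parameters, not from the chapter)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * freq * xr)

def gabor_features_at_points(image, points, freqs=(0.1, 0.2), n_orient=4):
    """Filter responses sampled at the fiducial coordinates, concatenated
    into one feature vector (projected to PCA space in the chapter)."""
    feats = []
    for f in freqs:
        for k in range(n_orient):
            kern = gabor_kernel(f, theta=k * np.pi / n_orient)
            half = kern.shape[0] // 2
            for (r, c) in points:
                patch = image[r - half:r + half + 1, c - half:c + half + 1]
                feats.append(float((patch * kern).sum()))
    return np.array(feats)

img = np.random.rand(192, 192)
pts = [(60, 60), (60, 130), (120, 96)]      # stand-ins for the 14 marked points
print(gabor_features_at_points(img, pts).shape)
```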
A comparison between the approaches for different databases is given
in Tables 30.1, 30.2 and 30.3. Tables 30.4 and 30.5 show the expression recog-
nition performance obtained by applying Gabor wavelets to the extracted fiducial points and
to the whole image.
Table 30.5 Expression recognition by applying Gabor wavelet to the whole image
Expressions I1 I2 I3 I4 I5 I6 I7 I8 I9 I10
Anger Y Y Y N Y Y N Y Y Y
Happy N Y Y Y Y Y N Y Y Y
Neutral Y Y Y Y Y Y Y N Y Y
Sad Y Y Y N Y Y Y Y Y Y
Surprise Y Y Y Y Y N Y Y Y N
30.5 Conclusion
In this chapter, face recognition and expression classification from video image
sequences are explained. Frames are extracted from image sequences, and a skin color
detection method is applied to detect face regions. A holistic approach is used, in
which the whole face is considered for the construction of the Eigen space. As a first step,
images are projected to PCA space for recognizing face regions. After recognizing
the face, our system could efficiently identify the expression on the face.
To compare the performance, two experiments are also conducted. Gabor wavelet
transformation is applied to the extracted fiducial points and to the images as a
whole. The Gabor coefficients are projected to feature space or face space for
comparison with those of the test image.
This approach performs well for recognition of expressions from face sequences. The
computational time and complexity were also very small.
Acknowledgements We thank FG-NET consortium [13] for providing us the database for the
experiment.
References
1. Rama Chellappa, Charles L. Wilson and Saad Sirohey, “Human and Machine Recognition of
Faces: A Survey”, In Proceedings of IEEE, Vol. 83(5), 705–740, 1995
2. Stefano Arca, Paola Campadelli and Raffaella Anzarotti, “An Automatic Feature Based Face
Recognition System”, In Proceedings of the 5th International Workshop on Image Analysis
for Multimedia Interactive Services (WIAMIS 2004)
3. Matthew Turk and Alex Pentland, “Face Recognition Using Eigen Faces”, In Proceedings of
IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–591, 1991
4. Sami Romdhani, Shaogang Gong and Alexandra Psarrou, A Multi-view Nonlinear Active
Shape Model Using Kernel PCA, BMVC99
5. Hichem Sahbi and Nozha Boueimma, in Tistarelli, J. Bigun and A.K. Jain (Eds.), "Biometric
Authentication", LNCS 2539, Springer, Berlin/Heidelberg, 2002, pp. 112–1206
6. Sinjini Mitra, Nicola A. Lazar and Yanxi Liu, “Understanding the Role of Facial Asymme-
try in Human Face Identification”, Statistics and Computing, Vol. 17, pp. 57–70, 2007. DOI
10.1007/s 1222–006–9004–9
7. Marian Stewart Bartlett, Javier R. Movellan and Terrence J. Sejnowski, "Face Recognition by
Independent Component Analysis", IEEE Transactions on Neural Networks, Vol. 13, No. 6,
Nov. 2002
8. Tsuyoshi Moriyama, Takeo Kanade, Jing Xiao and Jeffrey F. Cohn, “Meticulously Detailed
Eye Region Model and It’s Application to Analysis of Facial Images”, IEEE Transactions on
Pattern Analysis and Machine Intelligence, Vol. 28, No. 5, May 2006
9. Hui Li, Hui Lin and Guang Yang, “A New Facial Expression Analysis System Based on Warp
Images”, Proceedings of Sixth World Congress on Intelligent Control and Automation, Dalian,
China, 2006
10. Xiaoxing Li, Greg Mori and Hao Zhang, “Expression Invariant Face Recognition with Expres-
sion Classification”, In Proceedings of Canadian Conference on Computer and Robot Vision
(CRV), pp. 77–83, 2006
11. Y. Kosaka and K. Kotani, "Facial Expression Analysis by Kernel Eigen Space Method Based
on Class Features (KEMC) Using Non-linear Basis for Separation of Expression Classes.”
International Conference on Image Processing (ICIP), 2004
12. Hong-Bo Deng, Lian-Wen Jin, Li-Xin Zhen and Ian-Cheng Huang, “A New Facial Expression
Recognition Method Based on Local Gabor Filter Bank and PCA Plus LDA”, International
Journal of Information Technology, Vol. 11, No. 11, 2005
13. Frank Wallhoff, Facial Expressions and Emotion Database, http://www.mmk.ei.tum.de/waf/fgnet/feedtum.html, Technische Universität München, 2006
Chapter 31
The Design of a USART IP Core
31.1 Introduction
Interest in embedded systems has grown drastically in recent years; the world mar-
ket for embedded software will grow from about $1.6 billion in 2004 to $3.5 billion
by 2009, at an average annual growth rate (AAGR) of 16%. On the other hand,
embedded hardware growth will be at the aggregate rate of 14.2% to reach $78.7
billion in 2009, while embedded board revenues will increase by an aggregate rate
A. H. El-Mousa (B)
Computer Engineering Department, Faculty of Engineering & Technology,
University of Jordan, Amman, Jordan
E-mail: elmousa@ju.edu.jo
of 10% [1]. At the same time, embedded systems are increasing in complexity and
more frequently they are also networked. As designs become more complex, embed-
ded systems based on FPGA are required to interact with software running on stock
commercial processors [2]. This interaction more often than not makes use of a
serial communications transmission link. Design techniques based on hardware-
software co-design are generally implemented on platforms that utilize FPGAs as
accelerators together with embedded CPU cores for control and operational proce-
dure definition [3]. Frequently these platforms also require high speed serial data
communication blocks.
USARTs have been around for years and they have become established for easy
and simple serial transmission. However, most of these take the form of hardwired
specialized ICs which make them unattractive and unsuitable for use in recent
embedded systems; especially those utilizing FPGA technology, since they can-
not be incorporated within the HDL design. Also, most are designed with limited
features and capabilities; for example: limited speed and no capability for use in
multi-drop networks. Attempts at the design of an HDL-based USART have been
reported in the literature. Many are just HDL implementations of the well known
industry standard 16550 UART without additional features [4–6].
This chapter presents the design, implementation and verification of a high speed
configurable USART suitable to be used on platforms that utilize FPGAs. The
architectural design allows serial communication in multi-drop networks using a
9-bit operational mode with master-slave operation. It also features configurable
high speed transmission rates and transmission error detection and recovery. User
configuration is accomplished through specialized internal registers. The design is
suitable to be used for inter-chip, inter-processor, and inter-system communications
among others. The design implements both modes of operation, synchronous and
asynchronous.
The rest of the chapter is organized as follows: Section 31.2 presents a general
description of USART theory of operation and describes the modes of operation.
Section 31.3 provides a detailed description of the specifications of the developed
USART. Section 31.4 discusses the design methodology followed and provides
a description and operational features of the various blocks used in the design.
Section 31.5 is concerned with the testing and verification procedures followed.
USARTs operate as parallel to serial converters at the transmitter’s side where data
is transmitted as individual bits in a sequential fashion whereas at the receiver’s
side, USARTs assemble the bits into complete data and thus act as serial to parallel
converters.
USARTs mandate preliminary configuration of data format, transfer rates and
other options specific to the design model. The user can send individual or a block
of data, whose size is determined by the design model, to the transmitter section of
the USART. Data is stored in the internal buffer and framed according to the transfer
mode and user defined options. Finally, the frame is sent to its destination one bit
at a time. On the receiver side, the data frame bits are received and sampled. The
extracted data from the received frame resides in the internal receiver buffer waiting
to be read. The receiver monitors the reception for possible errors and informs the
recipient of their existence should the proper control bit(s) be set. Most USARTs
offer a status monitoring mechanism via dedicated status register(s).
There are two major modes of operation in USARTs: synchronous and asyn-
chronous modes. The latter prevails in most applications.
In synchronous mode, both the transmitter and receiver are synchronized to the same
clock signal which is usually generated by the transmitter and sent along with the
data on a separate link to the receiver. The receiver in turn utilizes the received
clock in extracting the timing sequence and determining the beginning and end of
the received bit interval. Therefore, the receiver knows when to read the bit’s value
and when the next bit in the sequence begins. However, when the line is idle (i.e. no
data is exchanged), the transmitter should send a fill character.
A synchronous communication session begins by sending one or more synchro-
nizing frames to indicate the beginning of transmission then the sequence of data
frames follow (a parity bit is appended to the end of the data frame if single error
detection is required), transmission is terminated by sending a stop frame after
which the line returns to the idle state.
An idle line remains at a predetermined level. The frame consists of a start bit
which differs in polarity to that of the line’s idle state, followed by the data bits and
a parity bit – if single error detection is used – and ends with at least one stop bit
which has the same polarity as that of an idle line. A stop bit might be followed by
another frame – back to back transmission – or an idle state of transmission. Both
the transmitter and receiver are preconfigured to run at the same fixed clock rate
which is an exact multiple of the required baud rate. Once the receiver recognizes
the transition from the idle state polarity to the opposing polarity, it waits half
a bit interval and verifies the presence of a start bit; if start bit arrival is
confirmed, the receiver reads the values of the bits every full bit-width interval until
the reception of a stop bit is confirmed denoting end of frame.
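As a behavioural illustration of the asynchronous framing just described (not the Verilog implementation), the Python sketch below builds and parses a frame with a start bit, LSB-first data, optional parity and stop bit(s); the function names and defaults are assumptions.

```python
def build_frame(data, n_bits=8, parity=None, stop_bits=1):
    """Asynchronous frame: start bit (0), LSB-first data, optional parity, stop bit(s) (1).
    Idle-line polarity is taken as logic 1, as in the description above."""
    bits = [0]                                        # start bit breaks the idle level
    data_bits = [(data >> i) & 1 for i in range(n_bits)]
    bits += data_bits
    if parity in ("even", "odd"):
        p = sum(data_bits) % 2
        bits.append(p if parity == "even" else 1 - p)
    bits += [1] * stop_bits                           # stop bit(s) restore idle polarity
    return bits

def parse_frame(bits, n_bits=8, parity=None):
    """Returns (data, parity_error, framing_error) for one received frame."""
    assert bits[0] == 0, "missing start bit"
    data_bits = bits[1:1 + n_bits]
    data = sum(b << i for i, b in enumerate(data_bits))
    idx = 1 + n_bits
    parity_error = False
    if parity in ("even", "odd"):
        expected = sum(data_bits) % 2 if parity == "even" else 1 - sum(data_bits) % 2
        parity_error = bits[idx] != expected
        idx += 1
    framing_error = bits[idx] != 1                    # stop bit must be at idle level
    return data, parity_error, framing_error

frame = build_frame(0x5A, parity="even")
print(parse_frame(frame, parity="even"))              # (90, False, False)
```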
The methodology adopted in the USART system design was based on systems and
software engineering approaches. It used a variation of both the waterfall and incre-
mental approaches suited to the design environment and constraints. The design
steps that the system went through are [7]:
1. System Requirements Definition. The requirements definition phase specified the
functionality as well as essential and desirable system properties. This involved
the process of understanding and defining services required from the system and
identifying constraints on its operation.
2. System/Subsystem Design. This phase was concerned with how system func-
tionality was to be provided by the components and how system specification
was converted into an executable system specification.
3. Subsystems Implementation and Testing. Here subsystems were implemented
and mapped into hardware code using HDL. In this stage, individual modules
were extensively tested for correct functional operation.
4. System Integration. During system integration, the independently developed
subsystems were merged together to build the overall USART system in an
incremental approach.
5. System Testing. The integrated system was subjected to an extensive set of tests
to assure correct functionality, reliability and performance.
Figure 31.1 shows the major parts of the USART system:
The input/output signals involved are:
– Sys. clock: the oscillator clock
– Reset: master reset of the system
– Serial out: transmitted data
– Serial in: received data
– TXInt: transmitter interrupt
– RXInt: receiver interrupt
– ErrInt: error interrupt
The main features of the developed USART, and the rationale behind each, are the following:
– Support for the following transmission modes (programmable): asynchronous (full/half duplex) and synchronous (full/half duplex). Rationale: full duplex mode is employed in modern systems, while half duplex mode is retained for compatibility with legacy systems.
– A wide range of transmission/reception rates (from 50 Hz to 3 MHz). Rationale: high frequencies are essential for high speed communication; lower speeds are needed to communicate with older USARTs and can also be used to minimize cross talk if similar links are adjacent.
– Eight-level transmitter/receiver buffer. Rationale: to account for the high speeds the USART can reach, blocks of data can be received and buffered until read by the user; this also allows the user to issue the transmission of a block of eight frames in a single operation, and reduces the load on the module that controls the USART operation in the system.
– Parity support (programmable – enable/disable parity and odd/even parity). Rationale: single error detection techniques might prove beneficial in noisy operating environments.
– Variable data lengths (programmable – 5–8 bits). Rationale: byte communication is the modern norm; five to seven bit data lengths retain compatibility with legacy systems.
– Variable stop bits in asynchronous mode (programmable – 1 or 2 stop bits). Rationale: this complies with the RS232 standard, where the two stop bit mode accommodates slightly different clocks on the transmitter and receiver sides when USARTs from different vendors are connected together.
– Detection of parity, overrun and framing errors (framing in asynchronous mode). Rationale: parity error detection provides a measure of the reliability of communication; framing error detection indicates the need to reconfigure the internal clocking sources at both ends; overrun error indicates that data has been lost and that the received data must be read more frequently.
– Interrupt support (programmable – with the ability of global masking). Rationale: most modern systems are interrupt-driven, since interrupt techniques save processing clock cycles in comparison with polling and are essential in real time applications.
– Addressability (8-bit universal – addresses up to 256 devices) while sending 9 bits. Rationale: used in industrial applications and in multi-drop networks where a master communicates with a particular slave.
Figure 31.2 shows the functional block diagram of the transmitter module.
It consists of the following sub modules:
– Transmitter buffer
– Bypass logic
– Transmit shift register (TSR)
– Parity generation logic
– Shift logic
– TSR empty detection logic
In the following, due to space constraints, we only show the structure of the
transmitter buffer.
The transmitting buffer is the memory where data to be sent are stored waiting for
the transmit shift register (TSR) to be empty. It becomes an essential component
when the inter arrival time of transmitted frames becomes very small. Moreover,
when the system is accessed using a processor/DSP that operates at a higher
frequency than the transmission baud clock, this buffer will reduce the number of
times the processor is interrupted to request for new data. The signal TDR empty is
generated to indicate that the buffer has at least one empty slot. Figure 31.3 shows
the input/output signals associated with the transmitting buffer while Fig. 31.4
shows the data flow diagram for it.
In the asynchronous mode, the receiver waits for a transition from a mark to a space after an
idle line state or an expected stop bit to initiate the data reception logic provided that
the transition is not caused by a noise notch. This is ensured by sampling each of the
received bits at three different times and then using a majority detection circuit. The
end of asynchronous reception is detected by waiting for a stop bit at the end of the
frame.
In the synchronous mode, instead of waiting for a logic transition, the receiver waits for a
synchronizing character which if received after an idle state line, a start of reception
is detected. An end of reception is signaled if a certain stop character is received.
When data reception is detected at the serial in input, the internal receiver logic is
enabled and the received data bits are serially shifted into the Receiver Shift Register
(RSR). Meanwhile, the parity is calculated per each received bit for the received
word and finally compared to the received parity value. Parity and framing errors
are evaluated and stored along with the data in the receiver buffer. The buffer can
hold up to seven received words; if data are received while the buffer is full, the data
are dropped and an overrun error is indicated.
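A simple behavioural model of this receive path is sketched below in Python (the actual design is in Verilog HDL); the majority vote over three samples, the seven-deep buffer and the overrun flag mirror the description above, while the class and method names are invented for illustration.

```python
from collections import deque

def majority(samples):
    # Each bit is sampled three times near its centre; a majority vote
    # rejects isolated noise notches.
    return 1 if sum(samples) >= 2 else 0

class ReceiverModel:
    """Behavioural sketch of the receive path (not the Verilog implementation)."""
    def __init__(self, depth=7):
        self.buffer = deque(maxlen=depth)
        self.overrun = False

    def push_word(self, word, parity_error=False, framing_error=False):
        if len(self.buffer) == self.buffer.maxlen:
            self.overrun = True          # buffer full: drop the word, flag overrun
            return
        self.buffer.append((word, parity_error, framing_error))

rx = ReceiverModel()
print(majority([1, 0, 1]))              # -> 1, a single noise notch is ignored
for w in range(8):                       # the eighth word overruns the 7-deep buffer
    rx.push_word(w)
print(rx.overrun)                        # -> True
```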
If 9-bit address detection mode is enabled, transmission is fixed at eight bits and
the ADD-Data bit is substituted for parity. The address of the receiving node must
be received with the ADD-Data bit set to "1" in order for the frame to be considered
an address frame.
In the synchronous addressable mode of operation, a synchronizing character
with ADD-Data bit value set to zero must be initially received, followed by a frame
containing the address of the receiving node but with ADD-Data bit value set to one,
followed by the data frames with ADD-Data bit reset again. Figure 31.5 shows the
functional block diagram of the receiver.
31.4.2.1 Rx Buffer Write with Synch and Stop Character Detect Logic
This sub-module is responsible for generating an internal signal to write the received
frames in RSR, together with their corresponding parity and framing information,
into RDR buffer only if certain conditions are met. In synchronous mode of opera-
tion, this involves checking for the reception of valid sync and stop characters that
delimit a block of consecutive frames. If 9th bit mode is used, all received frames
are dropped if the USART is not the targeted node, which is indicated by receiving a
valid address frame prior to receiving data. Figure 31.6 shows the data flow diagram
for this sub-module.
The proposed USART design was implemented and tested using firmware from
Xilinx Corporation. For software, the free ISE Webpack version 8.2i was used [8].
As for hardware, different development kits were used. These include: Digilab 2
XL (D2XL) Development Board [9], Digilent Spartan-3 System Board [10], and
Spartan-3E Starter Kit Board [11]. Since the design was entirely implemented using
a universal hardware description language, Verilog HDL, it is expected to be directly
interoperable with any environment provided by other vendors.
In general, the project went through several phases during the testing process as
illustrated in Fig. 31.7.
A unit test is a test of the functionality of a system module in isolation, and
the unit test should be traceable to the detailed design. A unit test consists of
a set of test cases used to establish that a subsystem performs a single unit of
functionality to some specification. Test cases should be written with the express
intent of uncovering undiscovered defects [12].
After the units of a system have been constructed and tested, they are integrated
into larger subcomponents leading eventually to the construction of the entire sys-
tem. The intermediate subsystems must be tested to make sure that components
operate correctly together. The purpose of integration testing is to identify errors in
the interactions of subsystems [12].
The primary goal of an acceptance test is to verify that a system meets its require-
ment specification. To demonstrate that the system meets its requirements, it must
be shown that it delivers the specified functionality, performance and dependabil-
ity, and that it does not fail during normal use. Ideally, the acceptance test plan is
developed with the engineering requirements and is traceable to them. Acceptance
testing is usually a black-box testing process where the tests are derived from the
system specification. The system is treated as a black box whose behavior can only
be determined by studying its inputs and the related outputs [7].
Before connecting the USART to an external system, it was tested at first by
connecting the serial output of its transmitter section to the serial input of its receiver
section to locate any potential errors within the system itself. Similarly, the clock
output of the transmitter section was connected to the clock input of the receiver
section in the synchronous mode.
The entire system was implemented on two Xilinx development boards. Both
were programmed with the same USART design. However, one was used to transmit
data, while the other was used to receive the sent data. Special temporary modifi-
cations to the internal design were implemented to allow certain internal signals to
be observed with a digital storage scope. Initially, data was exchanged with the PC
using the EPP mode of the parallel port. The PC was used to configure the USARTs
with the different communication options by sending the appropriate control words
to the respective registers and also to supply the data to be transmitted serially.
From this test, more indications were obtained that the system complies with its
requirements specification. Error conditions reflected the true state of the received
data when the two USARTs were deliberately configured with conflicting commu-
nications options. Moreover, the USARTs functioned as expected in the 9-bit mode
of operation.
Figure 31.8 illustrates a snapshot of a communication session that was estab-
lished between the two USARTs.
In performance testing, an effective way to discover defects is to design tests
around the limits of the system; that is, stressing the system (hence the name stress
testing) by making demands that are outside the design limits of the system until the
system fails.
During the test cases described previously, the USART design was subjected to
some extreme conditions to explore various aspects of the limits of its operation. For
example, the output of the baud rate generator was examined at the highest baud rate
possible, and the buffers were tested with increasing read and write speeds. More-
over, the operation of the entire USART was checked while operating at the highest
baud rate possible when two systems on separate boards were connected together,
Fig. 31.8 Asynchronous 9-bit address transmission and detection between two USARTs
as well as when the transmitter section was used to drive the receiver section of the
same system.
References
1. Ravi Krishnan, Future of Embedded Systems Technology, BCC Research Report, ID:
IFT016B, June 2005.
2. John A. Stankovic, Insup Lee, Aloysius Mok and Raj Rajkumar, Opportunities and Obliga-
tions for Physical Computing Systems, IEEE Computer Magazine, 11, pp. 23–31 (2005).
3. Wayne Wolf, High Performance Embedded Computing (Elsevier, Amsterdam, Netherlands
2007), pp. 383–387.
4. Mohd Yamani Idna Idris, Mashkuri Yaacob and Zaidi Razak, A VHDL Implementation of UART
Design with BIST Capability, Malaysian Journal of Computer Science, 19 (1), pp. 73–86
(2006).
5. Azita Mofidian, DesignWare Foundation DW 16550: A fine work of UART, Designware
Technical bulletin, Technical Forum for Design Automation Information, 4 3 (1999);
http://www.synopsys.com/news/pubs/dwtb/q499/dwtb art1.html
6. Shouqian Yu, Lili Yi, Weihai Chen and Zhaojin Wen, Implementation of a Multi-channel
UART Controller Based on FIFO Technique and FPGA, Proceedings of 2nd IEEE Conference
on Industrial Electronics and Applications ICIEA 2007, pp. 2633–2638 (2007).
7. Ian Sommerville, Software Engineering, 8th Edition (Addison-Wesley, England, 2007).
8. http://www.xilinx.com/ise/logic design prod/webpack.htm
9. Digilent D2XL System Board Reference Manual;Revision: (June 9, 2004); www.digilentinc.
com
10. Spartan-3 Starter Kit Board User Guide, version 1.1: (May 13, 2005); www.xilinx.com
11. Spartan-3E Starter Kit Board User Guide, version 1.0: (March 9, 2006); www.xilinx.com
12. Ralph M. Ford and Chris Coulston, Design for Electrical and Computer Engineers (McGraw-
Hill, New York, 2005).
Chapter 32
Multilayer Perceptron Training Optimization
for High Speed Impacts Classification
32.1 Introduction
There are a wide number of systems which, during their service life, can suffer the
impact of objects moving at high speed (over 500 m/s). This phenomenon is named
ballistic impact, and the most characteristic examples are found in the military field.
However, over the last decades this kind of problem has become of interest in civil
applications. In them, the structural elements are required to absorb the projectile
energy so that it does not damage critical parts of the global system.
Due to this, there are new possibilities in similar fields, among which passive
safety in vehicles stands out. In this field, the design of structures whose mission is
to absorb energy in crashes of crashworthiness type (200 m/s ≤ speed ≤ 500 m/s),
as well as those that can appear in road or railway accidents, helicopters' emergency
landings, etc., is especially relevant. Therefore, what is being sought is
to design structures capable of absorbing energy to avoid or lessen the damages to
the passengers of the concerned vehicles.
The construction of structures subjected to impact was traditionally carried out
empirically, relying on real impact tests, each one using a given configuration of
projectile and target. The mathematical complexity of solving the equations that rule
the impact phenomenon, and the relative ignorance of the mechanical behaviour of
the materials at high strain rates, discouraged any simulation of the problem.
The need for design tools to simulate this process triggered the development in
recent years of a large number of models of different types; all of them belong
to two families: analytical modelling and numerical simulation. Thus, the use of
expensive experimental tests has been postponed to the final stage of the design. All
the previous stages can be covered by the use of this kind of simulation tools.
Taking into account the difficulties of these methods, poor precision and high
computational cost, a neural network for the classification of the result of impacts
on steel armours was designed. Furthermore, the numerical simulation method was
used to obtain a set of input patterns to test the capacity of the developed model.
In the problem tackled here, the available data for the designed network include the
geometrical parameters of the solids involved – radius and length of the projectile,
thickness of the steel armour – and the impact velocity, while the response of the
system is the prediction about the plate perforation.
Artificial Neural Networks (ANNs) are statistical models of real world systems
which are built by tuning a set of parameters. These parameters, known as weights,
describe a model which forms a mapping from a set of given values, the inputs, to
an associated set of values, the outputs.
The mathematical model that the neural network builds is actually made up of
a set of simple functions linked together by the weights. The weights describe the
effect that each simple function (known as unit or neuron) will have on the overall
model [1].
Within the field of research of this article, the ANNs have been applied success-
fully within the limits of the fracture mechanic [2] to estimate material breakage
parameters such as concrete [3], and in non-destructive tests to detect breaks in more
fragile materials [4]. In all these applications, the input data needed for the training
has been obtained by means of experimentation and numeric simulation.
Nevertheless there are not enough studies dedicated to the application of ANN
to solve problems of ballistic impacts. At the present time the investigations made
have focused on low speed impacts or situations where energy absorption is
necessary (crashworthiness) [5–7].
Therefore, and due to the mentioned experience on the mechanical problems and
on the good results shown on the previous researches [8], a MultiLayer Perceptron
(MLP) neural network with backpropagation algorithm has been defined. The most
important attribute of this kind of ANN is that it can learn a mapping of any
complexity or implement decision surfaces separating pattern classes.
An MLP has a set of input units whose function is to take input values from the
outside, a set of output units which report the final answer, and a set of processing
hidden units which link the inputs to the outputs. The function of the hidden neurons
is to extract useful features from the input data which are, in turn, used to predict
the values of the output units.
MLPs are layered neural networks (see Fig. 32.2), which means they are based on
several layers of adaptive weights and neurons. There are three different kinds of layer: the
input layer, at least one hidden layer and finally the output layer. Between the units
or neurons that compose the network there is a set of connections that link them to
each other. Each unit transmits signals to the ones connected to its output. Asso-
ciated with each unit there is a transfer function that transforms the
current state of the unit into an output signal.
The feedforward connections between the units of the layers in a MLP represent
the adaptive weights that permit the mapping between the input variables and the
output variables [9].
Different authors have shown that MLP networks with as few as a single hidden
layer are universal approximators; in other words, ANNs are capable of approximating
arbitrary regions accurately if enough hidden units are available [10, 11].
Within the limits this chapter deals with, there are different variables or parameters
that characterize the behaviour of the projectile when it impacts on a steel armour.
Therefore, there are various parameters that shape the input data to the network, and
these define the number of neurons of the input layer.
The available variables are: kinetic energy (K), velocity (V), radius (R), length
(L), mass (M) and quotient L/R, all of them related to the projectile, plus
the thickness of the target (H).
However, the use of all the available variables is not always necessary to carry
out the training. In some cases, an information overload or the existing connections
between the variables can saturate the prediction model that adjusts the output of the
network, therefore complicating the learning and reducing the generalization rates.
In this domain, K is correlated with V and M. On the other hand, the ratio L/R
and M are correlated with R and L because density is constant. So for the neural
network performance it is better to remove K and ratio L/R from the list of available
variables.
Moreover, in order to support the latter assertion, a series of studies was designed
to measure the influence that each variable has, separately and in combination with
the rest, on the learning and generalization ability of the network.
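The reasoning behind dropping K and L/R can be illustrated with a small synthetic check of the correlations; the parameter ranges and the constant density value below are assumptions chosen only to make the example run, not the data of the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
R = rng.uniform(4e-3, 8e-3, n)      # projectile radius (illustrative range, in m)
L = rng.uniform(2e-2, 6e-2, n)      # projectile length
H = rng.uniform(5e-3, 2e-2, n)      # armour thickness
V = rng.uniform(500, 1500, n)       # impact velocity (m/s)
rho = 7850.0                        # constant density, so M follows from R and L
M = rho * np.pi * R**2 * L          # projectile mass
K = 0.5 * M * V**2                  # kinetic energy is fixed by M and V
ratio = L / R

# The derived quantities carry no independent information:
print("corr(K, V)   =", round(float(np.corrcoef(K, V)[0, 1]), 2))
print("corr(K, M)   =", round(float(np.corrcoef(K, M)[0, 1]), 2))
print("corr(L/R, L) =", round(float(np.corrcoef(ratio, L)[0, 1]), 2))
# Dropping K and L/R leaves (R, L, H, M, V) as the five network inputs.
```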
For the network design, a three level MLP architecture was selected. According to
Lippmann's studies, this type of structure allows most problems to be solved, being
able to form arbitrarily complex decision boundaries [12].
The number of inputs in a MLP is determined by the number of available input
parameters in the problem dealing with. The number of outputs is the same as the
number of classes needed to solve the problem.
The number of hidden layers and the number of neurons of these layers have
to be chosen by the designer. There is no method or rule that determines the
optimum number for a given problem. In some cases these parameters are
determined by trial and error. However, there are current techniques for obtaining
automatic estimations [13], although this research follows the steps described by
Tarassenko [14].
The neural network was created with the commercial software Neurosolutions
for Excel v4 and its architecture, shown in Fig. 32.3, is as follows:
1. Five neurons in the input layer, linked to the five input variables (R, L, H, M, V);
the chosen transfer function is the identity.
2. Four neurons in the hidden layer, with hyperbolic tangent transfer function.
3. Two neurons in the output layer, associated with the output variable plate
perforation (both outputs are complementary, which improves the learning
process); the chosen transfer function is again the hyperbolic tangent.
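For illustration, the forward pass of the 5-4-2 architecture just described can be written in a few lines of NumPy. This is only a sketch of the topology (identity inputs, tanh hidden and output units); the weight values shown are random placeholders, not the trained Neurosolutions model.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """5-4-2 MLP: identity input layer, tanh hidden layer, tanh output layer."""
    h = np.tanh(W1 @ x + b1)      # four hidden units
    return np.tanh(W2 @ h + b2)   # two complementary outputs (perforation yes / no)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 5)), np.zeros(4)   # placeholders for trained weights
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x = np.array([4.0, 20.0, 10.0, 0.05, 900.0])    # hypothetical (R, L, H, M, V) case
print(mlp_forward(x, W1, b1, W2, b2))
```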
In the light of the results obtained by Garcia et al. in 2006 [8], neural networks
present themselves as an alternative worth considering for reproducing impact prob-
lems. The network proves reliable in predicting projectile arrest and, in contrast to
simulation tools, its computational cost once training has been completed is far
lower.
However, these conclusions lead to another question. Could there be a more effi-
cient ANN model, that is to say, a model that needs fewer input variables and fewer
input data to approximate the output function correctly?
The research carried out to answer this question established two fundamental goals:
first, to minimize the number of training data without affecting the generalization
ability of the model; and second, to analyse the influence of the different input
variables on the network learning ability.
These input variables are the independent variables of the function that approxi-
mates the perforation behaviour. The result of this response function is a
dichotomic variable that depends on the input variables and takes "Yes" or "No"
values.
Hence it can be inferred that the simpler the function that describes the net-
work's behaviour, the less costly it will be to generate the set of training data through
numerical simulation.
This kind of heuristic selection of input variables and training-set size is
well documented in the early neural network literature in other domains,
and the results obtained confirm the suitability of this choice [17].
The results obtained in the former research [8] leave open the question of whether
they depend on the particular random split of the data used for training and testing;
in other words, whether the assignment of the available data influences the predic-
tions of the network.
To reach conclusive results, a number of trials far larger than in the previous
work was established. The study is broken down into a series of experiments, in
which the number of patterns used for training and test varies in each series. In
this way, the minimum number of data needed to ensure an acceptable percentage
of correct answers can be found.
In each experiment 100 different trials were run, randomly re-assigning in each
of them the data used to train and test the network. Moreover, in two different
trials of the same experiment the training and test sets never coincide, which
increases the reliability of the results obtained. In this way, any effect of a chance
assignment of the data provided to the network, which could lead to unreliable
results, is eliminated.
The catalogue of experiments in the study is the following: the first one includes
100 trials with 50 training data and 10 testing data. From this point onwards, the
number of data used for training is reduced by 5 in each experiment. The last
experiment consists of 100 trials with 15 training data and 10 testing data.
The result obtained in each of these 100 trials is processed and the average error
percentage is reported for each experiment, so that the results of the eight experi-
ments can be compared. In this way, it can be determined how many training data
are necessary to achieve good generalization rates and, besides, it is ensured that
a chance assignment of the input patterns does not influence these results. The
protocol is sketched below.
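A minimal outline of this protocol follows. The train_and_test routine is an assumed placeholder that fits the MLP on one split and returns its test error percentage; it stands in for the Neurosolutions runs actually used.

```python
import random

def average_errors(cases, train_and_test, n_trials=100, n_test=10):
    """Average test error over repeated random splits, for training sizes 50 down to 15."""
    results = {}
    for n_train in range(50, 10, -5):                        # 50, 45, ..., 15 training patterns
        errors = []
        for _ in range(n_trials):
            sample = random.sample(cases, n_train + n_test)  # a fresh random split each trial
            train, test = sample[:n_train], sample[n_train:]
            errors.append(train_and_test(train, test))       # assumed: returns error in percent
        results[n_train] = sum(errors) / len(errors)
    return results
```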
The second part of this study focuses on analysing the influence of each
variable on the training process and, therefore, on the mathematical model that
governs the output.
The main task is thus to find the explanatory variables, in other words, to
include in the model just those variables that influence the response, rejecting
those that do not provide information. It is also necessary to look for possible
interactions between the independent variables that affect the response variable. For
example, this work analyses the results obtained by including the thickness and
the velocity separately, as well as when both are included.
32.5 Evaluation
The data shown in Table 32.1 are the average error percentages made by the network
in each experiment. The experiments are grouped in columns labelled with the
variables used; e.g. the column "All-H" denotes the experiment that uses all the
variables except the parameter H. As an example of how to read Table 32.1: in the
model with all the variables, the value 3.2 is the average error of the network over
100 trials with 50 data for training and 10 for testing.
Table 32.1 Error percentage obtained by varying the input variables and the number of training
data
Train. data   All   All-H   All-L   All-M   All-R   All-V   All-H&M   All-H&V   All-H,M&V
Regarding the objective sought, finding the network architecture that makes the
smallest error, analysis of the data might suggest that the more input variables the
network has, the better it learns.
However, saturating the network, that is to say introducing all the available
parameters, turns out not to be the best option. In this problem, that network
architecture has the lowest error probability only for 50 and 45 training data. In
addition, there are two designs with a lower average probability of error: the ones
without mass and without length. Specifically, the design that omits the mass as a
learning variable is the one with the smallest average error.
On the other hand, looking at the average error obtained, it can be observed that
the velocity is a very relevant magnitude for the network learning. Its omission
increases the error probability by 342%, i.e. to about 4.4 times that of the network
design considered the most efficient (the architecture without mass).
Finally, as expected, the network with the worst results is the one with the least
information, that is, with the fewest input variables. Nevertheless, it is worth noting
that the network without the thickness and mass variables still achieves quite
acceptable results (12.91%).
Taking the configuration that is best for most architectures, 50 training and 10
testing data, Fig. 32.4 depicts the average error percentage made by each network
architecture in which some variable is removed, compared with the network that
keeps all of them.
Fig. 32.4 Error made in each architecture for the configuration 50 train and 10 test
Table 32.2 reports, for each network design, the deviation of its error percentage
from that of the design with the minimum error.
Table 32.2 Deviation percentage made in each simulation in relation to the one that makes the
minimum error
Train. data   All    All-H   All-L   All-M   All-R   All-V   All-H&M   All-H&V   All-H,M&V
50            0      4.5     0.1     0.8     2.4     23.1    7         26.9      34.7
45            0      6.7     1.2     0.6     0.9     28.8    4.7       27.7      40.3
40            1.3    11.3    0       1.7     2.5     26.1    6.6       32.5      34.4
35            0.8    5.2     2.1     0.7     0       22.4    6.5       31.5      34.6
30            4.4    7.5     0       0.7     3.1     26.3    7.9       36        36.5
25            2.4    2.4     0.7     0       0.6     27.8    7.7       30.2      37.8
20            1.6    2.4     2.8     0       2.4     23.9    5.5       27.3      32.3
15            3.3    1.3     3       0       3.2     30.1    2.3       32.2      34.6
Total         13.8   41.3    9.9     4.5     15.1    208.5   48.2      244.3     285.2
As has been mentioned, the network that includes all the available variables in the
learning is the one that makes the smallest error for 50 and 45 data, whilst the net-
work without mass shows the smallest error for smaller numbers of training data.
Based on these results, it can be concluded that the influence of the mass on learning
is not very relevant, and therefore the network that does not include it is considered
the most efficient.
32.6 Conclusions
In the light of the results, and taking into account that the perceptron is one of the
simplest topologies, this work clearly shows the potential of neural networks for
predicting material behaviour at high deformation rates. Furthermore, the chosen
architecture is highly reliable when predicting the result of a projectile impact, and
its computational cost, once training has been completed, is smaller than that of
simulations carried out with finite element tools.
It is worth highlighting the small number of training data needed for the network
to learn with a relatively small prediction error. With only 40 finite element
simulations, the network obtains an error below 6% in the two designs with
the best results (with all the variables and without the mass variable). In spite of the
small number of input data, the network does not present overlearning problems.
The experiments developed help to better understand the ballistic impact, analy-
sing the influence of the parameters that appear in the penetration problem. The
results obtained verify that the most influential variable is the velocity: any network
architecture from which the velocity variable is removed obtains quite high error
percentages, which confirms the importance of this variable.
On the other hand, the research shows that the influence of the mass on learning
is not very relevant. Therefore, and taking into account its numerical simulation
cost, the network without this variable is considered the most efficient.
The network with the worst results is the one with the least information, that is to
say, with the fewest input variables. However, it is worth highlighting that the network
without the thickness and mass variables achieves quite acceptable results given
the little information it receives in comparison with the rest.
Finally, the knowledge acquired as a result of this research can be extended to
other fields of great practical interest. Among these, the design of energy-absorbing
structural components stands out, being of great relevance in the field of passive
safety in vehicles.
References
1. K. Swingler. Applying Neural Networks: A Practical Guide. Morgan Kaufmann, San Fran-
cisco, CA 1996.
2. Z. Waszczyszyn and L. Ziemianski. Neural networks in mechanics of structures and materials.
New results and prospects of applications. Computers & Structures, 79(22-25):2261–2276,
2001.
3. R. Ince. Prediction of fracture parameters of concrete by artificial neural networks. Engineer-
ing Fracture Mechanics, 71(15):2143–2159, 2004.
4. S.W. Liu, J.H. Huang, J.C. Sung, and C.C. Lee. Detection of cracks using neural networks and
computational mechanics. Computer methods in applied mechanics and engineering (Comput.
methods appl. mech. eng.), 191(25-26):2831–2845, 2002.
5. W. Carpenter and J. Barthelemy. A comparison of polynomial approximations and artifi-
cial neural nets as response surfaces. Structural and multidisciplinary optimization, 5(3):166,
1993.
6. P. Hajela and E. Lee. Topological optimization of rotorcraft subfloor structures for crash
worthiness consideration. Computers and Structures, 64:65–76, 1997.
7. L. Lanzi, C. Bisagni, and S. Ricci. Neural network systems to reproduce crash behavior of
structural components. Computers structures, 82(1):93, 2004.
8. A. Garcia, B. Ruiz, D. Fernandez, and R. Zaera. Prediction of the response under impact of
steel armours using a multilayer perceptron. Neural Computing & Applications, 2006.
9. C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, USA, 1996.
10. G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of
Control, Signals, and Systems, 2:303–314, 1989.
11. K. Hornik and M. Stinchcombe. Multilayer feedforward networks are universal approxima-
tors. Neural Networks, 2(5):359–366, 1989.
12. R. Lippmann. An introduction to computing with neural nets. ASSP Magazine, IEEE [see also
IEEE Signal Processing Magazine], 4(2):4–22, 1987.
13. P. Isasi and I. Galvan. Redes de neuronas artificiales: un enfoque practico. Pearson Prentice
Hall, Madrid, 2004.
14. L. Tarassenko. A guide to neural computing applications. Arnold/NCAF, 1998.
15. B. Widrow. 30 years of adaptive neural networks: perceptron, madaline, and backpropagation.
Proceedings of the IEEE 78, 9:1415–1442, 1990.
16. ABAQUS Inc., Richmond, USA. Abaqus/Explicit v6.4 user's manual, 2003.
17. M. Gevrey, I. Dimopoulos, and S. Lek. Review and comparison of methods to study
the contribution of variables in artificial neural network models. Ecological Modelling,
160(16):249–264, 2003.
Chapter 33
Developing Emotion-Based Pet Robots
Abstract Designing robots for home entertainment has become an important appli-
cation of intelligent autonomous robots. Yet, robot design takes a considerable amount
of time, and the short life cycle of toy-type robots with fixed prototypes and repetitive
behaviors is in fact disadvantageous. Therefore, it is important to develop a frame-
work for robot configuration so that the user can always change the characteristics of
his pet robot easily. In this paper, we present a user-oriented interactive framework
to construct emotion-based pet robots. Experimental results show the efficiency of
the proposed framework.
33.1 Introduction
In recent years, designing robots for home entertainment has become an important
application of intelligent autonomous robots. One special application of robot enter-
tainment is the development of pet-type robots, which have been considered the
main trend of next-generation electronic toys [1, 2]. This is a practical step in
expanding the robot market from traditional industrial environments toward homes
and offices.
There have been many toy-type pet robots available on the market, such as Tiger
Electronics’ Furby, SONY’s AIBO, Tomy’s i-SOBOT and so on. In most cases, the
robots have fixed prototypes and features. With these limitations, their life cycle is
thus short, as the owner of pet robots may soon feel bored and no longer interested
in their robots. Sony’s robot dog AIBO and humanoid robot QRIO are sophisticated
pet robots with remarkable motion abilities generated from many flexible joints [2, 3].
But these robots are too expensive to be popular, and their owners are not allowed
to reconfigure the original design. Therefore, it would be a great step forward to have
a framework for robot configuration so that the user can always change the char-
acteristics of his robot according to his personal preferences to create a new and
unique one.
Regarding the design of pet robots, there are three major issues to be consid-
ered. The first issue is to construct an appropriate control architecture by which the
robots can perform coherent behaviors. The second issue is to deal with human-
robot interactions in which natural ways for interacting with pet robots must be
developed [4, 5]. The third issue to be considered is emotion, an important
drive for a pet to present a certain level of intelligence [6, 7]. In fact, Damasio has
suggested that efficient decision-making largely depends on the underlying mecha-
nism of emotions [8]. By including an emotional model, the pet robot can explicitly
express its internal conditions through the external behaviors, as the real living crea-
ture does. On the other hand, the owner can understand the need and the status of
his pet robot to then make appropriate interactions with it.
To tackle the above problems, in this paper we present an interactive framework
by which the user can conveniently design (and re-design) his personal pet robot
according to his preferences. In our framework, we adopt the behavior-based archi-
tecture [9, 10] to implement the control system of a pet robot and to ensure that the
robot functions properly in real time. A mechanism for creating behavior primitives and
behavior arbitrators is developed in which an emotion model is built for behavior
coordination. Different interfaces are also constructed to support various human-
robot interactions. To evaluate our framework, we use it to construct a control
system for the popular LEGO Mindstorms robot. Experimental results show that
the proposed framework can efficiently and rapidly configure a control system for a
pet robot. In addition, experiments are conducted in which a neural network is used
to learn a user-specified emotion model for behavior coordination. The results and
analyses show that an emotion-based pet robot can be designed and implemented
successfully by the proposed approach.
Our aim is to develop a user-oriented approach that can assist a user to rapidly design
(and re-design) a special and personal robot. Robot design involves the configuration
of hardware and software. Expecting an ordinary user to build a robot from a set of
electronic components is in fact not practical. Therefore, instead of constructing a
personal pet robot from the electronic components, in this work we concentrate on
how to imbue a robot with a personalized control system.
Figure 33.1 illustrates the system framework of the proposed approach. As can be
seen, our system mainly includes three modules to deal with the problems in build-
ing a control system for a pet robot. The first module is to design an efficient control
architecture that is responsible for organizing and operating the internal control flow
of the robot. Because behavior-based control architecture has been used to con-
struct many robots acting in the real world successfully, our work adopts this kind
of architecture to design control systems for robots. The second module is about
human-robot interaction. As a pet robot is designed to accompany and entertain its
human partner in everyday life, interactions between the owner and his pet are essen-
tial. Here we develop three ways for human-robot interaction and communication,
including device-based (keyboard and mouse), voice-based and gesture-based meth-
ods. The third module is an emotion system that works as the arbitration mechanism
to resolve the behavior selection problem within the behavior-based control archi-
tecture. With the emotion system, a pet robot can act autonomously. It can choose
whether to follow the owner’s instructions, according to its internal emotions and
body states. Our system is designed to be user-oriented and has a modular struc-
ture. With the assist of the presented system, a user can build his robot according
to his preferences. If he is not satisfied with what the robot behaves, he can change
any part of the control software for further correction. Details of each module are
described in the following subsections.
As in the behavior-based control paradigm [9, 10], the behavior system here takes
the structure of parallel task-achieving computational modules to resolve the
control problem. In order to achieve more complex tasks in a coherent manner, the
behavior modules developed have to be well organized and integrated. Inspired by
the ethological models originally proposed to interpret the behavior motivations of
animals, roboticists have developed two types of architectures: the flat and the hierar-
chical ones. The former arranges the overall control system into two levels; the
latter, into multiple levels. In the flat arrangement, all subtasks are independent and
have their own controllers. The independent outputs from the separate controllers
will be combined appropriately in order to generate the final output for the overall
task. As this work means to provide an interactive and easy-to-use framework for
the design and implementation of pet robots, the straightforward way (that involves
less task decomposition techniques), the flat architecture, is chosen to be the default
control structure for a pet.
Using a behavior-based control architecture, one needs to deal with the corre-
sponding coordination (or action selection) problem, i.e. to specify a way to
combine the various outputs from the involved controllers. Our work here takes the
method of command switching, which operates in a winner-take-all fashion: only
one of the output commands from the involved behavior controllers is chosen to
take over the control at any given moment, according to certain sensory stimuli, as
sketched below.
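The following sketch illustrates such a winner-take-all switch. The controller objects and the scoring function derived from the emotion system are assumed interfaces introduced only for illustration; the paper does not specify this code.

```python
def arbitrate(controllers, sensors, emotion_state):
    """Winner-take-all command switching: pick one behaviour per control step."""
    best_name, best_cmd, best_score = None, None, float("-inf")
    for name, controller in controllers.items():
        cmd = controller.command(sensors)                 # each behaviour proposes a command
        score = emotion_state.activation(name, sensors)   # assumed scoring from emotions/stimuli
        if score > best_score:
            best_name, best_cmd, best_score = name, cmd, score
    return best_name, best_cmd                            # only the winner drives the robot
```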
As mentioned above, we intend to build low cost toy-type robots as embodied
digital pets. Yet, since a toy-type robot only has limited computational power and
storage resources, it is generally not possible to perform all the above computation
on such a robot. Therefore, to implement the above architecture, an external com-
puting environment is needed to build robots. The experimental section will present
how we construct an environment for the LEGO robots.
To communicate with a pet robot, our framework provides two natural ways for
interacting with the robot by using oral or body languages. For the oral communi-
cation, we implement the popular speech-control method, command-and-control, to
communicate with the robot, and adopt the Microsoft Speech API to implement the
speech recognition mechanism in our Windows application. Though a more sophis-
ticated language interface could be developed, here we simply parse a sentence into
individual words, distinguish whether it is an affirmative or negative sentence, and
then recognize the user commands from the words.
Gesture recognition is another way used in this work to support natural commu-
nication between human and robots. Gesture recognition is the process by which
the user’s gestures are made known to the system via appropriate interpretation. To
infer the aspects of gesture needs to sense and collect data of user position, configu-
ration, and movement. This can be done by directly using sensing devices attached
to the user (e.g., magnetic field trackers), or by indirectly using cameras and com-
puter vision techniques. Here we take a dynamic gesture recognition approach, and
use digital cameras with image processing techniques for gesture recognition.
command is used to drive the robot. Alternatively, if the behavior controllers can be
pre-uploaded to the on-board computer of the robot, the mapping result will send
an identification signal to trigger the corresponding controller on the robot. Here
the user is allowed to construct the mapping table to decide how to interact with
his robot.
pet robot then makes a new decision for behavior selection, based on the modified
quantities of emotions. For example, a pet dog in a hungry state may be angry, may
keep looking for food, and would eat anything as soon as the pet dog finds it. After
that, the dog may not be hungry any more (the body state has been changed). Then
it is happy (the emotion has been changed too) and may want to sleep or fool around
(now new behavior is selected). The above procedure is repeated and the emotion
model continuously works as a decision-making mechanism for behavior selection.
As can be observed, in the above operating model, the most important part is the
one for selecting appropriate behavior at any time. In this work, we use a feedfor-
ward neural network to implement such a mechanism that maps the emotions and
body states of a robot into a set of desired behaviors. To allow the user to determine
the characteristics of his pet, our work also provides an interface by which the user
can define examples to train the neural network to achieve the specified mapping of
emotions and behaviors. Once the neural network is obtained, it works as a behavior
selector to choose appropriate controllers for the robot.
33.3.1 Implementation
To evaluate our approach, two robots have been built and a distributed and net-
worked computing environment has been developed for the robots. The robot used in
the experiments is LEGO Mindstorms NXT 9797. It has light sensors for light detec-
tion, ultrasonic sensors for distance measurement, and micro-switches for touch
detection. In addition, to enhance its vision ability, we equip an extra mini-camera
on the head of the robot so that it can capture images for further interpretation.
Because the LEGO NXT only has limited computational power and memory
devices, a distributed and networked computing environment is thus constructed for
the robot, in which the individual pre-programmed behavior controllers are installed
in the NXT on-board memory, and two PCs are connected through the Internet for
other computation. Here, one PC is responsible for the computation of emotion
To verify our approach of emotion-based control, this section describes how the
emotion model can be built, trained and modified. In the experiments, the sim-
ple types of emotions are modeled. They are so-called “basic emotions”, including
“happy”, “angry”, “fear”, “bored”, “shock”, and “sad”. Also three variables, “hun-
gry”, “tired”, and “familiar” are defined to indicate the internal body states of the
robot. As mentioned in Section 33.2.4, the user is allowed to define event proce-
dures and the relevant weight parameters for the above emotions and body states
to describe how the quantities of different emotions vary over time for their own
robots.
Fig. 33.5 Two LEGO NXT robots push the red box together
For naïve users, our framework also provides three default emotion models
(aggressive, gentle, and shy) and events as choices to represent different charac-
teristics of a robot. For example, a robot with an aggressive model will change its
emotions more rapidly than the others. Users can choose a default model from the interface
without extra settings. Experienced users can develop more sophisticated models to
guide the variations of emotions.
Currently, ten basic behaviors are built, including target seeking, barking, wan-
dering, shaking, sniffing, sleeping, escaping, scratching, wailing, and wagging. As
mentioned above, with the emotion model developed, a user can build a behavior
selector manually or automatically and use it to map the emotions and body states
into appropriate behavior controllers. At each time step, the internal emotions and
states of the robot change, and the newly obtained values are used as the input of the
emotion model to select the behavior controller for that moment. Figure 33.6 shows the
interface that presents the numerical values of the internal and emotion variables
over time during a trial. These values are illustrated to provide information about
the body state and emotion of the robot, so that the user can inspect the detailed
information related to his personal robot accordingly.
As can be seen in Fig. 33.6, the interface also includes a set of event buttons on
the right hand side to simulate the happening of different events. Each button here is
associated with an event procedure that describes how emotions and body states are
changed by this event. It corresponds to a situated event in reality. That is, when the
pre-conditions of an event procedure are satisfied in the real world, the same effect
will be given to change the emotions and body states of a robot. With the assistance
of the event buttons, users can efficiently examine the correctness of the effects
caused by the event procedures they have defined.
In addition, our framework offers a learning mechanism to train a feedforward
neural network from examples as a behavior selector. The above variables (emotions
and body states) are arranged as the input of the network, and its output is used to
determine which behavior to perform at a certain time. In the training phase, the user
is allowed to give a set of training examples and each specifies which behavior the
robot is expected to perform when the set of emotions and states reaches the values
he has assigned. The back-propagation algorithm is then used to derive a model for
the set of data examples. Figure 33.7 presents the interface through which a user can
edit the training set. Based on the training set provided by the user, the system then
tries to learn a mapping strategy with best approximation.
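As an illustration only (the authors do not state which training software they use for this step), a small back-propagation-trained network can be fitted to such user-edited examples, for instance with scikit-learn. The state values and behaviour labels below are made up.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Each example: [happy, angry, fear, bored, shock, sad, hungry, tired, familiar] in [0, 1],
# labelled with the behaviour the user expects in that state. All values are hypothetical.
X = np.array([
    [0.9, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.2, 0.8],   # happy and rested   -> wagging
    [0.1, 0.8, 0.0, 0.0, 0.1, 0.0, 0.9, 0.3, 0.5],   # angry and hungry   -> target seeking
    [0.0, 0.0, 0.9, 0.0, 0.3, 0.1, 0.2, 0.4, 0.1],   # fearful            -> escaping
    [0.2, 0.0, 0.0, 0.1, 0.0, 0.1, 0.1, 0.9, 0.9],   # tired              -> sleeping
])
y = ["wagging", "target_seeking", "escaping", "sleeping"]

selector = MLPClassifier(hidden_layer_sizes=(6,), activation="tanh",
                         max_iter=5000, random_state=0)
selector.fit(X, y)

current_state = np.array([[0.1, 0.7, 0.0, 0.0, 0.0, 0.1, 0.8, 0.2, 0.4]])
print(selector.predict(current_state))   # behaviour controller to trigger next
```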
It should be noted that the examples provided by the user may be inconsistent, in
which case a perfect strategy cannot be obtained. The user can then use the
interface shown in Fig. 33.7 to correct the training examples to
re-build the behavior selector (i.e., the neural network) again. If the model has been
derived successfully but the behavior of the robot did not satisfy the owner’s expec-
tation, he can still correct the robot behavior for any specific time step by editing
the output produced by the mapping strategy learnt previously through the inter-
faces shown in Fig. 33.7. Then the modified outputs can be used as new training
examples to derive a new strategy of behavior arbitration. In this way, the user can
easily and conveniently design (and re-design) the characteristics of his personal
robot.
In this paper, we have described the importance of developing toy-type pet robots
as an intelligent robot application. We have also proposed to integrate knowledge
from different domains to build low-cost pet robots. To realize the development of
pet robots, a user-oriented interactive framework has been constructed with which
the user can conveniently configure and re-configure his personal pet robot accord-
ing to his preferences. Our system framework mainly investigates three issues:
robot control, human-robot interaction, and robot emotion. Different interfaces
have also been constructed to support various human-robot interactions. Most
importantly, an emotion-based mechanism has been developed in which different
emotions and internal states have been modeled and used to derive a behavior selec-
tor. The behavior selector is a neural network and the user is allowed to define
training examples to infer a behavior selector for his robot. To evaluate our frame-
work, we have used it to build LEGO NXT robots to achieve a cooperation task
successfully.
Based on the presented framework, we are currently trying different toy-type
robots with more sensors and actuators to evaluate our approach extensively. Also
we are implementing a new vision module for the robot so that it can recognize
human facial expressions and interact with people accordingly. In addition, we
plan to define a specific language and construct a message-passing channel through
which different types of robots can communicate with each other.
References
34.1 Introduction
B. Vanstone (B)
Faculty of Business, Technology and Sustainable Development, Bond University, Gold Coast,
Queensland 4229, Australia
E-mail: bvanston@bond.edu.au
In this paper, the authors demonstrate the use of this methodology to develop a
financially viable, short-term trading system. When developing short-term systems,
the authors typically site the neural network within an already existing non-neural
trading system. This paper briefly reviews an existing medium-term long-only trad-
ing system, and then works through the Vanstone and Finnie methodology to create
a short-term focused ANN which will enhance this trading strategy.
The initial trading strategy and the ANN enhanced trading strategy are compre-
hensively benchmarked both in-sample and out-of-sample, and the superiority of the
resulting ANN enhanced system is demonstrated. To prevent excessive duplication
of effort, only the key points of the methodology outlined are repeated in this paper.
The overall methodology is described in detail in Vanstone and Finnie [1], and it is
referred to throughout this paper as 'the empirical methodology'.
There are two primary styles of stockmarket trader, namely Systems traders, and
Discretionary traders. Systems traders use clearly defined rules to enter and exit
positions, and to determine the amount of capital risked. The strategies created
by systems traders can be rigorously tested, and clearly understood. The alterna-
tive, discretionary trading, is usually the eventual outcome of an individual’s own
experiences in trading. The rules used by discretionary traders are often difficult to
describe precisely, and there is usually a large degree of intuition involved. In many
cases, some of the rules are contradictory – in these cases, the discretionary trader
uses experience to select the appropriate rules. Despite these obvious drawbacks,
however, it is commonly accepted that discretionary traders produce better financial
results [2].
For the purposes of this paper, it is appropriate to have a simple, clearly defined
mathematical signal which allows us to enter or exit positions. This allows us to
accurately benchmark and analyze systems.
This paper uses the GMMA as the signal generator. The GMMA is the Guppy
Multiple Moving Average [3], as created and described by Daryl Guppy [4], a lead-
ing Australian trader. Readers should note that Guppy does not advocate the use of
the GMMA indicator in isolation (as it is used in this study); rather, it is appropriate
as a guide. However, the GMMA is useful for this paper, as it can be
implemented mechanically. In essence, any well-defined signal generator could be
used as the starting point for this paper.
The GMMA is defined as:

\[
\mathrm{GMMA} = \bigl(\mathrm{ema}(3) + \mathrm{ema}(5) + \mathrm{ema}(8) + \mathrm{ema}(10) + \mathrm{ema}(12) + \mathrm{ema}(15)\bigr)
              - \bigl(\mathrm{ema}(30) + \mathrm{ema}(35) + \mathrm{ema}(40) + \mathrm{ema}(45) + \mathrm{ema}(50) + \mathrm{ema}(60)\bigr)
\qquad (34.1)
\]
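Assuming ema(n) denotes an n-period exponential moving average of the closing price, the indicator can be computed directly from a price series; the pandas sketch below is illustrative and is not the Wealth-Lab implementation used in the study.

```python
import pandas as pd

def gmma(close: pd.Series) -> pd.Series:
    """Guppy Multiple Moving Average: short-term EMA group minus long-term EMA group."""
    ema = lambda n: close.ewm(span=n, adjust=False).mean()
    short_group = sum(ema(n) for n in (3, 5, 8, 10, 12, 15))
    long_group = sum(ema(n) for n in (30, 35, 40, 45, 50, 60))
    return short_group - long_group

# A raw entry signal is a crossing of the zero line from below:
# crossed_up = (gmma(close) > 0) & (gmma(close).shift(1) <= 0)
```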
Creation of the ANNs to enhance this strategy involves the selection of ANN
inputs, outputs, and various architecture choices. The ANN inputs and outputs are
a cut-down version of those originally described in Vanstone [5]. The original list
contained 13 inputs, and this paper uses only five. These five variables, discussed
later in this paper, were selected as they were the most commonly discussed in
the main practitioners’ journal, ‘The Technical Analysis of Stocks and Commodi-
ties’. Similarly, the choices of output and architecture are described in the empirical
methodology paper. Again, these are only briefly dealt with here.
For each of the strategies created, an extensive in-sample and out-of-sample
benchmarking process is used, which is also further described in the methodology
paper.
34.3 Methodology
This study uses data for the ASX200 constituents of the Australian stockmarket.
Data for this study was sourced from Norgate Investor Services [6]. For the in-
sample data (start of trading 1994 to end of trading 2003), delisted stocks were
included. For the out-of-sample data (start of trading 2004 to end of trading 2007)
delisted stocks were not included. The ASX200 constituents were chosen primarily
for the following reasons:
1. The ASX200 represents the most important component of the Australian equity
market due to its high liquidity – a major issue with some previously published
work is that it may tend to focus too heavily on micro-cap stocks, many of which
do not have enough trading volume to allow positions to be taken, and many of
which have excessive bid-ask spreads.
2. This data is representative of the data which a trader will use to develop his/her
own systems in practice, and is typical of the kind of data the system will be used
in for out-of-sample trading.
Software tools used in this paper include Wealth-Lab Developer, and Neuro-Lab,
both products of Wealth-Lab Inc (now owned by Fidelity) [7]. For the neural net-
work part of this study, the data is divided into two portions: data from 1994 up
to and including 2003 (in-sample) is used to predict known results for the out-of-
sample period (from 2004 up to the end of 2007). In this study, only ordinary shares
are considered.
The development of an ANN to enhance the selected strategy is based on simple
observation of the GMMA signals. One of the major problems of using the GMMA
in isolation is that it frequently whipsaws around the zero line, generating spurious
buy/sell signals in quick succession.
One possible way of dealing with this problem is to introduce a threshold which
the signal must exceed, rather than acquiring positions as the zero line is crossed.
The method used in this paper, however, is to forecast which of the signals is most
likely to result in a sustained price move. This approach has a major advantage over
the threshold approach; namely, in a profitable position, the trader has entered ear-
lier, and therefore, has an expectation of greater profit. By waiting for the threshold
to be exceeded, the trader is late in entering the position, with subsequent decrease
in profitability.
However, for the approach to work, the trader must have a good forecast of
whether a position will be profitable or not. This is the ideal job for a neural network.
In Fig. 34.1, there is a cluster of trades taken between June 2006 and September
2006, each open for a very short period of time as the GMMA whipsaws around
the zero line. Eventually, the security breaks out into a sustained up trend. What
is required is an ANN which can provide a good quality short-term forecast of the
return potential each time the zero line is crossed, to allow the trader to discard the
signals which are more likely to become whipsaws, thus concentrating capital on
those which are more likely to deliver quality returns.
The neural networks built in this study were designed to produce an output signal,
whose strength was proportional to expected returns in the 5 day timeframe. In
essence, the stronger the signal from the neural network, the greater the expectation
of return. Signal strength was normalized between 0 and 100.
The ANNs contained five data inputs. These are the technical variables deemed
as significant from the review of both academic and practitioner publications, and
details of their function profiles are provided in Vanstone [5]. The formulas used to
compute these variables are standard within technical analysis. The actual variables
used as inputs, and their basic statistical characteristics are provided in Table 34.1.
For completeness, the characteristics of the output target to be predicted, the 5
day return variable, are shown in Table 34.2. This target is the maximum percentage
change in price over the next 5 days, computed for every element i in the input
series as:

\[
100 \times \frac{\max\bigl(\mathrm{close}_{i+1}, \ldots, \mathrm{close}_{i+5}\bigr) - \mathrm{close}_i}{\mathrm{close}_i}
\qquad (34.2)
\]
Effectively, this target allows the neural network to focus on the relationship
between the input technical variables, and the expected forward price change.
The calculation of the return variable allows the ANN to focus on the highest
amount of change that occurs in the next 5 days, which may or may not be the 5-
day forward return. Perhaps a better description of the output variable is that it is
measuring the maximum amount of price change that occurs within the next 5 days.
No adjustment for risk is made, since traders focus on returns and use other means,
such as stop orders, to control risk.
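Under the assumption that the "highest close" in (34.2) is taken over the five bars following bar i, the target series can be computed as follows (an illustrative pandas sketch, not the authors' code):

```python
import pandas as pd

def max_forward_return(close: pd.Series, horizon: int = 5) -> pd.Series:
    """Maximum percentage rise of the close over the next `horizon` bars (Eq. 34.2)."""
    # rolling(...).max() ending at bar i+horizon covers bars i+1 .. i+horizon;
    # shifting back by `horizon` aligns that maximum with bar i.
    future_high = close.rolling(window=horizon).max().shift(-horizon)
    return 100.0 * (future_high - close) / close
```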
As explained in the empirical methodology, a number of hidden node archi-
tectures need to be created, and each one benchmarked against the in-sample
data.
The method used to determine the hidden number of nodes is described in the
empirical methodology. After the initial number of hidden nodes is determined, the
first ANN is created and benchmarked. The number of hidden nodes is increased
by one for each new architecture then created, until in-sample testing reveals which
architecture has the most suitable in-sample metrics. A number of metrics are avail-
able for this purpose; in this study, the architectures are benchmarked using the
Average Profit/Loss per Trade expressed as a percentage. This method assumes
unlimited capital, takes every trade signaled, includes transaction costs, and
measures how much average profit is added by each trade over its lifetime. The
empirical methodology uses the filter selectivity metric for longer-term systems,
and Tharp’s expectancy [8] for shorter term systems. This paper also introduces the
idea of using overall system net profit to benchmark, as this figure takes into account
both the number of trades (opportunity), and the expected return of each trade on
average (reward).
34.4 Results
A total of 362 securities had trading data during the test period (the ASX200 includ-
ing delisted stocks), from which 11,897 input rows were used for training. These
were selected by sampling the available datasets, and selecting every 25th row as an
input row.
Table 34.3 reports the Overall Net System Profit, Average Profit/Loss per Trade
(as a percentage), and Holding Period (days) for the naïve buy-and-hold approach
(first row), the initial GMMA method (second row), and each of the in-sample ANN
architectures created (subsequent rows). These figures include transaction costs of
$20 each way and 0.5% slippage, and orders are implemented as day+1 market
orders. There are no stops implemented in in-sample testing, as the objective is not
to produce a trading system (yet), but to measure the quality of the ANN produced.
Later, when an architecture has been selected, stops can be determined using ATR
or Sweeney’s [9] MAE technique.
The most important parameter to be chosen for in-sample testing is the signal
threshold, that is, what level of forecast strength is enough to encourage the trader
to open a position. This is a figure which needs to be chosen with respect to the
individual's own risk appetite and trading requirements. A low threshold will generate
many signals, whilst a higher threshold will generate fewer. Setting the threshold
too high will mean that trades will be signalled only rarely, too low and the trader’s
capital will be quickly invested, removing the opportunity to take higher forecast
positions as and when they occur.
For this benchmarking, an in-sample threshold of 5 is used. This figure is chosen
by visual inspection of the in-sample graph in Fig. 34.2, which shows a break-
down of the output values of a neural network architecture (scaled from 0 to 100)
versus the average percentage returns for each network output value. The percent-
age returns are related to the number of days that the security is held, and these
are shown as the lines on the graph. Put simply, this graph visualizes the returns
expected from each output value of the network and shows how these returns per
output value vary with respect to the holding period. At the forecast value of 5
(circled), the return expectation rises above zero, so this value is chosen.
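In trading terms the ANN therefore acts as a filter on the raw GMMA signals: a zero-line crossing is only acted upon when the network's forecast reaches the chosen threshold. A minimal sketch of this rule (function and variable names are illustrative):

```python
SIGNAL_THRESHOLD = 5   # chosen from the in-sample forecast/return profile (Fig. 34.2)

def take_entry(gmma_today: float, gmma_yesterday: float, ann_forecast: float) -> bool:
    """Enter only on GMMA zero-line crossings whose ANN forecast is strong enough."""
    crossed_up = gmma_yesterday <= 0.0 < gmma_today
    return crossed_up and ann_forecast >= SIGNAL_THRESHOLD
```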
As described in the empirical methodology, it is necessary to choose which ANN
is the ‘best’, and this ANN will be taken forward to out-of-sample testing. It is for
this reason that the trader must choose the in-sample benchmarking metrics with
care. If the ANN is properly trained, then it should continue to exhibit similar
qualities out-of-sample which it already displays in-sample.
From Table 34.3, it is clear that the ANN with four hidden nodes should be selected.
It displays a number of desirable characteristics, notably the highest level of
Profit/Loss per Trade. Note that this will not necessarily make it the best ANN for
a trading system. Extracting good profits in a short time period is only a desirable
trait if there are enough opportunities being presented to ensure the trader's capital
is working efficiently.
Therefore, it is also important to review the number of opportunities signalled
over the 10-year in-sample period. This information is shown in Table 34.4.
Here the trader must decide whether the number of trades signalled meets the
required trading frequency. In this case, there are likely to be enough trades to keep
an end-of-day trader fully invested.
This testing so far covered data already seen by the ANN, and is a valid indication
of how the ANN should be expected to perform in the future. In effect, the in-sample
metrics provide a framework of the trading model this ANN should produce.
Table 34.5 shows the effect of testing on the out-of-sample ASX200 data,
which covers the period from the start of trading in 2004 to the end of trading
in 2007. These figures also include transaction costs and slippage, and orders are
implemented as next day market orders.
This was a particularly strong bull market period in the ASX200.
Although there appears a significant difference between the GMMA, and the
ANN enhanced GMMA, it is important to quantify the differences statistically. The
appropriate test to compare two distributions of this type is the ANOVA test (see
supporting work in Vanstone [5]). The results for the ANOVA test are shown in
Table 34.6 below.
The figures above equate to an F-statistic of 34.26 (specifically, F (14,456) =
34.261, p = 0.00 (p < 0.05)), which indicates a highly significant difference
between the two systems.
34.5 Conclusions
The ANN out-of-sample performance is suitably close to the ANN in-sample per-
formance, leading to the conclusion that the ANN is not curve fit, that is, it should
continue to perform well into the future. The level of significance reported by the
ANOVA test leads to the conclusion that the ANN filter is making a statistically
significant improvement to the quality of the initial GMMA signals.
The trader now needs to make a decision as to whether this ANN should be
implemented in real-life.
One of the main reasons for starting with an existing successful trading strategy
is that it makes this decision much easier. If the trader is already using the signals
from a system, and the ANN is used to filter these signals, then the trader is still only
taking trades that would have been taken by the original system. The only difference
in using the ANN enhanced system is that trades with low expected profitability
should be skipped.
Often in trading, it is psychological and behavioural issues which undermine
a trader's success. By training ANNs to support existing systems, the trader can have
additional confidence in the expected performance of the ANN.
Finally, Fig. 34.3 shows the same security as Fig. 34.1. The ANN has clearly met
its purpose of reducing whipsaws considerably, which has resulted in the significant
performance improvement shown in Tables 34.3 and 34.5.
Of course the result will not always be that all whipsaws are removed. Rather,
only whipsaws which are predictable using the ANN inputs will be removed.
References
1. Vanstone, B. and Finnie, G. (2007). “An Empirical Methodology for developing Stockmarket
Trading Systems using Artificial Neural Networks.” Expert Systems with Applications. In-Press
(DOI: http://dx.doi.org/10.1016/j.eswa.2008.08.019).
2. Elder, A. (2006). Entries & Exits: Visits to Sixteen Trading Rooms. Hoboken, NJ: Wiley.
3. guppytraders.com. “Guppy Multiple Moving Average.” Retrieved 04-05-2007, from
www.guppytraders.com/gup329.shtml
4. Guppy, D. (2004). Trend Trading. Milton, QLD: Wrightbooks.
5. Vanstone, B. (2006). Trading in the Australian stockmarket using artificial neural networks,
Bond University. Ph.D.
6. “Norgate Premium Data.” (2004). Retrieved 01-01-2004, from www.premiumdata.net
7. “Wealth-Lab.” (2005) from www.wealth-lab.com
8. Tharp, V. K. (1998). Trade Your Way to Financial Freedom. New York: McGraw-Hill.
9. Sweeney, J. (1996). Maximum Adverse Excursion: Analyzing Price Fluctuations for Trading
Management. New York: Wiley.
Chapter 35
Reorganising Artificial Neural Network
Topologies
Complexifying Neural Networks by Reorganisation
Abstract This chapter describes a novel way of complexifying artificial neural net-
works through topological reorganisation. The neural networks are reorganised to
optimise their neural complexity, which is a measure of the information-theoretic
complexity of the network. Complexification of neural networks here happens
through rearranging connections, i.e. removing one or more connections and plac-
ing them elsewhere. The results verify that a structural reorganisation can help to
increase the probability of discovering a neural network capable of adequately solv-
ing complex tasks. The networks and the methodology proposed are tested in a
simulation of a mobile robot racing around a track.
35.1 Introduction
Artificial Neural Networks (ANNs) have been used in many different applications,
with varying success. The success of a neural network, in a given application,
depends on a series of different factors, such as topology, learning algorithm and
learning epochs. Furthermore all of these factors can be dependent or independent
of each other. Network topology is the focus of this research, in that finding the
optimum network topology can be a difficult process. Ideally all network topolo-
gies should be able to learn every given task to competency, but in reality a given
topology can be a bottleneck and constraint on a system. Selecting the wrong topol-
ogy can result in a network that cannot learn the task at hand [1–3]. It is commonly
known that a too small or too large network does not generalise well, i.e. learn a
T. D. Jorgensen (B)
Department of Electronic and Computer Engineering, University of Portsmouth, UK
E-mail: Thomas.Jorgensen@port.ac.uk
given task to an adequate level. This is due to either too few or too many parameters
used to represent a proper and adequate mapping between inputs and outputs.
This chapter proposes a methodology that can help find an adequate network
topology by reorganising existing networks, rearranging one or more connections
whilst trying to increase a measure of the neural complexity of the network. Assum-
ing complex task solving requires complex neural controllers, a reorganisation that
increases the controller complexity can increase the probability of finding an ade-
quate network topology. Reorganising an existing network into a more complex one
yields an increased chance of better performance and thus a higher fitness.
There are generally four ways to construct the topology of an ANN [3–5]. (1)
Trial and Error is the simplest method. This essentially consists of choosing a topol-
ogy at random and testing it; if the network performs in an acceptable way, the
network topology is suitable. If the network does not perform satisfactorily, select
another topology and try it. (2) Expert selection; the network designer decides
the topology based on a calculation or experience [3, 6]. (3) Evolving connections
weights and topology through complexification. Extra connections and neurons can
be added as the evolutionary process proceeds or existing networks can be reorgan-
ised [7–12]. (4) Simplifying and pruning overly large neural networks, by removing
redundant elements [3–5]. This chapter seeks to add another one to this list, as
reorganisation of existing networks can potentially help discover the appropriate
topology of a network.
One advantage of the proposed methodology, compared to complexification
through adding components, is that the computational overhead remains constant
because no extra components or parameters are added. When components are added,
the time it takes to compute the output of the network is affected, as is the time it
takes for the genetic algorithm to find appropriate values for the connection weights:
more parameters yield a wider and slower search, adding further dimensions to the
search space.
35.2 Background
adding connections or neurons and new components are frozen, so that fitness is not
reduced. This is similar to the first method of topological complexification proposed
by Fahlman [7], which increased network size by adding neurons.
Research into the use of neural complexity in complexification to produce bio-
logically plausible structures is limited; this is due to the lack of proper calculation
tools and the variety of definitions and foci. Neural complexity is a measure of
how a neural network is both connected and differentiated [11]. It is a measure of
the structural complexity as well as the connectivity of the network. The measure
was developed to measure the neural complexity of human and animal brains by
estimating the integration of functionally segregated modules. This measure reflects
the properties that fully connected networks and functionally segregated networks
have low complexity, whilst networks that are highly specialised and also well inte-
grated are more functionally complex. Yaeger [12] has shown, that when optimising
an artificial neural network with a fixed number of neurons for neural complexity,
the fitness increases proportionally, suggesting a link between neural and functional
complexity. The more complex a network, the greater the likelihood that it will be
capable of solving complex tasks and surviving in complex environments [9–12].
Artificial neural networks can be reorganised in several different ways and to dif-
ferent extents. The methodology proposed herein operates with two degrees of
reorganisation. Reorganising one connection is defined as a minor reorganisation,
\[
I(X) = \sum_{i=1}^{n} H(x_i) - H(X) \qquad (35.1)
\]
The integration I(X) of segregated neural elements equals the difference between
the sum of the entropies of the individual components $x_i$ of the neural network
and the entropy of the network as a whole. In order to estimate
the neural complexity of a neural network, not only the integration between the
individual elements is needed, but also the integration of any neural clusters in the
network, since it is very likely that neurons in an artificial neural network cluster
together to form some sort of functional unit. The average integration between
functionally segregated neural groups with k (out of n) elements is expressed as
$\langle I(X_j^k)\rangle$, where j is an index indicating that all possible combinations of subsets
with k components are used. The average integration over all subsets with k
components is used to calculate the neural complexity:
\[
C_N(X) = \sum_{k=1}^{n} \Bigl[ (k/n)\, I(X) - \bigl\langle I(X_j^k) \bigr\rangle \Bigr] \qquad (35.2)
\]
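Equations (35.1) and (35.2) can be estimated from recorded unit activations, for example by discretising the activations and using plug-in (histogram) entropy estimates. The sketch below enumerates all subsets and is therefore only practical for small networks; it is an illustrative estimator, not the authors' implementation.

```python
import itertools
import numpy as np

def entropy(samples):
    """Plug-in entropy (bits) of jointly discretised activations, samples of shape (T, d)."""
    _, counts = np.unique(samples, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def integration(samples):
    """I(X) = sum_i H(x_i) - H(X), Eq. (35.1)."""
    n = samples.shape[1]
    return sum(entropy(samples[:, [i]]) for i in range(n)) - entropy(samples)

def neural_complexity(samples):
    """C_N(X) = sum_k [(k/n) I(X) - <I(X_j^k)>], Eq. (35.2)."""
    n = samples.shape[1]
    total_integration = integration(samples)
    c = 0.0
    for k in range(1, n + 1):
        subset_I = [integration(samples[:, list(s)])
                    for s in itertools.combinations(range(n), k)]
        c += (k / n) * total_integration - float(np.mean(subset_I))
    return c

# Example: activations of a 5-unit network over 1000 steps, discretised to 4 levels.
activations = np.random.default_rng(1).integers(0, 4, size=(1000, 5))
print(neural_complexity(activations))
```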
The neural complexity measure is used to optimise the complexity of the neu-
ral network. A reorganisation only takes place if this complexity increases. The
reorganisation methodology proposed is summarised by the following algorithm:
1. Determine a starting topology of sufficient size and complexity. This network
can be chosen randomly or based on the experience of the designer.
2. The starting network is trained to proficiency given some predefined measure.
3. The network is now reorganised. The number of connections to be reorganised
decides the degree of reorganisation. A connection is chosen at random, removed
and reinserted elsewhere in the network.
4. If this reorganisation increases the neural complexity of the network, the reorgan-
isation is deemed valid and the network is retrained. If the reorganisation does
not increase the neural complexity the reorganisation has been unsuccessful and
it is undone. Another reorganisation can be attempted or the process stopped. In
the experiments conducted here, five reorganisation attempts are made before the
process stops.
5. If it is desired and previous reorganisations have been successful further reorgan-
isations can take place.
Ideally it would be preferable to remove and reinsert the connection in the place
that yields the largest possible increase in complexity out of all possible reor-
ganisations. However, this requires knowledge of all possible topologies for a given
number of connections and neurons, which is not computationally feasible. Only one
connection is reorganised at any time; this could be increased to several connections
if desired. A sketch of the accept/reject loop is given below.
The controllers evolved here are tested in a simulated environment with a robot. In
this environment a robot has to drive around a track, which consists of 32 sections.
The objective of this task is to complete three laps in the shortest amount of time.
If a robot fails to complete three laps, the distance covered is the measure of its
performance. The robot has to drive around the track covering all of its sections;
it is not allowed to skip any. In total the robot has to complete three laps, with
32 sections in each lap, all visited in the correct order. If the robot is too slow at
driving between two sections the simulation is terminated. Figure 35.1 illustrates
the task to be completed, the robot, and its perception of the track.
Figure 35.1-left illustrates the track and the robot driving around it. The robot is
not limited in its movement, i.e. it can drive off the track, reverse around the track
or adopt any driving pattern desired, as long as it drives over the right counter-
clockwise sequence of sections. Figure 35.1-right illustrates the robot driving on the
Fig. 35.1 Left is the track and the robot driving around it; right is a top-view of the robot and its
sensory perception of the track
track seen from above. The track sections have alternating colours to mark a clear
distinction between sections. The arrows illustrate the three sensors of the robot. The
front sensor measures the distance to the next turn and the two side sensors measure
the distance to the edge of the track. As illustrated, the simulated robot has three
wheels rather than four, to increase the difficulty of evolving a successful controller.
The risk when driving this three-wheeled robot, in contrast to a four-wheeled vehicle,
is that it will roll over if driven too abruptly. A robot that has rolled over is unlikely
to be able to continue. The front wheel controls the speed as well as the direction.
A total of three sets of experiments have been conducted. One set of experiments,
with a randomly selected neural network, acts as a benchmark for further compar-
isons. This network only has its connection weights evolved, whereas the topology is
fixed. The second set of experiments starts with the benchmark network selected in
the first experiment, which is then reorganised. This reorganisation is only minor, in
that only one connection is removed and replaced in each network. The new network
resulting from the reorganisation is tested. The third and final set of experiments also
uses the network from experiment one as a starting condition. The network is then
reorganised more extensively, by reshuffling several connections to create a new
network based on the old network. The results from these experiments are used to
compare the different strategies of evolution.
The evolved neural network controllers are tested in a physics simulator to mimic
a real world robot subject to real world forces. The genetic algorithm has in all
tests a population size of 50 and the number of tests per method is 25. Uniformly
distributed noise has been added on the input and output values to simulate sensor
drift, actuator response, wheel skid and other real-world error parameters. To give
the simulations attributes and effects similar to a real racing track, the track has
been given edges. Whenever the robot drives off the track it falls off this edge onto
another, slower surface. This means that if the robot cuts corners, it could potentially
have wheels lifting off the ground, thus affecting the stability and speed of the robot,
due to the edge when coming back onto the track.
Fitness is rewarded according to normal motorsport rules and practice. Three laps
of the track have to be completed and the controller that finishes in the fastest time
wins the race, i.e. it is the fittest controller. If a controller fails to finish three laps,
the controller with the most laps or longest distance travelled wins. In the case that
two controllers have reached the same distance the racing time determines the fittest
controller. The following fitness function states that the longest distance covered
in the shortest amount of time yields the best fitness. Time is the time it takes to
complete the track. If a controller fails to finish, Time is set to 480 s, which is the
absolute longest time a controller is allowed to exist before a simulation is stopped.
In the event that two controllers have covered the same distance, the controller with
the fastest time is favoured for further evolution. The precise version of the fitness
function is:
\[ \text{Fitness} = \frac{\text{Sections} + (\text{Laps} \times \text{Track Length})}{\text{Time}} \tag{35.3} \]
The fitness is equal to the distance divided by the time. The distance is equal to
the number of track sections covered in the current lap, plus the number of sections
covered in previous laps. Track length is the total number of sections, which is 32.
The minimum fitness obtainable is 1/480 ≈ 0.002.
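A minimal sketch of Eq. (35.3) with the 480 s cap described above; the argument names and the example values are illustrative.

```python
TRACK_LENGTH = 32      # sections per lap
MAX_TIME = 480.0       # seconds allowed before a simulation is stopped

def fitness(sections, laps, time_taken, finished):
    """Eq. (35.3): distance covered divided by time, with the 480 s cap on failures."""
    distance = sections + laps * TRACK_LENGTH
    time_used = time_taken if finished else MAX_TIME
    return distance / time_used

# A controller that completes 3 laps (96 sections) in 92 s scores about 1.04,
# roughly the human record quoted in the text; a stalled controller scores 1/480.
print(fitness(0, 3, 92.0, True), fitness(1, 0, 50.0, False))
```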
The first set of experiments was conducted with a fixed-structure network, where
only the connection weights were evolved; it is used as the benchmark for the other
experiments. The network used is a feed-forward connected network with three input
Fig. 35.2 The benchmark neural network and the reorganised networks
neurons, three hidden layer neurons and two output neurons. The inputs are the
sensor values as described previously and the outputs are the direction and speed of
the front wheel. This network was trained to competency and the results are shown
in Table 35.1. The neural complexity of this network is 14.71, calculated with Eq.
(35.2). The network is shown at the left of Fig. 35.2.
The second set of experiments was conducted with a network that has been reor-
ganised. The benchmark network has undergone a minor reorganisation, which is
shown in Fig. 35.2 upper right. The connection between neuron 6 and neuron 8 has
been rearranged and is now a recursive connection. The new network has increased
its neural complexity by the reorganisation to 15.03. Immediately after the reorgan-
isation the network loses some of its fitness; this fitness is regained by retraining.
The reorganised network was retrained with the same weights as before the reor-
ganisation and in all cases the network, as a minimum, regained all of its previous
fitness and behaviour. Additionally, all of the connection weights were re-evolved
in another experiment to see if the results and tendencies were the same, and as
expected the results were the same.
The final set of experiments used a network that has undergone a major
reorganisation. The benchmark network was changed by removing the connection
between neurons 3 and 5 and the connection between neurons 1 and 6. These
connections are moved to between neurons 5 and 4 and between neurons 8 and 6.
As the benchmark network is feed-forward connected, only recursive connections
are possible for this particular network. The new network is shown in Fig. 35.2
lower right. The neural complexity of the new network has risen to 15.40, which is
a 5% increase. Like the previously reorganised network, this network suffered a
fitness loss after the reorganisation, but it was retrained to competency. The
controller increased its fitness over the original network. Even re-evolving all of the
connection weights yields a better overall performance.
The results from all of the experiments show that all networks learn the task pro-
ficiently; however, some networks seem to perform better than others. Figure 35.3
shows the route that the controllers choose to drive around the track. The route
reflects the results summarised in Table 35.1. The race car starts at (0, 0) and drives
to (20, 0), where it turns. It then continues to (20, 11), where it turns again, continues
to (−2.5, 11), from there to (−2.5, 0) and back to (0, 0). The controller tries to align
the car on the straight line between the points. Figure 35.3 illustrates the difference
between an average lap of the benchmark networks and of the reorganised networks,
and it clearly shows the route that the robot takes around the track.
Figure 35.3 illustrates the degree of overshoot when turning and recovering to
drive straight ahead on another leg of the track. It clearly shows that the controllers
that have been reorganised overshoot less than the benchmark networks. Less
overshoot ultimately means that the racing car is able to move faster, which means
it has a better fitness. Table 35.1 shows the fitness from the benchmark network
experiments, and the fitness regained by the new networks after a reorganisation
and retraining. To put these results into context, the best human driver on the same
race track has a record of 1.036, which is worse than the average major-reorganised
controller.
The hypothesis that artificial neural networks that have undergone a minor reor-
ganisation, in which the neural complexity is optimised, are statistically better than
the fixed-structure network they originate from does not hold true for these
experiments. A t-test, with a 5% significance level, indicates that there is no
statistical difference between the two methods, despite the higher minimum, average
and maximum values. The second hypothesis tested in this chapter, that artificial
neural networks that have undergone a major reorganisation, in which the neural
complexity is optimised, are better than the networks they originate from, does hold
true. A t-test, with a 5% significance level, indicates that there is a statistical
difference between the two methods. This can be due to the increased neural
complexity of the new
Fig. 35.3 The route of an average benchmark network and a reorganised network
network created by the reorganisation. Some of this increased performance can pos-
sibly be attributed to the fact that one of the networks has a recursive connection,
which is a common way to increase the neural complexity and performance of a
network, but the experiments indicate that this is only part of the explanation. The
experiments clearly indicate that increased neural complexity yields a higher proba-
bility of finding suitable and well-performing neural networks, which is in line with
other research in the field [9].
The results from the experiments do not indicate any significant difference in the
speed of learning produced by either methodology. This means that it takes about
the same number of iterations to learn a given task for any network topology used
in the experiments; this was expected, as they all have the same number of parameters.
35.6 Conclusion
This chapter has presented a new methodology for complexifying artificial neural
networks through structural reorganisation. Connections were removed and rein-
serted whilst trying to increase the neural complexity of the network. The evolved
neural networks learned to control the vehicle around the track and the results indi-
cate the viability of the newly reorganised networks. The results also indicate that it
might be necessary to rearrange more than one connection in order to achieve sig-
nificantly better results. This chapter indicates that neural complexity in conjunction
with reorganisation can help unleash potential and increase the probability of find-
ing neural network controllers of sufficient complexity to adequately solve complex
tasks. Furthermore, the results indicate that a reorganisation can substitute structural
elaboration as a method for improving network potential, whilst keeping the compu-
tational overhead constant. These results are in line with previous research done in
the field and they reconfirm the importance of high neural complexity and structural
change.
References
Abstract Emotion, personality and individual differences are effective parameters
in human activities such as learning. People with different personalities show
different emotions when facing an event. In teaching and learning, the personality
differences between learners play an important role. Virtual learning projects should
therefore take into account that learners' personalities vary and that the teaching
method used for each learner should differ from that used for other learners. In this
chapter, a new model is presented based on the learning model of emotion and
personality and on the model of a virtual classmate. Based on their knowledge base,
the virtual teacher and classmate express suitable behaviours to improve the learning
process according to the learner's emotional status.
36.1 Introduction
S. Fatahi (B)
Department of Computer Engineering, University of Isfahan, Isfahan, Iran
E-mail: fatahi somayeh@yahoo.com
In the virtual learning systems created up to now, the learner's emotions have
received much attention and emotional agents have been widely employed. Only a
few of these systems consider personality as an independent parameter; some of
them are mentioned here.
In the ERPA architecture, the ID3 algorithm is used to predict the learner's
emotional reaction towards an event (for example, receiving an exam score) [3].
Chaffar and his colleagues used the Naïve Bayes classifier to predict the learner's
emotions [4]. In the ESTEL architecture, the Naïve Bayes classifier is used to predict
the optimal emotional state. In this architecture, in addition to emotion, the learner's
personality is also considered, and a module tries to create and induce an optimal
emotional state. For instance, when the learner enters the system, after identification
of the learner's personality (for example extrovert) and recognition of the optimal
emotional state (such as happiness), an emotion is induced in that learner by
presenting various interface elements (e.g. music and pictures) [5]. In the Passenger
software designed by German researchers, cooperative learning methods are used.
This software provides the virtual teacher present in the system with a set of
emotions based on the OCC model, and invites the learners to group work. The
virtual teacher traces the learners' activities and helps those who are unable to carry
out cooperative activities [6]. Abrahamian and his colleagues designed an interface
for computer learners appropriate to their personality type, using the MBTI test, and
concluded that learning through this interface, as a result of using personality
characteristics, leads to improvements in the learning process [7]. In the
implementation by Maldonado and his colleagues, a virtual classmate agent is used.
This agent is placed beside the learners and mostly plays the role of a co-learner and
a support. In this project the teacher, the learner and the classmate each have their
own emotions, and the learner's emotions affect his/her classmate [8].
In this paper, a new model is presented based on the learning model of emotion
and personality [13] and the model of a virtual classmate [14] from our previous
studies. The outline of the improved model is shown in Fig. 36.1. The model
contains six major modules:
Personality identification module: In the first step, the learner completes the MBTI
questionnaire and his or her personality is identified (for example ISFJ, ESTP, INTJ,
etc.).
Module of choosing a learning style commensurate with the learner's personality:
Generally, there are three kinds of learning environment: individual, competitive and
collaborative [15]. Based on the identified personality of the learner, the system puts
him or her in one of three groups: independence, contribution with a virtual
classmate, or competition with a virtual classmate [16].
Module of choosing the virtual classmate agent: If the learner is put in the
independence group, the process of learning and education starts immediately;
otherwise the system first chooses a virtual classmate that matches the learner's
personality type, and then the process of learning and education starts. This module
is explained in more detail in the next section.
In this section we explain the module of choosing the virtual classmate agent in
more detail. This module is displayed in Fig. 36.2. It includes three main parts, each
of which is described below:
The personality generation part: In this part, the personality of the learner is
recognised using the MBTI questionnaire. In this paper we consider only the two
dimensions E/I and S/N, which are important in the learning process [7]. Combining
these two dimensions results in four personality types: ES, EN, IS, and IN.
The classmate selection part: In this part, a VCA appropriate for the learner's
personality is selected. The selected VCA is completely opposite to the learner in
its dominant MBTI dimensions. Research shows that opposite personalities perform
better together than similar personalities [19–21]. The personality of the VCA is
therefore selected so as to improve the learning process.
The classmate behaviour selection part: During the learning process, the VCA
exhibits appropriate behaviours according to the events that happen in the
environment and the learning situation of the individual. A tactics knowledge base
is used to interact with the learner.
For the two personality dimensions considered in this paper, four parameters are
elicited: independence and replying speed for the E/I dimension, and detail-oriented
attitude and attention for the S/N dimension. Based on the extent of these parameters
in each personality type, the VCA exhibits a certain behaviour. These behaviours are
shown separately for each dimension in Tables 36.1 and 36.2. Combining the
functions of the two dimensions yields the four personality types IN, IS, EN, and ES.
A sample of these tactics is presented in Table 36.3.
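A minimal sketch of the classmate-selection part described above, assuming the "completely opposite" VCA is obtained by flipping the learner's E/I and S/N dominants; the function and the flip table are illustrative, not taken from the system.

```python
def select_vca(learner_type):
    """Pick a virtual classmate whose E/I and S/N dominants oppose the learner's.

    'learner_type' is one of the four two-letter codes used in the chapter
    (ES, EN, IS, IN); the flip table is an assumption of this sketch.
    """
    flip = {"E": "I", "I": "E", "S": "N", "N": "S"}
    return "".join(flip[letter] for letter in learner_type)

# An extrovert-sensing learner would be paired with an introvert-intuitive classmate.
print(select_vca("ES"))   # -> "IN"
```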
In our educational environment two agents – the VTA and the VCA – are used.
After each event, these two agents identify the learner's emotions and choose
suitable tactics for interacting with the learner. In this environment the way the
tactics are chosen depends on the learner's group; depending on which group the
learner belongs to, a special tactic is chosen.
The knowledge base of this system contains 65 rules (Table 36.4): 16 rules
identify the learner's group, 10 rules are for the independent learning group, 20 rules
for the collaborative learning group and 19 rules for the competitive group. Four
examples of these rules follow:
The first rule is an example of classifying the learner into a learning group.
According to this rule, the system places a learner with an ISFJ personality – that is,
"Introvert", "Sensing", "Feeling" and "Judging" – in the independent group. The
second rule is an example of the rules governing the teacher's behaviour when the
learner is in the independent group. The third and fourth rules are examples of
situations where the learner is in the collaborative and competitive groups,
respectively. As the rules show, in these two situations the virtual teacher and
classmate use special tactics, in accordance with the learner's emotions, when
interacting with the learner.
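The rule examples above could be encoded along the following lines. This is an illustrative if-then sketch, not the actual 65-rule knowledge base; the MBTI codes, emotions and tactic strings are assumptions.

```python
def assign_group(mbti):
    """Example of the grouping rules: ISFJ learners go to the independent group."""
    if mbti == "ISFJ":
        return "independent"
    if mbti[0] == "E":               # illustrative only: extroverts to a social setting
        return "competitive"
    return "collaborative"

def teacher_tactic(group, learner_emotion):
    """Example tactics keyed on the learner's group and current emotion (illustrative)."""
    tactics = {
        ("independent", "confused"): "virtual teacher gives an extra worked example",
        ("collaborative", "sad"): "virtual classmate offers encouragement",
        ("competitive", "happy"): "virtual classmate proposes a harder exercise",
    }
    return tactics.get((group, learner_emotion), "continue the lesson")

print(teacher_tactic(assign_group("ISFJ"), "confused"))
```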
36.7 Implementation
used for implementing the system. To represent the two agents – the VTA and the
VCA – we used the existing Microsoft Agent characters Merlin and Peedy,
respectively.
36.8 Results
We tested our model in a real environment with 30 students. The students worked
with our system and then answered ten questions evaluating our learning
environment. The learner satisfaction rate was obtained from four questions of our
questionnaire (Fig. 36.4).
The results show that learners are satisfied with a learning environment based on
the learner's emotion and personality. The effect of the VCA's presence was rated
based on six questions of our questionnaire (Fig. 36.5). The results show that the
presence of the VCA leads to advancements in the learning process and to a more
attractive e-learning environment.
In this paper a model for use in e-learning was presented. In this model, modules
for personality recognition and for selecting a VCA appropriate to the learner's
personality were included to develop the interaction with the learner. The behaviour
of the VCA is stored in the knowledge base of the system. The results show that
placing the learner beside an appropriate VCA leads to improvements in learning and
makes the virtual learning environment more enjoyable.
In future work we will try to improve the system by considering the parameters
of culture, case-based reasoning and agent learning, and also to make the virtual
teacher and classmate agents more credible to the user.
References
1. N. Begičević, B. Divjak, and T. Hunjak, Decision Making Model for Strategic Planning of
e-Learning Implementation (17th International Conference on Information and Intelligent
Systems IIS, Varaždin, Croatia, 2006).
2. J. Du, Q. Zheng, H. Li, and W. Yuan, The Research of Mining Association Rules Between
Personality and Behavior of Learner Under Web-Based Learning Environment (ICWL, 2005),
pp. 406–417.
3. P. Chalfoun, S. Chaffar, and C. Frasson, Predicting the Emotional Reaction of the Learner
with a Machine Learning Technique (Workshop on Motivational and Affective Issues in ITS,
International Conference on Intelligent Tutoring Systems, Jhongli, Taiwan, 2006).
4. S. Chaffar, G. Cepeda, and C. Frasson, Predicting the Learner’s Emotional Reaction Towards
the Tutor’s Intervention (7th IEEE International Conference, Japan, 2007), pp. 639–641.
5. S. Chaffar and C. Frasson, Inducing Optimal Emotional State for Learning in Intelligent
Tutoring Systems (lecture notes in computer science, 2004), pp. 45–54.
6. B.F. Marin, A. Hunger, and S. Werner, Corroborating Emotion Theory with Role Theory
and Agent Technology: A Framework for Designing Emotional Agents as Tutoring Entities,
Journal of Networks (1), 29–40 (2006).
7. E. Abrahamian, J. Weinberg, M. Grady, and C. Michael Stanton, The Effect of Personality-
Aware Computer-Human Interfaces on Learning, Journal of Universal Computer Science (10),
27–37 (2004).
8. H. Maldonado, J.R. Lee, S. Brave, C. Nass, H. Nakajima, R. Yamada, K. Iwamura, and
Y. Morishima, We Learn Better Together: Enhancing e-Learning with Emotional Characters
(In Proceedings, Computer Supported Collaborative Learning, Taipei, Taiwan, 2005).
9. D.J. Pittenger, Measuring the MBTI . . . and Coming Up Short, Journal of Career Planning and
Employment (54), 48–53 (1993).
10. M.D. Shermis, and D. Lombard, Effects of computer-based Test Administration on Test
Anxiety and Performance, Journal of Computers in Human Behavior (14), 111–123 (1998).
11. S. Rushton, J. Morgan, and M. Richard, Teachers' Myers-Briggs Personality Profiles: Identi-
fying Effective Teacher Personality Traits, Journal of Teaching and Teacher Education (23),
432–441 (2007).
12. S.A. Jessee, P.N. Neill, and R.O. Dosch, Matching Student Personality Types and Learning
Preferences to Teaching Methodologies, Journal of Dental Education (70), 644–651 (2006).
13. N. Ghasem-Aghaee, S. Fatahi, and T.I. Ören, Agents with Personality and Emotional Fil-
ters for an e-Learning Environment (Proceedings of Spring Agent Directed Simulation
Conference, Ottawa, Canada, 2008).
14. S. Fatahi, N. Ghasem-Aghaee, and M. Kazemifard, Design an Expert System for Virtual
Classmate Agent (VCA) (Proceedings of World Congress Engineering, UK, London, 2008),
pp. 102–106.
15. S. Ellis, and S. Whalen, Cooperative Learning: Getting Started, Scholastic, New York (1996).
16. http://www.murraystate.edu
17. B. Kort, R. Reilly, and R.W. Picard, An Affective Model of Interplay between Emotions
and Learning: Reengineering Educational Pedagogy Building a Learning Companion (Pro-
ceedings IEEE International Conference on Advanced Learning Technology, Madison, 2001),
pp. 43–48.
18. S.M. Al Masum, and M. Ishizuka, An Affective Role Model of Software Agent for Effective
Agent-Based e-Learning by Interplaying between Emotions and Learning (WEBIST, USA,
2005), pp. 449–456.
19. K.S. Choi, F.P. Deek, and I. Im, Exploring the Underlying Aspects of Pair Programming: The
Impact of Personality, Journal of Information and Software Technology, doi: 10.1016/
j.infsof.2007.11.002 (2007).
20. L.F. Capretz, Implications of MBTI in Software Engineering Education, ACM SIGCSE
Bulletin – Inroads, ACM Press, New York, vol. 34, pp. 134–137 (2002).
21. A.R. Peslak, The impact of Personality on Information Technology Team Projects (Pro-
ceedings of the 2006 ACM SIGMIS CPR Conference on Computer Personnel Research:
Forty Four Years of Computer Personnel Research: Achievements, Challenges & the Future,
Claremont, California, USA, 2006).
Chapter 37
A Self Progressing Fuzzy Rule-Based System
for Optimizing and Predicting
Machining Process
37.1 Introduction
Rule-based systems represent the earliest and most established type of AI system,
embodying the knowledge of a human expert in a computer program [1]. Though
rule-based systems have been used quite effectively to approximate complicated
systems – for which analytical and mathematical models are not available – with a
considerable degree of accuracy, they still present a very static picture. The foremost
shortcoming in their design is the lack of dynamism. The price paid for this
inadequacy is their failure to cope with fast-changing environments. This is the main
reason why rule-based system technology has not found full-fledged application at
the industrial level.
A. Iqbal (B)
Department of Mechanical Engineering, University of Engineering & Technology,
Taxila, Pakistan
E-mail: asif.asifiqbal@gmail.com
5. Automatically generates fuzzy sets for newly entered variables and regenerates
sets for other variables according to newly added data
6. Automatically generates rules for optimization and prediction rule-bases
7. Provides conflict resolution facility among contradictory rules
8. Updates interface of the system according to newly entered variables
The first two points depict the main objectives of the rule-based system in
connection with any manufacturing process, while the others portray the high level
of automation required for the system to self-progress. The chapter moves ahead
with a description of the configuration of the system, followed by an explanation of
the methodology, which consists of algorithms for the different modules. At the end,
the operation of the self-progressing fuzzy rule-based system is explained with the
help of examples related to the optimization of a machining process.
The knowledge-base is a combination of facts, the optimization rule-base, and the prediction rule-base. The
functional details of optimization and prediction rule-bases are also available in ref-
erence [11]. The shell consists of a user interface through which the input is taken
from the user. The data fuzzifier fuzzifies the values of numeric predictor variables
according to the relevant fuzzy sets. The user-interface and the fuzzy sets are also
auto-developed by the system itself. For development of the knowledge-base
consisting of two rule-bases, the forward-chaining inference shell Fuzzy CLIPS
(C Language Integrated Production System) was used [12].
This module facilitates the automation of intake, storage, and retrieval of data.
The data could be the specifications of a new variable or the values of input
and output variables obtained from experiments. Data related to the specifications of
a new variable are stored in the file Variable.dat, while data related to new
values/records are stored in Data.dat on the hard disk. Figure 37.2 shows the flow
chart of the data acquisition algorithm.
(Fig. 37.3: the two fixed fuzzy sets, low and high, plotted as membership functions over the desirability scale from 0 to 100%)
This module covers three areas: (1) Rearranging the fuzzy sets for already entered
variables according to the newly entered data records; (2) development of fuzzy
sets for newly entered numeric variables; and (3) development of two fuzzy sets
(low and high) for each output variable that is included for optimization purposes.
The set low represents the minimization requirement and the other represents
maximization. The design of the sets for category (3) is fixed and is shown in
Fig. 37.3, while the design of the first two categories is dynamic and based upon the
data values of the respective variables.
The desirability values shown in Fig. 37.3 are set by the user with the slider bar
available on the interface of the rule-based system. Any value below 5% means the
desirability is to totally minimize the output variable (performance measure), while
a value above 95% means total desirability of maximization. A desirability of 50%
means that optimization of that output variable makes no difference.
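A sketch of the fixed low and high sets over desirability, assuming two complementary linear ramps crossing at 50%; the exact shape shown in Fig. 37.3 is not reproduced here.

```python
def desirability_sets(d):
    """Membership in the fixed 'low' and 'high' sets over desirability d (0-100 %).

    A sketch only: two complementary linear ramps crossing at 50 % are assumed,
    not the exact shapes of Fig. 37.3.
    """
    d = max(0.0, min(100.0, d))
    high = d / 100.0
    low = 1.0 - high
    return low, high

# 98 % desirability (the example used later in the chapter) is almost purely 'high'.
print(desirability_sets(98))   # -> (0.02, 0.98)
```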
Figure 37.4 shows the customized flow chart of the methodology used for self-
development of fuzzy sets. The user has to decide the maximum allowable number
of fuzzy sets for the input as well as the output variables.
The logic involved in the methodology is that a value of an input variable that has
a higher frequency of appearance in the data records has more right to be picked for
allocation of a fuzzy set, while for an output variable any value having a greater
difference from its previous and next values in the list – termed the Neighbor
Distance (Neighb dist in Fig. 37.4) – possesses more right to the allocation of a
fuzzy set.
Neighbor Distance can mathematically be represented as follows:
\[
\text{Neighbor Distance} =
\begin{cases}
\text{Value}[i+1] - \text{Value}[i], & \text{if } i = \text{first} \\
\text{Value}[i] - \text{Value}[i-1], & \text{if } i = \text{last} \\
\tfrac{1}{2}\,\big(\text{Value}[i+1] - \text{Value}[i-1]\big), & \text{otherwise}
\end{cases}
\tag{37.1}
\]
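A minimal sketch of Eq. (37.1) and of using the resulting Neighbor Distance to rank candidate output values for fuzzy-set allocation; the helper names and the example values are illustrative.

```python
def neighbor_distances(values):
    """Eq. (37.1): distance of each sorted output value from its neighbours."""
    v = sorted(values)
    dist = []
    for i, x in enumerate(v):
        if i == 0:                       # first value in the list
            dist.append(v[i + 1] - x)
        elif i == len(v) - 1:            # last value in the list
            dist.append(x - v[i - 1])
        else:                            # interior value: half the span of its neighbours
            dist.append(0.5 * (v[i + 1] - v[i - 1]))
    return list(zip(v, dist))

def pick_set_centres(values, max_sets):
    """Output values with the largest Neighbor Distance get a fuzzy set first."""
    ranked = sorted(neighbor_distances(values), key=lambda p: p[1], reverse=True)
    return sorted(x for x, _ in ranked[:max_sets])

print(pick_set_centres([2200, 2350, 2900, 3600, 4300, 5000, 5650, 5700], 6))
```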
This step consists of the following two parts: (1) automatic development of rules for
prediction of the manufacturing process's performance measures, based on the data
records provided by the users; and (2) conflict resolution among self-developed
contradictory rules.
Figure 37.5 provides a graphical description of the first part. The objective is to
convert each node of the 2-D linked list Data output (including the list of related
values of the input variables, Data input) into a rule. A 2-D linked list is a list that
expands in two directions, as shown in Fig. 37.5. The objective is achieved by
finding and assigning the most suitable fuzzy sets for all of the values involved in
each node of Data output.
The list Data output is navigated from the first node to the last, and all of its
values are matched to the closest values in the fuzzy sets of the respective variables.
If the match is perfect then a certainty factor (CF) of 1 is assigned to the match
between the data value and the fuzzy set. If no suitable match of any fuzzy set is
found for a given data value, the data value is assigned the intersection of the two
closest fuzzy sets. All the rules are stored in a 2-D linked list named Rule
Consequent, each node of which represents a rule. Each node contains the assigned
fuzzy set of the output variable and also a linked list (Rule antecedent) containing
the assigned fuzzy sets of all the relevant input variables.
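A sketch of the first part, converting one data record into a prediction rule by assigning the closest fuzzy set to each value. The dictionary representation of fuzzy sets and the reduced CF used for an inexact match are assumptions of the sketch (the system itself assigns the intersection of the two closest sets in that case).

```python
def closest_set(value, fuzzy_sets):
    """Return the label of the fuzzy set whose centre is nearest to 'value',
    plus a certainty factor of 1.0 for an exact hit and below 1.0 otherwise."""
    label, centre = min(fuzzy_sets.items(), key=lambda kv: abs(kv[1] - value))
    cf = 1.0 if centre == value else 0.8        # assumed penalty for an inexact match
    return label, cf

def record_to_rule(record, input_sets, output_sets):
    """Build one prediction rule (antecedent fuzzy sets -> consequent fuzzy set)."""
    antecedent = {name: closest_set(record[name], sets)
                  for name, sets in input_sets.items()}
    consequent = {name: closest_set(record[name], sets)
                  for name, sets in output_sets.items()}
    return antecedent, consequent

# Illustrative fuzzy-set centres and one data record.
speed_sets = {"S1": 150.0, "S2": 250.0}
life_sets = {"S1": 2200.0, "S6": 5700.0}
rule = record_to_rule({"speed": 150.0, "tool_life": 5500.0},
                      {"speed": speed_sets}, {"tool_life": life_sets})
print(rule)
```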
There is always a possibility that some anomalous data might be entered by the user,
which could lead to the self-development of some opposing rules. It is therefore
necessary to develop a mechanism that detects such possible conflicts and provides
a way to resolve them.
The mechanism of conflict resolution can be described as follows: compare each
rule of the prediction rule-base to all the other rules of the same rule-base. If, in the
consequent parts of any two rules, the following two conditions are satisfied: (1) the
output variables are the same; and (2) the assigned fuzzy sets are different, then
check whether the antecedent parts of both rules are the same (i.e., the same input
variables with the same fuzzy sets assigned). If yes, then these two rules form a pair
of contradictory rules. The user is then asked which of the two contradictory rules
should be abandoned. The CF value of the abandoned rule is set to zero. The same
procedure is continued for the whole rule-base. At the completion of the process, all
rules possessing CF values greater than zero are printed to the CLIPS file.
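A sketch of the conflict-resolution pass described above. The rule representation (a dict with antecedent, output variable, consequent set and CF) and the ask_user callback are assumptions of this sketch.

```python
def resolve_conflicts(rules, ask_user):
    """Zero the CF of one rule in every contradictory pair.

    Each rule is a dict with 'antecedent', 'output_var', 'consequent_set' and 'cf';
    'ask_user' returns the index (0 or 1) of the rule to abandon.
    """
    for i in range(len(rules)):
        for j in range(i + 1, len(rules)):
            a, b = rules[i], rules[j]
            same_output = a["output_var"] == b["output_var"]
            different_set = a["consequent_set"] != b["consequent_set"]
            same_antecedent = a["antecedent"] == b["antecedent"]
            if same_output and different_set and same_antecedent:
                loser = (a, b)[ask_user(a, b)]
                loser["cf"] = 0.0                # abandoned rules are not printed to CLIPS
    return [r for r in rules if r["cf"] > 0.0]
```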
This module generates the set of rules responsible for providing the optimal settings
of the input variables that best satisfy the maximization and/or minimization of the
selected output variables. Figure 37.6 describes the framework.
The idea exploited in the development of this module is that, for maximization of
any output variable, an ideal fuzzy set is selected for each numeric input variable
which, on average, would generate the maximum value of that output variable. For
minimization, those fuzzy sets are selected for the respective input variables that
would result in the least value of the output variable available in the data records.
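A sketch of how one optimization rule could be derived under the idea described above; the record format and field names are illustrative.

```python
from collections import defaultdict

def best_input_set(records, input_name, output_name, maximise=True):
    """Pick the fuzzy set of 'input_name' to recommend for the given objective."""
    if maximise:
        # maximisation: the set that on average gives the highest output value
        totals = defaultdict(list)
        for rec in records:
            totals[rec[input_name + "_set"]].append(rec[output_name])
        return max(totals, key=lambda s: sum(totals[s]) / len(totals[s]))
    # minimisation: the set used by the record with the least output value
    return min(records, key=lambda rec: rec[output_name])[input_name + "_set"]

# Illustrative records: each holds the assigned fuzzy set of an input and an output value.
records = [{"speed_set": "S1", "tool_life": 5200.0},
           {"speed_set": "S1", "tool_life": 4800.0},
           {"speed_set": "S2", "tool_life": 2600.0}]
print(best_input_set(records, "speed", "tool_life"))   # -> "S1"
```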
(Fig. 37.7: self-developed triangular fuzzy sets for speed in m/min, rake angle in degrees, tool life in mm², and cutting force in N)
Table 37.1 presents experimental data related to the milling process. The first three
variables, namely speed, rake, and orientation (milling-orientation), are predictor
(input) variables, while the other two, tool life and cutting force, are response
(output) variables. If the knowledge-base is developed based entirely on the data
presented in the table, it is very likely that the system may provide anomalous
results, because other influential milling parameters have not been taken into
account; thus, the self-progressed knowledge-base can be termed a "Rookie
Knowledge-Base".
Suppose the system is asked to develop its rule-bases and update its interface
based on the data provided, and is also asked to include tool life, but not the cutting
force, as the output variable for optimization. Figure 37.7 shows the detail of the
triangular fuzzy sets of the numeric variables, developed by the rule-based system
itself, in addition to the sets for the objective shown in Fig. 37.3. The following are
the six rules self-generated by the self-development mode and to be operated by the
optimization module of the rule-based system:
Rule 1: IF Objective Tool life is High AND Speed is not fixed THEN Speed is S1.
Rule 2: IF Objective Tool life is High AND Rake is not fixed THEN Rake is S1.
Rule 3: IF Objective Tool life is High AND Orientation is not fixed THEN
Orientation is Down.
Rule 4: IF Objective Tool life is Low AND Speed is not fixed THEN Speed is S2.
Rule 5: IF Objective Tool life is Low AND Rake is not fixed THEN Rake is S2.
Rule 6: IF Objective Tool life is Low AND Orientation is not fixed THEN
Orientation is Up.
Out of these six rules the first three perform the maximization operation, while
the others perform minimization. Table 37.2 presents the detail of eight rules, self-
generated by the rule-based system and to be operated by its prediction module.
Figure 37.8 shows the interface of the system related to the rookie knowledge-base.
The slider bar, shown in the middle of the figure, prompts the user whether to
maximize or minimize the selected output variable and with what desirability.
Suppose the rule-based system is provided with the following input:
Objective: maximize tool life with a desirability of 98%.
Rake angle of the tool prefixed to 0°.
Cutting speed and milling-orientation: open for optimization.
Pressing the Process button starts the processing and the following results are
displayed in the information pane:
The recommended orientation is down-milling and the recommended cutting speed
is 154.2 m/min.
The predicted tool life is 4,526.4 mm² and the cutting force is 649.68 N.
Suppose the same rule-based system is provided with more experimental data,
covering the effects of all the influential parameters of the milling process. When the
system is asked to generate a knowledge-base from that set of data, the resulting
knowledge-base would be a veteran knowledge-base. As more and more data are
provided to the rule-based system, it will keep improving the accuracy of its
optimization and prediction processes.
Figure 37.9 presents the interface of the rule-based system developed from the
experimental data provided in [14, 15] in addition to that provided in Table 37.1.
37.5 Conclusion
References
1. L. Monostori, AI and machine learning techniques for managing complexity, changes, and
uncertainty in manufacturing, Eng. Appl. Artif. Intell., 16: 277–291 (2003)
2. J.L. Castro, J.J. Castro-Schez, and J.M. Zurita, Use of a fuzzy machine learning technique in
the knowledge-acquisition process, Fuzzy Sets Syst., 123: 307–320 (2001)
3. H. Lounis, Knowledge-based systems verification: A machine-learning based approach,
Expert Syst. Appl., 8(3): 381–389 (1995)
4. K.C. Chan, A comparative study of the MAX and SUM machine-learning algorithms using
virtual fuzzy sets, Eng. Appl. Artif. Intell., 9(5): 512–522 (1996)
5. G.I. Webb, Integrating machine learning with knowledge-acquisition through direct interac-
tion with domain experts, Knowl-Based Syst., 9: 253–266 (1996)
6. A. Lekova and D. Batanov, Self-testing and self-learning fuzzy expert system for technological
process control, Comput. Ind., 37: 135–141 (1998)
7. Y. Chen, A. Hui, and R. Du, A fuzzy expert system for the design of machining operations,
Int. J. Mach. Tools Manuf., 35(12): 1605–1621 (1995)
8. B. Filipic and M. Junkar, Using inductive machine learning to support decision making in
machining processes, Comput. Ind., 43: 31–41 (2000)
9. S. Cho, S. Asfour, A. Onar, and N. Kaundinya, Tool breakage detection using support vector
machine learning in a milling process, Int. J. Mach. Tools Manuf., 45: 241–249 (2005)
10. P. Priore, D. De La Fuente, J. Puente, and J. Parreno, A comparison of machine learning
algorithms for dynamic scheduling of flexible manufacturing systems, Eng. Appl. Artif. Intell.,
19: 247–255 (2006)
11. A. Iqbal, N. He, L. Li, and N.U. Dar, A fuzzy expert system for optimizing parameters and
predicting performance measures in hard-milling process, Expert Syst. Appl., 32(4): 1020–
1027 (2007)
12. R.A. Orchard, Fuzzy CLIPS, V6.04A Users’ Guide, NRC, Canada (1998)
13. T. Childs, K. Maekawa, T. Obikawa, and Y. Yamane, Metal Machining: Theory and Applica-
tions, Arnold, London (2000)
14. A. Iqbal, N. He, L. Li, W.Z. Wen, and Y. Xia, Influence of tooling parameters in high-speed
milling of hardened steels, Key Eng. Mater. (Advances Machining & Manufacturing and
Technology 8), 315–316: 676–680 (2006)
15. A. Iqbal, N. He, L. Li, Y. Xia, and Y. Su, Empirical modeling the effects of cutting parameters
in high-speed end milling of hardened AISI D2 under MQL environment. Proceedings of 2nd
CIRP Conference on High Performance Cutting, Vancouver, Canada, 2006
Chapter 38
Selection of Ambient Light for Laser Digitizing
of Quasi-Lambertian Surfaces
Abstract The present work deals with the influence of ambient light on the quality
of surfaces digitized with a laser triangulation probe. Laser triangulation systems
project a laser beam onto the surface of the workpiece, so that an image of this
projection is captured on a photo-sensor. The system is then able to establish the
vertical position of each point on the projection by processing the information
contained in that image. As the sensor captures not only the light projected by the
laser but also the ambient light emitted at the same wavelength as the laser beam,
ambient light becomes a potential error source. A methodology for testing different
light sources under the same digitizing conditions has been developed and applied
to the digitizing of a 99% reflectance quasi-Lambertian standard. Tests have been
carried out for six different types of ambient light source and the resulting point
clouds have been compared. Three different criteria have been used to analyse the
quality of the point clouds: the number of captured points, the average dispersion of
the test point cloud with respect to a reference point cloud, and the distribution of
such geometric dispersion across the whole surface. Results show that the best
quality is obtained for low-pressure sodium lamps and mercury vapour lamps.
38.1 Introduction
In laser triangulation (LT), the projection of a laser beam onto a surface is cap-
tured as an image in a charged coupled device (CCD). Applying image processing
techniques and the triangulation principle, 3D coordinates of the surface points are
D. Blanco (B)
Department of Manufacturing Engineering, University of Oviedo,
Campus de Gijón, 33203 Gijón, Spain
E-mail: dbf@uniovi.es
acquired (Fig. 38.1). If the distance between a particular point P and the CCD
matches exactly the value of the reference distance (stand-off), its image on the
CCD will be placed exactly at a reference point P′. Otherwise, if the point on the
surface were further away by a distance H in the direction of the laser beam, its
image on the CCD would be displaced a distance h from the reference point. In this
way, it is possible to determine the spatial position of every single point from the
position of its image on the sensor [1]. When digitizing a part, the laser beam
projection sweeps the target surface, capturing a set of digitized points (a point
cloud) from the surface.
Accurate calculation of the spatial position of each point of the laser stripe
depends on the accurate calculation of the centroid of its light distribution on the
sensor [2]. If the intensity of the light distribution captured on the sensor is too weak,
the system cannot properly calculate the position of the points. Conversely, if the
laser intensity is too high, the sensor goes into saturation, so that the system cannot
calculate the position of the points either. For intermediate situations, the light
distribution is analysed to determine its centroid position, which corresponds to the
distance h measured from the reference point. Consequently, the light distribution
affects the accuracy of the calculation of distance H (Fig. 38.1).
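A simplified sketch of the two steps described above: computing the centroid of the stripe's light distribution on the sensor and converting the image offset h into the surface offset H. The linear conversion and its sensitivity value are assumptions of the sketch; the real mapping depends on the triangulation angle and the optics of the probe.

```python
import numpy as np

def stripe_centroid(intensity, pixel_positions):
    """Centre of the light distribution on the sensor (intensity-weighted mean)."""
    intensity = np.asarray(intensity, dtype=float)
    pixel_positions = np.asarray(pixel_positions, dtype=float)
    return np.sum(intensity * pixel_positions) / np.sum(intensity)

def height_from_offset(h, sensitivity=0.12):
    """Convert the image offset h (mm on the CCD) into the surface offset H (mm).

    A linearised model H = h / sensitivity is assumed here; the real mapping
    depends on the triangulation geometry of the sensor head.
    """
    return h / sensitivity

# Illustrative stripe image: a narrow Gaussian spot displaced 0.06 mm on the CCD.
pixels = np.arange(-0.5, 0.51, 0.01)
profile = np.exp(-((pixels - 0.06) ** 2) / (2 * 0.02 ** 2))
h = stripe_centroid(profile, pixels)
print(round(h, 3), round(height_from_offset(h), 2))
```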
The result of the scanning process depends on the LT system characteristics,
the geometry and quality of the surface and the environmental conditions [3]. These
elements determine the shape and contrast of the laser stripe onto the surface and the
image captured by the sensor. Since surface quality is an important influence factor,
most LT systems allow for adjusting laser intensity according to surface colour and
roughness requirements, achieving an improvement in the sharpness of the laser
beam projection.
(Fig. 38.1: laser triangulation principle, showing the laser diode, lens and CCD sensor, the stand-off reference distance, the surface offset ±H and the corresponding image offset ±h around the reference point P′)
The ambient light present during the scanning process is one of the possible
environmental influences [4]. Usually, LT systems incorporate optical filters to
reduce or eliminate the influence of ambient light. These filters accept only those
wavelengths in the laser emission band. Commercial light sources emit light over a
wide spectrum of frequencies. Since ambient light emitted in the laser emission band
is not filtered out, it becomes part of the information captured by the sensor and is
used in the calculation of point positions.
38.2 Objectives
In this work, the influence of ambient light on the quality of digitized point clouds
is evaluated. The tests have been set up for the digitizing of a quasi-Lambertian
surface. On this type of surface, reflection is ideally diffuse, as the whole incident
energy is reflected uniformly in all spatial directions. The quasi-Lambertian surface
has been chosen because its behaviour is accepted to provide the best results for LT
digitizing. Digitizing tests have been carried out to compare results under different
ambient light conditions. Although there is a wide range of commercial light
sources, the present work deals with the most commonly used lamps. Tests have
been carried out under laboratory conditions, where the illumination for each
experiment is reduced to a single light source. For each test configuration, the nature
and magnitude of the uncertainty introduced by ambient light have been established.
The results for each light source have then been compared in order to elaborate
usage suggestions.
Tests have been carried out using an LT stripe commercial system from Metris (model
Metris LC50) which has been mounted on a Brown & Sharpe Global CMM (model
Image). The LC50 uses a laser beam emitting in the red visible spectrum with a
wavelength emission band between 635 and 650 nm. The maximum peak power is
1 mW.
A reflectance standard from Labsphere has been used as the surface to be
digitized [5]. This surface (hereafter the reference surface) is certified as 99%
reflectance. This means that its surface is quasi-Lambertian, so 99% of the received
energy is reflected. The orientation of the light source with respect to the reference
surface and the LT system has been selected so that the light direction theoretically
makes an angle of 45° (φ = 45°) with the standard surface.
Moreover, the theoretical direction of the incident light is orthogonal to the sweep
direction. For every point, the system calculates the z coordinate value taking into
account the distance H (Fig. 38.1) and the sensor position and orientation. Therefore,
the influence of ambient light affects the calculated vertical position of the points.
Fig. 38.2 Spherical coordinate system (distance d, latitude φ, azimuth θ) used to orientate the light source
(Fig. 38.3: test set-up showing the Metris LC50 probe, the reflectance standard and the light source)
Figure 38.2 shows the spherical coordinate system used for orientating the light
sources. The origin of this coordinate system is the centre of the reference surface.
In order to incorporate the influence of light source intensity into this work, tests
have been carried out with two different positions for the light sources: 200 mm (δ1)
or 400 mm (δ2) from the origin of the coordinate system.
The light sources and the reference surface are mounted on a test-bench designed
ad hoc. This test-bench provides the proper position and orientation of the light
source according to the spherical coordinate system (Fig. 38.3). This mounting
allows for comparing point clouds obtained under different test conditions. The set
formed by the test-bench, the light source and the reference surface has been
installed on the CMM table.
The light sources used for this work are among the most common types in a
metrological laboratory or a workshop. Thus, three types of incandescent lamp
(clear, tinted blue and halogen), one fluorescent lamp, one low-pressure sodium lamp
and one mercury vapour lamp constitute the final selection.
Although there is a wide variety of light sources in each class that could be tested
(attending to power or shape), the selected lamps have similar values of luminous
flux (lumens). This selection criterion is based on finding alternatives that offer the
operator similar visual comfort when performing long digitizing runs.
Commercial references for the light sources used in this work are given in
Table 38.1.
A test where the only illumination comes from the laser itself will not be altered
by any external energy. Under this assumption, a point cloud digitized in the absence
of light suffers no distortion. Hence, digitizing in the dark appears to be the most
appropriate way of scanning surfaces, although working in the absence of light is
an impractical situation for human operators. Nevertheless, a point cloud obtained
in the absence of light can be used as a reference when evaluating the quality of
point clouds digitized under normal ambient lighting.
The experimental procedure used in this work allows for comparing the results
obtained when digitizing under particular light sources with the results obtained in
the absence of ambient light.
Although a single reference point cloud for all tests may seem a suitable option,
in practice this approach is not recommended. The sensitivity of the internal
geometry of the sensor to thermal variations (related to the time the laser remains
switched on) must be taken into account. The fact that the manufacturer of the
sensor recommends a minimum 40 min warm-up period from switching on the laser
until proper stability is reached confirms the importance of this effect. Therefore,
instead of using a single reference cloud for all tests, a specific reference cloud has
been used in each test comparison. These reference clouds must be digitized
immediately after the capture of each single test cloud. This procedure minimizes
the possible alteration of the sensor's internal geometry due to thermal drift.
Thus, the first cloud, $N^{k\delta}$, is obtained by digitizing the reference surface under
a particular type of light source (k), placed at a given distance from the origin of
the coordinate system (δ). Immediately after the first one, a second point cloud,
known as the reference point cloud $P^{k\delta}$, is obtained by digitizing the same surface
in the absence of ambient light.
Comparison between these two clouds requires each point in the test cloud to
have its equivalent in the reference cloud. To ensure this relationship, a computer
application has been implemented, capable of selecting and classifying a group of
400 points (20 × 20) in the central area of the reference surface. The matrix
constructed in this way allows for the comparison between each point and its
equivalent.
The LT system test parameters (such as laser light intensity) have been adjusted
to avoid the loss of points due to saturation in the reference cloud. The distance
between digitized points has been set to 1.5 mm in both the X and Y directions of
the CMM coordinate system.
Three criteria have been used for comparing the quality of the point clouds obtained
under different light sources.
The first criterion evaluates the influence of lighting on the number of points
captured in the test cloud. As discussed previously, an excessive input of light energy
drives the sensor into saturation. Therefore, it becomes impossible to calculate a
proper value for the z coordinate of the saturated points. The saturated points are not
included in the cloud $N^{k\delta}$, as the system rejects them. The parameter used to
characterize this criterion is the number of valid points ($n^{k\delta}$) in the cloud.
The second criterion evaluates the influence of lighting on the proper calculation
of the z coordinate value for each point. Improper values cause the points of the test
cloud to appear higher or lower than they really are.
The absolute difference between the z values of each equivalent pair of valid
points in the test cloud $N^{k\delta}$ and its reference cloud $P^{k\delta}$ is calculated as in (38.1):
\[ \left| d_i^{k\delta} \right| = \left| z_i^{N^{k\delta}} - z_i^{P^{k\delta}} \right| \tag{38.1} \]
The standard deviation $\sigma^{k\delta}$ of the calculated differences $d_i^{k\delta}$ has been used as the
characteristic parameter for this second criterion (38.3):
\[ \bar{d}^{\,k\delta} = \frac{1}{n} \sum_{i=1}^{n} d_i^{k\delta} \tag{38.2} \]
\[ \sigma^{k\delta} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( d_i^{k\delta} - \bar{d}^{\,k\delta} \right)^{2}} \tag{38.3} \]
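A minimal sketch of Eqs. (38.1)–(38.3) applied to two matched grids of z values; the array names and the simulated data are illustrative.

```python
import numpy as np

def cloud_dispersion(z_test, z_reference):
    """Eqs. (38.1)-(38.3): per-point deviations, their mean and standard deviation.

    'z_test' and 'z_reference' hold the z coordinates of equivalent points of the
    test cloud and the reference cloud (e.g. the 20 x 20 matched grids).
    """
    d = np.asarray(z_test, dtype=float) - np.asarray(z_reference, dtype=float)
    mean_d = d.mean()                                   # Eq. (38.2)
    sigma = np.sqrt(np.mean((d - mean_d) ** 2))         # Eq. (38.3)
    return np.abs(d), mean_d, sigma                     # |d_i| as in Eq. (38.1)

# Illustrative 20 x 20 grids with a few micrometres of simulated distortion (values in mm).
rng = np.random.default_rng(1)
z_ref = np.zeros((20, 20))
z_tst = z_ref + rng.normal(scale=0.002, size=z_ref.shape)
abs_d, mean_d, sigma = cloud_dispersion(z_tst, z_ref)
print(round(sigma * 1000, 2), "micrometres")
```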
The last criterion used in this work is qualitative. It consists of a graphical
representation of the deviations $d_i^{k\delta}$ for each digitized point cloud. This
representation shows how the ambient light modifies the position of each single
point, and allows for determining whether the influence of light is equal across the
whole surface or not.
Attending to the first criterion ($n^{k\delta}$), the results of the tests in Table 38.2 illustrate
how certain types of light source cause a high percentage of points to become
saturated. Thus, in three of the tests ($N^{11}$, $N^{12}$ and $N^{31}$), no point has been captured,
due to the saturation caused by the great amount of energy in the laser wavelength
band. The sensor cannot properly obtain the z coordinate value of these points, and
therefore the point cloud is empty.
A partial loss of points has occurred in only one case (test $N^{21}$). However, the
same light source placed at a further position (test $N^{22}$) provides a complete point
cloud. The rest of the tests provide complete point clouds, so that proper information
on the surface geometry can easily be obtained.
An order of preference between the different light sources can be established by
using the second criterion (explained in the previous section), referring to the
standard deviation ($\sigma^{k\delta}$).
From these tests it can be concluded that the best results for both testing positions
are obtained for the low-pressure sodium lamp. This light source causes the lowest
distortion of the test cloud with respect to the reference cloud. Comparison between
pairs of clouds, all obtained in the absence of light, gives a mean value of 1.2 μm as
the systematic error attributable to the system itself. Then, the result obtained for
Table 38.2 Results of the tests: number of valid points and standard deviation with respect to the reference cloud

K   δ   N^{Kδ}   n^{Kδ}   σ^{Kδ} (μm)   d_i min (μm)   d_i max (μm)
1   1   N^{11}     0        –             –              –
1   2   N^{12}     0        –             –              –
2   1   N^{21}    44       8.36        −29.24          17.33
2   2   N^{22}   400       4.64        −21.49          21.12
3   1   N^{31}     0        –             –              –
3   2   N^{32}   400       4.83        −19.40          25.15
4   1   N^{41}   400       4.33         −7.33          21.97
4   2   N^{42}   400       2.14         −8.97           9.89
5   1   N^{51}   400       2.15        −10.75           9.21
5   2   N^{52}   400       1.88         −7.75           9.89
6   1   N^{61}   400       1.99        −18.19          13.79
6   2   N^{62}   400       1.13         −5.92           5.07
the sodium lamp at the furthest position ($\sigma^{k\delta}$ = 1.13 μm) indicates a moderate
influence of this light source on point positions.
The behaviour of the fluorescent lamp is clearly worse than that of the mercury
vapour lamp for the closest position, while for the furthest one the difference is not
so evident. For the blue-tinted incandescent lamp at the furthest position, the
standard deviation is approximately 4.1 times greater than the value calculated for
the sodium lamp. For the closest position, its standard deviation is extremely high,
but it must be remarked that this value has been calculated from a small number of
valid points, as most of the theoretical points in the cloud have not been captured
due to saturation.
In the case of the halogen lamp, the results for the deviation $\sigma^{k\delta}$ are the worst of
all. At the furthest position, the deviation is the maximum observed, approximately
4.3 times greater than for the sodium lamp. This result was predictable, as this lamp
causes all the points to become saturated at the distance of 200 mm.
Finally, the third criterion (the graphical representation of the parameter $d_i^{k\delta}$ over
the whole area) shows how the distribution of these values depends on the type of
light source and is clearly non-uniform (Figs. 38.4 and 38.5).
Thus, when testing the blue incandescent lamp at a distance of 200 mm from the
origin of the coordinate system (Fig. 38.4), the graph shows how the lack of points
due to saturation occurs at points of the reference surface that are close to the light
source.
On the other hand, valid points are only registered in a narrow strip located in the
area where the points of the surface are furthest from the light source. In certain
areas of each test cloud, some of the points are located in a higher vertical position
than their equivalents in the reference cloud, whereas points in other areas are
located in a lower position.
However, this distortion of the test clouds does not seem to be related to the
proximity of the points to the light source, even though such a relationship can be
established for the saturation of points. By contrast, the distribution of peaks and
valleys in Figs. 38.4 and 38.5 shows an orientation parallel to the laser stripe
projection onto the surface. This result does not fit any of the previous assumptions.
The effect may be related to local irregularities in the reference surface properties;
this should be confirmed by later work.
The appearance of the differences plotted in Fig. 38.4 confirms the conclusions
obtained with the second criterion. Since the sodium lamp generates the least
distortion in the cloud, it provides the best performance among the tested lamps.
The distortion increases for the mercury lamp and for the fluorescent lamp, while
the blue incandescent lamp causes a very large distortion.
When testing the lights at the furthest position (δ = 400 mm), five complete point
clouds were obtained (Fig. 38.5). As for the closest distance, the distribution of
d_i^kδ shows a non-uniform behaviour. Furthermore, the parallelism between the
preferential direction of the peaks and valleys in the graphs and the orientation of
the laser stripe can be noticed, as in Fig. 38.4.
Moreover, the result in terms of the level of distortion for the test clouds follows
the same previously established pattern according to the type of light source.
The sodium lamp is again the source with the lowest influence on the distortion
introduced in the test cloud. For this position, the distortion introduced by the mercury
and fluorescent lamps is very similar, whereas a better behaviour of the mercury
lamp had been established for the closest distance.
The results for the incandescent lamps (both the blue one and the halogen) are
the worst among the tested lamps. The differences (both positive and negative) are
significantly higher than for the rest of the lamps.
38.7 Conclusions
This work has established the influence of ambient light on the quality of laser
triangulation digitized surfaces. The results of the tests show that different light
sources affect the digitizing in different ways. Sources that introduce a large
amount of energy in the laser wavelength band will cause some of the points to
saturate; in severe cases this affects the whole point cloud and no information will
be obtained. It has also been demonstrated that different types of sources give
different results when calculating the vertical position of every point of the cloud.
The experimentation carried out confirms that laser digitizing of surfaces in com-
plete absence of external sources of light provides the best results. In the usual
case where this requirement cannot be satisfied, the results lead to recommending
the sources of light that cause the least distortion of the point cloud: low-pressure
sodium lamps and mercury vapour lamps. Sodium lamps emit in the orange range
(589 nm) of the visible spectrum, which is especially troublesome when working
for long periods, as it prevents the operator from distinguishing different colors.
This leads to recommending mercury vapour lamps as the most appropriate choice.
Acknowledgements The authors wish to thank the Spanish Education and Science Ministry and
the FEDER program for supporting this work as part of the research project MEC-04-DPI2004-
03517.
Chapter 39
Ray-Tracing Techniques Applied
to the Accessibility Analysis for the Automatic
Contact and Non Contact Inspection
Abstract Accessibility analysis represents one of the most critical tasks in inspec-
tion planning. The aim of this analysis is to determine the valid orientations of the
inspection devices used in probe operations and non-contact scanning operations
on a Coordinate Measuring Machine (CMM). An STL model has been used for
discretizing the inspected part into a set of triangles, which permits the application
of the developed system to any type of part, regardless of its shape and complexity.
The methodology is based on the application of computer graphics techniques such
as ray-tracing, spatial subdivision and back-face culling. This analysis will take
into account the real shape and geometry of the inspection device and the con-
straints imposed by the CMM on which it is mounted. A simplified model has
been developed for each inspection device component using different basic geo-
metrical shapes. Finally, collision-free orientations are clustered for minimizing the
orientation changes during the inspection process.
39.1 Introduction
B. J. Álvarez (B)
Department of Manufacturing Engineering, University of Oviedo, Campus de Gijón, 33203 Gijón,
Spain
E-mail: braulio@uniovi.es
like space partitioning and back-face culling have been applied in order to speed up
the search for valid orientations.
The methodology has been applied to inspection processes based on a touch-
trigger probe [3] and to non-contact scanning processes based on a laser stripe [4].
In both cases, different constraints have been considered: real shape and dimen-
sions of the inspection devices, process parameters and possible orientations of
the motorized head of the CMM where the inspection device has been mounted.
This motorized head (PH10MQ) provides 720 feasible orientations of the inspection
device by rotating at a resolution of 7.5° about both its horizontal and vertical axes
(A, B).
The accessibility analysis for a touch-trigger probe deals with determining all the
feasible probe orientations that allow the part inspection to be performed while
avoiding collisions with the part or any other obstacle in the environment of the inspection
process. Moreover, the valid orientations of the non-contact scanning device will be
obtained to guarantee the visibility of the surface to be scanned. The methodology
applied in both cases will be exactly the same. In both cases several simplifications
have to be made. In a first stage, the inspection devices have been abstracted by
means of infinite half-lines. In a second stage, the real shape and dimensions of the
inspection devices are taken into account.
Another simplification is related to the model of the part being inspected. The
CAD model of the part, which may include complex surfaces, is converted to an STL
model, where each surface is discretized by a set of simple triangular facets. For
solving the accessibility analysis, the naïve approach consists of testing the intersec-
tion between the part triangles and every feasible orientation of the inspection device.
In order to reduce the calculation time, different computer graphics techniques, such
as space partitioning based on a kd-tree, a ray traversal algorithm, back-face culling
and ray-triangle intersection tests, have also been applied [5].
Figure 39.1 shows the abstractions made for the contact and non-contact inspection
devices. In the case of the touch-trigger probe, each probe orientation is replaced by
an infinite half-line with the same direction as the orientation being analyzed. In the
case of the laser triangulation system, two infinite half-lines have been used: one
represents the laser beam while the second represents the vision space of the CCD
sensor. Then, the analysis is divided into two phases: local and global analysis.
First, the local analysis only takes into account the possible interference between
the infinite half-line and the part surface being inspected (probed or digitized)
whereas the possible interferences with the rest of part surfaces or any other obstacle
are ignored. Hence, valid orientations l will be those which make an angle between
0 and π/2 with the surface normal vector n.
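As a rough illustration of this local test, the sketch below accepts an orientation when its dot product with the unit surface normal is non-negative, i.e. when the angle lies in [0, π/2]. The function and variable names are illustrative assumptions, not code from the chapter.

```python
# Minimal sketch of the local accessibility test: an orientation l passes when the
# angle it makes with the facet normal n is between 0 and pi/2.
import numpy as np

def locally_accessible(l, n):
    """Return True if orientation l is locally valid for a facet with normal n."""
    l = np.asarray(l, dtype=float)
    n = np.asarray(n, dtype=float)
    # Normalise so the dot product equals the cosine of the angle between them.
    l = l / np.linalg.norm(l)
    n = n / np.linalg.norm(n)
    return float(np.dot(l, n)) >= 0.0

# Example: an orientation aligned with the normal is valid, the opposite one is not.
print(locally_accessible([0, 0, 1], [0, 0, 1]))   # True
print(locally_accessible([0, 0, -1], [0, 0, 1]))  # False
```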
In a second phase, the orientations obtained in the local analysis are checked
in order to determine whether or not they collide with the rest of the part surfaces or
any other obstacle. This analysis is called global analysis; it is complex and
computationally expensive because it involves multiple interference tests. To make
this calculation easier, an STL model of the part is used, where each surface is
discretized by a set of triangles. Thus the global accessibility analysis is reduced to
determining whether there are interferences between the orientations l obtained in
the local analysis and the triangles that compose the STL model of the part or of any
obstacle.
The use of space partitioning structures like kd-trees reduces the number of
triangles to test in the global analysis, because intersections are checked exclusively
against the triangles that can potentially be traversed by each inspection device
orientation. The part is partitioned into regions bounded by planes (bounding boxes)
and each part triangle is assigned to the region within which it is
located. Then, regions traversed by each inspection device orientation are selected
by means of the ray-traversal algorithm that was first developed and applied by
Kaplan [2].
Before checking intersection between each inspection device orientation and all
triangles included in the traversed regions previously determined, the number of
intersection tests can be reduced even more by applying a back-face culling algo-
rithm [5]. Thus, from the initial set of triangles included in the traversed regions, a
subset is extracted that does not include those triangles whose visibility, according to
the analyzed inspection device orientation, is completely blocked by other triangles.
Finally, the intersection test between each undiscarded triangle and the inspec-
tion device orientation l is carried out. This verification is based on the algorithm
developed by Möller and Trumbore [6]. If any intersection is detected then the ori-
entation is rejected. If there is no interference with any triangle, the orientation will
be considered as valid.
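The following sketch shows the ray/triangle test of Möller and Trumbore [6] in its commonly published form. Modelling an orientation as a half-line from an origin o with direction d, and all names used here, are illustrative assumptions rather than the chapter's own code.

```python
# Moller-Trumbore ray/triangle intersection: returns True when the half-line
# starting at o with direction d hits triangle (v0, v1, v2).
import numpy as np

def ray_hits_triangle(o, d, v0, v1, v2, eps=1e-9):
    o, d, v0, v1, v2 = (np.asarray(x, dtype=float) for x in (o, d, v0, v1, v2))
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(d, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return False
    inv_det = 1.0 / det
    t_vec = o - v0
    u = np.dot(t_vec, p) * inv_det
    if u < 0.0 or u > 1.0:
        return False
    q = np.cross(t_vec, e1)
    v = np.dot(d, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return False
    t = np.dot(e2, q) * inv_det
    return t > eps                # hit must lie on the half-line, not behind its origin
```

An orientation would be rejected as soon as such a test returns True for any undiscarded triangle.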
Fig. 39.2 Components of the inspection device and their simplified models
39.5 Clustering
From the previous analysis, the inspection device orientations that are valid for
probing each point or scanning each part triangle have been determined. These
orientations are mathematically represented by means of a binary matrix A_i(q, r),
where each element corresponds to a combination of discrete values of the A and B
angles:
\[
A_i(q, r) =
\begin{cases}
1 & \text{if } (A = a_q,\ B = b_r) \text{ is a valid orientation for point } P_i \\
0 & \text{if } (A = a_q,\ B = b_r) \text{ is not a valid orientation for point } P_i
\end{cases}
\tag{39.1}
\]
To reduce the process operation time related to device orientation changes, orienta-
tions (a_q, b_r) common to the greatest number of points to probe (clusters of points)
or triangles to scan (clusters of triangles) must be found. The classification of points
or triangles continues until no intersection can be found between the final clusters.
The algorithm used is similar to that developed by Vafaeesefat and ElMaraghy
[11]. Next, the algorithm is explained for an inspection process using a touch-trigger
probe.
Each point P_i (i = 1, 2, ..., n) to be probed is associated with a binary matrix of
valid orientations A_i^k (i = 1, 2, ..., n). Initially (k = 1), the clusters coincide with
the points to be probed: C_i^k = P_i.
Starting from the clusters C_i^k and from the binary matrices A_i^k, a new matrix
CI^k(i, j) = A_i^k ∩ A_j^k is built, showing the probe orientations common to the
clusters taken two by two.
With the purpose of creating clusters whose points are associated with the great-
est number of valid orientations, the algorithm searches for the indices (s, t) of CI^k
that correspond to the maximum number of common valid orientations. After that,
the clusters C_s^k and C_t^k associated with the points P_s and P_t respectively are
regenerated as follows:
\[
C_s^k = C_s^k \cup C_t^k \quad \text{and} \quad C_t^k = \emptyset \tag{39.2}
\]
and the binary matrix of valid orientations associated with the new cluster C_s^k will be
A_s^k = A_s^k ∩ A_t^k.
With these new clusters, the matrix CI^k is updated accordingly. The clustering
process finishes when the number of common orientations corresponding to all the
elements above the main diagonal of CI^k has become zero. A similar process is
used to determine the clusters of triangles T_i (i = 1, 2, ..., n) for a part to be scanned;
a sketch of the procedure is given below.
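The sketch below is a minimal illustration of this greedy merging loop, assuming each point carries a boolean matrix of valid (A, B) orientations. The pair selection and merge follow the description above, while the data layout and names are illustrative.

```python
# Greedy clustering of points by shared valid orientations.
import numpy as np

def merge_clusters(A):
    """A: list of boolean arrays of equal shape (one per point).
    Returns the final clusters (sets of point indices) with their intersected matrices."""
    clusters = [{i} for i in range(len(A))]
    mats = [a.copy() for a in A]

    def common(i, j):
        return int(np.logical_and(mats[i], mats[j]).sum())

    while True:
        # Find the pair of non-empty clusters sharing the most valid orientations.
        best, pair = 0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if clusters[i] and clusters[j]:
                    c = common(i, j)
                    if c > best:
                        best, pair = c, (i, j)
        if pair is None:                      # no common orientations left: stop
            break
        s, t = pair
        clusters[s] |= clusters[t]            # merge C_t into C_s
        clusters[t] = set()
        mats[s] = np.logical_and(mats[s], mats[t])
    return [(c, m) for c, m in zip(clusters, mats) if c]
```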
Fig. 39.3 Accessibility map for the point P4 on the surface s39
Apart from the inspection process, the developed methodology allows for determin-
ing the orientations of a laser stripe system to scan a part. In this type of scanning
systems a laser stripe of known width is projected onto the part surface to be scanned
and the reflected beam is detected by a CCD camera. Therefore, not only the incident
laser beam orientation has been taken into account for the accessibility analysis but
also the possible occlusion due to the interference of the reflected laser beam with
the part.
Figure 39.4 shows the laser head orientation map associated to the local and
global accessibility analysis for triangle 558 of the example part. The darkest color
Fig. 39.4 Local and global accessibility maps for triangle 558
Fig. 39.5 Different stages to determine the global visibility map for a triangle and an incident laser
beam orientation
represents the head orientations (A, B) that coincide or are closest to the normal
direction of the triangle analyzed. These are considered as optimal orientations.
Grey colors represent head orientations far from the optimal value which lead to
worse scanning quality. White color represents head orientations that do not allow
the considered triangle to be scanned. For a better visualization of the orientation map,
increments of 15° have been considered for the angles A and B.
For triangle 558, an incident laser beam orientation (A = 30°, B = 180°) has
been selected in order to show the part regions (bounding boxes) that it traverses
(Fig. 39.5). Triangles partially or totally enclosed in these bounding boxes are shown
in the figure before and after applying back-face culling.
Figure 39.6 shows in a grey scale the ten triangle clusters obtained for the
example part.
39.7 Conclusions
Most of the accessibility analyses presented in other works deal only with a lim-
ited number of inspection device orientations, simple parts with only planar
surfaces or specific geometrical shapes, simplified device representations, or a small
number of points in the case of inspection by touch-trigger probe. However, in this
paper, a new methodology for accessibility analysis is presented which allows for
overcoming the previous limitations:
1. The methodology has been extended to the inspection process by means of a
touch-trigger probe and the scanning process by means of a laser stripe system.
2. All the possible orientations (720) of the inspection device are taken into consid-
eration.
3. The use of the STL model permits the application of the developed system to any
type of part, regardless of its shape and complexity.
4. The real shape and dimensions of the inspection device are considered for the
analysis.
5. The implemented algorithms based on Computer Graphics reduce computation
time and consequently can deal with a high number of inspection points and
complex surfaces.
6. Moreover, a clustering algorithm is applied that efficiently groups the inspection
points and triangles of the STL of the part to be scanned in order to reduce the
number of probe orientation changes.
The developed system has been applied to different parts with satisfactory results,
demonstrating its applicability in practice. Future research will concentrate on devel-
oping new algorithms that further reduce computation time, and on generating the
inspection paths from the orientations obtained in the clustering process.
Acknowledgements This paper is part of the result of a Spanish State Commission for Science
and Technology Research Project (DPI2000-0605) and was supported by this Commission and
FEDER.
References
1. Spyridi, A. J. and Requicha, A. A. G., 1990, Accessibility analysis for the automatic inspection
of mechanical parts by coordinate measuring machines, Proceedings of the IEEE International
Conference on Robotics and Automation. Vol. 2, 1284–1289.
2. Kaplan, M., 1985, Space-tracing: A constant time ray-tracer, Proceedings of the
SIGGRAPH’85, 149–158.
3. Limaiem, A. and ElMaraghy, H. A., 1999, CATIP: A computer-aided tactile inspection
planning system, Int. J. Prod. Res. 37(3): 447–465.
4. Lee, K. H. and Park, H. -P., 2000, Automated inspection planning of free-form shape parts by
laser scanning, Robot. Comput. Integr. Manuf. 16(4): 201–210.
5. Foley, J. D., Van Dam, A., Feiner, S. K., and Hughes, J. F., 1996, Computer Graphics,
Principles and Practice, 2nd ed., Addison-Wesley, Reading, MA, pp. 663–664.
6. Möller, T. A. and Trumbore, B., 1997, Fast, minimum storage ray/triangle intersection,
J. Graph. Tools 2(1): 21–28.
7. Schneider, P. J. and Eberly, D. H., 2003, Geometrical Tools for Computer Graphics, Morgan
Kaufmann, San Francisco, CA, pp. 376–382.
8. Gottschalk, S., Lin, M. C., and Manocha, D., 1996, OBBTree: A hierarchical structure for
rapid interference detection, Proceedings of the SIGGRAPH’96, 171–180.
9. Möller, T. A., 2001, Fast 3D triangle-box overlap testing, J. Graph. Tools 6(1): 29–33.
10. Eberly, D. H., 2000, 3D Game Engine Design. A Practical Approach to Real-Time Computer
Graphics, Morgan Kaufmann, San Diego, CA, pp. 53–57.
11. Vafaeesefat, A. and ElMaraghy, H. A., 2000, Automated accessibility analysis and measure-
ment clustering for CMMs, Int. J. Prod. Res. 38(10): 2215–2231.
Chapter 40
Detecting Session Boundaries to Personalize
Search Using a Conceptual User Context
Abstract Most popular Web search engines are characterized by “one size fits all”
approaches. The underlying retrieval models are based on query-document matching
without considering the user context, interests and goals during the search. Per-
sonalized Web search tackles this problem by considering the user interests in the
search process. In this chapter, we present a personalized search approach which
addresses two key challenges. The first one is to model a conceptual user context
across related queries using a session boundary detection. The second one is to per-
sonalize the search results using the user context. Our experimental evaluation was
carried out using the TREC collection and shows that our approach is effective.
40.1 Introduction
Most popular web search engines accept keyword queries and return results that are
relevant to these queries. The underlying retrieval models use the content of the
documents and their link structure to assess the relevance of a document to the user
query. The major limitation of such systems is that information about the user is
completely ignored. They return the same set of results for the same keyword query
even though it may be submitted by different users with different intentions. For
example, the query python may refer to the snake as well as to the programming
language.
Personalized Web search aims at tailoring the search results to a particular user
by considering his profile, which refers to his interests, preferences and goals during
M. Daoud (B)
IRIT, University of Paul Sabatier, 118, Route de Narbonne, Toulouse, France
E-mail: daoud@irit.fr
the search. The key challenges in personalized Web search are how to model the
user profile accurately and how to use it for effective personalized search.
User profile could be inferred from the whole search history to model long term
user interests [1] or from the recent search history [2] to model short term ones.
According to several studies [2], mining short term user interests is more effective
for disambiguating the Web search than long-term ones. The user profile representation
model also has an impact on personalized retrieval effectiveness. Existing models
range from a very simple representation based on bags of words to complex
representations based on a concept hierarchy, namely the ODP [3, 4].
This chapter focuses on learning short term user interests to personalize search.
A short term user interest is represented by the user context in a particular search
session as a set of weighted concepts. It is built and updated across related queries
using a session boundary identification method. Search personalization is achieved
by re-ranking the search results of a given query using the short term user context.
The chapter is organized as follows. Section 40.2 presents some related works
of search personalization approaches and session boundary identification proposi-
tions. Section 40.3 presents our contribution of search personalization and session
boundary detection. In Section 40.4, we present an experimental evaluation, discus-
sion and results obtained. In the last section, we present our conclusion and plan for
future work.
Personalized Web search consists of two main components: the user profile model-
ing and the search personalization processes.
The user profile is usually represented as a set of keyword vectors or classes of
vectors [5, 6], or by a concept hierarchy derived from the user's search history [7, 8].
As we build the user context using the ODP ontology, we review some related works
based on the same principle. The ODP ontology is used in [3] to learn a general user profile applicable
to all users represented by a set of concepts of the first three levels of the ontology.
Instead of using a set of concepts, an ontological user profile in [9] is represented
over the entire ontology by classifying the web pages browsed by the user into its
concepts. Similar to this last work, an ontological user profile is described in [4]
where the concept weights are accumulated using a spreading activation algorithm
that activates concepts through the hierarchical component of the ontology.
The user profile is then exploited in a personalized document ranking by means
of query reformulation [10], query-document matching [5] or result re-ranking [9].
Mining short-term user interests in a session-based search requires a session bound-
ary mechanism that identifies the user interests most suitable for the user query. Few
studies have addressed this issue in a personalized retrieval task. We cite the UCAIR
system [2] that defines a session boundary detection based on a semantic similarity
measure between successive queries using mutual information.
Unlike previous related works, our approach has several new features. First, we
build a semantic user context as a set of concepts of the ODP ontology and not as an
instance of the entire ontology [9]. The main assumption behind this representation
is to prune non-relevant concepts in the search session. Second, we build the user
profile across related queries, which allows using the most suitable user interest to
disambiguate an ambiguous Web search.
We present in this section how to build and update the user context in a search
session and how to exploit it in a personalized document re-ranking.
We build a short-term user context that refers generally to the user's topics of
interest during a search session. It is built for each submitted query using the relevant
documents viewed by the user and the ODP ontology. The first step consists of
extracting the keyword user context K^s for a submitted query q^s. Let D_r^s be the
set of documents returned with respect to query q^s and judged as relevant by the
user, each represented as a single term vector using the tf*idf weighting scheme.
The keyword user context K^s is a single term vector where the weight of a term t is
computed as follows:
\[
K^{s}(t) = \frac{1}{|D_r^{s}|} \sum_{d \in D_r^{s}} w_{td} \tag{40.1}
\]
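As a rough sketch of Eq. (40.1), the snippet below averages tf*idf document vectors over the relevant set. Fitting scikit-learn's TfidfVectorizer only on the relevant documents is a simplification introduced here for illustration; in practice the idf statistics would normally come from the whole collection.

```python
# Build the keyword user context K^s as the average tf*idf vector of relevant documents.
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

def keyword_context(relevant_docs):
    """relevant_docs: list of raw document strings D_r^s. Returns (terms, K^s)."""
    vectorizer = TfidfVectorizer()
    weights = vectorizer.fit_transform(relevant_docs)       # |D_r^s| x |terms| matrix
    context = np.asarray(weights.mean(axis=0)).ravel()      # average over the documents
    return vectorizer.get_feature_names_out(), context
```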
relations and is associated with a set of web pages classified under that concept. We
represent each concept by a single term vector c_j, where the terms are extracted from
all the individual web pages classified under that concept, as detailed in previous
work [11]. Given a concept c_j, its similarity weight sw(c_j) with K^s is computed
as follows:
\[
sw(c_j) = \cos(c_j, K^{s}) \tag{40.2}
\]
We obtain a set of concepts containing relevant concepts at different levels and
with different weights. In the next section, we proceed by disambiguating the
concept set in order to rank the most relevant general-level concepts at the top of
the user context representation.
(B) Disambiguating the Concept Set
We aim at disambiguating the obtained concept set using a sub-concept aggrega-
tion scheme. We retain level three of the ontology to represent the user context.
Indeed, level two of the ontology is too general to represent the user interests,
and leaf nodes are too specific to improve the web search of related queries.
Then, we recompute the weight of each weighted concept by summing the
weights of its descendants. We rely on the assumption that the most relevant
concepts are those having a greater number of weighted descendant concepts
according to the ontology. As shown in Fig. 40.1a, we identify a cluster of weighted
concepts having a common general depth-three concept; we assign to the latter
a relevance score computed by adding the weights of its descendant concepts, as
shown in Fig. 40.1b. The weight of a general concept c_j, having a set of n related
descendant concepts S(c_j), is computed as follows:
\[
sw(c_j) = \frac{1}{n} \sum_{1 \le k \le n \,\wedge\, c_k \in S(c_j)} sw(c_k) \tag{40.3}
\]
Fig. 40.1 (a) Aggregation of sub-concepts under their general depth-three concept; (b) computation
of depth-three concept scores by summing the weights of their descendant concepts
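The following sketch combines Eqs. (40.2) and (40.3): concepts are scored by cosine similarity with the keyword context and then aggregated under their depth-three ancestors. Representing the ontology as a plain dictionary of term vectors is an assumption made only for illustration; the chapter uses the ODP hierarchy itself.

```python
# Score depth-three concepts by averaging the cosine weights of their descendants.
import numpy as np

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

def score_depth_three(ontology, K):
    """ontology: {general_concept: [descendant term vectors]}, K: keyword context vector."""
    scores = {}
    for general, descendants in ontology.items():
        sw = [cosine(c, K) for c in descendants]            # Eq. (40.2) per descendant
        scores[general] = sum(sw) / len(sw) if sw else 0.0   # Eq. (40.3)
    # Return concepts ranked by decreasing weight, as in the user context representation.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```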
We update the user context across queries identified as related using a session
boundary recognition mechanism described in the next section. Let C^{s−1} and C^s be
respectively the user contexts of successive related queries. The updating method
is based on the following principles: (1) enhance the weight of common concepts
that appear in two successive user contexts; (2) attenuate the weight of non-common
concepts using a decay factor β. This allows taking into account the concepts most
recently of interest to the user in the search session. The new weight of a concept c_j
in the user context C^s is computed as follows:
where sw_{C^{s−1}}(c_j) is the weight of concept c_j in context C^{s−1} and sw_{C^s}(c_j) is the
weight of concept c_j in context C^s.
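The update formula itself is not reproduced in this excerpt; the sketch below is only one plausible reading of the two stated principles (reinforce concepts common to C^{s−1} and C^s, decay the others by β) and should not be taken as the chapter's exact rule.

```python
# Hypothetical context update following the two stated principles.
def update_context(prev, curr, beta=0.2):
    """prev, curr: {concept: weight} for C^{s-1} and C^s. Returns an updated C^s."""
    updated = {}
    for c in set(prev) | set(curr):
        if c in prev and c in curr:
            updated[c] = prev[c] + curr[c]      # reinforce concepts common to both contexts
        elif c in prev:
            updated[c] = beta * prev[c]         # decay concepts no longer observed
        else:
            updated[c] = curr[c]                # keep newly observed concepts as they are
    return updated
```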
We personalize the search results of a query q^{s+1} related to the user context C^s,
represented as an ordered set of weighted concepts <c_j, sw(c_j)>, by combining for each
retrieved document d_k its initial score S_i and its contextual score S_c as follows:
We propose a session boundary recognition method using the Kendall rank correla-
tion measure, which quantifies the conceptual correlation ΔI between the user context
C^s and the query q^{s+1}. We choose a threshold and consider that the queries belong
to the same session if the correlation is above this threshold.
where the query frequency (QF) and the concept weight (CW) are formally
defined as:
\[
QF(c_i) = \frac{|q|^{S}}{\langle |q|^{S}, c_i \rangle}, \qquad
CW(q^{s+1}, c_i) = \cos\bigl(q_t^{s+1}, c_i\bigr) \tag{40.8}
\]
Here |q|^S is the total number of related queries submitted in the search session S, and
⟨|q|^S, c_i⟩ is the number of user contexts built in the current search session S that
contain the concept c_i.
Thus, the similarity ΔI gauges the changes in the concept ranks between the
query and the user context as follows:
\[
\Delta I = \tau\bigl(q \circ C^{s}\bigr)
= \frac{\sum_{c}\sum_{c'} S_{cc'}(q)\, S_{cc'}(C^{s})}
       {\sqrt{\sum_{c}\sum_{c'} S_{cc'}^{2}(q)\; \sum_{c}\sum_{c'} S_{cc'}^{2}(C^{s})}}
\tag{40.9}
\]
\[
S_{c_i c_j}(v) = \operatorname{sign}\bigl(v(c_i) - v(c_j)\bigr)
= \frac{v(c_i) - v(c_j)}{\left| v(c_i) - v(c_j) \right|}
\]
where c_i and c_j are two concepts issued from both the query and the user context,
and q_c^{s+1}(c_i) (resp. C^s(c_i)) is the weight of the concept c_i in q_c^{s+1} (resp. C^s).
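A sketch of Eq. (40.9) follows, computing the Kendall-style correlation from the pairwise rank comparisons S_{c_i c_j}. The dictionary representation of concept weights is an assumption made for illustration.

```python
# Kendall-style rank correlation between the concept weights of a query and a context.
import itertools
import numpy as np

def kendall_correlation(q, ctx):
    """q, ctx: {concept: weight} for the query q^{s+1} and the user context C^s."""
    concepts = sorted(set(q) & set(ctx))
    num, sq_q, sq_c = 0.0, 0.0, 0.0
    for ci, cj in itertools.combinations(concepts, 2):
        s_q = np.sign(q[ci] - q[cj])        # S_{c_i c_j}(q)
        s_c = np.sign(ctx[ci] - ctx[cj])    # S_{c_i c_j}(C^s)
        num += s_q * s_c
        sq_q += s_q ** 2
        sq_c += s_c ** 2
    denom = np.sqrt(sq_q * sq_c)
    return num / denom if denom else 0.0

# A new query is treated as belonging to the current session when this value exceeds
# the chosen threshold (about -0.58 in the experiments reported below).
```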
The experiments were based on test data from the TREC collection, specifically
disks 1 & 2 of the ad hoc task, which contain 741,670 documents. We particularly
tested topics q51–q100, presented in Table 40.1. The choice of this test
collection is due to the availability of a manually annotated domain for each query.
This allows us, on the one hand, to enhance the data set with simulated user interests
associated with each TREC domain. On the other hand, we can define a search session
as a set of related queries annotated in the same domain of TREC.
The goals of the session boundary recognition experiments are: (A) analyzing the
query–context correlation values (ΔI) obtained with the Kendall coefficient measure,
and (B) computing the accuracy of the session identification measure and identifying
the best threshold value.
For this purpose, we apply a real evaluation scenario that consists of choosing
a query sequence holding six successive sessions related to six domains of TREC
listed in Table 40.1. For testing purposes, we build the user context for each query
using 30 of its relevant documents listed in the TREC assessment file, and we update it
across related queries of the same domain using formula (8) with β = 0.2.
(A) Analyzing Query-Context Correlations
In this experiment, we computed the query–context correlation values between a
particular query and the user context built across the previous queries related to the
same TREC domain, following the query sequence. Figure 40.2 shows a query
sequence holding six search sessions presented on the X-axis and the query–context
correlation values on the Y-axis. A fall of the correlation curve means a decrease
of the correlation degree with the previous query and possible session boundary
identification. A correct session boundary is marked by a vertical line according to
the annotated queries in each TREC domain.

Fig. 40.2 Query–context Kendall correlation values along the query sequence

We can notice that the correlation values
vary between queries in the same domain. Indeed, in the domain “Environment” of
TREC, some queries are related to the environmental concepts of the ODP, while a
specific query (q59) is related to the “Weather” topic, which has no match with the
set of environmental concepts.
Based on the range of correlation values [−0.61, −0.08], we identify in the next
paragraph the best threshold cut-off value.
(B) Measuring the Session Boundary Measure Accuracy
The goal of this experiment is to evaluate the accuracy of the session boundary
detection measure. It is computed for each threshold value as follows:
\[
P = \frac{|CQ|}{|Q|} \tag{40.10}
\]
where |CQ| is the number of queries identified as correctly correlated to the current user
context along the query sequence, and |Q| is the total number of correlated queries
in the query sequence.
We show in Fig. 40.3 the accuracy of the Kendall correlation measure when varying
the threshold in the range [−0.61, −0.08]. The optimal threshold value is
identified at −0.58, achieving an optimal accuracy of 70%. We can conclude that
the Kendall measure achieves significant session identification accuracy. Indeed, it
takes into account the concept ranks in both the query and the user context represen-
tations, which makes it tolerant to errors in allocating related queries to different
search sessions.
Fig. 40.3 Kendall correlation accuracy with varying the threshold value
Our experimental design for evaluating the retrieval effectiveness consists of com-
paring the personalized search performed using the query and the suitable user
context to the standard search performed using only the query ignoring any user
context.
We conducted two sets of controlled experiments: (A) studying the effect of the
re-ranking parameter γ in the re-ranking formula (9) on the precision improvement of
the personalized search, and (B) evaluating the effectiveness of the personalized search
compared to the standard search.
We used “Mercure” as a typical search engine where the standard search is based
on the BM25 scoring formula retrieval model. We measure the effectiveness of re-
ranking search results in terms of Top-n precision (P5, P10) and Mean average
precision (MAP) metrics.
The evaluation scenario is based on the k-fold cross validation explained as
follows:
– For each simulated TREC domain in Table 40.1, divide the query set into k
equally-sized subsets, using k − 1 subsets for learning the user context and the
remaining subset as a test set.
– For each query in the training set, an automatic process generates the associated
keyword user context based on its top n relevant documents listed in the TREC
assessment file, using formula (1), and then maps it onto the ODP ontology to
extract the semantic user context.
– Update the user context concept weights across an arbitrary order of the queries
in the training set, using β = 0.2 in formula (8), and use the resulting context for
re-ranking the search results of the queries in the test set, using h = 3 in formula (9).
(Figure: percentage improvement in P@5, P@10 and MAP of the personalized search for values of
the re-ranking parameter γ from 0.1 to 0.9)
Domain        Standard search          Personalized search (improvement)
              P@5   P@10   MAP          P@5          P@10            MAP
Environment   0.25  0.32   0.18         0.35 (40%)   0.37 (15.38%)   0.19 (1.73%)
Military      0.25  0.27   0.05         0.35 (40%)   0.32 (18.18%)   0.07 (46.46%)
Law & Gov.    0.40  0.42   0.12         0.50 (25%)   0.45 (5.88%)    0.14 (12.33%)
Inter. Rel.   0.16  0.12   0.01         0.16 (0%)    0.16 (33.33%)   0.02 (36.59%)
US Eco.       0.26  0.30   0.09         0.33 (25%)   0.36 (22.22%)   0.10 (8.35%)
Int. Pol.     0.16  0.10   0.05         0.20 (25%)   0.16 (60%)      0.07 (42.26%)
not related concepts influences the precision improvement and probably reduces the
retrieval performance, especially for the Law & Gov TREC domain.
References
1. Bin Tan, Xuehua Shen, and ChengXiang Zhai. Mining long-term search history to improve
search accuracy. In KDD ’06: Proceedings of the 12th ACM SIGKDD international conference
on Knowledge discovery and data mining, pages 718–723, New York, NY, USA, 2006. ACM.
2. Smitha Sriram, Xuehua Shen, and Chengxiang Zhai. A session-based search engine. In
SIGIR’04: Proceedings of the International ACM SIGIR Conference, 2004.
3. Fang Liu, Clement Yu, and Weiyi Meng. Personalized web search for improving retrieval
effectiveness. IEEE Transactions on Knowledge and Data Engineering, 16(1):28–40, 2004.
4. Ahu Sieg, Bamshad Mobasher, and Robin Burke. Web search personalization with ontological
user profiles. In Proceedings of the CIKM’07 conference, pages 525–534, New York, NY,
USA, 2007. ACM.
5. Lynda Tamine-Lechani, Mohand Boughanem , Nesrine Zemirli. Personalized document rank-
ing: exploiting evidence from multiple user interests for profiling and retrieval. to appear. In
Journal of Digital Information Management, vol. 6, issue 5, 2008, pp. 354–366.
6. John Paul Mc Gowan. A multiple model approach to personalised information access. Master
thesis in computer science, Faculty of Science, University College Dublin, February 2003.
7. Alessandro Micarelli and Filippo Sciarrone. Anatomy and empirical evaluation of an adaptive
web-based information filtering system. User Modeling and User-Adapted Interaction, 14(2–
3):159–200, 2004.
8. Hyoung R. Kim and Philip K. Chan. Learning implicit user interest hierarchy for context in
personalization. In Proceedings of IUI ’03, pages 101–108, New York, NY, USA, 2003. ACM.
9. Susan Gauch, Jason Chaffee, and Alaxander Pretschner. Ontology-based personalized search
and browsing. Web Intelli. and Agent Sys., 1(3–4):219–234, 2003.
10. Ahu Sieg, Bamshad Mobasher, Steve Lytinen, Robin Burke. Using concept hierarchies
to enhance user queries in web-based information retrieval. In The IASTED International
Conference on Artificial Intelligence and Applications. Innsbruck, Austria, 2004.
11. Mariam Daoud, Lynda Tamine-Lechani, and Mohand Boughanem. Using a concept-based
user context for search personalization. to appear. In Proceedings of the 2008 International
Conference of Data Mining and Knowledge Engineering (ICDMKE’08), pages 293–298.
IAENG, 2008.
Chapter 41
Mining Weather Information in Dengue
Outbreak: Predicting Future Cases Based
on Wavelet, SVM and GA
Yan Wu, Gary Lee, Xiuju Fu, Harold Soh, and Terence Hung
Abstract Dengue Fever has existed throughout the contemporary history of mankind
and poses an endemic threat to most tropical regions. Dengue virus is transmitted to
humans mainly by the Aedes aegypti mosquito. It has been observed that there are
significantly more Aedes aegypti mosquitoes present in tropical areas than in other
climate regions. As such, it is commonly believed that the tropical climate suits
the life-cycle of the mosquito. Thus, studying the correlation between the climatic
factors and trend of dengue cases is helpful in conceptualising a more effective
pre-emptive control measure towards dengue outbreaks. In this chapter, a novel
methodology for forecasting the number of dengue cases based on climactic factors
is presented. We proposed to use Wavelet transformation for data pre-processing
before employing a Support Vector Machines (SVM)-based Genetic Algorithm to
select the most important features. After that, regression based on SVM was used
to perform the forecasting. The results drawn from this model, based on dengue data
in Singapore, showed improved performance in predicting dengue cases ahead. It has
also been demonstrated that in this model, prior climatic knowledge of 5 years is
sufficient to produce satisfactory prediction results for up to
2 years. This model can help the health control agency to improve its strategic plan-
ning for disease control to combat dengue outbreak. The experimental result arising
from this model also suggests strong correlation between the monsoon seasonality
and dengue virus transmission. It also confirms previous work that showed mean
temperature and monthly seasonality contribute minimally to outbreaks.
Y. Wu (B)
Agency for Science, Technology and Research
20 Biopolis Way, #07–01 Centros (A*GA), Singapore 138668
E-mail: yan wu@scholars.a-star.edu.sg
41.1 Introduction
In 2006, the World Health Organization reported that “dengue is the most rapidly
spreading vector-borne disease”. Between 1985 and 2006, in Southeast Asia alone,
an estimated 130,000 people were infected annually with dengue virus, which is
transmitted to humans mainly by the Aedes aegypti mosquito [1]. The Dengue virus
is transmitted from a female Aedes mosquito to a human or vice versa during the
feeding process, generally referred to as horizontal transmission. The Dengue virus
may also be passed from parent mosquitoes to offspring via vertical transmission.
However, mosquitoes can only survive in certain climate conditions. For example,
the Aedes aegypti mosquito cannot survive below the freezing temperature of
0°C [2]. Yet climatic conditions worldwide have displayed enormous short-term
abnormalities, which increase the likelihood of mosquitoes surviving longer
and penetrating temperate regions that were previously free from mosquito-borne
diseases [2].
In addition to climate conditions, other factors contribute to the time-series trans-
mission profile of the dengue virus, e.g. geographical expansion of Aedes mosquito
population and its density, the demographic distribution, mobility and susceptibility
of the human population. This dynamic relationship is illustrated in Fig. 41.1
below.
Data selected for the model should represent the many variables affecting the
spread of the dengue virus. Prior studies have shown that the population size of
the mosquito depends largely on the presence of suitable breeding sites and climate
conditions [5]. As such, we can make the following assumptions about the input data:
– Variations in the weather data represent the breeding patterns of the Aedes
mosquitoes.
– The time-series of dengue cases in a relatively confined area represents the
ensemble of dengue spread information, such as the susceptibility of the human
population to the virus.
The coding pattern of the above two pieces of information is unknown but likely
non-linear. Moreover, not all information at different time resolutions superposed in
the time-series of an input feature, such as temperature, contributes to the fluctuation
of dengue cases. Perhaps only one or a few seasonal levels of detail in a particular
input feature contribute to the outbreak. Hence, the use of an input feature without
eliminating irrelevant levels of detail induces unnecessary noise into the model.
In our model, the input features consisted of the past weekly dengue cases and
eight daily weather features, namely rainfall intensity, cloudiness, and the maximum,
mean and minimum of temperature and humidity. As the dengue cases were only recorded
when a patient was diagnosed and the incubation period of dengue virus is known
to differ across individuals, daily dengue cases do not accurately represent the out-
break. Thus, summing the daily cases into weekly buckets reduces such inaccuracy.
As such, all daily weather factors were averaged into weekly means after performing
wavelet decomposition as described in Section 41.2.2 below.
In order to separate the different levels of detail encoded in the input elements of
the system, wavelet decomposition of the time-series inputs was chosen. The wavelet
transform (WT) is similar to the Fourier transform. The fundamental difference is that
instead of deriving the spectral density of a given signal, the WT projects the signal
onto functions called wavelets, obtained from a mother wavelet with dilation
parameter a and translation parameter b:
\[
\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right) \tag{41.1}
\]
subject to the condition that
\[
\Psi_0 = \int_{0}^{+\infty} \frac{|\Psi(\omega)|^{2}}{|\omega|}\, d\omega \quad \text{is bounded} \tag{41.2}
\]
\[
f(t) = \frac{1}{\Psi_0} \int_{-\infty}^{+\infty}\!\int_{0}^{+\infty} T_{a,b}(f)\,\psi(a,b)\,\frac{da}{a^{2}}\, db \tag{41.4}
\]
If a finite number of terms are taken to represent the original signal, Eq. (41.4) can
be rewritten as a multi-resolution approximation [4]:
\[
\hat{f}(t) = \sum_{i=1}^{N} \Psi_i(a_i, b_i) \tag{41.5}
\]
\[
\hat{f}(t) = A_0 = A_N + \sum_{n=1}^{N} D_n, \qquad \text{where } A_n = A_{n+1} + D_{n+1} \tag{41.6}
\]
While the decomposed signals resemble the fundamental idea of using time lags,
the decomposition separates the signal into different levels, which in turn separates
potential noise from the signal.
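A sketch of this pre-processing step is shown below, assuming the PyWavelets package as a stand-in implementation: a daily series is decomposed with a Daubechies wavelet, one sub-signal is reconstructed per retained level of detail, and the results can then be averaged into weekly bins.

```python
# Wavelet decomposition of a daily series into per-level sub-signals, plus weekly binning.
import numpy as np
import pywt

def wavelet_subsignals(daily, wavelet="db6", level=6, drop_finest=3):
    """Return one reconstructed sub-signal per retained level of detail (A_N, D_N, ...)."""
    coeffs = pywt.wavedec(daily, wavelet, level=level)     # [A_N, D_N, ..., D_1]
    subsignals = []
    for i in range(len(coeffs) - drop_finest):             # keep A_N and the coarser D_n
        mask = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        subsignals.append(pywt.waverec(mask, wavelet)[: len(daily)])
    return subsignals

def weekly_means(series):
    """Average a daily series into weekly bins (7 samples per bin)."""
    weeks = len(series) // 7
    return np.asarray(series[: weeks * 7]).reshape(weeks, 7).mean(axis=1)
```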
After wavelet decomposition, each signal input into the model generates a series of
daughter wavelet inputs. As mentioned earlier, not all levels of detail of every input
signal may be relevant. Thus, it is important to filter out the sub-signals that are irrelevant
to the output.
For this purpose, we implemented a biologically-inspired, robust, general-
purpose optimisation algorithm: the genetic algorithm (GA). In the GA, each input
feature is represented as an allele in a chromosome [6]. The evolution of chromo-
somes begins from the original (randomly initialized) population and propagates
down generationally. During each generation, the fitness of all chromosomes in the
population is evaluated. Then, competitive selection and modification are performed
to generate a new population. This process is repeated until a predefined limit
or convergence is achieved. In order to confirm and validate the feature selection
model and to generate a more representative set of contributing factors, a 10-fold
cross-validation technique was introduced into the methodology. The converged solution
of weights (zeros and ones) represents the non-selection or selection of a particular
sub-signal by the GA.
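The sketch below illustrates the kind of fitness evaluation such an SVM-based GA could use: a chromosome is a binary mask over the sub-signals and its fitness is the cross-validated RMSE of an SVM regressor on the selected columns. The GA loop itself (tournament selection, crossover, mutation) is omitted, and scikit-learn is an assumed stand-in for the actual implementation.

```python
# Fitness of a chromosome (binary feature mask) for the SVM-based GA.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

def fitness(mask, X, y, folds=10):
    """mask: 0/1 array over the columns of X. Lower is better (RMSE to be minimised)."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return np.inf                                   # empty chromosomes are unfit
    model = SVR(kernel="rbf", C=1.0)                    # C = 1 as reported in the chapter
    mse = -cross_val_score(model, X[:, cols], y, cv=folds,
                           scoring="neg_mean_squared_error").mean()
    return float(np.sqrt(mse))
```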
The core component of the GA is the construction of the learning classifier. Among
the supervised learning techniques used to correlate input and output data, the Support
Vector Machine (SVM) stands firmly as one of the most successful methods. The SVM
maps input vectors to a higher dimensional space where a maximum-margin hyperplane
is drawn to separate the classes [7]. Given a set of inputs {(x_i, y_i)}, where y_i ∈ {−1, 1}
is the output corresponding to the i-th input feature vector x_i ∈ R^n, Φ(x_i)
maps x_i into a higher dimensional space where a linear estimate function can be
defined:
\[
f(x) = w^{T}\Phi(x) + b = \operatorname{sgn}\!\left(\sum_{i \in \mathrm{SVs}} \alpha_i y_i K(x_i, x) + b\right) \tag{41.7}
\]
where K(x_i, x) = Φ(x)^T Φ(x_i) is the kernel function, SVs are the support vectors,
α_i are the Lagrange multipliers and b is the bias. The SVM problem can be formulated
as a quadratic programming problem by optimising α:
\[
\min_{\alpha_i}\ \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{41.8}
\]
After the SVM-based GA selects the most relevant features, the input is fed into a
regression learning algorithm which generates a prediction model. An SVM regres-
sion (SVR) method, based on the material discussed in Section 41.2.4 above, can be
used for non-linear learning by introducing a threshold ε analogous to the function
of C in SVM classification [8].
To assess the quality of the final model, we compared predicted results to the actual
data using two simple metrics: the Mean Squared Error (MSE) and the Coefficient
of Determination (R²). A small MSE and a large R² (0 ≤ R² ≤ 1) suggest a better
modelling of the data.
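As a minimal sketch of the forecasting and evaluation steps, again assuming scikit-learn as a stand-in: an ε-SVR with an RBF kernel is fitted on the selected features of the training years and scored with MSE and R² on the test period. The parameter values shown are placeholders, not the tuned values of the experiments.

```python
# Fit an epsilon-SVR on the training data and evaluate its forecasts with MSE and R^2.
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

def fit_and_forecast(X_train, y_train, X_test, y_test, C=1.0, epsilon=0.1):
    model = SVR(kernel="rbf", C=C, epsilon=epsilon).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return y_pred, mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred)
```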
Because Dengue fever occurs throughout the year in Singapore, we gathered the
dengue case and weather data in Singapore from 2001 to 2007. As shown in
Fig. 41.2 below, this set of dengue data includes three significant peaks at Weeks
190, 245 and 339.
Dengue cases over these years in Singapore are reported weekly while climate
data from weather stations were collected daily. We corrected the resolution dif-
ference by first performing Wavelet transform of the weather data before sorting
the samples into weekly bins. Additionally, we discarded the finest three levels of
detail generated by the wavelet transform, because the down-sampling by seven makes
log₂(7) ≈ 3 levels of detail redundant. The mother wavelet employed here is the
Daubechies [9] 6 wavelet, which was selected because reconstruction of the signal
after suppression of the finest three levels of detail showed the best approximation
of the original signal compared to Daubechies 2 and 4.
Before the GA and SVM were applied, all the input and output vectors were
normalised. The normalisation process ensured that none of the variables over-
whelmed the others in terms of the distance measure, which was critical for proper
feature selection. In our experiment, a zero-mean unit-variance normalisation was
employed to give equal distance measures to each feature and level of detail.
The current dengue cases and the decomposed current weather series were used as
input features, with future dengue cases as output, into the GA with 10-fold cross-validation
to generate a general model of the data. A feature was finally selected
if and only if it was selected in at least seven of the ten validation cycles. The
selection method employed in the GA was a tournament, with the aim of minimising
the root mean squared error. A recursive method was used to tune the parameter C for
best performance (C = 1 in this case). Only 13 terms were selected by the GA out of
the 63 input signals, tabulated in Table 41.1 below. Compared to the 50 features
selected by the model in [3], the dimensionality required to describe the number
of dengue cases was significantly smaller in this model. In the table, the first term
of a feature denotes its most general trend (A_N), while the second to seventh terms,
Fig. 41.3 Prediction results of SVR with and without GA (a) SVR without GA (2001–2006),
(b) SVR with GA (2001–2006), (c) SVR without GA (2002–2007), (d) SVR with GA (2002–2007)
The prediction results of SVR with and without GA are tabulated in Table 41.2 and plotted in Fig. 41.3 (red and blue lines denote actual and
predicted No. of cases respectively). The performance statistics show that mapping
the weather data into the RBF space (which is non-linear) produced a high corre-
lation between the input data and the actual dengue cases, reinforcing our claim of
non-linearity.
Statistics in Table 41.2 show that on average, SVR with GA yields a 70% reduc-
tion in MSE and a 25% increase in correlation with the actual data as compared
to that of SVR without GA. From this observation, we can infer that a significant
number of irrelevant input features were removed by the GA, supporting our claim
that not all levels of detail influenced the spread of dengue.
In order to test the stability of our proposed model, a further experiment was
carried out. We extended the testing samples to two consecutive years (2006–2007)
using the same training data (2001–2005). The result is tabulated in Table 41.3 in
comparison with the previous two experiments.
The performance statistics in Table 41.3 clearly show that the difference in MSE
is minimal .< 0:01/ and the correlation coefficient is within 2% of each other. These
empirical results suggest that the proposed model is capable of the following:
1. Producing relatively reliable predictions for up to 2 years ahead
2. Constructing a stable model instance using only 5 years of data
41.4 Conclusion
In this chapter, a novel model for predicting future dengue outbreak was presented.
We proposed the use of wavelet decomposition as a means of data pre-processing.
This technique, together with SVM-based GA feature selection and SVR learning,
proved to be effective for analysing a special case of infectious diseases – dengue
virus infection in Singapore. Our empirical results strongly suggest that weather fac-
tors influence the spread of dengue in Singapore. However, the results also showed
that mean temperature and monthly seasonality contribute minimally to outbreaks.
Analysis of the results in this chapter gives rise to further epidemiological research
questions on dengue transmissions:
– As low frequency (higher order) terms denote general trends of the signal, the
results tabulated in Table 41.1 indicate that dengue incidence seems to be closely
related to the rainfall and humidity trends. Can this be proven epidemiologically?
– Although the rise in average temperature is thought to accelerate the mosquito's
breeding cycle, why were there no low frequency terms of temperature represented
in the model? Moreover, why are some moderate frequencies (between 2 and
4 months) of both maximum and minimum temperatures selected in the model?
– As Singapore is an area surrounded by sea, does this research finding fit well in
other regions with dengue prevalence?
References
1. World Health Organisation: Dengue Reported Cases. 28 July 2008. WHO <http://www.searo.
who.int/en/Section10/Section332 1101.htm>
2. Andrick, B., Clark, B., Nygaard, K., Logar, A., Penaloza, M., and Welch, R., “Infectious
disease and climate change: detecting contributing factors and predicting future outbreaks”,
IGARSS ’97: 1997 IEEE International Geoscience and Remote Sensing Symposium, Vol. 4,
pp. 1947–1949, Aug. 1997.
3. Fu, X., Liew, C., Soh, H., Lee, G., Hung, T., and Ng, L.C. “ Time-series infectious disease data
analysis using SVM and genetic algorithm”, IEEE Congress on Evolutionary Computation
(CEC) 2007, pp. 1276–1280, Sept. 2007.
4. Mallat, S.G., “Multiresolution approximations and wavelet orthonormal bases of L2 .R/”,
Transactions of the American Mathematical Society, Vol. 315, No. 1, pp. 69–87, Sept. 1989.
5. Favier, C., Degallier, N., Vilarinhos, P.T.R., Carvalho, M.S.L., Yoshizawa, M.A.C., and Knox,
M.B., “Effects of climate and different management strategies on Aedes aegypti breeding
sites: a longitudinal survey in Brasília (DF, Brazil)”, Tropical Medicine and International
Health 2006, Vol. 11, No. 7, pp. 1104–1118, July 2006.
6. Grefenstette, J.J., Genetic algorithms for machine learning, Kluwer, Dordrecht, 1993.
7. Burges, C.J.C., “A tutorial on support vector machines for pattern recognition”, Data Mining
and Knowledge Discovery, Vol. 2, No. 2, pp. 955–974, 1998.
8. Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A., and Vapnik, V., “Support Vector Regres-
sion Machines”, Advances in Neural Info Processing Systems 9, MIT Press, Cambridge,
pp. 155–161, 1996.
9. Daubechies, I. “Orthonormal Bases of Compactly Supported Wavelets.” Communications on
Pure and Applied Mathematics, Vol. 41, pp. 909–996, 1988.
10. National Environment Agency, Singapore: Climatology of Singapore. 20 Aug. 2007. NEA,
Singapore. <http://app.nea.gov.sg/cms/htdocs/article.asp?pid=1088>
11. Wu, Y., Lee, G., Fu, X., and Hung, T., “Detect Climatic Factors Contributing to Dengue Out-
break based on Wavelet, Support Vector Machines and Genetic Algorithm”, World Congress
on Engineering 2008, Vol. 1, pp. 303–307, July 2008.
12. Bartley, L.M., Donnelly, C.A., and Garnett, G.P., “Seasonal pattern of dengue in endemic
areas: math models of mechanisms”, Transactions of the Royal Society of Tropical Medicine
and Hygiene, pp. 387–397, July 2002.
13. Shon, T., Kim, Y., Lee, C., and Moon, J., “A machine learning framework for network anomaly
detection using SVM and GA”, Proceedings from the Sixth Annual IEEE SMC Information
Assurance Workshop 2005, pp. 176–183, June 2005.
14. Nakhapakorn, K. and Tripathi, N. K., “An information value based analysis of physical and cli-
matic factors affecting dengue fever and dengue haemorrhagic fever incidence”, International
Journal of Health Geographics, Vol. 4, No. 13, 2005.
15. Ooi, E., Hart, T., Tan, H., and Chan, S., “Dengue seroepidemiology in Singapore”, The Lancet,
Vol. 357, No. 9257, pp. 685–686, Mar 2001.
16. Ministry of Health, Singapore: Weekly Infectious Diseases Bulletin. 28 July 2008. M.O.H.
Singapore. <http://www.moh.gov.sg/mohcorp/statisticsweeklybulletins.aspx>
17. Gubler, D.J., “Dengue and dengue hemorrhagic fever”, Clinical Microbiology Reviews,
Vol. 11, No. 3, pp. 480–496, July 1998.
Chapter 42
PC Tree: Prime-Based and Compressed Tree
for Maximal Frequent Patterns Mining
Since the introduction of the Apriori algorithms [1], frequent pattern mining has played
an important role in data mining research for over a decade. Frequent patterns are
itemsets or substructures that occur in a dataset with a frequency no less than a user-
specified threshold.
Let L = {i₁, i₂, ..., iₙ} be a set of items. Let D be a set of database transactions,
where each transaction T is a set of items, and let |D| be the number of transactions
in D. A set P = {i_j, ..., i_k}, a subset of L (j ≤ k and 1 ≤ j, k ≤ n), is called a
M. Nadimi-Shahraki (B)
Department of Computer Engineering, Islamic Azad University, Najafabad branch, Iran and PhD
student of Computer Science, Faculty of Computer Science and Information Technology, Putra
University of Malaysia, 43400 UPM, Selangor, Malaysia
E-mail: admin1@iaun.ac.ir
Many algorithms have been introduced to solve the problem of maximal frequent
pattern (MFP) mining more efficiently. They are mostly based on three fundamental
frequent pattern mining methodologies: Apriori, FP-growth and Eclat [5].
Typically, they traverse the search space to find the MFP. The key to an efficient
traversal is the pruning techniques, which can remove some branches of the search
space. The pruning techniques can be categorized into two groups:
– Subset frequency pruning: all subsets of any frequent pattern are pruned because
they cannot be maximal frequent patterns.
– Superset infrequency pruning: all supersets of any infrequent pattern are pruned
because they cannot be frequent patterns.
The Pincer-Search algorithm [6] uses a horizontal data layout. It combines bottom-up and top-down techniques to mine the MFP; however, the search space is traversed without an efficient pruning technique. The MaxMiner algorithm [4] uses a breadth-first technique to traverse the search space and mine the MFP. It makes use of a look-ahead pruning strategy to reduce database scanning, and it prunes the search space by both subset frequency and superset infrequency pruning. DepthProject [7] finds the MFP using a depth-first search of a lexicographic tree of patterns and uses a counting method based on transaction projections; it demonstrated an efficient improvement over previous algorithms for mining the MFP. Mafia [2] extends the ideas of DepthProject, using a depth-first search strategy combined with effective pruning mechanisms.
This research proposes a new method to mine all MFP efficiently in only one database scan. The method introduces an efficient database encoding technique, a novel tree structure called the PC_Tree to capture transaction information, and a PC_Miner algorithm to mine the MFP.
$$TV_{tid} = \prod_{r=j}^{k} p_r, \qquad T = (tid, X),\; X = \{i_j, \ldots, i_k\} \qquad (42.1)$$
The encoding technique utilizes Eq. (42.1), based on the following simple definitions. A positive integer $N$ can be expressed as a unique product $N = p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r}$, where $p_i$ is a prime number, $p_1 < p_2 < \cdots < p_r$, and $m_i$ is a positive integer called the multiplicity of $p_i$ [12]. For example, $N = 1{,}800 = 2^3 \cdot 3^2 \cdot 5^2$. Here we restrict the multiplicity to $m_i = 1$, because there are no duplicate items in a transaction $T$.
To illustrate the database encoding technique used in our method, consider an example. Let the item set be L = {A, B, C, D, E, F} and let the transaction database DB be given by the first two columns of Table 42.1, with eight transactions. The fourth column of Table 42.1 shows TV_tid computed for all transactions.
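As a concrete illustration, the following minimal Python sketch (function and variable names are ours, not the paper's) assigns the primes of Table 42.1 to the six items, computes a transaction value as in Eq. (42.1), and uses divisibility to test whether a transaction contains a pattern. Restricting each multiplicity to 1 is what makes divisibility equivalent to set containment.

```python
from math import prod

# Illustrative item-to-prime assignment, following Table 42.1
ITEM_PRIMES = {'A': 2, 'B': 3, 'C': 5, 'D': 7, 'E': 11, 'F': 13}

def transaction_value(items):
    """Encode a transaction as the product of its items' primes (Eq. 42.1)."""
    return prod(ITEM_PRIMES[i] for i in set(items))   # multiplicity restricted to 1

def contains(tv, pattern):
    """A transaction with value tv contains `pattern` iff tv is divisible by the pattern's TV."""
    return tv % transaction_value(pattern) == 0

tv = transaction_value({'A', 'B', 'E'})    # 2 * 3 * 11 = 66
print(tv, contains(tv, {'A', 'E'}))        # 66 True
```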
Using a tree structure in mining algorithms offers two possibilities for enhancing mining performance: first, data compression by a well-organized tree structure such as the FP-tree; and second, reduction of the search space by pruning techniques. Tree structures have therefore been considered as a basic structure in previous data mining research [5, 9, 13]. This research introduces a novel tree structure called the PC_Tree (Prime-based encoded and Compressed Tree). The PC_Tree makes use of both possibilities, data compression and pruning, to enhance efficiency.
A PC_Tree consists of a root and nodes that form sub-trees, or descendants, of the root. The node structure consists mainly of several fields: value, local-count, global-count, status and link. The value field stores the TV, recording which transaction is represented by the node. The local-count field is set to 1 when the current TV is inserted and is increased by 1 whenever an inserted TV is equal to the node's TV. The global-count field registers the support of the pattern P represented by the node's TV.
In fact, during the insertion procedure the support of all frequent and infrequent patterns is registered in the global-count field. This can be used for interactive mining, where min_sup is changed frequently by the user [13]. The status field keeps track of traversal: when a node is visited during the traversal procedure, its status field is changed from 0 to 1. The link field forms the sub-trees, i.e. the descendants of the root.
[Fig. 42.1 Construction of the PC_Tree: (a) TVs of the DB (TIDs 1–8: 2310, 2730, 66, 770, 455, 910, 70, 455); (b) after inserting TIDs 1–2; (c) after inserting TID 3; (d) after inserting TIDs 4–5]
Figure 42.1 shows the step-by-step construction of the PC_Tree for the transactions shown in Table 42.1, which are summarized in Fig. 42.1a.
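A hypothetical sketch of the node layout just described; the field names follow the text, but the concrete representation (a Python dataclass whose child list plays the role of the link field) is our assumption.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PCTreeNode:
    value: int                       # TV of the transaction represented by this node
    local_count: int = 1             # how many times exactly this TV was inserted
    global_count: int = 0            # support of the pattern represented by this TV
    status: int = 0                  # 0 = not yet visited, 1 = visited during traversal
    children: List["PCTreeNode"] = field(default_factory=list)   # the "link" field

root = PCTreeNode(value=1)           # TV = 1 encodes the empty pattern at the root
```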
The construction operation consists mainly of an insertion procedure that inserts TVs into the PC_Tree based on the definitions below:
Definition 42.1. If the TVs of nodes n_r and n_s are equal, then r = s. The insertion procedure increases the local-count field of node n_r by 1 if the current TV is equal to the TV of n_r.
Figure 42.1b shows the insertion of the first and second transactions. The second TV, with value 2,730, cannot be divided by the first TV, with value 2,310, so it creates a new descendant according to Definitions 2 and 3. Transactions 3–6 are inserted into their descendants based on Definition 2, as shown in Fig. 42.1c–e. The insertion of the seventh transaction applies Definition 4, since TV 70 belongs to two descendants (the second descendant is shown with a red, bold line in Fig. 42.1f). The TV of the eighth transaction, with value 455, is equal to the fourth TV, so the local-count field of the fourth TV is increased by 1 according to Definition 1, as shown (underlined) in Fig. 42.1g.
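Since only Definition 42.1 is reproduced here (Definitions 42.2–42.4 are referenced but not quoted), the following sketch, which reuses PCTreeNode and root from the previous sketch, guesses the descendant placement from the divisibility relation and Fig. 42.1; it is an approximation, not the paper's exact procedure.

```python
def find_equal(node, tv):
    # Definition 42.1: at most one node per distinct TV
    if node.value == tv:
        return node
    for child in node.children:
        hit = find_equal(child, tv)
        if hit:
            return hit
    return None

def insert_tv(root, tv):
    node = find_equal(root, tv)
    if node:                                   # identical TV already present
        node.local_count += 1
        return node
    node = PCTreeNode(value=tv)
    for child in root.children:
        if child.value % tv == 0:              # tv encodes a sub-pattern of this descendant
            child.children.append(node)
            return node
    root.children.append(node)                 # otherwise: a new descendant of the root
    return node

for tv in (2310, 2730, 66, 770, 455, 910, 70, 455):   # the TVs of Table 42.1
    insert_tv(root, tv)
```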
Each TV in the PC_Tree represents a pattern P, and the support of pattern P, S(P), is registered in the global-count field. Given patterns P and Q represented by TV_P and TV_Q respectively, the PC_Tree has the following properties:
Property 42.1. S(P) is computed by traversing only the descendants of TV_P.
Property 42.2. If P and Q belong to the same descendant R, then S(P) < S(Q) iff TV_P can be divided by TV_Q.
Property 42.4. Nodes are arranged according to TV order, which is a fixed global ordering; in fact the PC_Tree is a frequency-independent tree structure.
Property 42.5. The important procedures are performed almost entirely by two simple arithmetic operations, multiplication and division; using arithmetic operations instead of string operations enhances performance.
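As a small illustration of Properties 42.1 and 42.2, the support of a pattern can be obtained by accumulating the local counts of the nodes whose TV is divisible by the pattern's TV; this on-the-fly version is a simplification of the global-count bookkeeping described above and reuses the earlier sketches.

```python
def support(node, pattern_tv):
    """S(P): number of transactions whose TV is divisible by the pattern's TV."""
    s = node.local_count if node.value % pattern_tv == 0 else 0
    return s + sum(support(child, pattern_tv) for child in node.children)

# On the example DB of Table 42.1, pattern {A, C, D} has TV 70 and support 5
print(support(root, transaction_value({'A', 'C', 'D'})))   # 5
```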
[Fig. 42.2 The completed PC_Tree, shown (a) using TVs and (b) using the corresponding patterns; the item-to-prime assignment with item frequencies is A: 2 (6), B: 3 (3), C: 5 (7), D: 7 (7), E: 11 (3), F: 13 (4)]
pruned – the superset infrequency pruning. Given that min_sup has been changed from 5 to 2, all items are frequent. While the PC_Miner traverses the left sub-tree (descendant) of node (A, B, C, D, E), node (A, C, D, E) is found to be a maximal frequent pattern (Property 3). All of its subsets (here only node (A, C, D)) are then pruned – the subset frequency pruning.
In this section, the accuracy and performance of the method are evaluated. All experiments were performed on a PC with an Intel P4 2.8 GHz CPU and 2 GB of main memory, running Microsoft Windows XP. All algorithms were implemented in Microsoft Visual C++ 6.0.
In the first experiment, the synthetic data were generated using the IBM data generator, which has been used in most studies on frequent pattern mining. We generated five datasets with 1,000 items, an average transaction length of ten, and 1,000–10,000 transactions, called D1, D2, D4, D8 and D10 respectively. In this experiment the compactness of our database encoding technique is verified separately: all transformed TVs are stored in a flat file called the encoded file, and the size of this file is compared with the size of the original dataset. The results are good, reducing the size of these datasets by more than half, as shown in Fig. 42.3.
[Fig. 42.4 Number of maximal frequent patterns discovered at varying min_sup (%)]
In the second experiment, we show the accuracy and correctness of the method. The test dataset T10.I6.D10K was also generated synthetically by the IBM data generator. Figure 42.4 shows the number of maximal frequent patterns discovered at varying min_sup on this dataset.
In the third experiment, in order to evaluate the scalability of our new algorithm, we applied it, together with Apriori, to the four IBM datasets generated in experiment 1. Figure 42.5 shows the performance of the two algorithms as a function of the number of transactions for min_sup = 2%. When the number of transactions is less than 5,000, Apriori slightly outperforms the PC_Miner in execution time. As the number of transactions increases, the execution time of Apriori degrades compared to that of the PC_Miner.
The fourth experiment compares the performance of the PC_Miner with the Apriori and Flex algorithms on the mushroom dataset, a real and dense dataset containing characteristics of various species of mushrooms [14]. The mushroom dataset has 8,124 transactions, 119 items, and an average transaction length of 23. Figure 42.6 shows that the PC_Miner algorithm outperforms both the Apriori and Flex algorithms.
[Fig. 42.5 Execution time of Apriori and PC_Miner versus the number of transactions (min_sup = 2%); Fig. 42.6 Execution time (in units of 100 s) of Apriori, Flex and PC_Miner versus min_sup (%) on the mushroom dataset]
In this paper, we proposed a new method to discover maximal frequent patterns efficiently. Our method introduced an efficient database encoding technique, a novel tree structure called the Prime-based encoded and Compressed Tree (PC_Tree), and the PC_Miner algorithm. The experiments verified the compactness of the database encoding technique. The PC_Tree is a well-organized tree structure with useful properties for capturing transaction information. The PC_Miner reduces the search space by using a combined pruning strategy to traverse the PC_Tree efficiently. The experimental results showed that the PC_Miner algorithm outperforms the Apriori and Flex algorithms. For interactive mining, where min_sup can be changed frequently, the information kept in the PC_Tree can be reused, and no tree restructuring is needed.
In fact, this research introduced a number-theoretic method to discover MFP that makes use of prime number theory, simple computation based on the division operation, and a combined pruning strategy. There are several directions for future improvement: optimal data structures, better memory management and improved pruning methods to enhance efficiency. The method can also be extended to incremental frequent pattern mining, where the transaction database is updated or the minimum support threshold is changed [13].
References
1. R. Agrawal and R. Srikant, Fast algorithms for mining association rules, 20th International
Conference. Very Large Data Bases, VLDB, 1215: 487–499 (1994).
2. D. Burdick, M. Calimlim, and J. Gehrke, Mafia: A maximal frequent itemset algorithm for
transactional databases, 17th International Conference on Data Engineering: pp. 443–452
(2001).
3. S. Bashir and A.R. Baig, HybridMiner: Mining Maximal Frequent Itemsets Using Hybrid
Database Representation Approach, 9th International Multitopic Conference, IEEE INMIC:
pp. 1–7 (2005).
4. R.J. Bayardo Jr, Efficiently mining long patterns from databases, ACM SIGMOD International
Conference on Management of Data: pp. 85–93 (1998).
5. J. Han, H. Cheng, D. Xin, and X. Yan, Frequent pattern mining: Current status and future
directions, Data Mining and Knowledge Discovery, 15(1): 55–86 (2007).
6. D.I. Lin and Z.M. Kedem, Pincer-Search: A New Algorithm for Discovering the Maximum
Frequent Set, Advances in Database Technology–EDBT’98: 6th International Conference on
Extending Database Technology, Valencia, Spain (1998).
7. R.C. Agarwal, C.C. Aggarwal, and V.V.V. Prasad, Depth first generation of long patterns,
Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining:
pp. 108–118 (2000).
8. R. Rymon, Search through systematic set enumeration, Third International Conference on
Principles of Knowledge Representation and Reasoning: pp. 539–550 (1992).
9. N. Mustapha, M.N. Sulaiman, M. Othman, and M.H. Selamat, Fast discovery of long patterns for association rules, International Journal of Computer Mathematics, 80(8): 967–976 (2003).
10. F. Chen, and M. Li, A Two-Way Hybrid Algorithm for Maximal Frequent Itemsets Mining,
Fourth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007.
11. M.J. Zaki, Scalable algorithms for association mining, IEEE Transactions on Knowledge and
Data Engineering, 12(3): 372–390 (2000).
12. T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, MIT Press, Cambridge, MA (1990).
13. M. Nadimi-Shahraki, N. Mustapha, M.N. Sulaiman, and A. Mamat, Incremental updating of
frequent pattern: basic algorithms, Second International Conference on Information Systems
Technology and Management (ICISTM 08) (1): 145–148 (2008).
14. C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases, University of California, Irvine, 1998.
Chapter 43
Towards a New Generation of Conversational
Agents Based on Sentence Similarity
Abstract The Conversational Agent (CA) is a computer program that can engage in
conversation using natural language dialogue with a human participant. Most CAs
employ a pattern-matching technique to map user input onto structural patterns of
sentences. However, every combination of utterances that a user may send as input
must be taken into account when constructing such a script. This chapter is concerned with constructing a novel CA using sentence similarity measures. Examining word meaning rather than structural patterns of sentences means that scripting is reduced to a few natural language sentences per rule, as opposed to potentially hundreds of patterns. Furthermore, initial results indicate good sentence similarity matching, with 13 out of 18 domain-specific user utterances matched correctly, compared with 8 out of 18 for the traditional pattern-matching approach.
43.1 Introduction
The concept of ‘intelligent’ machines was first conceived by the British mathemati-
cian Alan Turing [1]. The imitation game, known as the ‘Turing Test’, was devised
to determine whether or not a computer program was ‘intelligent’. This led to the
development of the Conversational Agent (CA) [1] – a computer program that can
engage in conversation using natural language dialogue with a human participant.
CAs can exist in two forms: ‘Embodied’ agents [2] possess an animated huma-
noid body and exhibit attributes such as facial expressions and movement of eye
gaze. ‘Linguistic’ agents [3, 4] consist of spoken and/or written language without
K. O’Shea (B)
The Intelligent Systems Group, Department of Computing and Mathematics, Manchester
Metropolitan University, Chester Street, Manchester M1 5GD
E-mail: k.oshea@mmu.ac.uk
embodied communication. One of the earliest text-based CAs developed was ELIZA
[3]. ELIZA was capable of creating the illusion that the system was actually listen-
ing to the user simply by answering questions with questions. This was performed
using a simple pattern matching technique, mapping key terms of user input onto a
suitable response. Further advancements on CA design led to PARRY [4], capable
of exhibiting personality, character, and paranoid behavior by tracking its own inter-
nal emotional state during a conversation. Unlike ELIZA, PARRY possessed a large
collection of tricks, including: admitting ignorance by using expressions such as “I
don’t know” in response to a question; changing the subject of the conversation or
rigidly continuing the previous topic by including small stories about the theme [4].
CAs can also engage in social chat and are capable of forming relationships with a
user. ALICE [5], an online chatterbot, and InfoChat [6] are just two such examples.
By conversing in natural language these CAs are able to extract data from a user,
which may then be used throughout the conversation.
Considerable research has been carried out on the design and evaluation of
embodied CAs [2,7]; however, little work appears to have been focused on the actual
dialogue. This paper will concentrate on text-based CAs and the development and
evaluation of high-quality dialogue.
Most text-based CAs' scripts are organized into contexts consisting of a number
of hierarchically organized rules. Each rule possesses a list of structural patterns
of sentences and an associated response. User input is then matched against the
patterns and the pre-determined response is sent as output. InfoChat [6] is one
such CA capable of interpreting structural patterns of sentences. However, every
combination of utterances must be taken into account when constructing a script –
an evidently time-consuming, high maintenance task, which undoubtedly suggests
scope for alternative approaches. It is, therefore, envisaged that the employment of
sentence similarity measures could reduce and simplify CA scripting by using a few
prototype natural language sentences per rule.
Two successful approaches to the measurement of sentence similarity are: Latent
Semantic Analysis (LSA) [8] and Sentence Similarity based on Semantic Nets and
Corpus Statistics [9]. LSA is a theory and method for extracting and representing
the contextual-usage meaning of words by statistical computations applied to a large
corpus of text [8]. A word by context matrix is formed based on the number of
times a given word appears in a given set of contexts. The matrix is decomposed
by Singular Value Decomposition (SVD) into the product of three other matrices,
including the diagonal matrix of singular values [10]. This dimension reduction step
collapses the component matrices so that words that occurred or did not occur in
some contexts now appear with a greater or lesser frequency [8]. Reconstruction of
the original matrix enables LSA to acquire word knowledge among large numbers of
contexts. Although LSA makes no use of syntactic relations, it does, however, offer
close enough approximations of people’s knowledge to underwrite and test theories
of cognition. Sentence Similarity based on Semantic Nets and Corpus Statistics will
be employed as the measure in this research and will be described in further detail
in Section 43.2.
This chapter is organized as follows: Section 43.2 will describe and illustrate the
sentence similarity measure; Section 43.3 will describe a traditional CA scripting
approach; Section 43.4 will describe CA scripting using sentence similarity; Sec-
tion 43.5 will present an experimental analysis of the two approaches; Section 43.6
will evaluate the results and Section 43.7 will conclude and highlight areas for
further work.
Sentence Similarity based on Semantic Nets and Corpus Statistics [9] is a mea-
sure that focuses directly on computing the similarity between very short texts
of sentence length. Through the use of a lexical/semantic knowledge base such as WordNet [11], the length of the path separating two words in the hierarchy can be measured, which in turn can be used to determine word similarity. The synset – a collection of synonyms – at the meeting point of the paths connecting the two words through the hierarchy is called the subsumer. The depth of the subsumer is similarly measured by counting the levels from the subsumer to the top of the hierarchy. Li et al. [9, 12] proposed that the similarity between two words be a function of two attributes: path length and depth. The algorithm begins by combining the two candidate sentences (T1 and T2) to form a joint word set using only distinct words. For example:
T1 = Mars is a small red planet
T2 = Mars and Earth orbit the sun
A joint word set ‘T’ is formed where:
T = Mars is a small red planet and earth orbit the sun
As a result, each sentence is represented by the use of the joint word set with no
surplus information. Raw semantic vectors are then derived for each sentence using
the hierarchical knowledge-base WordNet [11], in order to determine the separation
between words. Taking a non-linear transfer function as an appropriate measure, the path-length contribution is a monotonically decreasing function of $l$,
$$f_1(l) = e^{-\alpha l},$$
where $l$ is the path length between the two words and $\alpha$ is a constant.
As for the depth of the subsumer, the relationship of words at varying levels of the hierarchy must be taken into consideration. For example, words at the upper layers are far more general and less semantically similar than words at lower layers [9]. Therefore, subsuming words at upper layers must be scaled down, whereas words at lower layers must be scaled up, resulting in a monotonically increasing function of $h$,
$$f_2(h) = \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}},$$
where $h$ is the depth of the subsumer and $\beta$ is a constant.
As such, the raw similarity between two words is calculated as
$$s(w_1, w_2) = e^{-\alpha l} \cdot \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}}.$$
Finally, the semantic similarity $S_s$ between two sentences is calculated as
$$S_s = \frac{s_1 \cdot s_2}{\lVert s_1 \rVert \, \lVert s_2 \rVert} \qquad (43.5)$$
where $s_1$ is the resultant semantic vector of sentence 1 and $s_2$ is the resultant semantic vector of sentence 2.
Word order also plays an active role in sentence similarity. Each word is assigned
a unique index number which simply represents the order in which the word appears
in the sentence. For example, take the following sentences denoted T1 and T2:
T1 = The cat ran after the mouse
T2 = The mouse ran after the cat
A joint word set ‘T’ is formed where:
T = The cat ran after the mouse
Each sentence is then compared to the joint word set. If the same word is present – or, if not, the next most similar word – then the corresponding index number from T1 is placed in the vector r1. As such, the word order vectors r1 and r2 for the example sentence pair T1 and T2 are formed as follows:
r1 = {1 2 3 4 5 6}
r2 = {1 6 3 4 5 2}
A word order similarity $S_r$ is derived from the difference between these vectors, and the overall sentence similarity is a weighted combination of the semantic similarity $S_s$ and $S_r$, where the weighting factor $\delta$ takes into account that word order plays a less significant role than semantics when determining sentence similarity.
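The following small sketch reproduces the joint word set and word order vectors of the example above; it uses exact token matching only, whereas the measure in [9] additionally falls back to the most similar WordNet word, which is omitted here.

```python
def joint_word_set(t1, t2):
    joint = []
    for w in (t1 + " " + t2).split():
        if w not in joint:                     # keep only distinct words
            joint.append(w)
    return joint

def word_order_vector(sentence, joint):
    tokens = sentence.split()
    # 1-based index of each joint word in the sentence (0 if it does not occur)
    return [tokens.index(w) + 1 if w in tokens else 0 for w in joint]

t1 = "The cat ran after the mouse"
t2 = "The mouse ran after the cat"
joint = joint_word_set(t1, t2)                 # ['The', 'cat', 'ran', 'after', 'the', 'mouse']
r1 = word_order_vector(t1, joint)              # [1, 2, 3, 4, 5, 6]
r2 = word_order_vector(t2, joint)              # [1, 6, 3, 4, 5, 2]
```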
The new proposed approach using sentence similarity will maintain the same script
as that of the traditional approach; however, all patterns will be replaced with nat-
ural language sentences. This considerably reduces the burden and skill required to
produce CA scripts. Through the use of a sentence similarity measure [9], a match
is determined between the user’s utterance and the natural language sentences. The
highest ranked sentence is fired and sent as output. Figure 43.1 and the following steps 1–3 illustrate the procedure.
1. Natural language dialogue is received as input, which forms a joint word set with
each rule from the script using only distinct words in the pair of sentences. The
script is comprised of rules consisting of natural language sentences.
2. The joint word set forms a semantic vector using a hierarchical semantic/lexical
knowledge-base [11]. Each word is weighted based on its significance by using
information content derived from a corpus.
3. Combining word order similarity with semantic similarity the overall sentence
similarity is determined. The highest ranked sentence is ‘fired’ and sent as output.
The proposed scripts are simply constructed by assigning a number of prototype
natural language sentences per rule. For example, one such rule may be constructed
as follows:
<Rule 01>
s: I need help
s: I do not understand
r: How can I help you
where s = sentence and r = response.
The precise number of sentences per rule will start at one and increase to ‘n’
where ‘n’ is determined by experimental analysis. However, it is expected that the
value of ‘n’ will be small and significantly less than the number of patterns used in
traditional scripting methodologies.
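The sketch below illustrates this style of scripting and firing. The first rule's sentences and response are taken from the text; the second rule, and the crude word-overlap score standing in for the semantic/word-order measure of [9], are invented for illustration only.

```python
from typing import List, Tuple

Rule = Tuple[List[str], str]                    # (prototype sentences, response)

SCRIPT: List[Rule] = [
    (["I need help", "I do not understand", "This is confusing"], "How can I help you"),
    (["I want to pay my tuition fees"], "Tuition fees can be paid online or at the office"),
]

def sentence_similarity(a: str, b: str) -> float:
    # Placeholder: word-overlap score instead of the semantic/word-order measure of [9]
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def fire(utterance: str) -> str:
    # Fire the response of the rule whose prototype sentence is most similar to the input
    best = max(((sentence_similarity(utterance, s), resp)
                for sentences, resp in SCRIPT for s in sentences), key=lambda x: x[0])
    return best[1]

print(fire("this is all very confusing to me"))   # -> "How can I help you"
```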
[Fig. 43.1 Overview of the procedure: the user input and the script form a joint word set; semantic similarity and word order similarity are combined into an overall sentence similarity, and the highest ranked sentence is fired]
43.5.1 Domain
The real world domain is concerned with advising students at University on debt
management and the payment of tuition fees. For the purpose of experimentation,
one script, which consists of 18 rules, was taken from a substantially extensive script
developed by Convagent Ltd. [6]. These rules were representative of the domain and
its complexity.
43.5.2 Experiments
The proposed approach, however, replaces the traditional structural patterns of a rule with three generic natural language sentences: "I need help", "I do not understand" and "This is confusing".
The first experiment examined the traditional approach using structural patterns of sentences [6], while the second examined the new proposed approach using natural language sentences. The experiments entailed sending 18 domain-specific user utterances as input; the 18 selected utterances were deemed to be a good sample of the domain. The resulting outputs, that is the fired patterns/sentences, for the 18 cases are displayed in Table 43.2.
The results for the user utterances are as follows. The outputs generated for user utterances 3, 6, 8, 10, 13, 15, 16 and 18 indicate a correct firing by approach one; approach one appears to have found a structurally comparable match in these cases. The outputs generated for user utterances 1, 2, 4, 5, 7, 8, 10, 12, 13, 14, 15, 16 and 18 indicate a correct firing by approach two; approach two appears to have found sufficient semantic similarity between the user utterances and the corresponding natural language sentences.
The outputs generated for user utterances 1, 2, 4, 5, 7, 9, 11, 12, 14 and 17 indicate a mis-firing by approach one; approach one appears to have failed to find an identical or comparable match to the user utterance. The outputs generated for user utterances 3, 6, 9, 11 and 17 indicate a mis-firing by approach two; approach two appears to have failed to identify sufficient semantic similarity between the user utterances and the natural language sentences.
In the cases where approach one mis-fired, this was because the script did not possess an identical or comparable structural match. This may be rectified by incorporating the missing patterns into the script. In the cases where approach two mis-fired, this was due to insufficient sentences being representative of that specific rule. This can be rectified by incorporating additional natural language sentences into the script. Furthermore, the sentence similarity measure could be adjusted so as to consider other parts of speech.
In total, approach one correctly matched 8 out of 18 user utterances, whereas approach two correctly matched 13 out of 18. Typically, the number of patterns per rule in the traditional pattern script was between 50 and 200; in contrast, the average number of sentences per rule in the natural language script was three. Moreover, a considerable amount of time is required to write and maintain the scripts of approach one, whereas scripting in approach two is greatly simplified.
Most CAs employ a pattern-matching technique to map user input onto structural
patterns of sentences. However, every combination of utterances that a user may
send as input must be taken into account when constructing such a script. This paper
was concerned with constructing a novel CA using sentence similarity measures.
Examining word meaning rather than structural patterns of sentences meant that
scripting was reduced to a couple of natural language sentences per rule as opposed
to potentially hundreds of patterns. Furthermore, the initial results indicate good sentence similarity matching, with 13 out of 18 domain-specific user utterances matched correctly, compared with 8 out of 18 for the traditional pattern-matching approach.
Further work will entail considerable development of the new proposed approach.
The aim will be to incorporate the use of context switching whereby each context
defines a specific topic of conversation. The CA will be robust, capable of tolerating
a variety of user input. The new approach will become easily adaptable via auto-
matic or other means. It is intended that a user evaluation of the two approaches
to CA design will be conducted. Firstly, each approach would be subjected to a
set of domain-specific utterances. Each CA would then compute a match between
the user utterance and the rules within the scripts, firing the highest strength pat-
tern/sentence as output. A group of human subjects would evaluate the scripts and
their corresponding outputs in order to judge whether the correct pattern/sentence
had been fired. This would provide a means for evaluating the opposing approaches
and their scripting methodologies.
References
1. A. Turing, "Computing Machinery and Intelligence", Mind, Vol. 59 (236), 1950, pp. 433–460.
2. D. W. Massaro, M. M. Cohen, J. Beskow, S. Daniel, and R. A. Cole, “Developing and
Evaluating Conversational Agents”, Santa Cruz, CA: University of California, 1998.
3. J. Weizenbaum, ELIZA – “A Computer Program for the Study of Natural Language Com-
munication between Man and Machine”, Communications of the Association for Computing
Machinery, Vol. 9, 1966, pp. 36–45.
4. K. Colby, “Artificial Paranoia: A Computer Simulation of Paranoid Process”. New York:
Pergamon, 1975.
5. R. S. Wallace, ALICE: Artificial Intelligence Foundation Inc. [Online]. Available:
http://www.alicebot.org (2008, February 01).
6. D. Michie and C. Sammut, Infochat TM Scripter’s Manual. Manchester: Convagent, 2001.
7. G. A. Sanders and J. Scholtz, "Measurement and Evaluation of Embodied Conversational Agents", in Embodied Conversational Agents, Chapter 12, J. Cassell, J. Sullivan, S. Prevost and E. Churchill, eds., MIT Press, 2000.
8. T. K. Landauer, P. W. Foltz, and D. Laham, “Introduction to Latent Semantic Analysis”.
Discourse Processes, Vol. 25 (2–3), 1998, pp. 259–284.
9. Y. Li, D. McLean, Z. A. Bandar, J. D. O’Shea, and K. Crockett, “Sentence Similarity
Based on Semantic Nets and Corpus Statistics”, IEEE Transactions on Knowledge and Data
Engineering, Vol. 18 (8), 2006, pp. 1138–1149.
10. T. K. Landauer, D. Laham, and P. W. Foltz, "Learning Human-Like Knowledge by Singular
Value Decomposition: A Progress Report”, in Advances in Neural Information Processing
Systems 10, M. I. Jordan, M. J. Kearns, and S. A. Solla, eds., Cambridge, MA: MIT Press,
1998, pp. 45–51.
11. G. A. Miller, “WordNet: A Lexical Database for English”, Communications of the Association
for Computing Machinery, Vol. 38 (11), 1995, pp. 39–41.
12. Y. Li, Z. A. Bandar, and D. Mclean, “An Approach for Measuring Semantic Similarity between
Words using Multiple Information Sources”, IEEE Transactions on Knowledge and Data
Engineering, Vol. 15 (4), 2003, pp. 871–881.
13. C. Sammut, “Managing Context in a Conversational Agent”, Electronic Transactions on
Artificial Intelligence, Vol. 3 (7), 2001, pp. 1–7.
14. D. Michie, "Return of the Imitation Game", Electronic Transactions on Artificial Intelligence,
Vol. 6 (2), 2001, pp. 203–221.
Chapter 44
Direction-of-Change Financial Time Series
Forecasting Using Neural Networks:
A Bayesian Approach
Andrew A. Skabar
Abstract Conventional neural network training methods find a single set of values
for network weights by minimizing an error function using a gradient descent-based
technique. In contrast, the Bayesian approach infers the posterior distribution of
weights, and makes predictions by averaging the predictions over a sample of net-
works, weighted by the posterior probability of the network given the data. The
integrative nature of the Bayesian approach allows it to avoid many of the diffi-
culties inherent in conventional approaches. This paper reports on the application
of Bayesian MLP techniques to the problem of predicting the direction in the
movement of the daily close value of the Australian All Ordinaries financial index.
Predictions made over a 13 year out-of-sample period were tested against the null
hypothesis that the mean accuracy of the model is no greater than the mean accu-
racy of a coin-flip procedure biased to take into account non-stationarity in the data.
Results show that the null hypothesis can be rejected at the 0.005 level, and that the
t-test p-values obtained using the Bayesian approach are smaller than those obtained
using conventional MLP methods.
44.1 Introduction
Financial time series forecasting has almost invariably been approached as a regres-
sion problem; that is, future values of a time series are predicted on the basis of past
values. In many cases, however, correctly predicting the direction of the change
(i.e., up or down) is a more important measure of success. For example, if a trader
A.A. Skabar
Department of Computer Science and Computer Engineering, La Trobe University, Australia,
E-mail: a.skabar@latrobe.edu.au
where $N_0$ is the number of inputs, $N_1$ is the number of units in the hidden layer, $w_{ji}$ is a numerical weight connecting input unit $i$ with hidden unit $j$, and $w_{kj}$ is the weight connecting hidden unit $j$ with output unit $k$. The function $g$ is either a sigmoid, $g(x) = (1 + \exp(-x))^{-1}$, or some other continuous, differentiable, nonlinear function. For regression problems $h$ is the identity function, $h(u) = u$, and for classification problems $h$ is a sigmoid.
Thus, an MLP with some given architecture and weight vector $\mathbf{w}$ provides a mapping from an input vector $\mathbf{x}$ to a predicted output $y$, given by $y = f(\mathbf{x}, \mathbf{w})$.
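A minimal sketch of the mapping y = f(x, w) described above, assuming a single hidden layer, sigmoid activations and no bias terms; the weights and input below are random placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, W1, W2, classification=True):
    """W1: (N1, N0) input-to-hidden weights; W2: (1, N1) hidden-to-output weights."""
    hidden = sigmoid(W1 @ x)                     # g applied to hidden pre-activations
    out = W2 @ hidden                            # output pre-activation
    return sigmoid(out) if classification else out   # h: sigmoid or identity

rng = np.random.default_rng(0)
x = rng.normal(size=5)                           # e.g. (r(t), ma5, ma10, ma30, ma60)
W1, W2 = rng.normal(size=(6, 5)), rng.normal(size=(1, 6))
print(mlp_forward(x, W1, W2))                    # interpreted as P(upward movement)
```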
Given some data, $D$, consisting of $N$ independent items $(\mathbf{x}^1, y^1), \ldots, (\mathbf{x}^N, y^N)$, the objective is to find a suitable $\mathbf{w}$. In the case of financial prediction, the raw data usually consist of a series of values measured at regular time intervals, e.g. the daily close value of a financial index such as the Australian All Ordinaries. The input vector $\mathbf{x}^n$ corresponds to the $n$th value of the series and usually consists of the current value, in addition to time-lagged values or other quantities derived from the series, such as moving averages.
The conventional approach to finding the weight vector w is to use a gradient
descent method to find a weight vector that minimizes the error between the net-
work output value, f (x, w), and the target value, y. For regression problems, it is
usually the squared error that is minimized, whereas for binary classification prob-
lems it is usually more appropriate to minimise the cross-entropy error, which will
result in output values that can be interpreted as probabilities (See Bishop [9] for
a detailed discussion of error functions). This approach is generally referred to as
the maximum likelihood (ML) approach because it attempts to find the most proba-
ble weight vector, given the training data. This weight vector, wML , is then used to
predict the output value corresponding to some new input vector $\mathbf{x}^{n+1}$.
Because of the very high level of noise present in financial time series data, there
is an overt danger that applying non-linear models such as MLPs may result in
overfitting – a situation in which a model performs well on classifying examples on
which it has been trained, but sub-optimally on out-of-sample data. While the risk of
overfitting can be reduced through careful selection of the various parameters that
control model complexity (input dimensionality, number of hidden units, weight
regularization coefficients, etc.), this usually requires a computationally expen-
sive cross-validation procedure. Bayesian methods provide an alternative approach
which overcomes many of these difficulties.
Bayesian methods for MLPs infer the posterior distribution of the weights given the data, $p(\mathbf{w} \mid D)$. The predicted output corresponding to some input vector $\mathbf{x}^n$ is then obtained by performing a weighted sum of the predictions over all possible weight vectors, where the weighting coefficient for a particular weight vector depends on $p(\mathbf{w} \mid D)$. Thus,
$$\hat{y}^n = \int f(\mathbf{x}^n, \mathbf{w})\, p(\mathbf{w} \mid D)\, d\mathbf{w} \qquad (44.2)$$
where $f(\mathbf{x}^n, \mathbf{w})$ is the MLP output. Because $p(\mathbf{w} \mid D)$ is a probability density function, the integral in Eq. (44.2) can be expressed as the expected value of $f(\mathbf{x}^n, \mathbf{w})$ over this density:
$$\int f(\mathbf{x}^n, \mathbf{w})\, p(\mathbf{w} \mid D)\, d\mathbf{w} = E_{p(\mathbf{w} \mid D)}\bigl[f(\mathbf{x}^n, \mathbf{w})\bigr] \simeq \frac{1}{N} \sum_{i=1}^{N} f(\mathbf{x}^n, \mathbf{w}_i) \qquad (44.3)$$
Therefore the integral can be estimated by drawing $N$ samples $\mathbf{w}_i$ from the density $p(\mathbf{w} \mid D)$ and averaging the predictions due to these samples. This process is known as Monte Carlo integration.
In order to estimate the density $p(\mathbf{w} \mid D)$ we use the fact that $p(\mathbf{w} \mid D) \propto p(D \mid \mathbf{w})\, p(\mathbf{w})$, where $p(D \mid \mathbf{w})$ is the likelihood and $p(\mathbf{w})$ is the prior weight distribution. Assuming that the target values $t^n$ are binary, the likelihood can be expressed as
$$p(D \mid \mathbf{w}) = \exp\Bigl\{ \sum_n \bigl[\, t^n \ln f(\mathbf{x}^n, \mathbf{w}) + (1 - t^n) \ln\bigl(1 - f(\mathbf{x}^n, \mathbf{w})\bigr) \bigr] \Bigr\} \qquad (44.4)$$
The distribution $p(\mathbf{w})$ can be used to express our prior beliefs regarding the complexity of the MLP. Typically, $p(\mathbf{w})$ is modelled as a Gaussian with zero mean and inverse variance $\alpha$:
$$p(\mathbf{w}) = \Bigl(\frac{\alpha}{2\pi}\Bigr)^{m/2} \exp\Bigl(-\frac{\alpha}{2} \sum_{i=1}^{m} w_i^2\Bigr) \qquad (44.5)$$
where $m$ is the number of weights in the network [7]. The parameter $\alpha$ therefore controls the smoothness of the function, with large values of $\alpha$ corresponding to smoother functions, and hence weights with smaller magnitudes. Because we do not know what variance to assume for the prior distribution, it is common to place a distribution over its values. As $\alpha$ must be positive, a suitable form for its distribution is the gamma distribution [7]. Thus,
$$p(\alpha) = \frac{(a/2\mu)^{a/2}}{\Gamma(a/2)}\, \alpha^{a/2 - 1} \exp(-\alpha a / 2\mu) \qquad (44.6)$$
where $a$ and $\mu$ are respectively the shape and mean of the gamma distribution, and are set manually. Because the prior depends on $\alpha$, Eq. (44.2) should be modified so that it includes the posterior distribution over the $\alpha$ parameter:
$$\hat{y}^n = \int f(\mathbf{x}^n, \mathbf{w})\, p(\mathbf{w}, \alpha \mid D)\, d\mathbf{w}\, d\alpha \qquad (44.7)$$
where
$$p(\mathbf{w}, \alpha \mid D) \propto p(D \mid \mathbf{w})\, p(\mathbf{w}, \alpha). \qquad (44.8)$$
The objective in Monte Carlo integration is to sample preferentially from the region where $p(\mathbf{w}, \alpha \mid D)$ is large, and this can be done using the Metropolis algorithm [10]. The algorithm generates a sequence of vectors in such a way that each successive vector is obtained by adding a small random component to the previous vector; i.e., $\mathbf{w}_{new} = \mathbf{w}_{old} + \varepsilon$, where $\varepsilon$ is a small random vector. Preferential sampling is achieved using the standard acceptance criterion: $\mathbf{w}_{new}$ is accepted if $p(\mathbf{w}_{new} \mid D) \ge p(\mathbf{w}_{old} \mid D)$, and otherwise is accepted with probability $p(\mathbf{w}_{new} \mid D)/p(\mathbf{w}_{old} \mid D)$; if rejected, $\mathbf{w}_{old}$ is retained.
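The following schematic sketch shows how such Metropolis sampling and the Monte Carlo averaging of Eqs. (44.2)–(44.3) might be combined; log_posterior and predict_fn are placeholders for the actual log p(w, α|D) and MLP output, and the step size and sample counts are illustrative.

```python
import numpy as np

def metropolis_samples(log_posterior, w0, n_samples, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    w, logp = w0, log_posterior(w0)
    samples = []
    for _ in range(n_samples):
        w_new = w + step * rng.normal(size=w.shape)   # small random perturbation
        logp_new = log_posterior(w_new)
        if np.log(rng.uniform()) < logp_new - logp:   # Metropolis acceptance rule
            w, logp = w_new, logp_new
        samples.append(w.copy())
    return samples

def bayesian_prediction(x, samples, predict_fn):
    # Average the MLP outputs over the sampled weight vectors (Monte Carlo integration)
    return np.mean([predict_fn(x, w) for w in samples], axis=0)
```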
The experimental results reported in this section were obtained by applying the tech-
nique described above to the daily close values of the Australian All Ordinaries
Index (AORD) for the period from November 1990 to December 2004.
The input vector consisted of $(r(t),\, ma_5(t),\, ma_{10}(t),\, ma_{30}(t),\, ma_{60}(t))$, where
$$r(t) = (p_t - p_{t-1})/p_{t-1} \qquad (44.9)$$
and
$$ma_n(t) = \frac{1}{n} \sum_{\tau = t-n}^{t} p_\tau \qquad (44.10)$$
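A small sketch of how the inputs of Eqs. (44.9)–(44.10) could be computed from a close-price series; the toy series is invented and the moving-average window convention is a simplification.

```python
import numpy as np

def returns(p):
    """One-day returns r(t) = (p_t - p_{t-1}) / p_{t-1}."""
    p = np.asarray(p, dtype=float)
    return (p[1:] - p[:-1]) / p[:-1]

def moving_average(p, n, t):
    """Mean of the last n values up to and including day t (cf. Eq. 44.10)."""
    p = np.asarray(p, dtype=float)
    return np.mean(p[t - n + 1: t + 1])

p = np.array([100.0, 101.5, 100.8, 102.3, 103.0])
print(returns(p))                  # daily returns r(t)
print(moving_average(p, 3, 4))     # ma_3 at the last day
```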
Note that there would almost certainly exist some other combination of inputs and
pre-processing steps that might lead to better performance than that which can
be achieved using this particular combination. However, in this paper we are not
concerned with optimizing this choice of inputs; rather, we are concerned with
comparing the performance of the Bayesian approach with that of the ML approach.
Assuming that a return series is stationary, a coin-flip decision procedure for
predicting the direction of change would be expected to result in 50% of the pre-
dictions being correct. We would like to know whether our model can produce
predictions which are statistically better than 50%. However, a problem is that many
financial return series are not stationary, as evidenced by the tendency for commod-
ity prices to rise over the long term. Thus it may be possible to achieve an accuracy
significantly better than 50% by simply biasing the model to always predict up.
A better approach is to compensate for this non-stationarity, and this can be done
as follows. Let $x_a$ represent the fraction of days in an out-of-sample test period for which the actual movement is up, and let $x_p$ represent the fraction of days in the test period for which the predicted movement is up. Under a coin-flip model, the expected fraction of days corresponding to a correct upward prediction is $x_a x_p$, and the expected fraction of days corresponding to a correct downward prediction is $(1 - x_a)(1 - x_p)$. Thus the expected fraction of correct predictions is $a_{exp} = x_a x_p + (1 - x_a)(1 - x_p)$.
We wish to test whether $a_{mod}$ (the accuracy of the predictions of our model) is significantly greater than $a_{exp}$ (the compensated coin-flip accuracy). Thus, our null hypothesis may be expressed as follows: the mean of $a_{mod}$ is no greater than the mean of $a_{exp}$. For each out-of-sample test period, predictions are made using a model trained on the 200 trading days immediately preceding this test period.
prediction windows are each then advanced by 30 days. For each 30-day prediction
period we calculate amod and aexp . We then use a paired t-test to determine whether
the means of these values differ statistically.
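The evaluation can be sketched as follows; the per-window accuracies below are invented numbers, and scipy's paired t-test p-value is halved to obtain a one-sided test, which is one reasonable reading of the procedure described in the text.

```python
import numpy as np
from scipy import stats

def compensated_coinflip_accuracy(x_a, x_p):
    """a_exp = x_a * x_p + (1 - x_a) * (1 - x_p), per 30-day test window."""
    return x_a * x_p + (1 - x_a) * (1 - x_p)

a_mod = np.array([0.55, 0.50, 0.53, 0.56])    # model accuracy per window (invented)
x_a   = np.array([0.60, 0.52, 0.57, 0.55])    # fraction of actual up-days (invented)
x_p   = np.array([0.58, 0.50, 0.60, 0.54])    # fraction of predicted up-days (invented)
a_exp = compensated_coinflip_accuracy(x_a, x_p)

t, p_two_sided = stats.ttest_rel(a_mod, a_exp)
p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
print(p_one_sided)                             # reject H0 if this is small
```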
The Bayesian approach requires that we specify the prior weight distribution, p(w).
Recall that p(w) is assumed to be Gaussian with inverse variance α, and that α is assumed to be distributed according to a Gamma distribution with shape a and mean μ, which remain to be specified. In order to gain some insight into the range of values that may be suitable for these parameters, we conducted a number of trials using the ML approach, with weight optimization performed using the scaled conjugate gradients algorithm [13].
Table 44.1 shows the training and test accuracies corresponding to various values of α. Accuracies are averaged over the 120 30-day prediction windows. The values
in the fourth column of the table represent the p-values obtained through performing
the t-test. Italics show best performance.
Figure 44.1 plots training and test set accuracies against the α value. Low values of α, such as 0.01, impose a small penalty on large weights and result in overfitting; i.e., high accuracy on training data but low accuracy on test data. In this case, the null hypothesis cannot be rejected at the 0.05 level. In contrast, when α is very high (e.g., 10.0), large weights are penalised more heavily, leading to weights with small magnitudes. In this case the MLP operates in its linear region and displays a strong bias towards predictions that are in the same direction
as the direction of the majority of changes on the training data. Thus, if the num-
ber of upward movements on the training data is greater than the number of negative
movements, the MLP will be biased towards making upwards predictions on the test
data; however, this is not likely to lead to a rejection of the null hypothesis because
the null hypothesis takes the non-stationarity of the data into account. It can be seen
Table 44.1 Train accuracy, test accuracy and p-value for various α values
α value    Train. acc.    Test. acc.    p-value (H0)
0.010 0.725 0.490 0.402
0.100 0.701 0.501 0.459
0.500 0.608 0.516 0.169
0.750 0.593 0.520 0.070
1.000 0.585 0.524 0.007
1.500 0.570 0.521 0.038
2.000 0.562 0.518 0.587
5.000 0.549 0.526 0.528
10.00 0.542 0.525 0.479
Fig. 44.1 Training and test set accuracy, averaged over 120 training/test set pairs, plotted against α (the inverse variance of the weights). A local maximum in test accuracy corresponds to an α value of approximately 1.0
Fig. 44.2 Gamma distribution with mean μ = 1.0 and shape parameter a = 10
from Fig. 44.1 that a local maximum of the test set accuracy occurs at an α value of approximately 1.0, and in this case the null hypothesis can clearly be rejected at the 0.01 level.
The range of α values for which the null hypothesis can be rejected is very narrow, ranging from a lower α value in the vicinity of 0.5–0.75 to an upper α value in the vicinity of 1.5–2.0. After visualizing the pdf of Gamma distributions with mean 1.0 and various values of the shape parameter, a shape parameter of 10 was chosen. The pdf is shown in Fig. 44.2. Note that the pdf conforms roughly to the α values identified in Table 44.1 as leading to a rejection of the null hypothesis.
Table 44.2 Train accuracy, test accuracy and p-value for Bayesian MLPs (MCMC) and conventionally trained (ML) MLPs (μ = 1.0, a = 10)
Method Train. acc. Test. acc. p-value (Ho)
MCMC 0.571 0.528 0.0011
ML 0.585 0.524 0.0068
We now describe the application of the Bayesian approach, which relies on MCMC
sampling to draw weight vectors from the posterior weight distribution.
Monte Carlo sampling must be allowed to proceed for some time before the sam-
pling converges to the posterior distribution. This is called the burn-in period. In the
results presented below, we allowed a burn-in period of 1,000 samples, following
which we then saved every tenth sample until a set of 100 samples was obtained.
Each of the 100 samples was then applied to predicting the probability of an upwards
change in the value of the index on the test examples, and the probabilities were then
averaged over the 100 samples. The resulting accuracies and p-values are shown
in the first row of Table 44.2. The second row shows results obtained using the
maximum-likelihood approach. Note that the p-values for MCMC are much smaller
than those for the maximum likelihood approach, indicating increased confidence
in the rejection of the null hypotheses. Also note that the average test accuracy
for MCMC (52.8%) is greater than that for ML (52.4%), while the average train-
ing accuracy for the MCMC (57.1%) is less than that for ML (58.5%), thereby
supporting the claim that the Bayesian approach is better at avoiding overfitting.
44.5 Conclusions
The superior performance of the Bayesian approach can be attributed to its integra-
tive nature: each individual weight vector has its own bias, and by integrating over
many weight vectors, this bias is decreased, thus reducing the likelihood of inferior
generalization performance resulting from overfitting on the training data.
The most important decision to be made in using the Bayesian approach is the
choice of prior. In this study, we used a relatively narrow distribution for α, the parameter controlling the degree of regularization present in the error function. This choice was made based on experimenting with different α values within the
ML approach. The criticism could be made that this prior distribution was selected
based on the same data that we had used for testing, and hence that the signifi-
cance of our results may be overstated; however, this is unlikely to be the case, as
the prior depends heavily on factors such as the degree of noise in the data, and
this is relatively constant over different periods of the same time series. Moreover,
the fact that we need only select parameters describing the distribution of α, rather than a specific value for α, further diminishes the possibility that our prior is biased
towards the particular dataset that we have used.
One of the advantages of the Bayesian approach is its inherent ability to avoid
overfitting, even when using very complex models. Thus, although the results
presented in this paper were based on MLPs with six hidden units, the same per-
formance could, in principle, have been achieved using a more complex network. It
is not clear, however, whether we should expect any coupling between the number of
hidden layer units and the prior distribution. For this reason, we would recommend
preliminary analysis using the ML approach to ascertain the appropriate range for α, and selection of priors based on values that lead to significant predictive ability.
References
1. Y. Kajitani, A.I. McLeod, K.W. Hipel, Forecasting Nonlinear Time Series with Feed-forward
Neural Networks: A Case Study of Canadian Lynx data, Journal of Forecasting, 24, 105–117
(2005).
2. J. Chung, Y. Hong, Model-Free Evaluation of Directional Predictability in Foreign Exchange
Markets, Journal of Applied Econometrics, 22, 855–889 (2007).
3. P.F. Christoffersen, F.X. Diebold, Financial Asset Returns, Direction-of-Change Forecasting,
and Volatility Dynamics, Penn Institute for Economic Research PIER Working Paper Archive
04-009, 2003.
4. S. Walczak, An Empirical Analysis of Data Requirements for Financial Forecasting with
Neural Networks, Journal of Management Information Systems, 17(4), 203–222 (2001).
5. D.J.C. MacKay, A Practical Bayesian Framework for Back Propagation Networks, Neural
Computation, 4(3), 448–472 (1992).
6. R.M. Neal, Bayesian Training of Backpropagation Networks by the Hybrid Monte Carlo
Method, Department of Computer Science, University of Toronto Technical Report CRG-
TR-92-1, 1992.
7. R.M. Neal, Bayesian Learning for Neural Networks (Springer-Verlag, New York, 1996).
8. A. Skabar, Application of Bayesian Techniques for MLPs to Financial Time Series Fore-
casting, Proceedings of 16th Australian Conference on Artificial Intelligence, 888–891
(2005).
9. C. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, Oxford, 1995).
10. N.A. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A. Teller, and E. Teller, Equation of
State Calculations by Fast Computing Machines, Journal of Chemical Physics, 21(6), 1087–
1092 (1953).
11. S. Duane, A.D. Kennedy, B.J. Pendleton, and D. Roweth, Hybrid Monte Carlo, Physics Letters
B, 195(2), 216–222 (1987).
12. S. Geman, D. Geman, Stochastic Relaxation, Gibbs Distributions and the Bayesian Restora-
tion of Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741
(1984).
13. M.F. Moller, A scaled conjugate gradient algorithm for fast supervised learning, Neural
Networks, 6, 525–533 (1993).
Chapter 45
A Dynamic Modeling of Stock Prices
and Optimal Decision Making Using
MVP Theory
Abstract In this paper, a precise mathematical model is first obtained for four competing or cooperating companies' stock prices, and then optimal buy/sell signals are ascertained for five different agents that trade in a virtual market and try to maximize their wealth over a one-year trading period. The model is constructed so that it gives a good prediction of stock prices up to 30 days ahead. The companies used in this modeling are all chosen from the Boston Stock Market. Genetic Programming (GP) is used to produce the predictive mathematical model. The interactions among companies and the effect imposed by each of the five agents on future stock prices are also considered in our modeling; namely, we have chosen eight companies such that there is some kind of interrelation among them. Comparison of the GP models with Artificial Neural Networks (ANN) and Neuro-Fuzzy Networks (trained by the LoLiMoT algorithm) shows the superior potential of GP in prediction. Using these models, five players, each with a specific strategy and all with one common goal (wealth maximization), start to trade in a virtual market. We have also relaxed the short-sales constraint in our work. Each of the agents has a different objective function and each seeks to maximize it. We have used Particle Swarm Optimization (PSO) as an evolutionary optimization method for wealth maximization.
R. Rajabioun (B)
School of Electrical and Computer Engineering, Control and Intelligent Processing Center of
Excellence, University of Tehran, Tehran, Iran
E-mail: r.rajabioun@ece.ut.ac.ir
45.1 Introduction
Forecasting changes in market prices and making correct decisions is one of the principal needs of anyone concerned with economic environments. Time series methods are the most common approach to price prediction [1–3], but their predominant defect is that they use only the history of a single company's price to make a prediction. Recently, there has been growing attention to models that account for the interactions among companies and to the use of game theory [4–6] in decision making, because such models are more realistic. Because of the complexity of the mutual effects of companies on one another, methods such as Artificial Neural Networks (ANN), Neuro-Fuzzy Networks and State Space (SS) models are used more often for stock price modeling. In references [7–10] neural networks are used to model the stock market and make predictions. In reference [8], a genetic algorithm (GA) is incorporated to improve the learning and generalizability of ANNs for stock market prediction. In reference [11] the differences between the price and the moving average, and the highest and lowest prices, are used as inputs for one-day-ahead price prediction. Moreover, the volume of transactions, market indicators and macroeconomic data are also considered as input variables [12]. There are also some studies on the fluctuations and correlations in stock price changes carried out in the physics community, using concepts and methods from physics [13, 14]. In reference [15] a neuro-genetic stock prediction system is introduced, which uses a genetic algorithm to select a set of informative input features for a recurrent neural network. In references [16, 17], neuro-genetic hybrids for stock prediction are proposed, in which the genetic algorithm is used to optimize the weights of the ANN.
Producing the right buy/sell signals is also important for those who trade in stock markets. In reference [18], two simple and popular trading rules, moving average and trading range breakout, are tested in the Chilean stock market. Their results were compared with the buy-and-hold strategy, and both trading rules produced extra returns compared to buy-and-hold.
Genetic Programming (GP) is a symbolic optimization technique developed by Koza [19]. It is an evolutionary computational technique based on the so-called "tree representation". This representation is extremely flexible, because trees can represent computer programs, mathematical equations, or complete models of process systems [20]. In reference [21] GP is used to produce a one-day-ahead model to predict stock prices; the model was tested over fifty consecutive trading days for six stocks and yielded relatively high returns on investment.
In this paper, we use GP to find the best mathematical models for the four companies' stock prices under study. Our GP models are able to predict these stock prices up to 30 days ahead with acceptable prediction errors. Because GP is a well-known algorithm, we will not present it in detail; reference [22] provides a good review of the GP algorithm.
The modeling is done for four companies in the Boston Stock Market [23]. The companies selected for this study are: Advanced Micro Devices (AMD), Ericsson (ERIC), Sony (SNE), Philips (PHG), International Business Machines (IBM), Intel Corporation (INTC), Microsoft (MSFT) and Nokia (NOK). These companies are assumed to have relationships such as competition or cooperation, so their stock prices can affect each other. The letters in parentheses are the symbols with which one can access the price data of each company. We use the price history of these eight companies as inputs to predict the prices of our four target companies: Ericsson (ERIC), International Business Machines (IBM), Sony (SNE) and Philips (PHG). The precision of the four obtained models is compared with two traditional methods: (1) a Multi Layer Perceptron (MLP) and (2) a Neuro-Fuzzy Network trained by the Locally Linear Model Tree (LoLiMoT) method.
After modeling the four companies' stock prices, we create five agents who trade in a virtual market in order to maximize their wealth. These agents (players) buy or sell their in-hand stocks according to their uniquely defined strategies. Each player has a unique objective function, and the buy/sell actions of each player are chosen so as to maximize its objective function in each trading period. The maximization is done using the Particle Swarm Optimization (PSO) method [24].
The rest of the paper is organized as follows: in Section 45.2, modeling and prediction are discussed; Section 45.3 describes the virtual stock market and its constraints and assumptions; Section 45.4 presents the results of our simulations; and Section 45.5 concludes the paper.
As stated earlier, our primary goal is to obtain a predictive model that is able to predict future stock prices precisely. The companies whose stocks we are going to predict are: Ericsson (ERIC), International Business Machines (IBM), Sony (SNE) and Philips (PHG). We presume that these companies have some kind of interrelation with four other companies: Advanced Micro Devices (AMD), Intel Corporation (INTC), Microsoft (MSFT) and Nokia (NOK). We therefore downloaded these eight companies' price data from the Boston Stock Market [23]. The downloaded data include information such as the daily opening price, daily closing price, daily highest price, daily lowest price and exchange volume. In this paper, we predict the average of the daily opening and closing prices. Our data set contains sampled price data for the interval 2001/07/08 to 2006/03/11. The criterion used to evaluate the models is the Normalized Mean Square Error (NMSE), which is defined as follows:
\mathrm{NMSE} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} y_i^2} \qquad (45.1)
where y_i and \hat{y}_i are the original and predicted price values, respectively. Figure 45.1
shows the NMSE values for the training data set (the first 70%) using GP, ANN
(MLP) and Neuro-Fuzzy networks (trained using LoLiMoT). Figure 45.2 depicts
this comparison for the test data set (the last 30%).
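For concreteness, a minimal sketch of the NMSE computation of Eq. (45.1) is given below; the array names and the NumPy dependency are our own illustrative choices, not part of the original implementation.

```python
import numpy as np

def nmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Normalized Mean Square Error as in Eq. (45.1)."""
    residual = y_true - y_pred
    return float(np.sum(residual ** 2) / np.sum(y_true ** 2))

# Example: a perfect prediction gives NMSE = 0.
prices = np.array([10.0, 10.5, 11.0])
print(nmse(prices, prices))  # 0.0
```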
Fig. 45.1 The prediction error (NMSE) for all companies (using train data)
Fig. 45.2 The prediction error (NMSE) for all companies (using test data)
The GP-based stock price models were initialized with sets of functions and terminals.
The terminals included random number generators together with the integers
from 1 to 16. The function set included {+, −, ×, ÷, log x, e^x, x^y}. The population
sizes were set to 600, except for ERIC, for which it was set to 400, and the number
of iterations was set to 80. As can be seen from Figs. 45.1 and 45.2, the GP-based
price model prediction errors are acceptable for the training data set and lower than
those of both the MLP and Neuro-Fuzzy models for the test data set. The only drawback
of the GP algorithm is its time-consuming modeling, which is acceptable given its
modeling precision.
Until now we have modeled the interactions of eight different companies that
affect future price values. However, because buyers and sellers also affect the
companies' future stock prices, it is essential to include such interactions in the
modeling. Therefore, we augment our price models with a new term in order to include
the effects of the market players' actions (buy/sell weights) on future price changes.
Since little data are available on how the buy/sell volumes of the market players
affect future prices, we decided to add a new term to represent these effects in our
price prediction models as follows:
where:
γ: a weighting coefficient that regulates the impact of the augmented term on
the models. When γ is large, the augmented term makes the model deviate from
the trend of the market's historical time-series data.
W: a weight vector whose elements express the impact of each company's stock trades
on future prices. The elements of this vector are between 0 and 1.
a: the action vector of all players. Its elements are between -1 and 1 and express
the buy/sell rates of the stocks.
Price vector: contains the current stock price values in the market.
The best value for the γ factor was found to be 0.1. The W vector was chosen as
W = [0.1 0.05 0.1 0.2 0.2 0.2 0.05 0.1], with the corresponding company symbol
vector (AMD ERIC IBM INTC MSFT NOK PHG SNE).
The augmented term makes it possible to see the effect of each player's market
decision on the stock prices and on the other players' wealth (similar to a
non-cooperative game).
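Since the exact form of the augmented term is not reproduced here, the sketch below only illustrates one plausible reading of it, assuming the shift γ(W ∘ a) applied element-wise to the current prices is added to the GP forecast; the function name, the aggregation of the players' actions per company, and the placeholder values are all our own assumptions.

```python
import numpy as np

def augmented_price(gp_prediction, current_price, net_actions, W, gamma=0.1):
    """One plausible augmented-term form: the GP forecast is shifted by
    gamma * (W * a) * current price, element-wise per company.
    `net_actions` is assumed to aggregate all players' buy/sell rates per company."""
    return gp_prediction + gamma * W * net_actions * current_price

W = np.array([0.1, 0.05, 0.1, 0.2, 0.2, 0.2, 0.05, 0.1])  # AMD ERIC IBM INTC MSFT NOK PHG SNE
net_actions = np.zeros(8)       # -1 (maximum sell) ... +1 (maximum buy) per company
price = np.full(8, 20.0)        # hypothetical current prices
forecast = np.full(8, 20.5)     # hypothetical GP forecasts
print(augmented_price(forecast, price, net_actions, W))
```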
Our objective in the next section is to find the best market actions (buy/sell of
each stock) for each player so as to maximize its expected objective function
(wealth) in the market. Our market simulation studies are carried out in a virtual
stock market by means of an evolutionary optimization algorithm (PSO). In our
simulations a standard PSO with inertia was used; Table 45.1 shows the parameters
used in the PSO optimization.
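A minimal sketch of a PSO with inertia of the kind referred to above follows; the parameter values, bounds and toy objective are placeholders of our own choosing, not those of Table 45.1.

```python
import numpy as np

def pso_with_inertia(objective, dim, n_particles=30, iters=100,
                     w=0.7, c1=1.5, c2=1.5, bounds=(-1.0, 1.0), seed=0):
    """Maximize `objective` with a basic inertia-weight PSO."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))          # positions (candidate actions)
    v = np.zeros_like(x)                                  # velocities
    pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[np.argmax(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([objective(p) for p in x])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmax(pbest_val)].copy()
    return gbest, pbest_val.max()

# Toy usage: best daily buy/sell weights for 4 stocks under a dummy objective.
best, val = pso_with_inertia(lambda a: -np.sum((a - 0.3) ** 2), dim=4)
```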
We assume five players (agents) in the stock market, who hold no stocks at the
beginning of the market. Each has US$5,000,000 and intends to buy stocks so as to
maximize its expected wealth in the market. The players are free to buy and sell
stocks in each round of the market. In our virtual stock market, 1,000,000 stocks
are assumed to be available from each company (4,000,000 stocks in total). The only
limitation imposed by the market is the maximum number of stocks each player can buy
or sell per day; this buy/sell volume is limited to 1,000 stocks per day for each
company. This constraint is essential because, without it, all the stocks might be
bought at the beginning of the trading period by one of the players, leaving no
chance for the other players to buy any. Through the augmented term added to the
stock price models we can see the effect of each agent's actions (buying/selling
stocks) on future prices and on the other agents' wealth in the market.
We assume that the five players (agents) have different objective functions and
different strategies in the market, but that all of them have symmetric access to
the stock price models developed in Section 45.2.
The players’ strategies are as follows:
Strategy of player 1:
This player buys (sells) the maximum number of allowed stocks when the prediction
shows an increase (decrease) in the next 30 days' prices compared to the average
prices of the last 10 days.
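The rule can be stated compactly as below; the data layout (NumPy arrays of predicted and past prices) and the function name are our own illustration of the stated strategy.

```python
import numpy as np

def player1_action(predicted_next_30, last_10_prices, max_volume=1000):
    """+max_volume means buy the daily maximum, -max_volume means sell it."""
    if np.mean(predicted_next_30) > np.mean(last_10_prices):
        return max_volume
    return -max_volume
```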
Strategy of player 2:
This player uses Mean-Variance Analysis (MVA). He chooses the standard deviation
of the expected return (r_p) as the measure of risk (σ_p). Each day he plots the
opportunity set (efficient frontier) for a four-asset portfolio and takes an average
risk for an average return. A sample opportunity set for a four-asset portfolio is
shown in Fig. 45.3.
Strategy of player 3:
This player believes in the Random Walk Theory. He believes that stock prices are
unpredictable and therefore buys and sells stocks randomly.
Fig. 45.4 The working regions of players 2, 4 and 5 on the risk-return efficient frontier (the red
points can be selected in each trading day)
Strategy of player 4:
This player acts just like player 2; the only difference is his risk-averse behavior.
To reach the minimum risk, at each stage he selects the buy/sell weights at the knee
of the efficient frontier curve.
Strategy of player 5:
This player also acts like player 2, with the difference that he is a risk lover.
Therefore, at each stage this player selects the buy/sell weights with the maximum
risk and maximum expected return.
The working regions of players 2, 4 and 5 on the risk-return efficient frontier are
shown in Fig. 45.4.
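A minimal sketch of how such a four-asset risk-return opportunity set could be traced with random portfolio weights is shown below; the synthetic return data and the whole construction are only illustrative of the MVA step used by players 2, 4 and 5, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)
returns = rng.normal(0.001, 0.02, size=(250, 4))      # synthetic daily returns, 4 assets
mu, cov = returns.mean(axis=0), np.cov(returns.T)

weights = rng.dirichlet(np.ones(4), size=5000)        # random long-only portfolios
port_mean = weights @ mu
port_risk = np.sqrt(np.einsum('ij,jk,ik->i', weights, cov, weights))

# Player 4 would pick a lowest-risk point, player 2 an average one, and
# player 5 the highest-risk / highest-return one on the frontier.
print(port_risk.min(), port_risk.mean(), port_risk.max())
```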
These five players buy and sell in the virtual stock market. In the related literature,
short sales are usually disregarded when optimizing the players' objective functions,
and the optimization is done through stock purchases only. In this paper, however, we
have relaxed this constraint and allow the players to both buy and sell their stocks
when needed.
In the following, we define the objective functions for all players and describe their
optimization. For players 2, 4 and 5, whose decisions depend on risk, the objective
function trades off the expected return (E(r_p)) against the risk (σ_p) of player i;
when the risk-weighting parameter equals 1, the risk term disappears from the objective
function of player i: E_i = E(r_{p_i}). In our market simulation studies, we chose the
parameter values {0, 0.5, 1} for players {2, 4, 5}, respectively, according to their
defined risk behaviors in the market. The players' objective functions were optimized
with respect to their decision variables (stock sell/buy actions) using the Particle
Swarm Optimization method, and the results are presented and analyzed in the following
section.
The market simulation results for the five players are presented and analyzed in
this section. Figures 45.5 and 45.6 show the optimal buy/sell actions of players 1
and 5 for each company's stock (ERIC, IBM, PHG and SNE). The optimal buy/sell
actions of players 2, 3 and 4 are shown in the Appendix. In these figures, buy
actions are positive, sell actions are negative and no-actions are zero.
If the buy action equals +1 (or -1), the player should buy (or sell) the maximum
number of stocks allowed for that company. In Fig. 45.7, the wealth of each
player is shown over the 1-year trading period. The wealth is measured as the value
of the in-hand stocks plus the in-hand cash of each player.
As can be seen from Fig. 45.7, players 1 and 5 have done better than the rest in
terms of wealth maximization over 1 year of stock trading. Figure 45.8 shows the
expected risk values on each trading day for each player. As we expected, player 1
has the minimum expected risk over the trading period and has also obtained the
maximum return from the market.
Fig. 45.5 The optimal trading actions of player 1 for all companies’ stocks
Fig. 45.6 The optimal trading actions of player 5 for all companies’ stocks
Fig. 45.7 The wealth of all players for all days of the trading year
Player 1's strategy was to buy/sell the maximum number of stocks according to a
comparison of the predicted future-price trend with the 10-day moving average of
past prices. Since the GP prediction models had small prediction errors on the test
data, this player did well in the market by relying on the prediction results.
Among players 2, 4 and 5 (referring to Fig. 45.7), we can see that player 5, with
the maximum risk level, has made the most wealth (expected returns) and ranks second
(after player 1) in terms of market returns. Player 3's strategy was to buy and sell
randomly; referring to Figs. 45.7 and 45.8, one can see that his expected returns
are similar to those of player 2 (see Figs. 45.9, 45.10 and 45.11 in the Appendix),
but his expected risk values are higher than those of the other players.
Fig. 45.8 The expected risk values for each player over the 1 trading year (P1 to P5 from top to
bottom graphs)
45.5 Conclusion
In this paper, precise predictive price models were obtained for four companies'
stocks using GP. These models incorporated the effects of the players' actions on
the stock prices and on the other players' wealth. After the GP models were verified
(using the test data), they were used for making sell/buy decisions by five agents
trading in a virtual stock market. The trading period was set to 1 year for our market
simulation studies. Five different strategies and objective functions were defined
for the market trading agents (with different risk attitudes). The PSO algorithm was
used to obtain the optimal buy/sell actions of each player in order to maximize their
objective functions (expected returns). The players' strategies and their expected
risk-returns were obtained and analyzed for the 1-year trading period. Our market
simulation studies showed that player 1 (P1) was the most successful one in our
virtual stock market.
Appendix
Fig. 45.9 The optimal trading actions of player 2 for all companies’ stocks
Fig. 45.10 The optimal trading actions of player 3 for all companies’ stocks
Fig. 45.11 The optimal trading actions of player 4 for all companies’ stocks
References
1. R. Thalheimer and M. M. Ali, Time series analysis and portfolio selection: An application to
mutual savings banks, Southern Economic Journal, 45(3), 821–837 (1979).
2. M. Pojarliev and W. Polasek, Applying multivariate time series forecasts for active portfolio
management, Financial Markets and Portfolio Management, 15, 201–211 (2001).
3. D. S. Poskitt and A. R. Tremayne, Determining a portfolio of linear time series models,
Biometrika, 74(1), 125–137 (1987).
4. T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory. (San Diego, CA:
Academic, 1995).
5. J. W. Weibull, Evolutionary Game Theory. (London: MIT Press, 1995).
6. D. Fudenberg and J. Tirole, Game Theory. (Cambridge, MA: MIT Press, 1991).
7. S. Lakshminarayanan, An Integrated stock market forecasting model using neural network,
M. Sc. dissertation, College of Engineering and Technology of Ohio University (2005).
8. K. J. Kim and W. B. Lee, Stock market prediction using artificial neural networks with optimal
feature transformation, Journal of Neural Computing & Applications, Springer, 13(3), 255–
260 (2004).
9. K. Schierholt and C. H. Dagli, Stock market prediction using different neural network classi-
fication architectures, Computational Intelligence for Financial Engineering, Proceedings of
the IEEEIAFE Conference, pp. 72–78 (1996).
10. K. Nygren, Stock Prediction – A Neural Network Approach, M.Sc. dissertation, Royal
Institute of Technology, KTH (2004).
11. E. P. K. Tsang, J. Li, and J. M. Butler, EDDIE beats the bookies, International Journal of
Software, Practice and Experience, 28(10), 1033–1043 (1998).
12. D. S. Barr and G. Mani, Using neural nets to manage investments, AI EXPERT, 34(3), 16–22
(1994).
13. R. N. Mantegna and H. E. Stanley, An Introduction to Econophysics: Correlation and
Complexity in Finance (Cambridge University Press, Cambridge, MA, 2000).
14. J. P. Bouchaud and M. Potters, Theory of Financial Risks: From Statistical Physics to Risk
management (Cambridge University Press, Cambridge, MA, 2000).
15. Y. K. Kwon, S. S. Choi, and B. R. Moon, Stock Prediction Based on Financial Correlation, In
Proceedings of GECCO’2005, pp. 2061–2066 (2005).
16. Y. K. Kwon and B. R. Moon, Daily stock prediction using neuro-genetic hybrids, Proceedings
of the Genetic and Evolutionary Computation Conference, pp. 2203–2214 (2003).
17. Y. K. Kwon and B. R. Moon, Evolutionary ensemble for stock prediction, Proceedings of the
Genetic and Evolutionary Computation Conference, pp. 1102–1113 (2004).
18. F. Parisi and A. Vasquez, Simple technical trading rules of stock returns: evidence from 1987
to 1998 in Chile, Emerging Market Review, 1, 152–164 (2000).
19. J. Koza, Genetic Programming: On the Programming of Computers by Means of Natural
Evolution. (Cambridge, MA: MIT Press, 1992).
20. J. Madar, J. Abonyi, and F. Szeifert, Genetic Programming for the identification of nonlinear
input-output models, Industrial Engineering Chemistry Research, 44, 3178–3186 (2005).
21. M. A. Kaboudan, Genetic Programming prediction of stock prices, Computational Economics,
16(3), 207–236 (2000).
22. S. Sette and L. Boullart, Genetic Programming: principles and applications, Engineering
Applications of Artificial Intelligence, 14, 727–736 (2001).
23. Boston Stock Group web page (August 1, 2007); http://boston.stockgroup.com
24. J. Kennedy and R. Eberhart, Particle Swarm Optimization, Proceedings of the IEEE Interna-
tional Conference on Neural Networks, Perth, Australia, pp. 1942–1945 (1995).
Chapter 46
A Regularized Unconstrained Optimization
in the Bond Portfolio Valuation and Hedging
46.1 Introduction
This paper presents a numerical approach to the valuation and hedging of a portfolio
of bonds and options in the case of strong dependency of the bond principal on the
market interest rate metric. Collateralized Mortgage Obligations (CMOs) represent one
of the important classes of such bonds. CMOs can have a high degree of variability
in their cash flows. Because of this, it is generally recognized that a yield-to-maturity or
Y. Gryazin (B)
Department of Mathematics, Idaho State University, Pocatello, ID 83209, USA,
E-mail: gryazin@isu.edu
The work of this author was supported in part by Mexican Consejo Nacional de Ciencia y
Tecnologia (CONACYT) under Grant # CB-2005-C01-49854-F.
static-spread calculation is not a suitable valuation methodology. Since the 1980s,
Option-Adjusted Spread (OAS) has become a ubiquitous valuation metric in the
CMO market. There have been many criticisms of the OAS methodology, and some
interesting modifications have focused on the prepayment side of the analysis, e.g.,
[1, 2]. One of the problems with OAS analysis is the lack of information about
the distribution of the individual spreads, which in turn leads to difficulties in
constructing a hedging portfolio for a CMO.
To improve the CMO valuation methodology and to develop a robust procedure
for constructing the optimal hedge for a CMO, we introduce an optimization approach
that minimizes the dispersion of the portfolio spread distribution by using options
available on the market. In doing so, we design an optimal hedge with respect to the
set of available benchmarks and obtain two new valuation metrics that represent the
quality of the hedge with respect to this set. Our two main outputs are the mean and
the standard deviation of the individual spreads of the optimal portfolio. These
metrics can be used in comparative analysis in addition to the standard OAS valuation.
This new methodology can lead to quite different conclusions about a CMO than
OAS does. In particular, in comparing two bonds, the more negatively convex bond
may look cheaper on an OAS basis but be more attractive according to our analysis,
since we provide a way to estimate the quality of the available hedge.
The main difficulty in implementing our new methodology is the minimization of
the spread-variance functional. The difficulty arises partly because the optimization
problem is ill-conditioned and, in many situations, requires the introduction of some
type of regularization. Our approach is to use standard Tikhonov regularization [3],
which has the strong intuitive appeal of limiting the sizes of the benchmark weights
used in hedging.
A word about static versus dynamic hedging may be in order. Our methodology
is to set up a static hedge and conduct bond valuation based on the static optimal
portfolio. It may be argued that an essentially perfect hedge may be created
dynamically. However, a dynamically hedged portfolio will not have the OAS as a
spread. Moreover, one's hedging costs will be related to the amount of future
volatility and so can be quite uncertain. Our methodology greatly reduces dynamic
hedging costs by setting up an optimized static hedge, and thus reduces the
uncertainty in dynamic hedging costs.
The rest of the paper is organized as follows. In the second section we briefly
review OAS and point out in more detail the problems we see with it. In the third
section we explicitly define our hedging methodology. In the fourth section we give a
brief description of the regularized numerical method. In the fifth section we present
some details on the term-structure model used and on our prepayment assumptions, and
summarize the numerical results of our analysis.
Standard OAS analysis is based on finding a spread at which the expected value
of the discounted cash flows will be equal to the market value of a CMO. This is
encapsulated in Eq. (46.1):
MV_{CMO} = \tilde{E}\!\left[\sum_{i=1}^{L} d(t_i, s)\, c(t_i)\right]. \qquad (46.1)
Here MV_{CMO} is the market value of the CMO, \tilde{E} denotes expectation with respect to
the risk-neutral measure, d(t_i, s) is the discount factor to time t_i with a spread of
s, and c(t_i) is the cash flow at time t_i, for i = 1, 2, ..., L. Note that in the OAS
framework a single spread term is added to the discount factors to make this formula
a true equality; this spread, s, is referred to as the OAS.
The goal of OAS analysis is to value a CMO relative to liquid benchmark
interest rate derivatives, and thus the risk-neutral measure is derived so as to price
those benchmarks accurately. To calculate expected values in practice, one uses
Monte-Carlo simulation of a stochastic term-structure model. We use the two-factor
Gaussian short-rate model G2++ as in [4], calibrated to U.S. swaption prices, but
other choices may be suitable. In terms of numerical approximation, this gives us
Eq. (46.2):
MV_{bench}^{k} = \frac{1}{N}\sum_{n=1}^{N}\left\{\sum_{t=1}^{L} cf_{bench}^{k}(n,t)\prod_{i=1}^{t}\frac{1}{1+\Delta t_i\, r(n,t_i)}\right\} + Err^{k}. \qquad (46.2)
Here cf_{bench}^{k}(n, t), k = 1, ..., M, are the future cash flows of the benchmarks, N is the
number of generated trajectories, L is the number of time intervals until expiration
of all benchmarks and the CMO, and M is the number of benchmarks under consideration.
The last term, Err^{k}, represents the error term. Although a detailed treatment of the
calibration procedure is outside the scope of this presentation, it is worth mentioning
that the absolute value of the Err^{k} term is bounded in most of our experiments by
five basis points.
The second step in the OAS analysis of CMOs is to find the spread term from
Eq. (46.3):
MV_{CMO} = \frac{1}{N}\sum_{n=1}^{N}\left\{\sum_{i=1}^{L} cf(n,t_i)\prod_{j=1}^{i}\frac{1}{1+\Delta t_j\,(r(n,t_j)+s)}\right\}. \qquad (46.3)
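To make the second step concrete, the sketch below solves Eq. (46.3) for the spread s by one-dimensional root finding over simulated paths; the synthetic rates, placeholder cash flows and market value, and the use of scipy.optimize.brentq are our own illustrative choices rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
N, L, dt = 500, 120, 1.0 / 12.0                      # paths, monthly steps, year fractions
rates = 0.04 + 0.01 * rng.standard_normal((N, L))    # placeholder simulated short rates
cashflows = np.full((N, L), 100.0)                   # placeholder CMO cash flows per path

def pv_given_spread(s):
    """Average discounted value of the cash flows for a trial spread s, cf. Eq. (46.3)."""
    discount = np.cumprod(1.0 / (1.0 + dt * (rates + s)), axis=1)
    return np.mean(np.sum(cashflows * discount, axis=1))

market_value = 11000.0                               # placeholder market value of the CMO
oas = brentq(lambda s: pv_given_spread(s) - market_value, -0.05, 0.20)
print(f"OAS = {1e4 * oas:.1f} bp")
```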
We will see in more detail later how consideration of the OAS of various portfolios
makes some serious drawbacks of OAS analysis apparent. Some of the drawbacks are:
1. It matches a mean to the market value, but provides no information on the
distribution of the individual discounted values.
2. One can use it to calculate some standard risk metrics, but it gives no way to
design a refined hedge.
3. It is sensitive to the positions in the benchmarks.
In our approach an effective hedging strategy is proposed and new valuation metrics
are considered.
In our proposed approach, instead of using the same spread for all paths, we look at
the individual spreads for every path of the portfolio of the CMO and benchmarks. The
dispersion of this distribution can be considered a measure of risk or, alternately,
a measure of the quality of a hedge. In a classical mean-variance portfolio
optimization problem (see, e.g., [5]) two formulations are usually considered: (1)
maximizing the mean of the portfolio return under a given upper bound on the variance;
(2) minimizing the variance of the portfolio return given a lower bound on the
expected portfolio return. We approach the problem of hedging CMOs with swaptions
from a similar point of view, but focus on the distribution of spreads. The goal is
to apply an optimization procedure that minimizes the dispersion of the individual
spreads of the combined portfolio of the CMO and the benchmarks, whose value along
each trajectory n satisfies Eq. (46.5):
MV_{CMO} + \sum_{k=1}^{M} w_k MV_{bench}^{k} = \sum_{i=1}^{L}\left[cf(n,t_i) + \sum_{k=1}^{M} w_k\, cf_{bench}^{k}(n,t_i)\right]\prod_{j=1}^{i}\frac{1}{1+\Delta t_j\,\bigl(r(n,t_j)+s(n,w_1,\ldots,w_M)\bigr)}. \qquad (46.5)
As mentioned before, for every interest rate trajectory the individual spread depends
on the weights of the benchmarks in the portfolio. The target functional is then
defined in Eq. (46.6):
f(w_1,\ldots,w_M) = \frac{1}{N}\sum_{n=1}^{N} s(n,w_1,\ldots,w_M)^2 - \mu^2. \qquad (46.6)
Here \mu = \frac{1}{N}\sum_{n=1}^{N} s(n,w_1,\ldots,w_M). The Jacobian and Hessian of the functional
are given by Eqs. (46.7) and (46.8):
\frac{\partial f}{\partial w_i} = \frac{2}{N}\sum_{n=1}^{N}\bigl(s(n,w_1,\ldots,w_M)-\mu\bigr)\frac{\partial s(n,w_1,\ldots,w_M)}{\partial w_i}, \quad i = 1,\ldots,M, \qquad (46.7)

\frac{\partial^2 f}{\partial w_i \partial w_l} = \frac{2}{N}\sum_{n=1}^{N}\left[\frac{\partial s}{\partial w_i}\frac{\partial s}{\partial w_l} + (s-\mu)\frac{\partial^2 s}{\partial w_i \partial w_l}\right] - 2\left\{\frac{1}{N}\sum_{n=1}^{N}\frac{\partial s}{\partial w_i}\right\}\left\{\frac{1}{N}\sum_{n=1}^{N}\frac{\partial s}{\partial w_l}\right\}, \quad i,l = 1,\ldots,M. \qquad (46.8)
Using implicit differentiation one can find \partial s/\partial w_i and \partial^2 s/(\partial w_i \partial w_l), i, l = 1, ..., M.
This functional, however, is in general ill-conditioned. To ensure the convergence of
the optimization method, we introduce standard Tikhonov regularization. The modified
target functional could be presented as in Eq. (46.9); in the standard Tikhonov form
it amounts to adding a penalty term \alpha \sum_{k=1}^{M} w_k^2 to f.
In most situations this guarantees the convergence of the numerical method to a
unique solution. In our approach we use an optimization technique based on the
combination of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method (see, e.g., [6])
and Newton's method. The BFGS approach is applied in the early iterations to ensure
the convergence of Newton's method to a minimum. When the l_2 norm of the gradient
of the target function becomes less than 10^{-10}, we apply Newton's method, assuming
that the approximation is already close enough to the solution so that the quadratic
convergence rate of Newton's method can be achieved.
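A minimal sketch of this regularized spread-variance minimization is given below, assuming the per-path spread s(n, w) is available as a callable and using SciPy's generic BFGS routine rather than the authors' BFGS/Newton combination; the toy spread model and all names are our own.

```python
import numpy as np
from scipy.optimize import minimize

def spread_variance(weights, path_spread, n_paths, alpha=1e-9):
    """Tikhonov-regularized variance of the per-path spreads, cf. Eqs. (46.6) and (46.9)."""
    s = np.array([path_spread(n, weights) for n in range(n_paths)])
    return np.var(s) + alpha * np.dot(weights, weights)

# Toy per-path spread: a placeholder standing in for the implicit s(n, w) of Eq. (46.5).
rng = np.random.default_rng(0)
base = 0.01 + 0.002 * rng.standard_normal(200)
sensitivity = rng.standard_normal((200, 3))
toy_spread = lambda n, w: base[n] - sensitivity[n] @ w * 1e-3

result = minimize(spread_variance, x0=np.zeros(3),
                  args=(toy_spread, 200), method="BFGS")
print(result.x, np.sqrt(spread_variance(result.x, toy_spread, 200)))
```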
As already mentioned, the regularization keeps the value of the target functional
small and stabilizes the optimization method by preventing the weights of the
benchmarks in the portfolio, w_i, i = 1, ..., M, from becoming too large. To keep the
condition number of the Hessian bounded, one has to use a fairly large regularization
parameter α. On the other hand, in order to keep the regularized problem reasonably
close to the original one, we expect α to be small. In addition to these purely
mathematical requirements, in the case of managing a portfolio one has to take the
cost of hedging into consideration. Since the regularization term prevents the optimal
weights from drifting to infinity, the regularization parameter becomes a desirable
tool for keeping the hedging cost under control. In our experiments we found that the
parameter α = 10^{-9} represents a good choice.
The l_2 norm of the difference vector between the model swaption prices and the
market prices in our experiments is less than five basis points.
A key component in valuing mortgage bonds is the calculation of the principal
balance. This calculation is based on normal amortization and prepayment. In order to
calculate amortization rates, we assume that the loans backing the bond have some
specific parameters. Those parameters can take a wide range of values, but here we
assume the following: the loan age (WALA) is 60 months, the interest rate on the
loans (WAC) is 6%, the coupon on the bond is 5.5%, and the bond has a price of
$22.519. We use a very simple prepayment model defined by linear interpolation of
the data in Table 46.1.
For the optimization we use a basket of 64 payer and receiver European swaptions
with yearly expiration dates from years 1 to 16, including at-the-money (ATM) payers
and receivers, payers 50 basis points out of the money and receivers 50 basis points
out of the money. The regularization parameter used is 10^{-9}, and the number of
trajectories in all experiments was 500. The spread-variance minimization (SVM)
numerical method was implemented in Matlab using the standard BFGS implementation
available in the Matlab optimization toolbox. The computing time for the Matlab
optimization routine was 42 s on a standard PC with a 2 GHz processor.
Figure 46.1 shows the convergence history of the iterations of the optimization
method, plotting the log of the norm of the gradient.
Figure 46.2 shows the square root of the target functional, which is approximately
the standard deviation of the spread distribution. One can see that, as a result of
applying the new methodology, the standard deviation of the spread distribution is
reduced. The experiments used subsets of the swaption basket with expiration dates
1–16; in all of these experiments we used one unit of the trust IO with a cost of
$22.519.
As we can see from the first line of the table, the cost of the hedge is an increasing
function of the number of options. But most importantly, we manage to significantly
decrease the standard deviation of the spread distribution by using the
spread-variance minimization methodology.
Notice that, when using just two options, the cost of our hedging portfolio is
essentially zero: we are buying the ATM receiver swaption and selling an equal
amount of the ATM payer swaption. This is effectively entering into a forward rate
agreement, and so this case can be considered a hedging strategy that offsets
the duration of the CMO. The experiments with the larger sets of swaptions refine
this strategy and take into account more detail about the structure of the cash flows
of the bond. It proves to be a very successful approach to hedging the path-dependent
bond.
The last two rows of the table present two different metrics: the mean spread and
the OAS of the portfolio of the bond and hedges. We can see that they become close as
the standard deviation decreases; in fact, they would be the same if the standard
deviation were zero. With no swaptions, or with very few of them, these metrics can
lead to drastically different conclusions about the cheapness of the bond. Contrary
to a common interpretation of OAS, it does not represent the mean spread of an
unhedged bond. In fact, it can be expected to be close to the mean spread only when
46.6 Conclusions
References
1. Cohler, G., Feldman, M. & Lancaster, B., Price of Risk Constant (PORC): Going Beyond OAS.
The Journal of Fixed Income, 6(4), 1997, 6–15.
2. Levin, A. & Davidson, A., Prepayment Risk and Option-Adjusted Valuation of MBS. The
Journal of Portfolio Management, 31(4), 2005.
3. Tikhonov, A. N., Regularization of incorrectly posed problems. Soviet Mathematics Doklady,
4, 1963, 1624–1627.
4. Brigo, D. & Mercurio, F., Interest Rate Models: Theory and Practice. Berlin Heidelberg,
Springer-Verlag, 2001, pp. 132–165.
5. Korn, R. & Korn, E., Option Pricing and Portfolio Optimization. AMS, Providence, Rhode
Island, V. 31, 2000.
6. Vogel, C. R., Computational methods for inverse problems. SIAM, Philadelphia, PA, 2002.
Chapter 47
Approximation of Pareto Set in Multi Objective
Portfolio Optimization
47.1 Introduction
Many computational finance problems, ranging from asset allocation to risk management
and from option pricing to model calibration, can be solved efficiently using
modern optimization techniques. The question of optimal portfolio allocation has
been of long-standing interest for academics and practitioners in finance. In the
1950s Harry Markowitz published his pioneering work, in which he proposed a simple
quadratic program for selecting a diversified portfolio of securities [1]. His model
for portfolio selection can be formulated mathematically either as a problem of
maximization of expected return where risk, defined as variance of return, is (upper)
bounded, or as a problem of minimization of risk subject to a lower bound on the
expected return.
I. Radziukyniene (B)
Department of Informatics, Vytautas Magnus University, 8 Vileikos str.,
Kaunas LT 44404, Lithuania,
E-mail: i.radziukyniene@if.vdu.lt
Risk plays an important role in modern finance, including risk management, capital
asset pricing and portfolio optimization. The problem of portfolio selection can be
formulated as the problem of finding an optimal strategy for allocating wealth among
a number of securities (investments) so as to obtain an optimal risk-return trade-off.
The portfolio optimization problem may be formulated in various ways depending on
the selection of the objective functions, the definition of the decision variables,
and the particular constraints underlying the specific situation [7–10]. Beyond the
expected return and variance of return, as in the Markowitz portfolio model [1],
additional objective functions can include the number of securities in a portfolio,
turnover, amount of short selling, dividend, liquidity, excess return over a
benchmark random variable, and others [7]. In bank portfolio management, additional
criteria such as the prime rate, processing cost, expected default rate, probability
of unexpected losses, and the quantity of long-term and short-term loans can be
considered [8]. For example, the multi-objective portfolio selection problem can
include the following objectives [9]: (to be maximized) portfolio return, dividend,
growth in sales, liquidity, portfolio return over that of a benchmark, and (to be
minimized) deviations from asset allocation percentages, number of securities in the
portfolio, turnover (i.e., costs of adjustment), maximum investment proportion
weight, and amount of short selling. We considered two multi-objective portfolio
problems. The first problem was based on a simple two-objective portfolio model
comprising the standard deviation of the returns and the mean of the returns, where
the return R_i is the 1-month return of stock i; return means the percentage change
in value. The second problem includes three objectives, with the annual dividend
yield added to the two objectives mentioned above. For the experiments we used a
data set of ten Lithuanian companies' stock data from the Lithuanian market.
Financial portfolio selection is a real-world decision-making problem with several,
often conflicting, objectives, and it can be reduced to multi-objective optimization.
There are many methods for attacking multi-objective optimization problems. In some
cases a solution to a multi-objective problem can be defined as a special point of
the Pareto set by means of scalarization, i.e. by converting the initial
multi-objective problem into a single-criterion problem. However, in many cases a
decision cannot be made without some information about the whole Pareto set. The
theoretical problem of finding the whole Pareto set normally (e.g. in the case of
continuum cardinality of the Pareto set) cannot be solved algorithmically; thus it
should be reformulated as the algorithmic construction of an appropriate
approximation of the Pareto set. In the present paper we consider several algorithms
for Pareto set approximation, including the newly proposed method of adjustable
weights and some evolutionary algorithms well assessed in recent publications. Over
the past decades evolutionary algorithms have received much attention owing to their
intrinsic ability to handle optimization problems with both single and multiple
objectives, including problems of financial optimization [2–7]. Let us note that
financial optimization problems have also been attacked by means of other heuristics,
e.g. by simulated annealing in [3] and by Tabu search in [8]. Comparisons of the
performance of different heuristic techniques applied to single-criterion portfolio
choice problems are given by Chang et al. [3]. However, to the best of the authors'
knowledge, a similar comparative analysis of the performance of recently proposed
evolutionary multi-criteria algorithms has not yet been reported.
The first method that attracted our attention was the widely used scalarization
method of weighted criteria summation. Besides its status as a "classic" of
multi-objective optimization, an important argument for considering the method of
weighted criteria summation is the simplicity of its implementation. Since a
multi-criteria problem is converted into a single-criterion optimization problem
whose objective function is a weighted sum of the criteria functions, the weights
can be used to express the relative significance of the different criteria. For some
multi-objective optimization problems it is convenient to take the preferences of
decision makers into account in this way. However, we do not assume the availability
of information about the importance of the criteria, and for us the weights play the
role of parameters defining points in the Pareto set corresponding to the solutions
of parametric single-criterion problems. The mathematical model of the weighted sum
method takes the form:
mathematical model of the weighted sum method takes the form of:
X
m
min f .x/ ; f .x/ D !i fi .x/ ; (47.1)
i D1
P
where !i is the weight of i th criterion, 0 !i 1; i D 1; :::; m, and m i D1 !i .
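As an illustration of this scalarization applied to the two-objective portfolio problem (mean return to be maximized, standard deviation to be minimized), the sketch below solves Eq. (47.1) for one weight vector with SciPy; the synthetic data and the convention of treating the return objective through its negative are assumptions of ours.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
returns = rng.normal(0.01, 0.05, size=(36, 10))       # synthetic monthly returns, 10 stocks
mu, cov = returns.mean(axis=0), np.cov(returns.T)

def scalarized(x, w1, w2):
    """Weighted sum of the two criteria: portfolio std (minimized) and -mean return."""
    return w1 * np.sqrt(x @ cov @ x) + w2 * (-(mu @ x))

m = 10
cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1.0},)   # fully invested
bnds = [(0.0, 1.0)] * m                                      # long-only portfolio weights
res = minimize(scalarized, x0=np.full(m, 1.0 / m), args=(0.5, 0.5),
               bounds=bnds, constraints=cons, method='SLSQP')
print(res.x.round(3), np.sqrt(res.x @ cov @ res.x), mu @ res.x)
```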
Assume that all f_i(x) are convex functions; then for every point of the Pareto set
there exist weights \omega_i, i = 1, ..., m, such that this Pareto point is a minimizer of
f(x) defined by Eq. (47.1). Violation of the convexity assumption can mean that, for
some subsets of the Pareto set, no weights exist for which the minimizer of the
corresponding composite objective function f(x) lies in those subsets. However, in
portfolio selection problems the objectives f_i(x) are normally defined by rather
simple analytical expressions, and checking the validity of the convexity assumption
is easy. The existence of the correspondence between points in the Pareto set and
minimizers of the parametric minimization problem (Eq. (47.1)) theoretically
justifies further investigation of the properties of this correspondence. In many
cases it is not an easy task to choose the set of weights defining solutions of
Eq. (47.1) that would be well (in some sense uniformly) distributed over the Pareto
set. To generate such a subset of the Pareto set by repeatedly solving Eq. (47.1)
with different weights, the latter should be chosen in a special but a priori
unknown way.
We propose a branch-and-bound type method for iteratively composing a set of
weights implying the desirable distribution of solutions of Eq. (47.1). The feasible
region for the weights,
\left\{(\omega_1, \ldots, \omega_m) : 0 \le \omega_i \le 1,\ i = 1, \ldots, m,\ \sum_{i=1}^{m} \omega_i = 1\right\}, \qquad (47.2)
is a standard simplex. Our idea is to partition it into sub-simplices whose vertices
are mapped to the Pareto set via Eq. (47.1). The sequential partition is controlled
with the aim of generating new sub-simplices such that the mapping of their vertices
yields points uniformly distributed in the Pareto set. The partition procedure is
arranged as the branching of a tree whose nodes correspond to sub-simplices. The
original standard simplex is accepted as the root of the tree. Branching means the
partition of a simplex into two sub-simplices, where the new vertex is the midpoint
of the favorable edge, defined later. Simplicial partitioning procedures have been
applied to construct algorithms for single-criterion global optimization in [11, 12];
this procedure can also be applied in multi-objective optimization provided the
criteria for selecting a simplex for partition are reconsidered.
We aim to subdivide the original simplex into sub-simplices in such a way that the
solutions of Eq. (47.1) corresponding to the vertices of these sub-simplices are well
distributed over the Pareto set. Consider the current set of sub-simplices in the
space of weights, and the points in the criteria space corresponding to the vertices
of these simplices. The eligibility of a simplex for partition is defined by the
longest distance between the points in the criteria space corresponding to the
vertices of the considered simplex. The branching strategy is based on depth-first
search, and the selected simplices are partitioned until the longest distance between
the corresponding points in the criteria space is reduced to a predefined tolerance.
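The sketch below illustrates the branching rule just described, assuming a solver `solve_scalarized(w)` that returns the criteria vector of the minimizer of Eq. (47.1) for weights w; the names and the simple edge-selection rule are our own reading of the description, not the authors' code.

```python
import numpy as np

def adjustable_weights(solve_scalarized, m=2, tol=1e-2):
    """Partition the weight simplex so that images of its vertices cover the Pareto set."""
    vertices = [np.eye(m)[i] for i in range(m)]                 # corners of the standard simplex
    images = [solve_scalarized(v) for v in vertices]            # corresponding criteria points
    stack = [(vertices, images)]                                # depth-first search over the tree
    pareto_points = list(images)
    while stack:
        verts, imgs = stack.pop()
        # favorable edge: the one whose endpoint images are farthest apart in criteria space
        pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
        i, j = max(pairs, key=lambda p: np.linalg.norm(imgs[p[0]] - imgs[p[1]]))
        if np.linalg.norm(imgs[i] - imgs[j]) <= tol:
            continue
        w_new = 0.5 * (verts[i] + verts[j])                      # midpoint of the favorable edge
        img_new = solve_scalarized(w_new)
        pareto_points.append(img_new)
        for k in (i, j):                                         # two child sub-simplices
            child_v = [w_new if idx == k else v for idx, v in enumerate(verts)]
            child_i = [img_new if idx == k else im for idx, im in enumerate(imgs)]
            stack.append((child_v, child_i))
    return np.array(pareto_points)
```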
The method of adjustable weights (AdjW) was implemented for the two- and
three-criteria cases in MATLAB, using fminconstr for solving the minimization problem
(Eq. (47.1)). To illustrate the advantage of the proposed method of adjustable weights
over the standard weighted sum method, both methods were applied to the construction
of the Pareto set of the two-criteria portfolio optimization problem with the data
mentioned above. Figure 47.1 shows the distributions of the Pareto points found by
both methods. In the standard method the weights were changed with a step of 0.05.
Because of space limitations we do not describe in detail the tolerance for branching
in the method of adjustable weights; it was set with the aim of generating a similar
number of points to that generated by the weighting method. In this experiment the
number of points generated by the weighting method was 21, and the number generated
by our method was 26.
Fig. 47.1 Points in Pareto set generated by the standard weighted sum method and by the method
of adjustable weights
Three methods, namely the Fast Pareto Genetic Algorithm (FastPGA) [2], the
Multi-Objective Cellular genetic algorithm (MOCeLL) [3], and the Archive-based hybrid
Scatter Search algorithm (AbYSS) [4], were proposed during the last 2 years. Their
efficiency for various problems has been shown in the original papers, but their
application to the portfolio optimization problem has not yet been explored.
NSGA-II [5], the state-of-the-art evolutionary method, was chosen following many
authors who use it as a standard for comparisons.
FastPGA. Eskandari and Geiger [2] proposed a framework named the fast Pareto
genetic algorithm, which incorporates a new fitness assignment and solution ranking
strategy for multi-objective optimization problems in which each solution evaluation
is relatively computationally expensive. The new ranking strategy is based on the
classification of solutions into two different categories according to dominance. The
fitness of the non-dominated solutions in the first rank is calculated by comparing
each non-dominated solution with one another and assigning a fitness value computed
using the crowding distance. Each dominated solution in the second rank is assigned
a fitness value taking into account the number of both dominating and dominated
solutions. New search operators are introduced to improve the proposed method's
convergence behavior and to reduce the required computational effort. A population
regulation operator is introduced to dynamically adapt the population size as needed,
up to a user-specified maximum population size, which is the size of the set of
non-dominated solutions. FastPGA is capable of saving a significant number of
solution evaluations early in the search and exploits the search space in a more
efficient manner in later generations.
Characteristics of FastPGA: the regulation operator employed in FastPGA improves its
performance in terms of fast convergence, proximity to the Pareto-optimal set, and
maintenance of solution diversity.
MOCeLL. Nebro et al. [3] presented MOCell, a multi-objective method based on the
cellular model of GAs, in which the concept of a small neighborhood is used
intensively, i.e., a population member may only interact with its nearby neighbors
in the breeding loop. MOCell uses an external archive to store the non-dominated
solutions found during the execution of the method; however, the main feature
characterizing MOCell is that a number of solutions are moved back into the
population from the archive, replacing randomly selected existing population members.
This is carried out in the hope of taking advantage of the search experience in order
to find a Pareto set with good convergence and spread.
MOCell starts by creating an empty Pareto set. The Pareto set is just an additional
population (the external archive) composed of a number of the non-dominated solutions
found. Population members are arranged in a two-dimensional toroidal grid, and the
genetic operators are successively applied to them until the termination condition is
met. Hence, for each population member, the method consists of selecting two parents
from its neighbourhood for producing an offspring. An offspring is obtained by
applying recombination and mutation operators. After evaluating the offspring as a
new population member, it is inserted into both the auxiliary population (if it is
not dominated by the current population member) and the Pareto set. Finally, after
each generation, the old population is replaced by the auxiliary one, and a feedback
procedure is invoked to replace a fixed number of randomly chosen members of the
population by solutions from the archive.
In order to manage the insertion of solutions into the Pareto set with the goal of
obtaining a diverse set, a density estimator based on the crowding distance is used.
This measure is also used to remove solutions from the archive when it becomes full.
Characteristics of MOCeLL: the method uses an external archive to store the
non-dominated population members found during the search; the most salient feature of
MOCeLL with respect to other cellular approaches for multi-objective optimization is
the feedback of members from the archive to the population.
AbYSS. This method was introduced by Nebro et al. [4]. It is based on scatter search
using a small population, known as the reference set, whose members are combined to
construct new solutions. Furthermore, these new population members can be improved by
applying a local search method. For the local search the authors propose a simple
(1 + 1) Evolution Strategy based on a mutation operator and a Pareto dominance test.
The reference set is initialized from an initial population composed of disperse
solutions, and it is updated by taking into account the solutions resulting from the
local search improvement. AbYSS combines ideas of three state-of-the-art evolutionary
methods for multi-criteria optimization. On the one hand, an external archive is used
to store the non-dominated solutions found during the search, following the scheme
applied by PAES [5], but using the crowding distance of NSGA-II [6] as a niching
measure instead of the adaptive grid used by PAES; on the other hand, the selection
of solutions from the initial set to build the reference set applies the density
estimation used by SPEA2 [4].
Characteristics of AbYSS: it uses an external archive to store the non-dominated
population members found during the search; salient features of AbYSS are the
feedback of population members from the archive to the initial set in the restart
phase of the scatter search, as well as the combination of two different density
estimators in different parts of the search.
NSGA-II. The evolutionary method for multi-criteria optimization NSGA-II
contains three main operators: non-dominated sorting, density estimation, and
crowded comparison [6]. Starting from a random population, these operators govern an
evolution whose aim is a uniform covering of the Pareto set.
Non-dominated sorting maintains a population of non-dominated members: if a
descendant is dominated, it immediately dies; otherwise it becomes a member of the
population, and all members of the parent generation who are dominated by descendants
die. The density at a particular point is measured as the average distance between
the considered point and the two points representing the neighbouring (left and
right) population members. The crowded comparison operator defines selection for
crossover oriented towards increasing the spread of the current approximation of the
Pareto front. Population members are ranked taking into account "seniority"
(generation number) and the local crowding distance.
The worst-case complexity of the NSGA-II algorithm is O(mN^2), where N is the
population size and m is the number of objectives [6].
Characteristics of NSGA-II: this method has lower computational complexity than its
predecessor NSGA; elitism is maintained; and no sharing parameter needs to be chosen,
because sharing is replaced by the crowded comparison to reduce computations.
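For reference, the sketch below computes the generational distance (GD) and inverted generational distance (IGD) reported in Tables 47.1 and 47.2, using their common textbook definitions (average distance from the approximation to the reference front and vice versa); the exact variants used by the authors may differ, and the toy data are ours.

```python
import numpy as np

def generational_distance(approx, reference):
    """Mean Euclidean distance from each approximation point to its nearest reference point."""
    d = np.linalg.norm(approx[:, None, :] - reference[None, :, :], axis=2)
    return d.min(axis=1).mean()

def inverted_generational_distance(approx, reference):
    """GD with the roles of the two sets swapped."""
    return generational_distance(reference, approx)

front = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])   # toy reference Pareto front
approx = np.array([[0.1, 0.95], [0.6, 0.45]])            # toy approximation found by a method
print(generational_distance(approx, front), inverted_generational_distance(approx, front))
```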
Table 47.1 Performance metrics for the two-objective problem; the performance measures for AdjW
were GD = 0.0, IGD = 2.97e-5, HV = 0.886; F denotes the number of function evaluations
by the evolutionary methods

Method     GD Avg.   GD Std.   IGD Avg.  IGD Std.  HV Avg.   HV Std.
F = 15,000
AbYSS      5.47e-4   4.09e-4   1.86e-3   2.14e-3   8.55e-1   3.31e-2
FastPGA    4.12e-4   2.81e-4   2.28e-3   1.64e-3   8.56e-1   2.38e-2
MOCeLL     3.10e-4   2.31e-4   1.23e-3   1.65e-3   8.69e-1   2.12e-2
NSGAII     4.17e-4   2.75e-4   1.40e-3   1.24e-3   8.69e-1   1.42e-2
F = 25,000
AbYSS      2.06e-4   5.95e-5   1.60e-4   5.59e-4   8.82e-1   5.50e-3
FastPGA    2.26e-4   7.04e-5   4.88e-4   9.77e-4   8.79e-1   1.26e-3
MOCeLL     9.29e-5   1.88e-5   1.11e-4   2.92e-4   8.84e-1   1.90e-3
NSGAII     2.36e-4   2.87e-5   9.90e-5   1.28e-5   8.82e-1   3.00e-4
F = 35,000
AbYSS      1.63e-4   3.57e-5   7.20e-5   2.40e-6   8.83e-1   3.20e-4
FastPGA    2.11e-4   2.72e-5   1.04e-4   1.15e-4   8.83e-1   4.70e-4
MOCeLL     6.34e-5   8.90e-6   6.70e-5   9.00e-7   8.84e-1   8.50e-5
NSGAII     2.41e-4   3.02e-5   9.70e-4   4.70e-6   8.82e-1   1.80e-4
Fig. 47.2 Pareto sets of AbYSS, FastPGA, MOCeLL, NSGAII, and AdjW
Since such a visualization cannot be included in the paper, we present only the
heuristically obvious conclusion that the quality of the approximation of the Pareto
set by the adjustable weights method is better than that by the evolutionary methods.
However, the latter are less computing-intensive; therefore, the development of a
hybrid method seems promising.
Table 47.2 Performance metrics for the three-objective problem; the performance measures for AdjW
were GD = 0.0, IGD = 1.48e-4, HV = 0.735; F denotes the number of function evaluations
by the evolutionary methods

Method     GD Avg.   GD Std.   IGD Avg.  IGD Std.  HV Avg.   HV Std.
F = 25,000
AbYSS      1.44e-3   4.68e-4   2.16e-4   1.16e-4   7.15e-1   6.30e-3
FastPGA    1.42e-3   3.85e-4   2.08e-4   1.90e-5   7.16e-1   2.10e-3
MOCeLL     1.16e-3   3.62e-4   2.12e-4   1.90e-5   7.18e-1   1.30e-3
NSGAII     1.33e-3   3.02e-4   2.10e-4   2.20e-5   7.15e-1   1.90e-3
F = 50,000
AbYSS      1.22e-3   4.04e-4   2.13e-4   1.90e-5   7.16e-1   1.50e-3
FastPGA    1.24e-3   3.59e-4   2.06e-4   1.50e-5   7.18e-1   1.40e-3
MOCeLL     1.14e-3   2.77e-4   2.12e-4   1.80e-5   7.19e-1   1.20e-3
NSGAII     1.65e-3   5.05e-4   2.12e-4   2.20e-5   7.16e-1   1.50e-3
47.7 Conclusions
From the results of the experiments on two-criteria portfolio optimization it follows
that MOCeLL is the best of the four considered evolutionary methods with respect to
all three performance criteria. The results of the experiments with these methods for
three-criteria portfolio optimization reveal that MOCeLL provides the best results
in terms of Hypervolume and Generational distance, but is slightly outperformed
by FastPGA with respect to the Inverted generational distance. The evaluated
performance criteria of the evolutionary methods are only slightly worse than those
of the method of adjustable weights, which is advantageous in the considered cases.
Summarizing the results, it seems promising to develop a hybrid method combining the
advantages of the evolutionary methods with those of the method of adjustable weights.
Acknowledgements The second author acknowledges the support by the Lithuanian State Sci-
ence and Studies Foundation.
References
6. Xia, Y., Wang, S., Deng, X.: Theory and methodology: a compromise solution to mutual
funds portfolio selection with transaction costs. European Journal of Operation Research, 134,
564–581 (2001).
7. Mukerjee, A., Biswas, R., Deb, K., Mathur, A. P.: Multi-objective evolutionary algorithm for
the risk-return trade-off in bank loan management. International Transactions in Operational
Research, 9, 583–597 (2002).
8. Stummer, C., Sun, M.: New Multiobjective Metaheuristic Solution Procedures for Capital
Investment Planning. Journal of Heuristics, 11, 183–199 (2005).
9. Ehrgott, M., Waters, C., Gasimov, R. N., Ustun, O.: Multiobjective Programming
and Multiattribute Utility Functions in Portfolio Optimization. 2006. Available via
http://www.esc.auckland.ac.nz/research/tech/esc-tr-639.pdf, cited 20 June 2008.
10. Mukerjee, A., Biswas, R., Deb, K., Mathur, A. P.: Multi-objective evolutionary algorithm for
the risk-return trade-off in bank loan management. International Transactions in Operational
Research, 9, 583–597 (2002).
11. Steuer, R. E., Qi, Y., Hirschberger, M.: Portfolio Selection in the Presence of Multiple Criteria.
In Zopounidis, C., Doumpos, M., Pardalos, P. M. (eds.) Handbook of Financial Engineering,
Springer, New York (2008).
12. Clausen, J., Zilinskas, A.: Global Optimization by Means of Branch and Bound with simplex
Based Covering, Computers and Mathematics with Applications, 44, 943–955 (2002).
13. Zilinskas, A., Zilinskas, J.: Global Optimization Based on a Statistical Model and Simplicial
Partitioning, Computers and Mathematics with Applications, 44, 957–967 (2002).
14. Nebro, A. J., Durillo, J. J., Luna, F., Dorronsoro, B., Alba, E.: A Cellular Genetic Algorithm
for Multiobjective Optimization. Proceedings of NICSO 2006, pp. 25–36.
15. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multiobjective Genetic
Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6, No. 2, 182–197
(2002).
16. Zitzler, E., Deb, K., Thiele, L.: Comparison of Multi Objective Evolutionary Algorithms:
Empirical Results, Evolutionary Computation, 8, No. 2, 173–195 (2000).
17. Eskandari, H., Geiger, C. D.: A Fast Pareto Genetic Algorithm Approach for Solving
Expensive Multiobjective Optimization Problems, Journal of Heuristics, 14, No. 3, 203–241
(2008).
18. Nebro, A. J., Luna, F., Alba, E., Beham, A., Dorronsoro, B.: AbYSS Adapting Scatter Search
for Multiobjective Optimization. Technical Report ITI-2006–2, Departamento de Lenguajes y
Ciencias de la Computación, University of Malaga (2006).
Chapter 48
The Scaling Approach for Credit Limit
Management and Its Application
Abstract When a bank decides to provide a loan to a client, a decision has to be made
on the credit amount and the term. Some studies have looked into the possibility of
taking the probability of default into consideration while choosing the optimal
credit policy, by introducing a synthetic coefficient or VaR technology. This paper
suggests a scaling methodology for the calculation of the coefficients that are
required for optimal credit limit estimation. The scaling rationale is based on
goal-oriented parameters. The method is built on the comparison of two competing
forces: (i) a potential increase of the limit caused by the client's positive
characteristics, such as income, good credit history, ownership of movable and
immovable property, etc., and (ii) a potential limit decrease, which results from the
probability of default calculated from the client's score. Such a model can be
related to the quantitative cause-effect model type, which focuses on the final
result without considering the process dynamics.
48.1 Introduction
Banks strive to attract creditworthy customers and at the same time to control the
losses. Aggressive marketing efforts have resulted in deeper penetration of the risk
pool of potential customers, and the need to process them rapidly and effectively has
initiated growing automation of the credit and insurance application and adjudication
processes.
Banks need a rapid credit decision-making process in order to maintain high
borrowing power, but on the other hand it should not result in deterioration of
portfolio quality. The risk manager is now challenged to provide solutions that not
only assess creditworthiness, but also keep the per-unit processing cost low while
reducing the turnaround time for customers. In addition, customer service quality
requires this automated process to be able to minimize the denial of credit to
creditworthy customers, while keeping out as many potentially delinquent customers
as possible.
In recent years particular attention has been paid to risk scoring, which, along
with other predictive models, is a tool for evaluating the risk level associated with
applicants or customers. While it does not identify "good" (no negative behavior
expected) or "bad" (negative behavior expected) applications on an individual basis,
it provides the statistical odds, or probability, that an applicant with any given
score will be either "good" or "bad". These probabilities or scores, along with other
business considerations such as expected approval rates, profits and losses, are used
as a basis for decision making.
When a bank decides to provide a loan to a client, a decision has to be made on
the credit amount and the term. The credit limit is the approved level of the loan
amount, which theoretically must represent the client's financial rating. In other
words, correct calculation of the optimal credit limit is a powerful tool for
minimizing credit losses.
At the same time it should be noted that most of the existing methodologies for
credit limit calculation do not take into consideration an explicit assessment of the
probability of default. The traditional formula for credit limit calculation does not
deal with the credit risk. It is postulated that the limit calculated by this formula
provides zero credit risk or, in other words, that the probability of default reduces
to zero [3].
Some studies have looked into the possibility of taking the probability of default
into consideration while choosing the optimal credit policy, by introducing a
synthetic coefficient [4] or VaR technology [5]. These studies are based on the
estimation of a hypothecation value for inter-bank credits, which is not always
applicable to retail business.
This paper suggests a scaling methodology for the calculation of the coefficients
that are required for optimal credit limit estimation. In developing this scaling
methodology, three principles have been considered.
First, the scaling rationale is based on goal-oriented parameters; in this case, the
scaling parameters are determined from the credit limit quality perspective.
Second, while seeking simplicity of the scaling equation, it is required to reduce
the (potential) effects of the characteristics involved in the calculations on the
component phenomena.
Third, the equations and correlations used in the development of the scaling
methodology are transparent and can be validated separately.
The method is built on a comparison of two competing forces: (i) a potential increase of the limit caused by the client's positive characteristics, such as income, good credit history, ownership of movable and immovable property, etc., and (ii) a potential decrease of the limit resulting from the probability of default calculated from the client's score. Such a model can be related to the quantitative cause-effect model type, which focuses on the final result without considering the dynamics of the process.
48.2 Analysis
$$BP = D\,B_2\,OL, \qquad (48.2)$$
$$SD = BP\,B_1, \qquad (48.3)$$

where T is the loan term and R is the bank interest rate (%)/100.
The weakness of this method lies in the fact that the coefficients B1, B2 and B3 are defined by expert judgement. Calculation of the limits based on expert judgement may result in unjustified over- or understatement of credit limits. In the case of understatement of the credit limits the bank cannot use all of its credit resources, which in turn leads to reduced profit. One of the main consequences of credit limit overstatement is an increase in credit risk, which leads to additional losses.
On the other hand, applying precise methods to the credit limit calculation allows the bank to reach its full credit potential and to maximize its profit. The client's rating (score) should also be taken into account in order to minimize the risk of default.
Fig. 48.1 Dependency of the average UCI/(DCI + UCI) value on the score of the "good" clients
Fig. 48.2 Dependency of the average UCI/(DCI + UCI) value on the score of the "bad" clients
Two alternatives were then examined: Eq. (48.1), when the applied undocumented income (UCI)appl is less than (UCI)calc, and Eq. (48.2), when it is greater than (UCI)calc. In the first case the whole UCI amount can be taken into account and employed for the client income calculation or, in other words, B3 = 1 in Eq. (48.1). In the second case the UCI value can be calculated as the sum of two terms: (UCI)calc and some part of the value [(UCI)appl − (UCI)calc].
During the next step the coefficient C3, which represents the optimal degree of confidence in the client, should be estimated. Figure 48.3 shows the difference between the real client undocumented income UCI and the calculated values (UCI)calc. One possible solution is to use cause-effect relations, which have been successfully applied to the quantitative analysis of various technical problems [6]. Such equations provide a solution to the problem without describing the dynamics of the process. The result in this case is the ratio between positive and negative characteristics, with the client's financial condition (movable and immovable property, etc.) in the numerator and the default probability P(X) in the denominator:
$$C_3 = C_{3,\mathrm{corr}}\,\frac{\sum_i A_i Z_i}{P(X)}, \qquad (48.8)$$
where C3,corr is a coefficient that correlates Eq. (48.8) with statistical data, Zi are the parameters of the client's financial condition, and Ai are regression coefficients calculated by the Interactive Grouping methodology. In order to estimate C3 the following parameters were used: the value of movable property (car, yacht, etc.), country house, apartment and business shareholding. Results of the calculation are presented in Fig. 48.4.
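As a concrete illustration of Eq. (48.8), the following Python sketch computes C3 for one client. The function and variable names, the example asset values and the correction coefficient are hypothetical; in practice the Ai would come from the Interactive Grouping regression and C3,corr from the bank's statistical data.

```python
def c3_coefficient(z, a, p_default, c3_corr=1.0):
    """Illustrative evaluation of Eq. (48.8): C3 = C3,corr * sum(A_i * Z_i) / P(X).

    z: client financial-condition parameters Z_i (e.g. value of car, country
       house, apartment, business shareholding), in monetary units.
    a: regression coefficients A_i for the same parameters.
    p_default: default probability P(X) estimated from the client's score.
    c3_corr: correction coefficient fitted to statistical data.
    """
    if p_default <= 0:
        raise ValueError("P(X) must be positive")
    weighted_assets = sum(a_i * z_i for a_i, z_i in zip(a, z))
    return c3_corr * weighted_assets / p_default


# Hypothetical client with four asset categories and a 5% default probability.
print(c3_coefficient(z=[12_000, 0, 45_000, 8_000],
                     a=[0.4, 0.3, 0.2, 0.1],
                     p_default=0.05))
```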
$$B_2 = C_{2,\mathrm{corr}}\,\frac{E_0}{P(X)}, \qquad (48.9)$$

where

$$E_0 = \sum_i D_i E_i, \qquad (48.10)$$
Based on the described approach the credit limits for all clients from the samples were recalculated. The results are summarized in Table 48.1.
From the analysis results the following observations can be made:
The total increase of the sample portfolio was +14.4% while maintaining the initial risk level (calculations were performed in relative values).
The number of clients who had undocumented income (UCI) in the low-risk region ((UCI)calc) was 32.3% of the total number of clients with undocumented income.
Table 48.1 shows that the maximum average increase was obtained in the low score range. This can be explained by the fact that clients in this range were undervalued.
It should be noted that all B coefficients can be adjusted by the corresponding coefficients C, which depend on the product's statistical data.
Fig. 48.7 The average credit limit dynamics of one product (the results are calculated at the end of each month)
Fig. 48.8 The relative delinquency (from 1 to 30 days) level dynamics of one product (the results are calculated at the end of each month)
Table 48.1 The recalculated credit limits for all clients from the samples, by score range X: X < 0.6, 0.6 ≤ X < 0.8, 0.8 ≤ X ≤ 1
Because of the normal score distribution, most clients have scores in the range 0.6 ≤ X < 0.8. This explains the fact that the average changes in this range and in the total portfolio are very close.
The model enabled a substantial increase of the credit portfolio (by about +17%, Fig. 48.7) while maintaining or improving its quality (Fig. 48.8).
48.4 Conclusions
This paper presented a scaling model that can be applied to the calculation of the optimal credit limit. Particular attention has been paid to this problem due to the growing competition between banks in the retail market, which leads to an increase in credit risk. Correct credit limit calculation plays one of the key roles in risk reduction.
The model described in the paper is based on a comparison of two opposite client characteristics, one leading to a potential increase of the credit limit and the other leading to a decrease of the credit limit. The credit limit calculation involves the client's personal data, which makes it possible to treat each client on an individual basis during credit amount allocation. The scaling method was applied in order to analyze data obtained from a real client database. The scaling ratio provides reasonable predictive capability from the risk point of view and has therefore been proposed to serve as a model for credit limit allocation. The model's flexibility allows the coefficients to be adjusted according to new statistical data. Although the present work is quite preliminary, it does indicate that the presented solution allows the credit portfolio to be increased substantially while maintaining its quality.
References
5. I.V. Voloshin, “Time Factor Consideration in the Credit Limit Calculation with the VaR
Technology”, Thesis report, Bank analyst club, http://www.bankclub.ru/library.htm
6. M.Y. Konovalikhin, T.N. Dinh, R.R. Nourgaliev, B.R. Sehgal, M. Fischer, “The Scaling
Model of Core Melt Spreading: Validation, Refinement and Reactor Applications”, Organi-
zation of Economical Cooperation and Development (OECD) Workshop on Ex-Vessel Debris
Coolability, Karlsruhe, Germany, 15–18 November 1999.
Chapter 49
Expected Tail Loss Efficient Frontiers for CDOs of Bespoke
Portfolios Under One-Factor Copula Marginal Distributions
Abstract The global structured credit landscape has been irrevocably changed with
the innovation of Collateralized Debt Obligations (abbreviated as CDOs). As of
2006, the volume of synthetic CDO structures outstanding grew to over $1.9 trillion.
Understanding the risk/return trade-off dynamics underlying the bespoke collateral
portfolios is crucial when optimising the utility provided by these instruments. In
this paper, we study the behaviour of the efficient frontier generated for a collat-
eral portfolio under heavy-tailed distribution assumptions. The convex and coherent
credit risk measures, ETL and Copula Marginal ETL (abbreviated as CMETL), are used as our portfolio optimisation criteria. iTraxx Europe IG S5 index constituents
are used as an illustrative example.
49.1 Introduction
The global structured credit landscape has been irrevocably changed with the inno-
vation of Collateralized Debt Obligations (abbreviated as CDOs). As of 2006, the
volume of synthetic CDO structures outstanding grew to over $1.9 trillion, making it
the fastest growing investment vehicle in the financial markets. Bespoke deals made
up 21% of the total volume [1]. Understanding the risk/return trade-off dynamics
underlying the bespoke collateral portfolios is crucial when optimising the utility
provided by these instruments.
R. Guo (B)
Department of Statistical Sciences, University of Cape Town, Private Bag, Rhodes’ Gift, Ronde-
bosch 7701, Cape Town, South Africa
E-mail: Renkuan.Guo@uct.ac.za
The next step is to determine the level of subordination, and tranche thickness
corresponding to the risk appetite of the investor. These deal parameters determine
the degree of leverage and the required premium payments [11]. This step of the
process adds a further two constraints to the transaction optimisation problem. The
first is related to the tranche thickness, and the second is the credit rating assigned to
the tranche. At this level of the problem, from an investor’s perspective, the return
generated by the CDO tranche is maximised.
A typical placement of a bespoke CDO is outlined in the following Fig. 49.1.
The cash flow payments that occur between the investor and the bank are indicated by the dashed lines.
The challenge of issuing bespoke CDOs is the ongoing need and expense of the
risk management of the tranche position [12].
The use of heavy-tailed distributions for the margins of copula models is becoming
an industry standard for credit risk applications.
The Stable Paretian distribution can be expressed through its characteristic function φ(t), the logarithm of which is given by [13]:

$$\ln \varphi(t) = \begin{cases} -\sigma^{\alpha}|t|^{\alpha}\left[1 - i\beta\,\mathrm{sign}(t)\tan\frac{\pi\alpha}{2}\right] + i\mu t, & \text{if } \alpha \neq 1,\\[4pt] -\sigma|t|\left[1 + i\beta\,\mathrm{sign}(t)\,\frac{2}{\pi}\ln|t|\right] + i\mu t, & \text{if } \alpha = 1. \end{cases} \qquad (49.1)$$
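Heavy-tailed margins of this kind can be simulated with standard libraries. The sketch below assumes that SciPy's levy_stable parametrisation is acceptable for illustration and uses made-up parameter values; it simply contrasts an extreme quantile of a stable sample with that of a Gaussian sample of the same scale.

```python
import numpy as np
from scipy.stats import levy_stable, norm

# Illustrative stable parameters (not fitted to any market data).
alpha, beta, mu, sigma = 1.7, -0.3, 0.0, 0.01

rng = np.random.default_rng(42)
stable_sample = levy_stable.rvs(alpha, beta, loc=mu, scale=sigma,
                                size=20_000, random_state=rng)
normal_sample = norm.rvs(loc=mu, scale=sigma, size=20_000, random_state=rng)

# Heavy tails show up as a much larger 99.9th percentile of absolute returns.
print("stable 99.9% quantile:", np.quantile(np.abs(stable_sample), 0.999))
print("normal 99.9% quantile:", np.quantile(np.abs(normal_sample), 0.999))
```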
49.3.2 Copulas
Copula functions are useful tools in the modelling of the dependence between
random variables.
Definition 49.1. A copula is defined as a multivariate cumulative distribution function (abbreviated cdf) with uniform marginal distributions, C(u_1, u_2, \ldots, u_n), where

$$C(u_i) = u_i \quad \text{for all } i = 1, 2, \ldots, n.$$
The ideas behind multivariate distribution theory carry over to copula functions. The above definition indicates that copulas represent the joint cdf of uniform marginal distributions. The probability integral transformation of continuous non-uniform random variables allows copulas to define the joint multivariate distribution H with margins F_1, F_2, \ldots, F_n. By the strict monotonicity of the cdf,

$$H(\omega_1, \ldots, \omega_k) = \Pr(X_1 \le \omega_1, \ldots, X_k \le \omega_k) = C(F_1(\omega_1), F_2(\omega_2), \ldots, F_k(\omega_k)). \qquad (49.2)$$
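A minimal sketch of Eq. (49.2) in code, under the assumption that a Gaussian copula with an illustrative correlation matrix and arbitrary exponential margins is acceptable for demonstration: correlated normals are mapped to uniforms (the copula sample) and then pushed through the inverse marginal cdfs.

```python
import numpy as np
from scipy.stats import norm, expon

# Illustrative 2x2 correlation matrix for a Gaussian copula.
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
chol = np.linalg.cholesky(corr)

rng = np.random.default_rng(0)
z = rng.standard_normal((10_000, 2)) @ chol.T   # correlated standard normals
u = norm.cdf(z)                                 # uniform margins: the copula sample

# Per Eq. (49.2): plug the uniforms into the inverse marginal cdfs,
# here two exponential margins chosen purely for illustration.
x1 = expon.ppf(u[:, 0], scale=1.0)
x2 = expon.ppf(u[:, 1], scale=2.0)
print("correlation of the uniform margins:", np.corrcoef(u[:, 0], u[:, 1])[0, 1])
```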
Recently, a growing literature has been devoted to an axiomatic theory of risk measures [8], in which the authors presented an axiomatic foundation of coherent risk measures to quantify and compare uncertain future cashflows. Reference [17] extended the notion of coherent risk measures to convex risk measures.
Let X denote the set of random variables defined on the probability space (Ω, ℱ, P). We define Ω as a set of finitely many possible scenarios for the portfolio. Financial risks are represented by a convex cone M(Ω, ℱ, P) of random variables. Any random variable X in this set will be interpreted as a possible loss of some credit portfolio over a given time horizon. The following provides a definition of a convex cone.
Definition 49.3. Given some convex cone M of random variables, a measure of risk Θ with domain M is a mapping Θ: M → ℝ.
49.4.2 Expected-Tail-Loss
In the CDO portfolio optimisation problem, we represent the portfolio risk objective
function by the Expected-Tail-Loss measure.
Definition 49.6. Expected-Tail-Loss is defined by the following:
Portfolio optimisation with ETL as the objective function will result in a smooth, convex problem with a unique solution [7].
Definition 49.7. The CMETL investor's optimal portfolio is then given by the following:

$$X_c^{*}(\mu \mid \beta) = \arg\min_{x}\left\{\mathrm{CMETL}_p(\beta)\right\},$$
$$\text{s.t.}\quad \sum_i x_i = 1, \qquad \ell \le x_i \le u \;\;\forall i, \qquad \mathrm{CMER}_p \ge \mu. \qquad (49.3)$$
where x denotes the resulting portfolio weights. The subscript c indicates that the problem is defined under the copula framework.
The constraints, in the order written above, require that (a) the sum of the portfolio weights should equal one; (b) the position weights should lie within the trading limits ℓ and u to avoid unrealistic long and short positions; (c) the expected return on the portfolio in the absence of rating transitions should be at least equal to some investor-defined level μ.
Together, these optimal risk/return trade-offs define the efficient frontier.
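Once the loss scenarios are simulated, the ETL objective in Eq. (49.3) can be rewritten as a linear program with the auxiliary variables of Rockafellar and Uryasev [7]. The sketch below is a generic long-only version of that reformulation, not the authors' exact CMETL implementation; the scenario matrix, expected returns, return floor and trading limits are all illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def etl_min_portfolio(loss_scen, mean_ret, beta=0.999, ret_floor=0.0,
                      lower=0.0, upper=1.0):
    """Minimise ETL (CVaR) at level beta over simulated loss scenarios.

    loss_scen: (K, n) array with the loss of each name in each scenario.
    mean_ret:  (n,) array of expected excess returns.
    Decision vector: [x_1 .. x_n, alpha, z_1 .. z_K].
    """
    K, n = loss_scen.shape
    c = np.concatenate([np.zeros(n), [1.0], np.full(K, 1.0 / ((1 - beta) * K))])

    # z_k >= L_k(x) - alpha  rewritten as  L_k(x) - alpha - z_k <= 0
    A_ub = np.hstack([loss_scen, -np.ones((K, 1)), -np.eye(K)])
    b_ub = np.zeros(K)
    # expected-return constraint: mean_ret @ x >= ret_floor
    A_ub = np.vstack([A_ub, np.concatenate([-mean_ret, np.zeros(K + 1)])])
    b_ub = np.append(b_ub, -ret_floor)

    A_eq = np.concatenate([np.ones(n), np.zeros(K + 1)]).reshape(1, -1)
    bounds = [(lower, upper)] * n + [(None, None)] + [(0, None)] * K

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    return res.x[:n], res.fun          # optimal weights and the minimised ETL


# Toy example: random loss scenarios for five hypothetical names.
rng = np.random.default_rng(1)
losses = rng.gamma(2.0, 0.01, size=(2_000, 5))
weights, etl = etl_min_portfolio(losses, mean_ret=np.full(5, 0.02),
                                 beta=0.95, ret_floor=0.02)
print(weights.round(3), round(etl, 4))
```

Sweeping the return floor over a grid of values traces out the kind of efficient frontier shown in Figs. 49.3 and 49.4.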
where V_i^k is the value of exposure i at the given horizon in scenario k. In particular:

$$V_i^k = \begin{cases} N_i R_i, & \text{if reference entity } i \text{ is in the default state},\\ N_i, & \text{otherwise},\end{cases} \qquad (49.5)$$

$$L^k(x) = \sum_{i=1}^{n} L_i^k x_i, \qquad (49.6)$$
where M and ε_i are independent standard normal variates in the case of a Gaussian copula model, and cov(ε_k, ε_l) = 0 for all k ≠ l. The default dependence comes from the factor M: unconditionally the stochastic processes are correlated, but conditionally they are independent. The default probability of an entity i, denoted by F_i, can be observed from the market prices of credit default swaps and is defined by the following:
$$F_i(t) = 1 - \exp\left[-\int_0^t h_i(u)\,du\right], \qquad (49.8)$$
where h_i(u) represents the hazard rate for reference entity i. For simplicity, we assume that the CDS spread term structure is flat, so that calibration of the hazard rates for all reference entities is straightforward.
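Under a flat spread curve, the usual credit-triangle approximation gives the hazard rate directly from the quoted spread and an assumed recovery, after which Eq. (49.8) collapses to a one-line formula. The sketch below is only this textbook approximation; the spread and recovery values are hypothetical.

```python
import math

def flat_hazard_rate(cds_spread_bps, recovery=0.4):
    """Credit-triangle approximation for a flat CDS term structure:
    h ~ s / (1 - R), with the spread quoted in basis points."""
    return (cds_spread_bps / 10_000.0) / (1.0 - recovery)

def default_probability(t_years, hazard):
    """Eq. (49.8) with a constant hazard rate: F(t) = 1 - exp(-h * t)."""
    return 1.0 - math.exp(-hazard * t_years)

# Hypothetical reference entity: 5-year CDS quoted at 60 bps, 40% recovery.
h = flat_hazard_rate(60)
print(f"hazard rate = {h:.4f}, 5y default probability = {default_probability(5, h):.4%}")
```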
The default barrier B_i(t) is defined as B_i(t) = G^{-1}(F_i(t)), where G^{-1} denotes the inverse of the relevant distribution function. In the case of a Gaussian copula, this is the inverse cumulative Gaussian distribution function.
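Putting the barrier together with a one-factor Gaussian copula gives a very compact default simulator. The sqrt(rho) factor loading used below is a common parametrisation and is an assumption here (the chapter's own factor equation is not reproduced in this excerpt); the correlation, default probability and portfolio size are illustrative.

```python
import numpy as np
from scipy.stats import norm

def simulate_defaults_gaussian(n_names, n_scen, rho, pd_horizon, seed=0):
    """Default indicators at a fixed horizon under a one-factor Gaussian copula.

    pd_horizon: per-name default probability F_i(t) at the horizon.
    """
    rng = np.random.default_rng(seed)
    barrier = norm.ppf(pd_horizon)                 # B_i(t) = G^{-1}(F_i(t))
    m = rng.standard_normal((n_scen, 1))           # systematic factor
    eps = rng.standard_normal((n_scen, n_names))   # idiosyncratic terms
    x = np.sqrt(rho) * m + np.sqrt(1.0 - rho) * eps
    return x <= barrier                            # boolean default matrix

defaults = simulate_defaults_gaussian(n_names=114, n_scen=20_000,
                                      rho=0.25, pd_horizon=0.05)
print("average default rate:", defaults.mean())
```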
A second type of copula model considered comes from the Archimedean family.
In this family we consider the Clayton copula.
$$A_i := \varphi\!\left(\frac{-\ln(\varepsilon_i)}{M}\right) = \left(\frac{-\ln(\varepsilon_i)}{M} + 1\right)^{-1/\theta}, \qquad (49.9)$$

where φ(·) is the Laplace transform of the Gamma(1/θ) distribution, ε_i is a uniform random variable and M is a Gamma(1/θ)-distributed random variable.
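For comparison, the Clayton dependence can be sampled with the Gamma-frailty construction that underlies Eq. (49.9); the θ value below is purely illustrative and is chosen larger than the calibrated values in Table 49.1 so that the lower-tail clustering is visible.

```python
import numpy as np

def clayton_copula_sample(n_scen, n_names, theta, seed=0):
    """Uniforms with Clayton dependence via the Gamma frailty of Eq. (49.9):
    A_i = (-ln(eps_i)/M + 1) ** (-1/theta), with M ~ Gamma(1/theta) and
    eps_i ~ Uniform(0, 1)."""
    rng = np.random.default_rng(seed)
    m = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n_scen, 1))
    eps = rng.uniform(size=(n_scen, n_names))
    return (-np.log(eps) / m + 1.0) ** (-1.0 / theta)

u = clayton_copula_sample(n_scen=10_000, n_names=5, theta=2.0)
# Lower-tail dependence (joint small values) grows with theta.
print("P(U1 < 0.05 and U2 < 0.05):", np.mean((u[:, 0] < 0.05) & (u[:, 1] < 0.05)))
```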
Using the credit models presented above, the loss distribution for the test portfo-
lio described in [18] can easily be derived. This is shown in Fig. 49.2. Reference [18]
also provides empirical evidence for the stationarity of credit spreads. Stationarity
is a crucial issue, since credit risk models implicitly assume that spreads follow
stationary processes.
The following Table 49.1 displays the parameters of the two models for a few of the iTraxx Europe IG index constituents.
The coefficients in the regression analysis for all constituents in the bespoke test
portfolio were significant at a 99% confidence level.
The following Table 49.2 summarizes the expected loss, unexpected loss, and
CMETL at the 99.9% confidence level.
Although the Gaussian copula predicts a slightly higher expected loss than the
Clayton copula, there is a 1.758% absolute difference in the CMETL, and a 3%
difference in the maximum loss between the copula models.
We optimise all positions and solve the linear programming problem represented
by Eq. (49.3). Three scenarios are considered:
Fig. 49.2 Credit loss distributions for the homogeneous test portfolio at the 5-year horizon under different default dependence assumptions (probability versus portfolio loss %, Gaussian and Clayton copulas)
Table 49.1 Model parameters of the two copulas for selected iTraxx Europe IG index constituents

Reference entity     (Gaussian)   (Clayton)
BAE Systems PLC      0.249616     0.043221
Unilever NV          0.239667     0.035097
Continental AG       0.264289     0.062691
Peugeot SA           0.243152     0.041494
Commerzbank AG       0.220350     0.025312
Table 49.2 Risk measure differences for the initial homogeneous test portfolio under Gaussian and Clayton copula assumptions

Risk measure              Gaussian copula   Clayton copula
Expected loss             3.7248%           3.6952%
Unexpected loss @ 99.9%   20.349%           20.328%
CMETL @ 99.9%             25.534%           27.292%
Maximum loss              30.447%           33.434%
Fig. 49.3 Comparison of efficient frontiers under Gaussian and Clayton copula assumptions (excess return in bps versus CMETL, 0–30%)
Fig. 49.4 Behaviour of second-order polynomial fits of the efficient frontiers under an increasing upper trading limit (excess return in bps versus CMETL, % losses in portfolio; 10%, 15% and 100% upper bounds with polynomial fits)
[Figure: efficient frontier, excess return in bps versus risk level, 0–30%]
The figure displays the expected variation in efficient frontiers when the upper
trading limit is slowly relaxed. Under these circumstances, concentration risk is
introduced into the portfolio. Investors demand a higher premium for taking on this
risk. For a 20% level of risk, investors demand an extra 25 bps premium for holding
a more concentrated portfolio. At these levels, the concentrated portfolio only holds
positions in 49 of the 114 names, with the largest position size being 19.5%.
In the final case, the effect on the efficient frontier for allowing short positions is
examined. Under this scenario, only the well-diversified portfolio case is considered.
The lower and upper trading limits are set at −5% and 5%, respectively.
The figure displays an important result: allowing short positions in credits provides the investor with returns superior to those in the long-only case. At a 20%
level of risk, investors can earn an extra 30 bps premium for taking on overweight
views on certain credits in the portfolio. This indicates why hedge fund strategies
involving bespoke CDO structures have become increasingly popular.
The results for all efficient frontier second order polynomial regressions were
significant at a 95% confidence level. The resulting R2 coefficients were all above
90%.
In this paper, we propose a new framework, the CMETL model, for investigating CDO market patterns and the efficient frontier. The Gaussian copula asset allocation is shown to be sub-optimal: the Clayton copula efficient frontiers provided a higher return for a given level of credit risk. A closer examination of the original risk profile shows that the credit risk can be reduced to one-fifth of the original amount under the Clayton asset allocation method.
The permission of short positions in the bespoke CDO portfolio allows investors
to increase returns beyond a specific risk tolerance level. In the case study con-
sidered, a maximum increase of 37.5% in investor-defined return is achieved by
allowing overweight positions in certain credits.
References
1. E. Beinstein, “A review of themes from 2006 and outlook for 2007,” JP Morgan, in Corporate
Quantitative Research, 2007.
2. C. Finger, “Conditional approaches for credit metrics portfolio distributions,” Credit Metrics
Monitor, 2(1), 1999, 14–33.
3. P. Schonbucher, “Taken to the limit: Simple and not so simple loan loss distributions,” Working
paper, Department of Statistics, Bonn University.
4. O. Vasicek, “Probability of loss on a loan portfolio,” Technical report, KMV Corporation,
1987, Available: www.moodys.com.
5. J. P. Laurent, & J. Gregory, “Basket Default Swaps, CDO’s and Factor Copulas,” Working
paper, 2002.
6. H. Mausser, & D. Rosen, “Efficient risk/return frontiers for credit risk,” in Algo Research
Quarterly, 2(4), 1999, 9–22.
7. R. T. Rockafellar, & S. Uryasev, “Optimisation of conditional value-at-risk,” in Journal of
Risk, 3, 2000, 21–41.
Chapter 50
Data Mining for Retail Inventory Management
P.K. Bala

50.1 Introduction
P. K. Bala
Xavier Institute of Management, Bhubaneswar, PIN-751013, India
E-mail: p k bala@rediffmail.com
In most real situations, we do not find a one-item inventory, but a multi-item inventory with different replenishment periods. These inventories are usually managed in aggregate because of the complexity of handling each individual item. Intuition and common-sense rules with the aid of historical data have been useful to some extent, but they are not often efficient or cost-effective. Some examples of multi-item inventories are retail stores, spare parts for maintenance, medicine stores, etc.
Out of the thousands of items held in the inventory of a typical organization, only a small percentage deserve management's closest attention and tightest control. It is not economical to exercise the same degree of inventory control on all items. Using selective inventory control, varying degrees of control are exercised on different items. The most widely used technique for classifying items for the purpose of selective inventory control is ABC classification. An inventory replenishment policy deals with the 'how-much-to-order' and 'when-to-order' aspects. The purchase pattern is an important consumer insight, and knowledge of the purchase pattern is an important input for designing an inventory replenishment policy.
Data mining is used to find new, hidden or unexpected patterns from a very large
volume of historical data, typically stored in a data warehouse. Knowledge or insight
discovered using data mining helps in more effective individual and group deci-
sion making. Irrespective of the specific technique, data mining methods may be
classified by the function they perform or by their class of application. Using this
approach, some major categories of data mining tools, techniques and methods can
be identified as given below.
(i) Association Rule: Association rule mining is a type of data mining that correlates one set of items or events with another set of items or events. The strength of an association rule is measured in the framework of support, confidence and lift [1] (a small computational sketch of these measures follows this list).
(ii) Classification: Classification techniques include mining processes intended to
discover rules that define whether an item or event belongs to a particular pre-
defined subset or class of data. This category of techniques is probably the most
broadly applicable to different types of business problems. Methods based on
‘Decision Tree’ and ‘Neural Network’ are used for classification.
(iii) Clustering: In some cases, it is difficult or impossible to define the parame-
ters of a class of data to be analyzed. When parameters are elusive, clustering
methods can be used to create partitions so that all members of each set are sim-
ilar according to a specified set of metrics. Various algorithms like k-means,
CLARA, CLARANS are used for clustering. Kohonen’s map is also used for
clustering.
(iv) Sequence Mining: Sequencing pattern analysis or time series analysis methods
relate events in time, such as prediction of interest rate fluctuations or stock
performance, based on a series of preceding events. Through this analysis, var-
ious hidden trends, often highly predictive of future events, can be discovered.
GSP algorithm is used to mine sequence rules [2].
(v) Summarization: Summarization describes a set of data in compact form.
(vi) Regression: Regression techniques are used to predict a continuous value. The regression can be linear or non-linear, with one predictor variable or with more than one predictor variable, in which case it is known as 'multiple regression'.
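The support, confidence and lift measures mentioned in item (i) can be computed directly from a transaction list; the sketch below uses a few hypothetical baskets over the items a to k used later in the chapter, and is not a full rule-mining algorithm such as Apriori.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent,
    computed over a list of transactions represented as sets of items."""
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)
    n_c = sum(consequent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical baskets; 'd', 'f' and 'g' are made to co-occur frequently.
baskets = [{"d", "f", "g"}, {"d", "f", "g", "a"}, {"d", "f"}, {"b", "k"},
           {"d", "f", "g"}, {"a", "b"}, {"d", "g"}, {"f", "g", "d"}]
print(rule_metrics(baskets, antecedent={"d", "f"}, consequent={"g"}))
```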
Numerous techniques are available to assist in mining the data, along with numer-
ous technologies for building the mining models. Data mining is used by business
intelligence organizations to learn consumer behavior for the purpose of customer
relationship management, fraud detection, etc.
Inventory items have been classified or clustered in groups for the purpose of joint
replenishment policy in [3–6]. Generic inventory stock control policies are derived
using multi-item classification [3]. Clustering of items has also been done in produc-
tion and inventory systems [4]. Multi-criteria inventory classification has been done
by parameter optimization using a genetic algorithm [5]. Artificial neural networks (ANNs) have been used for ABC classification of stock keeping units (SKUs), for example in a pharmaceutical company [6].
Interacting items, where the demand for one commodity increases due to the presence of the other, have been considered in the model given in [7], which accounts for the correlation between the demands of two items. The article gives a new approach to a two-item inventory model for deteriorating items with a linear stock-dependent demand rate; that is, the more inventory there is, the greater the demand. The model attempts to capture demand interdependency to some extent.
Market segmentation has been done using clustering through neural network
and genetic algorithm [8]. Clustering of customers on the basis of their demand
attributes, rather than the static geographic property has been done in [9]. However,
the findings have not been used for the purpose of inventory management.
As part of the customer relationship management (CRM) strategy, many researchers have been analyzing 'why' customers decide to switch. However, despite its practical relevance, few studies have investigated how companies can react to defection-prone customers by offering the right set of products [10]. Consumer insight has been captured in that work, but not used for inventory replenishment.
selling consideration, a method to select inventory items from the association rules
has been developed in [11] which gives a methodology to choose a subset of items
which can give the maximal profit with the consideration of cross-selling effect.
However, this does not talk about any inventory replenishment policy.
In the present research paper, a data mining model has been proposed which can
be used for multi-item inventory management in retail sale stores. The model has
been illustrated with an example database.
In the context of data mining, various aspects of purchase dependencies which may be useful for retail inventory management are discussed below.
Demand Interdependency: The problem of multi-item inventory is more challeng-
ing when there is association in the demand or usage pattern amongst the items or
item-sets (Item-set refers to a set of one or more items, hence if we say item-set, it
may refer to one item or a number of items.). The correlation in the demand amongst
the items can be one to one, one to many, many to one or many to many. In many
situations, a customer buys an item or item-set only when another item or item-set
is also in stock.
To explain the above situation further, say, in a store, item B is in stock and item
A is out of stock. One customer is interested in purchasing B, provided A is also
available in the store, so that he can purchase both A and B. As the demand for
B depends on demand for A, he will not purchase B, if A is not available. Under
this situation, we can say that stock of B is as good as a stock-out situation for that
customer. Hence, if A is not in stock, there will be no sale of B also in many cases.
This example depicts a case of interdependency in demand. Interdependency in the
demand of two items can be captured in various ways and can be used for inventory
management. Association rules can be used to mine demand interdependencies.
Customer Profile and Demand Pattern: Customer profile is the detailed features of
a customer. The profile may contain income, age, marital status, gender, education,
number of cars, family size etc. The profile may also contain frequency of shopping,
credit rating, loyalty index etc. In retail sale, demand pattern depends a lot on the
customer profile along with the other factors. Hence, customer profile is an impor-
tant parameter which may be used for learning the purchase pattern of a customer
and this may be useful for inventory modeling. Classification and clustering can
be used for learning the impact of customer profile on demand pattern. Customer
profile will be a useful input for forecasting the demand of the items.
Sequence of Purchase: Quite often, a sequence of purchases is repeated, with a time gap between two purchases. A sequence may consist of two or more purchase events, and in each purchase event a certain item-set is purchased. Once a repetitive sequence rule is identified, it can be used as an input for inventory modeling. Sequence rules are mined using the GSP algorithm.
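The idea behind sequence rules can be illustrated with a deliberately simplified count of two-event sequences per customer; this is only a toy illustration of the concept, not the GSP algorithm itself, and the purchase histories are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_2_sequences(histories, min_support=0.5):
    """Support of 2-event sequences (item a bought in an earlier visit, item b
    in a later visit of the same customer), counted once per customer."""
    counts = Counter()
    for visits in histories:                       # visits are in time order
        seen = set()
        for earlier, later in combinations(range(len(visits)), 2):
            for a in visits[earlier]:
                for b in visits[later]:
                    seen.add((a, b))
        counts.update(seen)
    n = len(histories)
    return {seq: c / n for seq, c in counts.items() if c / n >= min_support}


# Hypothetical purchase histories; each inner set is one visit.
histories = [[{"d"}, {"f", "g"}, {"k"}],
             [{"d"}, {"g"}],
             [{"a"}, {"d"}, {"f"}]]
print(frequent_2_sequences(histories, min_support=0.6))
```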
Time-Dependent Purchase Pattern: On different days of the week, there may be different purchase patterns. Purchase patterns on weekdays and weekends are generally different. Similarly, different patterns may be observed in different months or seasons. Time-dependent purchase patterns may also be observed at different hours of the day; the purchase pattern in the evening hours tends to differ from that in the daytime hours. Data segregated with respect to time can be used to learn time-dependent purchase patterns; for example, weekend and weekday data can be segregated for this purpose.
The proposed model is described in Fig. 50.1. In the present research, it is proposed to discover purchase patterns using data mining. For this purpose, the sale transaction data of the inventories contained in the 'Materials' module can be used.
Fig. 50.1 The proposed model: transaction data collected from the 'Materials', CRM and other modules into a data warehouse/data store; data mining techniques applied to multi-item inventory management; the replenishment policy decided through a DSS, with user input and subjective considerations
Eleven (11) items have been considered in a retail sale store with one thousand (1,000) past sale transactions along with the profiles of the customers. The eleven items are named 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j' and 'k'. A transaction contains the items purchased by a customer along with a small profile of the customer. The profile contains 'customer id', 'value' (as rated by the retailer), 'pmethod' (payment method by cheque/cash/card, etc.), 'sex' (male/female, expressed as M/F), 'hometown' (yes/no), 'income' and 'age'. The customer id is an identification code for the customer; it is used for capturing the sequence of purchases made by the same customer and hence for mining sequence rules. Otherwise, the customer id is not of any use for mining other patterns.
Since it is not possible to show all 1,000 records of the database in this paper, five records are shown in Table 50.1, from which one can visualize the various fields in the database.
Table 50.1 Sale transaction data with customer profile showing five transactions (part of the database with 1,000 rows)

Value  Payment method  Sex  Hometown (Y/N)  Income ($ thousands)  Age  | a b c d e f g h i j k
43     Ch              M    N               27                    46   | F T T F F F F F F F T
25     Cs              F    N               30                    28   | F T F F F F F F F F T
21     cs              M    N               13                    36   | F F F T F T T F F T F
24     cd              F    N               12                    26   | F F T F F F F T F F F
19     cd              M    Y               11                    24   | F F F F F F F F F F F
In Table 50.1, ‘T’ (under ‘Items’) implies purchase of the corresponding item in
the corresponding transaction and ‘F’ (under ‘Items’) implies that the corresponding
item has not been purchased in the corresponding transaction. Association rules have been mined from the database of 1,000 transactions with threshold values of support and confidence of 10% and 80%, respectively. The association rules mined are shown in Table 50.2.
Only three rules qualified for the chosen threshold values of support and confidence. In each of these three rules, we observe that the same three items (i.e., d, f and g) are involved.
With respect to the simultaneous purchase of all three items (i.e., d, f and g) in the same transaction, a classification of the customers has been carried out based on their profiles. Using data mining, a decision tree has been built from the database for the purpose of classification; it is given in Fig. 50.2. Considering a threshold value of 80% confidence for the decision rules, only one rule (at node 3) qualifies. The rule (with 84.242% confidence) is: IF "income is less than or equal to 16,950 and sex is F (Female)", THEN "the customer purchases all three items, i.e., d, f and g". The rule has been observed in 84.242% of cases.
The association rules and the decision rule mined from the database can be used as input for designing the inventory replenishment policy. Inventory managers should consider these rules when placing orders for replenishment.
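A classification tree of the kind shown in Fig. 50.2 can be grown with any standard library. The sketch below uses scikit-learn on synthetic profile data whose labelling merely mimics the reported pattern (purchases of d, f and g concentrated among low-income female customers); the encoding, sample size and probabilities are all hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoded profiles: income in dollars and sex (0 = M, 1 = F);
# target = 1 if the customer bought all of d, f and g in the transaction.
rng = np.random.default_rng(7)
income = rng.integers(8_000, 40_000, size=500)
sex = rng.integers(0, 2, size=500)
bought_dfg = ((income <= 16_950) & (sex == 1) &
              (rng.random(500) < 0.85)).astype(int)

X = np.column_stack([income, sex])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, bought_dfg)
print(export_text(tree, feature_names=["income", "sex"]))
```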
Fig. 50.2 Decision tree for classifying customers by simultaneous purchase of d, f and g (T = purchased all three, F = did not):
Node 0 (root): F 85.4% (854), T 14.6% (146), total 100% (1,000)
Split on Income:
Income <= 16,950, Node 1: F 58.3% (197), T 41.7% (141), total 33.8% (338)
Split on Sex:
Sex = M, Node 2: F 98.8% (171), T 1.2% (2), total 17.3% (173)
Sex = F, Node 3: F 15.8% (26), T 84.2% (139), total 16.5% (165)
Income > 16,950, Node 4: F 99.2% (657), T 0.8% (5), total 66.2% (662)
Based on focused interviews with sales people and customers in various grocery retail outlets, 45 items were selected for the research. The objective was to select a small number of frequently purchased items where some purchase dependencies could be anticipated, while the list would also contain items where no purchase dependency was expected. In this way, for the purpose of the research, the small subset of items can be representative of the superset of items carried by a retail store. Hence, the intention of the interviews was to foresee a few purchase dependencies from the experience of sales people and customers and to include the related items in the research. In this way, 45 items were selected for which one may expect purchase dependency for some items and no purchase dependency for others. The list of 45 items with descriptions is given in Table 50.3. The names
of the items have been written as usually spoken by Indian shoppers. The list of 45 items contains 29 food items and 16 miscellaneous grocery items; the miscellaneous grocery items consist of cosmetics, detergents, toothpaste and mosquito repellent. The variety of items was chosen with a view to identifying purchase dependencies within a group of items as well as inter-group purchase dependencies.
A data volume of 8,418 sales transactions (at least one of the selected 45 items
is present in each transaction) with quantity of the items was collected from various
grocery retail outlets of different sizes (small and large) in two different cities in
India.
For thresholds of 20% support and 65% confidence, 14 association rules were obtained, as given in Table 50.4. In this table, it is observed that rules 1, 2, 3, 4 and 9 are amongst the items Noodles – Maggi (200 g), Tomato Sauce (Maggi) (200 g) and MDH Chicken Masala. Rules 6, 7, 13 and 14 are between any two items of Noodles – Maggi (200 g), Tomato Sauce (Maggi) (200 g) and MDH Chicken Masala. This is like the example illustrated in Section 50.5, which involves the three items d, f and g. The rules which involve these three items are shown separately in Table 50.5.
50.7 Conclusions
Consumer insight extracted using data mining tools has been used for business decision making in marketing, shelf design, product management, etc. There has been limited use in inventory management. Purchase dependency as a type of consumer insight which can be applied in inventory modeling has been discussed for the first time in this paper. The analysis and findings in this research throw some light on various aspects of purchase dependencies which may be useful for inventory management in retail stores. The model, illustrative example and case discussion in this research paper will motivate researchers and inventory managers to develop methodologies for using the findings of data mining in multi-item inventory management.
There is scope for further work in identifying the guiding profile of the customers which results in the simultaneous purchase of the three items Noodles – Maggi (200 g), Tomato Sauce (Maggi) (200 g) and MDH Chicken Masala. This knowledge will be of tremendous importance for designing inventory replenishment policies in retail sale.
References
1. Han, J., & Kamber, M., "Data Mining: Concepts and Techniques", Morgan Kaufmann, San Francisco, CA, 2006, pp. 227–284.
2. Pujari, A.K., “Data Mining Techniques”, Hyderabad, University Press, 2002, pp. 259–260.
3. Cohen, M.A., & Ernst, R., “Multi-item classification and generic inventory stock control
policies”, Production and Inventory Management Journal, 29(3), 1988, pp. 6–8.
4. Ernst, R., & Cohen, M.A., “A clustering procedure for production/inventory systems”, Journal
of Operations Management, 9(4), 1990, pp. 574–598.
5. Guvenir, H.A., & Erel, E., “Multicriteria inventory classification using a genetic algorithm”,
European Journal of Operations Research, 105(1), 1998, pp. 29–37.
6. Partovi, F.Y., & Anandarajan, M., "Classifying inventory using an artificial neural network approach", Computers & Industrial Engineering, 41, 2002, pp. 389–404.
7. Bhattacharya, D.K., “Production, manufacturing and logistics on multi-item inventory”,
European Journal of Operational Research, 162(3), 2005, pp. 786–791.
8. Kuo, R.J., An, A.L., Wang, H.S., & Chung, W.J., “Integration of self-organizing feature maps
neural network and genetic K-means algorithm for market segmentation”, Expert Systems
with Applications, 30, 2006, pp. 313–324.
9. Hu, T. L., & Sheu, J.B., “A fuzzy-based customer classification method for demand-responsive
logistical distribution operations”, Fuzzy Sets and Systems, 139, 2003, pp. 431–450.
10. Larivière, B., & Van den Poel, D., "Investigating the role of product features in preventing customer churn, by using survival analysis and choice modeling: The case of financial services", Expert Systems with Applications, 27, 2004, pp. 277–285.
11. Wong, W., Fu, A.W., & Wang, K., “Data mining for inventory item selection with cross-selling
considerations”, Data Mining and Knowledge Discovery, 11(1), 2005, pp. 81–112.
Chapter 51
Economic Process Capability Index for Product Design
and Process Planning
Angus Jeang
Abstract The process capability index (PCI) is a value which reflects real-time
quality status. The PCI acts as the reference for real-time monitoring that enables
process controllers to acquire a better grasp of the quality of their on site processes.
The PCI value is typically defined as the ability to carry out a task or achieve a
goal. However, simply increasing the PCI value can easily create additional and
unnecessary production costs that result from extra efforts and expensive devices
for ensuring tolerance control. Hence, there is a need to balance customer demands
for quality and production costs. In this regard, the off-line PCI value is intro-
duced, in consideration of quality loss and production cost, simultaneously in this
research. The quality loss is expressed by quality loss function, and the production
cost is represented by tolerance cost function. Then, this new PCI expression can be
used as linkage for concurrent product design and process planning, prior to actual
production.
51.1 Introduction
A. Jeang
Department of Industrial Engineering and Systems Management, Feng Chia University, P.O. Box
25-150, Taichung, Taiwan, R.O.C.
E-mail: akjeang@fcu.edu.tw
what is needed is a way to measure the degree of the producer’s process capabil-
ity, in satisfying the customer’s quality requirement. More importantly, a growing
number of producers include this measurement value in their purchase contracts
with customers, as a documentation requirement [2]. One such measurement is the
process capability index (PCI).
The process capability index (PCI) is a value which reflects real-time quality
status. The PCI acts as the reference for real-time monitoring that enables process
controllers to acquire a better grasp of the quality of their on site processes [3, 4].
Although the PCI is considered as one of the quality measurements employed during
on-line quality management, several authors have pointed out that the PCI should
be addressed at the beginning of the design stage rather than at the production stage,
where process capability analysis is typically done [5]. For the sake of convenience, let us call the former the off-line PCI and the latter the
on-line PCI. The on-line PCI has realized process mean and process variance that
are obtained from the existing process. Conversely, the off-line PCI has the process
mean and process variance as two unknown variables, which the product designer
and process planner would have to determine. When cost is not considered as a factor for off-line PCI analysis, the process planners would normally do their best to set the process mean close to the design target and to minimize the process variance to the process limit. Because the additional cost incurred for tightening the variance is not considered, the mean and variance values thus established will obviously result in a high PCI value [6]. Thus, a PCI expression which contains cost factors for an off-line application is developed.
The PCI value is typically defined as the ability to carry out a task or achieve a
goal. The controllable factors are the process mean and process variance [7]. The
deviation between process mean and design target can be reduced by locating the
process mean close to the design target without additional cost being incurred.
The process variance can be lowered by tightening the process tolerance, with extra
cost incurred. In case the conventional on-line PCI is used for process capability
analysis during the product and process designs, designer engineers naturally intend
to raise the PCI value by locating the process mean near the target value, and by
reducing the tolerance value to ensure a better product quality. However, simply
increasing the PCI value can easily create additional and unnecessary production
costs that result from extra efforts and expensive devices for ensuring tolerance
control. Hence, there is a need to balance customer demands for quality and pro-
duction costs. In this regard, the off-line PCI value is introduced, in consideration
of quality loss and production cost, simultaneously in this research. The quality
loss is expressed by quality loss function, and the production cost is represented
by tolerance cost function. Then, this new PCI expression can be used as linkage
for concurrent product design and process planning, prior to actual production. The
rationale will be discussed in the latter sections.
The frequently seen PCIs include the Cp, Cpk and Cpm expressions. Cp can be defined as follows [3, 4, 8–10]:

$$C_p = \frac{USL - LSL}{6\sigma}. \qquad (51.1)$$

The expression (USL − LSL) refers to the difference between the upper and lower limits specified by the customer's quality requirement; σ is the standard deviation actually incurred in the production process. However, during the production process the process mean U can be located at positions other than the design target. If the process variance σ² does not change, the above Cp value also remains unchanged; this is the major defect of Cp, since only the spread of the process is reflected and the deviation of the process mean cannot be reflected in the measurement. These are the main reasons why Cpk was developed; the Cpk expression is defined as below:

$$C_{pk} = \min\left\{\frac{USL - U}{3\sigma},\; \frac{U - LSL}{3\sigma}\right\}. \qquad (51.2)$$
There is still a deficiency in the Cpk expression: the same Cpk value may be obtained with different process means and variances. This situation has created a great deal of confusion and uncertainty as to which would be the best process capability among the alternatives. To cope with the above arguments, another form of PCI, Cpm, was developed. Cpm is defined as follows:

$$C_{pm} = \frac{USL - LSL}{6\sqrt{\sigma^2 + (U - T)^2}}. \qquad (51.3)$$
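The three indices are straightforward to evaluate side by side; the sketch below codes Eqs. (51.1) to (51.3) directly and applies them to a hypothetical process using the specification limits LSL = 35, T = 50 and USL = 65 of Fig. 51.1 (the mean and standard deviation are illustrative, not those of processes A, B or C).

```python
import math

def cp(usl, lsl, sigma):
    return (usl - lsl) / (6 * sigma)                                    # Eq. (51.1)

def cpk(usl, lsl, mean, sigma):
    return min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))  # Eq. (51.2)

def cpm(usl, lsl, mean, sigma, target):
    return (usl - lsl) / (6 * math.sqrt(sigma**2 + (mean - target)**2))  # Eq. (51.3)

usl, lsl, target = 65, 35, 50       # limits of Fig. 51.1
mean, sigma = 53, 4                 # hypothetical process
print(cp(usl, lsl, sigma), cpk(usl, lsl, mean, sigma),
      cpm(usl, lsl, mean, sigma, target))
```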
When the process mean is equal to design target, the Cpm can be simplified as
Cp . For the purpose of comparison, three processes: A, B, and C are depicted in
Fig. 51.1. The Cp , Cpk , and Cpm values from processes A, B, and C are shown in
Table 51.1. Because process C has the greatest Cp value, it is might be mistakenly
concluded that process C had the best process capability among processes A, B,
and C, when Cp is considered as a reference value. However, this erroneous conclu-
sion originates from the fact that the Cp value is solely based on the magnitude of
variance, and disregards the negative impact from the process deviation. Similarly,
when Cpk was used in representing the levels of process capability, the process Cpk
values for processes A, B, and C, are all just equal to one. Thus again, quite obvi-
ously, there is difficulty in ordering the superiority of process capability of the three
processes. To overcome the defects appearing in Cp and Cpk expressions, another
PCI expression, Cpm is introduced. Unlike the previous two expressions, Cpm can
simultaneously reflect the impact from process deviation and process variance. This
feature is particularly important because process mean and process variance are only legitimate after the production process, when the realized U and t are no longer controllable variables for a design. However, when Cpm is used as a measurement scale
before the production process, U and t become controllable variables. Then, it is possible that various combinations of U and t will result in the same Cpm value. Thus,
it is difficult to make a distinction among alternatives, in order to make a correct
choice among them. Additionally, the designers would most likely establish the pro-
cess mean U as close as possible to the design target T , within the process feasible
range, and attempt to decrease the process variance as much as possible within the
process capability limits in order to attain a higher Cpm value. In other words, with
the exclusive use of the process mean and process tolerance as the determinants of
conventional Cpm expression, regardless of the cost impact on customer and pro-
duction, there is a tendency for designers to position the process mean as close to
the target value as possible, and solely cut down the process tolerance to lower capa-
bility limit in order to increase the Cpm value. The PCI value thus obtained is clearly misleading.
The degree of proximity reflects the different quality loss according to the cus-
tomer’s view. Reducing the process variance is normally completed by tightening
the tolerance value through tolerance design which usually involves additional cost.
Therefore, in addition to the constraints from feasible ranges and capability lim-
its, the influence exerted by the relevant costs representing the selected process
mean and process tolerance, should be considered as well. This brings us to the
next section, a discussion on the requirement that costs related to process mean and
process tolerance must be contained in PCI expression, when referred to as off-line
process capability analysis, during product design and process planning.
Because various combinations of U and t will result in the same Cpm value, this unhelpful facet prevents the conventional Cpm from being a suitable index for differentiating possible alternatives during product design or process planning. To overcome the above weakness, the lack of consideration of the cost influence of various U and t values should be resolved. As is known, all costs incurred within a product life cycle include the material and production costs, which are incurred before the product reaches the consumer, and the quality loss, which occurs after a sale. In this regard, let the denominator of Cpm be replaced with the total cost, TC, which is the sum of quality and production related costs, namely the quality loss K[σ² + (U − T)²] and the tolerance cost C_M(t_i). To evaluate the tolerance cost, this paper adopts the tolerance cost function developed in the literature [11]: C_M(t) = a + b exp(−ct), where a, b and c are the coefficients of the tolerance cost function and t is the process tolerance.
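A single-characteristic simplification of the proposed index can be coded directly from the quality-loss and tolerance-cost terms just introduced; all numerical values below are hypothetical, and in practice K and the a, b, c coefficients would be estimated from the process's own quality-loss and tolerance-cost data.

```python
import math

def tolerance_cost(t, a, b, c):
    """Tolerance cost function C_M(t) = a + b * exp(-c * t) from [11]."""
    return a + b * math.exp(-c * t)

def cpmc(usl, lsl, mean, sigma, target, k, tol_costs):
    """Cost-based index in the spirit of Eq. (51.7): the quality loss
    K[(U - T)^2 + sigma^2] plus the summed tolerance costs form the
    denominator in place of the variance alone."""
    total_cost = k * ((mean - target) ** 2 + sigma ** 2) + sum(tol_costs)
    return (usl - lsl) / (6 * math.sqrt(total_cost))

# Hypothetical example with two toleranced components.
costs = [tolerance_cost(t, a=1.0, b=20.0, c=8.0) for t in (0.5, 0.8)]
print(cpmc(usl=65, lsl=35, mean=50.5, sigma=2.0, target=50, k=0.5,
           tol_costs=costs))
```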
As discussed in the preceding section, a lower quality loss (better quality) implies a higher tolerance cost, and a higher quality loss (poorer quality) indicates a lower tolerance cost. Hence, the design parameters must be adjusted to reach an economic balance between the reduction in quality loss and the tolerance cost, so that the cost-effective PCI expression, Cpmc, is maximized. The model developed in the following illustration attempts to achieve this objective in the case of multiple quality characteristics.
Before model development, a functional relationship between the dependent variable Y and the independent variables X should be identified and thoroughly understood. Based on this relationship, the resultant overall quality characteristics, such as σ_Y and U_Y, can be estimated from the set of individual quality characteristics in both product and process. The proposed process capability index Cpmc for multiple quality characteristics is:

$$C_{pmc} = \frac{USL - LSL}{6\sqrt{K\left[(U_Y - T_Y)^2 + \sigma_Y^2\right] + \sum_{i=1}^{M} C_M(t_i)}}. \qquad (51.7)$$
[Figure: example circuit with resistors R1–R4, voltage sources E1 and E2, and output V0 taken between the V+ and V− terminals.]
Equations (51.8), (51.9) and (51.10) give, for this example circuit, the response mean U_Y, the response variance σ_Y² and the response tolerance t_Y², respectively, as functions of the component means U_1, ..., U_4 and the component tolerances t_1, ..., t_4.
Objective function:

$$\max\; C_{pmc} = \frac{USL - LSL}{6\sqrt{K\left[(U_Y - T_Y)^2 + \sigma_Y^2\right] + \sum_{i} C_M(t_i)}} \qquad (51.11)$$

Subject to:

$$|T_Y - U_Y| < \Delta = S_Y\, t_Y \qquad (51.12)$$
$$t_{Li} \le t_i \le t_{Ui} \qquad (51.13)$$
$$U_{Li} \le U_i \le U_{Ui} \qquad (51.14)$$
drives the Cpm value to become unusually high. Similar reasons also explain why the total cost, TC, is always higher when Cpm is used as the objective function than is the case for Cpmc. In other words, Cpmc is an appropriate expression for process capability analysis during off-line application, particularly for product design and process planning at the blueprint stage.
51.4 Summary
Conventionally, production related issues are usually dealt with via process analysis after the product design or process planning has been completed. Such approaches are likely to result in poor product quality and high production cost, as a consequence of a lack of consideration of quality and production related costs. These are some of the reasons why concurrent engineers suggest that the issues potentially arising in the production stages should first be considered at the time the new product is developed. This can reduce the time span for introducing a new product onto the market and increase the chance of obtaining a superior edge over competitors. Thus, the present research introduces a PCI measurement, Cpmc, for process capability analysis, to ensure that lower production cost and high product quality can be achieved at the earlier time of the blueprint stage. For an application, prior to production, process engineers can establish the process mean and process tolerance based on the optimized mean and tolerance values obtained via the above three approaches. As expected, the quality values produced after the production process should then be distributed with statistics matching the process mean and process tolerance established before production. As a result, an effective PCI for the product life cycle becomes actualized.
References
1. Carter, D. E. and Baker, B. S., 1992, Concurrent Engineering: The Product Development
Environment for the 1990s, Addison-Wesley, New York.
2. Schneider, H. and Pruett, J., 1995, Use of Process Capability Indices in the Supplier
Certification Process, Qual. Eng., 8: 225–235.
3. Kotz, S. and Johnson, N. L., 1993, Process Capability Indices, Chapman & Hall, London.
4. Kotz, S. and Lovelace, C. R., 1998, Process Capability Indices in Theory and Practice, Arnold,
a Member of the Hodder Headline Group, New York.
5. Shina, S. G. and Saigal, A., 2000, Using Cpk as a Design Tool for New System Development,
Qual. Eng., 12: 551–560.
6. Spiring, F. A., 2000, “Assessing Process Capability with Indices” in Statistical Process Mon-
itoring and Optimization, edited by S. H. Park and G. G. Vining, Marcel-Dekker, New
York.
7. Jeang, A., 2001, Combined Parameter and Tolerance Design Optimization with Quality and
Cost Reduction, Int. J. Prod. Res., 39(5): 923–952.
8. Boyle, R. A., 1991, The Taguchi Capability Index, J. Qual. Technol., 23: 17–26.
9. Chan, L. K., Cheng, S. W., and Spiring, F. A., 1989, A New Measurement of Process Capability: Cpm, J. Qual. Technol., 20: 162–175.
10. Kane, V. E., 1986, Process Capability Index, J. Qual. Technol., 18: 41–52.
11. Chase, K. W., Greenwood, W. H., Loosli, B. G., and Haugland, L. F., 1990, Least Cost Toler-
ance Allocation for Mechanical Assemblies with Automated Process Selection, Manuf. Rev.,
3(1): 49–59.
12. Jeang, A., Liang, F., and Chung, C. P., 2008, Robust Product Development for Multiple Quality
Characteristics Using Computer Experiments and an Optimization Technique, Int. J. Prod.
Res., 46(12): 3415–3439.
13. Boyd, R. R., 1999, Tolerance Analysis of Electronic Circuits Using Matlab, CRC Press, New
York.
Chapter 52
Comparing Different Approaches for Design
of Experiments (DoE)
52.1 Introduction
Lye [1] defined the Design of Experiments (DoE) as a methodology for systemat-
ically applying statistics to experimentation. More precisely, it can be defined as a
series of tests in which purposeful changes are made to the input variables of a pro-
cess or system so that one may observe and identify the reasons for these changes
in the output response(s) [2].
Since experimentation is a frequent activity in industry, most engineers (and scientists) end up using statistics to analyse their experiments, regardless of their
background [3]. Not all engineers are exposed to statistics at the undergraduate level,
and this leads to problems when tools are required in practice; either they don’t know
M. Tanco (B)
Department of Industrial Management Engineering at TECNUN (University of Navarra),
Paseo Manuel Lardizabal 13, 20018 San Sebastian, Spain
E-mail: mtanco@tecnun.es
what type of experimental strategy is required for their problem and they select
something inappropriate, or they select the correct strategy and apply it incorrectly,
or they select the wrong strategy in which case it probably doesn’t matter whether
they use it correctly or not [4].
OFAT (one-factor-at-a-time) is an old-fashioned strategy, usually taught at uni-
versities and still widely practiced by companies. It consists of varying one variable
at a time, with all other variables held constant. On the other hand, DoE is an
efficient technique for experimentation which provides a quick and cost-effective
method for solving complex problems with many variables. Numerous case studies
in different areas of application prove the advantages and potential of DoE [5].
The statistical approach to Design of Experiments and the Analysis of Variance
(ANOVA) technique was developed by R.A. Fisher in 1920. Since then, many have
contributed to the development and expansion of this technique. Most techniques,
defined throughout this article as “Classical”, have adapted Fisher’s ideas to various
industries, including agriculture. However, engineers Shainin and Taguchi are espe-
cially influential due to their contribution of two new approaches to DoE. Both new
approaches offer more than just Design of Experiments, as they can be considered
quality improvement strategies [6].
All three approaches to DoE (Classical, Shainin and Taguchi) are far superior
to OFAT. The aforementioned approaches have their proponents and opponents, and
the debate between them is known to become heated at times. Dr. Deming once said,
“Any technique is useful as long as the user understands its limitations.” Therefore,
the aim of this paper is to present each approach along with its limitations.
Most engineers have surely at least heard of Design of Experiments (DoE),
Taguchi Methods or the Shainin System™. Yet how many of them can really
say that they understand the differences among them? Or that they are capable of
correctly deciding when to use which technique?
The answer to these questions is heavily influenced by the knowledge and expe-
rience one has of each approach to Design of Experiments. Since the majority of
practitioners only have experience with a single approach, this article aims to open
their minds and compare available approaches.
Each approach to Design of Experiments will be briefly described in chronolog-
ical order in the following section. In Section 52.3, the main criticisms published in
literature about each approach are highlighted. Finally, the conclusion and recom-
mendations for engineers and managers working in the manufacturing industry are
presented in Section 52.4.
Although books on Design of Experiments did not begin to appear until the twen-
tieth century, experimentation is certainly about as old as mankind itself [7]. The
one-factor-at-a-time (OFAT) strategy has been, and continues to be, used for many years.
However, these experimentation strategies became outdated in the early 1920s when
Ronald Fisher discovered much more efficient methods of experimentation based
on factorial designs [1]. Those designs study every possible combination of fac-
tor settings, and are especially useful when experimentation is cheap or when
the number of factors under study is small (less than five). Fisher first applied
factorial designs to solve an agricultural problem, where the effect of multiple vari-
ables was simultaneously (rain, water, fertilizer, etc.) studied to produce the best
crop of potatoes. His experiences were published in 1935 in his book “Design of
Experiments” [8].
Fractional Factorial designs were proposed in the 1930s and 1940s in response
to the overwhelming number of experiments that are involved with full factorial
designs. These designs consist of a carefully selected fraction of the full factorial
experimental design. They provide a cost-effective way of studying many factors
in one experiment, at the expense of ignoring some high-order interactions. This is
considered low risk, as high-order interactions are usually insignificant and
difficult to interpret anyway.
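To make the trade-off concrete, the following short Python sketch (not part of the original chapter; the factor names and the defining relation I = ABC are chosen purely for illustration) enumerates a 2³ full factorial and derives a 2^(3-1) half fraction from it.

from itertools import product

# Full 2^3 factorial: every combination of three two-level factors (-1 = low, +1 = high).
factors = ["A", "B", "C"]
full_factorial = [dict(zip(factors, levels))
                  for levels in product((-1, +1), repeat=len(factors))]

# 2^(3-1) half fraction built from the defining relation I = ABC:
# keep only the runs in which the product A*B*C equals +1.
half_fraction = [run for run in full_factorial
                 if run["A"] * run["B"] * run["C"] == +1]

print("full factorial:", len(full_factorial), "runs")          # 8 runs
print("half fraction (I = ABC):", len(half_fraction), "runs")  # 4 runs
for run in half_fraction:
    print(run)

In the half fraction each main effect is aliased with a two-factor interaction (A with BC, B with AC, C with AB), which is exactly the high-order information sacrificed in exchange for halving the number of runs.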
According to Montgomery [2], the second stage in the era of the classical
approach to DoE began in the 1950s when Box & Wilson [9] developed what was
later called Response Surface Methodology (RSM). Their methodology allowed
DoE to be applied in the chemical industry and afterwards in other industries as
well. They touted the advantages of industrial experiments compared to agricultural
experiments in the two following areas: (a) Immediacy: the response can be obtained
more quickly than in agricultural experiments, where results can sometimes take up to a
year to obtain; (b) Sequentiality: the experimenter is able to carry out a few experiments,
analyse them and then plan new experiments based on the findings obtained
from the previous ones. In this era Central Composite designs (CCD) and
Box-Behnken designs (BBD) were created.
The third era of the classical approach started with the appearance of the Taguchi
and Shainin approaches in the US in the 1980s as simple and efficient methods
of experimentation. Eventually, statisticians and academics began to acknowledge
the value of certain engineering ideas of Taguchi and Shainin. This led to positive
changes: many ideas of the new approaches were adopted (for example, the reduction
of variance became an important research area within classical design), and importance
was given to constructing methodologies and guidelines to ease application.
In this last era, the democratization of statistics, thanks in part to software pack-
ages and the spread of Six Sigma thinking throughout industries [10], helped spread
Design of Experiments to all types of industries. Moreover, increasing interest
was devoted to Design of Experiments in the literature [11]. Furthermore, software
packages have made the construction of graphs and calculus easier, further facilitat-
ing the application of DoE. Recent books by Funkenbusch (2005) [12] and Robinson
(2000) [13] show how this approach can be understood by engineers.
Many scientists and statisticians have contributed to DoE development, making
the classical approach a valid and robust methodology for Design of Experiments.
Usual reference books are Box et al. [14] and Montgomery [2].
The Shainin SystemTM is the name given to a problem solving system developed by
Dorian Shainin. In 1975, he established his own consulting practice: Shainin LLC.
His sons Peter and Richard later joined the family business. Shainin described his
colourful method as the American approach to problem solving, with the same goals
as the Taguchi approach [25].
Shainin viewed his ideas as private intellectual property, which he was known to
sell to clients to help them gain a competitive advantage [26]. As the Shainin System™
is a legally protected trademark and some of its methods are rarely discussed in
the literature, it is difficult to obtain a complete overview of the approach [27].
Keki R. Bhote was authorised to publish information in the first and only book
about these methods. His company, Motorola, won the Malcolm Baldrige National
Quality Award, which stipulates that the winner's methods be shared with other US
companies [28]. Interest in Dorian Shainin's problem solving techniques rose with the
1991 publication of this book (a second edition was published in 2000 [28]).
Dorian Shainin included several techniques – both known and newly invented –
in a coherent step-by-step strategy for process improvement in manufacturing envi-
ronments [27]. Among those powerful tools, he considered Design of Experiments
as the centrepiece. Moreover, he didn’t believe that DoE was limited to the exclu-
sive province of professionals, but could rather be extended so that the whole factory
could be turned loose on problem-solving [28].
The foundation of Shainin’s DoE strategy rests on:
The Pareto Principle: among many, even hundreds of candidate factors, a single
one will be the root cause of variation of the response y. That root cause is called
the Red X® and may be a single variable or the interaction of two or more separate
variables [25]. There may then be a second or a third significant cause, called the
Pink X® and the Pale Pink X®, respectively.
Shainin strongly objected to the use of the Fractional Factorial technique. He
proposed instead to identify and diagnostically reduce most of the sources of
variation down to a manageable number (three or four), at which time he allowed
the use of full factorials [29].
"Talk to the parts, they are smarter than engineers." First, talk to the parts. Then,
talk to the workers on the firing line. Last, the least productive method is to
talk to the engineers [28].
The Shainin System™ presents many tools in a sequence of progressive problem solving.
It can be divided into three main groups: Clue Generation tools, Formal DoE
and Transition to SPC. The Shainin DoE technique considers as many variables as
can be identified [30]. The first group tries to generate clues (like Sherlock Holmes)
with a number of tools (Multi-Vari, Components Search™, Paired Comparison™,
Process Search™ and Concentration Chart™) to reduce the number of variables
involved in the problem through on-line experimentation. In the second stage, the
Variable Search™ technique is used to reduce the number of variables by sequentially
(not randomly) experimenting off-line, based on engineering judgement with
binary search. Once a few factors are obtained, full factorials are used to analyse
their effects and interactions. Afterwards, other techniques (B vs. C™, Response
Surface, and Scatter Plots) are used to confirm the results and optimise them when
necessary. Finally, in the last step, Positrol™, Process Certification and Pre-Control
are recommended to guarantee that results will be obtained in the future.
Up to the time when “Taguchi Methods” were propagated in the U.S., Design of
Experiments (classical) and associated techniques were treated as mathematical
tools, more like an adjunct to an engineer’s technical resources for the study of
product and process characteristics [17].
Taguchi and Shainin were the biggest critics of the Classical Approach. They
believed that managers, engineers and workers found the use of DoE to be a
complicated, ineffective and frustrating experience [28].
However, it is worth mentioning that after a decade of strong opposition to
their new ideas, the “classical” proponents began to acknowledge the importance
of some of the new ideas proposed by Taguchi and Shainin. For example, there
have been many attempts to integrate Taguchi’s parameter design principle with
well-established statistical techniques [18]. As a consequence, the exploitation of
response surface methodology for variance reduction has become a major area of
research [31, 32]. Moreover, the simplicity demanded by critics was reached (or at
least greatly improved) by the aid of software capable of designing and analysing
experimental design. Furthermore, great emphasis was placed on presenting guide-
lines and including graphical tools to clarify every step in the planning stage. For
example, Pareto charts of effects or the use of Multi-Vari charts to correctly
define the problem were included.
52.3.2 Taguchi
There was great debate about these methods during the first decade after their
appearance in the U.S. [21–23, 33, 34]. Taguchi’s approach was criticised for being
inefficient and often ineffective. Moreover, Shainin was highly critical of Taguchi,
challenging the myth of the “secret super Japanese weapon” [25].
Nair [23] identified three general criticisms of Taguchi's work: (a) the use of
the signal-to-noise ratio (SNR) as a performance measure and basis of analysis, (b) his
analysis methods and (c) his choice of experimental designs.
Many have criticized the use of SNR as a performance measurement. Better
approaches to the parameter design problem have been addressed in recent years,
such as Box’s performance measurement [35] or the Response Surface approach
[31].
There has also been much criticism of the new analysis techniques proposed
by Taguchi. Most of them are more difficult and less effective than previous ones.
An example is the use of accumulation analysis [21, 36].
Finally, and most importantly, there is the criticism of the experimental designs. Firstly,
orthogonal arrays were criticised for underestimating interactions. In response to
this criticism, Taguchi stated [29], “A man who does not believe in the existence of
non-linear effects is a man removed from reality”. He believes, however, in the abil-
ity of engineers to decide on the levels of factors (called sliding levels), in order to
make some interactions for that particular experiment insignificant. Unfortunately,
there is no evidence that interactions can simply be avoided in this way; it is accepted
as a matter of faith among Taguchi's followers [17].
On the other hand, the designs proposed by Taguchi to simultaneously study
both the mean and the variance (crossed arrays) were also criticized, since they
require many runs and generally do not allow the study of control-factor interactions.
Therefore, Welch [37], among others, proposed using a combined array to reduce
the number of runs. There has been much debate in recent years on this topic and it
is not yet clear what the best approach is. Pozueta et al. [38] and Kunert et al. [39]
have demonstrated, for example, how classical designs are sometimes worse than
Taguchi’s designs.
It is worth mentioning that whereas some classical statisticians and academics
have acknowledged the value of certain engineering ideas of Taguchi, the Taguchi
camp has shown few signs of compromise, steadfastly vouching for the unchallenge-
able effectiveness of the Taguchi package of procedures in its original form [17].
For more details on the technical debate of Taguchi’s approach, refer to Nair [23],
Pignatello [21], Box [33] and Robinson [22], among others.
52.3.3 Shainin
As the Shainin System™ is legally protected, one of the only ways to learn about the
method is to attend Shainin's classes. The only other alternative is to read Bhote's book [28].
Unfortunately, this book is full of hyperbolic, over-optimistic, extremely biased and
sometimes even intellectually dishonest claims in its urge to prove that Shainin's
techniques are superior to all other methods. For example, Bhote claims, [28] “We
are so convinced of the power of the Shainin DoE that we throw out a challenge. Any
problem that can be solved by classical or Taguchi Methods, we can solve better,
faster, and less expensively with the Shainin methods”. Moreover, he presents DoE
as the panacea to all kinds of problems and criticises other alternatives such as Six
Sigma and TQM without academic basis behind his claims.
Although there is little written in industry literature about Shainin, the existing
material is enough to fuel criticism of his methods. The most criticised tech-
niques are Variable Search™ [29, 30, 40, 41] and Pre-Control [26, 41]. Variable
Search™ has much in common with the costly and unreliable "one-factor-at-a-
time" method [29]. Shainin's technique relies heavily on engineering judge-
ment. Its weakness lies in the skill and knowledge required to carry out two tasks:
firstly, to correctly identify the variables and secondly, to allocate those variables to
the experiment [30].
On the other hand, although Pre-Control is presented as an alternative to sta-
tistical process control (SPC), it is not an adequate substitute. In particular, it is
not well-suited for poorly performing processes where its use will likely lead to
unnecessary tampering [26].
A recent review and discussion by Steiner et al. [41, 42] provides an in-depth
look at the Shainin System™ and its criticisms.
Three different approaches to DoE have been presented throughout the last two
sections. These approaches are all more successful than OFAT strategies.
However, this does not prove that any one technique is necessarily the best. Therefore,
we will give some recommendations and conclusions on each approach based on
our experience and research.
Firstly, we must commend Shainin for stressing the importance of statisti-
cally designed experiments [29]. His methods are easy to learn and can be applied
to ongoing processing during full production. Despite their obvious simplicity
(which stems from the fact that they are, most of the time, simple versions of
one-factor-at-a-time methods), they do not seem to offer a serious alternative to
References
1. Lye, L.M., Tools and toys for teaching design of experiments methodology. In 33rd Annual
General Conference of the Canadian Society for Civil Engineering. 2005 Toronto, Ontario,
Canada.
2. Montgomery, D.C., Design and Analysis of Experiments. 2005, New York: Wiley.
3. Gunter, B.H., Improved Statistical Training for Engineers – Prerequisite to quality. Quality
Progress, 1985. 18(11): pp. 37–40.
4. Montgomery, D., Applications of Design of Experiments in Engineering. Quality and
Reliability Engineering International, 2008. 24(5): pp. 501–502.
5. Ilzarbe, L. et al., Practical Applications of Design of Experiments in the Field of Engineering.
A Bibliographical Review. Quality and Reliability Engineering International, 2008. 24(4):
pp. 417–428.
6. De Mast, J., A Methodological Comparison of Three Strategies for Quality Improvement.
International Journal of Quality and Reliability Management, 2004. 21(2): pp. 198–212.
7. Ryan, T.P., Modern Experimental Design. 2007, Chichester: Wiley.
8. Fisher, R.A., The Design of Experiments. 1935, Edinburgh: Oliver and Boyd.
9. Box, G.E.P. and K.B. Wilson, On the Experimental Attainment of Optimum Conditions.
Journal of the Royal Statistical Society, 1951. Series B(13): pp. 1–45.
10. Montgomery, D.C., Changing Roles for the Industrial Statisticians. Quality and Reliability
Engineering International, 2002. 18(5): pp. 3.
11. Booker, B.W. and D.M. Lyth, Quality Engineering from 1988 Through 2005: Lessons from
the Past and Trends for the Future. Quality Engineering, 2006. 18(1): pp. 1–4.
12. Funkenbusch, P.D., Practical Guide to Designed Experiments. A Unified Modular Approach.
2005, New York: Marcel Dekker.
13. Robinson, G.K., Practical Strategies for Experimentation. 2000, Chichester: Wiley.
14. Box, G.E.P., J.S. Hunter, and W.G. Hunter, Statistics for Experimenters – Design, Innovation
and Discovery. Second Edition. Wiley Series in Probability and Statistics, ed. 2005, New York:
Wiley.
15. Taguchi, G., Introduction to Quality Engineering. 1986, White Plains, NY: UNIPUB/Kraus
International.
16. Taguchi, G., System of Experimental Design: Engineering Methods to Optimize Quality and
Minimize Cost. 1987, White Plains, NY: UNIPUB/Kraus International.
17. Goh, T.N., Taguchi Methods: Some Technical, Cultural and Pedagogical Perspectives. Quality
and Reliability Engineering International, 1993. 9(3): pp. 185–202.
18. Tay, K.-M. and C. Butler, Methodologies for Experimental Design: A Survey, Comparison
and Future Predictions. Quality Engineering, 1999. 11(3): pp. 343–356.
19. Arvidsson, M. and I. Gremyr, Principles of Robust Design Methodology. Quality and
Reliability Engineering International, 2008. 24(1): pp. 23–35.
20. Roy, R.K., Design of Experiments Using the Taguchi Approach: 16 steps to Product and
Process Improvement. 2001, New York: Wiley.
21. Pignatello, J. and J. Ramberg, Top Ten Triumphs and Tragedies of Genichi Taguchi. Quality
Engineering, 1991. 4(2): pp. 211–225.
22. Robinson, T.J., C.M. Borror, and R.H. Myers, Robust Parameter Design: A Review. Quality
and Reliability Engineering International, 2004. 20(1): pp. 81–101.
23. Nair, V.N., Taguchi’s Parameter Design: A Panel Discussion. Technometrics, 1992. 31(2):
pp. 127–161.
24. Taguchi, G., S. Chowdhury, and Y. Wu, Taguchi’s Quality Engineering Handbook. First
edition. 2004, New York: Wiley Interscience.
25. Shainin, D. and P. Shainin, Better than Taguchi Orthogonal Tables. Quality and Reliability
Engineering International, 1988. 4(2): pp. 143–149.
26. Ledolter, J. and A. Swersey, An Evaluation of Pre-Control. Journal of Quality Technology,
1997. 29(2): pp. 163–171.
27. De Mast, J. et al., Steps and Strategies in Process Improvement. Quality and Reliability
Engineering International, 2000. 16(4): pp. 301–311.
28. Bhote, K.R. and A.K. Bhote, World Class Quality. Using Design of Experiments to Make it
Happen. Second edition. 2000, New York: Amacom.
29. Logothetis, N., A perspective on Shainin’s Approach to Experimental Design for Quality
Improvement. Quality and Reliability Engineering International, 1990. 6(3): pp. 195–202.
30. Thomas, A.J. and J. Antony, A Comparative Analysis of the Taguchi and Shainin DoE Tech-
niques in an Aerospace Environment. International Journal of Productivity and Performance
Management, 2005. 54(8): pp. 658–678.
31. Vining, G.G. and R.H. Myers, Combining Taguchi and Response Surface Philosophies:
A Dual Response Approach. Journal of Quality Technology, 1990. 22(1): pp. 38–45.
32. Quesada, G.M. and E. Del Castillo, A Dual Response Approach to the Multivariate Robust
Parameter Design Problem. Technometrics, 2004. 46(2): pp. 176–187.
33. Box, G.E.P., S. Bisgaard, and C. Fung, An Explanation and Critique of Taguchi’s Contribution
to Quality Engineering. International Journal of Quality and Reliability Management, 1988.
4(2): pp. 123–131.
34. Schmidt, S.R. and R.G. Launsby, Understanding Industrial Designed Experiments. Fourth
Edition. 2005, Colorado Springs, CO: Air Academy Press.
35. Box, G.E.P., Signal to Noise Ratios, Performance Criteria, and Transformations. Technomet-
rics, 1988. 30(1): pp. 1–17.
36. Box, G.E.P. and S. Jones, An Investigation of the Method of Accumulation Analysis. Total
Quality Management & Business Excellence, 1990. 1(1): pp. 101–113.
37. Welch, W.J. et al., Computer Experiments for Quality Control by Parameter Design. Journal
of Quality Technology, 1990. 22(1): pp. 15–22.
38. Pozueta, L., X. Tort-Martorell, and L. Marco, Identifying Dispersion Effects in Robust Design
Experiments - Issues and Improvements. Journal of Applied Statistics, 2007. 34(6): pp. 683–
701.
39. Kunert, J. et al., An Experiment to Compare Taguchi’s Product Array and the Combined Array.
Journal of Quality Technology, 2007. 39(1): pp. 17–34.
40. Ledolter, J. and A. Swersey, Dorian Shainin’s Variables Search Procedure: A Critical
Assessment. Journal of Quality Technology, 1997. 29(3): pp. 237–247.
41. De Mast, J. et al., Discussion: An Overview of the Shainin SystemTM for Quality Improve-
ment. Quality Engineering, 2008. 20(1): pp. 20–45.
42. Steiner, S.H., J. MacKay, and J. Ramberg, An Overview of the Shainin SystemTM for Quality
Improvement. Quality Engineering, 2008. 20(1): pp. 6–19.
43. Tanco, M. et al., Is Design of Experiments Really Used? A Survey of Basque Industries.
Journal of Engineering Design, 2008. 19(5): pp. 447–460.
44. Viles, E. et al., Planning Experiments, the First Real Task in Reaching a Goal. Quality
Engineering, 2009. 21(1): pp. 44–51.
Chapter 53
Prevention of Workpiece Form Deviations in
CNC Turning Based on Tolerance Specifications
53.1 Introduction
Once machining operations are finished, the deflection due to cutting forces disappears
and the elastic recovery of the material causes the workpiece axis to return to its original
position, but this leaves a form defect in the workpiece surfaces equivalent to the
deflection that the axis had undergone. Actually, turning of cylindrical or conical
surfaces leads to a third-order polynomial surface profile [4] (Fig. 53.1a). Similarly,
facing of front surfaces leads to conical surfaces with a lack of perpendicularity with
regard to the symmetry axis (Fig. 53.1b). The conicity of real front faces is directly
related to the turn θ(z) of the section and to the total elastic deflection δ_T(z). For
the section in which the cutting tool is applied (z = a) this relation will be:

tan θ(a) = (d/dz) δ_T(a)     (53.1)
The total deviation in the radial direction δ_T(z) of a turned component at a distance z
from the chuck is composed of simultaneous factors related to deformations caused
by cutting forces, namely the spindle-chuck system δ_sc(z), the toolpost δ_tp(z),
the workpiece deflection δ_p(z) and the thermal effects [5, 11] δ_th(z). This can be
expressed as:

δ_T(z) = δ_sc(z) + δ_tp(z) + δ_p(z) + δ_th(z)     (53.2)
Deviations can be obtained for the tangential (Z) and radial (X ) directions
(Fig. 53.2). The contribution of the spindle-chuck system can be expressed for these
directions [5] as:
δ_sc,r(z) = (F_r/k_2r + F_r/k_3r)·z² + (F_a·D/(2·k_3r) + F_a·D/(2·k_2r))·z + F_r/k_1r + F_a·D·L_c/(2·k_2r) + F_r·L_c²/k_2r     (53.3)
δ_sc,t(z) = (F_t/k_2t + F_t/k_3t)·z² + (F_a·D/(2·k_3t) + F_a·D/(2·k_2t))·z + F_t/k_1t + F_a·D·L_c/(2·k_2t) + F_t·L_c²/k_2t     (53.4)
where F_r is the radial component of the cutting force, l_t, A_t and E_t are the cantilever
length, cross section and elastic modulus of the tool, respectively, and l_h, A_h and
E_h are the length, cross section and elastic modulus of the tool-holder (Fig. 53.2).
Based on strain energy, the contribution of the workpiece deflection for the chuck
clamping method, when the cutting force is applied at z = a (distance between force and
chuck), will be:
δ_p(a) = (4·F/(π·E)) · Σ_i ∫[Z_i, Z_i+1] (a − z)² / ((r_i,ext)⁴ − (r_i,int)⁴) dz + (χ·F/(π·G)) · Σ_i ∫[Z_i, Z_i+1] 1 / ((r_i,ext)² − (r_i,int)²) dz     (53.6)
where i represents each zone with constant cross section, E is the elastic modulus,
G the shear modulus, I the moment of inertia and χ is the shear factor. For the
between-centre clamping method, the expression is:
δ_p(a) = (4·F·(L − a)²/(π·E·L²)) · Σ_i ∫[Z_i, Z_i+1] z² / ((r_i,ext)⁴ − (r_i,int)⁴) dz + (χ·F·(L − a)²/(π·G·L²)) · Σ_i ∫[Z_i, Z_i+1] 1 / ((r_i,ext)² − (r_i,int)²) dz     (53.7)
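The zone-wise integrals in Eqs. (53.6) and (53.7) lend themselves to simple numerical evaluation. The following Python sketch is not part of the original chapter: it assumes the strain-energy form reconstructed above for the chuck-clamping case of Eq. (53.6), and the zone boundaries, radii, force and material constants are hypothetical placeholders.

import math

def workpiece_deflection_chuck(a, zones, F, E, G, chi, n_steps=200):
    """Approximate deflection at the cutting point z = a, Eq. (53.6) as reconstructed.
    zones: list of (z_start, z_end, r_ext, r_int) sections of constant cross section."""
    bending, shear = 0.0, 0.0
    for z0, z1, r_ext, r_int in zones:
        z1 = min(z1, a)          # only material between the chuck (z = 0) and the tool carries load
        if z1 <= z0:
            continue
        h = (z1 - z0) / n_steps
        for k in range(n_steps):  # midpoint rule within each zone
            z = z0 + (k + 0.5) * h
            bending += (a - z) ** 2 / (r_ext ** 4 - r_int ** 4) * h
            shear += 1.0 / (r_ext ** 2 - r_int ** 2) * h
    return 4.0 * F / (math.pi * E) * bending + chi * F / (math.pi * G) * shear

# Hypothetical two-zone solid workpiece, radial force of 200 N applied at a = 0.12 m.
zones = [(0.00, 0.06, 0.020, 0.0), (0.06, 0.12, 0.015, 0.0)]   # metres
print(workpiece_deflection_chuck(a=0.12, zones=zones, F=200.0,
                                 E=210e9, G=81e9, chi=1.1))

The between-centres case of Eq. (53.7) follows the same pattern, with the (L − a)²/L² factor and the z² weighting in the bending integral.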
Finally, the contribution of the thermal drift takes place mainly in the radial direction
and depends on factors such as spindle speed, feed rate, machine-tool operation
time and ambient temperature. Some authors use these parameters as input data to a
neural network whose output is the radial thermal drift [10, 11].
Due to the form error, the real distance between two front faces depends on the
zone in which the measurement is taken. Actual measurements will lie between the
values L′_min and L′_max (Fig. 53.3a) and they will be admissible when the following
expression is satisfied:

L′_max ≤ L_n + dL_u and L′_min ≥ L_n + dL_l     (53.8)

where L_n is the nominal distance and dL_u and dL_l are the upper and lower limits
of the longitudinal tolerance, respectively. Considering the distances e_i and e_j in Fig. 53.3a,
the relation between them and the deflections can be established (Eqs. 53.9–53.10).
Taking into account Eqs. (53.9)–(53.10) and that the tolerance zone t is the difference
between the upper and lower limits, the condition for the measurement to meet the tolerance is:

e_i + e_j ≤ dL_u − dL_l = t     (53.11)
Due to the form error of the cylindrical surfaces, the diameter of the workpiece
depends on the location of the measurement point, varying between a maximum
and a minimum value (D′_Md and D′_md) (Fig. 53.3b). The measurement is admissible
when the following expression is satisfied:

D′_Md ≤ D_n + dD_u and D′_md ≥ D_n + dD_l     (53.13)

where D_n is the nominal diameter and dD_u and dD_l are the upper and lower limits
of the diametrical tolerance, respectively.
Once the cylindrical surface is machined, the maximum error is given by the difference
between the maximum δ_TMd and the minimum δ_Tmd deflections. The following
relation can be deduced from the geometry (Fig. 53.3b):

D′_Md − D′_md = 2·|δ_TMd − δ_Tmd|     (53.14)

By considering Eqs. (53.13)–(53.14) and that the tolerance zone t is the difference
between the upper and lower deviations, the condition to meet the tolerance is:

2·|δ_TMd − δ_Tmd| ≤ dD_u − dD_l = t     (53.15)
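A minimal sketch of how a condition such as Eq. (53.15) can be turned into an acceptance check during process planning is shown below; it is not from the original chapter and the numerical values are illustrative only.

def meets_diametral_tolerance(delta_t_max, delta_t_min, d_upper, d_lower):
    """Eq. (53.15): the diametral form error produced by the maximum and minimum
    radial deflections must stay within the tolerance zone t = d_upper - d_lower."""
    t = d_upper - d_lower
    return 2.0 * abs(delta_t_max - delta_t_min) <= t

# Hypothetical deflections (mm) predicted with Eqs. (53.2)-(53.7)
# for a diametral tolerance of +/- 0.02 mm:
print(meets_diametral_tolerance(0.018, 0.005, d_upper=+0.02, d_lower=-0.02))  # True
print(meets_diametral_tolerance(0.030, 0.004, d_upper=+0.02, d_lower=-0.02))  # False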
The definition of each type of geometrical tolerance has been considered according to the ISO
standard (ISO 1101:1983). According to it, the flatness tolerance establishes an
acceptance zone t with respect to the nominal orientation of the controlled face,
within which the real machined surface must lie. In order to satisfy this condition,
the form error must be lower than the width of the tolerance zone (Fig. 53.3c).
Considering this, and with the distance e_i calculated as in Eq. (53.12), the condition
will be:

e_i ≤ t     (53.16)
|δ_TMd − δ_Tmd| ≤ t     (53.18)
This tolerance specifies a zone between two enveloping spherical surfaces whose
difference in diameter is the value of the tolerance zone t and whose centres are located
on the theoretical surface. According to the standard ISO 3040:1990, for the case of a
conical surface the tolerance zone becomes the space between two cones with the
same angle as the datum, each offset from that cone by half the tolerance value.
All diameters of the real machined surface must lie within the tolerance zone,
including the diameters at which the deviation is maximum, D′_Md, and minimum, D′_md,
which will satisfy:

|D′_Md − D_nMd| / 2 ≤ t / (2·cos α) and |D′_md − D_nmd| / 2 ≤ t / (2·cos α)     (53.19)

where D_nMd and D_nmd are the nominal diameters of the cone measured at the
positions of maximum and minimum diametrical deviation (Fig. 53.3e).
On the other hand, the workpiece deflections in the zones of maximum and minimum
deviation (δ_TMd and δ_Tmd) can be expressed as:

δ_TMd = |D′_Md − D_nMd| / 2 and δ_Tmd = |D′_md − D_nmd| / 2     (53.20)

The final condition is derived from Eqs. (53.19)–(53.20):

δ_TMd + δ_Tmd ≤ t / cos α     (53.21)
Parallelism is applied between front faces in turned parts. Let i be the datum and
j the related face. According to the definition, the related surface must lie within
a space between two planes parallel to the datum and separated from one another by the
value of the tolerance t (Fig. 53.3f). Although the datum itself has a form error,
according to the standard ISO 5459:1981 this surface will be considered perpendicular
to the part axis and, consequently, so will the planes which define the tolerance zone.
Considering this, and calculating e_j as in Eq. (53.12), the geometrical condition to
meet the tolerance will be:

e_j ≤ t     (53.22)
Perpendicularity is applied between a front face and a feature axis in turned parts.
Two different situations must be considered, depending on which of the two elements
is the datum and which is the controlled one.
When the axis is the datum and a face is controlled, the tolerance zone is the
space between two planes perpendicular to the datum axis and separated from one another by
the value of the tolerance t (Fig. 53.3g). Considering this, and calculating e_j as in
Eq. (53.12), the condition will be the same as in Eq. (53.22).
On the other hand, when a face is the datum and the axis is controlled, the tolerance
zone is the space within a cylinder of diameter t whose axis is perpendicular to the
datum face. The elastic recovery of the part axis after machining implies that any
deviation of this element does not depend directly on the cutting action but on other
technological aspects, such as poor alignment of the workpiece in the lathe. Therefore,
no relation can be established between this error and the deviation caused by
cutting forces.
This tolerance limits the relative deviation between two zones of the part axis. As in
the previous case, the elastic recovery of the workpiece axis after machining implies
that the possible error is not due to deviations caused by cutting forces but to other
causes.
For circular-radial runout, the tolerance zone is the area between two concentric
circles located in a plane perpendicular to the axis, whose difference in radii is the
tolerance t and whose centre is located on the datum axis. Since form errors derived
from cutting forces are completely symmetrical with respect to the workpiece rotation
axis, the error evaluated through this tolerance does not depend on these
forces, but on the workpiece deflection caused by the clamping force or by a lack of
workpiece alignment.
A similar situation takes place regarding circular-axial runout.
The optimization of cutting conditions in turning is the final stage of process planning,
in which not only mathematical considerations about the objective function
(e.g., time, cost or benefit) have to be made, but also several constraints that
restrict the best solution have to be taken into account.
Common constraints are related to the ranges of the cutting parameters (cutting speed,
feed rate and depth of cut), the range of tool life and other operating limits such as surface
finish, maximum power consumption and maximum allowed force. The cutting
force constraint is imposed to limit the deflection of the workpiece or cutting tool,
which results in dimensional error, and to prevent chatter [12]. Traditionally, the value
of this constraint has not been clearly determined. Nevertheless, the relationships
between workpiece deviations and tolerances described in the previous sections can
also be regarded as relationships between cutting forces and the maximum deviations
allowed and, therefore, they can all be used as optimization constraints.
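As a rough illustration of how these tolerance-based limits could enter the optimization, the following Python sketch (not from the original chapter; the force model, coefficients and tolerance value are hypothetical placeholders) performs a simple grid search that maximizes a material-removal-rate proxy subject to a deflection constraint of the form of Eq. (53.15).

def radial_force(feed, depth, k_c=1800.0):
    # Hypothetical cutting-force model: force grows with feed (mm/rev) and depth of cut (mm).
    return k_c * feed * depth                    # N

def max_deflection(force, stiffness=9.0e3):
    # Hypothetical lumped compliance: deflection (mm) proportional to the radial force.
    return force / stiffness

tolerance_t = 0.04                                # diametral tolerance zone (mm)
best = None
for feed in [f / 100 for f in range(5, 41, 5)]:       # 0.05 ... 0.40 mm/rev
    for depth in [d / 10 for d in range(5, 31, 5)]:   # 0.5 ... 3.0 mm
        deflection = max_deflection(radial_force(feed, depth))
        # assume the minimum deflection along the pass is negligible,
        # so Eq. (53.15) reduces to 2*deflection <= t
        if 2.0 * deflection <= tolerance_t:
            mrr = feed * depth                    # removal-rate proxy
            if best is None or mrr > best[0]:
                best = (mrr, feed, depth)

print("best (mrr proxy, feed, depth):", best)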
53.6 Conclusions
This work provides a mathematical model based on strain energies for prediction
of deviations in the turning process of workpieces with complex external and/or
internal geometry, taking into account the main clamping procedures in turning.
Acknowledgements This work is part of a research project supported by the Spanish Education
and Science Ministry (DPI2007-60430) and FEDER.
References
1. S. Yang, J. Yuan, and J. Ni, Real-time cutting force induced error compensation on a turning
center, Int. J. Mach. Tools Manuf. 37(11), 1597–1610 (1997).
2. A.-V. Phan, L. Baron, J.R. Mayer, and G. Cloutier, Finite element and experimental studies of
diametrical errors in cantilever bar turning, J. Appl. Math. Model. 27(3), 221–232 (2003).
3. L.Z. Qiang, Finite difference calculations of the deformations of multi-diameter workpieces
during turning, J. Mat. Proc. Tech. 98(3), 310–316 (2000).
4. L. Carrino, G. Giorleo, W. Polini, and U. Prisco, Dimensional errors in longitudinal turning
based on the unified generalized mechanics of cutting approach. Part II: Machining pro-
cess analysis and dimensional error estimate, Int. J. Mach. Tools Manuf. 42(14), 1517–1525
(2002).
5. G. Valiño, S. Mateos, B.J. Álvarez, and D. Blanco, Relationship between deflections caused
by cutting forces and tolerances in turned parts, Proceedings of the 1st Manufacturing
Engineering Society International Conference (MESIC), Spain (2005).
6. D. Mladenov, Assessment and compensation of errors in CNC turning, Ph.D. thesis, UMIST,
UK (2002).
7. S. Hinduja, D. Mladenov, and M. Burdekin, Assessment of force-induced errors in CNC
turning, Annals of the CIRP 52(1), 329–332 (2003).
8. J. Yang, J. Yuan, and J. Ni, Thermal error mode analysis and robust modelling for error
compensation on a CNC turning center, Int. J. Mach. Tools Manuf. 39(9), 1367–1381 (1999).
9. V.A. Ostafiev and A. Djordjevich, Machining precision augmented by sensors, Int. J. Prod.
Res. 37(1), 1999, pp. 91–98.
10. X. Li, P.K. Venuvinod, A. Djorjevich, and Z. Liu, Predicting machining errors in turning using
hybrid learning, Int. J. Adv. Manuf. Tech. 18, 863–872 (2001).
11. X. Li and R. Du, Analysis and compensation of workpiece errors in turning, Int. J. of Prod.
Res. 40(7), 1647–1667 (2002).
12. Y.C. Shin and Y.S. Joo, Optimization of machining conditions with practical constraints,
Int. J. Prod. Res. 30(12), 2907–2919 (1992).
Chapter 54
Protein–Protein Interaction Prediction
Using Homology and Inter-domain Linker
Region Information
Nazar Zaki
Abstract One of the central problems in modern biology is to identify the complete
set of interactions among proteins in a cell. The structural interaction of proteins and
their domains in networks is one of the most basic molecular mechanisms for bio-
logical cells. Structural evidence indicates that interacting pairs of close homologs
usually interact in the same way. In this chapter, we make use of both homology and
inter-domain linker region knowledge to predict interactions between protein pairs
solely from amino acid sequence information. A high-quality core set of 150 yeast
proteins obtained from the Database of Interacting Proteins (DIP) was used to
test the accuracy of the proposed method. The strongest prediction of the method
reached over 70% accuracy. These results show great potential for the proposed
method.
54.1 Introduction
N. Zaki
Assistant Professor with the College of Information Technology, UAE University.
Al-Ain 17555, UAE,
E-mail: nzaki@uaeu.ac.ae
Protein–protein interactions (PPIs) underlie most cellular processes: the assembly of
the various biological membranes, the packaging of chromatin, the network of sub-membrane
filaments, muscle contraction, signal transduction, and regulation of gene
expression, to name a few [1]. Abnormal PPIs have implications in a number of
neurological disorders, including Creutzfeldt-Jakob and Alzheimer's diseases. Because of
the importance of PPIs in cell development and disease, the topic has been studied
extensively for many years. A large number of approaches to detect PPIs have been
developed. Each of these approaches has strengths and weaknesses, especially with
regard to the sensitivity and specificity of the method.
Some of the earliest techniques predict interacting proteins through the similarity
of expression profiles [14], description of similarity of phylogenetic profiles [12]
or phylogenetic trees [15], and studying the patterns of domain fusion [16]. How-
ever, it has been noted that these methods predict PPI in a general sense, meaning
joint involvement in a certain biological process, and not necessarily actual physical
interaction [17].
Most of the recent works focus on employing protein domain knowledge [18–
22]. The motivation for this choice is that molecular interactions are typically medi-
ated by a great variety of interacting domains [23]. It is thus logical to assume that
the patterns of domain occurrence in interacting proteins provide useful information
for training PPI prediction methods [24].
An emerging new approach in the protein interactions field is to take advantage of
structural information to predict physical binding [25, 26]. Although the total num-
ber of complexes of known structure is relatively small, it is possible to expand this
set by considering evolutionary relationships between proteins. It has been shown
that in most cases close homologs (>30% sequence identity) physically interact in
the same way with each other. However, conservation of a particular interaction
depends on the conservation of the interface between interacting partners [27].
In this chapter, we propose to predict PPI using only sequence information. The
proposed method combines homology and structural relationships. Homology rela-
tionships will be incorporated by measuring the similarity between protein pairs
using pairwise alignment. Structural relationships will be incorporated in terms
of protein domain and inter-domain linker region information. We are encouraged
by the fact that compositions of contacting residues in protein sequence are unique,
and that incorporating evolutionary and predicted structural information improves
the prediction of PPI [28].
54.2 Method
In this work, we present a simple yet effective method to predict PPI solely by amino
acid sequence information. The overview of the proposed method is illustrated in
Fig. 54.1. It consists of three main steps: (a) extract the homology relationship by
measuring regions of similarity that may reflect functional, structural or evolutionary
relationships between protein sequences; (b) downsize the protein sequences by
predicting and eliminating inter-domain linker regions; and (c) scan and detect domain
matches in all the protein sequences of interest.
The proposed method starts by measuring the PPI sequence similarity, which reflects
the evolutionary and homology relationships. Two protein sequences may interact
by means of the amino acid similarities they contain [24]. This work is moti-
vated by the observation that an algorithm such as Smith-Waterman (SW) [29],
which measures the similarity score between two sequences by a local gapped
alignment, provides a relevant measure of similarity between protein sequences.
Fig. 54.1 Overview of the proposed method: inter-domain linker regions are eliminated before the pairwise alignment is applied, and a matrix summarizing the protein–protein interactions is produced
The pairwise similarity scores are collected in an m × m matrix M, where m is the number
of protein sequences and entry (i, j) is the Smith-Waterman score SW(x_i, x_j). For
example, suppose we have the following randomly selected PPI dataset: YDR190C, YPL235W,
YDR441C, YML022W, YLL059C, YML011C, YGR281W and YPR021C, represented by x1,
x2, x3, x4, x5, x6, x7 and x8, respectively. The interaction between these eight
proteins is shown in Fig. 54.2.
Fig. 54.2 The interacting protein pairs in the example dataset: YDR190C–YPL235W (x1–x2), YDR441C–YML022W (x3–x4), YLL059C–YML011C (x5–x6) and YGR281W–YPR021C (x7–x8)
      x1   x2   x3   x4   x5   x6   x7   x8
x1     X  465   28   30   25   30   34   29
x2   465    X   30   24   32   33   50   47
x3    28   30    X  553   29   27   32   29
x4    30   24  553    X   29   20   25   40
x5    25   32   29   29    X   24   28   49
x6    30   33   27   20   24    X   25   26
x7    34   50   32   25   28   25    X   36
x8    29   47   29   40   49   26   36    X
From M, a higher score may reflect an interaction between two proteins. The SW(x1, x2)
and SW(x2, x1) scores are equal to 465, and the SW(x3, x4) and SW(x4, x3) scores are
equal to 553, which confirms the interaction possibilities. However, the SW(x5, x6) and
SW(x6, x5) scores are equal to 24, and the SW(x7, x8) and SW(x8, x7) scores are equal
to 36, which are not the highest scores in their rows. To correct these errors more biological
information is needed, which leads us to the second part of the proposed method.
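A minimal sketch of this first stage is shown below. It is not the author's implementation (the chapter uses the SW routine of the FASTA package with BLOSUM62 and affine gap penalties); here a plain Smith-Waterman local alignment with a simple match/mismatch scheme and a linear gap penalty stands in, and the candidate partner of each protein is read off as the highest-scoring entry of its row in M.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    # Basic Smith-Waterman local-alignment score (simplified scoring scheme).
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

def similarity_matrix(seqs):
    # Symmetric matrix M of pairwise SW scores.
    m = len(seqs)
    M = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            M[i][j] = M[j][i] = smith_waterman(seqs[i], seqs[j])
    return M

def predict_partner(M, i):
    # Predict the interaction partner of protein i as the highest-scoring row entry.
    return max((j for j in range(len(M)) if j != i), key=lambda j: M[i][j])

# Toy sequences standing in for the eight yeast proteins x1..x8.
seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GGHLVEALYL", "GGHLVEALFL",
        "PPWEIRNQKV", "TTSMDAKAAG", "QQFFPLMSRD", "QQFFALMSKD"]
M = similarity_matrix(seqs)
print([predict_partner(M, i) + 1 for i in range(len(seqs))])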
Fig. 54.3 An example of linker preference profile generated using Domcut. In this case, linker
regions greater than the threshold value 0.093 are eliminated from the protein sequence
Suyama’s method [32], we defined the linker index Si for amino acid residue i and
it is calculated as follows:
Linker
fi
Si D ln (54.3)
fiDomain
Where fiLinker is the frequency of amino acid residue i in the linker region and
fiDomain is the frequency of amino acid residuei in the domain region. The negative
value of Si means that the amino acid preferably exists in a linker region. A thresh-
old value is needed to separate linker regions as shown in Fig. 54.3. Amino acids
with linker score greater than the set threshold value will be eliminated from the
protein sequence of interest.
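A sketch of this downsizing step is given below; it is not from the original chapter, and the per-residue index values are arbitrary placeholders used only to drive the example, following the sliding-window linker preference profile and the 0.093 threshold described in the text.

def linker_profile(sequence, linker_index, window=15):
    # Smooth the per-residue linker indices S_i with a sliding window (linker preference profile).
    half = window // 2
    profile = []
    for i in range(len(sequence)):
        lo, hi = max(0, i - half), min(len(sequence), i + half + 1)
        scores = [linker_index.get(sequence[j], 0.0) for j in range(lo, hi)]
        profile.append(sum(scores) / len(scores))
    return profile

def remove_linker_regions(sequence, linker_index, threshold=0.093, window=15):
    # Eliminate residues whose smoothed linker score exceeds the threshold.
    profile = linker_profile(sequence, linker_index, window)
    return "".join(res for res, s in zip(sequence, profile) if s <= threshold)

# Arbitrary illustrative index values for a few residues; all others default to 0.0.
linker_index = {"P": 0.30, "G": 0.25, "S": 0.20, "Q": 0.15}
seq = "MKTAYIAKQRQISFVKSHFSPGSGSPGSPGSHPQLVEALYLVCGERG"
print(remove_linker_regions(seq, linker_index))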
When applying the second part of the method, the matrix M will be calculated
as follows:
      x1   x2   x3   x4   x5   x6   x7   x8
x1     X  504   30   30   25   32   34   27
x2   504    X   30   21   32   32   50   36
x3    30   30    X  775   29   24   38   29
x4    30   21  775    X   19   21   53   37
x5    25   32   29   19    X   28   28   24
x6    32   32   24   21   28    X   23   27
x7    34   50   38   53   28   23    X  339
x8    27   36   29   37   24   27  339    X
FromM , it’s clearly noted that, more evidence is shown to confirm the interac-
tion possibility between proteins x7 and x8 , and therefore, the result is furthermore
enhanced. In the following part of the method, protein domain knowledge will be
incorporated in M for better accuracy.
To test the method, we obtained the PPI data from the Database of Interacting
Proteins (DIP). The DIP database catalogs experimentally determined interactions
between proteins. It combines information from a variety of sources to create a
single, consistent set of PPIs in Saccharomyces cerevisiae. Knowledge about PPI
networks is extracted from the most reliable, core subset of the DIP data [34].
The DIP version we used contains 4,749 proteins involved in 15,675 interactions
for which there is domain information [6]. However, only a high-quality core set
of 2,609 yeast proteins was considered in this experimental work. This core set
is involved in 6,355 interactions, which have been determined by at least one small-scale
experiment or two independent experiments [35]. Furthermore, we selected
proteins that interact with only one partner and are not involved in any other interactions.
This process results in a dataset of 150 proteins with 75 positive interactions, as
shown in Fig. 54.4. The intention is to design a method capable of predicting the
interaction partner of a protein, which provides a way to construct PPI networks using
only protein sequence information.
We started our experimental work by measuring the protein–protein sequence
similarity using the SW algorithm as implemented in FASTA [36]. The default
parameters are used: gap opening and extension penalties of 13 and 1, respectively,
and the BLOSUM62 substitution matrix. The choice of substitution matrix is not trivial
because there is no correct scoring scheme for all circumstances; the BLOSUM matrix
is a very commonly used amino acid substitution matrix that depends on data from
actual substitutions.
Fig. 54.4 Dataset of core interaction proteins used in the experimental work
This procedure produces the matrix M150×150. This matrix was then enhanced by
incorporating inter-domain linker region information. In this case, only well defined
domains with sequence lengths ranging from 50 to 500 residues were considered.
We skipped all the frequently matching (unspecific) domains. A threshold value of
0.093 is used to separate the linker regions; any residue generating an index greater
than the threshold value is eliminated. This procedure downsized the protein sequences
without losing the biological information. In fact, running the SW algorithm on a
sequence consisting of pure domains results in better accuracy. A linker preference
profile is generated from the linker index values along an amino acid sequence using
a sliding window. A window of size w = 15 was used, which gives the best performance
(different window sizes were tested).
Furthermore, protein domain knowledge is incorporated into M150×150.
In this implementation, ps_scan [33] is used to scan one or several patterns from
PROSITE against the 150 protein sequences. All frequently matching (unspecific)
patterns and profiles are skipped. M150×150 is then used to predict the PPI
network.
The performance of the proposed method is measured by how well it can predict
the PPI network. Prediction accuracy, defined as the ratio of the number of
correctly predicted interactions between protein pairs to the total number of
interaction and non-interaction possibilities in the network, is the best index for evaluating
the performance of a predictor. However, only approximately 20% of the data are truly
interacting proteins, which leads to a rather unbalanced distribution of interacting
and non-interacting cases. Suppose we have six proteins and the interacting pairs
are 1–2, 3–4 and 5–6; this results in three interacting cases out of a total of 15
pair possibilities (12 non-interactions).
To assess our method objectively, another two indices are introduced in this
paper, namely specificity and sensitivity, which are commonly used in the evaluation of
information retrieval.
Figs. 54.5 and 54.6 Specificity, sensitivity and overall accuracy at the three stages of the method: using only the SW algorithm (0.6992, 0.5467, 69.82%); after eliminating inter-domain linker regions (0.7026, 0.5733, 70.17%); after adding protein domain evidence (0.7026, 0.6067, 70.19%)
A high sensitivity means that many of the interactions that
occur in reality are detected by the method. A high specificity indicates that most
of the interactions detected by the screen are also occurring in reality. Sensitivity
and specificity are combined measures of true positives (tp), true negatives (tn), false
positives (fp) and false negatives (fn) and can be expressed as:

Sensitivity = tp / (tp + fn) and Specificity = tp / (tp + fp)
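A small sketch of these measures, assuming the definitions written above (inferred from the verbal descriptions rather than from a surviving formula), is given below; the protein pairs are the toy example used earlier in this section.

from itertools import combinations

def evaluate(predicted_pairs, true_pairs, all_pairs):
    # Sensitivity, specificity (as described in the text) and accuracy from sets of unordered pairs.
    tp = len(predicted_pairs & true_pairs)
    fp = len(predicted_pairs - true_pairs)
    fn = len(true_pairs - predicted_pairs)
    tn = len(all_pairs) - tp - fp - fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(all_pairs)
    return sensitivity, specificity, accuracy

proteins = [1, 2, 3, 4, 5, 6]
all_pairs = {frozenset(p) for p in combinations(proteins, 2)}    # 15 pairs
true_pairs = {frozenset(p) for p in [(1, 2), (3, 4), (5, 6)]}
predicted = {frozenset(p) for p in [(1, 2), (3, 4), (2, 5)]}
print(evaluate(predicted, true_pairs, all_pairs))   # approx. (0.67, 0.67, 0.87)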
Based on these performance measures, the method was able to achieve encouraging
results. In Figs. 54.5 and 54.6 we summarized the sensitivity and specificity
results for the three stages of the method. The figures clearly show an improvement
in sensitivity but not much in specificity, which is due to the large number
of non-interacting possibilities.
The overall performance evaluation results are summarized in Table 54.1.
54.5 Conclusion
In this research work we make use of both homology and structural similarities
among domains of known interacting proteins to predict putative protein interaction
pairs. When tested on a sample dataset obtained from the DIP, the proposed method
shows great potential and a new way of predicting PPIs. It shows that combining
methods which predict domain boundaries or linker regions from different aspects with
evolutionary relationships improves the accuracy and reliability of the prediction
as a whole. However, it is difficult to compare the accuracy of our proposed method
directly with others because the existing methods all use different criteria for
assessing predictive power. Moreover, these existing methods use completely
different characteristics in the prediction. One immediate piece of future work is to
consider the entire PPI network and not to restrict our work to binary interactions.
Other future work will focus on employing a more powerful domain linker region
identifier such as the profile domain linker index (PDLI) [37].
References
Chapter 55
Cellular Computational Model of Biology

Abstract One basic way in which gene expression can be controlled is by regulating tran-
scription. In prokaryotes, our basic understanding of the transcriptional control of
gene expression began with Jacob and Monod's 1961 model of the Lac operon.
Recently, attempts have been made to model operon systems in order to understand
the dynamics of these feedback systems. In the present study, we independently
attempted to model the Lac operon, based on the molecules and their activities on
the lac operon of an E. coli cell. Using various engineering methods, a flow chart
of the metabolic pathway of the lac operon activities of E. coli was established.
55.1 Introduction
The control of gene expression has been a major issue in biology. One basic way
in which gene expression can be controlled is by regulating transcription. In prokaryotes,
our basic understanding of the transcriptional control of gene expression began
with Jacob and Monod's 1961 model of the Lac operon (see Fig. 55.1), in which
a regulator protein controls transcription by binding to a target region called the
operator (see Fig. 55.2).
In the absence of this binding, RNA transcription is initiated when an RNA
polymerase is able to bind to a region, called the promoter, just upstream from
the operator. The RNA polymerase travels downstream and transcribes a series
of structural genes coding for enzymes in a particular metabolic pathway (see
Fig. 55.3). The operon system acts as a molecular switch involving positive or
negative control [1].
G. Sun (B)
Mechanics & Engineering Science Department, Fudan University, Shanghai 200433, China
E-mail: gang sun@fudan.edu.cn
Fig. 55.1 Organization of the lac genes of E. coli and the associated regulatory elements,
the operator, promoter, and regulatory gene [3]
Fig. 55.2 Functional state of the lac operon in E. coli growing in the absence of lactose [3]
Fig. 55.3 Functional state of the lac operon in E. coli growing in the presence of lactose as the sole
carbon source [3]
Recently, attempts have been made to model operon systems in order to understand
the dynamics of these feedback systems. In 2003, Yildirim modeled the Lac
operon as a series of coupled differential equations in terms of chemical kinetics [2].
Yildirim and Mackey note that their formulation assumes that they are dealing with
a large number of cells and that the dynamics of small numbers of individual cells
might be quite different. Meanwhile, it is hardly possible to obtain analytic solutions
when concentrations (average values) are used as the dynamic variables of the differential
equations, so a Laplace transform is necessary to solve them. In the present study,
by contrast, we independently attempted to model the Lac operon based on the molecules
and their activities on the lac operon of an E. coli cell. Using various engineering
methods, a flow chart (Fig. 55.4) of the metabolic pathway of the lac operon activities
of the E. coli was established. In terms of the amount of each relevant molecule present
in the cell, a set of difference equations (the mathematical model) is derived from the
flow chart. Since the model is in the form of difference equations, numerical solutions
for the molecular state variables are available by recursively updating the state variables
from a given initial state or condition.
Fig. 55.4 Flow chart of the lac operon activities in an E. coli cell (lactose transported across the cell membrane by the permease enzyme; the inducer/effector switching the operon; RNA polymerase transcribing the lac genes; translation producing the β-galactosidase, permease and transacetylase enzymes; conversion of lactose into galactose and glucose; glucose metabolism)
An E. coli cell contains about 3,000 copies of the holoenzyme, a form of RNA
polymerase [3]. The DNA length of lacZYA is about 5,115 bp [1].
It was found that mRNA half-lives were globally similar in nutrient-rich media,
as determined from a least-squares linear fit. Approximately 80% of half-lives ranged
between 3 and 8 min in M9 + glucose medium, and 99% of the half-lives measured
were between 1 and 18 min. In LB, 99% of half-lives were between 1 and 15 min.
The mean half-life was 5.7 min in M9 + glucose and 5.2 min in LB [4]. Transcription
by a polymerase proceeds at an average rate of 30–50 nucleotides per second [3],
but in some defined media this time was approximately tripled [4]. The lac mRNA is
extremely unstable and decays with a half-life of only 3 min [3]; another article
indicates that the average half-life of an mRNA in E. coli is about 1.8 min. When
lactose is no longer present, the repressor is activated and binds to the operator,
stopping transcription.
Considering the above conditions, mR(n+1), r_TRSCRB and τ_R can be determined
in the presence of lactose. The variable mR(n+1) equals the amount of mR(n) remaining
after half-life attenuation over the elapsed period τ, plus the amount produced by
transcription triggered in the presence of the allolactose derived from lactose.
When E. coli cells are supplied with lactose, the β-galactosidase enzyme appears within
minutes, as do the lac permease in the membrane and transacetylase, a third protein.
The level of β-galactosidase can accumulate to 10% of cytoplasmic protein [5]. Once
a ribosome moves away from the initiation site on an mRNA, another one binds at
the initiation site. Thus, many ribosomes may well be translating each mRNA simultaneously.
An average mRNA carries a cluster of 8–10 ribosomes, known as a polysome,
synthesizing protein [3].
When cells of E. coli are grown in the absence of lactose, there is no need for
β-galactosidase, and an E. coli cell contains only 3–5 molecules of the enzyme [3]. The
dozen or so repressors bind and unbind rather than simply binding and staying on the
operator. In the fraction of a second after one repressor unbinds and before another
binds, an RNA polymerase can initiate a transcription of the operon, even in
the absence of lactose [3]. Within 2–3 min after adding lactose, there are 3,000–5,000
molecules of β-galactosidase per E. coli cell [1, 3]. The β-galactosidase in E. coli is
much more stable than the mRNA; its half-life is more than 20 h, and the half-life of
the permease is more than 16 h [1], so the β-galactosidase activity remains at the
induced level for longer. Adding or changing codons at the 5′ end of the
gene to encode N-terminal amino acids within the DNA-lacZYA can make the
β-galactosidase produced more resistant [6]. As a porter, the permease can concentrate
lactose against the gradient across the cell membrane for E. coli; concentration of the
substrate against a gradient of up to 10³- to 10⁶-fold can be achieved. The existing
β-galactosidase molecules will be diluted out by cell division [5]. Whereas replicating
the DNA of an E. coli cell needs 84 min [3], the half-life is taken as 100 min for
simplicity, or 30 min to be conservative. This point of view could be used to derive a
model of a colony of the bacteria.
Fig. 55.6 The simulated curve of G(n), the permease or β-galactosidase of the cell
The response function f used for lactose transport is defined as:

f(x) = 2 − e^(−x/λ) when x ≥ 0; f(x) = e^(x/λ) otherwise     (55.5)
Fig. 55.7 The simulated curves of Lext(n) and L(n), the external and internal lactose molecules of the cell
The ATP half-life, τ_A, is about 15 min in water at room temperature, and we estimate
2 min to convert a copy of lactose into 72 ATPs. The energy resource for E. coli growth
therefore comes from lactose, in the form of ATP, at a scale of 72 ATPs per lactose
molecule. The energy needed for a permease to transport one lactose molecule into the
cell equates to about one ATP. The transcription of an mRNA needs 5,115 ATPs (in the
form of ATP, UTP, CTP or GTP), and the translation of an mRNA needs 1,700 ATPs.
Assuming that the energy consumed by transcription, translation and transporting
lactose into the cell accounts for 80% of the total energy, as the primary life activity,
a coefficient d for energy distribution is set at 0.8. The remaining energy
is for the other metabolic activities.
With regard to r_meta, G(n), τ, and steps n+1 and n, the energy generated from the glucose is distributed among producing mRNA and β-galactosidase, and bringing the lactose into the E. coli cell (Figs. 55.9–55.11).
An energy-distributional equation is as follows:
$$A(n+1) = e^{-0.69\,\tau/\tau_A} A(n) + 72\, r_{meta}\, G(n) - \bigl[\,5115\, r_{TRSCRB} + 1700\, r_{TRSLTN}\, mR(n) + r^{max}_{TRSPT}\, f\bigl(L_{ext}(n) - L(n)\bigr)\, G(n)\,\bigr] \qquad (55.6)$$
In the above equation, the first term is the energy accumulated before step n+1, the second term is the energy created from the lactose, and the third term is the energy consumed by the three activities. If the third term is greater than or equal to the second, there will not be energy
Fig. 55.9 Simulated curve of the ATP molecules consumed within the cell
Fig. 55.10 Simulated function curve of the ATP molecules created within the cell
$$mR(n+1) = e^{-0.69\,\tau/\tau_R}\, mR(n) + r_{TRSCRB}\, c(n), \qquad (55.1)$$
$$G(n+1) = e^{-0.69\,\tau/\tau_G}\, G(n) + r_{TRSLTN}\, mR(n)\, c(n), \qquad (55.2)$$
$$L_{ext}(n+1) = L_{ext}(n) - r^{max}_{TRSPT}\, G(n)\, f\bigl(L_{ext}(n) - L(n)\bigr)\, c(n), \qquad (55.3)$$
$$L(n+1) = L(n) + r^{max}_{TRSPT}\, G(n)\, f\bigl(L_{ext}(n) - L(n)\bigr)\, c(n) - r_{meta}\, L(n). \qquad (55.4)$$
They are solved numerically in MATLAB by setting the following initial conditions:
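Although the original computation was done in MATLAB, the iteration of Eqs. (55.1)–(55.6) can be sketched in a few lines; the following Python sketch is illustrative only – the rate constants, half-life constants, gating rule and initial values are placeholders introduced here, not the authors' values, and the equations follow the reconstruction given above.

import numpy as np

# Illustrative iteration of the difference equations (55.1)-(55.6) as reconstructed above.
# All parameter values and initial conditions are placeholders, not the authors' values.

def f(x):
    # Gradient function of Eq. (55.5), bounded by 2.
    return 2.0 - np.exp(-x) if x >= 0 else np.exp(x)

tau = 1.0                                   # step length (min), placeholder
tau_R, tau_G, tau_A = 3.0, 1200.0, 15.0     # half-life constants (min), placeholders
r_trscrb, r_trsltn = 1.0, 10.0              # transcription / translation rates, placeholders
r_trspt_max, r_meta = 5.0, 0.01             # transport / metabolism rates, placeholders

mR, G, L_ext, L, A = 0.0, 5.0, 5e6, 0.0, 1e5   # mRNA, enzyme, external/internal lactose, ATP
gate = lambda a: 1.0 if a > 0 else 0.0          # energy gating c(n), placeholder rule

history = []
for n in range(80):                          # 80 steps of tau minutes
    c = gate(A)
    transport = r_trspt_max * G * f(L_ext - L) * c
    mR_next = np.exp(-0.69 * tau / tau_R) * mR + r_trscrb * c
    G_next = np.exp(-0.69 * tau / tau_G) * G + r_trsltn * mR * c
    L_ext_next = L_ext - transport
    L_next = L + transport - r_meta * L
    A_next = (np.exp(-0.69 * tau / tau_A) * A + 72 * r_meta * G
              - (5115 * r_trscrb + 1700 * r_trsltn * mR
                 + r_trspt_max * f(L_ext - L) * G))
    mR, G, L_ext, L, A = mR_next, G_next, L_ext_next, L_next, max(A_next, 0.0)
    history.append((mR, G, L_ext, L, A))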
A traditional experiment culturing E. coli in tubes offered the graph shown in Fig. 55.2 [6]. Comparing the outcomes of the presented equations with the traditional experimental results in Fig. 55.2, the results are consistent on a common time scale, as is the mRNA in Fig. 55.1. In addition, more information about the cell, such as the external and internal lactose, the ATP consumption, etc. in a cell of E. coli, is available from the model but not from the other methods.
55.9 Conclusion
55.10 Discussion
This paper was based on the authors' background in computer science, electrical engineering and genetics. It discusses the growth of an E. coli cell supplied with lactose only. As a core governing the state variables of the E. coli cell's molecules on the lac operon, the model could be developed into another set of formulas for a colony, with cellular divisions following a normal distribution. That set of formulas may well model a colony of the bacteria by halving the values of the state variables at various periods, simulating cellular division. Finally, the formulas could be transformed into corresponding differential equations. In addition, the effects of the parameters τ and d in Eqs. (55.5) and (55.8) should be studied further. The rate, or behaviour pattern, of the permeases transporting lactose across the membrane of an E. coli cell is modeled only approximately. Finally, it is noticeable that the simulated numerical solutions of the model vary slightly with the value of τ. This phenomenon is probably due to the approximating hypothesis in Section 55.2.
Building on this individual cellular level, a colony model could be developed to describe and simulate more complex activities and regular patterns of the representative substances of the colony, by considering individual cell division, and could further be developed into a general quantitative analysis for cellular biology with high precision.
References
Abstract The increasing number of patients suffering from chronic diseases who wish to stay at home rather than in a hospital, together with the increasing need for homecare monitoring of elderly people, has led to a high demand for wearable medical devices. Extended patient monitoring during normal activity has also become a very important target. Low power consumption is essential in continuous monitoring of vital signs and can be achieved by combining very high storage capacity, wireless communication and ultra-low power circuits together with firmware management of the power consumption. This approach allows the patient to move unconstrained around an area, city or country. In this paper, the design of ultra low power wearable monitoring devices based on ultra low power circuits, high capacity flash memory, Bluetooth communication and the firmware for the management of the monitoring device is presented.
56.1 Introduction
receive information that has a longer time span than a patient’s normal stay in
a hospital and this information has great long-term effects on home health care,
including reduced expenses for healthcare.
Despite the increased interest in this area, a significant gap remains between
existing sensor network designs and the requirements of medical monitoring. Most
telemonitoring networks are intended for deployments of stationary devices that
transmit acquired data at low data rates and with rather high power consumption. By outfitting patients with wireless, wearable vital sign devices, collecting detailed real-time data on physiological status can be greatly simplified. However, the most important effect is the wider social integration of people with disabilities or health problems who need discreet and permanent monitoring.
It is well known that Bluetooth represents a reliable and easy solution for sig-
nal transmission from a portable, standalone unit to a nearby computer or PDA [3].
When a real time rudimentary analysis program detects a pathological abnormality in the recorded data, an alert is sent to the telemonitoring centre via Internet or GSM/GPRS using a PDA or computer.
The needs of society and market trends suggest that healthcare-related applications will develop significantly. In particular, PDA phones interfaced with wearable sensors have an enormous market, as they will broaden the scope of healthcare and mobile devices [4].
The development of an ultra low power wearable monitoring device is proposed. The main functions of the wearable device consist of signal acquisition, rudimentary processing, local data storage and transmission to the remote care centre. In Fig. 56.1, the general architecture of the telemonitoring network is presented.
This paper presents the initial results and our experience with a prototype medical wearable device for patient monitoring, taking into account hardware and software aspects regarding vital signs measurement, robustness, reliability, power consumption, data rate, security and integration in a telemonitoring network.
[Fig. 56.2 Block diagram of the monitoring device: a three-lead ECG amplifier, accelerometers (respiratory rate), an SaO2 front end (LED drive circuit, input front end, heart rate) and a temperature sensor are connected through the ADC, DAC and digital ports to an ultra low power microcontroller on the digital board, together with the keyboard, SD/MMC card and Bluetooth interfaces]
The monitoring device is built using custom developed hardware and application software. The block diagram is presented [5] in Fig. 56.2. Low power amplifiers and sensors are connected to the device [6] for vital parameter acquisition. The health parameters acquired are heart rate, heart rhythm regularity, respiratory rate, oxygen saturation and body temperature. The monitoring device also includes a custom made three-lead ECG amplifier.
The digital board includes an ultra low power microcontroller and interface cir-
cuits for keyboard, SD/MMC card and Bluetooth. The low power sensors for the
above mentioned parameters are connected to the digital board using digital port or
specific converters.
The wireless communication via Bluetooth with a computer or PDA and the flash
memory for raw data information storage are two of the most important facilities of
the proposed wearable device.
The software applications running on the computer and PDA handle the communication via Internet and GSM/GPRS, respectively [3, 7].
The monitoring device is built using an ultra low power microcontroller (MSP430
from Texas Instruments) that has a 16 bit RISC core with clock rates up to 8 MHz.
An important advantage of this family of microcontrollers is the high integration of
some circuits that offer the possibility to design devices with open architecture: dig-
ital ports, analog to digital converter (ADC) and digital to analog converter (DAC).
MSP430 microcontroller also has a built in hardware multiplier which is a valuable
resource for digital filter implementation. The custom made electronic boards used
for data acquisition are all designed with low power circuits. The system is modular:
every board is detachable.
The on board storage device (SD/MMC card) with a FAT32 file system is used for raw data recording together with the signal processing results. The Bluetooth radio module [8], designed for short-range communication, is used for data transmission between the monitoring device and a PC or PDA. This module has low power consumption, being used only when data transfer is performed.
56.3 Functionality
The ultra low power wearable monitoring device is able to acquire simultaneously the physiological parameters mentioned in the previous section and also to perform rudimentary digital signal processing on board. The signals are continuously recorded in separate files on the flash memory for feature analysis. Once a pathological abnormality is detected, the monitoring device requests a transmission through the PC or PDA to the remote care centre.
For the respiratory rate, a low power three-axis low-g accelerometer (MMA7260QT) is used [9]. This is a low cost capacitive micromachined accelerometer that features signal conditioning, a one-pole low pass filter, temperature compensation and g-Select, which allows selection among four sensitivities.
The accelerometer outputs are read and processed once every 10 ms (100 times per second).
Because this is a low power system, the accelerometer is disabled between two conversion cycles. Each conversion cycle has a 2 ms startup period for the accelerometer to recover from sleep mode.
Due to the sensitivity of the accelerometer and the high resolution of the analog to digital converter, the useful signal (the output voltage of the accelerometer) has a large amount of superposed noise, with possible sources in body movement or even the heart beat. In Fig. 56.3, the acquired signal affected by noise is presented. The initial rise of voltage corresponding to the start of chest movement can be seen. This is explained by the fact that the movable central mass of the g-cell moves in the opposite direction to the movement. As the chest movement velocity becomes constant, the acceleration decreases and the movable central mass comes back to its initial position. This settling process implies an acceleration opposite to the direction of movement.
Noise filtering is achieved by using a 128-point rolling average filter. A Matlab simulation with real datasets was used for filter calibration. Without filtering, the signal is almost unusable for further processing. After filtering, the noise is substantially reduced.
To allow both positive and negative accelerations to be represented (e.g. to determine the direction of movement), the accelerometer output has an offset of half the supply voltage at 0 g, so the output voltage is positive for both positive and negative accelerations. To distinguish between positive and negative accelerations it is therefore necessary to fix the reference value corresponding to 0 g. This is done at application startup, when a 512-point average is computed.
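The calibration and filtering steps described above can be sketched as follows; this Python fragment mirrors the offline Matlab simulation, with synthetic sample values standing in for the recorded accelerometer output (the signal values and sampling details are illustrative assumptions):

import numpy as np

# 512-point startup average fixes the 0 g reference; a 128-point rolling average
# then suppresses the noise superposed on the respiration signal.

def zero_g_reference(samples_mv, n_ref=512):
    return np.mean(samples_mv[:n_ref])

def rolling_average(samples_mv, window=128):
    kernel = np.ones(window) / window
    return np.convolve(samples_mv, kernel, mode="valid")

# Synthetic data standing in for the accelerometer output in millivolts.
raw = 1650 + 5 * np.sin(np.linspace(0, 10 * np.pi, 2500)) + np.random.normal(0, 8, 2500)
reference = zero_g_reference(raw)             # 0 g reference computed at startup
filtered = rolling_average(raw - reference)   # signed, filtered acceleration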
In Fig. 56.4, the results of the algorithm are plotted. The first plot shows the raw unfiltered data from the accelerometer outputs in millivolts. This plot is a 2,500-point record of the accelerometer output for a time period of 5 s. The second plot represents the filtered input acceleration.
The custom made ECG module uses three leads. The ECG signals are acquired using micro power instrumentation amplifiers and micro power operational amplifiers, both with a single power supply. A very important feature of the instrumentation amplifier is that it can be shut down with a quiescent current of less than 1 μA (Fig. 56.5). Since it returns to normal operation within microseconds, the shutdown feature makes it well suited for low-power battery or multiplexing applications [10]. The proposed operation is as follows: at power on, the instrumentation amplifier is shut down; 10 μs before a conversion cycle starts it is brought out of shutdown mode, and it is shut down again when the conversion cycle finishes.
The temperature is monitored every minute using an ultra low power temperature sensor.
Between measurements, the sensor is shut down to save maximum power by shutting down all device circuitry other than the serial interface, reducing current consumption to typically less than 0.5 μA [11].
Because read/write operations with the flash card can reach a current consumption of up to 60 mA, they are done in blocks of data [12]. The file system on the flash card is FAT32, which makes it possible to read the files with any PC with a card reader.
The average current consumption of the Bluetooth circuit is around 35 mA. For this reason, the Bluetooth communication is used in two modes: always connected, transmitting live or stored data (signal processing and data storing are not performed), and not connected, with the circuit in sleep mode (signal processing and data storing are performed). If a critical situation occurs, the circuit is awakened, the communication starts, and at the same time data storage is performed. When files are downloaded, every monitoring operation is stopped.
In the case of blood oxygenation measurement, a solution proposed by Texas Instruments was used for driving and controlling the LEDs [9]. It is based on two LEDs, one for the visible red wavelength and another for the infrared wavelength. The photo-diode generates a current from the received light. This current signal is amplified by a trans-impedance amplifier. OA0, one of the three op-amps built into the microcontroller, is used to amplify this signal. Since the current signal is very small, it is important for this amplifier to have a low drift current [9].
Because of the high level of analog circuit integration, very few external components are involved in the hardware development. Furthermore, by keeping the LEDs ON for a short time and power cycling the two light sources, the power consumption is reduced [9].
The developed firmware consists of several tasks, each of which manages a particular resource, as presented in Fig. 56.6. The communication between tasks is implemented with semaphores and waiting queues, allowing a high level of parallelism between processes. Each process may be individually enabled or disabled. This feature is very important for increasing the flexibility of the application: if real time monitoring is desired, the SD Card process may be disabled and the Bluetooth process enabled; if only long term monitoring is desired, the SD Card process is enabled and the Bluetooth process may be disabled. This has a positive impact on power consumption because only the resources that are needed are enabled for use.
A logical cycle of the device operation is presented in Fig. 56.6. In the first step
(thread), a buffer of data is acquired. The ADC module works in sequence mode and
acquires data from several channels at a time. The ADC channels are associated with
temperature, accelerometer, ECG signals and SaO2 inputs of electronic interface
modules. One sequence of ADC conversions consists of one sample for each of these inputs. The data are formatted and then stored in a buffer. Only when the buffer is full may the following thread begin. Depending on application requirements, storage, transmission or analysis of the data may be performed.
Encryption of personal data is provided. Some files that store personal data related to the patient are encrypted and can be accessed only using the proper decryption algorithm. Only some of the files are encrypted, because a balance must be kept between power consumption and computing power requirements. To save space, a light compression algorithm may be activated as an extra feature of the device. The activation of this feature has a negative impact on the overall power consumption.
The communication process implements an application layer protocol for data
exchange between the device and other Bluetooth enabled devices. The communi-
cation requirements must be limited to a minimum rate in order to save power. The
Bluetooth module is powered on when a message has to be sent. The message trans-
mission rate is kept low by using internal buffering and burst communication. The
Bluetooth module can also be woken up by an incoming message that may embed commands for the device.
The analysis process implements some rudimentary signal processing tasks in order to detect anomalies in physiological activities. Only a few vital parameter anomalies are detected locally (by the wearable device) and trigger an alarm event. More complex signal processing and dangerous condition detection are
implemented on a remote computer which has the required computing power. The
remote computer receives raw data from the device and does the required analysis.
The physical processes that are monitored have a slow changing rate leading to
a low sampling rate requirement. This allows the controller to be in sleep mode for
an important percentage of operating time.
The acquisition of one full data buffer takes most of the time of the operation cycle. The time needed to execute each remaining thread is short compared to the ADC process.
A double buffering scheme is used: while one buffer is being filled with new data samples, the second buffer may be processed (stored on the SD card, transmitted via Bluetooth or locally analyzed in order to detect conditions that should trigger alarm events). When the new data buffer has been acquired, the buffers are swapped (a simple pointer assignment): the previously processed buffer becomes empty, the newly acquired data will be placed in it, and the buffer that has just been filled will be processed.
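A minimal sketch of this double-buffering scheme is shown below; the buffer size and the processing hook are illustrative assumptions, and the real firmware runs in C on the MSP430 rather than Python:

BUFFER_SIZE = 256   # placeholder buffer length

class DoubleBuffer:
    # One buffer is filled with ADC samples while the other is processed.
    def __init__(self):
        self.fill = []        # buffer currently being filled
        self.ready = []       # buffer handed over for processing

    def add_sample(self, sample):
        self.fill.append(sample)
        return len(self.fill) >= BUFFER_SIZE    # True when the buffer is full

    def swap(self):
        # "Simple pointer assignment": exchange the roles of the two buffers.
        self.fill, self.ready = [], self.fill
        return self.ready

buffers = DoubleBuffer()
for sample in range(2 * BUFFER_SIZE):        # stands in for the ADC sample stream
    if buffers.add_sample(sample):
        full_buffer = buffers.swap()         # store on SD card, transmit or analyse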
The device must be reliable. For this purpose a power monitoring function is
designed. To implement this feature the ADC module is used. One of its inputs
monitors the voltage level across the batteries. The alarm events are designed to
preserve as much power as possible. If a voltage threshold is reached, an alarm event is triggered and a resource can be disabled or its usage restricted, if so configured. For example, the message content and its transmission period can be modified in order to save power while still ensuring safe operation.
The computer software (see Fig. 56.7) is an MDI (Multiple Document Interface) application. A connection can be established with more than one device (each one has a unique id). The software is always listening for connections from registered Bluetooth monitoring devices. This means that before data transmission, the device id has to be known by the software. This is done by entering the code manually, or by answering yes when the application prompts for a communication request. The ID is stored for further use. The communication session is established in two ways: by the device when a critical situation occurs, or by the computer when real time monitoring or a file download is requested. This is done by sending a break condition to the device to wake it from the deep sleep state [8]. The software also handles the connection to a server via the Internet and the data transmission.
56.5 Conclusion
The work in this paper focuses on the design and implementation of an ultra low power wearable device able to acquire patient vital parameters, causing minimal discomfort and allowing high mobility. The proposed system could be used as a warning system for monitoring during normal activity or physical exercise. The active collaboration with the Faculty of Biomedical Engineering of the University of Medicine and Pharmacy of Iasi (Romania) and with several hospitals offered the opportunity to test the prototype. Preliminary results have so far been satisfactory. This wearable device will be integrated into SIMPA healthcare systems to provide real-time vital parameter monitoring and alarms.
In conclusion, this paper presents some of the challenges of hardware and software design for a medical wearable device based on low-power medical sensors and a microcontroller, a field that has recently had a tremendous impact on many medical applications. Obviously, the results demonstrate that there is still significant work to be done before the wearable device can be effectively integrated into a large network of medical sensors.
References
Abstract In this chapter, faces are detected and facial features are located from
video and still images. ‘NRC-IIT Facial video database’ is used as image sequences
and ‘face 94 color image database’ is used for still images. Skin pixels and non-skin
pixels are separated and skin region identification is done by RGB color space. From
the extracted skin region, skin pixels are grouped to some meaningful groups to
identify the face region. From the face region, facial features are located using seg-
mentation technique. Orientation correction is done by using eyes. Parameters like
inter eye distance, nose length, mouth position, and DCT coefficients are computed and used as input to an RBF based neural network.
57.1 Introduction
The technological advancement in the area of digital processing and imaging has
led to the development of different algorithms for various applications such as auto-
mated access control, surveillance, etc. For automated access control, most common
and accepted method is based on face detection and recognition. Face recognition
is one of the active research areas with wide range of applications. The problem is
to identify facial image/region from a picture/image. Generally pattern recognition
problems rely upon the features inherent in the pattern for efficient solution. Though
face exhibits distinct features which can be recognized almost instantly by human
eyes, it is very difficult for a computer to extract and use these features. Humans can identify faces even from a caricature. The challenges associated with face detection
and recognition are pose, occlusion, skin color, expression, presence or absence of
structural components, effects of light, orientation, scale, imaging conditions etc.
Most of the currently proposed methods use parameters extracted from facial
images. For access control application, the objective is to authenticate a person
based on the presence of a recognized face in the database.
Here skin and non-skin pixels are separated and the pixels in the identified skin
region are grouped to obtain face region. From the detected face area, the facial fea-
tures such as eyes, nose and mouth are located. Geometrical parameters of the facial
features and the DCT coefficients are given to a neural network for face recogni-
tion. The expressions within the face regions are analyzed using DCT and classified
using a neural network.
A lot of research has been going on in the area of human face detection and
recognition [1]. Most face detection and recognition methods fall into two cate-
gories: Feature based and Holistic. In feature-based method, face recognition relies
on localization and detection of facial features such as eyes, nose, mouth and
their geometrical relationships. In holistic approach, entire facial image is encoded
into a point on high dimensional space. Principal Component Analysis (PCA) and
Active Appearance Model (AAM) [2] for recognizing faces are based on holistic
approaches. In another approach, fast and accurate face detection is performed by
skin color learning by neural network and segmentation technique [3]. Indepen-
dent Component Analysis (ICA) was performed on face images under two different
conditions [4]. In one condition, image is treated as a random variable and pixels
are treated as outcomes and in the second condition pixels are treated as random
variables and the image as the outcome. The extraction of facial expressions from a detailed analysis of eye region images is given in [5]. Handling a large range of human facial behavior by recognizing the facial muscle actions that produce expressions is given in [6]. Video based face recognition is explained in [7]. Another method of classi-
fication of facial expression using Linear discriminant analysis (LDA) is explained
in [8] in which the Gabor features extracted using Gabor filter banks are compressed
by two stage PCA method.
$$\bar{C} = C \;\Rightarrow\; C^{-1} = C^{T} \qquad (57.2)$$
On applying the DCT, the input signal will get decomposed into a set of basis
images. For highly correlated data, cosine transforms show excellent energy com-
paction. Most of the energy will be represented by a few transform coefficients.
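As an illustration of this energy compaction, the sketch below computes a 2-D DCT of a face patch and keeps only a small block of low-frequency coefficients; the patch size and the number of retained coefficients are assumptions made here for the example:

import numpy as np
from scipy.fftpack import dct

def dct2(block):
    # Separable 2-D DCT (type II, orthonormal).
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def dct_features(gray_face, keep=64):
    # Keep the top-left keep x keep coefficients, where most of the energy
    # of a highly correlated image is concentrated.
    coeffs = dct2(gray_face.astype(float))
    return coeffs[:keep, :keep].ravel()

patch = np.random.rand(128, 128)   # stands in for a cropped, grey-level face region
features = dct_features(patch)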
Radial functions are a special class of functions. Their characteristic feature is that their response decreases (or increases) monotonically with distance from a central point. The centre, the distance scale, and the precise shape of the radial function are parameters of the model, all fixed if it is linear. A typical radial function is the Gaussian which, in the case of a scalar input, is
$$h(x) = \exp\!\left(-\frac{(x-c)^{2}}{r^{2}}\right)$$
Its parameters are its centre c and its radius r. Universal approximation theorems
show that a feed forward network with a single hidden layer with non linear units can
approximate any arbitrary function [Y]. No learning is involved in RBF networks.
For pattern classification problems, the number of input nodes is equal to the number of elements in the feature vector and the number of output nodes is equal to the number of different clusters. A Radial Basis Function network is shown in Fig. 57.1.
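A minimal sketch of such an RBF classifier is given below; centring one Gaussian unit on each training prototype and solving the output weights by least squares is one common choice and is an assumption made here, not a detail taken from the chapter:

import numpy as np

class RBFNet:
    def __init__(self, centres, radius):
        self.centres = centres            # one Gaussian hidden unit per centre
        self.radius = radius

    def _hidden(self, X):
        d2 = ((X[:, None, :] - self.centres[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / self.radius ** 2)   # Gaussian radial functions

    def fit(self, X, Y):
        H = self._hidden(X)
        self.W, *_ = np.linalg.lstsq(H, Y, rcond=None)   # linear output layer
        return self

    def predict(self, X):
        return self._hidden(X) @ self.W

# Example: feature vectors (e.g. geometric parameters plus DCT coefficients)
# and one-hot targets for four classes.
X = np.random.rand(20, 10)
Y = np.eye(4)[np.random.randint(0, 4, 20)]
net = RBFNet(centres=X[:8], radius=0.5).fit(X, Y)
predicted_class = net.predict(X).argmax(axis=1)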
57.5 Method
The method explains detection of faces from video frames and still images. This is
followed by extraction of facial features, recognition of faces and analyzing facial
expressions.
[Fig. 57.1 A Radial Basis Function network: inputs x1 … xn, radial hidden units, and output f(x) formed with weights ω1 … ωm]
The first step in the face detection problem is to extract the facial area from the background. In our approach, both still and video images are used for face detection. Image frames from the video are extracted first. The input images contain regions other than the face, such as hair, hat, etc. Hence it is required to identify the face. Each pixel in the image is classified as a skin pixel or a non-skin pixel. Different skin regions are detected from the image. Face regions are identified from the detected skin regions as in [9], which addressed the problem of face detection in still images. Some randomly chosen frames from the video face images, with different head poses, distances from the camera, and expressions, are extracted as shown in Fig. 57.2. The difference image at various time instances is shown in Fig. 57.3.
The portion of the image that is moving is assumed to be the head. The face detection algorithm uses the RGB color space for the detection of skin pixels. The pixels corresponding to skin color in the input image are classified according to certain heuristic rules. The skin color is determined from the RGB color space as explained in [10, 11]. A pixel is classified as a skin pixel if it satisfies the following conditions.

R > 95 AND G > 40 AND B > 20 AND max{R, G, B} − min{R, G, B} > 15 AND |R − G| > 15 AND R > G AND R > B    (57.4)
OR
Fig. 57.4 Edge detection and skin region identification from video frames
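The first clause of Eq. (57.4) can be applied to a whole frame with simple array operations, as in the sketch below; the clause after the OR is not reproduced in the text above and is therefore omitted here, and the test image is a random placeholder:

import numpy as np

def skin_mask(rgb):
    # First skin-colour rule of Eq. (57.4) applied pixel-wise to an RGB image.
    R = rgb[..., 0].astype(int)
    G = rgb[..., 1].astype(int)
    B = rgb[..., 2].astype(int)
    spread = rgb.max(axis=-1).astype(int) - rgb.min(axis=-1).astype(int)
    return ((R > 95) & (G > 40) & (B > 20) & (spread > 15)
            & (np.abs(R - G) > 15) & (R > G) & (R > B))

frame = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)  # stands in for a video frame
mask = skin_mask(frame)   # boolean map of candidate skin pixels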
From these skin regions, it is possible to identify whether a pixel belongs to a skin region or not. To find the face regions, it is necessary to categorize the skin pixels into different groups so that they represent meaningful regions such as face, hand, etc. Connected component labeling is performed to classify the pixels. In the connected component labeling operation, pixels are connected together geometrically. Here, 8-connected component labeling is used, so that each pixel is connected to its eight immediate neighbors. At this stage, different regions are identified and each region has to be classified as a face or not. This is done by finding the skin area of each region. If the height to width ratio of a skin region falls within the range of the golden ratio ((1 + √5)/2 ± a tolerance), then that region is considered a face region.
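The grouping and the golden-ratio test can be sketched as follows; the tolerance value is an assumption for illustration:

import numpy as np
from scipy.ndimage import label, find_objects

GOLDEN_RATIO = (1 + np.sqrt(5)) / 2

def face_candidates(mask, tolerance=0.65):
    # 8-connected component labeling of the skin pixels.
    structure = np.ones((3, 3), dtype=int)
    labels, _ = label(mask, structure=structure)
    candidates = []
    for box in find_objects(labels):
        height = box[0].stop - box[0].start
        width = box[1].stop - box[1].start
        # Keep regions whose height/width ratio is near the golden ratio.
        if width > 0 and abs(height / width - GOLDEN_RATIO) <= tolerance:
            candidates.append(box)
    return candidates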
57.5.2.1 Segmentation
Image segmentation is a long standing problem in computer vision. There are different segmentation techniques which divide the spatial area within an image into different meaningful components. Segmentation of images is based on the discontinuity and similarity properties of intensity values. Cluster analysis is a method of grouping objects of a similar kind into respective categories. It is an exploratory data analysis tool which aims at sorting different objects into groups in such a way that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise. K-means is one of the unsupervised learning algorithms that solve the clustering problem. The procedure follows a simple and easy way to classify a given data set into a certain number of clusters (assume K clusters) fixed a priori. The idea is to define K centroids, one for each cluster. These centroids should be placed carefully, because different locations cause different results; the better choice is therefore to place them as far away from each other as possible. The next step is to take each point belonging to the given dataset and associate it with the nearest centroid. When no point is pending, the first step is completed and an early grouping is done. At this point, K new centroids need to be re-calculated as barycentres of the clusters resulting from the previous step. After obtaining these K new centroids, a new binding has to be done between the same dataset points and the nearest new centroid. A loop has thus been generated. As a result of this loop, it may be noticed that the K centroids change their location step by step until no more changes occur. If we know the number of meaningful groups/classes based on the range of pixel intensity, weighted K-means clustering can be used to cluster the spatial intensity values.
In facial images, the skin color and the useful components can generally be classified as two different classes. However, if two-class clustering is used, it may result in components that are still difficult to identify. Therefore three classes are used, which are able to cluster the data in a useful manner. Initial cluster centers are calculated using the histogram. The K-means clustering algorithm then computes the distance between the pixels and the cluster centers and selects the minimum distance cluster for each pixel. This process continues until all pixels are classified properly.
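A compact sketch of this intensity clustering is given below; the histogram-based initialisation shown (centres spread evenly over the occupied intensity range) is an assumption standing in for the authors' exact rule:

import numpy as np

def kmeans_intensity(gray, k=3, iterations=20):
    pixels = gray.ravel().astype(float)
    # Initial centres spread over the occupied intensity range (placeholder rule).
    centres = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(iterations):
        # Assign every pixel to its nearest centre.
        labels = np.abs(pixels[:, None] - centres[None, :]).argmin(axis=1)
        # Recompute each centre as the mean of its assigned pixels.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = pixels[labels == j].mean()
    return labels.reshape(gray.shape), centres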
The results of the clustering algorithm are shown in Fig. 57.6. Class I is selected, since it is possible to separate the components properly compared to the other classes. Then the connectivity algorithm is applied to all the components in the clustered face. Eye regions are located in the upper half of the skin region and can be extracted using the area information of all the connected components. Using the eye centers, the orientation is corrected by a rotation transformation. After calculating the inter eye distance, the nose and mouth are identified, as they generally appear along the midline between the two eyes in the lower half. The inter eye distance, nose length and mouth area are computed. To keep track of the overall information content in the face area, the DCT is used. On applying the DCT, most of the energy can be represented by a few coefficients. The first 64 × 64 coefficients are taken as part of the feature set. The face area calculated from the skin region is taken as another parameter. Parameters like inter eye distance, nose length, mouth position, face area and DCT coefficients are computed and given to an RBF based neural network.
For expression analysis, face images are taken from the JAFFE face database [12]. Figure 57.7 shows some images from the database. There were ‘K’ images with ‘N’ expressions for each face, so that ‘K × N’ face images are used as the database. Normalization is done to bring the images to a uniform scale. The facial expressions are analyzed by facial feature extraction. Facial expressions are dominated by the eye and mouth regions. These regions are located first and then extracted by cropping the image. The face image with its features cropped is shown in Fig. 57.8.
The DCT is applied to these cropped images. The DCT coefficients are given to the RBF based neural network which classifies the expressions. Four expressions, namely ‘Happy’, ‘Normal’, ‘Surprise’ and ‘Angry’, are considered for the analysis.
57.6 Results
In this experiment, 200 frames of video images with considerable variations in head pose, expression and camera viewing angle were used. The ‘face 94’ color image database was used for still images. One hundred face images were selected with considerable expression changes and minor variations in head turn, tilt and slant. The performance ratios were 90% for the video image sequences and 92% for the still images. The results of face detection for both still and video images are shown in Figs. 57.9 and 57.10. Figure 57.11 shows distinct connected components. Rotation transformation is done using the eyes. Figure 57.12 shows the located features. Geometric parameters and DCT coefficients were given to a classification network for recognition.
Facial expressions are analyzed using the JAFFE face database. Three datasets from the JAFFE database were used. Four expressions, namely ‘Happy’, ‘Normal’, ‘Surprise’ and ‘Anger’, were analyzed. The DCT coefficients obtained from the cropped regions of the faces, which are very sensitive to facial expressions, were given to the neural network, which classifies the expressions within the face. The average performance ratio is 89.11%. The efficiency plots for the three sets of images are shown in Fig. 57.13.
57.7 Conclusion
In this paper, faces are detected and facial features are located from video and still
images. ‘NRC-IIT Facial video database’ is used as image sequences and ‘face
94 color image database’ is used for still images. Skin pixels and non-skin pixels
are separated and skin region identification is done by RGB color space. From the
extracted skin region, skin pixels are grouped to some meaningful groups to identify
the face region. From the face region, facial features are located using segmentation
technique. Orientation correction is done by using eyes. Parameters like inter eye
distance, nose length, mouth position, and DCT coefficients are computed and used as input to an RBF based neural network. In this experiment only one image sequence is used for the detection of faces.
Facial expressions namely ‘Happy’, ‘Neutral’, ‘Surprise’ and ‘Anger’ are ana-
lyzed using JAFFE face database. Facial features such as eyes and mouth regions
are cropped and these areas are subjected to discrete cosine transformation. The
DCT coefficients are given to a RBF neural network which classifies the facial
expressions.
References
1. Rama Chellappa, Charles L. Wilson and Saad Sirohey, “Human and Machine Recognition of
Faces: A Survey” In Proceedings of the IEEE, Vol. 83, No. 5, 1995, 705–740.
2. Nathan Faggian, Andrew Paplinski, and Tat-Jun Chin, “Face Recognition from Video
Using Active Appearance Model Segmentation”, 18th International Conference on Pattern
Recognition, ICPR 2006, Hong Kong, pp. 287–290.
3. Hichem Sahbi and Nozha Boujemaa, in Tistarelli, J. Bigun, A.K. Jain (Eds.), “Biometric
Authentication”, LNCS 2539, Springer, Berlin/Heidelberg.
4. Marian Stewart Bartlett, Javier R. Movellan and Terrence J. Sejnowski, “Face Recognition by
Independent Component Analysis”, IEEE Transactions on Neural Networks, Vol. 13, No. 6,
November 2002.
5. Tsuyoshi Moriyama, Takeo Kanade, Jing Xiao and Jeffrey F. Cohn “Meticulously Detailed
Eye region Model and Its Application to Analysis of Facial Images”, IEEE Transactions in
Pattern Analysis and Machine Intelligence, Vol. 28, No. 5, May 2006.
6. Yan Tong, Wenhui Liao and Qiang Ji, “Facial Action Unit Recognition by Exploiting Their
Dynamic and Semantic Relationships”, IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. 29, No. 10, October 2007.
7. Dmitry O. Gorodnichy “Video-based Framework for Face Recognition in Video”, Second
Workshop on Face Processing in Video. (FPiV’05) in Proceedings of Second Canadian Con-
ference on Computer and Robot Vision (CRV’05), Victoria, BC, Canada, 9–11 May, 2005, pp.
330–338.
8. Hong-Bo Deng, Lian-Wen Jin, Li-Xin Zhen and Ian-Cheng Huang, “A New Facial Expression
Recognition Method Based on Local Gabor Filter Bank and PCA plus LDA”, International
Journal of Information Technology, Vol. 11, No. 11, 2005 pp. 86–96.
9. K. Sandeep and A.N. Rajagopalan, “Human Face Detection in Cluttered Color Images Using
Skin Color and Edge Information”. Proc. Indian Conference on Computer Vision, Graphics
and Image Processing, Dec. 2002.
10. P. Peer and F. Solina, “An Automatic Human Face Detection Method”, Proceedings of 4th
Computer Vision Winter Workshop (CVWW’99), Rastenfeld, Austria, 1999, pp. 122–130.
11. Franc Solina, Peter Peer, Borut Batagelj and Samo Juvan, “15 Seconds of Fame – An Interac-
tive Computer Vision Based Art Installation”, Proceedings of 7th International Conference on
Control, Automation, Robotics and Vision (ICARCV 2002), Singapore, 2002, pp. 198–204.
12. Michael J. Lyons, Shigeru Akamatsu, Miyuki Kamachi and Jiro Gyoba,“Coding Facial
Expressions with Gabor Wavelets”, Proceedings, Third IEEE International Conference on
Automatic Face and Gesture Recognition, April 14–16, 1998, Nara Japan, IEEE Computer
Society, pp. 200–205.
Chapter 58
Spiking Neurons and Synaptic Stimuli: Neural
Response Comparison Using Coincidence-Factor
Abstract In this chapter, neural responses are generated by changing the Inter-
Spike-Interval (ISI) of the stimulus. These responses are subsequently compared
and a coincidence factor is obtained. Coincidence-factor, a measure of similarity, is
expected to generate a high value for higher similarity and a low value for dissim-
ilarity. It is observed that these coincidence-factors do not have a consistent trend
over a simulation time window. Also, the lower-bound limit for faithful behaviour
of coincidence factor shifts towards the right with the increase in the reference ISI
of the stimulus. In principle, if two responses have a very high similarity, then their
respective stimuli should be very similar and could possibly be considered the same.
However, as results show, two spike trains generated by highly-varying stim-
uli have a high coincidence-factor. This is due to limitations imposed by the
one-dimensional comparison of coincidence-factor.
58.1 Introduction
The responses of a neuron to various types of stimuli have been studied extensively
over the past years [1–9]. Stimulus-dependent behaviour of neurons has already
been pursued to understand the spiking responses and it is thought that either the
firing rate or firing time of individual spikes carries specific information of the neu-
ronal response [3, 10–16]. The response of the neurons studied above has a constant
magnitude whose variance is very low. In this paper, the neural responses fluctuate
M. Sarangdhar (B)
Department of Computer Science, University of Hull, Cottingham Road, Hull,
East-Yorkshire HU6 7RX
E-mail: M.Sarangdhar@2006.hull.ac.uk
The computational model and stimulus for an H-H neuron are replicated from [3]. The differential equations of the model (Eqs. (58.1)–(58.3)) are the result of non-linear interactions between the membrane voltage V and the gating variables m, h and n for Na⁺ and K⁺.
$$\begin{aligned}
\alpha_m &= 0.1\,(V+40)\big/\bigl[1 - e^{-(V+40)/10}\bigr], & \beta_m &= 4\,e^{-(V+65)/18},\\
\alpha_h &= 0.07\,e^{-(V+65)/20}, & \beta_h &= 1\big/\bigl[1 + e^{-(V+35)/10}\bigr],\\
\alpha_n &= 0.01\,(V+55)\big/\bigl[1 - e^{-(V+55)/10}\bigr], & \beta_n &= 0.125\,e^{-(V+65)/80}
\end{aligned} \qquad (58.3)$$
The variable V is the membrane potential, whereas V_Na, V_K and V_L are the reversal potentials of the Na⁺ and K⁺ channels and of the leakage: V_Na = 50 mV, V_K = −77 mV and V_L = −54.5 mV. The conductances of the channels are g_Na = 120 mS/cm², g_K = 36 mS/cm² and g_L = 0.3 mS/cm². The capacitance of the membrane is C = 1 μF/cm².
An input spike train described in Eq. (58.4) is used to generate the pulse component of the external current.
$$U_i(t) = V_a \sum_{n} \delta\bigl(t - t_f^{(n)}\bigr) \qquad (58.4)$$
$$t_f^{(1)} = 0 \qquad (58.6)$$
T represents the ISI of the input spike train and can be varied to generate a different pulse component. The spike train is injected through a synapse to give the pulse current I_P.
$$I_P = g_{syn} \sum_{n} \alpha\bigl(t - t_f^{(n)}\bigr)\,(V_a - V_{syn}) \qquad (58.7)$$
g_syn and V_syn are the conductance and reversal potential of the synapse. The α-function is defined in [32] as
$$\alpha(t) = (t/\tau)\, e^{-t/\tau}\, \Theta(t), \qquad (58.8)$$
where τ is the time constant of the synapse and Θ(t) is the Heaviside step function. V_a = 30 mV, τ = 2 ms, g_syn = 0.5 mS/cm² and V_syn = −50 mV.
The total external current applied to the neuron is a combination of the static and pulse components
$$I_i = I_S + I_P + \varepsilon \qquad (58.9)$$
where I_S is the static current, I_P is the pulse current, and ε is random Gaussian noise with zero mean and standard deviation σ = 0.025. The noise in the external current is ignored in [3], where the current consists of only two terms. However, the presence of noise is necessary in the simulation of a biological activity and it is hence considered here. The static component I_S of the external current is set at 25 μA. The H-H neuron is stimulated with a current I_i = I_S + I_P and its response is recorded. The fluctuations in the membrane voltage are due to the specific nature of the synaptic stimulus. The amplitude of the action potentials in Fig. 58.1 is not constant and their standard deviation is σ_Amp = 3.0978. Hence, the amplitude of the response is not ignored. This is one major difference between [3, 30, 31] and this work. The synaptic time constant of 2 ms defines the shape of the pulse current. As the refractory period of an H-H neuron is 1–2 ms, we choose a 2 ms bound for coincidence detection. The simulation activity is divided into three sets of ISIs. Each set has a corresponding reference ISI (T_ref). The first set compares responses generated using stimulus ISIs between 14–16 ms, while the second set compares responses for ISIs between 13–15 ms. The third set compares responses for ISIs varied between 15–17 ms. The responses for each set are compared with a fixed response known as the reference response. The reference response for each set is unique and is generated by varying the stimulus ISI. The reference ISIs for the sets are 15, 14 and 16 ms respectively. Neural responses are recorded for various ISIs within a set and compared with the reference response for that set. For set 1, the reference spike train is generated with T = 15 ms (T_ref) and compared with responses generated with T = 14–16 ms. Coincidence factors are calculated to estimate the similarity between these responses.
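The simulation set-up can be sketched as below; the forward-Euler integration, step size, simulation length and initial gating values are conventional choices assumed here, not details taken from [3]:

import numpy as np

# H-H parameters and synaptic stimulus of Eqs. (58.3)-(58.9).
C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3
VNa, VK, VL = 50.0, -77.0, -54.5
Va, Vsyn, gsyn, tau_s = 30.0, -50.0, 0.5, 2.0
I_S, sigma = 25.0, 0.025

am = lambda V: 0.1 * (V + 40) / (1 - np.exp(-(V + 40) / 10))
ah = lambda V: 0.07 * np.exp(-(V + 65) / 20)
an = lambda V: 0.01 * (V + 55) / (1 - np.exp(-(V + 55) / 10))
bm = lambda V: 4 * np.exp(-(V + 65) / 18)
bh = lambda V: 1 / (1 + np.exp(-(V + 35) / 10))
bn = lambda V: 0.125 * np.exp(-(V + 65) / 80)

def pulse_current(t, T):
    # Alpha-function synaptic current summed over input spikes at 0, T, 2T, ...
    s = t - np.arange(0.0, t + 1e-9, T)
    return gsyn * np.sum((s / tau_s) * np.exp(-s / tau_s)) * (Va - Vsyn)

def simulate(T=15.0, t_max=300.0, dt=0.01):
    V, m, h, n = -65.0, 0.05, 0.6, 0.32     # assumed initial conditions
    trace = []
    for t in np.arange(0.0, t_max, dt):
        I = I_S + pulse_current(t, T) + np.random.normal(0, sigma)
        INa = gNa * m ** 3 * h * (V - VNa)
        IK = gK * n ** 4 * (V - VK)
        IL = gL * (V - VL)
        V += dt * (I - INa - IK - IL) / C    # forward-Euler integration
        m += dt * (am(V) * (1 - m) - bm(V) * m)
        h += dt * (ah(V) * (1 - h) - bh(V) * h)
        n += dt * (an(V) * (1 - n) - bn(V) * n)
        trace.append(V)
    return np.array(trace)

reference_response = simulate(T=15.0)        # reference response for set 1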
Fig. 58.1 Response of the H-H neuron to the synaptic stimulus with T = 15 ms, causing fluctuations in membrane voltage. (a) The synaptic spike train input that induces a pulse current. (b) The pulse current generated. (c) The total external current applied to the neuron. Note that there is a static offset. (d) The neuronal response to the current
Fig. 58.2 Comparison of responses. (a) The corresponding magnitude of spikes for the responses at T = 16 ms and T = 15 ms. (b) The two spike trains not only differ in firing times but also in magnitudes
Fig. 58.3 Comparison of responses. (a) The corresponding magnitude of spikes for the responses at T = 14 ms and T = 15 ms. (b) The two spike trains not only differ in firing times but also in magnitudes
58.3.3 Coincidence-Factor
The coincidence-factor, as described by [18, 20], is 1 only if the two spike trains are exactly the same and 0 if they are very dissimilar. Coincidence for an individual spike is established if its firing time is within 2 ms of the firing time of the corresponding spike in the reference spike train (in this case T = 15 ms).
$$\Gamma = \frac{N_{coinc} - \langle N_{coinc} \rangle}{\tfrac{1}{2}(N_1 + N_2)}\;\frac{1}{N} \qquad (58.10)$$
where N_1 is the number of spikes in the reference train, N_2 is the number of spikes in the train to be compared, N_coinc is the number of coincidences with precision δ = 2 ms between the spike trains, ⟨N_coinc⟩ = 2νδN_1 is the expected number of coincidences generated by a homogeneous Poisson process with the same rate ν as the spike train to be compared, and N = 1 − 2νδ is the normalising factor. For set 1, N_1 is the number of spikes in the reference spike train (T_ref = 15 ms) and N_2 is the number of spikes in the train to be compared (T = 14–16 ms). Figure 58.4 shows that the coincidence-factors for responses generated using T = 14–16 ms do not follow a fixed pattern. The coincidence-factor (Γ) is, as expected, 1 when the spike train generated with T = 15 ms is compared with the reference spike train T_ref (T = 15 ms). However, the coincidence factor for the spike trains generated at T = 16 ms and T_ref is also 1. This indicates that the two highly varying currents have an exactly similar
Fig. 58.4 Coincidence-factor versus ISI. The coincidence-factor decreases as expected between T = 15–14.65 ms and T = 15–15.25 ms. At other times the result is inconsistent and does not have a fixed pattern
response or, conversely, that as the responses are the same, the two input stimuli are similar, which is an incorrect inference. The coincidence factor for the spike trains generated at T = 14 ms and T_ref is 0.1207, indicating very low similarity. From a mathematical and signal transmission standpoint, the coincidence-factor should decrease as the input stimulus increasingly varies from T_ref. However, this can only be observed between T = 14.65–15.25 ms (30% of the 2 ms time window). The coincidence-factor increases from T = 14–14.5 ms but then drops till T = 14.65 ms. Γ steadily increases to 1 when T = 15 ms and drops for 0.25 ms. There is an upward rise from T = 15.25–15.5 ms and a sharp drop from T = 15.5–15.75 ms, followed by a steep increase to Γ = 1 at T = 16 ms. Traversing from the reference, the expected trajectory of the coincidence-factor breaks at T = 14.65 ms and T = 15.25 ms. These are therefore taken as the limits of faithful behaviour of the coincidence-factor approach. However, when for set 2 the reference spike train is chosen as T_ref = 14 ms, the limits of faithful behaviour change (Fig. 58.5, left). The coincidence factor steadily rises to unity, stays there for 0.5 ms and drops gradually. Ideally, the coincidence-factor should not be 1 for T = 13.5, 13.65 and 13.75 ms. In set 3 (Fig. 58.5, right), the reference spike train chosen is at T_ref = 16 ms. The limits of faithful behaviour change with a change in the stimulus. There is a sharp rise in the coincidence factor from 15.75 to 16 ms, where it reaches unity. From 16 to 17 ms the coincidence-factor executes a perfect curve as expected. From Figs. 58.4 to 58.6 it is conclusive that the lower bound of faithful behaviour increases with the increase in the input reference ISI. The difference between the reference ISI (T_ref) and the lower-bound limit decreases with the increase in the reference ISI. It is also important to note that within each set of simulations there are some false coincidences. The term false coincidence is used to identify comparisons whose coincidence factor is 1 when it should not be. In Fig. 58.4, there is a false coincidence when ISI = 16 ms is
Fig. 58.5 Coincidence-factor versus ISI. Left – the coincidence-factor has a faithful behaviour between T = 13.15 ms and T = 14.65 ms. Right – the coincidence-factor has a faithful behaviour between T = 15.75–17 ms. It executes a perfect curve after 16 ms
compared with T_ref = 15 ms. In Fig. 58.5, left, false coincidences can be seen when the ISI varied between 13.5–13.75 ms is compared with T_ref = 14 ms, while in Fig. 58.5, right, false coincidences can be observed for ISIs varied between 15 and 15.15 ms compared with T_ref = 16 ms.
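For reference, the coincidence-factor of Eq. (58.10) can be computed from two lists of spike times as in the sketch below; spike detection itself is not shown, and the recording duration used to estimate the rate ν is an assumption:

import numpy as np

def coincidence_factor(ref_times, cmp_times, delta=2.0, duration=300.0):
    ref = np.asarray(ref_times, dtype=float)
    other = np.asarray(cmp_times, dtype=float)
    N1, N2 = len(ref), len(other)
    # Count spikes of the compared train within +/- delta of some reference spike.
    n_coinc = sum(np.any(np.abs(ref - t) <= delta) for t in other)
    nu = N2 / duration                       # firing rate of the compared train
    expected = 2 * nu * delta * N1           # coincidences expected from a Poisson train
    normalisation = 1 - 2 * nu * delta
    return (n_coinc - expected) / (0.5 * (N1 + N2)) / normalisation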
The peak of each spike in a spike train is considered as an object. The number of objects for each spike train is equal to the number of spikes.
$$d_{rs}^{2} = (N_r - N_s)(N_r - N_s)' \qquad (58.12)$$
The Euclidean distance between object-pairs is calculated using Eq. (58.12), where N_r, N_s are the objects in the spike trains. Once the distance between each pair of objects is determined, the objects are clustered based on the nearest neighbour approach using
$$d(r,s) = \min\bigl(dist(N_{ri}, N_{sj})\bigr), \quad i \in (1,\dots,n_r),\; j \in (1,\dots,n_s) \qquad (58.13)$$
where n_r, n_s are the total numbers of objects in the respective clusters. The binary clusters are plotted to form a hierarchical tree whose vertical links indicate the distance between the two objects linked to form a cluster. A number is assigned to each cluster as soon as it is formed. Numbering starts from (m + 1), where m is the initial number of objects, and continues until no more clusters can be formed. We investigated the case described in Section 58.3.3 for the responses generated at T_ref = 15 ms and T = 16 ms (false coincidence). The coincidence-factor for these responses is 1 (Fig. 58.4) and indicates an exact match. The clustering tree shows that these responses are actually different from each other by a margin not captured by the coincidence-factor (Fig. 58.6a, b). The clustered objects are shown on the X-axis and the distance between them is shown on the Y-axis. A comparison of the clustering solutions shows that the shape, form, height as well as the linkages are different for the two spike trains. In Fig. 58.6a, objects 12 and 13 are clustered together at a height of 11.5, while in Fig. 58.6b, objects 11 and 12 are clustered at a height of 13.5 – shown in green circles. Also, objects 4 and 5 are clustered in Fig. 58.6a while objects 3 and 4 are clustered in Fig. 58.6b – shown in red circles. This means that the spike trains are inherently different from each other. The results hence prove that the two spike trains are not an exact match. We therefore believe that, though determining the coincidence-factor is important, a two-dimensional analysis is necessary for responses with fluctuating membrane voltages.
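The two-dimensional comparison can be reproduced with standard single-linkage (nearest neighbour) clustering, as sketched below; the peak coordinates are placeholders standing in for the detected spike maxima (time, amplitude):

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

peaks_ref = np.array([[23.1, 41.2], [38.4, 40.1], [53.2, 42.0], [68.7, 39.6]])
peaks_cmp = np.array([[24.0, 39.0], [40.1, 41.5], [55.9, 38.7], [70.3, 42.4]])

# Nearest-neighbour (single-linkage) clustering of the spike peaks.
Z_ref = linkage(peaks_ref, method="single", metric="euclidean")
Z_cmp = linkage(peaks_cmp, method="single", metric="euclidean")

# dendrogram(Z_ref); dendrogram(Z_cmp)   # compare shape, heights and linkages of the trees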
Fig. 58.6 (a) Clustering solution for T = 15 ms indicating the objects being clustered. (b) Clustering solution for T = 16 ms indicating the objects being clustered. The shape, form, height as well as linkages for each spike train are different
58.4 Conclusions
References
10. Rinzel J (1985). Excitation dynamics: Insights from simplified membrane models. Theoretical
Trends in Neuroscience Federal Proceedings, 44(15): 2944–2946.
11. Gabbiani F, Metzner W (1999). Encoding and processing of sensory information in neuronal
spike trains. The Journal of Experimental Biology, 202: 1267–1279.
12. Panzeri S, Schultz S R, Treves A, Rolls E T (1999). Correlations and the encoding of infor-
mation in the nervous system. Proceedings of the Royal Society of London, B 266(1423):
1001–1012.
13. Agüera y Arcas B, Fairhall A L (2003). What causes a neuron to spike? Neural Computation,
15: 1789–1807.
14. Agüera y Arcas B, Fairhall A L, Bialek W (2003). Computation in a single neuron: Hodgkin
and Huxley revisited. Neural Computation, 15: 1715–1749.
15. Izhikevich E M (2006). Polychronization: Computation with spikes. Neural Computation, 18:
245–282.
16. Li X, Ascoli G A (2006). Computational simulation of the input-output relationship in
hippocampal pyramidal cells. Journal of Computational Neuroscience, 21: 191–209.
17. Kepler T B, Abbott L F, Marder E (1992). Reduction of conductance-based neuron models.
Biological Cybernetics, 66: 381–387.
18. Joeken S, Schwegler H (1995). Predicting spike train responses of neuron models; in
M.Verleysen (ed.), Proceedings of the 3rd European Symposium on Artificial Neural Net-
works, pp. 93–98.
19. Wang X J, Buzsáki G (1996). Gamma oscillation by synaptic inhibition in a hippocampal
interneuronal network model. The Journal of Neuroscience, 16(20): 6402–6413.
20. Kistler W M, Gerstner W, Leo van Hemmen J (1997). Reduction of the Hodgkin-Huxley
equations to a single-variable threshold model. Neural Computation, 9: 1015–1045.
21. Izhikevich E M (2003). Simple model of spiking neurons. IEEE Transactions on Neural
Networks, 14(6): 1569–1572.
22. Shriki O, Hansel D, Sompolinsky H (2003). Rate models for conductance-based cortical
neuronal networks. Neural Computation, 15: 1809–1841.
23. Jolivet R, Gerstner W (2004). Predicting spike times of a detailed conductance-based neuron
model driven by stochastic spike arrival. Journal of Physiology – Paris, 98: 442–451.
24. Jolivet R, Lewis T J, Gerstner W (2004). Generalized integrate-and-fire models of neuronal
activity approximate spike trains of a detailed model to a high degree of accuracy. Journal of
Neurophysiology, 92: 959–976.
25. Jolivet R, Rauch A, Lüscher H-R, Gerstner W (2006). Integrate-and-fire models with adap-
tation are good enough: Predicting spike times under random current injection. Advances in
Neural Information Processing Systems, 18: 595–602.
26. Jolivet R, Rauch A, Lüscher H-R, Gerstner W (2006). Predicting spike timing of neocortical
pyramidal neurons by simple threshold models. Journal of Computational Neuroscience, 21:
35–49.
27. Clopath C, Jolivet R, Rauch A, Lüscher H-R, Gerstner W (2007). Predicting neuronal activity
with simple models of the threshold type: Adaptive exponential integrate-and-fire model with
two compartments. Neurocomputing, 70: 1668–1673.
28. Djabella K, Sorine M (2007). Reduction of a cardiac pacemaker cell model using singular
perturbation theory. Proceedings of the European Control Conference 2007, Kos, Greece, pp.
3740–3746.
29. Hodgkin A, Huxley A (1952). A quantitative description of membrane current and its
application to conduction and excitation in nerve. Journal of Physiology, 117:500–544.
30. Maršálek, P (2000). Coincidence detection in the Hodgkin–Huxley equations. Biosystems,
58(1–3).
31. Victor J D, Purpura K P (1997). Metric-space analysis of spike trains: Theory, algorithms and
application. Network: Computation in Neural Systems, 8: 127–164.
32. Park M H, Kim S (1996). Analysis of phase models for two coupled Hodgkin-Huxley neurons.
Journal of the Korean Physical Society, 29(1): 9–16.
59.1 Introduction
F. Casolo (B)
Politecnico di Milano, Dipartimento di Meccanica – Campus Bovisa Sud, via La Masa 34,
20156 Milano, Italy
E-mail: federico.casolo@polimi.it
moving the arm horizontally when it is supported by a plane. The forced immobility produces a faster degeneration of the muscular structure. The recovery of some active mobility of the arm is therefore important, both to carry out some autonomous daily activities and to execute exercises as self physiotherapy. Assistive systems may help the subject in this task: they can be active, which in general means motor driven, or passive. Some active systems need to be driven by the contralateral limb – e.g. by replicating its movement or by means of a joystick – producing unnatural movements [1, 2]. The present research originated from the request of an organization of people affected by muscular dystrophy (UILDM) for the development of a passive system. In fact, most UILDM members have so far preferred to avoid any motor driven device except for the electrical wheelchair on which they are forced to live. Therefore, the aim of the first part of the research is the development of a passive device, as simple as possible, capable of enhancing upper limb mobility by acting against gravity. The device must be mounted on the frame of the subject's wheelchair. Until now we have designed and used only passive systems for preliminary analyses of the patients' response. These results are the basis for the development of our next passive device, which will optionally be provided with an active module for the weaker subjects.
59.2 Background
At present, some mechatronic devices are designed to increase the subject's physical abilities [3–7], but only a few of them are suitable for subjects affected by muscular dystrophy.
In [8], the aim of the research is to increase the autonomy of people affected by muscular dystrophy and spinal muscular atrophy through a passive device. The system has four degrees of freedom: flexion-extension of the shoulder, abduction-adduction of the shoulder, flexion-extension of the elbow and prono-supination of the forearm. The torque value at the joint is a nonlinear function of the position; nevertheless, gravity balance is achieved by means of linear springs (Fig. 59.1). To obtain the system balance regardless of the value of the joint angle, the spring stiffness is set to k = mgl/(ab).
Fig. 59.1 (a) Rahman device – (b) scheme of the static balance of a one-d.o.f. structure with a linear spring
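The balance condition can be checked numerically. The short sketch below uses hypothetical masses and lengths and assumes the usual idealization for this class of balancers (a zero-free-length spring anchored directly above the joint): with k = mgl/(ab), the total potential energy of the one-d.o.f. structure of Fig. 59.1b is constant over the joint angle, so no net torque is needed to hold the link against gravity.

```python
import numpy as np

# Hypothetical data for the one-d.o.f. balancer of Fig. 59.1b
m, g, l = 2.0, 9.81, 0.30    # segment mass [kg], gravity [m/s^2], distance of its centre of mass [m]
a, b = 0.10, 0.15            # spring attachment distances from the joint [m]

k = m * g * l / (a * b)      # balance condition k = mgl/(ab)

theta = np.linspace(0.0, np.pi, 200)                     # joint angle from the upward vertical
U_grav = m * g * l * np.cos(theta)                       # gravitational potential energy
# zero-free-length spring between a point at distance a on the frame (above the joint)
# and a point at distance b on the link: elongation^2 = a^2 + b^2 - 2ab*cos(theta)
U_spring = 0.5 * k * (a**2 + b**2 - 2 * a * b * np.cos(theta))

U_total = U_grav + U_spring
print("potential energy variation:", U_total.max() - U_total.min())   # ~0 -> statically balanced
```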
The Armon orthosis [9, 10] is designed for people affected by spinal muscular atrophy. It assumes that the arm weight can be sustained by the natural shoulder and elbow (Fig. 59.2). Static balance is obtained by springs for about 75% of the arm weight; the interface between the device and the arm is located on the forearm, near the elbow. The system has two degrees of freedom in the vertical plane; the spring action is transmitted to the frame by a system of pulleys.
In [11], the Dynamic Arm Support (DAS) is designed to be mounted on the wheelchair of people affected by muscular dystrophy, muscular atrophy and amyotrophic lateral sclerosis. It is a hybrid device with an active, operator-controlled electrical system to adjust the weight compensation. The patient's forearm is balanced by a gravity compensation system, made of a planar linkage (Fig. 59.3) and a linear spring. The balance is guaranteed by the condition r a k = mgl.
Starting from a kinematic model of the human arm [12] with seven degrees of freedom – five d.o.f. for the shoulder and two d.o.f. for the elbow – a linkage system was designed to be coupled in parallel with the arm, without adding constraints to the scapula-clavicle complex (Fig. 59.4).
\sum F_V = 0 \;\Rightarrow\; F_2 = \left(1 - \frac{AB}{AC}\right) F_1 + F_3 + F_4 \qquad (59.1)

\sum M_O = 0 \;\Rightarrow\; M_C = F_3\, CD \sin(\gamma) + F_4\, CF \sin(\gamma) \qquad (59.2)
The forearm counterbalancing torque (Fig. 59.6 – line A) is a function only of the elbow angle γ, whereas the counterbalancing weight is independent of the arm position; both are functions of the subject's weight.
For the prototype, a spiral spring has been designed to follow approximately the torque diagram, and the vertical guide has been equipped with masses counterbalancing, through a cable and a pulley, the weight of the subject's arm and of the sliding structure. Line C in Fig. 59.6 shows the torque exerted by the spring and C1 represents the residual torque that the subject must exert to move the forearm. Lines B and B1 refer to a preloaded spring.
While the counterweights can easily be adjusted to the patient's weight and skills, the spring action can only be tuned by varying the preload by means of its adjustable support. Major variations of the torque can only be obtained by changing the spring. Preliminary tests allow the spring preload to be adjusted for each specific subject.
The prototype has been tested with some patients to evaluate its effectiveness in helping them to move the arm and to handle simple objects for daily tasks (e.g. drinking). The analyzed parameters are the changes produced by the external device in the joint excursions and in the trajectories followed by the hand to complete a task. The test protocol gives a description of the exercises performed by the patients without any initial training. Every task is executed three times, starting from simple movements of a single joint and progressing to more complex exercises (such as taking a glass of water from a table and bringing it to the mouth). Markers on the upper body (arm and trunk) are acquired by an infrared motion capture system together with the EMG signals of the most important muscles. Therefore the system kinematics could be reconstructed and correlated with the timing of the muscular activity. For the first series of tests a performance gain index was also extrapolated by comparing the results with the preliminary tests obtained without using the assistive device. The four tests performed by all the subjects are listed in Table 59.1.
For example, Fig. 59.7 shows the data processed during the execution of movement n. 2 and Fig. 59.8 highlights the increase of the range of motion of the subject's elbow produced by the helping device. Figure 59.9 shows how the same device is also helpful during the drinking task in reducing the lateral flexion of the trunk.
Overall, the test results clearly show that an anti-gravity device can increase the autonomy of people affected by M.D. The subjects' motion skills increase test after test and, moreover, in most cases the trunk posture improves.
Fig. 59.9 Comparison of trunk bending (average values of one subject) for the drinking task with the device (1) and without (2); areas show the standard deviation
between masses, lengths and spring stiffnesses guarantee the static equilibrium of the structure in every position.
The arm and forearm segments (respectively OA and AE in Fig. 59.11) are balanced for each value of α and β. For segment AE:

m_2\, AE = M_2\, OD \qquad (59.5)

From which:

m_1\, OG + m_2\, OA = M_1\, OF \qquad (59.7)

Equations (59.5) and (59.7) describe the static equilibrium of the structure; two values among M_1, M_2, OF and OD (directly related to the structure dimensions and weights) can be chosen arbitrarily, while the other two follow from the equations.
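As a simple illustration of this freedom of choice, the sketch below (with hypothetical segment masses and lengths) fixes the counterweight lever arms OD and OF and solves Eqs. (59.5) and (59.7) for the counterweight masses M2 and M1; fixing the masses and solving for the lever arms would work equally well.

```python
# Hypothetical segment data (masses in kg, lengths in m)
m1, m2 = 2.1, 1.5            # arm (OA) and forearm (AE) segment masses to be balanced
OG, OA, AE = 0.15, 0.30, 0.25

# Chosen counterweight lever arms
OD, OF = 0.10, 0.12

M2 = m2 * AE / OD                    # Eq. (59.5): m2*AE = M2*OD
M1 = (m1 * OG + m2 * OA) / OF        # Eq. (59.7): m1*OG + m2*OA = M1*OF

print(f"M1 = {M1:.2f} kg, M2 = {M2:.2f} kg")
```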
The equilibrium equations for the structure with springs can be obtained from analogous considerations. Following the notation of Fig. 59.13:

F_{1y}\, OD \cos\beta + F_{1x}\, OD \sin\beta + F_{BC}\, OC \sin\alpha \cos\beta + F_{BC}\, OC \cos\alpha \sin\beta = 0 \qquad (59.8)

where F_{1x} and F_{1y} are the components of the elastic force: F_{1x} = k_1\, OD \cos\beta and F_{1y} = k_1 (OM - OD \sin\beta). From Eq. (59.8), for the equilibrium of the OA segment:

m_1 g\, OG \cos\alpha - F_{2y}\, OF \cos\alpha + F_{2x}\, OF \sin\alpha + F_{yA}\, OA \cos\alpha + F_{xA}\, OA \sin\alpha = 0 \qquad (59.10)

with F_{2x} = k_2\, OF \cos\alpha and F_{2y} = k_2 (OL + OF) \sin\alpha.
As in the other configuration of the device, there are four unknown parameters (OM, OL, k_1, k_2), but only two of them are independent. Therefore, if the system geometry is constrained, it is enough to choose the spring stiffnesses, and vice versa.
59.5.1 Simulations
To better evaluate and choose between the planned solutions, a multibody model of the human arm coupled with the device has been built, and the torques at the shoulder and elbow joints have been calculated. Inertial effects have been taken into account.
Simulations were carried out for both new structures to calculate the torques at the shoulder and elbow joints. The anthropometric data (arm and forearm weights and centers of mass) come from the literature [15].
Simulations were completed for each of the following configurations:
– non-balanced structure (free arm)
– weight-balanced structure
– spring-balanced structure
The torques supplied by the subject have been evaluated for the following motion tasks:
1. Static initial position: α = 60°, β = 0°, no motion
2. Motion 1: α̇ = 15°/s, β = 0°
3. Motion 2: α̇ = 0°/s, β̇ = 10°/s
4. Motion 3: Arm flexion, α̇ = 30°/s
The joint torque values with the balancing apparatus (either with weights or with springs) are lower than those in the unbalanced condition, confirming the effectiveness of the device.
Since in practice it is very hard to build a structure whose joints remain exactly aligned with the arm articulations, the effect of misalignment has also been investigated.
To evaluate how the torques exerted by the shoulder and elbow can change, the previous simulations were repeated with new configurations in which both human joint positions have an offset with respect to the ideal ones. Figure 59.14 shows the torques required during motion 2, starting for both joints with the same offset (30 mm horizontally and 70 mm vertically): although the new torques are higher than the ones evaluated in the ideal configuration (Figs. 59.14 and 59.15), they still remain acceptable.
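The kind of comparison reported above can be reproduced in a rough, purely static form with the sketch below. It is not the authors' multibody model: it only evaluates the gravity torques of a planar two-link arm with hypothetical anthropometric data, and it represents the balancing device by a simple compensation fraction, which is enough to show how strongly a gravity balancer reduces the static joint torques.

```python
import numpy as np

g = 9.81
# Hypothetical anthropometric data: mass [kg], length [m], centre-of-mass distance [m]
m1, L1, c1 = 2.1, 0.30, 0.13    # upper arm
m2, c2 = 1.6, 0.12              # forearm + hand

def static_torques(alpha, beta, balance=0.0):
    """Static gravity torques [N m] at the shoulder and elbow of a planar two-link arm.
    alpha: shoulder elevation from the horizontal, beta: elbow flexion (rad).
    balance: fraction of gravity compensated by the device (0 = free arm)."""
    x_c1 = c1 * np.cos(alpha)                       # horizontal moment arm of the upper-arm COM
    x_elbow = L1 * np.cos(alpha)
    x_c2 = x_elbow + c2 * np.cos(alpha + beta)      # horizontal moment arm of the forearm COM
    tau_shoulder = (m1 * g * x_c1 + m2 * g * x_c2) * (1.0 - balance)
    tau_elbow = m2 * g * (x_c2 - x_elbow) * (1.0 - balance)
    return tau_shoulder, tau_elbow

alpha, beta = np.radians(60.0), np.radians(0.0)     # the static initial position of task 1
print("unbalanced   :", static_torques(alpha, beta))
print("75% balanced :", static_torques(alpha, beta, balance=0.75))
```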
Fig. 59.14 Motion 2 – shoulder (top) and elbow (bottom) torques for the unbalanced arm and for the weight- and spring-balanced structures
Fig. 59.15 Arm placed with joint offset – influence on joint torques
59.6 Conclusions
References
1. Perry, J.C., Rosen, J., and Burns, S., “Upper-Limb Powered Exoskeleton Design”, Mechatron-
ics, IEEE/ASME Transactions on, vol. 12, no. 4, pp. 408–417, Aug. 2007.
2. Kiguchi, K., Tanaka, T., Watanabe, K., and Fukuda, T., “Exoskeleton for Human Upper-
Limb Motion Support”, Robotics and Automation, 2003, Proceedings. ICRA ‘03, IEEE
International Conference on, vol. 2, pp. 2206–2211, 14–19 Sept. 2003.
3. Kazerooni, H., “Exoskeletons for Human Power Augmentation”, Proceedings of the
IEEE/RSJ. International Conference on Intelligent Robots and Systems, pp. 3120–3125.
4. Kazuo K., Takakazu T., Keigo W., and Toshio F., “Exoskeleton for Human Upper-Limb
Motion Support”, Proceedings of the 2003 IEEE International Conference on Robotics &
Automation, Taipei, 2003.
5. Agrawal, S.K., Banala, S.K., Fattah, A., Sangwan, V., Krishnamoorthy, V., Scholz, J.P., and
Hsu, W.L.,“Assessment of Motion of a Swing Leg and Gait Rehabilitation With a Gravity Bal-
ancing Exoskeleton”, IEEE Transactions on Neural Systems and Rehabilitation Engineering,
2007.
6. Ball, S.J., Brown, I.E., and Scott, S.H., “A Planar 3DOF Robotic Exoskeleton for Rehabilita-
tion and Assessment”, Proceedings of the 29th Annual International Conference of the IEEE
EMBS, Lyon, France, pp. 4024–4027, 23–26 Aug. 2007.
7. Kiguki, K. and Tanaka, T. and Watanabe, K. and Fukuda, T., “Exoskeleton for Human Upper-
Limb Motion Support”, Proceedings of the 2003 IEEE 10th International Conference on
Robotics & Automation, Taipei, Taiwan, pp. 2206–2211, 14–19 Sept. 2003.
8. Rahman T., Ramanathan R., Stroud S., Sample W., Seliktar R., Harwin W., Alexander M.,
and Scavina M., “Towards the Control of a Powered Orthosis for People with Muscular
Dystrophy”, IMechE Journal of Engineering in Medicine, vol. 215, Part H, pp. 267–274, 2001.
9. Herder, J.L., “Development of a Statically Balanced Arm Support: ARMON”, Proceedings of
the 2005 IEEE 9th International Conference on Rehabilitation Robotics, Chicago, IL, USA,
pp. 281–286, June 28–July 1, 2005.
10. Mastenbroek, B., De Haan, E., Van Den Berg, M., and Herder, J.L., “Development of a Mobile
Arm Support (Armon): Design Evolution and Preliminary User Experience”, Proceedings of
the 2007 IEEE 10th International Conference on Rehabilitation Robotics, Noordwijk, The
Netherlands, pp. 1114–1120, 12–15 June.
11. Kramer, G., Romer, G., and Stuyt, H., “Design of a Dynamic Arm Support (DAS) for
Gravity Compensation”, Proceedings of the 2007 IEEE 10th International Conference on
Rehabilitation Robotics, Noordwijk, The Netherlands, pp. 1042–1048, 12–15 June.
12. Keith K. and Barbara W., Anthropometry and Biomechanics, Man-Systems Integration
Standards Revision B, NASA, July 1995.
13. Chandler, R.F., Clauser, C.E., McConville, J.T., Reynolds, H.M., and Young, J.W., Investiga-
tion of Inertial Properties of the Human Body. Technical Report DOT HS-801 430, Aerospace
Medical Research Laboratory, Wright-Patterson Air Force Base, OH, March 1975.
14. Legnani, G., “Robotica Industriale”, Milano, Casa Editrice Ambrosiana, 2003 ISBN 88-408-
1262-8.
15. Chandler, R.F., Clauser, C.E., McConville, J.T., Reynolds, H.M., and Young, J.W. “Investi-
gation of Inertial Properties of the Human Body”. Wright Patterson Air Force Base, Ohio
(AMRL-TR-75-137), 1975.
Chapter 60
EEG Classification of Mild and Severe
Alzheimer’s Disease Using Parallel Factor
Analysis Method
PARAFAC Decomposition of Spectral-Spatial
Characteristics of EEG Time Series
Abstract Electroencephalogram (EEG) recordings are increasingly used as a method to assess susceptibility to Alzheimer's disease. In this study, we aimed at distinguishing control subjects from subjects with mild cognitive impairment (MCI) and from subjects with Alzheimer's disease (AD). For each subject, we computed the relative Fourier power of five frequency bands. Then, for each frequency band, we estimated the mean power of five brain regions: frontal, left temporal, central, right temporal and posterior. There was an equivalent number of electrodes in each of the five regions; this grouping normalizes the regional repartition of the information. These values form a three-way tensor: the Fourier power by frequency band and by brain region for each subject. From this tensor, we extracted characteristic filters for the classification of subjects using linear and nonlinear classifiers.
60.1 Introduction
J. Jeong (B)
Brain Dynamics Lab., Bio and Brain Engineering Dept., KAIST, Yuseong-gu, Guseong-dong,
Daejeon, South Korea, 305-701
E-mail: jsjeong@kaist.ac.kr
available to detect this disease, including imaging [2, 3], genetic methods [4], and other physiological markers [5], do not, however, allow mass screening of the population. In contrast, psychological tests such as the Mini Mental State Examination (MMSE), in combination with an electrophysiological analysis (e.g. electroencephalograms, EEG), would be an efficient and inexpensive screening approach to detect patients affected by the disease.
EEG recordings are increasingly used as a method to assess susceptibility to Alzheimer's disease, but they are often obtained during steady states, where the temporal information does not easily reveal the features relevant for subject differentiation. Interestingly, even under those conditions, previous studies obtained excellent classification results ranging from 84% to 100%, depending on the conditions (i.e. training or validation conditions for the classifier tests) and on the compared groups (i.e. control subjects, mild cognitive impairment (MCI), and different stages of Alzheimer's disease), demonstrating the promise of resting EEGs for the diagnosis of Alzheimer's disease [6–11].
The spectral-spatial information estimated during resting states might contain valuable clues for the detection of demented subjects; however, the inter-subject variability, also influenced by differences in the progression of the disease, might make the analysis difficult when undertaken subject by subject. In that case, a multi-way analysis allows the extraction of information that is shared across subjects while simultaneously considering the spectral-spatial information. This methodology has been applied to epilepsy detection and has successfully characterized epileptic foci in a temporal-frequency-regional manner [12, 13]. Classification based on multi-way modeling has also been performed on continuous EEGs [14], showing the power and versatility of multi-way analyses.
Previous two-way analyses combining PCA-like techniques [6] (i.e. principal component analysis (PCA) and independent component analysis (ICA)) have shown very high performance in the classification of subjects and have assisted in early detection. These methods require data in the form of matrices, so they either limit the order (i.e. the number of dimensions or modes) or mix several types of variables (e.g. using the unfolding method, also known as matricization). Either way, the naturally high order of EEGs and the interactions between the ways (or modes) are lost or destroyed, further limiting the understanding of the underlying processes in the brain.
Multi-way array decomposition is a modeling tool that preserves the original high-dimensional nature of the data. This method uses the common features and interactions between modes present in the data to create a model fit of the same data that can be decomposed into components. The decomposition into components provides information on the interactions between modes in the form of weights, which are relatively easy to interpret. The application of such methods to the diagnosis of Alzheimer's disease is explored in this study.
The subjects analyzed in this study were obtained from a previously studied database [6, 9, 11] and consisted of eyes-open, steady-state EEG recordings (20 s in duration), with 21 leads positioned according to the international 10–20 system and digitized at 200 Hz. The database contains 38 control subjects (mean age: 71.7 ± 8.3), 22 mild cognitive impairment (MCI) subjects (mean age: 71.9 ± 10.2) who later developed Alzheimer's disease, and 23 Alzheimer's disease patients (AD; mean age: 72.9 ± 7.5). The control subjects had no complaints or history of memory problems, and scored over 28 (mean score: 28.5 ± 1.6) on the Mini Mental State Examination (MMSE). The MCI subjects had complaints about memory problems and scored over 24 on the MMSE (mean score: 26 ± 1.8). The inclusion criterion was set at 24, as suggested in [11], therefore encompassing MCI subjects with various cognitive deficits but in the early stage of Alzheimer's disease. The AD patients scored below 20 on the MMSE and had had a full clinical assessment. Thirty-three moderately or severely demented probable AD patients (mean MMSE score: 15.3 ± 6.4; range: 0–23) were recruited from the same clinic. After obtaining written informed consent from the patients and controls, all subjects underwent EEG and SPECT examination within 1 month of entering the study. All subjects were free of acute exacerbations of AD-related co-morbidities and were not taking medication. The medical ethics committee of the Japanese National Center of Neurology and Psychiatry approved this study.
60.3 Method
In this study, we aimed at distinguishing control subjects from subjects with MCI and from subjects with AD. For each subject, we computed the relative Fourier power of five frequency bands: δ (1–4 Hz), θ (4–8 Hz), α1 (8–10 Hz), α2 (10–12 Hz) and β (12–25 Hz). Then, for each frequency band, we estimated the mean power of five brain regions: frontal, left temporal, central, right temporal and posterior. There was an equivalent number of electrodes in each of the five regions; this grouping normalizes the regional repartition of the information (Fig. 60.1).
Fig. 60.1 Topological grouping of electrodes in five regions; the numbers 1, 2, 3, 4 and 5 denote
the frontal, left temporal, central, right temporal and posterior regions, respectively
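A minimal sketch of this preprocessing step is given below, assuming a recording sampled at 200 Hz and a hypothetical assignment of the 21 electrodes to the five regions of Fig. 60.1; the exact electrode grouping and the normalization of the relative power (here, power in each band divided by the total 1–25 Hz power) are assumptions, not taken from the chapter.

```python
import numpy as np
from scipy.signal import welch

FS = 200   # sampling rate [Hz]
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha1": (8, 10), "alpha2": (10, 12), "beta": (12, 25)}
# Hypothetical grouping of the 21 electrodes into the five regions of Fig. 60.1
REGIONS = {"frontal": [0, 1, 2, 3], "left_temporal": [4, 5, 6, 7], "central": [8, 9, 10, 11],
           "right_temporal": [12, 13, 14, 15], "posterior": [16, 17, 18, 19, 20]}

def band_region_power(eeg):
    """eeg: (n_channels, n_samples) -> (n_bands, n_regions) matrix of relative band power."""
    freqs, psd = welch(eeg, fs=FS, nperseg=2 * FS)                  # PSD per channel
    total = psd[:, (freqs >= 1) & (freqs <= 25)].sum(axis=1)        # broadband reference power
    profile = np.zeros((len(BANDS), len(REGIONS)))
    for bi, (lo, hi) in enumerate(BANDS.values()):
        rel = psd[:, (freqs >= lo) & (freqs < hi)].sum(axis=1) / total   # relative power per channel
        for ri, chans in enumerate(REGIONS.values()):
            profile[bi, ri] = rel[chans].mean()                          # average over the region
    return profile

# Example on a 20 s, 21-channel recording (random data as a stand-in)
print(band_region_power(np.random.randn(21, 20 * FS)).shape)   # (5, 5)
```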
For simplicity and ease of interpretation, we opted for a two-step classification of the subjects (a divide-and-conquer scheme): (1) we compared healthy subjects with the patient group (regrouping MCI and AD); (2) we compared the MCI group with the AD group. The schematic diagram of our approach is presented in Fig. 60.2.
For each step of classification, we extracted the features based on the Parallel Factor Analysis (PARAFAC) decomposition (unsupervised) and on the reference filters (supervised). We estimated the classification accuracy using a cross-validation in which a single subject is left out at each fold (i.e. leave-one-out cross-validation) on both
Fig. 60.2 Divide and conquer method for the classification of control subjects vs. MCI vs. AD
subjects – cascade of classifiers
a quadratic naïve Bayes classifier (one of the simplest statistical classifiers, with a quadratic boundary) and an artificial neural network (ANN; with n = 1 hidden neuron in one hidden layer, i.e. a nonlinear classifier).
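A sketch of this evaluation step using scikit-learn is given below; the feature matrix is synthetic, GaussianNB is used as a stand-in for the quadratic naïve Bayes classifier and MLPClassifier with a single hidden neuron as a stand-in for the ANN, so the exact classifier implementations are assumptions.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Hypothetical feature matrix: one row per subject, columns = filter-based distance features
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))          # e.g. R = 3 PARAFAC-filter features
y = rng.integers(0, 2, size=60)       # 0 = control, 1 = patient (MCI or AD)

loo = LeaveOneOut()
bayes = GaussianNB()                                   # Gaussian naive Bayes (quadratic boundary)
ann = MLPClassifier(hidden_layer_sizes=(1,), max_iter=2000, random_state=0)  # one hidden neuron

for name, clf in [("Bayes", bayes), ("ANN", ann)]:
    acc = cross_val_score(clf, X, y, cv=loo).mean()    # leave-one-out accuracy
    print(f"{name}: {100 * acc:.1f}%")
```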
The important idea underlying multi-way analysis is the extraction of the multiple interactions within the structure of the data. Interactions and common patterns between modes of the data are often neglected because they are studied separately (e.g. subject by subject, region by region) or folded onto the same dimension (i.e. mixed, hence destroying interactions), as in methods using two-dimensional PCA or ICA. In this study, we constructed a three-way tensor of the form Subjects × Frequency Band Power × Brain Region to preserve the relations between the spectral and regional characteristics of each subject; we used a common modeling method for N-way analyses to extract the common interactions in the tensor in relation to the subjects: parallel factor analysis (PARAFAC) [15].
PARAFAC is often referred to as a multilinear version of bilinear factor models. From a given tensor X \in \mathbb{R}^{I \times J \times K}, this model extracts a linear decomposition into rank-1 tensors:

X = \sum_{r=1}^{R} a_r \circ b_r \circ c_r + E, \qquad (60.1)

Fig. 60.3 PARAFAC modeling of a three-way tensor; each component (R = 2) is the outer product of the rank-1 vectors a, b and c, and E is a residual tensor

The number of components R was chosen as the largest value giving a core consistency above 90% and such that the core consistency for R + 1 components falls under 90%. The spectral-spatial filters are then formed from the frequency and region loadings as

F_r = b_r c_r^{T}, \qquad (60.2)

where b_r and c_r are the rank-1 vectors of the frequency and region modes in the PARAFAC model, respectively. Following Eq. (60.2), we obtain R = 3 filters.
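The decomposition of Eq. (60.1) and the filters of Eq. (60.2) can be sketched with the tensorly library (used here as a stand-in for the N-way toolbox for MATLAB cited in [15]); the tensor dimensions are hypothetical.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Hypothetical tensor: Subjects x Frequency Band Power x Brain Region (e.g. 83 x 5 x 5)
tensor = tl.tensor(np.random.rand(83, 5, 5))

R = 3                                   # number of components (core-consistency choice)
cp = parafac(tensor, rank=R)            # CP/PARAFAC decomposition, Eq. (60.1)
A, B, C = cp.factors                    # subject, frequency-band and region loadings

# Spectral-spatial filters of Eq. (60.2): F_r = b_r c_r^T
filters = [np.outer(B[:, r], C[:, r]) for r in range(R)]
print(filters[0].shape)                 # (5, 5): Frequency Band Power x Brain Region
```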
The reference filters are supervised filters, in contrast to the filters obtained using PARAFAC (unsupervised), because they integrate prior knowledge of the subjects' group membership. The reference filters of the control group and of the demented group were calculated as the average Frequency Band Power × Brain Region matrix over the normal subjects and over the demented subjects (regrouping MCI and AD), respectively. The reference filters of the MCI group and of the AD group were calculated as the average Frequency Band Power × Brain Region matrix over the MCI subjects and the AD subjects, respectively.
The reference filters and the PARAFAC filters contain common characteristics of the groups under study (i.e. control vs. demented, or MCI vs. AD). We use a distance measure based on the matrix scalar product, also known as the normalized Frobenius inner product, to compare the estimated filters (i.e. reference filters or PARAFAC filters) with the Frequency Band Power × Brain Region profile of each subject. The distance measure is defined as follows:

Dis(F_1, F_2) = \frac{\mathrm{Trace}(F_1^{T} F_2)}{\lVert F_1 \rVert\, \lVert F_2 \rVert}, \qquad (60.3)

where F_1 and F_2 are two Frequency Band Power × Brain Region matrices, ^T denotes the conjugate transpose of a matrix and the Trace function returns the trace of a matrix. The function \lVert \cdot \rVert indicates the Frobenius norm, defined as \lVert F \rVert = \sqrt{\mathrm{Trace}(F^{T} F)}. For each subject, we obtained R features by comparing the R PARAFAC filters with the subject's Frequency Band Power × Brain Region profile. Similarly, we obtained two features by comparing the reference filters with the subject's Frequency Band Power × Brain Region profile.
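A direct transcription of Eq. (60.3) and of the feature extraction it drives might look as follows; the variable names are illustrative only.

```python
import numpy as np

def filter_similarity(F1, F2):
    """Normalized Frobenius inner product of Eq. (60.3) between two
    Frequency Band Power x Brain Region matrices."""
    return np.trace(F1.T @ F2) / (np.linalg.norm(F1, "fro") * np.linalg.norm(F2, "fro"))

def subject_features(profile, filters):
    """Distance-based features of one subject: similarity of its profile to each filter."""
    return np.array([filter_similarity(profile, F) for F in filters])

# Example with a random 5 x 5 profile and three random filters (stand-ins)
profile = np.random.rand(5, 5)
filters = [np.random.rand(5, 5) for _ in range(3)]
print(subject_features(profile, filters))
```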
60.4 Results
The best classification results using the artificial neural network (ANN) and the quadratic naïve Bayes classifier are displayed in Table 60.1. There was no noticeable difference between the performance of the two classifiers (both nonlinear) using either the reference filters or the PARAFAC filters; however, when comparing MCI with AD subjects, the ANN showed better performance using the PARAFAC filters (75.6%) than using the reference filters (68.9%).
Table 60.1 Classification results obtained using an ANN and a quadratic naïve Bayes classifier with the leave-one-out cross-validation method; NC denotes normal controls and Patients denotes the patient group (regrouping the MCI and AD groups)

         NC vs. Patients   NC vs. Patients   AD vs. MCI       AD vs. MCI
         (PARAFAC) (%)     (reference) (%)   (PARAFAC) (%)    (reference) (%)
ANN      74.7              61.4              75.6             64.4
Bayes    74.7              61.5              68.9             64.4
Fig. 60.4 ROC curve of classification accuracy for (a) control vs. demented subjects and (b) AD vs. MCI subjects using an ANN and the leave-one-out cross-validation method; classification results obtained using the reference filters (stars) and using the filtered data extracted with a three-component PARAFAC (R = 3; triangles)
Fig. 60.5 Clustering of frequency bands; dendrograms extracted from the clustering of two-component PARAFAC models for (a) control subjects, (b) MCI patients, and (c) Alzheimer patients
60.5 Discussion
Acknowledgements The first author would like to thank the Ministry of Information and Technology of South Korea and the Institute for Information and Technology Advancement (IITA) for their financial support. The authors would like to thank Dr. T. Musha for his contribution to the subjects' analysis and the EEG recordings, and Dr. M. Maurice and S. Choe for their help in editing the content.
References
1. Ferri, C.P. et al. Global prevalence of dementia: a Delphi consensus study. The Lancet 366,
2112–2117 (2006).
2. Alexander, G.E. Longitudinal PET evaluation of cerebral metabolic decline in dementia: a
potential outcome measure in Alzheimer’s disease treatment studies. American Journal of
Psychiatry 159, 738–745 (2002).
3. Deweer, B. et al. Memory disorders in probable Alzheimer’s disease: the role of hippocampal
atrophy as shown with MRI. British Medical Journal 58, 590 (1995).
4. Tanzi, R.E. & Bertram, L. New frontiers in Alzheimer’s disease genetics. Neuron 32, 181–184
(2001).
5. Andreasen, N. et al. Evaluation of CSF-tau and CSF-Aß42 as Diagnostic Markers for
Alzheimer Disease in Clinical Practice, Archives of Neurology, 58, pp. 373–379 (2001).
6. Cichocki, A. et al. EEG filtering based on blind source separation (BSS) for early detection of
Alzheimer’s disease. Clinical Neurophysiology 116, 729–737 (2005).
7. Buscema, M., Rossini, P., Babiloni, C. & Grossi, E. The IFAST model, a novel parallel
nonlinear EEG analysis technique, distinguishes mild cognitive impairment and Alzheimer’s
disease patients with high degree of accuracy. Artificial Intelligence in Medicine 40, 127–141
(2007).
8. Huang, C. et al. Discrimination of Alzheimer’s disease and mild cognitive impairment by
equivalent EEG sources: a cross-sectional and longitudinal study. Clinical Neurophysiology
111, 1961–1967 (2000).
9. Musha, T. et al. A new EEG method for estimating cortical neuronal impairment that
is sensitive to early stage Alzheimer’s disease. Clinical Neurophysiology 113, 1052–1058
(2002).
10. Pritchard, W.S. et al. EEG-based, neural-net predictive classification of Alzheimer’s disease
versus control subjects is augmented by non-linear EEG measures. Electroencephalography
and Clinical Neurophysiology 91, 118–30 (1994).
11. Woon, W.L., Cichocki, A., Vialatte, F. & Musha, T. Techniques for early detection of
Alzheimer’s disease using spontaneous EEG recordings. Physiological Measurement 28,
335–347 (2007).
12. Acar, E., Aykut-Bingol, C., Bingol, H., Bro, R. & Yener, B. Multiway analysis of epilepsy
tensors. Bioinformatics 23, i10–i18 (2007).
13. Acar, E., Aykut-Bingol, C., Bingol, H. & Yener, B. in Proceedings of the 24th IASTED International
Conference on Biomedical Engineering 317–322 (2006).
14. Lee, H., Kim, Y.D., Cichocki, A. & Choi, S. Nonnegative tensor factorization for continuous
EEG classification. International Journal of Neural Systems 17, 305 (2007).
15. Andersson, C.A. & Bro, R. The N-way toolbox for MATLAB. Chemometrics and Intelligent
Laboratory Systems 52, 1–4 (2000).
16. Bro, R. & Kiers, H.A.L. A new efficient method for determining the number of components
in PARAFAC models. Journal of Chemometrics 17, 274–286 (2003).
17. Coben, L.A., Danziger, W.L. & Berg, L. Frequency analysis of the resting awake EEG in mild
senile dementia of Alzheimer type. Electroencephalography and Clinical Neurophysiology
55, 372–380 (1983).
18. Sloan, E.P., Fenton, G.W., Kennedy, N.S.J. & MacLennan, J.M. Electroencephalography and
single photon emission computed tomography in dementia: a comparative study. Psychologi-
cal Medicine 25, 631 (1995).
19. Atkinson, A.C. & Riani, M. Exploratory tools for clustering multivariate data. Computational
Statistics and Data Analysis 52, 272–285 (2007).
20. Bro, R. PARAFAC. Tutorial and applications. Chemometrics and Intelligent Laboratory
Systems 38, 149–171 (1997).
21. Al Kiers, H., Ten Berge, J. & Bro, R. PARAFAC2: PART I. a direct fitting algorithm for the
PARAFAC2 model. Journal of Chemometrics 13, 275–294 (1999).
22. Kim, Y.D., Cichocki, A. & Choi, S. in IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP-2008) (IEEE, Las Vegas, Nevada, 2008).
23. Cichocki, A., Zdunek, R. & Amari, S. Nonnegative matrix and tensor factorization. Signal
Processing Magazine, IEEE 25, 142–145 (2008).
24. Cichocki, A., Zdunek, R., Plemmons, R. & Amari, S. in ICANNGA-2007, Lecture Notes in
Computer Science, 271–280 (Springer, Warsaw, Poland, 2007).
Chapter 61
Feature Selection of Gene Expression Data
Based on Fuzzy Attribute Clustering
Abstract In this chapter, a novel approach which uses a fuzzy version of k-modes is proposed for grouping interdependent genes. The fuzzy approach accounts for uncertainty, achieves more stability and consequently improves the classification accuracy. In addition, other modifications have been implemented to fuzzify the selection of the best features from each cluster. A new method to initialize cluster centers has also been applied in this work. Moreover, a novel discretization method based on the Fisher criterion is proposed.
61.1 Introduction
E. Chitsaz (B)
Computer Science & Engineering Department of Shiraz University, Shiraz, Iran
E-mail: chitsaz@cse.shirazu.ac.ir
approaches such as the typical principal component analysis (PCA) method [3] and the linear discriminant analysis (LDA) method [4], genetic algorithms [5, 6], neural networks [7], fuzzy systems [8, 9], and mutual-information-based feature selection [10, 11]). Selection of the most influential genes for the classification of samples in micro-arrays is one of the well-known topics in which many investigations have been carried out. In the context of microarrays and LDA, wrapper approaches have been proposed [12]. Recently, biclustering algorithms [13, 14] have been proposed to cluster both genes and samples simultaneously. Wai-Ho Au et al., in 2005 [15], proposed a feature selection technique based on clustering the correlated features by a semi-k-means method named k-modes.
In this paper, a novel approach which uses a fuzzy version of k-modes is proposed for grouping interdependent genes. The fuzzy approach accounts for uncertainty, achieves more stability and consequently improves the classification accuracy. In addition, other modifications have been implemented to fuzzify the selection of the best features from each cluster. A new method to initialize cluster centers has also been applied in this work. Moreover, a novel discretization method based on the Fisher criterion [4] is proposed.
The C4.5 classifier [16] has been applied in the final stage to assess the feature selection process. The Leukemia dataset [17], a micro-array dataset containing 73 samples and 7,129 genes, is used in all experiments.
In the next section, the previous work on feature selection by clustering on micro-arrays [15] is explained, followed by its fuzzy version in Section 61.3. Experimental results are presented and discussed in Section 61.4. Finally, we draw our conclusions in Section 61.5.
The attribute clustering algorithm (ACA) was applied by Wai-Ho Au et al., in 2005 [15], for the grouping, selection, and classification of gene expression data, which consist of a large number of genes (features) but a small number of samples.
This approach finds c disjoint clusters and assigns each feature to one of the clusters. The genes in each cluster should have a high correlation with each other while being weakly correlated to the genes in other clusters. The method uses the interdependence redundancy measure as the similarity measure.
To cluster the features, the k-modes algorithm is utilized, which is similar to the well-known clustering method k-means [18]. The mode of each cluster is defined as the feature which has the largest multiple interdependence redundancy measure among the features of that cluster. The multiple interdependence redundancy measure of a feature is calculated by Eq. (61.1):
calculated for each feature by Eq.(61.1).
X
MR .Ai / D R Ai W Aj (61.1)
Aj 2 C luster.i /;
j ¤i
where Cluster(i) is the set of features which are in the same cluster as A_i, and R(A_i : A_j) is the interdependence redundancy measure between the two features A_i and A_j, which is defined by Eq. (61.2):

R(A_i : A_j) = \frac{I(A_i : A_j)}{H(A_i : A_j)} \qquad (61.2)

where I(A_i : A_j) is the mutual information between A_i and A_j, as computed in Eq. (61.3):

I(A_i : A_j) = \sum_{k=1}^{m_i} \sum_{l=1}^{m_j} \Pr(A_i = v_{ik} \wedge A_j = v_{jl}) \log \frac{\Pr(A_i = v_{ik} \wedge A_j = v_{jl})}{\Pr(A_i = v_{ik}) \Pr(A_j = v_{jl})} \qquad (61.3)
H(A_i : A_j) is the joint entropy, used to normalize I(A_i : A_j) in Eq. (61.2). The larger the value of I(A_i : A_j), the higher the interdependency of the two features A_i and A_j. In that case there are some pairs of values of these features which occur together with high frequency, while other pairs are less probable. Therefore, given the value of one feature, the value of the other may be approximated by considering the value pairs with high probability.
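The measure of Eqs. (61.2) and (61.3) is straightforward to compute from two discretized feature vectors; a small sketch (with illustrative integer-coded values) is given below.

```python
import numpy as np

def interdependence_redundancy(ai, aj):
    """R(Ai : Aj) = I(Ai : Aj) / H(Ai, Aj) for two discretized feature vectors
    (integer-coded interval labels), following Eqs. (61.2) and (61.3)."""
    ai, aj = np.asarray(ai), np.asarray(aj)
    n = len(ai)
    joint = {}
    for a, b in zip(ai, aj):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    p_joint = {k: v / n for k, v in joint.items()}
    p_i = {v: np.mean(ai == v) for v in set(ai)}
    p_j = {v: np.mean(aj == v) for v in set(aj)}

    I = sum(p * np.log2(p / (p_i[a] * p_j[b])) for (a, b), p in p_joint.items())  # mutual information
    H = -sum(p * np.log2(p) for p in p_joint.values())                            # joint entropy
    return I / H if H > 0 else 0.0

# Example on two small discretized features (perfectly interdependent -> R = 1)
print(interdependence_redundancy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```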
K-modes differs from k-means in two respects. First, the mode of each cluster is selected as the cluster center instead of the mean. Second, the Euclidean distance used as a dissimilarity measure is replaced by the interdependency between attributes, used as a similarity measure.
In the ACA method, genes are grouped into different clusters. A cluster is a set of features which are more correlated with each other than with the features in other clusters. Hence, if there is a missing value for one of these features, it may be approximated by considering the other co-clustered features one by one. Therefore, a few features in each cluster may be sufficient to represent the properties of the samples, but the selected features should be correlated overall with the other features in the same cluster. This is the motivation for selecting the features with the highest multiple interdependence redundancy measure, as defined by Eq. (61.1), to represent the cluster.
The interdependency measure is defined only for discrete data types. Therefore, to determine the interdependence measure between two features, the range of every continuous feature should first be discretized into a finite number of intervals. This is done by the Optimal Class-Dependent Discretization algorithm (OCDD).
A novel method is proposed which combines the effectiveness of ACA with the fuzzy k-means algorithm [19]. In this method each feature is assigned to different clusters with different degrees. This comes from the idea that each gene may not belong to just one cluster, and that it is much better to consider the correlation of each gene with the features of all clusters. Hence, during the selection of the best features, more accurate relations between genes are available. The main point here is that each feature is considered with respect to all clusters, not just one. Hence, a feature which is not sufficiently correlated with the members of one cluster, but whose correlation across all clusters is high, gains more chance of being selected than in crisp ACA. In this way, better features may be selected for the classification stage than with the crisp method. The stability of the fuzzy clustering is also much higher than that of k-modes, according to the experimental results. In addition, the Fuzzy Attribute Clustering Algorithm (FACA) converges smoothly, whereas k-modes in ACA often oscillates between final states.
In the proposed method, a matrix U_{k \times m} represents the membership degree of each gene in each cluster, where k is the number of clusters, which is fixed and determined at the start, and m is the number of features. The matrix U is computed by Eq. (61.5):

u_{ri} = \frac{1}{\sum_{c=1}^{k} \left( \frac{R(A_i, \mu_c)}{R(A_i, \mu_r)} \right)^{2/(m-1)}} \qquad (61.5)

where k is the number of clusters, u_{ri} is the membership degree of the ith feature in the rth cluster, μ_r is the mode of the rth cluster and m is a weighting exponent. Afterwards, u_{ri} is normalized by Eq. (61.6):

u_{ri}^{new} = \frac{u_{ri}^{old}}{\sum_{l=1}^{k} u_{li}^{old}} \qquad (61.6)
The fuzzy multiple interdependence redundancy measure of a feature A_i in cluster r is then defined by Eq. (61.7):

MR_r(A_i) = \sum_{j=1,\ j \neq i}^{p} u_{rj}\, R(A_i : A_j) \qquad (61.7)
where p is the total number of features and r is the index of the cluster in which the multiple interdependence redundancy measure of feature A_i is calculated. Hence, in calculating MR_r(A_i), all features are taken into account.
Indeed, the fuzzy multiple interdependence redundancy measure has to be computed for each cluster separately, since each feature does not belong to just one cluster. In this approach, the mode of a cluster is updated to the feature with the highest fuzzy multiple interdependence redundancy in that cluster. The fuzzy clustering seeks to maximize the objective function

J = \sum_{r=1}^{k} \sum_{i=1}^{p} u_{ri}^{m}\, R(A_i : \mu_r) \qquad (61.8)
where k and p are the number of clusters and features, respectively, and μ_r is the mode of the rth cluster, which represents the center of that cluster.
Selection of the best features is based on the rank of each feature, which is calculated as in Eq. (61.9):

\mathrm{rank}(A_i) = \sum_{r=1}^{k} u_{ri}^{m}\, MR_r(A_i) \qquad (61.9)
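Under the reconstruction of Eqs. (61.5)–(61.9) given above (Eq. (61.7) in particular is reconstructed, so its exact weighting is an assumption), one FACA update can be sketched as follows; R_feat and R_to_modes are hypothetical precomputed interdependence matrices.

```python
import numpy as np

def faca_step(R_feat, R_to_modes, m=2.0):
    """One membership update of the fuzzy attribute clustering, following the
    reconstructed Eqs. (61.5)-(61.9). R_feat: (p, p) pairwise interdependence matrix,
    R_to_modes: (k, p) interdependence of every feature with the current cluster modes."""
    k, p = R_to_modes.shape
    # Eq. (61.5): membership of feature i in cluster r from similarities to the modes
    ratio = (R_to_modes[None, :, :] / R_to_modes[:, None, :]) ** (2.0 / (m - 1.0))  # (k, k, p)
    u = 1.0 / ratio.sum(axis=1)                                                     # (k, p)
    u = u / u.sum(axis=0, keepdims=True)          # Eq. (61.6): normalize over clusters
    # Eq. (61.7): fuzzy multiple interdependence redundancy of each feature in each cluster
    MR = u @ R_feat - u * np.diag(R_feat)         # sum_j u_rj * R(Ai:Aj), excluding j == i
    # Eq. (61.8): objective to be maximized; Eq. (61.9): feature ranks
    J = ((u ** m) * R_to_modes).sum()
    rank = ((u ** m) * MR).sum(axis=0)
    return u, MR, J, rank
```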
The final stage is the classification of the data points according to the selected features. The following points encouraged us to use C4.5 as the final classifier for the assessment of the proposed feature selection:
– Both C4.5 and the proposed feature selection method can be applied only to discrete features.
– C4.5 is a well-known decision tree technique with high performance which is used as a benchmark in many state-of-the-art articles.
– To determine the priority of features, it uses information gain and entropy, which are similar to the mutual information (I) and the joint entropy (H), respectively.
The flowchart of the proposed algorithm is depicted in Fig. 61.1. The stop condition is satisfied when the value of the cost function defined in Eq. (61.8) is the same as in the previous iteration, up to a predefined precision. A maximum number of iterations has also been defined.
The initialization of the cluster centers (modes) is the only random element in ACA. The uncertainty concept in FACA, which takes the membership degree of each feature in each cluster into consideration, leads to more stability in the final clustering result. Nevertheless, a simple initialization procedure is proposed here which seems to improve the final results, as explained in the next section.
A method inspired by the initialization technique proposed in [20] and [21] is utilized. All patterns are considered as a single cluster and the feature with the highest multiple interdependence redundancy measure is chosen as the first initial cluster center (μ_1). The other centers are selected from the features by Eq. (61.10):

\mu_r = \arg\min_{A_i \notin S_{r-1}} \left[ \sum_{A_j \in S_{r-1}} R(A_i : A_j) \right], \quad 2 \leq r \leq k \qquad (61.10)
where μ_r is the center of the rth cluster, A_i is the ith feature and k is the desired number of clusters. S_r represents the set {μ_1, μ_2, ..., μ_r}. Indeed, after selecting the first cluster center, the others are selected from the remaining features one by one. At each step, the feature which is least interdependent with the cluster centers selected so far becomes the new cluster center. This improves the final objective function, as explained in the next section.
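A sketch of this initialization, assuming a precomputed pairwise interdependence matrix, is given below.

```python
import numpy as np

def init_modes(R_feat, k):
    """Initial cluster modes following Eq. (61.10): the first mode is the feature with the
    highest multiple interdependence redundancy; each further mode is the feature least
    interdependent with the modes chosen so far. R_feat: (p, p) interdependence matrix."""
    p = R_feat.shape[0]
    MR = R_feat.sum(axis=1) - np.diag(R_feat)     # multiple interdependence redundancy (one cluster)
    modes = [int(np.argmax(MR))]
    while len(modes) < k:
        remaining = [i for i in range(p) if i not in modes]
        # total interdependence of each remaining feature with the already chosen modes
        score = R_feat[np.ix_(remaining, modes)].sum(axis=1)
        modes.append(remaining[int(np.argmin(score))])
    return modes

# Example on a random symmetric interdependence matrix
rng = np.random.default_rng(1)
M = rng.random((10, 10)); M = (M + M.T) / 2
print(init_modes(M, k=4))
```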
In this paper, the Leukemia dataset [17], a well-known biological dataset, is used. It is a gene expression micro-array dataset with 73 samples and 7,129 genes (features). Due to memory constraints of the dynamic implementation, only the first 1,000 features are considered.
61.4.1 Stability
As mentioned in the last section, FACA appears to be more stable than ACA. The curve of the objective function of Eq. (61.8) (achieved in the final stage of clustering) versus the number
Fig. 61.2 Objective function vs. the number of clusters. Best, worst, average curves in 20 runs for
random initialization and associated curve with proposed initialization technique. (a) Crisp. (b)
Fuzzy
of clusters is depicted in Figs. 61.2a and b for the crisp and fuzzy approaches, respectively. In these figures, for a fixed number of clusters and with the initial cluster modes specified randomly, 20 separate clustering runs have been plotted. The best, worst and average values over these runs are depicted as separate curves. The smaller difference between the best and worst values for the fuzzy approach, in comparison with the crisp version, indicates less dependency of the clustering on the initial values and consequently more stability. Moreover, the curve obtained with the proposed initialization technique shows clearly better results than the average values for both the crisp and fuzzy approaches, as shown in Fig. 61.2.
The smaller slope of these curves for the fuzzy approach also shows that the number of clusters does not influence the objective function of the fuzzy approach as much as that of the crisp version. Although the objective function should be maximized, a higher objective value in the crisp version is not evidence of better performance in comparison with the fuzzy version. Indeed, the objective function in the crisp version is a summation of interdependence measure values between highly correlated features, whereas in the fuzzy version high interdependencies are reduced by membership degrees smaller than one, and this reduction is partially compensated by the low interdependence values and low membership degrees of other feature pairs.
With a predefined number of clusters, the objective function is expected to change after each iteration of fuzzy k-modes. As shown in Fig. 61.3, the objective function oscillates in the crisp version, whereas the fuzzy objective function converges smoothly. This smoothness speeds up and guarantees convergence.
Fig. 61.3 Objective function vs. iterations, for 4, 5, 6 and 7 clusters. (a) Crisp. (b) Fuzzy
Fig. 61.4 Classification rate obtained by C4.5 on the Leukemia dataset for two features per cluster
As another experiment, the C4.5 decision tree has been used to classify the Leukemia dataset based on the selected features. The resulting classification rate can be regarded as a measure for assessing the feature selection method. Hence, ACA and fuzzy ACA are compared by the resulting classification rate for different numbers of clusters, as depicted in Figs. 61.4–61.6 for 2, 3 and 4 features per cluster, respectively.
Wai-Ho Au et al. [15] proposed the selection of a predefined number of features from each cluster. In the fuzzy approach, however, every feature belongs to all clusters. Therefore, before any feature selection based on the ranks, each feature is first assigned to the cluster with the maximum membership degree. The leave-one-out method has been used here to assess the generalization of the classification system. Increasing the number of features per cluster improves the classification accuracy for both the crisp and fuzzy approaches.
Fig. 61.5 Classification rate obtained by C4.5 on the Leukemia dataset for three features per cluster
Fig. 61.6 Classification rate obtained by C4.5 on the Leukemia dataset for four features per cluster
61.5 Conclusion
In this paper, a fuzzy approach is suggested to group features into clusters in order to select the best ones for the classification stage. It proposes a new clustering method which combines the k-modes clustering used in ACA with the fuzzy k-means clustering algorithm. Hence, it leads to more stability, faster convergence and a higher classification rate in comparison with the previous method.
Both ACA and FACA are based on the interdependence redundancy measure, which is applicable only to discrete data types. Hence, a new method for the discretization of continuous data has also been proposed. Moreover, the initialization of the cluster centers is not random; a new initialization method has been applied in order to increase the probability of forming better clusters.
Further work will extend the suggested discretization method and improve the similarity measure. Discovering the genes that influence diseases, based on the selected genes, is also worth investigating.
References
1. I. Guyon and A. Elisseeff, ‘An introduction to variable and feature selection’, Journal of
Machine Learning Research, vol. 3, pp. 1157–1182, 2003.
2. R. Caruana and D. Fratage, ‘Greedy attribute selection’, in Machine Learning: Proceedings of
11th International Conference, San Francisco, CA, pp. 283–288, 1994.
3. T. Joliffe, ‘Principal Component Analysis’, New York: Springer-Verlag, 1986.
4. K. Fukunaga, ‘Introduction to Statistical Pattern Recognition’, New York: Academic, 1972.
5. F. Z. Bril, D. E. Brown, and N. W. Worthy, ‘Fast genetic selection of features for neural
network classifiers’, IEEE Transactions on Neural Networks, vol. 3, pp. 324–328, March 1992.
6. M. L. Raymer, W. F. Punch, E. D. Goodman, L. A. Kuhn, and L. C. Jain, ‘Dimensionality
reduction using genetic algorithms’, IEEE Transactions on Evolutionary Computation, vol. 4,
no. 2, pp. 164–171, 2000.
7. R. Setiono and H. Liu, ‘Neural-network feature selector’, IEEE Transactions on Neural
Networks, vol. 8, pp. 654–662, May 1997.
8. E. C. C. Tsang, D. S. Yeung, and X. Z. Wang, ‘OFFSS: Optimal Fuzzy-Valued Feature Subset
Selection’, IEEE Transactions on Fuzzy Systems, vol. 11, no. 2, pp. 202–213, 2003.
9. M. R. Rezaee, B. Goedhart, B. P. F. Lelieveldt, and J. H. C. Reiber, ‘Fuzzy feature selection’.
PR (32), no. 12, pp. 2011–2019, December 1999.
10. R. Battiti, ‘Using mutual information for selecting features in supervised neural net learning’,
IEEE Transactions on Neural Networks, vol. 5, pp. 537–550, July 1994.
11. N. Kwak and C.-H. Choi, ‘Input feature selection for classification problems’, IEEE Transac-
tions on Neural Networks, vol. 13, pp. 143–159, January 2002.
12. M. Xiong, W. Li, J. Zhao, L. Jin, and E. Boerwinkle, ‘Feature (gene) selection in gene
expression-based tumor classification’, Molecular Genetics and Metabolism, vol. 73, no. 3,
pp. 239–247, 2001.
13. S. C. Madeira and A. L. Oliveira, ‘Biclustering algorithms for biological data analysis: A
survey’, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 1, no.
1, pp. 24–45, January-March 2004.
14. Y. Cheng and G. M. Church, ‘Biclustering of expression data’, Proceedings of Eighth
International Conference Intelligent Systems for Molecular Biology, pp. 93–103, 2000.
15. W.-H. Au, K. C. C. Chan, A. K. C. Wong, Y. Wang, ‘Attribute clustering for grouping, selec-
tion, and classification of gene expression data’, IEEE/ACM Transactions on Computational
Biology and Bioinformatics (TCBB), vol. 2, no. 2, pp. 83–101, 2005.
16. J. R. Quinlan, ‘C4.5: Programs for Machine Learning’, Morgan Kaufmann, San Francisco,
CA, 1993.
17. T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller,
M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander, ‘Molecular
classification of cancer: class discovery and class prediction by gene expression monitoring’,
vol. 286, no. 5439, pp. 531–537, 1999.
18. T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu, ‘A local search
approximation algorithm for k-means clustering’, Computational Geometry, vol. 28, pp. 89–
112, 2004.
19. J. C. Bezdek, ‘Fuzzy mathematics in pattern classification’, Ph.D. thesis, Applied Mathematics
Center, Cornell University, Ithaca, 1973.
20. M. L. Micó, J. Oncina, and E. Vidal, ‘A new version of the nearest-neighbour approximat-
ing and eliminating search algorithm (AESA) with linear preprocessing time and memory
requirements’, Pattern Recognition, vol. 15, pp. 9–17, 1994.
21. M. Taheri and R. Boostani, ‘Novel auxiliary techniques in clustering’, International Confer-
ence on computer science and engineering, 2007.