978-3-642-05253-8
Artificial Intelligence
and Computational
Intelligence
International Conference, AICI 2009
Shanghai, China, November 7-8, 2009
Proceedings
Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany
Volume Editors
Hepu Deng
RMIT University, School of Business Information Technology
City Campus, 124 La Trobe Street, Melbourne, Victoria 3000, Australia
E-mail: hepu.deng@rmit.edu.au
Lanzhou Wang
China Jiliang University, College of Metrological Technology and Engineering
Hangzhou 310018, Zhejiang, China
E-mail: lzwang@cjlu.edu.cn
Fu Lee Wang
City University of Hong Kong, Department of Computer Science
83 Tat Chee Avenue, Kowloon Tong, Hong Kong, China
E-mail: flwang@cityu.edu.hk
Jingsheng Lei
Hainan University, College of Information Science and Technology
Haikou 570228, China
E-mail: jshlei@hainu.edu.cn
ISSN 0302-9743
ISBN-10 3-642-05252-5 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-05252-1 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12781256 06/3180 543210
Preface
Organizing Committee
General Co-chairs
Program Committee Co-chairs
Local Arrangements Chair
Proceedings Co-chairs
Publicity Chair
Sponsorship Chair
Program Committee
Ahmad Abareshi RMIT University, Australia
Stephen Burgess Victoria University, Australia
Jennie Carroll RMIT University, Australia
Eng Chew University of Technology Sydney, Australia
Vanessa Cooper RMIT University, Australia
Minxia Luo China Jiliang University, China
Tayyab Maqsood RMIT University, Australia
Ravi Mayasandra RMIT University, Australia
Elspeth McKay RMIT University, Australia
Alemayehu Molla RMIT University, Australia
Konrad Peszynski RMIT University, Australia
Siddhi Pittayachawan RMIT University, Australia
Ian Sadler Victoria University, Australia
Pradip Sarkar RMIT University, Australia
Carmine Sellitto Victoria University, Australia
Peter Shackleton Victoria University, Australia
Sitalakshmi Venkatraman University of Ballarat, Australia
Leslie Young RMIT University, Australia
Adil Bagirov University of Ballarat, Australia
Philip Branch Swinburne University of Technology, Australia
Feilong Cao China Jiliang University, China
Maple Carsten University of Bedfordshire, UK
Caroline Chan Deakin University, Australia
Jinjun Chen Swinburne University of Technology, Australia
Richard Dazeley University of Ballarat, Australia
Yi-Hua Fan Chung Yuan Christian University Taiwan, Taiwan
Richter Hendrik HTWK Leipzig, Germany
Furutani Hiroshi University of Miyazaki, Japan
Bae Hyeon Pusan National University, Korea
Tae-Ryong Jeon Pusan National University, Korea
Sungshin Kim Pusan National University, Korea
Wei Lai Swinburne University of Technology, Australia
Edmonds Lau Swinburne University of Technology, Australia
Qiang Li University of Calgary, Canada
Xiaodong Li RMIT University, Australia
Kuoming Lin Kainan University, Taiwan
YangCheng Lin National Dong Hwa University, Taiwan
An-Feng Liu Central South University, China
Liping Ma University of Ballarat, Australia
Costa Marly Federal University of the Amazon, Brazil
Jamie Mustard Deakin University, Australia
Syed Nasirin Brunel University, UK
Lemai Nguyen Deakin University, Australia
Heping Pan University of Ballarat, Australia
Reviewers
Adil Bagirov Chen Lifei Du Junping
Ahmad Abareshi Chen Xiang Du Xufeng
Bai ShiZhong Chen Ling Duan Yong
Bi Shuoben Chen Dongming Duan Rongxing
Bo Rui-Feng Chen Ming Fan Jiancong
Cai Weihua Chen Ting Fan Xikun
Cai Zhihua Chen Yuquan Fan Shurui
Cai Kun Chen Ailing Fang Zhimin
Cao Yang Chen Haizhen Fang Gu
Cao Qingnian Chu Fenghong Fuzhen Huang
Cao Guang-Yi Chung-Hsing Yeh Gan Zhaohui
Cao Jianrong Congping Chen Gan Rongwei
Carmine Sellitto Cui Shigang Gao chunrong
Chai Zhonglin Cui Mingyi Gao Xiaoqiang
Chang Chunguang DeJun Mu Gao Jiaquan
Chen Yong Liang Deng Liguo Gao Jun
Chen Haiming Deng-ao Li Gao Boyong
Chen Yong Ding Yi-jie Gu Xuejing
Chen Pengnian Dingjun Chen Gu Tao
Neural Computation
A Uniform Solution to HPP in Terms of Membrane Computing . . . . . . . . 59
Haizhu Chen and Zhongshi He
Information Security
Research of Trust Chain of Operating System . . . . . . . . . . . . . . . . . . . . . . . 96
Hongjiao Li and Xiuxia Tian
A Novel Application for Text Watermarking in Digital Reading . . . . . . . . 103
Jin Zhang, Qing-cheng Li, Cong Wang, and Ji Fang
Immune Computation
Optimization of Real-Valued Self Set for Anomaly Detection Using
Gaussian Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Liang Xi, Fengbin Zhang, and Dawei Wang
Genetic Algorithms
A GP Process Mining Approach from a Structural Perspective . . . . . . . . . 121
Anhua Wang, Weidong Zhao, Chongchen Chen, and Haifeng Wu
Effects of Diversity on Optimality in GA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Glen MacDonald and Gu Fang
Dynamic Crossover and Mutation Genetic Algorithm Based on
Expansion Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Min Dong and Yan Wu
Multidisciplinary Optimization of Airborne Radome Using Genetic
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Xinggang Tang, Weihong Zhang, and Jihong Zhu
Global Similarity and Local Variance in Human Gene Coexpression
Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Ivan Krivosheev, Lei Du, Hongzhi Wang, Shaojun Zhang,
Yadong Wang, and Xia Li
A Grid Based Cooperative Co-evolutionary Multi-Objective
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Sepehr Meshkinfam Fard, Ali Hamzeh, and Koorush Ziarati
Fuzzy Computation
Fuzzy Modeling for Analysis of Load Curve in Power System . . . . . . . . . . 176
Pei-Hwa Huang, Ta-Hsiu Tseng, Chien-Heng Liu, and
Guang-Zhong Fan
Biological Computing
Honey Bee Mating Optimization Vector Quantization Scheme in Image
Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Ming-Huwi Horng
Robotics
Robot Virtual Assembly Based on Collision Detection in Java3D . . . . . . . 270
Peihua Chen, Qixin Cao, Charles Lo, Zhen Zhang, and Yang Yang
Pattern Recognition
A Novel Character Recognition Algorithm Based on Hidden Markov
Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
Yu Wang, Xueye Wei, Lei Han, and Xiaojin Wu
New Algorithms for Complex Fiber Image Recognition . . . . . . . . . . . . . . . 306
Yan Ma and Shun-bao Li
Laplacian Discriminant Projection Based on Affinity Propagation . . . . . . 313
Xueping Chang and Zhonglong Zheng
An Improved Fast ICA Algorithm for IR Objects Recognition . . . . . . . . . 322
Jin Liu and Hong Bing Ji
Facial Feature Extraction Based on Wavelet Transform . . . . . . . . . . . . . . . 330
Nguyen Viet Hung
Fast Iris Segmentation by Rotation Average Analysis of
Intensity-Inversed Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
Wei Li and Lin-Hua Jiang
Neural Networks
A New Criterion for Global Asymptotic Stability of Multi-delayed
Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
Kaiyu Liu and Hongqiang Zhang
An Improved Approach Combining Random PSO with BP for
Feedforward Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
Yu Cui, Shi-Guang Ju, Fei Han, and Tong-Yue Gu
Fuzzy Multiresolution Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
Li Ying, Shang Qigang, and Lei Na
Research on Nonlinear Time Series Forecasting of Time-Delay NN
Embedded with Bayesian Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
Weijin Jiang, Yusheng Xu, Yuhui Xu, and Jianmin Wang
An Adaptive Learning Algorithm for Supervised Neural Network with
Contour Preserving Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
Piyabute Fuangkhon and Thitipong Tanprasert
Machine Vision
Object Recognition Based on Efficient Sub-window Search . . . . . . . . . . . . 435
Qing Nie, Shouyi Zhan, and Weiming Li
Machine Learning
A Multi-Scale Algorithm for Graffito Advertisement Detection from
Images of Real Estate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
Jun Yang and Shi-jiao Zhu
Intelligent Scheduling
A Controlled Scheduling Algorithm Decreasing the Incidence of
Starvation in Grid Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
Minna Liu, Kairi Ou, Yuelong Zhao, and Tian Sun
Others
Formalizing the Modeling Process of Physical Systems in MBD . . . . . . . . 685
Nan Wang, Dantong OuYang, Shanwu Sun, and Chengli Zhao
Erratum
Implementation of On/Off Controller for Automation of
Greenhouse Using LabVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E1
R. Alimardani, P. Javadikia, A. Tabatabaeefar, M. Omid, and
M. Fathi
An Experimental Research on Vector Control of Induction Motor

Abstract. Given the heavy computation, easy saturation and accumulated errors of conventional direct vector control, a vector control scheme for the induction motor based on a simple model is studied, and the detailed scheme is described on the basis of decomposing and approximating the rotor flux. Because the magnetizing current and the torque current are directly closed-loop controlled and the complex current regulation is completed by PI regulators, the direct vector control of the induction motor is simplified. The experimental results show that the proposed method is effective in decreasing dynamic disturbances and has the advantages of a small code program, rare saturation and few oscillations.
1 Introduction
Vector control can give the induction motor excellent dynamic performance similar to that of a DC motor. Vector control based on rotor field orientation currently receives much attention because it easily realizes complete decoupling between the magnetizing current and the torque current [1-2]. Direct vector control and indirect vector control are the two schemes most commonly used in induction motor vector control systems. The former has a speed closed-loop control that includes torque closed-loop and flux closed-loop control, whereas the latter is an open-loop flux control system. Direct vector control involves a large number of operations such as differentiation and multiplication. Furthermore, the complex current regulation process in the voltage-type inverter brings heavy computation [3-5], easy saturation, accumulated errors and other uncertain disturbances, resulting in deterioration of the system performance.
Therefore, in order to overcome the above-mentioned disadvantages of direct vector control while maintaining its performance advantages, a vector control scheme for the induction motor based on a simple model is studied and its detailed scheme is described in this paper. After decomposing and approximating the rotor flux, the magnetizing current and the torque current are directly closed-loop controlled, and the complex current regulation in the voltage-type inverter is completed by PI regulators to achieve regulation without static error. Thus, the proposed method simplifies the direct vector control of the induction motor and makes the direct vector control system simple and clear, with a smaller code program, rare saturation and few oscillations.
The following are the main formulas used in the conventional direct vector control system of the induction motor.
The rotor flux is expressed as follows:

\[ \psi_r = \frac{L_m}{T_r p + 1}\, i_{sm} \qquad (1) \]
The feedback torque is calculated as

\[ T_e = \frac{p_n L_m}{L_r}\, i_{st}\, \psi_r \qquad (2) \]
The angular speed of synchronous rotation is given by the following formula:

\[ \omega_s = \omega_r + \omega_{sl} = \omega_r + \frac{L_m\, i_{st}}{T_r\, \psi_r} \qquad (3) \]
In most applications the voltage-type inverter is widely used, so it is necessary to convert the current commands into voltage commands. This process is called current regulation, and the equations of the conversion are
\[
\begin{cases}
u_{sm} = R_s\left(1 + \dfrac{L_s\, p\,(\sigma T_r p + 1)}{R_s\,(T_r p + 1)}\right) i_{sm} - \sigma L_s\left(\omega_r + \dfrac{i_{st}}{T_r\, i_{sm}}\right) i_{st} \\[3mm]
u_{st} = \left[ R_s\left(\dfrac{\sigma L_s}{R_s}\, p + 1\right) + \dfrac{L_s}{T_r}\,(\sigma T_r p + 1) \right] i_{st} + \omega_r L_s\, \dfrac{\sigma T_r p + 1}{T_r p + 1}\, i_{sm}
\end{cases}
\qquad (4)
\]
The parameters of equations (1) to (4) are described as follows, where

\[ \sigma = 1 - \frac{L_m^2}{L_s L_r} \]
Lm — Mutual inductance
Lr — Rotor inductance
Rs — Stator resistance
Ls — Stator inductance
Tr — Rotor time constant
p — Differential operator
ωr — Rotor angular speed
ωsl — Slip angular speed
pn — The number of pole pairs
ism — M-axis stator current
ist — T-axis stator current
usm — M-axis stator voltage
ust — T-axis stator voltage
From equations (1) to (3) we can see that the estimation accuracy of equations (2) and (3) depends on equation (1), which shows the importance of an accurate estimation of the rotor flux. Because of the differential part, a transient drift appears in the DC component following transient changes in the digital operation when a rectangular discrete integrator is used, which makes the estimated rotor flux inaccurate and then reduces the calculation accuracy of the torque and the synchronous speed. Furthermore, the large number of differentiation and multiplication operations in equations (4) brings heavy computation, easy saturation and accumulated errors. In the low-speed region in particular, the torque is very prone to oscillate.
Fig. 2. The vector control system of induction motor based on simple model
Fig. 2 shows the vector control system of the induction motor based on the simple model, in contrast to the conventional direct vector control. The system includes the speed control subsystem and the torque current control subsystem, and the inner loop of the speed control subsystem is the torque current closed-loop control subsystem. The system is composed of three PI regulators: the speed regulator, the torque current regulator and the magnetizing current regulator.
The derivation of the main formulas used in the vector control system based on the simple model is as follows; the formulas are discretized to fit DSP computing. Equation (1) can be rewritten as

\[ T_r\, p\, \psi_r + \psi_r = L_m\, i_{sm} \qquad (5) \]
The rotor flux equation in the MT reference frame is

\[ \psi_r = L_m\, i_{sm} + L_r\, i_{rm} \qquad (6) \]
L_{σr} is the rotor leakage inductance, which is small compared with the mutual inductance L_m, so L_m and L_r can be considered approximately equal in value according to L_r = L_m + L_{σr}. Furthermore, in the rotor-field-oriented control system the rotor flux must be kept constant below the rated frequency in order to achieve an ideal speed-adjusting performance, while above the rated frequency flux-weakening control is adopted, usually through a look-up table. L_m i_{sm} is regarded as constant and absorbed into L_r i_{rm}, so equation (6) can be simplified as

\[ \psi_r = L_m\, i_{rm} \qquad (7) \]
In equation (7), i_{rm} is the M-axis component of the rotor current. Applying equation (7) to (5), we obtain

\[ T_r\, \frac{d i_{rm}}{dt} + i_{rm} = i_{sm} \qquad (8) \]

After discretizing the derivative, equation (8) can be rewritten as

\[ i_{rm}(k+1) = i_{rm}(k) + \frac{T_s}{T_r}\bigl(i_{sm}(k) - i_{rm}(k)\bigr) \qquad (9) \]
where T_s is the switching period. In the same way, equation (3) can be modified as

\[ \omega_s(k+1) = \omega_r(k+1) + \frac{1}{T_r}\, \frac{i_{st}(k)}{i_{rm}(k+1)} \qquad (10) \]
Equations (9) and (10) are derived from the decomposition and approximation of the rotor flux. They are the crucial observation equations of the system shown in Fig. 2 and are used to calculate the angular speed of synchronous rotation. Compared with the conventional direct vector control system, the simplified system does not compute the synchronous speed through the rotor flux, but obtains it directly through the rotor magnetizing current. The heavy computational processes are thus removed, so saturation drift and cumulative errors are avoided, which lays a good foundation for a real-time high-performance vector control system.
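To make the discrete observer concrete, the following Python sketch shows how equations (9) and (10) could be executed once per switching period T_s; the function name, the example values and the flux-angle integration are assumptions made for illustration, not the authors' DSP implementation.

```python
import math

def update_observer(i_rm, i_sm, i_st, w_r, T_s, T_r):
    """One step of the simplified observer: eq. (9) updates the rotor
    magnetizing current, eq. (10) adds the slip speed to the rotor speed."""
    i_rm_next = i_rm + (T_s / T_r) * (i_sm - i_rm)        # eq. (9)
    w_s_next = w_r + (1.0 / T_r) * (i_st / i_rm_next)     # eq. (10)
    return i_rm_next, w_s_next

# Illustrative values; T_r = L_r / R_r uses the motor data given in Sect. 4.
T_s, T_r = 62.5e-6, 0.165 / 5.486
i_rm, theta_s = 1.0, 0.0
i_rm, w_s = update_observer(i_rm, i_sm=1.0, i_st=0.5, w_r=100.0, T_s=T_s, T_r=T_r)
theta_s = (theta_s + w_s * T_s) % (2.0 * math.pi)         # flux angle for the Park transform
```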
It can be seen from Fig. 2 that, compared with the conventional vector control, the simplified vector control system directly closes the loops on the magnetizing current and the torque current rather than on the torque and the flux. The calculation of the torque feedback is therefore avoided, and the whole system can regulate the torque and the flux more quickly and efficiently. In addition, the complex current regulation is handled by PI regulators to achieve regulation without static error. This makes the system simple, clear and computationally light, and more suitable for real-time high-performance control.
4 Experiment Analysis
The hardware and software environment of the experiment is as follows. The power inverter is built with IGBTs (IRGB15B60KD) produced by IR. The control part adopts the TI 2407 DSP EVM by Wintech. The current sensor is an LA28-NP produced by LEM. The resolution of the speed encoder is 1000 P/R, and the digital oscilloscope is a Tektronix TDS2002. The switching frequency is 16 kHz, so the switching period is Ts = 62.5 µs. PI parameters: in the current loop, P = 1.12, I = 0.0629; in the speed loop, P = 4.89, I = 0.0131.

The parameters of the induction motor are: rated power Pn = 500 W, rated current In = 2.9 A, rated voltage Vn = 220 V, rotor resistance Rr = 5.486 Ω, stator resistance Rs = 4.516 Ω, mutual inductance Lm = 152 mH, Lσr = 13 mH, Lr = 165 mH, pn = 2, Ls = 168 mH.
Fig. 3 gives the trajectory of the total flux circle at an operation frequency of f = 5 Hz in the vector control system based on the simple model. From the experimental waveform we can see that the trajectory of the total flux circle is nearly round, so the system runs well at low speed.
Fig. 3. The total flux circle with the operation frequency f =5Hz
(a) rotor speed and phase current    (b) rotor speed and torque current
Fig. 4. The start-up of induction motor with no-load
Fig. 5. (a) rotor speed and torque current; (b) magnetizing current and torque current
Fig. 5 shows the experimental waveforms under a random load when the motor runs in the steady state at f = 10 Hz. Fig. 5a gives the waveforms of the rotor speed and the torque current: the maximum dynamic speed dip is about 16% and the recovery time is shorter than the duration of the disturbance. Fig. 5b shows the waveforms of the magnetizing current and the torque current: the magnetizing current ism remains almost unchanged and is well decoupled from the torque current. The vector control of the induction motor based on the simple model therefore has good anti-disturbance performance.
5 Conclusion
In this paper, a vector control scheme for the induction motor based on a simple model is studied, and the detailed scheme is described through decomposing and approximating the rotor flux. The proposed method makes the system simple, with a smaller code program and a shorter execution time. The experimental results show that the anti-disturbance performance of the simplified vector control system is reliable and efficient, and that it is suitable for real-time high-performance control systems.
References
1. Noguchi, T., Kondo, S., Takahashi, I.: Field-oriented control of an induction motor with robust on-line tuning of its parameters. IEEE Trans. Industry Applications 33(1), 35–42 (1997)
2. Telford, D., Dunnigan, M.W.: Online identification of induction machine electrical parame-
ters for vector control loop tuning. IEEE Trans. Industrial Electronics 50(2), 253–261 (2003)
3. Zhang, Y., Chen, J., Bao, J.: Implementation of closed-loop vector control system of induc-
tion motor without voltage feedforward decoupler. In: IEEE 21st Annual Applied Power
Electronics Conference and Exposition, APEC, pp. 1631–1633 (2006)
4. Lorenz, R.D., Lawson, D.B.: Performance of feedforward current regulators for field-oriented induction machine controllers. IEEE Trans. Industry Applications 23(4), 597–602 (1987)
5. Del Blanco, F.B., Degner, M.W., Lorenz, R.D.: Dynamic analysis of current regulators for
AC motors using complex vectors. IEEE Trans. Industry Applications 35(6), 1424–1432
(1999)
An United Extended Rough Set Model Based on
Developed Set Pair Analysis Method
Abstract. Different from the traditional set pair analysis method, a new method to micro-decompose the discrepancy degree is proposed according to the actual distribution of missing attributes. Integrating it with rough set theory, we advance the united set pair tolerance relation and give the corresponding extended rough set model. Different values of the identity-degree and discrepancy-degree thresholds modulate the performance of this model and extend its application range; some existing extended rough set models are shown to be special cases of it. Finally, simulation experiments and conclusions are given, which validate that the united set pair tolerance relation model can improve classification capability.
Keywords: Set pair analysis, Otherness, Rough set, United set pair tolerance
relation.
1 Introduction
Classic rough set theory is based on the equivalence relation and assumes a complete information system. In real life, however, knowledge acquisition often has to face incomplete information systems because of data measurement errors or limitations of data acquisition. There are two kinds of null values: (1) omitted but existing [2]; (2) lost and not allowed to be compared. Kryszkiewicz [3] established the tolerance relation for type (1). Stefanowski [4] and others built the similarity relation for type (2). Wang GY [5] proposed the limited tolerance relation through an in-depth study of tolerance and similarity relations.

The incompleteness degree and the distribution of missing data differ from system to system, and any single extended rough set model achieves a good effect sometimes but not always. Some quantified extended rough set models, such as the quantified tolerance relation and the quantified limited tolerance relation [6], have a good effect, but the quantification process requires heavy computation.
The set pair analysis (SPA) theory was formally proposed by the Chinese scholar Zhao Ke-qin in 1989 and is used to study the relationship between two sets [7]. It uses "a + bi + cj" as a connection number to deal with uncertain systems, such as fuzzy, stochastic or intermediary ones, and studies the uncertainty of two sets from three aspects: identity, difference and opposition. At present, set pair analysis theory is widely used in artificial intelligence, system control, management decision-making and other fields [7].

In this paper, we first expand the set pair analysis theory and then put forward a united extended rough set model based on it. This model converts to different existing models under different identity and discrepancy thresholds, including the tolerance relation, the similarity relation and the limited tolerance relation. Experiments on UCI data sets show the validity, rationality and effectiveness of the model.
A set pair is composed of two sets A and B, namely H = (A, B). Under a given circumstance W, the connection degree is

\[ u_W(A, B) = \frac{S}{N} + \frac{F}{N}\, i + \frac{P}{N}\, j \qquad (1) \]

Here N is the total number of features of the set pair, S is the number of identity features and P is the number of contrary features; F = N − S − P is the number of features of the two sets that are neither identity nor contrary. S/N, F/N and P/N are called the identity degree, discrepancy degree and contrary degree of the two sets under the circumstance W, respectively. To simplify formula (1) we set a = S/N, b = F/N, c = P/N, and then u_W(A, B) can be rewritten as

\[ u = a + bi + cj \qquad (2) \]

It is obvious that 0 ≤ a, b, c ≤ 1 and that a, b and c satisfy a + b + c = 1.
An incomplete information system is the following tuple: I = (U, A, F), where U is a finite nonempty set of objects, A is a finite nonempty set of attributes, V_a is a nonempty set of values of a ∈ A, and F = {F_l : U → ρ(V_a)} is an information function that maps an object in U to a value set of V_a. For every a_l ∈ A and every x_i ∈ U, if F_l(x_i) is a single-point set, then (U, A, F) is a complete information system. If there are some a_l ∈ A and some x_i ∈ U such that F_l(x_i) is not a single-point set, then (U, A, F) is an incomplete information system. A complete information system is thus a special case of an incomplete information system.
Let I = (U, A, F) be an incomplete information system. For any B ⊆ A, the tolerance relation TR(B) proposed by M. Kryszkiewicz is defined as follows [3]:

TR(B) = {(x, y) ∈ U×U | ∀a ∈ B, a(x) = a(y) ∨ a(x) = * ∨ a(y) = *}

Let I = (U, A, F) be an incomplete information system. For any B ⊆ A, the similarity relation SR(B) proposed by Stefanowski is defined as follows [4]:

SR(B) = {(x, y) ∈ U×U | ∀b ∈ B, b(x) = b(y) ∨ b(x) = *}
An United Extended Rough Set Model Based on Developed Set Pair Analysis Method 11
Let I = (U, A, F) be an incomplete information system. For any B ⊆ A, the limited tolerance relation LTR(B) proposed by Wang GY is defined in [5].
The connection degree u = a + bi + cj can be expanded transversely or lengthwise according to the needs of the research. The expansion used here decomposes the discrepancy term:

u(x, y) = a + bi + cj = a + (b1 + b2 + b3) i + cj,

where b3 = |{a | a(x) = * & a(y) = *}|. The term b1 corresponds to the attributes that are missing in x, b2 to the attributes that are missing in y, and b3 to the attributes that are missing in both x and y.
The traditional set pair analysis theory decomposes the discrepancy term bi according to the possible values of the missing attributes, whereas in this paper we decompose the discrepancy term bi from the angle of the distribution of the missing attributes. In a later section we prove that the symmetry and transitivity of the binary relation depend on the thresholds of b1 and b2.
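To make the decomposition concrete, the sketch below computes the identity degree a, the decomposed discrepancy degrees b1, b2, b3 and the contrary degree c for one pair of objects whose missing values are written as '*'; the function name and the normalization by the number of attributes are assumptions made for this example, not part of the paper.

```python
def connection_degree(x, y, missing="*"):
    """Return (a, b1, b2, b3, c) for a pair of objects described by equal-length
    attribute lists: identity degree, discrepancy split by where the value is
    missing (only in x, only in y, in both), and contrary degree."""
    counts = {"a": 0, "b1": 0, "b2": 0, "b3": 0, "c": 0}
    for vx, vy in zip(x, y):
        if vx == missing and vy == missing:
            counts["b3"] += 1        # unknown in both objects
        elif vx == missing:
            counts["b1"] += 1        # unknown only in x
        elif vy == missing:
            counts["b2"] += 1        # unknown only in y
        elif vx == vy:
            counts["a"] += 1         # known and identical
        else:
            counts["c"] += 1         # known but different
    n = len(x)
    return tuple(counts[k] / n for k in ("a", "b1", "b2", "b3", "c"))

# A pair belongs to the united set pair tolerance relation when c = 0,
# a >= alpha, b1 <= beta1 and b2 <= beta2 (thresholds chosen by the analyst).
a, b1, b2, b3, c = connection_degree(["1", "*", "0", "1"], ["1", "0", "*", "1"])
in_relation = (c == 0) and (a >= 0.4) and (b1 <= 0.3) and (b2 <= 0.3)
```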
B Extended Rough Set Model Based on United Set Pair Tolerance Relation
The connection degree between two objects is composed of three parts: the ratio of known and identical attributes (the identity degree a), the ratio of known but different attributes (the contrary degree c) and the ratio of unknown attributes (the discrepancy degree b). The contribution of the contrary degree to similarity is negative; since handling noisy data is not considered here, we require the contrary degree to be 0, that is, c = 0. According to the expanded set pair analysis theory, we give the united set pair tolerance relation, whose upper and lower approximations are

\[ \overline{USPTR_B}(X) = \{x \mid USPTR_B(x) \cap X \neq \emptyset\} \]
\[ \underline{USPTR_B}(X) = \{x \mid USPTR_B(x) \subseteq X\} \]
C Performance Analysis of Extended Rough Set Model Based on United Set Pair
Tolerance Relation
The united set pair tolerance relation is reflexive, but whether it satisfies symmetry and transitivity is not immediate. When the identity-degree and discrepancy-degree thresholds are changed, the extended model behaves differently. We have the following three theorems.
Let I = (U, A, F) be an incomplete information system, ∀x, y ∈ U, B ⊆ A; α is the threshold of the identity degree a, and β1, β2 and β3 are the thresholds of the discrepancy degrees b1, b2 and b3, respectively.

Proof. Let y ∈ USPTR_B(x), that is, u(x, y) = a + (b1 + b2 + b3)i with b1 ≤ β1 and b2 ≤ β2. Because β1 ≠ β2, there may be some y in USPTR_B(x) with b1 > β2 or b2 > β1. Since u(y, x) = a + (b2 + b1 + b3)i, for such y we have x ∉ USPTR_B(y).

Proof. First consider the situation β1 = 0. Let y ∈ USPTR_B(x) and z ∈ USPTR_B(y). For any a ∈ A, if a(x) ≠ *, then a(y) = a(x).
The performance of the united set pair tolerance relation changes with the identity and discrepancy thresholds, and some representative existing extended rough set models are special cases of it; see Table 1.
Table 1. The relationship between united set pair tolerance relation and some existed models
4 Simulation Experiments
In this section, we select three test databases from the UCI machine learning repository: Iris, Zoo and cmc-data (details are given in Table 2). By randomly removing data we obtain nine incomplete databases: I-5%, I-10%, I-30%, Z-5%, Z-10%, Z-30%, C-5%, C-10% and C-30%.

Let U be the test database, E(x_i) the equivalence class of x_i on the complete data, and R(x_i) the tolerance class of x_i under the different extended rough set models. We run ten random experiments on every database and take their average as the final result. The comparability u of the test results is computed through formulas (3) and (4):
\[ u = \frac{1}{|U|} \sum_{i=1}^{|U|} \frac{|E(x_i) \cap R(x_i)|}{|E(x_i)| + |R(x_i)| - |E(x_i) \cap R(x_i)|} \qquad (3) \]

\[ \mu = \frac{1}{10} \sum_{t=1}^{10} \mu_t \qquad (4) \]
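The following sketch shows one way formulas (3) and (4) could be evaluated; the equivalence and tolerance classes are represented as Python sets of object indices, which is an implementation choice made for the example rather than part of the paper.

```python
def comparability(E, R):
    """Formula (3): average Jaccard-style overlap between the equivalence class
    E[i] of each object (complete data) and its tolerance class R[i]."""
    total = 0.0
    for Ei, Ri in zip(E, R):
        inter = len(Ei & Ri)
        total += inter / (len(Ei) + len(Ri) - inter)
    return total / len(E)

def averaged_comparability(runs):
    """Formula (4): average of the comparability over the ten random runs."""
    return sum(runs) / len(runs)

# Toy universe with three objects (indices 0..2)
E = [{0, 1}, {0, 1}, {2}]          # equivalence classes on the complete data
R = [{0, 1, 2}, {0, 1}, {1, 2}]    # tolerance classes on the incomplete data
u = comparability(E, R)
mu = averaged_comparability([u] * 10)
```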
(1) β1=β2
(2) β1≠β2
5 Conclusions
In this paper, we first develop the set pair analysis method and give a micro-decomposition method for the discrepancy degree; we then propose the united set pair tolerance relation and give the corresponding extended rough set model. The thresholds α, β1 and β2 can be adjusted according to the incompleteness degree of the databases, so the model is more objective, flexible and effective. The experiments on the three UCI databases Iris, Zoo and cmc-data clearly show the advantage of the new method. However, how to quickly determine the identity and discrepancy thresholds is still a problem to be solved.
Acknowledgement
References
1. Pawlak, Z.: Rough set. International Journal of Computer and Information Sciences 11(5),
341–356 (1982)
2. Grzymala Busse, J.W.: On the unknown attribute values in learning from examples. In: Raś,
Z.W., Zemankova, M. (eds.) ISMIS 1991. LNCS (LNAI), vol. 542, pp. 368–377. Springer,
Heidelberg (1991)
3. Stefanowski, J., Tsoukias, A.: Incomplete information tables and rough classification. Computational Intelligence 17, 545–566 (2001)
4. Kryszkiewicz, M.: Rough Set Approach to Incomplete Information System. Information
Sciences (S0020-0255) 112(1/4), 39–49 (1998)
5. Wang, G.-y.: The extension of rough set theory in incomplete information system. Com-
puter Research and Development 39(10), 1238–1243 (2002)
6. Sun, C.-m., Liu, D.-y., Sun, S.-y.: Research of rough set method oriented to incomplete
information system. Mini-micro Computer System 10(10), 1869–1873 (2007)
7. Zhao, K.-q.: Set pair and its preliminary applications. Zhejiang Science Publishers, Hangzhou (2000)
8. Xu, Y., Li, L.-s., Li, X.-j.: Extended rough set model based on set pair power. Journal of
System Simulation 20(6), 1515–1522 (2008)
9. Stefanowski, J., Tsoukias, A.: On the Extension of Rough Sets under Incomplete Informa-
tion. In: Zhong, N., Skowron, A., Ohsuga, S. (eds.) RSFDGrC 1999. LNCS (LNAI),
vol. 1711, pp. 73–82. Springer, Heidelberg (1999)
10. Lei, Z., Lan, S.: Rough Set Model Based on New Set Pair Analysis. Fuzzy Systems and Mathematics (S1001-7402) 20(4), 111–116 (2006)
Learning Rules from Pairwise Comparison Table
1 Introduction
Multiple Criteria Decision Analysis (MCDA) aims at helping a decision maker (DM)
to prepare and make a decision where more than one point of view has to be consid-
ered. There are three major models used until now in MCDA: (1) the functional
model expressed in terms of a utility function within multiple attribute utility theory
[1]; (2)the relational model expressed in the form of an outranking relation [2] and a
fuzzy relation [3]; (3) the function-free model expressed in terms of symbolic forms,
like “if… then…” decision rules [6] or decision trees, or in a sub-symbolic form, like
artificial neural nets [4].
Both functional and relational models require that the DM gives some preference
information, such as importance weights, substitution ratios and various thresholds
on particular criteria, which is often quite difficult for DMs not acquainted with the
MCDA methodology [4]. According to Slovic [5], people make decisions by search-
ing for rules that provide good justification of their choices. The decision rule ap-
proach [6] follows the paradigm of artificial intelligence and inductive learning.
These decision rules are induced from preference information supplied by the DM in
terms of some decision examples. Therefore, the decision rules are intelligible and
speak the language of the DM [7].
Roy [8] has stated that the objective of an MCDA is to solve one of the following
five typologies of problems: classification, sorting, choice, ranking and description.
Classification concerns an assignment of a set of actions to a set of pre-defined
classes. The actions are described by a set of regular attributes and the classes are not preference ordered.
Let C be the set of criteria used for evaluation of actions from A. For any criterion
q∈C, let Tq be a finite set of binary relations defined on A on the basis of the evalua-
tions of actions from A with respect to the considered criterion q, such that for every
(x, y)∈A×A exactly one binary relation t∈Tq is verified.
The preferential information has the form of pairwise comparisons of reference ac-
tions from B⊆A, considered as exemplary decisions. The pairwise comparison table
(PCT) is defined as data table SPCT =(B, C∪{d}, TC∪Td, g), where B⊆B×B is a non-
empty set of exemplary pairwise comparisons of reference actions, T_C = ∪_{q∈C} T_q, and d is a decision attribute representing the comprehensive outranking relation.
Let CO be the set of criteria expressing preferences on an ordinal scale, and CN, the set
of criteria expressing preferences on a quantitative scale or a numerical nonquantita-
tive scale, such that CO∪CN=C and CO∩CN=∅. Moreover, for each P⊆C, let PO be the
subset of P composed of criteria expressing preferences on an ordinal scale, i.e.
PO=P∩CO, and PN, the subset of P composed of criteria expressing preferences on a
quantitative scale or a numerical non-quantitative scale, i.e. PN=P∩CN. For each P⊆C,
we have P=PO∪PN and PO∩PN=∅. The following three situations are considered:
(1) P=PN and PO=∅.
The exemplary pairwise comparisons made by the DM can be represented in terms
of graded preference relations Pqh : for each q∈C and for every (x, y)∈A×A, Tq={ Pqh :
h∈Hq}, where Hq is a particular subset of the relative integers and
• x Pqh y, h>0, means that action x is preferred to action y by degree h with respect
to the criterion q,
• x Pqh y, h<0, means that action x is not preferred to action y by degree h with re-
spect to the criterion q,
• x Pq0 y means that action x is similar to action y with respect to the criterion q.
For each q∈C and for every (x, y)∈A×A, [x Pqh y, h>0]⇒[y Pqk x, k≤0] and [x Pqh y,
h<0]⇒[y Pqk x, k≥0].
Given P_N ⊆ C (P_N ≠ ∅) and (x, y), (w, z) ∈ A×A, the pair of actions (x, y) is said to dominate (w, z), taking into account the criteria from P_N, denoted by (x, y) D_{P_N} (w, z), if x is preferred to y at least as strongly as w is preferred to z with respect to each q ∈ P_N. Precisely, "at least as strongly as" means "by at least the same degree", i.e. h_q ≥ k_q, where h_q, k_q ∈ H_q, x P_q^{h_q} y and w P_q^{k_q} z, for each q ∈ P_N.
Exact definition of the cumulated preferences, for each (x, y)∈A×A, q∈C and
h∈Hq, is the following:
Given P ⊆ C and (x, y) ∈ A×A, the P-dominating set and the P-dominated set are defined, respectively, as:
• the set of pairs of actions dominating (x, y): D_P^+(x, y) = {(w, z) ∈ A×A : (w, z) D_P (x, y)},
• the set of pairs of actions dominated by (x, y): D_P^-(x, y) = {(w, z) ∈ A×A : (x, y) D_P (w, z)}.
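As an illustration of these definitions, the sketch below derives the P-dominating and P-dominated sets of a pair of actions from a table of graded preferences h_q(x, y); the dictionary representation, the function names and the toy grading by raw mark differences are assumptions made for the example.

```python
from itertools import product

def dominates(h, pair1, pair2, criteria):
    """(x, y) dominates (w, z) iff it is preferred by at least the same degree
    on every criterion, i.e. h_q(x, y) >= h_q(w, z) for all q."""
    return all(h[q][pair1] >= h[q][pair2] for q in criteria)

def dominance_sets(h, pair, pairs, criteria):
    """Return D_P^+(pair) and D_P^-(pair) over the listed pairs of actions."""
    d_plus = {p for p in pairs if dominates(h, p, pair, criteria)}
    d_minus = {p for p in pairs if dominates(h, pair, p, criteria)}
    return d_plus, d_minus

# Toy grading on the Maths criterion only, using the marks of the worked
# example later in the paper (a = 18, b = 10, c = 14).
marks = {"a": 18, "b": 10, "c": 14}
pairs = list(product("abc", repeat=2))
h = {"Maths": {(x, y): marks[x] - marks[y] for x, y in pairs}}
d_plus, d_minus = dominance_sets(h, ("a", "b"), pairs, ["Maths"])
```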
Using the approximations of S and Sc based on the dominance relation defined above, it
is possible to induce a generalized description of the available preferential information
in terms of decision rules. The decision rules have in this case the following syntax:
(1) D≥-decision rules
IF x P_{q1}^{≥h(q1)} y and x P_{q2}^{≥h(q2)} y and … x P_{qe}^{≥h(qe)} y and c_{qe+1}(x) ≥ r_{qe+1} and c_{qe+1}(y) ≤ s_{qe+1} and … c_{qp}(x) ≥ r_{qp} and c_{qp}(y) ≤ s_{qp}, THEN xSy,
where P = {q1, q2, …, qp} ⊆ C, P_N = {q1, q2, …, qe}, P_O = {qe+1, qe+2, …, qp}, (h(q1), h(q2), …, h(qe)) ∈ H_{q1}×H_{q2}×…×H_{qe}, and (r_{qe+1}, …, r_{qp}), (s_{qe+1}, …, s_{qp}) ∈ C_{qe+1}×…×C_{qp}.
These rules are supported only by pairs of actions from the P-lower approximation of S.
(2) D≤-decision rules
IF x P_{q1}^{≤h(q1)} y and x P_{q2}^{≤h(q2)} y and … x P_{qe}^{≤h(qe)} y and c_{qe+1}(x) ≤ r_{qe+1} and c_{qe+1}(y) ≥ s_{qe+1} and … c_{qp}(x) ≤ r_{qp} and c_{qp}(y) ≥ s_{qp}, THEN xS^c y,
where P = {q1, q2, …, qp} ⊆ C, P_N = {q1, q2, …, qe}, P_O = {qe+1, qe+2, …, qp}, (h(q1), h(q2), …, h(qe)) ∈ H_{q1}×H_{q2}×…×H_{qe}, and (r_{qe+1}, …, r_{qp}), (s_{qe+1}, …, s_{qp}) ∈ C_{qe+1}×…×C_{qp}.
These rules are supported only by pairs of actions from the P-lower approximation of S^c.
(3) D≥≤-decision rules
IF x P_{q1}^{≥h(q1)} y and x P_{q2}^{≥h(q2)} y and … x P_{qe}^{≥h(qe)} y and x P_{qe+1}^{≤h(qe+1)} y and x P_{qe+2}^{≤h(qe+2)} y and … x P_{qf}^{≤h(qf)} y and c_{qf+1}(x) ≥ r_{qf+1} and c_{qf+1}(y) ≤ s_{qf+1} and … c_{qg}(x) ≥ r_{qg} and c_{qg}(y) ≤ s_{qg} and c_{qg+1}(x) ≤ r_{qg+1} and c_{qg+1}(y) ≥ s_{qg+1} and … c_{qp}(x) ≤ r_{qp} and c_{qp}(y) ≥ s_{qp}, THEN xSy or xS^c y,
where O′ = {q1, q2, …, qe} ⊆ C, O″ = {qe+1, qe+2, …, qf} ⊆ C, P_N = O′∪O″, with O′ and O″ not necessarily disjoint, P_O = {qf+1, qf+2, …, qp}, (h(q1), h(q2), …, h(qf)) ∈ H_{q1}×H_{q2}×…×H_{qf}, and (r_{qf+1}, …, r_{qp}), (s_{qf+1}, …, s_{qp}) ∈ C_{qf+1}×…×C_{qp}.
These rules are supported only by pairs of actions from the P-boundary of S and S^c.
Applying the decision rules induced from a given SPCT, a final recommendation for
choice or ranking can be obtained upon a suitable exploitation of this structure [12].
In this paper, we use the concepts of decision matrices and decision functions to generate the minimal decision rules.
where m[(x, y), (w, z)] = {p_i, q_j : (x, y) D_{p_i}(w, z) with h_{p_i} > k_{p_i}; (x, y) D^1_{q_j}(w, z) and c_{q_j}(x) ≠ c_{q_j}(y) or c_{q_j}(w) ≠ c_{q_j}(z); (x, y) D^2_{q_j}(w, z) and c_{q_j}(x) ≠ c_{q_j}(w) or c_{q_j}(y) ≠ c_{q_j}(z)}.
Definition 2. Let M(S) be the decision matrix of S in S_PCT. The decision function of (x, y) with respect to M(S) is defined as

f_S[(x, y)] = ⋀_{(w,z)} { (⋁_i p_i^*) ∨ (⋁_j q_j^*) : p_i, q_j ∈ m[(x, y), (w, z)] and m[(x, y), (w, z)] ≠ ∅ },

where p_i^* and q_j^* are Boolean variables corresponding to the attributes p_i and q_j, respectively.
The decision function f_S[(x, y)] is a Boolean function that expresses how a pair (x, y) ∈ S can be discerned from all of the pairs (w, z) ∈ S^C. Turning f_S[(x, y)] into disjunctive normal form, the prime implicants of f_S[(x, y)] reveal the minimal subsets of P that are needed to discern the pair (x, y) from the pairs in S^C; they correspond to the minimal D≥-decision rules.

Similarly, we can define the decision matrix of S^C, M(S^C), and the decision function with respect to M(S^C).
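A small sketch of how the prime implicants of f_S[(x, y)] might be extracted: each non-empty entry m[(x, y), (w, z)] becomes a disjunctive clause, and the minimal attribute subsets that hit every clause correspond to the minimal decision rules. The brute-force search and the function name are assumptions made for illustration and are only practical for small attribute sets.

```python
from itertools import combinations

def minimal_attribute_subsets(clauses, attributes):
    """Return the minimal subsets of attributes that intersect every non-empty
    clause; these are the prime implicants of the decision function."""
    clauses = [set(c) for c in clauses if c]          # empty entries impose nothing
    minimal = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(sorted(attributes), size):
            s = set(subset)
            if any(set(m) <= s for m in minimal):
                continue                              # covered by a smaller implicant
            if all(s & c for c in clauses):
                minimal.append(subset)
    return minimal

# Row of M(S) for the pair (c, c) in the worked example below:
clauses = [{"Lit"}, {"Maths", "Phys"}, {"Maths", "Phys"}]
print(minimal_attribute_subsets(clauses, {"Maths", "Phys", "Lit"}))
# -> [('Lit', 'Maths'), ('Lit', 'Phys')], i.e. f_S[(c, c)] = (Lit ∧ Maths) ∨ (Lit ∧ Phys)
```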
An example. Let us consider the example used in [6]. Students are evaluated according to their level in Maths, Phys and Lit; marks are given on a scale from 0 to 20. The three students presented in Table 1 are considered.
Pair of students   Maths   Phys   Lit   Comprehensive outranking relation
(a, a) 18, 18 16, 16 10, 10 S
(a, b) 18, 10 16, 12 10, 18 S
(a, c) 18, 14 16, 15 10, 15 Sc
(b, a) 10, 18 12, 16 18, 10 Sc
(b, b) 10, 10 12, 12 18, 18 S
(b, c) 10, 14 12, 15 18, 15 Sc
(c, a) 14, 18 15, 16 15, 10 S
(c, b) 14, 10 15, 12 15, 18 S
(c, c) 14, 14 15, 15 15, 15 S
The analogous partial preorders can be induced using dominance relations on Phys
and on Lit. The entities to compute the lower and upper approximations of S are
shown in Table 3.
Pair (x, y): D+_Maths(x, y); D+_Phys(x, y); D+_Lit(x, y); D+_P(x, y)
(a, a): {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; {(a,a), (b,b), (c,c)}
(a, b): {(a,b)}; {(a,b)}; {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; {(a,b)}
(a, c): {(a,b), (a,c)}; {(a,b), (a,c)}; {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; {(a,c)}
(b, a): {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; {(b,a)}; {(b,a)}
(b, b): {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; {(a,a), (b,b), (c,c)}
(b, c): {(a,a), (a,b), (a,c), (b,b), (b,c), (c,b), (c,c)}; {(a,a), (a,b), (a,c), (b,b), (b,c), (c,b), (c,c)}; {(b,a), (b,c)}; {(b,c)}
(c, a): {(a,a), (a,b), (a,c), (b,b), (c,a), (c,b), (c,c)}; {(a,a), (a,b), (a,c), (b,b), (c,a), (c,b), (c,c)}; {(b,a), (c,a)}; {(c,a)}
(c, b): {(a,b), (c,b)}; {(a,b), (c,b)}; {(a,a), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; {(c,b)}
(c, c): {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; {(a,a), (b,b), (c,c)}
The P-lower and P-upper approximations of S are both {(a, a), (a, b), (b, b), (c, a), (c, b), (c, c)}, and Bn_P(S) = ∅.
The entities to compute the lower and upper approximations of SC are shown in
Table 4.
Pair (x, y): D−_Maths(x, y); D−_Phys(x, y); D−_Lit(x, y); D−_P(x, y)
(a, a): {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; {(a,a), (b,b), (c,c)}
(a, b): {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; {(a,b)}; {(a,b)}
(a, c): {(a,a), (a,c), (b,a), (b,b), (b,c), (c,a), (c,c)}; {(a,a), (a,c), (b,a), (b,b), (b,c), (c,a), (c,c)}; {(a,b), (a,c)}; {(a,c)}
(b, a): {(b,a)}; {(b,a)}; {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; {(b,a)}
(b, b): {(a,a), (b,a), (b,b), (b,c), (c,c)}; {(a,a), (b,a), (b,b), (b,c), (c,c)}; {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; {(a,a), (b,b), (c,c)}
(b, c): {(b,a), (b,c)}; {(b,a), (b,c)}; {(a,a), (a,b), (a,c), (b,b), (b,c), (c,b), (c,c)}; {(b,c)}
(c, a): {(b,a), (c,a)}; {(b,a), (c,a)}; {(a,a), (a,b), (a,c), (b,b), (c,a), (c,b), (c,c)}; {(c,a)}
(c, b): {(b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; {(b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; {(a,b), (c,b)}; {(c,b)}
(c, c): {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; {(a,a), (b,b), (c,c)}
The P-lower and P-upper approximations of S^c are both {(a, c), (b, a), (b, c)}, and Bn_P(S^c) = ∅.
The decision matrix of S in S_PCT is

M(S):
          (a, c)         (b, a)          (b, c)
(a, a)    Lit            Maths, Phys     Maths, Phys
(a, b)    Maths, Phys    Maths, Phys     Maths, Phys
(b, b)    Lit            Maths, Phys     Maths, Phys
(c, a)    Lit            Maths, Phys     ∅
(c, b)    ∅              Maths, Phys     Maths, Phys
(c, c)    Lit            Maths, Phys     Maths, Phys
fS[(c, b)]=Maths∨Phys,
fS[(c, c)]=(Lit∧Maths)∨(Lit∧Phys).
Then, the following minimal D≥-decision rules can be obtained:
IF (cMaths(x) ≥ 18 and cMaths(y) ≤ 18) and (cLit(x) ≥ 10 and cLit(y) ≤ 10) THEN xSy, (a, a),
IF (cPhys(x) ≥ 16 and cPhys(y) ≤ 16) and (cLit(x) ≥ 10 and cLit(y) ≤ 10) THEN xSy, (a, a),
IF cMaths(x) ≥ 18 and cMaths(y) ≤ 10 THEN xSy, (a, b),
IF cPhys(x) ≥ 16 and cPhys(y) ≤ 12 THEN xSy, (a, b),
IF (cMaths(x) ≥ 10 and cMaths(y) ≤ 10) and (cLit(x) ≥ 18 and cLit(y) ≤ 18) THEN xSy, (b, b),
IF (cPhys(x) ≥ 12 and cPhys(y) ≤ 12) and (cLit(x) ≥ 18 and cLit(y) ≤ 18) THEN xSy, (b, b),
IF (cMaths(x) ≥ 14 and cMaths(y) ≤ 18) and (cLit(x) ≥ 15 and cLit(y) ≤ 10) THEN xSy, (c, a),
IF (cPhys(x) ≥ 15 and cPhys(y) ≤ 16) and (cLit(x) ≥ 15 and cLit(y) ≤ 10) THEN xSy, (c, a),
IF cMaths(x) ≥ 14 and cMaths(y) ≤ 10 THEN xSy, (c, b),
IF cPhys(x) ≥ 15 and cPhys(y) ≤ 12 THEN xSy, (c, b),
IF (cMaths(x) ≥ 14 and cMaths(y) ≤ 14) and (cLit(x) ≥ 15 and cLit(y) ≤ 15) THEN xSy, (c, c),
IF (cPhys(x) ≥ 15 and cPhys(y) ≤ 15) and (cLit(x) ≥ 15 and cLit(y) ≤ 15) THEN xSy, (c, c).
Similarly, we can obtain the D≤-decision rules as follows:
IF (cMaths(x) ≤ 18 and cMaths(y) ≥ 14) and (cLit(x) ≤ 10 and cLit(y) ≥ 15) THEN xS^c y, (a, c),
IF (cPhys(x) ≤ 16 and cPhys(y) ≥ 15) and (cLit(x) ≤ 10 and cLit(y) ≥ 15) THEN xS^c y, (a, c),
IF cMaths(x) ≤ 10 and cMaths(y) ≥ 18 THEN xS^c y, (b, a),
IF cPhys(x) ≤ 12 and cPhys(y) ≥ 16 THEN xS^c y, (b, a),
IF cMaths(x) ≤ 10 and cMaths(y) ≥ 14 THEN xS^c y, (b, c),
IF cPhys(x) ≤ 12 and cPhys(y) ≥ 15 THEN xS^c y, (b, c).
4 Conclusions
Learning decision rules from preference-ordered data differs from usual machine learn-
ing, since the former involves preference orders in domains of attributes and in the set of
decision classes. This requires that a knowledge discovery method applied to prefer-
ence-ordered data respects the dominance principle, which is addressed in the Domi-
nance-Based Rough Set Approach. This approach enables us to apply a rough set
approach to multicriteria choice and ranking. In this paper, we propose the concepts of
decision matrices and decision functions to generate the minimal decision rules from a
pairwise comparison table. Then, the decision rules can be used to obtain a recommen-
dation in multicriteria choice and ranking problems. We will further present some
extensions of the approach that make it a useful tool for other practical applications.
References
1. Keeney, R.L., Raiffa, H.: Decision with Multiple Objectives-Preferences and Value Trade-
offs. Wiley, New York (1976)
2. Roy, B.: The Outranking Approach and the Foundation of ELECTRE Methods. Theory
and Decision 31(1), 49–73 (1991)
3. Fodor, J., Roubens, M.: Fuzzy Preference Modelling and Multicriteria Decision Support.
Kluwer, Dordrecht (1994)
4. Zopounidis, C., Doumpos, M.: Multicriteria Classification and Sorting Methods: A litera-
ture Review. European Journal of Operational Research 138(2), 229–246 (2002)
5. Slovic, P.: Choice between Equally-valued Alternatives. Journal of Experimental Psychol-
ogy: Human Perception Performance 1, 280–287 (1975)
6. Greco, S., Matarazzo, B., Slowinski, R.: Rough Set Theory for Multicriteria Decision
Analysis. European Journal of Operational Research 129(1), 1–47 (2001)
7. Fortemps, P., Greco, S., Slowinski, R.: Multicriteria Decision Support Using Rules That
Represent Rough-Graded Preference Relations. European Journal of Operational Re-
search 188(1), 206–223 (2008)
8. Roy, B.: Méthodologie multicritère d’aide à la décision. Economica, Paris (1985)
9. Pawlak, Z.: Rough Sets. International Journal of Computer and Information
Sciences 11(5), 341–356 (1982)
10. Pawlak, Z., Slowinski, R.: Rough Set Approach to Multi-attribute Decision Analysis.
European Journal of Operational Research 72(3), 443–459 (1994)
11. Greco, S., Matarazzo, B., Slowinski, R.: Rough Sets Methodology for Sorting Problems in
Presence of Multiple Attributes and Criteria. European Journal of Operational Re-
search 138(2), 247–259 (2002)
12. Greco, S., Matarazzo, B., Slowinski, R.: Extension of the Rough Set Approach to Multicri-
teria Decision Support. INFOR 38(3), 161–193 (2000)
13. Greco, S., Matarazzo, B., Slowinski, R.: Rough Approximation of a Preference Relation by
Dominance Relations. European Journal of Operational Research 117(1), 63–83 (1999)
14. Skowron, A., Rauszer, C.: The Discernibility Matrices and Functions in Information Table.
In: Intelligent Decision Support: Handbook of Applications and Advances of the Rough
Set Theory, pp. 331–362. Kluwer Academic Publishers, Dordrecht (1991)
A Novel Hybrid Particle Swarm Optimization
for Multi-Objective Problems
1 Introduction
Multi-objective Evolutionary Algorithms (MOEAs) are powerful tools for solving multi-objective optimization problems (MOPs), and they have gained popularity in recent years [1]. Recently, several elitist algorithms have been proposed, such as NSGA-II [2], SPEA2 [3], GDE3 [4] and MOPSO [5, 6, 7].

NSGA-II adopts a fast non-dominated sorting approach to reduce the computational burden and uses ranking and crowding distance to choose the candidate solutions [2]. SPEA2 proposes a fitness assignment with a clustering technique and designs a truncation operator based on the nearest-neighbor density estimation metric [3]. GDE3 is a developed version of Differential Evolution suited for global optimization with an arbitrary number of objectives and constraints [4].

MOPSO, proposed by Coello et al., adopts swarm intelligence to optimize MOPs and uses the Pareto-optimal set to guide the particles' flight [5]. Sierra adopts the crowding distance to filter the leaders and applies different mutation methods to subdivisions of the particles [6]; Mostaghim introduces a new Sigma method to find local best information to guide the particles [7].
The Project was supported by the Research Foundation for Outstanding Young Teachers, China University of Geosciences (Wuhan) (No. CUGQNL0911).
For one decision variable X_j with boundary [l_j, u_j], the quantization technique divides the domain into Q levels α_{j1}, α_{j2}, …, α_{jQ}, where the design parameter
When MOEAs obtain a set of equally good solutions that fills the preset archive size (usually archiveSize = 100), an accept rule must be designed to decide which solution should be cut from the archive. This is a critical issue in MOEAs that directly influences the convergence and spread quality of the final optimal set. Some popular accept rules have been presented: NSGA-II adopts the ranking and crowding distance metric, and SPEA2 uses the nearest-neighbor density estimation metric.

In this paper, we design a new accept rule called Minimum Reduce Hypervolume (MRV). Hypervolume is a quality indicator proposed by Zitzler et al. and is adopted in jMetal 2.1 [14]. It calculates the volume covered by the members of a non-dominated set of solutions (in Fig. 1, the region enclosed by the dashed line ADBECW with respect to the worst point W).

If two solutions D and E are non-dominated with respect to each other, NSGA-II keeps solution D in the archive if CD(D) > CD(E); this maintains the spread along the Pareto front, where

CD(D) = AD′ + D′B,   CD(E) = BE′ + E′C.   (6)
In Fig. 1, CD(D) = AD′ + D′B and hv(D) = DD1 × DD2; CD(E) = BE′ + E′C and hv(E) = EE1 × EE2.
Fig. 1. The comparison of Crowding Distance and MRV is described in the left and
right figure
The factor scale is designed to equalize the influence of the crowding distance and MRV; if it becomes too large it is capped, i.e. scale is set to 1000 when scale > 1000.
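For two minimization objectives, the MRV idea can be sketched as follows: each interior archive member's exclusive hypervolume is the rectangle between its neighbours (the right panel of Fig. 1), and the member whose removal costs the least hypervolume is dropped. The code below is a simplified illustration with assumed names; in particular it omits the scale factor that combines MRV with the crowding distance.

```python
def mrv_truncate(archive):
    """Remove the non-dominated point whose deletion reduces the hypervolume
    least (bi-objective minimization).  Extreme points are always kept."""
    pts = sorted(archive)                 # ascending f1; f2 is then descending
    if len(pts) <= 2:
        return pts
    contributions = []
    for i in range(1, len(pts) - 1):
        left, mid, right = pts[i - 1], pts[i], pts[i + 1]
        # exclusive rectangle hv(mid), cf. hv(D) = DD1 x DD2 in Fig. 1
        hv = (right[0] - mid[0]) * (left[1] - mid[1])
        contributions.append((hv, i))
    _, worst = min(contributions)
    return pts[:worst] + pts[worst + 1:]

# Example: an over-full archive of five non-dominated points
archive = [(0.0, 1.0), (0.2, 0.7), (0.4, 0.45), (0.7, 0.2), (1.0, 0.0)]
archive = mrv_truncate(archive)
```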
5 Experiment Results
The experiments are based on jMetal 2.1 [14], a Java-based framework aimed at facilitating the development of metaheuristics for solving MOPs; it provides large blocks of reusable code and a fair comparison between different MOEAs. The paper selects the algorithms OMOPSO [6] and SMPSO [14] as comparison objectives. Each algorithm is run independently 100 times, and the maximum number of evaluations is 25,000. The test problems are chosen from the ZDT and DTLZ problem families. The performance metrics fall into five categories:
Unary additive epsilon indicator (I¹_ε+). For an approximation set A, I¹_ε+(A) gives the minimum factor ε by which each point of the real front R can be translated such that the resulting transformed set is dominated by A:

\[ I^1_{\varepsilon+}(A) = \inf_{\varepsilon \in \mathbb{R}} \{\, \forall z^2 \in R \;\exists z^1 \in A : z^1_i \le z^2_i + \varepsilon \;\; \forall i \,\} \qquad (9) \]
Hypervolume. This quality indicator calculates the volume (in objective space) covered by the members of a non-dominated set of solutions Q with respect to a reference point. The hypervolume (HV) is calculated as

\[ HV = \mathrm{volume}\Bigl(\bigcup_{i=1}^{|Q|} v_i\Bigr) \qquad (10) \]
Inverted Generational Distance. This metric measures how far the elements of the Pareto-optimal set are from the set of non-dominated vectors found. It is defined as

\[ IGD = \frac{\sqrt{\sum_{i=1}^{N} d_i^2}}{N} \qquad (11) \]
Generational Distance. This metric measures how far the elements of the set of non-dominated vectors found are from the Pareto-optimal set. It is defined as

\[ GD = \frac{\sqrt{\sum_{i=1}^{n} d_i^2}}{n} \qquad (12) \]
Spread. The Spread indicator is a diversity metric that measures the extent of spread achieved among the obtained solutions. It is defined as

\[ \Delta = \frac{d_f + d_l + \sum_{i=1}^{n-1} |d_i - \bar{d}|}{d_f + d_l + (n-1)\bar{d}} \qquad (13) \]
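A compact sketch of how GD and IGD (formulas (11) and (12)) can be evaluated from an approximation front and a reference front; the Euclidean distance and the list-of-tuples representation are assumptions made for the example, not the jMetal implementation.

```python
from math import dist, sqrt

def gd(front, reference):
    """Generational Distance, eq. (12): each obtained point against its
    nearest reference point."""
    d2 = [min(dist(p, r) for r in reference) ** 2 for p in front]
    return sqrt(sum(d2)) / len(front)

def igd(front, reference):
    """Inverted Generational Distance, eq. (11): each reference point against
    its nearest obtained point."""
    d2 = [min(dist(r, p) for p in front) ** 2 for r in reference]
    return sqrt(sum(d2)) / len(reference)

# Example with a coarse approximation of a linear Pareto front
reference = [(x / 10, 1 - x / 10) for x in range(11)]
front = [(0.0, 1.0), (0.5, 0.52), (1.0, 0.05)]
print(gd(front, reference), igd(front, reference))
```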
A higher Hypervolume and lower I¹_ε+, GD, IGD and Spread indicate a better algorithm. The results are compared using the median and the interquartile range (IQR), which measure location (central tendency) and statistical dispersion, respectively; the best result is shown on a grey background.
From Table 2, in terms of the median and IQR of the additive epsilon metric, HPSODE obtains the best results on all 12 MOPs.

From Table 3, in terms of the median and IQR of the Hypervolume metric, HPSODE obtains the best results on all 12 MOPs.

From Table 4, in terms of the median and IQR of the Generational Distance metric, HPSODE obtains the best results on 11 MOPs and is worse only on DTLZ6; OMOPSO obtains the best result only on DTLZ6.

From Table 5, in terms of the median and IQR of the Inverted Generational Distance metric, HPSODE obtains the best results on 10 MOPs and is worse only on ZDT2 and DTLZ6; OMOPSO obtains the best result only on ZDT2, and SMPSO only on DTLZ6.
Table 2. Unary additive epsilon indicator (I¹_ε+): median and IQR
From Table 6, in terms of the median and IQR of the Spread metric, HPSODE obtains the best results on all 6 MOPs; OMOPSO obtains the best results on only 2 MOPs and SMPSO on only 4 MOPs.
The comparison of the three algorithms in the five categories (epsilon indicator, hypervolume, Generational Distance, Inverted Generational Distance and Spread) shows that the new algorithm solves the MOPs more efficiently. We summarize the highlights as follows:
1. HPSODE adopts the statistical Uniform Design method to construct the first population, which yields well-distributed solutions in the feasible space.
2. HPSODE combines the PSO and Differential Evolution operators to generate the next population; the DE operator enhances the diversity of the global guide population.
References
1. Coello, C.A.C.: Evolutionary multi-objective optimization: A historical view of the
Field. IEEE Computational Intelligence Magazine 1(1), 28–36 (2006)
2. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjec-
tive genetic algorithm: NSGA - II. IEEE Transactions on Evolutionary Computa-
tion 6(2), 182–197 (2002)
3. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength Pareto evo-
lutionary algorithm, Technical Report 103, Computer Engineering and Networks
Laboratory (2001)
4. Kukkonen, S., Lampinen, J.: GDE3: The third evolution step of generalized dif-
ferential evolution. In: Proceedings of the 2005 IEEE Congress on Evolutionary
Computation (1), pp. 443–450 (2005)
5. Coello Coello, C.A., Toscano Pulido, G., Salazar Lechuga, M.: Handling Multiple
Objectives With Particle Swarm Optimization. IEEE Transactions on Evolutionary
Computation 8, 256–279 (2004)
6. Sierra, M.R., Coello, C.A.C.: Improving PSO-based multi-objective optimization
using crowding, mutation and ε-dominance. In: Coello Coello, C.A., Hernández
Aguirre, A., Zitzler, E. (eds.) EMO 2005. LNCS, vol. 3410, pp. 505–519. Springer,
Heidelberg (2005)
7. Mostaghim, S., Teich, J.: Strategies for Finding Good Local Guides in Multi-
objective Particle Swarm Optimization (MOPSO). In: 2003 IEEE Swarm Intel-
ligence Symposium Proceedings, Indianapolis, Indiana, USA, pp. 26–33. IEEE
Service Center (2003)
8. Zhang, W.J., Xie, X.F.: DEPSO: Hybrid particle swarm with differential evolution
operator. In: IEEE International Conference on Systems Man and Cybernetics, pp.
3816–3821 (2003)
9. Fang, K.T., Ma, C.X.: Orthogonal and uniform design. Science Press (2001)
(in Chinese)
10. Leung, Y.W., Wang, Y.: An orthogonal genetic algorithm with quantization for
global numerical optimization. IEEE Transactions on Evolutionary Computa-
tion 5(1), 41–53 (2001)
11. Zeng, S.Y., Kang, L.S., Ding, L.X.: An orthogonal multiobjective evolutionary
algorithm for multi-objective optimization problems with constraints. Evolutionary
Computation 12, 77–98 (2004)
12. Cai, Z.H., Gong, W.Y., Huang, Y.Q.: A novel differential evolution algorithm based
on ε-domination and orthogonal design method for multiobjective optimization.
In: Obayashi, S., Deb, K., Poloni, C., Hiroyasu, T., Murata, T. (eds.) EMO 2007.
LNCS, vol. 4403, pp. 286–301. Springer, Heidelberg (2007)
13. Leung, Y.-W., Wang, Y.: Multiobjective programming using uniform design and
genetic algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part
C 30(3), 293 (2000)
14. Durillo, J.J., Nebro, A.J., Luna, F., Dorronsoro, B., Alba, E.: jMetal: A
Java Framework for Developing Multi-Objective Optimization Metaheuristics,
Departamento de Lenguajes y Ciencias de la Computación, University of
Málaga, E.T.S.I. Informática, Campus de Teatinos, ITI-2006-10 (December 2006),
http://jmetal.sourceforge.net
Research on Constrained Layout Optimization Problem
Using Multi-adaptive Strategies Particle Swarm
Optimizer∗
Kaiyou Lei
Faculty of Computer & Information Science, Southwest University, Chongqing, 400715, China
lky@swu.edu.cn
1 Introduction
The classical layout problem is generally divided into two types: the packing problem and the cutting problem. The main aim is to increase the space utility ratio as much as possible under the condition of non-overlapping between piece (object) and piece (object) and between piece (object) and container. These are called layout problems without behavioral constraints (unconstrained layout). In recent years, another, more complex layout problem has been attracting a lot of attention, such as the layout design of engineering machines, spacecraft, ships, etc. Solving these kinds of problems requires considering some additional behavioral constraints, for instance inertia, equilibrium, stability, vibration, etc. These are called layout problems with behavioral constraints (constrained layout). Constrained layout belongs to the class of NP-hard problems, and its optimization is highly difficult [1].
As a newly developed population-based computational intelligence algorithm, Particle Swarm Optimization (PSO) originated as a simulation of a simplified social model of birds in a flock [2]. The PSO algorithm is easy to implement and has proven very competitive on a large variety of global optimization problems and application areas compared with conventional methods and other meta-heuristics [3].
∗ The work is supported by Key Project of Chinese Ministry of Education (104262).
Since its introduction, numerous variations of the basic PSO algorithm have been developed in the literature to avoid the premature convergence problem and to speed up the convergence process, which are the two most important topics in the research on stochastic search methods. To make the search more effective, many approaches have been suggested by researchers, such as various mutation methods and methods that select a single inertia weight value; however, these methods share a common weakness: they usually cannot attend to both global search and local search properly, and so tend to become trapped in local optima, especially in complex constrained layout problems [4].
In this paper, a particle swarm optimizer with better search performance is proposed. It employs multi-adaptive strategies to plan large-scale global search and refined local search as a whole, according to the specialties of constrained layout problems, so as to quicken the convergence speed, avoid the premature problem, economize computational expense, and obtain the global optimum. We tested the proposed algorithm and compared it with other published methods on three constrained layout examples. The experimental results demonstrate that this revised algorithm converges rapidly to high-quality solutions.
Suppose that the center of the graph plane is the origin o of the Cartesian system, the graph plane and graph units lie in the plane xoy, the thickness of the graph units is ignored, and x_i, y_i are the coordinates of the center o_i of graph unit i, which is also its mass center. The mathematical model for optimization of the problem is given by:
min F(X_i) = max_i { √(x_i² + y_i²) + r_i }    (1)
s.t.
f_2(X_i) = √(x_i² + y_i²) + r_i − R ≤ 0,  i ∈ I    (3)
f_3(X_i) = √( (Σ_{i=1}^{n} m_i x_i)² + (Σ_{i=1}^{n} m_i y_i)² ) − [δJ] ≤ 0,  i ∈ I    (4)
v_id^{t+1} = w · v_id^t + c_1 · r_1 · (p_id^t − x_id^t) + c_2 · r_2 · (p_gd^t − x_id^t)    (5)
Constants c1 and c2 are learning rates; r1 and r2 are random numbers uniformly dis-
tributed in the interval [0, 1]; w is an inertia factor.
To speed up the convergence process and avoid the premature problem, Shi proposed PSO with the linearly decreasing weight method (LDWPSO) [3,4]. Suppose w_max is the maximum inertia weight, w_min is the minimum inertia weight, run is the current iteration number, and runMax is the total number of iterations; the inertia weight is formulated as:
w = w_max − (w_max − w_min) · run / runMax    (7)
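A minimal Python sketch of one LDWPSO iteration, combining the inertia weight of (7) with the velocity update of (5); array shapes and the parameter defaults are assumptions for illustration:

import numpy as np

def ldw_pso_step(x, v, pbest, gbest, run, run_max,
                 c1=1.5, c2=1.5, w_max=0.9, w_min=0.4):
    """One LDWPSO iteration: Eq. (7) for the inertia weight, Eq. (5) for the velocity."""
    w = w_max - (w_max - w_min) * run / run_max              # Eq. (7)
    r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (5)
    return x + v_new, v_new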
The weight w has the capability to automatically harmonize the global and local search abilities, avoid prematurity, and gain rapid convergence to the global optimum. First of all, a larger w enhances the global search ability of PSO, so as to explore the large-scale search space and rapidly locate the approximate position of the global optimum, while a smaller w enhances the local search ability of PSO, so that particles slow down and deploy a refined local search. Secondly, the more difficult the optimization problem is, the more the global search ability needs to be fortified; once the approximate position of the global optimum is located, the refined local search is further strengthened to reach the global optimum [5,6,7].
According to the conclusions above, we first constructed equation (8) as a new inertia weight decline curve for PSO, as demonstrated in Fig. 2.
3.2 Adaptive Difference Mutation Strategy of the Global Optimum p_gd^t
Considering that the particles may find a better global optimum in the current best region, the algorithm is designed to join a mutation operation with the perturbation operator. The value runmax1, an iteration number marking the transformation point, divides runMax into two segments that are mutated according to their own characteristics, further enhancing the global and local search abilities in order to find a satisfactory solution. p_η is the mutation probability, chosen within (0.1, 0.3). The computed equation is defined as:
In equation (3), the graph plane radius is constant, which influences the fitness value computation and the result of the search. In the search process, the smaller the periphery envelope circle radius is, the more tightly the graph units converge. Evidently, by taking the current periphery envelope circle radius as the layout radius, the search progresses in a smaller radius area, which quickens the convergence speed and economizes computational expense. Suppose the periphery envelope circle radius is R_S; the computed equation is defined as:
if R_S^{t+1} > R_S^t , then R_S^{t+1} = R_S^t ; else R_S^{t+1} = R_S^{t+1}    (10)
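A small sketch of rule (10): the enveloping-circle radius of the current layout replaces the search radius only when it shrinks. The representation of a layout by circle centres and radii, and the function name, are assumptions:

import numpy as np

def update_layout_radius(centers, radii, R_prev):
    """Eq. (10): keep the smaller of the previous radius and the current enveloping-circle radius."""
    R_new = float(np.max(np.linalg.norm(centers, axis=1) + radii))  # current periphery envelope radius
    return min(R_new, R_prev)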
Analysis of equation (5) shows that if the best position of all particles does not change for a long time, the velocity update is determined by w·v_id^t. Since w < 1, the velocity becomes smaller and smaller, so the particle swarm tends to cluster together; further, the algorithm becomes trapped in local optima, and the premature problem emerges as in genetic algorithms. Considering this, the adaptive difference mutation strategy of X_i is introduced. When the best position of all particles shows no or little change for a long time, part of the particles are kept and continue the refined local search, while the rest are reinitialized randomly, so as to enhance the global search ability of PSO and simultaneously break away from the attraction of local optima. Suppose run_t is an iteration interval, run_k and run_{k+t} are the k-th and (k+t)-th iteration numbers, R_k and R_{k+t} are the corresponding maximal radii, X_ρ are the randomly reinitialized particles, ρ is the dimension mutation probability, and ε is an error threshold; the computed equation is defined as:
PSO modified by the above methods, denoted MASPSO, has excellent search performance for optimizing constrained layout problems. According to MASPSO, the design of the algorithm is as follows. All particles are coded based on the rectangular plane coordinate system. Considering that the shortest periphery envelope circle radius is the optimization criterion under the above constraint conditions for our problem, the fitness function and the penalty function constructed in MASPSO can be defined, respectively, as:
φ_1(X_i) = F(X_i) + Σ_{i=1}^{3} λ_i f_i(X_i) u_i(f_i),    (12)
u_i(f_i) = 0 if f_i(X_i) ≤ 0, and u_i(f_i) = 1 if f_i(X_i) > 0,  i ∈ I.
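A minimal sketch of the penalty fitness (12); the objective F, the constraint functions f_i and the penalty weights λ_i are passed in, and all names are illustrative:

def penalty_fitness(X, F, constraints, lambdas):
    """phi_1(X) = F(X) + sum_i lambda_i * f_i(X) * u_i(f_i), with u_i = 1 only when f_i(X) > 0 (Eq. 12)."""
    phi = F(X)
    for f_i, lam in zip(constraints, lambdas):
        g = f_i(X)
        phi += lam * g * (1.0 if g > 0 else 0.0)   # only violated constraints are penalized
    return phi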
4 Computational Experiments
Taking the literature [8], [9] as examples, we tested our algorithm; the comparisons of statistical results are shown in Table 1, Table 2 and Table 3, respectively.
Parameters used in our algorithm are set to: c1 = c2 = 1.5, runMax = 1000. The running environment is: MATLAB 7.1, Pentium IV 2 GHz CPU, 256 MB RAM, Windows XP OS.
Example 1. Seven graph units are contained in the layout problem. The radius of the graph plane is R = 50 mm. The allowable value of the static non-equilibrium J is [δJ] = 3.4 g·mm. The result is given in Table 1 and the geometric layout is shown in Fig. 3 (the particle size is 50).
Number | Graph units: r (mm), m (g) | Literature [8] result: x (mm), y (mm) | Literature [9] result: x (mm), y (mm) | Our result: x (mm), y (mm)
1 10.0 100.00 -12.883 17.020 14.367 16.453 17.198 -13.618
2 11.0 121.00 8.8472 19.773 -18.521 -9.560 -19.857 6.096
3 12.0 144.00 0.662 0.000 2.113 -19.730 -1.365 19.886
4 11.5 132.00 -8.379 -19.430 19.874 -4.340 18.882 7.819
5 9.5 90.25 -1.743 0.503 -19.271 11.241 -16.967 -14.261
6 8.5 72.25 12.368 -18.989 -3.940 22.157 -0.606 -22.873
7 10.5 110.25 -21.639 -1.799 -0.946 2.824 -0.344 -3.010
Table 1. (continued)
(c) The example 1 layout results comparison based on 40 runs of the algorithm

The least circle radius including all graph units (mm) | Literature [9] algorithm (times) | Our algorithm (times)
≤ 32.3          10   19
(32.3, 32.5]    16   11
(32.5, 32.7]     5    5
(32.7, 32.9]     2    3
(32.9, 33.1]     2    1
(33.1, 33.3]     2    1
> 33.3           3    0
Fig. 3. Example 1 geometric layout: (a) Literature [8] layout result; (b) Literature [9] layout result; (c) Our layout result.
Example 2. Forty graph units are contained in the layout problem. The radius of the graph plane is R = 880 mm. The allowable value of the static non-equilibrium J is [δJ] = 20 g·mm. The result is given in Table 2 and the geometric layout is shown in Fig. 4 (the particle size is 100).
Table 2. (continued)
5 Conclusions
From Table 1 and Table 2 we can deduce that PSO with the four improved adaptive strategies harmonizes the large-scale global search ability and the refined local search ability thoroughly; it converges rapidly and avoids prematurity at the same time. The effectiveness of the algorithm is validated on constrained layout instances of this NP-hard problem; the algorithm outperformed the best known ones in both the quality of solutions and the running time. In addition, the parameters run_t, ε and ρ are chosen by human experience, which is a certain blemish. How to choose suitable parameters is one of the future works.
References
1. Teng, H., Shoulin, S., Wenhai, G., et al.: Layout optimization for the dishes installed on
rotating table. Science in China (Series A) 37(10), 1272–1280 (1994)
2. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proc. of IEEE Int’l Conf.
Neural Networks, pp. 1942–1948. IEEE Computer Press, Indianapolis (1995)
3. Eberhart, R.C., Kennedy, J.: A new optimizer using particles swarm theory. In: Sixth Inter-
national Symposium on Micro Machine and Human Science, pp. 39–43. IEEE Service Cen-
ter, Piscataway (1995)
4. Angeline, P.J.: Using selection to improve particle swarm optimization. In: Proc. IJCNN
1999, pp. 84–89. IEEE Computer Press, Indianapolis (1999)
5. Lei, K., Qiu, Y., He, Y.: A new adaptive well-chosen inertia weight strategy to automati-
cally harmonize global and local search ability in particle swarm optimization. In: 1st Inter-
national Symposium on Systems and Control in Aerospace and Astronautics, pp. 342–346.
IEEE Press, Harbin (2006)
6. Shi, Y., Eberhart, R.C.: A modified particle swarm optimizer. In: Proc. of the IEEE Con.
Evolutionary Computation, pp. 69–73. IEEE Computer Press, Piscataway (1998)
7. Jianchao, Z., Zhihua, C.: A guaranteed global convergence particle swarm optimizer. Jour-
nal of Computer Research and Development 4(8), 1334–1338 (2004) (in Chinese)
8. Fei, T., Hongfei, T.: A modified genetic algorithm and its application to layout optimization.
Journal of Software 10(10), 1096–1102 (1999) (in Chinese)
9. Ning, L., Fei, L., Debao, S.: A study on the particle swarm optimization with mutation
operator constrained layout optimization. Chinese Journal of Computers 27(7), 897–903
(2004) (in Chinese)
ARIMA Model Estimated by Particle Swarm
Optimization Algorithm for
Consumer Price Index Forecasting
1 Introduction
The ARIMA, one of the most popular models for time series forecasting analysis, originated from the autoregressive model (AR) proposed by Yule in 1927, the moving average model (MA) invented by Walker in 1931, and the combination of the AR and MA, the ARMA models [1]. The ARMA model can be used when the time series is stationary, but the ARIMA model does not have that limitation. The ARIMA model pre-processes the original time series to make the obtained series stationary, so that the ARMA model can then be applied. For example, for an original series which has only serial dependency and no seasonal dependency, if the d-th differenced series of this original series is stationary, we can apply the ARMA(p, q) model to the d-th differenced series, and we say the original time series satisfies the modeling condition of ARIMA(p, d, q), which is one kind of ARIMA model. So the
Then we can obtain a stationary time series with zero mean {x1 , x2 , . . . , xN −d },
where xt = zt −z̄. So the problem of modeling ARIMA(p, d, q) for {y1 , y2 , . . . , yN }
converts to the problem of modeling ARMA(p, q) for {x1 , x2 , . . . , xN −d }. And
the generalized form of ARMA(p, q) can be described as follows:
ϕ(B) x_t = θ(B) a_t ,
where
ϕ(B) = 1 − ϕ1 B − ϕ2 B 2 − · · · − ϕp B p
θ(B) = 1 − θ1 B − θ2 B 2 − · · · − θq B q
and at is stationary white noise with zero mean, ϕi (i = 1, 2, . . . , p) and θi (i =
1, 2, . . . , q) are coefficients of the ARMA model.
On the other hand, for identifying the orders of the ARMA model, the autocorrelation and partial autocorrelation graphs, which are drawn based on 1 ∼ M lag numbers, provide information about the AR and MA orders (p and q) [1]. Concretely, p is determined as the number of leading significant coefficients in the partial autocorrelation graph, and similarly q is determined as the number of leading significant coefficients in the autocorrelation graph.
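For illustration, the sample ACF and PACF used in this identification step can be computed directly; the sketch below obtains the PACF by solving the Yule-Walker equations, which is one standard route and not necessarily the authors':

import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.dot(x, x) / len(x)
    return np.array([np.dot(x[k:], x[:-k]) / len(x) / c0 for k in range(1, max_lag + 1)])

def sample_pacf(x, max_lag):
    """Partial autocorrelations: last coefficient of the fitted AR(k) model for each k (Yule-Walker)."""
    acf = np.concatenate(([1.0], sample_acf(x, max_lag)))
    pacf = []
    for k in range(1, max_lag + 1):
        R = np.array([[acf[abs(i - j)] for j in range(k)] for i in range(k)])  # Toeplitz matrix
        phi = np.linalg.solve(R, acf[1:k + 1])
        pacf.append(phi[-1])
    return np.array(pacf)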
Then we can estimate {θ_i} and σ_a², the variance of {a_t}, from (7):
γ̂_0(x̄) = σ̂_a² (1 + θ̂_1² + ··· + θ̂_q²)
γ̂_k(x̄) = σ̂_a² (−θ̂_k + θ̂_{k+1} θ̂_1 + ··· + θ̂_q θ̂_{q−k}),  k = 1, 2, ..., q    (7)
where γ̂_k(x̄) = Σ_{j=0}^{p} Σ_{l=0}^{p} ϕ̂_j ϕ̂_l γ̂_{k+l−j},  k = 0, 1, ..., q.
So if q = 1, according to (7) and the invertibility (that is, |θ_1| < 1) of the ARMA(p, q) model, we can calculate σ_a² and θ_1 using (8) and (9):
σ̂_a² = ( γ̂_0(x̄) + √( γ̂_0²(x̄) − 4 γ̂_1²(x̄) ) ) / 2    (8)
Then
θ̂_1 = − γ̂_1(x̄) / σ̂_a²    (9)
But if q > 1, it is difficult to solve (7). In this case, we can change (7) into (10) and solve the problem using a linear iterative method:
σ̂_a² = γ̂_0(x̄) (1 + θ̂_1² + ··· + θ̂_q²)^{−1}
θ̂_k = −( γ̂_k(x̄)/σ̂_a² − θ̂_{k+1} θ̂_1 − ··· − θ̂_q θ̂_{q−k} ),  k = 1, 2, ..., q    (10)
At first, give initial values for {θ̂_i} and σ̂_a², for example θ̂_i = 0 (i = 1, 2, ..., q) and σ̂_a² = γ̂_0(x̄), and mark them as {θ̂_i(0)} and σ̂_a²(0). Substituting them into (10), we obtain {θ̂_i(1)} and σ̂_a²(1). Proceeding by analogy, we obtain {θ̂_i(m)} and σ̂_a²(m). If max{|θ̂_i(m) − θ̂_i(m−1)|, |σ̂_a²(m) − σ̂_a²(m−1)|} is very small, we can consider {θ̂_i(m)} and σ̂_a²(m) to be approximate estimates of {θ_i} and σ_a².
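The iteration just described can be written compactly as follows; the sketch assumes the sample autocovariances γ̂_0, ..., γ̂_q have already been computed, and the function name and stopping tolerance are illustrative choices:

import numpy as np

def iterate_ma_moments(gamma, q, tol=1e-8, max_iter=1000):
    """Linear iteration of Eq. (10): returns (sigma_a^2, theta_1..theta_q) from gamma[0..q]."""
    theta = np.zeros(q)            # theta_i(0) = 0
    sigma2 = gamma[0]              # sigma_a^2(0) = gamma_0
    for _ in range(max_iter):
        sigma2_new = gamma[0] / (1.0 + np.sum(theta ** 2))
        theta_new = np.array([
            -(gamma[k] / sigma2_new
              - sum(theta[k + j - 1] * theta[j - 1] for j in range(1, q - k + 1)))
            for k in range(1, q + 1)])
        if max(np.max(np.abs(theta_new - theta)), abs(sigma2_new - sigma2)) < tol:
            return sigma2_new, theta_new
        theta, sigma2 = theta_new, sigma2_new
    return sigma2, theta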
From the above, we can see that the moment estimation is cumbersome to carry out. Moreover, when q > 1, the approximate results for the parameters are unpredictable, which might greatly impact the accuracy of the forecast. So a PSO method for model estimation is proposed in Section 3; this method has a simple principle and can be implemented with ease. What is more, the PSO estimation for the ARIMA model (denoted PSOARIMA) has a higher forecasting precision than the ARIMA model, as will be shown in Section 4.
where the coefficients are extended as
ϕ_i = ϕ_i for 1 ≤ i ≤ p, ϕ_i = 0 for i > p;   θ_i = θ_i for 1 ≤ i ≤ q, θ_i = 0 for i > q.
By comparing the terms with the same degree of B in (12), we can obtain {π_i} as in (13); see [7]:
π_1 = ϕ_1 − θ_1
π_i = ϕ_i − θ_i + Σ_{j=1}^{i−1} θ_{i−j} π_j ,  i > 1    (13)
x̂_t = Σ_{i=1}^{K} π_i x_{t−i} ,  where x_{t−i} = 0 if t − i ≤ 0    (14)
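A minimal sketch of the recursion (13) and the simulation formula (14); the truncation order K is an assumption, and past observed values are used on the right-hand side for the in-sample simulation:

import numpy as np

def pi_weights(phi, theta, K):
    """pi_1 = phi_1 - theta_1; pi_i = phi_i - theta_i + sum_{j=1}^{i-1} theta_{i-j} pi_j (Eq. 13).
    phi and theta are zero-padded beyond their orders p and q."""
    phi = np.concatenate((np.asarray(phi, float), np.zeros(max(0, K - len(phi)))))
    theta = np.concatenate((np.asarray(theta, float), np.zeros(max(0, K - len(theta)))))
    pi = np.zeros(K)
    for i in range(1, K + 1):
        pi[i - 1] = phi[i - 1] - theta[i - 1] + sum(theta[i - j - 1] * pi[j - 1] for j in range(1, i))
    return pi

def simulate(x, pi):
    """x_hat_t = sum_{i=1}^{K} pi_i * x_{t-i}, with x_{t-i} = 0 for t - i <= 0 (Eq. 14)."""
    K, x_hat = len(pi), np.zeros(len(x))
    for t in range(1, len(x) + 1):                       # t is 1-based as in the paper
        x_hat[t - 1] = sum(pi[i - 1] * x[t - i - 1] for i in range(1, K + 1) if t - i > 0)
    return x_hat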
x_i^{k+1} = x_i^k + v_i^{k+1}    (16)
where v_i^k and x_i^k are the velocity and position of the i-th particle at time k, respectively; pxbest_i is the individual best position of the i-th particle; gxbest is the global best position of the whole swarm; τ_1, τ_2 are random numbers drawn uniformly from the interval [0, 1]; c_1, c_2 ∈ [0, 2] are two positive constants called acceleration coefficients, namely the cognitive and social parameters respectively, and as defaults c_1 = c_2 = 2 were proposed in [9]; w is the inertia weight, which can be determined by
where w_max and w_min are respectively the maximum and minimum of w; Num_max is the maximum number of iterations and Num is the current iteration. A larger inertia weight achieves global exploration, and a smaller inertia weight tends to facilitate local exploration to fine-tune the current search area. Usually, the maximum velocity V_max is set to half of the length of the search space [10].
In this paper, because there are p + q coefficients (ϕ_i, θ_j, i = 1, 2, ..., p, j = 1, 2, ..., q) to be selected, the dimension of the swarm is n = p + q. What is more, when the coefficients {ϕ_i} and {θ_j} are denoted by Φ and Θ, which have p and q components respectively, the form of one particle can be denoted by (Φ, Θ), that is, (ϕ_1, ϕ_2, ..., ϕ_p, θ_1, θ_2, ..., θ_q). Then we mark the simulation series {x̂_t} obtained from (14) as {x̂_t^{(Φ,Θ)}} when the coefficients are Φ and Θ.
In the investigation, the mean square error (MSE), shown as (18), serves as
the forecasting accuracy index and the objective function of PSO for identifying
suitable coefficients.
MSE^{(Φ,Θ)} = (1/(N−d)) Σ_{t=1}^{N−d} ( x_t − x̂_t^{(Φ,Θ)} )²    (18)
1.  Randomly generate m initial particles x_1^0, x_2^0, ..., x_m^0 and initial velocities v_1^0, v_2^0, ..., v_m^0, each with p + q components.
2.  for 1 ≤ i ≤ m do
3.      pxbest_i ⇐ x_i^0;  pfbest_i ⇐ MSE^(x_i^0(1:p), x_i^0(p+1:p+q))
4.      x_i^1 ⇐ x_i^0 + v_i^0
5.  end for
6.  for all i such that 1 ≤ i ≤ m do
7.      if pfbest_s ≤ pfbest_i then
8.          gxbest ⇐ pxbest_s;  gfbest ⇐ pfbest_s
9.      end if
10. end for
11. k ⇐ 1;  w_max ⇐ 0.9;  w_min ⇐ 0.4
12. while k < Num_max do
13.     for 1 ≤ i ≤ m do
14.         if MSE^(x_i^k(1:p), x_i^k(p+1:p+q)) < pfbest_i then
15.             pfbest_i ⇐ MSE^(x_i^k(1:p), x_i^k(p+1:p+q));
16.             pxbest_i ⇐ x_i^k
17.         end if
18.     end for
19.     for all i such that 1 ≤ i ≤ m do
20.         if pfbest_s ≤ pfbest_i then
21.             gxbest ⇐ pxbest_s;  gfbest ⇐ pfbest_s
22.         end if
23.     end for
24.     w ⇐ w_max − k (w_max − w_min)/Num_max
25.     for 1 ≤ i ≤ m do
26.         v_i^{k+1} ⇐ w × v_i^k + c_1 τ_1 (pxbest_i − x_i^k) + c_2 τ_2 (gxbest − x_i^k)
27.         if v_i^{k+1} ≥ V_max or v_i^{k+1} ≤ −V_max then
28.             v_i^{k+1} ⇐ v_i^{k+1} / |v_i^{k+1}| × V_max
29.         end if
30.         x_i^{k+1} ⇐ x_i^k + v_i^{k+1}
31.     end for
32. end while
where x(1:p) and x(p+1:p+q) denote the 1st ∼ p-th and the (p+1)-th ∼ (p+q)-th components of the vector x, respectively. So gxbest(1:p) and gxbest(p+1:p+q) are the optimally estimated values of Φ and Θ.
4 Case Study
In this section, a case study of predicting the consumer price index (CPI) of 36 big or medium-sized cities in China is presented. The original data are the CPI values from Jan. 2001 to Oct. 2008, obtained from the CEInet statistical database. We mark the original data as {y_t}, t = 1, 2, ..., 94.
We can easily see that {y_t} is a non-stationary series by observing its graph. Observing the autocorrelogram of the third-differenced data of the original data (shown in Fig. 1, where the two solid horizontal lines represent the 95% confidence interval), in which the autocorrelation coefficients rapidly decline to zero after lag 1, we confirm that the third-differenced data is stationary and mark it as {z_t}, t = 1, 2, ..., 91. So d is equal to 3. We can then mark {z_t − z̄} as {x_t}, t = 1, 2, ..., 91, where z̄ = −0.0275, according to (4). Then we can draw the autocorrelogram and partial autocorrelogram (Fig. 2). From this figure, two facts stand out: first, the autocorrelation coefficient starts at a very high value at lag 1 and then rapidly declines to zero; second, the PACF values up to 4 lags are individually statistically significantly different from zero, but then also rapidly decline to zero. So we can determine p = 4 and q = 1; that is to say, we can design an ARMA(4,1) model for {x_t}.
Estimating the coefficients Φ = {ϕ_1, ϕ_2, ϕ_3, ϕ_4} and Θ = {θ_1} according to (6), (8), (9) and by the PSO method with the optimizing range [−1, 1], respectively, we obtain Φ = {−1.0750, −0.8546, −0.4216, −0.1254}, Θ = {0.1614} and Φ = {−0.4273, −0.0554, 0.1979, 0.1487}, Θ = {0.8196}, respectively. Then we can obtain the fitted and prediction series, and the fitted graph is shown in Fig. 3. At the same time, their MSEs are 1.3189 and 1.2935, respectively; that is to say, the latter fits better than the former.
Fig. 3. Original data and simulation values (two panels).
Furthermore, the prediction values of the original data {y_t} can be obtained according to Section 2.3 by both the moment estimation method and the PSO estimation method. The relevant results are shown in Table 1. From this table, we can see that the accuracy of the PSOARIMA(4, 3, 1) model is better than that of the ARIMA(4, 3, 1) model. What is more, we present the prediction figures (Fig. 4) to compare the accuracy of the two models clearly.
Fig. 4. Consumer price index: original data, actual value in Nov. 2008, and prediction values (two panels; horizontal axis: Year.Month).
Additionally, once the actual value in Nov. 2008 is known, the value in Dec. 2008 can be predicted. Proceeding by analogy, we predict the values in Jan. 2009 ∼ Mar. 2009, and the relevant results are shown in Table 2. From this table we can see that the accuracy of both models is very good, but the relative error of PSOARIMA(4, 3, 1) is generally smaller than that of ARIMA(4, 3, 1).
5 Conclusion
In this paper, we proposed a novel design methodology which is a hybrid model
of particle swarm optimization algorithm and ARIMA model. PSO is used for
model estimation of ARIMA in order to overcome the deficiency that the tra-
ditional estimation method is difficult to implement and may obtain very bad
results. Furthermore, it is observed that the PSO-based model has worked more
accurately than the traditional moment estimation-based model through the
experimental results of forecasting the CPI.
On the other hand, we can see the power of PSO for optimizing parameters. So in future work we will look for more practical ways to apply PSO in other fields. It is necessary to note, however, that standard PSO sometimes easily gets stuck in local optima rather than the global optimum; fortunately, the chaotic particle swarm optimization algorithm (CPSO) can help to solve this problem.
References
1. Ediger, V.Ş., Akar, S.: ARIMA forecasting of primary energy demand by fuel in
Turkey. Energy Policy 35, 1701–1708 (2007)
2. Erdogdu, E.: Electricity demand analysis using cointegration and ARIMA mod-
elling: A case study of Turkey. Energy Policy 35, 1129–1146 (2007)
3. Ong, C.-S., Huang, J.-J., Tzeng, G.-H.: Model identification of ARIMA family
using genetic algorithms. Applied Mathematics and Computation 164, 885–912
(2005)
4. Niu, D., Cao, S., Zhao, L., Zhang, W.: Power load forecasting technology and its
applications. China Electric Power Press, Beijing (1998) (in Chinese)
5. Ediger, V.Ş., Akar, S., Uğurlu, B.: Forecasting production of fossil fuel sources
in Turkey using a comparative regression and ARIMA model. Energy Policy 34,
3836–3846 (2006)
6. Valenzuela, O., Rojas, I., Rojas, F., Pomares, H., Herrera, L.J., Guillen, A., Mar-
quez, L., Pasadas, M.: Hybridization of intelligent techniques and ARIMA models
for time series prediction. Fuzzy Sets and Systems 159, 821–845 (2008)
7. Li, M., Zhou, J.-z., Li, J.-p., Liang, J.-w.: Predicting Securities Market in Shanghai
and Shenzhen by ARMA Model. Journal of Changsha Railway University 3, 78–84
(2000) (in Chinese)
8. Dingxue, Z., Xinzhi, L., Zhihong, G.: A Dynamic Clustering Algorithm Based
on PSO and Its Application in Fuzzy Identification. In: The 2006 International
Conference on Intelligent Information Hiding and Multimedia Signal Processing,
pp. 232–235. IEEE Press, New York (2006)
9. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. In: The IEEE Interna-
tional Conference on Neural Networks, vol. 4, pp. 1942–1948. IEEE Press, New
York (1995)
10. Alatas, B., Akin, E., Bedri Ozer, A.: Chaos embedded particle swarm optimization
algorithms. Chaos, Solitons and Fractals (2008)
A Uniform Solution to HPP in Terms of Membrane
Computing
1 Introduction
P systems are emergent branch of nature computing, which can be seen as a kind of
distributed parallel computing model. It is based upon the assumptions that the proc-
esses taking place in the living cells can be considered as computations. Up to now,
several variants of P systems with linear or polynomial time have been constructed to
solve some NP-complete problems, such as SAT [1, 2, 3], HPP [2, 4, 5], Subset Sum
[6], Knapsack [7], 2-Partition[8], Bin Packing[9].
The Hamiltonian path problem (HPP for short) is a well-known NP-complete problem, and there are several P systems dealing with it. The P systems in [4, 5] all work in a semi-uniform way [10]. Specifying the starting and ending vertices, literature [2] presents a uniform solution to HPP with membrane separation. It remains open to confluently solve HPP in polynomial time by P systems with division rules instead of separation rules in a uniform way [2].
Without specifying the starting vertex and ending vertex of a path, this paper presents a uniform solution to HPP with membrane division in which the communication rules (sending objects into membranes) [3] are not used. The remainder of the paper is organized as follows: the recognizer P system with membrane division is introduced in the next section. In Section 3, the solution to HPP with membrane division is presented, with some formal details of this solution given in Section 4. Finally, conclusions are drawn in the last section.
Π = (V, H, µ, w_1, ..., w_m, R)
where:
m ≥1 (the initial degree of the system);
V is the alphabet of objects;
H is a finite set of labels for membranes;
µ is a membrane structure, consisting of m membranes, labeled (not necessarily in a
one-to-one manner) with elements of H ;
w1 , …, wm are strings over V describing the multisets (every symbol in a string
representing one copy of the corresponding object) placed in the m regions of µ
respectively;
R is a finite set of developmental rules, of the following forms:
(1) [a → v]_h^e, for h ∈ H, e ∈ {+, −, 0}, a ∈ V, v ∈ V* (object evolution rules);
(2) [a]_h^{e1} → [ ]_h^{e2} b, for h ∈ H, e1, e2 ∈ {+, −, 0}, a, b ∈ V (communication rules);
(3) [a]_h^{e1} → [b]_h^{e2} [c]_h^{e3}, for h ∈ H, e1, e2, e3 ∈ {+, −, 0}, a, b, c ∈ V (division rules for elementary membranes).
Note that, in order to simplify the writing, in contrast to the style customary in the
literature, we have omitted the label of the left parenthesis from a pair of parentheses
which identifies a membrane. All aforementioned rules are applied according to the
following principles:
(a) The rules of type (1) are applied in a parallel way.
(b) The rules of types (2), (3) are used sequentially, in the sense that one membrane
can be used by at most one rule of these types at a time.
(c) All objects and all membranes, which can evolve, should evolve simultaneously.
We will introduce two definitions[1]:
Definition 1. A P system with input is a tuple ( Π , ∑ , iΠ ), where: (a) Π is a P sys-
tem, with working alphabet Γ , with p membranes labeled with 1, ..., p, and initial
multisets w1 , …, w p associated with them; (b) ∑ is an (input) alphabet strictly con-
tained in Γ ; the initial multisets of Π are over Γ - ∑ ; and (c) iΠ is the label of a
distinguished (input) membrane.
Let w_in be a multiset over Σ. The initial configuration of (Π, Σ, i_Π) with input w_in is (µ, w_1, ..., w_{i_Π} ∪ w_in, ..., w_p).
For any given directed graph G with n vertices, we construct a recognizer P system
Π ( n) to solve HPP. Therefore the family presented here is
Π = {(Π(n), Σ(n), i_Π) : n ∈ N}
Π(n) = (V(n), H, µ, w_1, w_2, R)
where:
V(n) = {x_{i,j,k} | 1 ≤ i, j ≤ n, −1 ≤ k ≤ i} ∪ {v_i | 1 ≤ i ≤ n−1} ∪ {t_i, z_i | 1 ≤ i ≤ n}
H = {1, 2}
µ = [ [ ]_2^+ ]_1^0
w_1 = f_0 ,  w_2 = d_0 v_1
And the set R contains the rules as follows:
[v_i]_2^+ → [t_i]_2^− [v_{i+1}]_2^+ (1 ≤ i ≤ n−2),  [v_{n−1}]_2^+ → [t_{n−1}]_2^− [t_n]_2^− ,  [t_j → c_{j−1} r'_j g]_2^− (1 ≤ j ≤ n)    (1)
At the preparing stage, we have {d_0 v_1} and an input multiset w_in, encoded according to the instance G, in the membrane labeled with 2. From v_1, the other vertices in G can be obtained by rules (1). This means that we can search for HPs starting from each of the n vertices respectively. Object r'_j represents the starting vertex, and object c_{j−1} can be regarded as a counter whose subscript is decreased by 1 at each step of the generating stage. These rules are applied at the preparing stage.
[g]_2^− → [ ]_2^0 g    (2)
In the membranes labeled with 2, when their polarization is 0, the third subscript k of x_{i,j,k} and the subscript i of c_i are decreased by 1 simultaneously. For the i-th vertex in G that has been added into the current path, objects r'_i and r_i are two different versions of its representation in membranes with different polarizations. x_{i,j,k} cannot evolve any more once k reaches −1. c_0 and x_{i,j,0} will appear after i steps if (i, j) ∈ E.
[c_0]_2^0 → [ ]_2^+ c_0    (4)
Vertex v_i is the last vertex of the current path, and we will extend this path by adding v_j upon the appearance of c_0 if (i, j) ∈ E. Several vertices besides v_j may connect with v_i, so new membranes need to be created to contain the new paths. Rule (4) changes the polarization of the membrane to positive in order to trigger the membrane divisions that produce the new membranes.
[d_i → d'_i]_2^+ ,  [r_i → r'_i]_2^+ ,  [x_{i,j,0}]_2^+ → [z_j]_2^− [g]_2^+  (1 ≤ i, j ≤ n, 0 ≤ k ≤ i)    (5)
Objects d_i and d'_i both mean that the length of the current path is i, but they are two different versions in membranes labeled with 2 with different polarizations. x_{i,j,0} in a membrane with positive polarization will cause the membrane division that extends the current path. Since more than one vertex may connect with v_i, there may be several objects x_{i,j,0} whose third subscript k is 0. We non-deterministically choose one of them to trigger the division, and the rest remain in the two new membranes with different polarizations, which indicates that they will be processed by different rules: in the new membrane with positive polarization, one of them will trigger the division again in order to obtain another new path, while in the membrane with negative polarization all of them will be deleted.
[z_j → r'_j c_{j−1} g]_2^− ,  [d'_i → d_i]_2^− ,  [x_{i,j,k} → x_{i,j,i}]_2^− ,  [x_{i,j,0} → λ]_2^− ,  [d_i → d_{i+1}]_2^−  (1 ≤ i, j ≤ n, k ≠ 0)    (6)
In the new membrane labeled with 2 with negative polarization, the new object r'_j is introduced, which indicates that vertex v_j is added to the current path. At the same time, the new counter c_{j−1} is introduced and other objects, such as x_{i,j,k}, return to x_{i,j,i}. After the length of the path is increased, the polarization is changed by rule (2). We return to the beginning of the generating stage and resume the procedure for extending the current path.
[g → λ]_2^+    (7)
In the new membrane labeled with 2 with positive polarization, object g does not
work any more and is deleted.
[d_n → s e]_2^0    (8)
When the length of path is n, we know that n vertices are in the path and we can enter
the next stage to check if the path is Hamiltonian or not. Two new objects s and e are
introduced to prepare for the checking stage.
[e → e_0 r_0]_2^+ ,  [s]_2^+ → [ ]_2^0 s    (9)
With the appearance of dn, the counter ct (t>0) associated with the last vertex is intro-
duced and it still can evolve until c0 is sent out of the membrane. Objects e and s
evolve by rule (9) to trigger the next stage.
[e_i → g e_{i+1}]_2^0 ,  [r_0]_2^0 → [ ]_2^− r_0  (0 ≤ i ≤ n)    (10)
In the checking stage, we use rules (10) and (11) in a loop. If the object r_0 is present, we eliminate it and perform a rotation of the objects r_i, ..., r_n with their subscripts decreased by 1. It is clear that a membrane contains a HP if and only if we can run n+1 steps of the loop. Note that this checking method is not the same as the one in [10], since communication rules are not used.
[e_{n+1} → yes]_2^−    (12)
The object e_{n+1}, introduced by rules (10), means that the path is Hamiltonian, so it evolves into the object yes, which will leave the membrane labeled with 2 and enter membrane 1.
[g → λ]_1^0 ,  [c_0 → λ]_1^0 ,  [r_0 → λ]_1^0 ,  [s → λ]_1^0 ,  [f_i → f_{i+1}]_1^0  (0 ≤ i ≤ 3n(n+3)/2+4)    (14)
All of these rules are applied in parallel in membrane 1 with neutral polarization. Since objects g, c_0, r_0 and s are sent out of the membranes labeled with 2 and will not be used any more, it is necessary to delete them to release computing resources. Object f_i evolves at each computation step, counting how many steps have been performed.
With the polarization changed to positive, the object yes in membrane 1 will be sent out to the environment to show that there is a HP among the n vertices. If at step 3n(n+3)/2+5 the polarization of membrane 1 has not changed to positive, the answer must be no, and no will be sent out to the environment. This special step will be analyzed in subsection 3.2. Once the polarization is changed, no rules in membrane 1 can be applied any more and the computation halts.
All of the foregoing rules applied in membranes labeled with 2 are illustrated in Fig. 1, in which 0, + and − respectively represent the polarizations of the membranes. Fig. 1 resembles a state machine, and it can be seen that different polarizations and objects indicate distinct stages. Rules (2), (4), (9) and (10) are used to change the polarizations in order to enter another stage of the computation.
At the preparing stage, we have v_1 in the membrane labeled with 2. From v_1, the other vertices in G can be obtained by application of rules (1). After the n starting vertices are obtained, we enter the next stage to find HPs from each of them respectively. The generating stage consists of three phases: in the search phase a vertex that connects with the last vertex of the current path is sought; in the divide phase it is added to the path; and in the restore phase the length of the path is increased and the objects related to the other edges are recovered. Once a vertex is added to the path, all of the edges starting from it are consumed in the extensions of the path. These three phases are looped until the length of the path reaches n. The checking stage is needed to decide whether a path of length n is a HP or not. The answer yes or no is sent out at the output stage.
In this section, we will prove that the uniform family Π of recognizer P systems in Section 3 gives a polynomial solution for the HPP. According to Definition 3 in Section 2, we have:
Theorem 1. HPP ∈ PMCAM.
Proof: (1) It is obvious that the family Π of recognizer P systems defined above is AM-consistent, because every P system of the family is a recognizer P system with active membranes using 2-division, and all of them are confluent.
(2) The family Π of recognizer P systems is polynomially uniform by Turing machine. For each n, the size of an instance G, a P system can be constructed. The rules of Π(n) are defined in a recursive manner from n. The necessary resources to construct the system are:
Size of the alphabet: n³ + (7n² + 27n)/2 + 14 = Θ(n³)
Initial number of membranes: 2 = Θ(1)
Initial number of objects: 3 = Θ(1)
Total number of evolution rules: n³ + (9n² + 31n + 38)/2 = Θ(n³)
All these resources are bounded by a polynomial in n; therefore a Turing machine can build the P system in polynomial time with respect to n.
(3) We have the functions g: I_HPP → ∪_{t∈N} I_{Π(t)} and h: I_HPP → N defined for an instance u = (v_1, ..., v_n) as g(u) = {x_{i,j,k} : 1 ≤ i, j ≤ n} and h(u) = n. Both of them are total and polynomially computable. Furthermore, (g, h) forms an encoding of the set of instances of the HPP problem in the family of recognizer P systems, since for any instance u we have that g(u) is a valid input for the P system.
Π is polynomially bounded with respect to (g, h). A formal description of the computation proves that Π always halts and sends the object yes or no to the environment in the last step. The number of steps of the P system is within 3n(n+3)/2+4 if the output is yes and 3n(n+3)/2+5 if the output is no; therefore there exists a polynomial bound for the number of steps of the computation (a small numerical check of these bounds and of the resource counts above is sketched after the proof).
Π is sound with respect to (g, h). As we described in the computations of the P systems of Π above, the fact that the object yes is sent out of the system means that there are n different vertices in the path, which is an actual Hamiltonian path in the instance G according to the definition of HPP.
Π is complete with respect to (g, h). The P system searches for a HP starting from every vertex respectively at the beginning of the computation. We obtain objects such as r_i (1 ≤ i ≤ n) and delete objects such as x_{i,j,k} (1 ≤ i, j ≤ n, −1 ≤ k ≤ i) corresponding to the edges whose starting vertex v_i has been added to the current path. When there is a Hamiltonian path in the instance G with n vertices, the system Π will obtain objects r_1, r_2, ..., r_n in some membrane and output the object yes to the environment. □
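As a quick numerical illustration of the resource counts and step bounds stated above, the helper below (function name is ours) merely evaluates the formulas for a sample n:

def hpp_system_resources(n):
    """Evaluate the resource formulas and step bounds stated for Pi(n)."""
    return {
        "alphabet_size":   n**3 + (7 * n**2 + 27 * n) // 2 + 14,
        "evolution_rules": n**3 + (9 * n**2 + 31 * n + 38) // 2,
        "steps_if_yes":    3 * n * (n + 3) // 2 + 4,
        "steps_if_no":     3 * n * (n + 3) // 2 + 5,
    }

print(hpp_system_resources(4))
# {'alphabet_size': 188, 'evolution_rules': 217, 'steps_if_yes': 46, 'steps_if_no': 47}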
Since this class is closed under polynomial-time reduction and complement, we have:
Theorem 2. NP ∪ co-NP ⊆ PMC_AM.
5 Conclusion
In this paper, we have presented a uniform solution for HPP in terms of P systems with membrane division. This has been done in the framework of complexity classes in membrane computing.
It can be observed that many membranes no longer evolve at some step before the whole system halts. Some kind of rule should be developed to dissolve these membranes in order to release computing resources; this will contribute to the construction of efficient simulators of recognizer P systems for solving NP-complete problems.
Acknowledgments. This research is supported by Major National S&T Program of
China (#2008ZX 07315-001).
References
1. Gutiérrez-Naranjo, M.A., Pérez-Jiménez, M.J., Romero-Campero, F.J.: A uniform solution
to SAT using membrane creation. Theoretical Computer Science 371(1-2), 54–61 (2007)
2. Pan, L., Alhazov, A.: Solving HPP and SAT by P systems with active membranes and
separation rules. Acta Informatica 43(2), 131–145 (2006)
3. Păun, G.: Computing with Membranes: Attacking NP-Complete Problems. In: Proceedings
of the Second International Conference on Unconventional Models of Computation,
pp. 94–115. Springer, London (2000)
4. Păun, A.: On P Systems with Membrane Division. In: Proceedings of the Second Interna-
tional Conference on Unconventional Models of Computation, pp. 187–201. Springer,
London (2000)
5. Zandron, C., Ferretti, C., Mauri, G.: Solving NP-Complete Problems Using P Systems
with Active Membranes. In: Proceedings of the Second International Conference on
Unconventional Models of Computation, pp. 289–301. Springer, London (2000)
6. Pérez-Jiménez, M.J., Riscos-Núñez, A.: Solving the Subset-Sum problem by active mem-
branes. New Generation Computing 23(4), 367–384 (2005)
7. Jesús Pérez-Jímenez, M., Riscos-Núñez, A.: A Linear-Time Solution to the Knapsack
Problem Using P Systems with Active Membranes. In: Martín-Vide, C., Mauri, G., Păun,
G., Rozenberg, G., Salomaa, A. (eds.) WMC 2003. LNCS, vol. 2933, pp. 250–268.
Springer, Heidelberg (2004)
8. Gutiérrez-Naranjo, M.A., Pérez-Jiménez, M.J., Riscos-Núñez, A.: A fast P system for find-
ing a balanced 2-partition. Soft Computing 9(9), 673–678 (2005)
9. Pérez-Jiménez, M.J., Romero-Campero, F.J.: Solving the BINPACKING problem by rec-
ognizer P systems with active membranes. In: Proceedings of the Second Brainstorming
Week on Membrane Computing, Seville, Spain, pp. 414–430 (2004)
10. Pérez-Jiménez, M.J., Romero-Jiménez, A., Sancho-Caparrini, F.: Computationally Hard
Problems Addressed Through P Systems. In: Ciobanu, G. (ed.) Applications of Membrane
Computing, pp. 315–346. Springer, Berlin (2006)
Self-organizing Quantum Evolutionary Algorithm
Based on Quantum Dynamic Mechanism
1 Introduction
Most studies have used hybrid evolutionary algorithms for solving global optimization problems. A hybrid evolutionary algorithm takes advantage of a dynamic balance between diversification and intensification; a proper combination of the two is important for global optimization. Quantum computing is a very attractive research area; research on merging evolutionary algorithms with quantum computing [1] has been developed since the end of the 1990s. Han proposed the quantum-inspired evolutionary algorithm (QEA), inspired by the concept of quantum computing, and introduced a Q-gate as a variation operator to promote the optimization of the individuals' Q-bits [2]. Up to now, QEA has developed rapidly and has been applied to the knapsack problem, numerical optimization and other fields [3]. It has gained attention for its good global search capability and effectiveness. Although quantum evolutionary algorithms are considered powerful in terms of global optimization, they still have several drawbacks, such as premature convergence. A number of researchers have experimented with biological immunity and cultural dynamic systems to overcome these particular drawbacks implicit in evolutionary algorithms [4], [5]; here we experiment with a quantum dynamic mechanism.
where Q(t) is a population of qubit chromosomes at generation t, and P(t) is a set of binary solutions at generation t.
1) In the step 'initialize Q(t)', all qubit chromosomes are initialized with the same constant 1/√2. This means that each qubit chromosome represents the linear superposition of all possible states with the same probability.
2) The next step makes a set of binary solutions, P(t), by observing the Q(t) states. One binary solution is formed by selecting each bit using the probability given by the qubit. Each solution is then evaluated to give some measure of its fitness.
3) The initial best solution is then selected and stored among the binary solutions P(t).
4) In the while loop, a set of binary solutions P(t) is formed by observing the Q(t−1) states as in the procedure described before, and each binary solution is evaluated to give its fitness value. It should be noted that P(t) can be formed by multiple observations of Q(t−1).
5) In the next step, 'update Q(t)', the set of qubit chromosomes Q(t) is updated by applying the rotation gate defined below:
U(Δθ_i) = [ cos(Δθ_i)  −sin(Δθ_i) ;  sin(Δθ_i)  cos(Δθ_i) ]    (1)
Fig. Rotation of the qubit state (α_i, β_i) to (α_i′, β_i′) by the angle Δθ_i between |0⟩ and |1⟩.
6) The best solution among P(t) is selected, and if the solution is fitter than the
stored best solution, the stored solution is replaced by the new one. The binary solu-
tions P(t) are discarded at the end of the loop.
Table 1. Lookup table of ∆θ i , where f(·) is the fitness function, and bi and xi are the i-th bits of
the best solution b and the binary solution x, respectively
xi bi f(x)≥f(b) ∆θ i
0 0 false θ1
0 0 true θ2
0 1 false θ3
0 1 true θ4
1 0 false θ5
1 0 true θ6
1 1 false θ7
1 1 true θ8
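A minimal sketch of observing a Q-bit individual and applying the rotation gate (1) with an angle taken from a lookup table indexed as in Table 1. The concrete θ_1, ..., θ_8 values and the simplified sign handling of the rotation angle are illustrative assumptions:

import numpy as np

def observe(q):
    """Collapse a Q-bit chromosome q (rows of [alpha, beta]) to a binary solution."""
    return (np.random.rand(len(q)) < q[:, 1] ** 2).astype(int)   # P(bit = 1) = beta^2

def rotate(q, x, b, fx_ge_fb,
           theta=(0.0, 0.0, 0.0, 0.05, 0.01, 0.025, 0.005, 0.025)):  # angles in radians (illustrative)
    """Update each Q-bit by the rotation gate U(delta_theta) of Eq. (1); the angle is looked up
    by (x_i, b_i, f(x) >= f(b)) in the order of Table 1."""
    for i, (xi, bi) in enumerate(zip(x, b)):
        dtheta = theta[4 * xi + 2 * bi + int(fx_ge_fb)]
        c, s = np.cos(dtheta), np.sin(dtheta)
        q[i] = np.array([[c, -s], [s, c]]) @ q[i]
    return q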
At time t, the potential energy function Q(t) that is related to interactive forces among
particles is defined by Eq(3)
Q_i(t) = Σ_j ∫_0^{U_i(t)} { [1 + exp(−ζ_ij x)]^{−1} − 0.5 } dx ;   Q_D(t) = Σ_i Q_i(t)    (3)
Let S_i(t) be the vertical coordinate of particle S_i at time t. The dynamic equations for particle S_i are defined by Eqs. (5) and (6):
dS_i(t)/dt = λ_1 dU_i(t)/dt + λ_2 dJ_D(t)/dt − λ_3 dP_D(t)/dt − λ_4 dQ_D(t)/dt
           = ( λ_1 + λ_2 ∂J_D(t)/∂U_i(t) − λ_3 ∂P_D(t)/∂U_i(t) − λ_4 ∂Q_D(t)/∂U_i(t) ) · dU_i(t)/dt
           = g(S_i(t), U_i(t)),   0 ≤ λ_1, λ_2, λ_3, λ_4 ≤ 1    (5)
dU_i(t)/dt = exp(−fitness(t)) × fitness′(t) = (1 − U_i(t)) × fitness′(t) = g_1(S_i(t), U_i(t))    (6)
g and g_1 are functions of S_i(t) and U_i(t).
If all particles reach their equilibrium states at time t, then finish with success.
4 Experimental Study
In this section, DQEA is applied to the optimization of well-known test functions, and its performance is compared with that of the CRIQEA [6] and MIQEA [7] algorithms. When testing an algorithm on well-understood problems, there are two measures of performance: (i) CR: the convergence rate to the global minimum; (ii) C: the average number of objective function evaluations required to find the global minimum.
The test examples used in this study are listed below:
f_1(x_1, x_2) = (x_1² + x_2²)/4000 + 1 − cos(x_1) × cos(x_2/√2),  −600 ≤ x_1, x_2 ≤ 600    (7)
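f_1 is the two-dimensional Griewank test function; a direct sketch, with the global minimum f_1(0, 0) = 0:

import numpy as np

def f1(x1, x2):
    """Two-dimensional Griewank test function, Eq. (7)."""
    return (x1 ** 2 + x2 ** 2) / 4000.0 + 1.0 - np.cos(x1) * np.cos(x2 / np.sqrt(2.0))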
The certainty of convergence of DQEA may be attributed to its ability to maintain the diversity of its population. In fact, the superior performance of the self-organizing operator may be attributed to its adaptability and learning ability; a larger pool of feasible solutions enhances the probability of finding the optimum solution. Cooperation of subpopulations through the quantum dynamic mechanism helps to obtain the optimal solution faster. Comparison of the results indicates that the self-organizing operator can keep individual diversity and control the convergence speed.
5 Conclusions
In this study, a self-organizing quantum evolutionary algorithm based on a quantum dynamic mechanism for global optimization (DQEA) is proposed. The efficiency of the quantum evolutionary algorithm is enhanced by using the self-organizing operator. We estimated the performance of the algorithm, and its efficiency has been illustrated by applying it to some test cases. The results show that integrating the self-organizing operator into the quantum evolutionary algorithm procedure can yield significant improvements in both the convergence rate and the solution quality. Cooperation of subpopulations through the quantum dynamic mechanism helps to achieve faster convergence. The next step is to exploit the quantum dynamic mechanism further for global optimization.
Acknowledgments. The authors gratefully acknowledge the support of Natural Sci-
ence Foundation of Shanghai (Grant No.09ZR1420800), Development Foundation of
SUES(Grant No.2008XY18 ) and Doctor Foundation of SUES.
References
1. Narayanan, M.M.: Genetic Quantum Algorithm and Its Application to Combinatorial Opti-
mization Problem. In: Proc. IEEE International Conference on Evolutionary Computation
(ICEC 1996), pp. 61–66. IEEE Press, Piscataway (1996)
2. Han, K.H., Kim, J.H.: Quantum-inspired Evolutionary Algorithm for a Class of Combinato-
rial Optimization. IEEE Transactions on Evolutionary Computation 6(6), 580–593 (2002)
3. Han, K.H., Kim, J.H.: Quantum-Inspired Evolutionary Algorithms with a New Termination
Criterion, Hε gate, and Two-Phase Scheme. IEEE Transactions on Evolutionary Com-
putation 8(2), 156–169 (2004)
4. Fukuda, T., Mori, K., Tsukiyama, M.: Parallel search for multi-modal function optimization
with diversity and learning of immune algorithm. In: Artificial Immune Systems and Their
Applications, pp. 210–220. Springer, Berlin (1999)
5. Saleem, S.: Knowledge-Based Solutions to Dynamic Problems Using Cultural Algorithms.
PhD thesis, Wayne State University, Detroit Michigan (2001)
6. You, X.M., Liu, S., Sun, X.K.: Immune Quantum Evolutionary Algorithm based on Chaotic
Searching Technique for Global Optimization. In: Proc. The First International Conference
on Intelligent Networks and Intelligent Systems, pp. 99–102. IEEE Press, New York (2008)
7. You, X.M., Liu, S., Shuai, D.X.: Quantum Evolutionary Algorithm Based on Immune The-
ory for Multi-Modal Function Optimization. Journal of Petrochemical Universities 20, 45–
49 (2007)
8. Hey, T.: Quantum Computing: An Introduction. Computing & Control Engineering Jour-
nal 10, 105–112 (1999)
9. Shuai, D., Shuai, Q., Liu, Y., Huang, L.: Particle Model to Optimize Enterprise Computing.
In: Research and Practical Issues of Enterprise Information Systems. International Federa-
tion for Information Processing, vol. 205, pp. 109–118. Springer, Boston (2006)
Can Moral Hazard Be Resolved by
Common-Knowledge in S4n-Knowledge?
Takashi Matsuhisa
1 Introduction
2 Moral Hazard
Let us consider the principal-agents model as follows: There are the principal P and n agents {1, 2, ..., k, ..., n} (n ≥ 1) in a firm. The principal makes a profit by selling the products made by the agents. He/she makes a contract with each agent k that the total amount of all profits is refunded to each agent k in proportion to the agent's contribution to the firm.
Let e_k denote the measured managerial effort for k's productive activities. The set of possible efforts for k is denoted by E_k with E_k ⊆ R. Let I_k(·) be a real valued continuously differentiable function on E_k. It is interpreted as the profit from selling the products made by agent k with the cost c(e_k). Here we assume I_k(·) ≥ 0, and the cost function c(·) is a real valued continuously differentiable function on E = ∪_{k=1}^n E_k. Let I_P be the total amount of all the profits:
I_P = Σ_{k=1}^{n} I_k(e_k).
The principal P cannot observe these efforts e_k, and shall view them as random variables on a probability space (Ω, µ). The optimal plan for the principal then solves the following problem:
Max_{e=(e_1, e_2, ..., e_k, ..., e_n)} { Exp[I_P(e)] − Σ_{k=1}^{n} c_k(e_k) }.
W_k(e_k) = r_k I_P(e),
with Σ_{k=1}^{n} r_k = 1, 0 ≤ r_k ≤ 1, where r_k denotes the proportional rate representing k's contribution to the firm. The optimal plan for each agent also solves the problem: For every k = 1, 2, ..., n,
3 Common-Knowledge
Let N be a set of finitely many agents and i denote an agent. The specification
is that N = {P, 1, 2, · · · , k, · · · , n} consists of the president P and the faculty
Ref ω ∈ Πi (ω);
Trn ξ ∈ Πi (ω) implies Πi (ξ) ⊆ Πi (ω).
This structure is equivalent to the Kripke semantics for the multi-modal logic
S4n. The set Πi (ω) will be interpreted as the set of all the states of nature that
i knows to be possible at ω, or as the set of the states that i cannot distinguish
from ω. We call Πi (ω) i’s information set at ω.
A partition information structure is an RT-information structure ⟨Ω, (Π_i)_{i∈N}⟩ satisfying the additional postulate: For each i ∈ N and for any ω ∈ Ω,
Ki E = {ω | Πi (ω) ⊆ E }
The event Ki E will be interpreted as the set of states of nature for which i knows
E to be possible.
We record the properties of i’s knowledge operator3: For every E, F of 2Ω ,
N Ki Ω = Ω;
K Ki (E ∩ F ) = Ki E ∩ Ki F ;
T Ki E ⊆ E
4 Ki E ⊆ Ki (Ki E).
5 Ω \ Ki E ⊆ Ki (Ω \ Ki E).
¹ This stands for a reflexive and transitive information structure.
² Cf. Fagin et al. [2].
³ According to these properties we can say the structure ⟨Ω, (K_i)_{i∈N}⟩ is a model for the multi-modal logic S4n.
KC F = ∩n∈N (KE )n F.
M E := Ω \ KC (Ω \ E).
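For finite state spaces these operators can be computed directly from the information sets; a minimal sketch, assuming K_E denotes the 'everyone knows' operator ∩_{i∈N} K_i and that each Π_i is reflexive (so the iteration below reaches the fixed point defining K_C):

def K(info, E, Omega):
    """Knowledge operator: K_i E = { w | Pi_i(w) is a subset of E }."""
    return {w for w in Omega if info(w) <= E}

def common_knowledge(infos, E, Omega):
    """K_C E obtained as the limit of iterating the 'everyone knows' operator."""
    F = set(E)
    while True:
        F_next = set.intersection(*(K(info, F, Omega) for info in infos))  # one application of K_E
        if F_next == F:
            return F
        F = F_next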
Let Z be a set of decisions, common to all agents. By a decision function we mean a mapping f of 2^Ω × 2^Ω into the set of decisions Z. We refer to the following properties of the function f: Let X be an event.
DUC (Disjoint Union Consistency): For every pair of disjoint events S and T, if f(X; S) = f(X; T) = d then f(X; S ∪ T) = d;
PUD (Preserving Under Difference): For all events S and T such that S ⊆ T, if f(X; S) = f(X; T) = d then f(X; T \ S) = d.
By the membership function associated with f under agent i’s private infor-
mation we mean the function di from 2Ω × Ω into Z defined by di (X; ω) =
f (X; Πi (ω)), and we call di (X; ω) the membership value of X associated with f
under agent i’s private information at ω.
Definition 3. We say that consensus on X can be guaranteed among all agents
(or they agree on it) if di (X; ω) = dj (X; ω) for any agent i, j ∈ N and in all
ω ∈ Ω.
Can Moral Hazard Be Resolved by Common-Knowledge in S4n-Knowledge? 83
We can now state explicitly Theorem 1 as below: Let D be the event of the
membership degrees of an event X for all agents at ω, which is defined by
Theorem 3. Assume that the agents have a S4n-knowledge structure and the
decision function f with satisfying the two properties (DUC) and (PUD). If
ω ∈ KC D then di (X; ω) = dj (X; ω) for any agents i, j ∈ N and in all ω ∈ Ω.
Proof. The proof follows the same lines as in Matsuhisa and Kamiyama [4].
This section investigates the moral hazard problem from the common-knowledge point of view. Let us reconsider the principal-agents model, with the same notation and assumptions as in Section 2. We show the evidence for Theorem 2 under the additional assumptions A1–A2 below. This gives a possible solution to our moral hazard problem.
From the necessary condition for critical points, together with A2, it can be seen that the principal's marginal expected cost for agent k is given by
[c(e(·); ω)] = ∩_{i∈N} ∩_{ξ∈Ω} { ζ ∈ Ω | f(ξ; Π_i(ζ)) = f(ξ; Π_i(ω)) }.
7 Concluding Remarks
We end this article by posing additional problems for making further progress:
1. If the proportional rate r_k representing k's contribution to the firm depends only on his/her effort for productive activities in the principal-agents model, what solution can we have for the moral hazard problem?
2. Can we construct a communication system for the principal-agents model, where the agents, including the principal, communicate with each other about their expected marginal costs as messages? The recipient of a message revises his/her information structure and recalculates the expected marginal cost under the revised information structure. The agent sends the revised expected marginal cost to another agent according to a communication graph, and so on. In this circumstance, do the limiting expected marginal costs actually coincide? Matsuhisa [3] introduces a fuzzy communication system and extends Theorem 3 in the communication model. Using this model, Theorem 4 can be extended to the communication framework; the details will be reported in the near future.
References
1. Aumann, R.J.: Agreeing to disagree. Annals of Statistics 4, 1236–1239 (1976)
2. Fagin, R., Halpern, J.Y., Moses, Y., Vardi, M.Y.: Reasoning about Knowledge. MIT
Press, Cambridge (1995)
3. Matsuhisa, T.: Fuzzy communication reaching consensus under acyclic condition. In:
Ho, T.-B., Zhou, Z.-H. (eds.) PRICAI 2008. LNCS (LNAI), vol. 5351, pp. 760–767.
Springer, Heidelberg (2008)
4. Matsuhisa, T., Kamiyama, K.: Lattice structure of knowledge and agreeing to
disagree. Journal of Mathematical Economics 27, 389–410 (1997)
5. Samet, D.: Ignoring ignorance and agreeing to disagree. Journal of Economic
Theory 52, 190–207 (1990)
Casuist BDI-Agent: A New Extended BDI Architecture
with the Capability of Ethical Reasoning
Keywords: ethical artificial agent, explicit ethical agent, BDI agent, ethical rea-
soning, CBR-BDI agent.
1 Introduction
At present, many researchers work on projects such as expert systems that can assist physicians in diagnosing patients, warplanes that can be operated in war without human operators, autonomous driverless vehicles that can be used for urban transportation, and the like. The common goal of these projects is to increase the autonomy of such machines. As a consequence of this autonomy they can act on our behalf without any interference or guidance, which makes our duties more comfortable. But if we do not consider and control the autonomy of these entities, we will face serious problems, because we would be placing confidence in the intelligence of autonomous systems without any control or restriction on their operations.
The new interdisciplinary research area of "machine ethics" is concerned with solving this problem [1, 2, 3]. Anderson proposed machine ethics as a new issue that considers the consequences of machine behavior toward humans. The ideal and ultimate goal of this field is the implementation of ethics in machines, so that machines can autonomously detect the ethical effects of their behavior and follow an ideal ethical principle or set of principles, that is to say, be guided by this principle or these principles in the decisions they make about possible courses of action [3]. So, by simulating ethics in autonomous machines, we can avoid the problems of autonomy in autonomous machines.
2 Preliminaries
2.1 BDI Agents
BDI agents have been widely used in relatively complex and dynamically changing environments. BDI agents are based on the following core data structures: beliefs, desires, intentions, and plans [7]. These data structures represent, respectively, information gathered from the environment, a set of tasks or goals contextual to the environment, a set of sub-goals to which the agent is currently committed, and specifications of how sub-goals may be achieved via primitive actions. The BDI architecture comes with a specification of how these four entities interact, and provides a powerful basis for modeling, specifying, implementing, and verifying agent-based systems.
2.2 Case-Based Reasoning
Case-based reasoning (CBR) has emerged in the recent past as a popular approach to learning from experience. CBR is a reasoning method based on the reuse of past experiences, which are called cases [8]. Cases are descriptions of situations in which agents with goals interact with the world around them. Cases in CBR are represented by a triple (p, s, o), where p is a problem, s is the solution of the problem, and o is the outcome (the resulting state of the world when the solution is carried out). The basic philosophy of CBR is that the solutions of successful cases should be reused as a basis for future problems that present a certain similarity [9]. Cases with unsuccessful outcomes, or negative cases, may provide additional knowledge to the system by preventing the agent from repeating similar actions that lead to unsuccessful results or states.
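As a concrete illustration of the (p, s, o) representation and reuse-by-similarity sketched above, the following minimal Python example stores cases as problem/solution/outcome triples and retrieves the nearest successful one; the feature encoding, the distance-based similarity, the threshold, and all names are assumptions made for this sketch only.

from dataclasses import dataclass
from math import dist

@dataclass
class Case:
    problem: tuple    # p: numeric description of the situation
    solution: str     # s: the action taken in that situation
    outcome: float    # o: how well the solution worked (1.0 = success, 0.0 = failure)

case_memory = [
    Case((0.9, 0.1), "brake", 1.0),
    Case((0.2, 0.8), "accelerate", 1.0),
    Case((0.9, 0.2), "accelerate", 0.0),   # negative case: remembered so it is not repeated
]

def retrieve(problem, threshold=0.5):
    # Reuse the nearest sufficiently similar case with a successful outcome.
    successes = [c for c in case_memory if c.outcome > 0.5]
    best = min(successes, key=lambda c: dist(c.problem, problem), default=None)
    if best is not None and dist(best.problem, problem) <= threshold:
        return best
    return None

print(retrieve((0.85, 0.15)))   # finds the successful "brake" case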
2.3 Casuistry
The term “casuistry” refers descriptively to a method of reasoning for resolving per-
plexities about difficult cases that arise in moral and legal contexts [10]. Casuistry is a
broad term that refers to a variety of forms of case-based reasoning used in discussions of law and ethics. Casuistry is often understood as a critique of a strict principle-based approach to reasoning [11]. For example, while a principle-based approach may
conclude that lying is always morally wrong, the casuist would argue that lying may
or may not be wrong, depending on the details surrounding the case. For the casuist,
the circumstances surrounding a particular case are essential for evaluating the proper
response to that case. Casuistic reasoning typically begins with a clear-cut paradigm case ("paradigm" meaning "pattern" or "example"). From this model case, the casuist then asks how closely the particular case currently under consideration matches
the paradigm case. Cases similar to the paradigm case ought to be treated in a similar
manner; cases unlike the paradigm case ought to be treated differently. The less a
particular case resembles the paradigm case, the weaker the justification for treating
that particular case like the paradigm case.
2.4 Consequentialism
Consequentialism refers to those moral theories which hold that the consequences of a particular action form the basis for any valid moral judgment about that action. Thus,
from a consequentialist standpoint, a morally right action is one that produces a good
outcome, or consequence [12].
evaluation of each action without knowing codes of ethics explicitly. When a human does any action, he will see the result of his action in the future. If he implicitly considers the result of his action from an ethical aspect, he can use this experience in future situations. When he finds himself in a situation similar to a previous case, he will use his experience and behave similarly, provided his previous action was successful and implicitly ethical (we assume humans do not know codes of ethics). This idea is called "casuistry" in ethics. To implement an ethical decision-making capability in artificial agents we use this idea, which is related to the bottom-up, casuistic approach in ethics. To consider and evaluate situations from an ethical aspect (without knowing codes of ethics), we use and adapt previous works that try to make ethics computable.
In the BDI architecture, agent behavior is composed of beliefs, desires, and intentions; these mental attitudes determine the agent's behavior. In the casuist BDI-Agent architecture, the BDI agent's behavior is adapted so that it behaves ethically. In this architecture, the agent senses the environment <E> and forms a triple that represents the current situation, consisting of the agent's current beliefs, desires, and details of the sensed environment. The current situation is denoted by the triple <E, B, D>. The current situation is delivered to the Case-Retriever of the casuist BDI architecture, which is responsible for retrieving previous cases similar to the current situation. As each case in case memory consists of a solution part that shows how the agent should act on the basis of the situation part of the case, if a case is retrieved, the agent should accept the solution part, adapt it, and behave according to the solution part of the retrieved case. If no case is retrieved, the agent behaves like a normal BDI agent. In this situation the evaluator part of the casuist BDI-Agent architecture comes into play: it evaluates the result of the agent's behavior. The result of this evaluation is denoted by <EV>. This evaluation is sent to the Case-Updater, which creates a new case and saves the current situation in the situation part of the case, the behavior of the agent in the solution part, and the evaluation in the outcome part (if this case does not already exist in case memory; otherwise it updates the previous case). Fig. 1 shows the general structure of the Casuist BDI-Agent architecture.
(Fig. 1. General structure of the Casuist BDI-Agent architecture; the surviving labels name the BDI-Agent, the Environment, the Case-Retriever, and the Case-Evaluator.)
The algorithm of the casuist BDI-Agent is sketched below.
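Because the original algorithm listing does not survive in this text, the following Python-style sketch restates the loop just described; CaseMemory and the deliberate, act, and evaluate callables are hypothetical stand-ins for the corresponding components, not the authors' code.

class CaseMemory:
    # Toy case memory: each case is a (situation, solution, outcome) triple.
    def __init__(self):
        self.cases = []

    def retrieve(self, situation):
        # Reuse only cases whose situation matches and whose outcome was judged ethical.
        matches = [c for c in self.cases if c[0] == situation and c[2] >= 0.5]
        return matches[0] if matches else None

    def update(self, situation, solution, outcome):
        self.cases.append((situation, solution, outcome))

def casuist_bdi_step(E, B, D, memory, deliberate, act, evaluate):
    # One cycle of the loop described above.
    situation = (E, B, D)                     # current situation <E, B, D>
    case = memory.retrieve(situation)         # Case-Retriever
    if case is not None:
        solution = case[1]                    # reuse the previous solution (adaptation omitted)
    else:
        solution = deliberate(B, D)           # behave like a normal BDI agent
    act(solution)
    EV = evaluate(solution)                   # Case-Evaluator result <EV>
    memory.update(situation, solution, EV)    # Case-Updater stores the experience
    return solution

mem = CaseMemory()
run = lambda: casuist_bdi_step(
    E="obstacle-ahead", B="left-lane-clear", D="reach-goal",
    memory=mem,
    deliberate=lambda B, D: "swerve-left",
    act=lambda s: None,
    evaluate=lambda s: 1.0,                   # pretend the behaviour was judged ethical
)
print(run())   # first call deliberates
print(run())   # second call reuses the stored case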
The typification of moral agents acting on moral patients can be summarized as follows (rows: moral patient; columns: moral agent):

Moral patient \ Moral agent: Human | Organization | Artificial agent
Human: a human acts on a human | an organization acts on a human | an artificial agent acts on a human
Organization: a human acts on an organization | an organization acts on an organization | an artificial agent acts on an organization
Artificial agent: a human acts on an artificial agent | an organization acts on an artificial agent | an artificial agent acts on an artificial agent
With this new simplification, the outcome part of a case is divided into three sections. These sections contain the ethical evaluation values of the agent's performed intentions on humans, on organizations, and on artificial agents (non-humans). The ethical evaluation values are calculated by the Case-Evaluator of the casuist BDI architecture; this component is described in the next section. The main structure of each case in the casuist BDI architecture is illustrated in Fig. 4. According to the specific application domain of the artificial ethical agent, more details can be added to this structure.
(Fig. 4. Main structure of a case: a solution part holding the intentions and an outcome part holding the ethical evaluation values x on humans, y on organizations, and z on artificial agents.)
This computation would be performed for each alternative action. The action with the highest Total Net Pleasure is the right action. In other words, their proposed equation is equal to Gips's equation extended with a time parameter. Following Gips's equation, this new equation has the form:
Σi Wi · Pi · Ti   (3)
where Wi is the weight assigned to each person, Pi is the measure of pleasure, happiness, or goodness for each person, and Ti is the duration of the action's pleasure for each person. Our Case-Evaluator considers the ethical classification of situations introduced above. This ethical evaluator determines the kind of entity affected by each of the agent's intentions or behaviors, and computes the duration, intensity, and probability of the effects of the intentions on each entity. These evaluations are sent to the Case-Updater component of the casuist BDI agent for updating the agent's experiences (the case memory). The evaluation method of this evaluator is described by equation (7), which uses equations (4)–(6) for its operation:
TNPH = Σi=1..n Whi · Phi · Thi   (4)
TNPH is the total net pleasure of humans after agent’s behavior. Whi is the weight
assigned to humans which shows the importance of each person in specific situation
and application domain. Phi is the probability that a person is affected. Thi is the dura-
tion of pleasure/displeasure of each person after the agent's behavior. n is the number of people in that situation.
TNPO = Σi=1..n Woi · Poi · Toi   (5)
TNPO is the total net pleasure of organizations after agent’s behavior. Woi is the
weight assigned to organizations which shows the importance of each organization in
specific situation and application domain. Poi is the probability that an organization is
affected. Toi is the duration of pleasure/displeasure of each organization after the agent's behavior. n is the number of organizations in that situation.
TNPA = Σi=1..n Wai · Pai · Tai   (6)
TNPA is the total net pleasure of artificial agent after agent’s behavior. Wai is the
weight assigned to artificial agents which shows the importance of each artificial
agent in specific situation and application domain. Pai is the probability that an artifi-
cial agent is affected. Tai is the duration of pleasure/displeasure of each artificial agent after the agent's behavior. n is the number of artificial agents in that situation.
TNP = Wh · TNPH + Wo · TNPO + Wa · TNPA   (7)
TNP is the total net pleasure of all three kinds of entities that participate in the application domain of the agent's behavior. Wh, Wo, and Wa denote the participation degrees of humans, organizations, and artificial agents, respectively. The summation of Wh, Wo, and Wa equals one.
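A small numeric sketch of equations (4)–(7) follows; the weights, probabilities, and durations are invented purely for illustration.

def total_net_pleasure(entities):
    # Sum of W_i * P_i * T_i over the affected entities of one kind (equations 4-6).
    return sum(w * p * t for (w, p, t) in entities)

humans        = [(1.0, 0.9, 5.0), (0.5, 0.4, 2.0)]   # (weight, probability, duration)
organizations = [(0.8, 0.6, 3.0)]
agents        = [(0.3, 0.2, 1.0)]

TNPH = total_net_pleasure(humans)
TNPO = total_net_pleasure(organizations)
TNPA = total_net_pleasure(agents)

Wh, Wo, Wa = 0.6, 0.3, 0.1                    # participation degrees; they sum to one
TNP = Wh * TNPH + Wo * TNPO + Wa * TNPA       # equation (7)
print(TNPH, TNPO, TNPA, TNP)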
TNPH, TNPO, TNPA, and TNP are stored in the outcome part of a case in the case memory. These values can be used by the Case-Retriever for retrieving cases when agents encounter a problem.
6 Conclusion
In this paper a new extension of the BDI architecture, the Casuist BDI-Agent, is proposed, which retains the capabilities of the BDI-agent architecture and adds the capability of ethical decision making. This architecture can be used for designing BDI agents in domains where BDI agents are applicable and ethical considerations are an important issue. With the aid of this new architecture, agents can consider the ethical effects of their behaviors when they make decisions. The main idea in this architecture is based on a method of ethical reasoning that uses previous experiences and does not use any codes of ethics. The main advantage of using this method for implementing an ethical decision-making capability in agents is that it eliminates the need to convert a set of ethical rules in the agents' application domain into an algorithm, which would require conflict management among rules. In this architecture an agent can adapt ethically to its application domain and can augment its implicit ethical knowledge, and so behave more ethically.
References
1. Anderson, M., Anderson, S., Armen, C.: Toward Machine Ethics: Implementing Two
Action-Based Ethical Theories. In: AAAI 2005 Fall Symp. Machine Ethics, pp. 1–16.
AAAI Press, Menlo Park (2005)
2. Allen, C., Wallach, W., Smith, I.: Why Machine Ethics? IEEE Intelligent Systems Special
Issue on Machine Ethics 21(4), 12–17 (2006)
3. Anderson, M., Anderson, S.: Ethical Healthcare Agents. Studies in Computational Intelli-
gence, vol. 107, pp. 233–257. Springer, Heidelberg (2008)
4. Wallach, W., Allen, C.: Moral Machines: Teaching Robot Right from Wrong. Oxford
University Press, Oxford (2009)
5. Moor, J.H.: The nature, importance, and difficulty of Machine Ethics. IEEE Intelligent
Systems Special Issue on Machine Ethics 21(4), 18–21 (2006)
6. Honarvar, A.R., Ghasem-Aghaee, N.: Simulation of Ethical Behavior in Urban Transporta-
tion. Proceedings of World Academy of Science, Engineering and Technology 53,
1171–1174 (2009)
7. Rao, A.S., Georgeff, M.P.: BDI Agents: From Theory to Practice. In: Proceedings of the First International Conference on Multi-Agent Systems (ICMAS 1995), San Francisco, USA (1995)
8. Pal, S.K., Shiu, S.C.K.: Foundations of Soft Case-Based Reasoning. Wiley-Interscience, Hoboken
(2004)
9. Kolodner, J.: Case-Based Reasoning. Morgan Kaufmann, San Mateo (1993)
10. Keefer, M.: Moral reasoning and case-based approaches to ethical instruction in science.
In: The Role of Moral Reasoning on Socioscientific Issues and Discourse in Science
Education. Springer, Heidelberg (2003)
11. http://wiki.lawguru.com/index.php?title=Casuistry
12. Gips, J.: Towards the Ethical Robot. In: Android Epistemology, pp. 243–252. MIT Press,
Cambridge (1995)
13. Kendal, S.L., Creen, M.: An Introduction to Knowledge Engineering. Springer, Heidelberg
(2007)
14. Olivia, C., Change, C., Enguix, C.F., Ghose, A.K.: Case-Based BDI Agents: An Effective
Approach for Intelligent Search on the web. In: Proceeding AAAI 1999 Spring
Symposium on Intelligent Agents in Cyberspace Stanford University, USA (1999)
15. Corchado, J.M., Pavón, J., Corchado, E., Castillo, L.F.: Development of CBR-BDI Agents:
A Tourist Guide Application. In: Funk, P., González Calero, P.A. (eds.) ECCBR 2004.
LNCS (LNAI), vol. 3155, pp. 547–559. Springer, Heidelberg (2004)
16. Al-Fedaghi, S.S.: Typification-Based Ethics for Artificial Agents. In: Proceedings of Sec-
ond IEEE International Conference on Digital Ecosystems and Technologies (2008)
17. Floridi, L., Sanders, J.W.: On the Morality of Artificial Agents. Minds and
Machines 14(3), 349–379 (2004)
Research of Trust Chain of Operating System
Abstract. Trust chain is one of the key technologies in designing secure oper-
ating system based on TC technology. Constructions of trust chain and trust
models are analyzed. Future works in these directions are discussed.
1 Introduction
It seems likely that computing platforms compliant with Trusted Computing Group (TCG) specifications will become widespread over the next few years [1, 2]. The advent of TC (trusted computing) technology gives new opportunities to the research and development of secure operating systems [3, 4]. On the one hand, a secure operating system has the ability to protect legal information flow and protect information from unauthorized access. On the other hand, TC technology can protect the operating system's own security. Therefore, developing secure operating systems with the help of TC technology is an effective way to address the security of the terminal. Future operating systems will be open-source, trusted, secure, etc. [5].
Trust chain is one of the key technologies of TC, and it plays a vital role in the construction of highly assured secure operating systems. In this paper we summarize in detail two important problems concerning the trust chain of secure operating systems, focusing mainly on Linux.
The paper starts with a description of transitive trust and the notion of trust chain of
operating system in Section 2. Section 3 details the research on construction of oper-
ating system trust chains and analyzes future work on Linux. Trust models and their future directions are discussed in Section 4. We conclude the paper in Section 5.
group of functions can give a trustworthy description of the third group of functions,
etc. Transitive trust is used to provide a trustworthy description of platform character-
istics, and also to prove that non-migratable keys are non-migratable.
The transitive trust enables the establishment of a chain of trust by ensuring that the trust on each layer of the system is based on, and only on, the trust on the layer(s) underneath it, all the way down to the hardware security component, which serves as the root of trust. If verification fails at any given stage, the system may be put into a suspended mode to block possible attacks. The resulting integrity chain inductively guarantees system integrity. Figure 1 illustrates the process of transitive trust of an operating system. The trust chain starts from the root of trust. The unmodifiable part of the BIOS measures the integrity of the BIOS, and then the BIOS measures the MBR (Master Boot Record) of the bootstrap device. Then the MBR measures the OS loader, the OS loader measures the OS kernel, and finally the OS measures the applications.
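The transitive measurement just described can be pictured as a hash chain in the style of a TPM PCR extend operation (new register value = hash of the old value concatenated with the digest of the measured component). The Python sketch below simulates such a chain over the boot stages named above; it illustrates the idea only and is not TCG-specified code (for instance, it uses SHA-256 rather than the SHA-1 of TPM 1.2).

import hashlib

def extend(pcr, measurement):
    # PCR-style extend: new PCR = H(old PCR || H(component)).
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()

# Each boot stage is measured before control is transferred to it.
stages = [b"BIOS image", b"MBR", b"OS loader", b"OS kernel", b"application"]

pcr = bytes(32)              # the chain starts from a known value held by the root of trust
for component in stages:
    pcr = extend(pcr, component)

print(pcr.hex())             # any change to any stage changes the final value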
Many researchers have carried out work on operating systems based on TCPA. AEGIS is a system architecture for integrity examination developed at the University of Pennsylvania [6]; the AEGIS architecture establishes a chain of trust, driving the trust to lower levels of the system and, based on those elements, achieving secure boot. It validates integrity at each layer transition in the bootstrap process. AEGIS also includes a recovery process for integrity check failures.
IBM T.J. Watson Research Center developed an integrity mechanism on Linux [7]. Their system is the first to extend the TCG trust concepts to dynamic executable content, from the BIOS all the way up into the application layer. They present the design and implementation of a secure integrity measurement system for Linux. All executable content that is loaded onto the Linux system is measured before execution, and these measurements are protected by the Trusted Platform Module (TPM) that is part of the TCG standards. Their measurement architecture is applied to a web server application to show how undesirable invocations, such as rootkit programs, can be detected, and that measurement is practical in terms of the number of measurements taken and the performance impact of making them.
Huang Tao also designed a trusted boot method for server systems [8]. Their research proposed a trusted boot solution that performs integrity verification on every transfer of control between one layer and the next; only if the verification succeeds is control transferred. Therefore the boot of the system proceeds according to the transition of the trusted chain. In this way the system boot stays in a trusted state, which improves the system's security. The solution is also easy and flexible to implement.
Dynamic multi-path trust chain (DMPTC) is a software-type and character based mechanism to assure system trustworthiness on Windows systems [9]. DMPTC differentiates static system software from dynamic application software and takes different ways and policies to control the loading and running of various executable codes. The goal of DMPTC is to build a trusted computing platform by making the computing platform load and run only trustworthy executables. DMPTC also gives great consideration to its impact on system performance. Based on the attributes of various executables and by taking advantage of Windows' internal security mechanisms, DMPTC greatly reduces the time cost of the executable verification process, which ultimately assures the flexibility and utility of DMPTC.
Maruyama et al. studied in depth the transitive mechanisms from the GRUB boot loader to the operating system, describing an end-to-end TCPA integrity mechanism and implementing it on the Linux kernel [10]. Integrity measurement of the kernel is done through a chaining of PCR (Platform Configuration Register) updates during the bootstrap process of the GRUB kernel loader; that is, kernel integrity is measured by updating the PCR chain in the kernel loader during booting. The measurement results, which involve digitally signed PCR values, are reported to a remote server using a Java application.
Hadi Nahari describes a mechanism to construct effective containment (that is, a mechanism to stop an exploited application from launching attacks on another application), which is applied to embedded systems [11]. The MAC (Mandatory Access Control) is provided by SELinux. The focus is on practical aspects of hardware integration as well as porting SELinux to resource-constrained devices. The method provides a high-level, structured overall infrastructure supplying the basic and necessary functions to establish the trust of operating system services (via connection to a hardware-based root of trust).
For the Linux kernel, changes may take place during execution. Owing to the multi-aspect and loosely ordered nature of applications, the single-chained boot mechanism of system boot cannot be applied to the trust transition from the OS to applications. Once the kernel is booted, user-level services and applications may be run. In Linux, a program execution starts by loading an appropriate interpreter (i.e., a dynamic loader, such as ld.so) based on the format of the executable file. The target executable's code and supporting libraries are loaded by the dynamic loader. In the Linux operating system, the kernel and its modules, binary executables, shared libraries, configuration files, and scripts run loosely serially and can change the system's state during execution, which is the key difficulty in the research of trusted Linux. In such situations, even if the OS loader has verified that the kernel is trusted, the trusted state can still be destroyed because kernel modules can be inserted or removed, so that the PCR values no longer represent the current execution condition. Whether a module affects the kernel state needs to be verified when kernel modules are inserted or removed.
One must note that the former research, though ensuring the integrity of the operating environment when a hard boot occurs, does not guarantee its integrity during runtime; that is, in case of any malicious modification to the operating environment during runtime, this architecture will not detect it until the next hard boot happens. Trust of applications embodies the integrity and privacy of the complete trust chain. Though in reference [7] the trust is extended to dynamic executable content, mandatory access control policies are not measured; future architecture extensions should include such measurements.
On the other hand, a MAC (Mandatory Access Control) mechanism can address one of the fundamental shortcomings by providing a level of protection at runtime. Deploying MAC mechanisms while balancing performance and control is a particularly challenging task.
The TPM emulator [12, 13] provides good technical support for the practice and verification of trust chain research. There is a tremendous opportunity for enhanced security through enabling projects to use the TPM.
4 Trust Model
TC trust models center on how to compute trust in the information world, that is, how to carry the trust relations among humans over into computing environments to achieve the goal of trust.
Patel et al. proposed a probabilistic trust model [14]; their research aims to develop a model of trust and reputation that will ensure good interactions among software agents in large-scale open systems. Key drivers for their model are: (1) agents may be self-interested and may provide false accounts of experiences with other agents if it is beneficial for them to do so; (2) agents will need to interact with other agents with which they have no past experience. Specifically, trust is calculated using probability theory, taking account of past interactions between agents. When there is a lack of personal experience between agents, the model draws upon reputation information gathered from third parties.
A trust management model based on fuzzy set theory deals with authenticity in open networks [15]. The author showed that authentication cannot be based on public key certificates alone but also needs to include the binding between the key used for certification and its owner, as well as the trust relationships between users. A simple algebra is developed around these elements, and it is described how it can be used to compute measures of authenticity.
In the algebra for assessing trust in certification chains, the fuzzy nature of subjective trust is considered, and the concepts of linguistic variables and fuzzy logic are introduced into subjective trust management [16]. A formal trust metric is given; fuzzy IF–THEN rules are applied to map the knowledge and experience of trust reasoning that people use in everyday life into the formal model of trust management; the reasoning mechanisms of trust vectors are given; and a subjective trust management model is provided. The proposed formal model provides a valuable new way of studying subjective trust management in open networks.
Because open networks are dynamic, distributed, and non-deterministic, and with reference to trust relations among humans, a trust evaluation model based on D-S (Dempster-Shafer) theory has been proposed [17]. The model gives a formal description of direct trust based on the history of trade records among grid nodes and constructs a trust recommendation mechanism for grid nodes. Combining D-S theory with evidence, indirect trust can be obtained; after that, direct trust and indirect trust are combined effectively. The results show that the trust model is viable and usable.
Reference [18] proposed a trust management model based on software behavior. An agent-based software service coordination model is presented to deal with the trust problem in open, dynamic, and changeable application environments. A trust valuation model is given to value trust relationships between software services. Trust is abstracted as a function of subjective expectation and objective experience, and a reasonable method is provided to combine direct experience with indirect experience from others. In comparison with other work, a complete trust valuation model is designed, and its reasonableness and operability are emphasized. This model can be used in coordination and security decisions between software services.
Non-interference theory has been introduced into the domain of trusted computing to construct a theoretical trusted-chain model [19]. The basic theory of trusted computing is proposed and a non-interference based trusted chain model is built from the dynamic point of view; the model is then formalized and verified. Finally, the start-up process based on the Linux operating system kernel is implemented. The implementation provides a good reference for the development and application of trusted computing theory as well.
TCG introduces the idea of trust into the computing environment, but there is still no formalized uniform description. Trusted computing is still a technology rather than a theory, and the basic theoretical model has not been established. The trust models above focus on sociological human relations and do not accord with the definition of TCG. Also, present models should be further optimized to be objective, simple, and usable.
Linux is a multi-task operating system in which multiple programs run simultaneously; therefore, verifying one component and then simply transferring control to the next is not viable by itself. To maintain the trusted state of the system, it is necessary to verify whether a file and its associated execution parameters are trusted. At the same time, objects that may change the system state also need verification before they are loaded for execution.
Owing to the properties of applications, the formal description should focus on the separation of processes, etc. The theory supporting trust transition should pay more attention to dynamic execution. Security principles such as least privilege and the need-to-know policy should also be taken into account. The efficiency of integrity measurement and security policy enforcement should also be further improved.
In essence, non-interference theory is based on information flow theory, and it can detect covert channels of the system, which meets the requirements of designing high-level secure operating systems. Future research on trust models should pay more attention to non-interference theory, which supports the construction and extension of trusted chains.
5 Conclusions
In this paper, we summarize the construction and formalization of the trust chain of operating systems. We conclude that future research on the trust chain of secure operating systems should focus on dynamic trusted chains: making full use of the trusted mechanisms supplied by the TPM and extending the TCG concept of trust to the application layer in order to study the trusted chain and its formal description.
Acknowledgments. We are grateful to the Shanghai Municipal Education Commission's Young Teacher Research Foundation, under grant No. SDL08024.
References
1. Shen, C.-x., Zhang, H.-g., Feng, D.-g.: Survey of Information Security. China Sci-
ence 37(2), 129–150 (2007) (in Chinese)
2. Trusted Computing Group. TCG Specification Architecture Overview [EB/OL]
[2005-03-01], http://www.trustedcomputinggroup.org/
3. Changxiang, S.: Trust Computing Platform and Secure Operating System. Network Security
Technology and Application (4), 8–9 (2005) (in Chinese)
4. Shieh, A.D.: Nexus: A New Operating System for Trustworthy Computing. In: Proceedings
of the SOSP, Brighton, UK. ACM, New York (2005)
5. Zhong, C., Shen, Q.-w.: Development of Modern Operating System. Communications of
CCF 9, 15–22 (2008) (in Chinese)
6. Arbaugh, W.A., Farber, D.J., Smith, J.M.: A Secure and Reliable Bootstrap Architecture.
In: IEEE Security and Privacy Conference, USA, pp. 65–71 (1997)
7. Sailer, R., Zhang, X., Jaeger, T., et al.: Design and Implementation of a TCG -based Integ-
rity Measurement Architecture. In: The 13th Usenix Security Symposium, San Diego (2004)
8. Tao, H., Changxiang, S.: A Trusted Bootstrap Scenario based Trusted Server. Journal of
Wuhan University (Nature Science) 50(S1), 12–14 (2004) (in Chinese)
9. Xiaoyong, L., Zhen, H., Changxiang, S.: Transitive Trust and Performance Analysis in
Windows Environment. Journal of Computer Research and Development 44(11),
1889–1895 (2007) (in Chinese)
10. Maruyama, H., Nakamura, T., Munetoh, S., et al.: Linux with TCPA Integrity Measurement.
IBM, Tech. Rep.: RT0575 (2003)
11. Nahari, H.: Trusted Embedded Secure Linux. In: Proceedings of the Linux Symposium,
Ottawa, Ontario Canada, June 27-30, pp. 79–85 (2007)
12. Hall, K., Lendacky, T., Raliff, E.: Trusted Computing and Linux,
http://domino.research.ibm.com/comm/research_projects.nsf/
pages/gsal.TCG.html/FILE/TCFL-TPM_intro.pdf
13. Strasser, M.: A Software-based TPM Emulator for Linux. Swiss Federal Institute of
Technology (2004),
http://www.infsec.ethz.ch/people/psevinc/TPMEmulatorReport.pdf
14. Patel, J., Teacy, W.T.L., Jennings, N.R., Luck, M.: A Probabilistic Trust Model for Han-
dling Inaccurate Reputation Sources. In: Herrmann, P., Issarny, V., Shiu, S.C.K. (eds.)
iTrust 2005. LNCS, vol. 3477, pp. 193–209. Springer, Heidelberg (2005)
15. Wen, T., Zhong, C.: Research of Subjective Trust Management Model based on the Fuzzy
Set Theory. Journal of Software 14(8), 1401–1408 (2003) (in Chinese)
16. Audun, J.: An Algebra for Assessing Trust in Certification Chains. In: The Proceedings of
NDSS 1999, Network and Distributed System Security Symposium, The Internet Society,
San Diego (1999)
17. Lulai, Y., Guosun, Z., Wei, W.: Trust evaluation model based on Dempster-Shafer evidence
theory. Journal of Wuhan University (Natural Science) 52(5), 627–630 (2006) (in Chinese)
18. Feng, X., Jian, L., Wei, Z., Chun, C.: Design of a Trust Valuation Model in Software Service
Coordination. Journal of Software 14(6), 1043–1051 (2003) (in Chinese)
19. Jia, Z., Changxiang, S., Jiqiang, L., Zhen, H.: A Noninterference Based Trusted Chain
Model. Journal of Computer Research and Development 45(6), 974–980 (2008)
(in Chinese)
A Novel Application for
Text Watermarking in Digital Reading
1 Introduction
Several decades have passed since digital watermarking technology emerged. Both technological diversity and theoretical research have made major strides in this field. But its business application has always been a big problem that perplexes the development of watermarking. Many watermarking companies have gone out of business or suspended their watermarking efforts, except some that focus on tracking and digital fingerprinting [1].
This problem is due, in our opinion, to a misunderstanding of the nature of digital watermarking. The real value of digital watermarking is the information bound in the watermark. In other words, watermarking content and application methods are the key issues in digital watermarking research. Unfortunately, there is little current research in this area.
As an important part of digital watermarking, text watermarking encounters the same problem as other kinds of watermarking technology. Most people agree that text watermarking should be widely used in the field of digital reading, but this is not yet the case. On the one hand, the lack of appropriate application methods makes text watermarking of little use. On the other hand, text watermarking is characterized by its attack sensitiveness (discussed in Section 2), which easily leads to the loss of watermark information.
In this paper, an application method is proposed for text watermarking in digital reading. We combine the original digital content and advertisements into a whole digital document. The rules of combination are embedded in this digital document as watermarking information. Once the document is under attack, the advertisements will be released from it. The attacker will have to read them together with the real content, which serves as compensation for the copyright holder.
This paper is organized as follows: in Section 2, we review other kinds of text watermarking. The procedure of our solution is presented in general terms in Section 3, and we expound the solution details and experimental results in Section 4. Finally we draw the conclusion in Section 5.
2 Related Works
In general, the research object of text watermarking is the digital document, and a document is always made up of two parts: content and format.
Many researchers choose to embed the watermark by treating content as a manifestation of language; this is semantics-based watermarking. When the watermarking information is embedded, the content is modified according to linguistic rules. Atallah's TMR is representative of this kind of watermarking [2]. Dai and Yang also proposed their own solutions [3, 4]. This kind of watermarking technology always consumes more time, and it is very sensitive to tamper attacks and deletion attacks [5]. At the same time, low capacity is another shortcoming.
Some other researchers treat content as a still picture. In this way, they can build additional redundant space in which to embed the watermark so as to enhance capacity. Brassil and Low's line-shift/word-shift coding and word feature coding are representative of such solutions [6, 7]. In their solutions, content is a binary image; they use tiny shifts of lines or words in different directions to embed information. Huang complemented their work [8]. Liu and Chen's work is based on interconnected domains; they regard Chinese characters as special images [9]. It is worth mentioning that Li and Zhang propose a method which integrates the ideas of Liu and semantics-based watermarking [10]; they achieve a significant reduction in overall algorithm consumption. But OCR technology is their natural enemy. No matter how sophisticated these methods are, the majority of the watermarking information will be removed after an OCR or retyping attack. Tamper and deletion attacks can also reduce their performance.
Some researchers hope that the watermark can be hidden in the document's format. Fu and Wang's design is a prime example [11]. Zou and Sun use special characters in RTF files to embed the watermark [12]. Some solutions are based on redundant coding space, such as Zhou's work [13]. Other techniques, such as adding blanks and other such characters, have also been proposed. Although these remove the impact of tamper and deletion attacks, any change to the document will have a fatal impact on them, let alone OCR and retyping attacks.
To sum up, all kinds of watermarking are sensitive to particular attacks; little information can be maintained under the corresponding niche attack.
3 Solution Design
As mentioned above, no text watermarking technology can resist all kinds of attack; in particular, these attacks can be realized at very low cost. Rather than spending a lot of energy designing a more robust watermark, it is better to find a method that leaves attackers unwilling to burn their house down to rid it of a mouse.
(Fig. 1. Solution overview: the original content C0 and the advertisement CA are combined and watermarked with W to form CW; attack noise nA turns CW into CWn.)
Step 1: C0 and CA are combined in accordance with the rules. These rules are encoded into a special format, which is the watermarking information W. In other words, both C0 and CA are watermarked.
Step 2: After W is embedded, the digital document will be distributed or attacked. In this way the attack noise nA is introduced. Once an attack happens during transmission, nA is added to the document and CW turns into CWn, as formula (2) shows. When no attack occurs, nA is given the value 0, and CW equals CWn in this condition.
Step 3: When the user's device or client software receives CWn, it extracts W and restores C0 according to W. If nA equals 0, the user can view the original content. But if the watermark is totally erased, the user will have to view CA while browsing C0. In the worst case, where W is totally distorted, only garbled characters are left.
We implement our design on a kind of hand-held digital reading device and apply it on a related digital-works release platform. The algorithm in [10] is taken as the default algorithm. We use a txt document containing 10,869 Chinese characters as the experimental object.
Figure 2(a) shows the document, which contains the watermark and the advertisement; in this screenshot it is opened directly without stripping. Figure 2(b) shows the result of opening it after stripping: both the advertisement and the watermarking signs are removed by the content stripper.
The original content C0 is split into N blocks, whose sizes are not equal and are denoted by the sequence {Sai}. The sequence {Sbi} is used to denote the sizes of the CA blocks. These combination rules are described by W, which is made up of the sequence {Wi}. We define Wi as follows (3):
Wi = {li ,{Ci , ai , bi }R , S } (3)
li: total length of Wi.
Ci: check code of ai and bi.
{}R: redundancy construction function; it can be replaced by cryptographic functions.
S: barrier control code. It is used to isolate different Wi and is also helpful when a deletion attack appears. Different code words can also play the role of control information in transmission.
{ai}: relative length of C0 block i; it equals the distance from the first WT in this block to the last character of this C0 block.
{bi}: length of CA block i; it usually equals Sbi.
The description above covers only the content structure. After this processing, the content can be sealed in another document format, such as DOC or HTML. Because of its large system consumption, this work should be carried out on content servers or distribution servers.
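To make the record format of equation (3) concrete, the following Python sketch packs one Wi per block pair; the field widths, the CRC-based check code Ci, and the barrier byte S are assumptions made for this sketch (the paper does not fix them), and the redundancy construction {}R is omitted.

import zlib

BARRIER = b"\x1f"             # assumed barrier control code S

def make_wi(a_i, b_i):
    # Build one Wi = {li, {Ci, ai, bi}R, S}; field sizes are assumed.
    body = a_i.to_bytes(4, "big") + b_i.to_bytes(4, "big")
    check = (zlib.crc32(body) & 0xFFFF).to_bytes(2, "big")   # Ci: check code of ai and bi
    payload = check + body
    l_i = len(payload) + 2                                    # total length incl. length byte and barrier
    return bytes([l_i]) + payload + BARRIER

def make_w(sa, sb):
    # Concatenate one Wi record per (ai, bi) pair.
    return b"".join(make_wi(a, b) for a, b in zip(sa, sb))

print(make_w([120, 300], [64, 64]).hex())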
Most mainstream DRM systems use asymmetric cryptographic algorithms and special document formats to ensure security. Our design focuses on content and is independent of the document format, so it can be applied in most DRM systems smoothly.
Dynamic DRM (DDRM) is a special DRM that supports superdistribution and distribution among users [15]; its technical basis is its license mechanism. All digital works or digital documents on this platform must be divided into blocks and reorganized. In particular, dynamic advertisement blocks are supported in DDRM, which means the advertisement in a digital work can be changed as distribution goes on.
As mentioned in Section 3, the watermark is embedded into the total content: both C0 and CA contain the watermark. In this case, the device has to reorganize the digital document to update CA after each distribution, which is obviously very hard for today's embedded devices.
Therefore we have to amend formula (3) and turn it into another form, as (4) shows.
CW = F (W , C0 ) ⊕ C A (4)
• The new CA block must have the same size as the old one; that is the reason why we usually give CA blocks a larger size.
• The validity of CA should also be checked, and the barrier control code S's value is used to indicate whether dynamic CA is supported.
As discussed in Section 2, most text watermarking algorithms, including our design, are attack sensitive. Once an attack matches a watermarking algorithm's weakness, little information may be left.
Rather than paying attention to more robust algorithms, we focus on risk management after an attack has happened. Figure 5 shows the result of watermarking information loss in our design.
Our research mainly focuses on finding a new method of applying digital watermarking; therefore the relation to the selected algorithm needs to be discussed. Though most algorithms can be applied smoothly in our design, we need to use different ways of splitting content for different algorithms. When an algorithm hiding information in format is selected, the splitting method is very easy. If the algorithm regards text as a picture, the descriptors about size or offset should be replaced by descriptors about coordinates.
5 Conclusion
As an application technology, digital watermarking research should not only indulge in finding new algorithms or blindly pursue technical targets; searching for efficient application methods may be more meaningful.
From this viewpoint, this paper proposes a novel application method for text watermarking in the digital reading field: it releases advertisements as interference information when under attack. On the one hand, reducing the quality of digital works can inhibit unauthorized distribution; on the other hand, it points out a new way for watermarking to be applied.
References
1. Delp, E.J.: Multimedia security: the 22nd century approach. Multimedia Systems 11,
95–97 (2005)
2. Atallah, M.J., Raskin, V., Crogan, M., Hempelmann, C., Kerschbaum, F., Mohamed, D.,
Naik, S.: Natural language watermarking: Design, analysis, and a proof-of-concept imple-
mentation. In: Moskowitz, I.S. (ed.) IH 2001. LNCS, vol. 2137, pp. 185–200. Springer,
Heidelberg (2001)
3. Dai, Z., Hong, F., Cui, G., et al.: Watermarking text document based on statistic property
of part of speech string. Journal on Communications 28(4), 108–113 (2007)
4. Yang, J., Wang, J., et al.: A Novel Scheme for Watermarking Natural Language Text:
Intelligent Information Hiding and Multimedia Signal Processing. IEEE Intelligent Soci-
ety 2, 481–488 (2007)
5. Li, Q., Zhang, J., et al.: A Chinese Text Watermarking Based on Statistic of Phrase
Frequency: Intelligent Information Hiding and Multimedia Signal Processing. IEEE Intel-
ligent Society 2, 335–338 (2008)
6. Brassil, J., Low, S., Maxemchuk, N., O'Gorman, L.: Electronic marking and identification
techniques to discourage document copying. IEEE J. Select. Areas Commun. 13,
1495–1504 (1995)
7. Low, S., Maxemchuk, N.: Capacity of Text Marking Channel. IEEE Signal Processing
Letter 7(12), 345–347 (2000)
8. Huang, H., Qi, C., Li, J.: A New Watermarking Scheme and Centroid Detecting Method
for Text Documents. Journal of Xi’an Jiaotong University 36(2), 165–168 (2002)
9. Liu, D., Chen, S., Zhou, M.: Text Digital Watermarking Technology Based on Topology
of Alphabetic Symbol. Journal of Chinese Computer Systems 27(25), 812–815 (2007)
10. Li, Q., Zhang, Z., et al.: Natural text watermarking algorithm based on Chinese characters
structure. Application Research of Computers 26(4), 1520–1527 (2009)
11. Fu, Y., Wang, B.: Extra space coding for embedding wartermark into text documents and
its performance. Journal of Xi’an Highway University 22(3), 85–87 (2002)
12. Zou, X., Sun, S.: Fragile Watermark Algorithm in RTF Format Text. Computer Engineer-
ing 33(4), 131–133 (2007)
13. Zhou, H., Hu, F., Chen, C.: English text digital watermarking algorithm based on idea of
virus. Computer Engineering and Applications 43(7), 78–80 (2007)
14. Cox, J., Miller, M., et al.: Digital Watermarking and Steganography, 2nd edn. Morgan
Kaufmann, San Francisco (2007)
15. Li, Q., Zhang, J., et al.: A Novel License Distribution Mechanism in DRM System. In:
Proceedings of 22nd Advanced Information Networking and Applications - Workshops,
pp. 1329–1334. IEEE Computer Society, Los Alamitos (2008)
Optimization of Real-Valued Self Set for Anomaly
Detection Using Gaussian Distribution
College of Computer Science and Technology, Harbin University of Science and Technology,
150080 Harbin, China
xljyp2002@yahoo.com.cn, zhangfb@hrbust.edu.cn,
stonetools@sohu.com
Abstract. The real-valued negative selection algorithm (RNS) has been a key
algorithm of anomaly detection. However, the self set which is used to train de-
tectors has some problems, such as the wrong samples, boundary invasion and
the overlapping among the self samples. Due to the fact that the probability of
most real-valued self vectors is near to Gaussian distribution, this paper pro-
poses a new method which uses Gaussian distribution theory to optimize the
self set before training stage. The method was tested by 2-dimensional synthetic
data and real network data. Experimental results show that, the new method
effectively solves the problems mentioned before.
1 Introduction
The anomaly detection problem can be stated as a two-class problem: given an element of the space, classify it as normal or abnormal [1]. There exist many approaches to anomaly detection, including statistical, machine learning, data mining, and immunologically inspired techniques [2, 3, 4]. The task of anomaly detection may be considered analogous to the immunity of natural systems, as both aim to detect abnormal behaviors of a system that violate the established policy [5, 6]. The negative selection algorithm (NSA) [7] has potential applications in various areas, in particular anomaly detection. NSA trains the detector set by eliminating any candidate that matches elements from a collection of self samples (the self set). These detectors subsequently recognize non-self data by using the same matching rule. In this way, it is used as an anomaly detection algorithm that requires only normal data for training.
In the past, most works in anomaly detection represented the problem in binary form. However, many applications are naturally described in real-valued space; furthermore, such problems can hardly be processed properly using NSA in a binary representation. Gonzalez et al. [8] proposed the real-valued representation for the self/non-self space, which differs from the original binary representation. A real-valued self sample or detector is a hypersphere in n-dimensional real space; it can be represented by an n-dimensional point and a radius.
As we know, the detection performance mainly relies on the quality of the detector set, i.e., the detector coverage of the non-self space. The detectors are trained by the self set, so the self set plays an important role in the whole detection process, and the health of the self set deserves enough attention. It should be noted that the real-valued self set has some problems which the binary representation does not, such as wrong self samples, overlapping among self samples, and boundary invasion. To solve these problems, this paper proposes a novel method to optimize the self set. Starting from the observation that the self samples approach a Gaussian distribution when the number of samples is large enough, the method employs Gaussian distribution theory to deal with the problems mentioned above.
The wrong self samples: Before training detectors, the self set must be sound. However, the real situation may not always meet this requirement, and the self set is likely to collect some wrong samples. Figure 1(a) illustrates the problem of wrong self samples in 2-dimensional space: two samples are so far from the self region that we can consider them wrong self samples. Wrong self samples may result in holes, so it is important to discard the wrong samples before RNS.
(Fig. 1. (a) The wrong self samples; (b) the overlapping; (c) the boundary invasion.)
The overlapping self samples: In a working situation, the self samples are distributed intensively. As shown in figure 1(b), a large number of self samples are densely distributed in the self region, the overlapping rate is very high, and there are many unnecessary samples in the self region. Now we discuss the relationship between the number of self samples and the number of candidate detectors. Some variables are defined below:
f: the probability that a candidate fails to match any self sample
pf: the probability of an attack that is not detected
pm: the matching probability
Ns: the number of self samples
Nd: the number of detectors (Nd0 denotes the number of initial candidate detectors)
f = (1 − pm)^Ns ≈ e^(−Ns·pm).
When Nd is large enough,
pf = (1 − pm)^Nd ≈ e^(−Nd·pm).
Because Nd = Nd0 · f = −ln(pf)/pm, we have
Nd0 = −ln(pf) / (pm · (1 − pm)^Ns)   (1)
From equation (1), we can see that Nd0 has an exponential relationship with Ns, so the more self samples there are, the more candidates are needed and the higher the cost of detector training.
The following equation defines an approximate measure of the overlapping between two self samples:
Overlapping(si, sj) = e^(−||si − sj||² / rs²)   (2)
The maximum value, 1, is reached when the distance between the two samples is 0; on the contrary, when the distance equals 2rs, the value of this function is almost 0. Based on equation (2), the amount of overlapping of a self set is defined as
Overlapping(S) = Σ_{i≠j} e^(−||si − sj||² / rs²),  i, j = 1, 2, ..., n.   (3)
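A minimal Python sketch of equations (2) and (3) for measuring the pairwise and total overlapping of a real-valued self set; the sample points and radius below are arbitrary test values.

import math

def overlap(si, sj, r_s):
    # Equation (2): exp(-||si - sj||^2 / rs^2); 1 at distance 0, near 0 at distance 2*rs.
    d2 = sum((a - b) ** 2 for a, b in zip(si, sj))
    return math.exp(-d2 / r_s ** 2)

def overlapping(S, r_s):
    # Equation (3): sum of pairwise overlaps over all i != j.
    return sum(overlap(S[i], S[j], r_s)
               for i in range(len(S)) for j in range(len(S)) if i != j)

S = [(0.10, 0.10), (0.12, 0.11), (0.80, 0.75)]
print(overlapping(S, r_s=0.05))   # the two close samples dominate the total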
The boundary invasion: As a result of the self radius, on the border of the self region the covered area invades the boundary of the non-self region. We can see this phenomenon clearly in figure 1(c). So when the detectors are trained by the self set, the vicinity of the non-self boundary may not be covered completely.
3 Optimization Method
In view of the analysis above, optimizing the self set is significant. The aim of the optimization is to use the fewest self samples to cover the self region but not the non-self region. The optimization problem can be stated as follows:
minimize
V(S) = Volume{x ∈ U | ∃s ∈ S, ||x − s|| ≤ rs}   (4)
restricted to
{s ∈ S | ∃d ∈ D, ||s − d|| ≤ rs} = ∅   (5)
and
{si, sj ∈ S, i ≠ j | ||si − sj|| ≤ rsi or ||si − sj|| ≤ rsj} = ∅   (6)
The function defined in equation (4) represents the amount of the self space covered by the self samples in S, which corresponds to the volume covered by the union of the hyperspheres associated with each self sample. The restrictions: equation (5) says that no self sample should invade the detector set; equation (6) says that no self sample is covered by another sample in the self set.
As we said before, if the number of self samples is large enough, the probability distribution of the self samples approaches a Gaussian distribution. Furthermore, the central limit theorem clarifies that whatever probability distribution the samples follow, the distribution of the samples' mean is close to a Gaussian distribution, and the mean of that Gaussian distribution equals the mean of all samples.
According to the Gaussian distribution, we can also see that the smaller the distance between a sample and the mean point, the larger the value of the sample's probability density. As discussed before, the nearer a sample is to the center of the self region, the heavier the overlapping. So we can use this relationship to deal with the problem of the unnecessary self samples. The method proposed in this paper adjusts every sample's radius according to its probability density to deal with the boundary invasion, and then discards unnecessary samples according to the radius.
There is an important and famous proposition about the Gaussian distribution: the "3σ" criterion, which states that almost every normal sample lies within the "3σ" district, although the range is from −∞ to ∞. So we can use the "3σ" criterion to deal with the problem of the wrong self samples.
As analyzed above, we may draw the conclusion that optimization based on probability theory is reasonable. To solve the three problems with the self set mentioned above, the optimization is composed of the following three steps:
Step.1: Discarding the wrong self samples by the “3σ” criterion;
Step.2: Adjusting the self radius by the self’s probability density;
Step.3: Discarding the unnecessary self samples by the self radius.
Before describing the optimization, some variables frequently used in the following
sections are defined here:
BEGIN
  Collect the self samples: S0 ← {s0, s1, ..., s(N0−1)};

  // Step 1: discard the wrong self samples
  Regularize S0, then compute µ and σ;
  n = 0;
  while (n < N0) {
    if (sn is outside the “3σ” district) {
      discard sn; N0−−;
    } else {
      n++;
    }
  }

  // Step 2: adjust every sample’s radius
  Compute maxpdf and minpdf over S0;
  L = (maxpdf − minpdf) / num;
  n = 0;
  while (n < N0) {
    l = (pdf(sn) − minpdf) / L;
    sn.r = l × k;
    n++;
  }

  // Step 3: discard the unnecessary self samples
  S ← {s0}; N = 1;
  n = 1;
  while (n < N0) {
    flag = 0; i = 0;
    while (i < N) {
      d = ||sn − S[i]||;
      if (d < sn.r || d < S[i].r) { flag = 1; break; }
      i++;
    }
    if (flag == 0) { S ← S ∪ {sn}; N++; }
    n++;
  }
END
In step 1, a self sample is discarded if it lies outside the "3σ" district. The time complexity of this step is O(n).
In step 2, the nearer a self sample is to the center of the self region, the larger its
radius. The radii of self samples near the boundary of the self region are thus adjusted
to a relatively small, rational level so that boundary invasion is avoided, while around
the central region each sample's radius is adjusted to a relatively larger level than
before, so the overlapping there becomes heavier. The time complexity of this step is O(n).
In step 3, by discarding the unnecessary samples, the increased overlapping introduced
in step 2 is resolved. Samples with larger radii can cover more space than the previous
fixed-radius samples, so the self region can be covered by fewer samples than before.
The time complexity of this step is O(n²).
After the optimization, the self set is more rational than before. The total time
complexity of the optimization is O(n²).
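As an illustration of the three steps above, the following is a minimal Python sketch, assuming real-valued self samples held in a NumPy array; the per-dimension 3σ test, the Gaussian density estimate, and the function and parameter names are our own simplifications, not the authors' implementation.

```python
import numpy as np

def optimize_self_set(samples, num_levels=100, unit_radius=0.005):
    """Sketch of the three-step self-set optimization (illustrative only).

    samples: (N, d) array of normalized real-valued self samples.
    Returns the kept samples and their adjusted radii.
    """
    # Step 1: discard wrong samples outside the 3-sigma district (per dimension).
    mu, sigma = samples.mean(axis=0), samples.std(axis=0)
    inside = np.all(np.abs(samples - mu) <= 3.0 * sigma, axis=1)
    samples = samples[inside]

    # Step 2: adjust every sample's radius according to its Gaussian density.
    cov = np.cov(samples, rowvar=False) + 1e-9 * np.eye(samples.shape[1])
    inv_cov = np.linalg.inv(cov)
    diff = samples - samples.mean(axis=0)
    # An unnormalized Gaussian density suffices: only the relative level matters.
    pdf = np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, inv_cov, diff))
    level = (pdf - pdf.min()) / ((pdf.max() - pdf.min()) / num_levels + 1e-12)
    radii = level * unit_radius

    # Step 3: keep a sample only if it is not already covered by a kept sample.
    kept_idx = [0]
    for n in range(1, len(samples)):
        covered = any(
            np.linalg.norm(samples[n] - samples[i]) < max(radii[n], radii[i])
            for i in kept_idx)
        if not covered:
            kept_idx.append(n)
    return samples[kept_idx], radii[kept_idx]
```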
First, self samples chosen from a fixed number of random points that follow the normal
distribution are normalized to compose the self set (198 samples, with the radius fixed
at 0.05). This set therefore contains no wrong samples, and the step 1 optimization is
illustrated in the real network experiment instead. The results of each optimization step
are shown in figure 2.
Fig. 2. (a) The initial self set; (b) the self set optimized by step 2; (c) the self set optimized by step 3; (d) detectors trained by the initial self set; (e) detectors trained by the self set optimized by step 2; (f) detectors trained by the self set optimized by step 3
Figure 2(a) shows the initial self set; the overlapping and the boundary invasion are
clearly severe. The detectors trained by the initial self set are shown in figure 2(d),
where the boundary of the non-self region is visibly not covered completely, which is
the consequence of the self samples' boundary invasion. The optimized self set and its
detector set are shown in the other panels of figure 2 (the unit radius is set to 0.005).
After all steps of the optimization, the number of self samples drops to 67 and the
overlapping(S) drops to 2.71% (computed by equation 3). As the figures show, adjusting
each sample's radius avoids the boundary invasion, so the boundary of the non-self
region is covered well, while the coverage of the self region by the remaining, most
reasonable self samples stays almost the same as before.
In the real network experiment, the self samples come from the network security lab of
HUST (Harbin University of Science and Technology). Each value is 2-dimensional: pps
(packets per second) and bps (bytes per second). We collected 90 samples and drew their
joint probability distribution in figure 3, which shows that the joint distribution is
quite similar to the standardized Gaussian distribution.
Fig. 3. (a) The joint probability distribution of pps and bps; (b) the two-dimensional standardized Gaussian distribution
The results of the optimization are shown in figure 4. The optimized self sets are in
figure 4(a–d), and the corresponding detector sets are in the remaining panels of
figure 4. As shown in figure 4(a), the self set contained 3 wrong self samples far away
from the self region; consequently, the detector set generated by the initial self set
has holes, as shown in figure 4(e). After step 1, the wrong samples were discarded
(figure 4(b)). The results of the other optimization steps are similar to the simulation
experiment above. The final self set has 19 samples.
To compare detector-set generation between the initial self set and the optimized one,
we used the two data sets to generate 3000 detectors each. As figure 5 clearly shows,
detectors are generated faster with the optimized self set than with the initial one.
Fig. 5. The detector generating speed comparison between initial and optimized self set
To compare the detection performance of the two detector sets, we used SYN flooding to
attack two hosts with the different detector sets simultaneously. All results shown in
Table 1 are averages over 100 experiments. From the table we can clearly see that the
detection rate with detectors generated from the optimized self set is much higher than
with the other set (98.9% vs. 83.7%), and the false alarm rate is remarkably lower
(2.3% vs. 8.3%).
Table 1. Detection results (averages over 100 experiments)

Detector Set    Connection    Invasion    Detecting    Detection Rate    False Alarm Rate
By initial      13581653.3    100         92.5         83.7%             8.3%
By optimized    13581653.3    100         101.2        98.9%             2.3%
5 Conclusion
In this paper we have presented the optimization of the real-valued self set for anomaly
detection based on probability theory. Experimental results demonstrate that this
optimization is necessary and that the method is clearly effective. Moreover, the
optimized self set provides a favorable foundation for generating detectors. The detector
set trained with the optimized self set is more reliable because of the advantages of the
optimization algorithm:
1. Holes can be better covered by the generated detectors because of discarding the
wrong self samples.
2. Time to generate detectors is saved by using a smaller number of self samples, which
also require less space to store.
3. The radii of the border self samples are adjusted to a rational range, so the boundary
invasion is resolved and the false alarm rate is noticeably reduced.
The influence of real-valued self set optimization needs further study, including more
real network experiments and rigorous analysis. The meaning of the self radius, i.e., how
to interpret each self sample, is also an important topic to be explored.
Acknowledgment
This work was sponsored by the National Natural Science Foundation of China
(Grant No.60671049), the Subject Chief Foundation of Harbin (Grant
No.2003AFXXJ013), the Education Department Research Foundation of Heilongji-
ang Province (Grant No. 10541044, 1151G012).
A GP Process Mining Approach from a Structural
Perspective
1 Introduction
Today, information about business processes is mostly recorded by enterprise infor-
mation systems such as ERP and workflow management systems in the form of
so-called “event logs” [1]. As processes in many domains are evolving and becoming
more and more complex, there is a need to understand the actual processes based on
these logs. Process mining addresses this need by automatically analyzing the logs to
extract explicit process models. Currently, most process mining techniques try to mine
models from the behavioral aspect only (i.e., to reflect the exact behavior expressed in
the logs) while ignoring the complexity of the mined models. However, as processes
nowadays are becoming more complex, process designers and analysts also want to mine
structurally simple models, since complex processes may result in errors, poor
understandability, defects and exceptions [2]. Therefore, “good” process models are
required to conform to the logs and also to have a simple structure that clearly reflects
the desired behavior [3]. To consider both the behavioral and the structural aspect, we
propose a genetic process mining approach coupled with a process complexity metric.
In fact, utilizing evolutionary computation in process mining research is not a new
concept. In 2006, Alves introduced the genetic algorithm (GA) to mine process
models due to its resilience to noisy logs and the ability to produce novel sub-process
combinations [4]. In this case, an individual is a possible model and the fitness
evaluates how well an individual is able to reproduce the behavior in the log. However,
since the individual was abstracted to the level of a binary string, this approach had
problems when mining certain processes, especially those exhibiting a high level of
parallel execution. Thus, a genetic programming (GP) approach coupled with a graph-based
representation was proposed in [5]. Its abstraction as a directed graph structure also
allows greater efficiency in evaluating individual fitness.
However, there are two main problems with these approaches: first, some relations
between process tasks are not considered in the individual representation; second, they
neglect the complexity of the mined models. We therefore improve the GP approach by
extending the individual representation of [4] and defining a new part of the fitness
measure that benefits individuals with a simpler structure. The new structural fitness is
based on the structuredness metric (SM) proposed in [6], a process complexity metric.
Our paper is organized as follows: Section 2 defines the individual representation and
Section 3 extends the SM. The GP approach is presented in Section 4, with experiments
discussed in Section 5. Finally, conclusions are drawn in Section 6.
As we can see, without support for other business logic, previous GA and GP approaches
may produce unexpected errors. For instance, consider the event log shown in Table 1,
which shows three different process instances for job applicants. When HR screens a
resume (A), he/she either declines it (E) or, when the resume passes, arranges
examinations consisting of an interview (B) and a written test (C) and then scores the
applicant (D). Finally, the result is sent to the applicant (F).
The result model is expected to be Fig. 1(a), where the black rectangle is an implicit
task. However, previous approaches will wrongly generate Fig. 1(b) since the indi-
vidual’s representation in these approaches cannot express the relation of exclusive
disjunction of conjunctions. Therefore, we give an improved version of the causal
matrix that supports other relations, i.e., the extended CM (ECM for short).
Fig. 1. Expected model (a) and mined result using previous GA and GP (b)
An ECM can be denoted as a quadruple (T, C, I, O), where T is a finite set of activities
(tasks) appearing in the workflow logs, C ⊆ T × T is the causality relation, and I and O
are the input and output condition functions respectively.
We choose a tree structure for the condition functions in the ECM because it makes it
feasible to express the OR relation. Moreover, it facilitates operations such as
crossover and mutation, since it allows entire sub-tree structures to be swapped with
ease. Note that there may be duplicate tasks in sub-trees. Like the original CM, our
extended CM can also handle loop structures.
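To make the quadruple (T, C, I, O) concrete, the sketch below (Python, illustrative only) models the condition functions as operator trees; the class and field names are hypothetical and not part of the authors' ProM implementation. The example encodes task A's output condition as XOR(E, AND(B, C)), i.e., the exclusive disjunction of conjunctions discussed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Node:
    """A condition-tree node: an operator over sub-trees or a task leaf."""
    op: str                                   # 'AND', 'OR', 'XOR', or 'TASK'
    children: List['Node'] = field(default_factory=list)
    task: str = ''                            # task label when op == 'TASK'

def leaf(t: str) -> Node:
    return Node(op='TASK', task=t)

@dataclass
class ECM:
    tasks: Set[str]                           # T: tasks appearing in the log
    causality: Set[Tuple[str, str]]           # C ⊆ T x T
    inputs: Dict[str, Node]                   # I: input condition tree per task
    outputs: Dict[str, Node]                  # O: output condition tree per task

# Hypothetical ECM for the job-application example of Table 1 / Fig. 1(a).
ecm = ECM(
    tasks={'A', 'B', 'C', 'D', 'E', 'F'},
    causality={('A', 'B'), ('A', 'C'), ('A', 'E'),
               ('B', 'D'), ('C', 'D'), ('D', 'F'), ('E', 'F')},
    inputs={'D': Node('AND', [leaf('B'), leaf('C')])},
    outputs={'A': Node('XOR', [leaf('E'), Node('AND', [leaf('B'), leaf('C')])])},
)
```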
The GP approach in this paper tries to mine process models not only from the behavioral
perspective but also from the structural one. This is done by defining a new partial
structural fitness. We first need to introduce a process complexity metric based on the
ECM. Many complexity metrics for process models have been proposed; in this paper we
choose the structuredness metric (SM) [6], which focuses on the model structure.
Additionally, both the GA mining algorithm and the SM have been implemented in the
context of ProM, a pluggable framework for process analysis, so some modifications can
be made there to implement the improved GP.
In this section, we extend the Petri net version of the SM in [6] to propose an ECM-based
SM. The idea behind this metric stems from the observation that process models are often
structured as combinations of basic patterns such as sequence, choice, parallelism and
iteration [7]. To calculate the complexity, we define a component as a sub-ECM that
identifies a particular kind of structure in the ECM, and then score each structure by
assigning it a "penalty" value. Finally, the sum of these values measures the complexity
of the process model, i.e., the complexity of the individual in the GA. To make the idea
precise, some definitions are given below.
Firstly, some structural properties based on ECM are discussed.
Definition 1. Free choice. An ECM is a free-choice ECM iff for every two tasks t1 and
t2, I(t1) ∩ I(t2) ≠ ∅ implies I(t1) = I(t2).
Definition 2. State machine. An ECM is a State machine iff for every task t, there exists
no AND/OR in I(t) and O(t).
Definition 3. Marked graph. An ECM is a marked graph iff for every task t, there exists
no XOR in I(t) and O(t).
A component corresponds to a behavioral pattern and is basically a sub-ECM. The penalty
value ρ_ECM(C) assigned to each kind of component C is listed below:

C (Component)                  ρ_ECM(C)
MAXIMAL SEQUENCE               Σ_{t∈T} τ(t)
CHOICE                         1.5 · Σ_{t∈T} τ(t)
WHILE                          Σ_{t∈{I(C),O(C)}} τ(t) + 2 · Σ_{t∈T} τ(t)
MAXIMAL MARKED GRAPH           2 · Σ_{t∈T} τ(t) · diff(T)
MAXIMAL STATE MACHINE          2 · Σ_{t∈T} τ(t) · diff(P)
MAXIMAL WELL STRUCTURED        2 · Σ_{t∈T} τ(t) · diff(P) · diff(T)
otherwise                      5 · Σ_{t∈T} τ(t) · diff(P) · diff(T)
We build the initial causal matrices in a heuristic way that tries to determine the
causality relation using a dependency measure. The dependency measure ascertains the
strength of the relationship between two tasks by counting how often one task is directly
preceded by the other; it is also able to determine which tasks are in a loop.
Once the causality relation is determined, the condition functions I and O are randomly
built: every task t1 that causally precedes a task t2 is randomly inserted as a leaf in
the tree structure corresponding to the input condition function of t2, and operators
(AND, OR and XOR) are also inserted randomly to construct the tree. A similar process is
used to set the output condition function of a task.
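One possible reading of such a dependency measure is sketched below in Python; the normalized formula follows the usual heuristics-mining style and is an assumption, since the exact measure is only referenced, not reproduced, in this excerpt.

```python
from collections import Counter
from typing import List

def dependency_measure(log: List[List[str]], a: str, b: str) -> float:
    """Hypothetical dependency measure between tasks a and b.

    Counts how often a is directly followed by b (and vice versa) in the log
    and normalizes the difference; a value close to 1 suggests a causal
    relation a -> b, a value close to -1 the reverse.
    """
    follows = Counter()
    for trace in log:
        for x, y in zip(trace, trace[1:]):
            follows[(x, y)] += 1
    ab, ba = follows[(a, b)], follows[(b, a)]
    if a == b:                              # length-one loop
        return ab / (ab + 1)
    return (ab - ba) / (ab + ba + 1)

# Small example log with the three job-application traces of Table 1 (assumed).
log = [['A', 'B', 'C', 'D', 'F'], ['A', 'C', 'B', 'D', 'F'], ['A', 'E', 'F']]
print(dependency_measure(log, 'A', 'B'))    # positive: A tends to precede B
```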
The fitness function is used to assess the quality of an individual. Quality is assessed by:
1) benefiting individuals that can parse more event traces in the logs (the "completeness" requirement);
2) punishing individuals that allow for more extra behavior than the one expressed in the logs (the "preciseness" requirement);
3) punishing individuals that are complex, i.e., individuals with a high SM value (the "uncomplexity" requirement).
Equation (1) depicts the fitness function of our GP-based process mining algorithm, in
which L is the event log and ECM is an individual. The notation ECM[] represents a
generation of process models. The three partial fitness measures are detailed below.
In equation (1), the functions PFcomplete and PFprecise, derived from the previous GA
approach, measure completeness and preciseness from the behavioral perspective. Since the
ECM is a generalization of the CM, these functions are nearly the same as those in the
previous GA approach [8].
The function PFcomplete, shown in equation (2), is based on the continuous parsing of all
traces in the event log against an individual. In this equation, all missing tokens and
all extra tokens left behind act as a penalty in the fitness calculation. More details can
be found in [8]. Obviously, parsing of the 'OR' construct should also be considered.
$$PF_{complete}(L, ECM) = \frac{allParseTasks(L, ECM) - punishment(L, ECM)}{numTasksLog(L)} \qquad (2)$$
In terms of preciseness, the individual should contain "less extra behavior", i.e., the
individual should tend to have fewer enabled activities. PFprecise in equation (3)
provides a measure of the amount of extra behavior an individual allows in comparison to
the other individuals in the generation [8].
$$PF_{precise}(L, ECM, ECM[]) = \frac{allEnabledTasks(L, ECM)}{\max(allEnabledTasks(L, ECM[]))} \qquad (3)$$
As mentioned before, the GA approach and most other process mining techniques focus on
the behavior in the logs but ignore the structure of the actual individual. We therefore
define the uncomplexity fitness in equation (4). It transforms the structuredness metric
values to the interval [0, 1], where max and min are the maximum and minimum SM values of
the entire generation of individuals, respectively.
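The following Python sketch shows one way the three partial measures could be combined; since equation (1) itself is not reproduced in this excerpt, the weights and the exact form of the combination are placeholders, not the authors' definition.

```python
def normalize_sm(sm_value: float, sm_min: float, sm_max: float) -> float:
    """Transforms a structuredness-metric value into [0, 1] using the
    generation's minimum and maximum, as described for the 'uncomplexity'
    part of the fitness."""
    if sm_max == sm_min:
        return 0.0
    return (sm_value - sm_min) / (sm_max - sm_min)

def fitness(pf_complete: float, pf_precise: float, sm_norm: float,
            kappa: float = 0.025, gamma: float = 0.1) -> float:
    """Hypothetical combination: completeness is rewarded, extra behavior and
    structural complexity are punished. The weights kappa and gamma are
    placeholders only."""
    return pf_complete - kappa * pf_precise - gamma * sm_norm
```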
4.3 GP Operators
Genetic operators such as selection, crossover and mutation are used to generate
individuals of the next generation, and the operators in [8] have been adapted for our GP
approach. In our case, the next population consists of the best individuals plus others
generated via crossover and mutation, and parents are selected from a generation through
a five-individual tournament selection process.
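A minimal sketch of the five-individual tournament selection, assuming a generic population and a fitness callable (names are ours):

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar('T')

def tournament_select(population: List[T], fitness: Callable[[T], float],
                      size: int = 5) -> T:
    """Sample `size` individuals at random and return the fittest as a parent."""
    contenders = random.sample(population, min(size, len(population)))
    return max(contenders, key=fitness)

# Two parents for crossover would then be chosen as:
# parent1 = tournament_select(generation, fitness_fn)
# parent2 = tournament_select(generation, fitness_fn)
```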
In terms of crossover, a task that exists in both parents is selected randomly as the
crossover point. The input and output trees of the task are then split at a randomly
chosen swap point (a set of tree nodes) for each parent. Our crossover algorithm covers
the complete search space defined by the ECM. The following is an illustration of the
crossover algorithm, which shows the input trees of two parents for task K.
• Crossover algorithm in our GP process mining approach
parentTask1 = AND(XOR(A,B,C),OR(B,D),E)
parentTask2 = OR(AND(A,B),XOR(C,D),F)
swap1 = XOR(A,B,C),OR(B,D); remainder1 = E
swap2 = AND(A,B),F; remainder2 = XOR(C,D)
If crossover occurs, for each branch in swap1:
If the selected branch shares the same operation with a
branch of remainder2 (e.g., XOR(A,B,C) and XOR(C,D) are both
XOR), select one of the following three crossover methods
with equal probability:
i) Add it as a new branch in remainder2
(e.g., XOR(A,B,C),XOR(C,D))
ii) Join it with an existing branch in remainder2
(e.g., XOR(A,B,C,D))
iii) Add it as a new branch, then select the branch
with the same operation in remainder2 and remove the tasks
that are in common (i.e., the same tasks)
(e.g., XOR(A,B,C),D)
Else select one crossover method conditionally.
If the operation of selected branch is the same as the
root of parentTask2
Add as a new leaf of the root in remainder2 for each
task in selected branch
(e.g., XOR(C,D),B,D)
Else add as a new branch in remainder2
Repeat for the combination swap2 and remainder1
Examples are shown in the crossover code above for illustration. After execution, the
resulting child for the input tree of task K may be one of the following:
OR(XOR(A,B,C),XOR(C,D),B,D); OR(XOR(A,B,C,D),B,D); OR(XOR(A,B,C),B,D).
The cycle is repeated for both input and output trees of the selected task.
In mutation, with probability equal to the mutation rate, every task in the individual
has its input and output trees mutated. The following program depicts the mutation.
• Mutate algorithm in our GP process mining approach
If mutation occurs, for each task t in the individual (assuming I(t)
= AND(XOR(A,B,C),OR(B,D),E)), one of the following opera-
tions is performed:
i) Choose a branch and add a task to it (randomly chosen
from the complete set of available tasks in the individual),
if the branch is a leaf (i.e., one task), randomly add an
operation on the new branch.
(e.g., XOR(A,B,C),OR(B,D,F),E)
ii) Remove a task from a chosen branch
(e.g., XOR(A,B,C),OR(B,D))
iii) Change the operation for the chosen branch
(e.g., XOR(A,B,C),XOR(B,D),E)
iv) Redistribute the elements in I(t)
(e.g., XOR(B,C),OR(A,B),D,E)
Repeat for the output tree of each task
Both crossover and mutation operators utilize a repair routine executed after an in-
put/output has been changed to make sure that only viable changes are made.
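To make the tree-based mutation concrete, the sketch below applies operation iii) (changing the operator of a randomly chosen branch) to an input tree encoded as nested tuples; the encoding and function name are ours and purely illustrative.

```python
import random

# A branch is encoded as (operator, [children]); a task leaf is just its label.
tree = ('AND', [('XOR', ['A', 'B', 'C']), ('OR', ['B', 'D']), 'E'])

def mutate_change_operator(branch):
    """Operation iii): replace the operator of one randomly chosen sub-branch.
    If a task leaf is picked, the tree is returned unchanged in this sketch."""
    op, children = branch
    idx = random.randrange(len(children))
    child = children[idx]
    new_children = list(children)
    if isinstance(child, tuple):                      # an inner branch
        new_op = random.choice([o for o in ('AND', 'OR', 'XOR') if o != child[0]])
        new_children[idx] = (new_op, child[1])
    return (op, new_children)

print(mutate_change_operator(tree))
# e.g. ('AND', [('XOR', ['A','B','C']), ('XOR', ['B','D']), 'E'])
```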
5 Experiments
We have implemented our GP process mining approach in Java and plugged it into the ProM
framework. The experiments in this section allow us to validate the approach. Table 4
shows the parameters used in the experiments.
Fig. 3. Two mined models (a) using previous GA/GP approach; (b) using our GP approach
Fig. 4. Another two mined models (a) using previous GA/GP approach; (b) using our GP approach
To compare with the previous GA/GP approaches, we have tested various workflow logs; for
brevity, we give two typical comparisons. Fig. 3(b) is the expected model, and our GP
approach successfully mined it, while the previous approaches produced a confusing model,
because our GP approach supports the task relation of exclusive disjunction of
conjunctions. Fig. 4 shows the two mined results for a process with short parallel
constructs; it illustrates that the structurally uncomplex fitness helps our GP method
outperform the others when dealing with process models containing short parallel
constructs.
6 Conclusions
The advantage of our GP process mining approach is that it mines processes both
behaviorally and structurally. This is owed to the novel idea of combining genetic
programming with the structuredness metric. Additionally, the extended CM used for
representing individuals provides benefits in crossover and mutation. As a result, our GP
approach outperforms other process mining approaches in several respects.
References
[1] van der Aalst, W.M.P., van Dongen, B.F., Herbst, J., Maruster, L., Schimm, G., Weijters,
A.J.M.M.: Workflow mining: a survey of issues and approaches. Data & Knowledge En-
gineering 47(2), 237–267 (2003)
[2] Cardoso, J.: Control-flow complexity measurement of processes and Weyuker’s properties.
Transactions on Enformatika, Systems Sciences and Engineering 8, 213–218 (2005)
[3] Rozinat, A., van der Aalst, W.M.P.: Conformance Testing: Measuring the Fit and Appro-
priateness of Event Logs and Process Models. In: Bussler, C.J., Haller, A. (eds.) BPM 2005.
LNCS, vol. 3812, pp. 163–176. Springer, Heidelberg (2006)
[4] Alves de Medeiros, A.K., Weijters, A.J.M.M.: Genetic Process Mining. Ph.D Thesis,
Eindhoven Technical University, Eindhoven, The Netherlands (2006)
[5] Turner, C.J., Tiwari, A., Mehnen, J.: A Genetic Programming Approach to Business
Process Mining. In: GECCO 2008, pp. 1307–1314 (2008)
[6] Lassen, K.B., van der Aalst, W.M.P.: Complexity metrics for Workflow nets. Information
and Software Technology 51, 610–626 (2009)
[7] van der Aalst, W.M.P., ter Hofstede, A.H.M., Kiepuszewski, B., Barros, A.P.: Workflow
patterns. Distributed and Parallel Databases 14(1), 5–51 (2003)
[8] Alves de Medeiros, A.K., Weijters, A.J.M.M., van der Aalst, W.M.P.: Genetic process
mining: an experimental evaluation. Data Mining and Knowledge Discovery 14(2),
245–304 (2007)
Effects of Diversity on Optimality in GA
1 Introduction
Genetic Algorithm (GA) is an Evolutionary Computation (EC) method proposed by
Holland [1] which searches the binary space for solutions that satisfy pre-defined
criteria. GA is a heuristic search method that contains three operators: Selection,
Combination and Mutation. It operates on a collection of individuals (populations). A
population contains individuals which have the potential to differ from each other.
The GA works by identifying individuals which most closely satisfy pre-defined
criteria, called selection. These selected individuals (the fittest) are combined. Each
combination cycle is called a generation. To create new individuals, the combined
individuals must differ. The amount of difference in the population is called its diver-
sity. Combination occurs by swapping parts between the individuals. The combination
methods can vary significantly [2-6]. Goldberg [3] suggested Single Point Crossover
(SPC) and Multi-Point Crossover (MPC) as pattern identifying crossover methods.
Punctuated Crossover [4] is a deterministic crossover method which alters the prob-
ability based on the success of previous crossover points. Random Respectful Cross-
over [5] produces offspring by copying the parts at positions where the parents are
identical and filling the remaining positions with random members from the bit set.
Disrespectful Crossover (DC) [6] is a variant on SPC, where the bits in the section
below the crossover point are inverted if they are different or have a 50% chance of
being in either state if they are the same. DC is essentially a localized high probability
mutation algorithm incorporated into a crossover function. Its purpose is in encourag-
ing search divergence, not search convergence. Reduced Surrogate [4] randomly
selects bits which are different as the crossover points for single or double point
crossover. This has the advantage of not performing redundant crossovers and is a
more efficient crossover as search diversity decreases. Hoshi [2] showed that encour-
aging similarly ranked individuals to combine resulted in superior solutions.
If the combination of two different individuals can lead to all individuals in the
search space then it is possible that the globally optimal solution will be found. How-
ever, in GA, like in biology, Combination and Selection are what makes the search
converge, as traits in individuals become more common after successive generations.
It is believed that these traits are what make the individuals ‘fit’. However these
‘traits’ could be sub-optimal. If this is the case, a population will fail to continually
improve and will have reached a locally optimal solution. This means that the GA
may not produce an optimal solution for the problem, i.e., the global optima.
To encourage the population to avoid local optima, individuals are altered in a
probabilistic manner, called mutation [3]. This is modeled on biological mutation,
which is the alteration of individual genes within an organism's DNA. Commonly,
mutation is implemented in GA by probabilistically changing bit states, whereby
every single bit in an individual has a specific probability of changing state. This type
of mutation is also known as De-Jong mutation [7].
The effectiveness of De-Jong mutation has proven to be dependent on the mutation
probability and the effectiveness of a mutation probability has been shown to be prob-
lem dependent [8-13]. The mutation probability affects the probability of an individ-
ual being mutated as well as the number of bits mutated within that individual. A high
mutation probability means that an individual has a high chance of having a large
number of bits altered; this results in a highly random search which mostly ignores
convergence through crossover. A low mutation probability means that an individual
has a low chance of having a small number of bits altered; this creates a search which
is highly reliant on combination.
Searches which ignore combination do not converge. Searches which are entirely
reliant on combination are prone to premature convergence. Mutation alone amounts
to a random search and forever diverges. Mutation when combined with selection and
combination is a parallel, noise tolerant, hill-climbing algorithm [8]. Hill-climbing
algorithms guarantee the location of a local optimum, but will be deceived by a prob-
lem with many local optima. Methods of determining the mutation probability based
on generation [9], population size [10], length [11], crossover success and the rate of
change in fitness have been investigated [12]. A method which prolongs the GA
search time by systematically varying the mutation rate based on population diversity
has significantly improved the final solution optimality [13]. These mutation strate-
gies have improved the GA in a problem-independent fashion. However, they do not
significantly change the ideology behind mutation. They do not address the search
algorithm itself; they primarily present methods of altering one of the algorithm's
parameters in order to prolong the search or to disrupt convergence more reliably. This
is unlikely to make the search more efficient or more robust, but rather just more
exhaustive. They also do not allow convergence and divergence to be complementary.
In this paper a new mutation method is introduced to improve the diversity of the
population during convergence. The method is also intended to ensure that convergence is
not disrupted by the mutation process.
The remainder of the paper is organized as follows: Section 2 details the methodology
used to maintain diversity while ensuring convergence. Numerical results are shown in
Section 3, and conclusions are given in Section 4.
$$D = \frac{n}{2} - \left|\frac{n}{2} - d\right| \qquad (2)$$
Mutation traditionally requires the specifying and tuning of the rate at which mutation
affects the population. The mutation rate for the CDM and the implemented version
of RI was determined by the number of repeated individuals in the population. There-
fore, this method requires the comparison of all individuals in the population.
The computational cost of this comparison is:
$$(N^2/2 - N) \times L \qquad (3)$$
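A sketch of the pairwise comparison that this count implies is given below (Python); how CDM and RI then map the repeat count to a mutation rate is not specified in this excerpt, so the final comment is only a guess.

```python
from typing import List

def repeated_individuals(population: List[str]) -> int:
    """Counts individuals (bit strings of length L) that duplicate an earlier
    individual; this pairwise comparison is the operation whose cost grows
    roughly as (N^2 / 2 - N) * L, as stated in equation (3)."""
    repeats = 0
    for n, ind in enumerate(population):
        if any(ind == other for other in population[:n]):
            repeats += 1
    return repeats

pop = ['0101', '1100', '0101', '1111', '1100']
print(repeated_individuals(pop))   # 2 duplicates
# A mutation rate could then be derived from the fraction of repeats, e.g.
# rate = repeated_individuals(pop) / len(pop)  -- an assumption, not the
# authors' exact mapping.
```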
3 Results
The GA testbed used ranking selection without elitism and was tested on two popula-
tion sizes, 50 and 100, over 20 generations. The tests were conducted using both sin-
gle point and dual point crossover. The mutation methods were tested on six (6)
benchmark functions with various shapes: Two Multi-Modal functions, two single
optima functions and two flat functions. The functions were tested over 3 to 5 dimen-
sions with 6 to 12 Bit resolution per dimension. The mutation methods were com-
pared with respect to diversity, and the final fitness. The results were averaged over
100 tests.
The following six (6) benchmark functions [16] are used in this paper to evaluate the
effectiveness of the proposed CDM. They are:
Two single-optimum functions:
4. Griewank’s Function:
$$f(x) = \frac{1}{4000}\sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n} \cos\!\left(\frac{x_i}{\sqrt{i}}\right) + 1 \qquad (7)$$
6. Rosenbrock’s Function:
$$f(x) = \sum_{i=1}^{n-1}\left(100\,(x_{i+1} - x_i^2)^2 + (x_i - 1)^2\right) \qquad (9)$$
The GA results, in terms of the mean final fitness, are shown in Tables 1–6 for the six
benchmark functions above. The results were obtained using different mutation methods:
the proposed CDM method, the completely random RI method, and the De-Jong method with
mutation rates of 0.001, 0.002, 0.005 and 0.
Crossover points / no. of bits   CDM   RI   DJ 0.001   DJ 0.002   DJ 0.005   None
Single point/6-Bit 0.38 0.86 1.21 0.62 0.16 1.89
Dual point/ 6-Bit 1.56 3.24 2.86 2.14 0.92 4.16
Single point/ 12Bit 0.3 0.39 0.54 0.36 0.07 0.9
Dual point/ 12Bit 1.58 2.11 1.79 1.26 0.5 2.56
Crossover points / no. of bits   CDM   RI   DJ 0.001   DJ 0.002   DJ 0.005   None
Single point/ 6-Bit 0.1 0.18 0.24 0.12 0.01 0.62
Dual point/ 6-Bit 0.07 0.08 0.07 0.04 0.01 0.2
Single point/ 12Bit 0.81 1.77 1.31 0.77 0.1 3.22
Dual point/ 12Bit 0.66 0.9 0.65 0.27 0.03 1.43
Crossover points / no. of bits   CDM   RI   DJ 0.001   DJ 0.002   DJ 0.005   None
Single point/ 6-Bit 1.66 2.34 4.56 4.15 4.04 4.89
Dual point/ 6-Bit 1.59 1.89 2.98 3.04 2.69 3.3
Single point/ 12Bit 6.55 9.76 10.56 9.74 7.38 15.81
Dual point/ 12Bit 5.34 6.06 5.91 5.76 4.72 7.27
Crossover points / no. of bits   CDM   RI   DJ 0.001   DJ 0.002   DJ 0.005   None
Single point/ 6-Bit 0.04 0.05 0.07 0.05 0.03 0.11
Dual point/ 6-Bit 0.04 0.03 0.05 0.04 0.03 0.07
Single point/ 12Bit 0.08 0.09 0.09 0.07 0.05 0.16
Dual point/ 12Bit 0.06 0.06 0.07 0.05 0.04 0.09
Crossover points / no. of bits   CDM   RI   DJ 0.001   DJ 0.002   DJ 0.005   None
Single point/6-Bit 8.9 29.6 54.3 35.4 19.4 115
Dual point/6-Bit 11.1 17.9 22.7 16.4 10.6 42
Single point/12Bit 294 448 430 136 55.1 1419
Dual point/12Bit 165 257 149 84.2 33.8 305
Crossover points / no. of bits   CDM   RI   DJ 0.001   DJ 0.002   DJ 0.005   None
Single point/6-Bit 0.01 0.01 0.07 0.06 0.05 0.09
Dual point/6-Bit 0.01 0.01 0.05 0.05 0.04 0.05
Single point/12Bit 0.01 0.03 0.1 0.06 0.03 0.17
Dual point/12Bit 0.03 0.02 0.05 0.03 0.04 0.11
From these results it can be seen that RI produces worse results than the CDM method.
This is expected, as CDM is designed to encourage diversity whilst still maintaining
population convergence.
The more interesting aspects of the results are observed in the fitness results of CDM
for the two single-optimum functions (Tables 1-2) in comparison with the De-Jong methods.
CDM does not generate better results than the normal probabilistic mutation methods. We
believe this is because single-optimum solutions can be found by hill-climbing methods, a
class into which the De-Jong method falls; the normal mutation method therefore generates
acceptable results.
The CDM in general will generate better or similar results for the multi-optima func-
tions (Tables 3-4). In these cases, as the CDM is designed to encourage diversity, it is
likely to generate better, or at least no worse results than the other mutation methods.
In dealing with the flatter-optimum functions (Tables 5-6), the CDM's final results are
comparable to those of the normal mutation methods. This is expected, as on flatter
surfaces diversity does not have a significant impact on the outcomes.
It is also noted in the computation that the CDM in general performs better in deal-
ing with simpler cases, i.e., with single cross-over point and with shorter bit length in
population. CDM performed relatively worse with dual point crossover (DPC) be-
cause DPC is better than single point crossover (SPC) in maintaining population di-
versity. It is believed that CDM performed relatively worse when tested with longer
individuals because the search space became too large for the given population size to
be effective.
During the computation, it is noted that the diversity varies significantly with different
mutation rates and mutation methods. It was also found that it varies between bench-
mark functions. However, the most significant effect on diversity came with search
space size, which is a combination of parameter number and parameter resolution.
Shown in Fig. 1 are the average population Hamming distances for all tests at 6-Bit
and 12-Bit parameter resolutions.
It is clear that there are marked differences in search space sizes between 6-Bit and
12-Bit parameter resolutions. The average distance between all solutions at the begin-
ning for a 6-Bit 5-dimensional case is 12 Bits while it is 25 Bits for the 12-Bit case.
In all tests the diversity measurement was made after the crossover. This means
that only the useful diversity created through the mutation method is recorded, i.e.,
when the mutation method creates individuals with above average fitness and a larger
bit difference, then it will improve the diversity of the next generation. If mutation
does not create fitter individuals with a larger distance from the other relatively fit
individuals then it will not improve the next generation’s diversity.
Figure 1 shows the generational diversity summaries for the 6-Bit and 12-Bit tests.
In the figure, CDM refers to Coherent Diversity Maintenance mutation. RI is for Ran-
dom Immigrants mutation and DJ .001 – DJ .005 refer to the De-Jong Mutation with
the stated probability.
It can be seen from the figure that the diversity levels of CDM and RI are similar
for both parameter resolutions. However CDM consistently creates higher population
diversity during the mid generations, whilst tapering off at the final generations. The
lower diversity relative to RI in the final generations is to be expected as CDM only
creates individuals which are coherent with the converging search space whereas RI
disregards the convergence of the search space.
RI and CDM encourage roughly equivalent diversity levels for all tests but pro-
duce significantly different final fitness results, thus one can conclude that there is an
inconsistent link between diversity and superior convergence. As RI encourages new
individuals to come from the entire search space and CDM encourages new indi-
viduals to come from the converging search space one can conclude that specific
diversity gives better results than unspecific diversity. This occurs because as con-
vergence progresses, CDM is able to provide crossover with a more representative
sample of the converging space which enables it to better select the next generation
of individuals.
4 Conclusions
In this paper a new mutation scheme, named Coherent Diversity Maintenance (CDM), is
introduced. This method is designed to encourage diversity in the GA population while
maintaining the convergence trend. It is expected to explore the converging parameter
space more fully, thereby yielding better convergence results. Numerical results have
shown that CDM tends to generate better, or at least no worse, results than the normal
probabilistic mutation method and the completely random mutation method for multi-optimum
and flatter-optimum problems.
References
1. Holland, J.H.: Adaptation in Natural and Artificial Systems: An Introductory Analysis with
Applications to Biology, Control and Artificial Intelligence. MIT Press, Cambridge (1992)
2. Chakraborty, G., Hoshi, K.: Rank Based Crossover – A new technique to improve the
speed and quality of convergence in GA. In: Proceedings of the 1999 Congress on Evolu-
tionary Computation, CEC 1999, vol. 2, p. 1602 (1999)
3. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley, Reading (1989)
4. Dumitrescu, D., Lazzerini, B., Jain, L.C., Dumitrescu, A.: Evolutionary Computation.
CRC Press, Boca Raton (2000)
5. Radcliff, N.J.: Forma Analysis and Random Respectful Recombination. In: Proceedings of
the fourth International Conference on Genetic Algorithms (1991)
6. Watson, R.A., Pollack, J.B.: Recombination Without Respect: Schema Combination and
Disruption in Genetic Algorithm Crossover. In: Proceedings of the 2000 Genetic and Evo-
lutionary Computation Conference (2000)
7. De Jong, K.A.: Analysis of the Behaviour of a class of Genetic Adaptive Systems. Techni-
cal Report, The University of Michigan (1975)
8. Hoffmann, J.P.: Simultaneous Inductive and Deductive Modeling of Ecological Systems
via Evolutionary Computation and Information Theory. Simulation 82(7), 429–450 (2006)
9. Fogarty, T.C.: Varying the Probability of mutation in the Genetic Algorithm. In: The Pro-
ceedings of the Third International Conference on Genetic Algorithms 1989, pp. 104–109
(1989)
10. Hesser, J., Manner, R.: Towards an optimal mutation probability for Genetic Algorithms.
In: Schwefel, H.-P., Männer, R. (eds.) PPSN 1990. LNCS, vol. 496, pp. 23–32. Springer,
Heidelberg (1991)
11. Back, T.: Optimal Mutation Rates in Genetic Search. In: Proceedings of the Fifth Interna-
tional Conference on Genetic Algorithms, June 1993, pp. 2–8 (1993)
12. Davis, L.: Adapting Operator Probabilities in Genetic Algorithms. In: Proceedings of the
Third International Conference on Genetic Algorithms, pp. 61–69 (1989)
13. Ursem, R.K.: Diversity-Guided Evolutionary Algorithms. In: Guervós, J.J.M., Adamidis,
P.A., Beyer, H.-G., Fernández-Villacañas, J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS,
vol. 2439, pp. 462–471. Springer, Heidelberg (2002)
14. Janikow, C.Z., Michalewicz, Z.: An experimental comparison of binary and floating point
representations in genetic algorithms. In: Belew, R.K., Booker, J.B. (eds.) Proceedings of
the Fourth International Conference Genetic Algorithms, pp. 31–36. Morgan Kaufmann,
San Mateo (1991)
15. Gasieniec, L., Jansson, J., Lingas, A.: Efficient Approximation Algorithms for the Ham-
ming Centre Problem. Journal of Discrete Algorithms 2(2), 289–301 (2004)
16. Kwok, N.M., Ha, Q., Liu, D.K., Fang, G., Tan, K.C.: Efficient Particle Swarm Optimiza-
tion: A Termination Condition Based on the Decision-making Approach. In: 2007 IEEE
Congress on Evolutionary Computation, Singapore (2007)
Dynamic Crossover and Mutation Genetic Algorithm
Based on Expansion Sampling
Abstract. The traditional genetic algorithm easily gets trapped in local optima, and its
convergence rate is not satisfactory. This paper therefore proposes an improvement that
uses dynamic crossover and mutation rates in cooperation with expansion sampling to solve
these two problems. Expansion sampling means that when a new generation is created, the
new individuals must compete with the old generation, and the better half are selected
into the next generation. Several experiments were performed to compare the proposed
method with other improvements, and the results are satisfactory: they show that the
proposed method is better than the other improvements in both precision and convergence
rate.
1 Introduction
The traditional genetic algorithm uses a fixed crossover-rate operator. Because the
crossover rate is the same for all individuals, all individuals of the current generation
are retained with the same probability in the crossover operation. The currently better
individuals are therefore selected several times in the next round of selection, while
the relatively poor individuals of the current generation are eliminated, so the
population quickly evolves towards the current optimal individual. If the current optimal
individual is a local optimum, the entire algorithm can easily fall into a local optimum.
To avoid this situation and increase the diversity of the population, this paper presents
a dynamic crossover rate, namely the ratio between the fitness distance of two
chromosomes and the fitness distance between the largest- and smallest-fitness
individuals in the population:
$$P_c = \frac{|f(a) - f(b)|}{\max(f) - \min(f)} \qquad (1)$$
where f(a) is chromosome a's fitness, f(b) is chromosome b's fitness, max(f) and min(f)
are the largest and smallest fitness values in the population respectively, and Pc is the
crossover probability. Such a crossover rate gives individuals in the middle of the
population a greater chance of being retained, and individuals at both ends of the
population a greater probability of crossing over, so that both the better and the poorer
individuals are changed, avoiding overly intense competition in the next round of
selection. The genes of poor individuals can then also contribute to the development of
the population.
The principle of this improvement is a simple problem of uniform-distribution
probability, in which the crossover probability of a is taken with respect to the uniform
distribution of f(b). To illustrate the principle, assume max(f) is 2, min(f) is 0, and
f(b) is uniformly distributed on the interval [0, 2]. If an individual a1 lies in the
middle of the interval, i.e., f(a1) is 1, then its expected crossover rate with b is 1/4;
if an individual a2 lies at either end of the interval, i.e., f(a2) is 0 or 2, then its
expected crossover rate with b is 1/2, larger than that of individuals in the middle. The
distribution is shown in Figure 1 below (x-axis: f(a) values; y-axis: crossover rate):
In addition, this algorithm's dynamic crossover probability mostly produces effective
crossovers. We call crossover between two distant individuals an effective crossover:
because the two differ relatively much, the crossover changes them considerably. In
contrast, crossover between very close chromosomes changes the individuals very little,
leaving them almost unchanged, so such a crossover is ineffective. The dynamic rate
therefore improves the effect of the crossover operation and effectively avoids
inbreeding and premature convergence. Hence, although the crossover probability in this
paper is lower than in other genetic algorithms, the overall effect is better, and the
lower crossover rate reduces the computational complexity and accelerates the evolution.
$$p_m = k \cdot \left(\frac{f(a)}{\max(f)}\right)^{2}, \qquad 0 < k < 1 \qquad (2)$$
where f(a) is the fitness of individual a, max(f) is the largest fitness, Pm is the
mutation probability, and k is a parameter valid in the interval (0, 1). This mutation
rate gives better individuals a larger mutation probability, in order to prevent the
better individuals from rapidly occupying the entire population and to lead the
population towards diversification. The distribution of the mutation rate is shown in
Figure 2 below (x-axis: f(a) values; y-axis: mutation rate):
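For reference, equations (1) and (2) transcribe directly into the following two helper functions (a straightforward Python sketch; the guard against a zero fitness range is our addition):

```python
def dynamic_crossover_rate(f_a: float, f_b: float,
                           f_max: float, f_min: float) -> float:
    """Equation (1): crossover probability proportional to the fitness distance
    between two chromosomes, normalized by the population's fitness range."""
    if f_max == f_min:          # degenerate population; guard added here
        return 0.0
    return abs(f_a - f_b) / (f_max - f_min)

def dynamic_mutation_rate(f_a: float, f_max: float, k: float = 0.5) -> float:
    """Equation (2): mutation probability grows quadratically with relative
    fitness, so fitter individuals mutate more often (0 < k < 1)."""
    return k * (f_a / f_max) ** 2
```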
At the same time, this paper uses expansive optimal sampling: the new individuals are
pooled with the previous generation, and the best half of the pooled individuals are
chosen for the next generation. The traditional genetic algorithm puts the new
individuals directly into the next generation, which leaves the older, better individuals
that have been crossed or mutated no chance of entering future generations and thus slows
down the convergence of the population. Expansive optimal sampling lets those better
individuals enter the next generation and thus rapidly accelerates convergence.
The traditional genetic algorithm uses a fixed mutation rate. As a result, when the
algorithm gets stuck in a local optimum, it is hard to jump out because the mutation
probability is relatively small; and even when an individual does jump out, it may be
useless because it is not selected in the next selection operation. To avoid this, this
paper combines the dynamic mutation rate with expansive optimal sampling. The dynamic
mutation rate makes it easier for the algorithm to jump out of a local optimum, and
expansive optimal sampling ensures that these mutated, excellent individuals are selected
into future generations, thereby greatly reducing the probability of getting stuck in a
local optimum. Given enough evolution, in theory, the genetic algorithm will not remain
in a local optimum.
4 Algorithm Processes
The procedures of the proposed method are as follows:
1) Initialize the control parameters: set the population size N, the evolution genera-
tion g;
2) g = 0, randomly generate a population of N individuals Po = (x1, x2, ...);
3) Determine the fitness function, calculate the value of individual fitness f(xi), (i =
1,2, ..., N), to evaluate the individuals in population;
4) Determine the selection strategy: this paper mainly uses roulette-wheel selection to
select advantageous individuals from the population Pg of generation g to form the mating
pool;
5) Determine the dynamic crossover operator: compute the crossover rate of each pair of
individuals through Pc = |f(a) − f(b)| / (max(f) − min(f)), and then perform the
crossover operation to produce intermediate results;
6) Determine the dynamic mutation operator: use Pm = k·(f(a)/max(f))² to compute each
individual's mutation rate and perform the mutation operation (a sketch of one full
generation, combining these steps with the expansion sampling of Section 3, is given
below).
5 Simulation Experiments
5.1 Experiment 1
In this paper, the first experiment uses a typical complex optimization function:
$$y = x \cdot \sin(10\pi x) + 2, \qquad x \in [-1, 2] \qquad (3)$$
We use the improved mutation operator genetic algorithm (IMGA) of paper [7] and the fuzzy
adaptive genetic algorithm (FAGA) of paper [2] for comparison with this paper's dynamic
crossover and mutation genetic algorithm based on expansion sampling (ESDGA), simulating
in Matlab 7. We vary the parameters of the genetic algorithms in the experiment in order
to better observe the performance of the algorithms. In the experiment, N is the
population size and Pc is the crossover probability. In the improved mutation operator
genetic algorithm, the gene discerption proportion is L = 0.6 and the other parameters
are a1 = 0.01, b1 = 0.01, a2 = 0.01, b2 = 0.05, α = 0.03, g0 = 40. The evolution runs for
500 generations, and each parameter setting is repeated 50 times. Min and Max are the
minimum and maximum optimal solutions over the 50 runs, Avg is the average, Gen is the
number of generations to convergence, and local is the number of runs that converged to a
local optimum. The results are shown in Table 1, Table 2 and Table 3 (Pc is not used in
FAGA and ESDGA; k is used only in ESDGA):
From the tables it is clear that IMGA needs about 130 iterations to converge, with on
average about 12% of runs ending in a local optimum. FAGA needs on average about 20
iterations to converge, with 10% local optima. ESDGA needs only about 12 iterations to
converge, and there is no local optimum among the total of 200 results, better than the
other two algorithms. We also point out that for this paper's algorithm many experiments
were run to select the k value; the tables only list representative values. As the
results show, the algorithm performs better when k is above 0.3.
Speed and accuracy are conflicting goals in genetic algorithms. If the algorithm requires
high speed and evolves only a few generations, the accuracy of the results will not be
satisfactory, and the run will most likely either not converge or converge to a local
optimum. If the algorithm evolves indefinitely, it must finally converge to the global
optimum, provided one exists. Here we use the latter setting to compare the performance
of the proposed algorithm under different k values in Experiment 1; that is, the
algorithm evolves until the population converges to the global optimum. We set the
population size N = 100 and choose different k values in the valid interval (0, 1). The
result is shown in Figure 3 below (x-axis: k values; y-axis: convergence generations):
From the figure we can clearly see that the bigger k is, the better the performance of
the algorithm; 0.3 is a dividing line, and there is no large difference above 0.6. This
result is consistent with the theory discussed earlier.
5.2 Experiment 2
In the function, a = 3.0 and b = 0.05; the maximum value of the function is 3600 at
x = 0, y = 0. There are four local maximum points, (−5.12, 5.12), (−5.12, −5.12),
(5.12, 5.12) and (5.12, −5.12), where the function value is 2748.78.
This is a typical GA-deceptive problem. Here we again use the improved mutation operator
genetic algorithm (IMGA) of paper [7] and the fuzzy adaptive genetic algorithm (FAGA) of
paper [2] for comparison with this paper's dynamic crossover and mutation genetic
algorithm based on expansion sampling (ESDGA) in Matlab 7. The evolution again runs for
500 generations, and each parameter setting is repeated 50 times. Min and Max are the
minimum and maximum optimal solutions over the 50 runs, Avg is the average, Gen is the
number of generations to convergence, and local is the number of runs that converged to a
local optimum. The results are shown in Table 4, Table 5 and Table 6 (Pc is not used in
FAGA and ESDGA; k is used only in ESDGA):
As can be seen from the tables, IMGA and FAGA produce a few local optima; FAGA needs
about 50 iterations to converge and IMGA about 270. ESDGA, however, produced no local
optima, and its convergence is also very fast, about 20 generations on average, better
than IMGA and FAGA.
We then compare the performance of the proposed algorithm under different k values in
Experiment 2, in the same way as in Experiment 1. We again set the population size
N = 100 and choose different k values in the valid interval (0, 1). The result is shown
in Figure 4 below (x-axis: k values; y-axis: convergence generations):
The conclusion is similar to that of Experiment 1, albeit with a few differences: the
bigger k is, the better the performance of the algorithm.
6 Conclusion
The simple genetic algorithm suffers from slow convergence and easily gets trapped in
local optima, which limits its application. To address these two largest defects, this
paper proposed an improved method. As the experiment results show, this paper's dynamic
crossover and mutation genetic algorithm based on expansion sampling alleviates both
defects, converging faster and avoiding local optima.
References
1. Kalyanmoy, D., Karthik, S., Tatsuya, O.: Self-adaptive simulated binary crossover for real-
parameter optimization. In: Genetic and Evolutionary Computation Conference, pp. 1187–
1194 (2007)
2. Huang, Y.P., Chang, Y.T., Sandnes, F.E.: Using Fuzzy Adaptive Genetic Algorithm for
Function Optimization. In: Annual meeting of the North American Fuzzy Information Proc-
essing Society, June 3-6, pp. 484–489. IEEE, Los Alamitos (2006)
3. Zhong, W.C., Liu, J., Xue, M.Z., Jiao, L.C.: A Multi-agent Genetic Algorithm for Global
Numerical Optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part
B 34(2), 1128–1141 (2004)
4. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multi-objective genetic
algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197
(2002)
5. Xiang, Z.Y., Liu, Z.C.: Genetic algorithm based on fully adaptive strategy. Journal of Cen-
tral South Forestry University 27(5), 136–139 (2007)
6. Li, Y.Y., Jiao, L.C.: Quantum clone genetic algorithm. Computer Science 34(11), 147–149
(2007)
7. Li, L.M., Wen, G.R., Wang, S.C., Liu, H.M.: Independent component analysis algorithm
based on improved genetic algorithm. Journal of System Simulation 20(21), 5911–5916
(2008)
Multidisciplinary Optimization of Airborne Radome
Using Genetic Algorithm
1 Introduction
Airborne radomes are often used to protect antennas from a variety of environmental
and aerodynamic effects. The design of a high-performance airborne radome is a
challenging task, as aerodynamic, electromagnetic, structural-mechanical and other
requirements are generally involved, and the performance of the radome in different
disciplines is usually in conflict. A thin, light-weight radome with excellent
electromagnetic transmission is apparently structurally unreliable, while a well-designed
radome structure with high stiffness and stability will almost certainly have poor
electromagnetic performance. The radome design is hence a
multidisciplinary procedure because the analyses of different disciplines are tightly
coupled. However, the traditional engineering design approach separates the design
procedure into sequential stages. A design failure at certain stage would cause the
whole design to start from scratch. This will result in a tremendous cost of design
cycle and resources.
The great improvement of the optimization theory and algorithms in the past several
decades has provided an efficient solution for radome design. The optimization procedure
can simultaneously take into account the radome characteristics in different disciplines and
searches for an optimal design with all design requirements to be well-balanced.
In the earlier researches, the techniques of simulating the electromagnetic charac-
teristics of radomes were well developed, which can be classified into two kinds: (1)
high-frequency methods, such as the Ray Tracing (RT) [1] technique based on Geo-
metric Optics (GO), Aperture Integration-Surface Integration (AI-SI) [2] and Plane
Wave Spectrum-Surface Integration (PWS-SI) [3] based on Physical Optics (PO); (2)
low-frequency methods, such as the Finite Element Method (FEM) [4] and Method of
Moment (MoM) [5]. Generally, the high-frequency methods are more suitable for
electrically large problems because of high computational efficiency, while the low-
frequency methods can provide more accurate analysis with much higher computa-
tional complexity. Since the iteration strategy with large computing cost is used in the
optimization methods, the high-frequency methods are superior to low-frequency
methods for electromagnetic analysis of the radome optimization.
Even though research on radome optimization was pioneered early using a simulated
annealing technique [6], it is only recently that more advanced optimization algorithms
have been used and have promoted the development of radome design. For example, particle
swarm optimization and the genetic algorithm have been applied to
radome optimization. The layered thickness of the sandwich radome wall and the shape
of the radome are optimized to maximize the overall transmission coefficient for the
entire bandwidth [7] or to minimize the boresight error [8]. A multi-objective optimiza-
tion procedure was further proposed to optimize the boresight error and power transmit-
tance simultaneously using the genetic algorithm and RT method [9]. However, these
researches on radome optimization are limited to the electromagnetic characteristics.
Recently, the Multidisciplinary Radome Optimization System (MROS) [10] is proposed
as a synthesis procedure, in which the material selection, structural analysis, probabilis-
tic fracture analysis and electromagnetic analysis are incorporated to perform the mul-
tidisciplinary optimization. This work indicates a new trend of radome optimization
which will be more applicable for practical engineering designs.
In this paper, the structural and electromagnetic characteristics are considered si-
multaneously to perform the radome design. A multidisciplinary optimization proce-
dure is developed based on the finite element model of the radome. The structural
analysis and electromagnetic analysis are carried out to obtain the structural failure
indexes and the overall transmission coefficient, respectively. The genetic algorithm
is employed for the optimization.
Normally, the Finite Element Method (FEM) and the Physical Optics (PO) method are
the preferred approaches for the structural analysis and the electromagnetic analysis
of the radome, respectively. However, in the traditional design scheme illustrated in
Fig. 1, these analyses, as well as the modeling and postprocessing procedures, are
implemented separately. There is no collaboration or communication between the different
analysis procedures, which can no longer meet the requirements of rapid design
cycles for modern products.
(Fig. 1: separate procedures of finite element modeling, finite element analysis, electromagnetic (physical optics) analysis and analysis result postprocessing)
Because of the requirements of a low dielectric constant and high mechanical strength,
the airborne radome is usually manufactured as a sandwich laminate. The choice of mate-
rials and wall thicknesses has a significant influence on the structural strength and electri-
cal performance. The idea in this paper is to find the proper configuration of these
parameters to satisfy the design requirements. The validity of the design can be verified
by the structural and electromagnetic analyses of the radome, respectively.
The basic issue in radome structural analysis is the material definition. This is
difficult because multilayer thin-walled configurations, including the A sandwich, B
sandwich and C sandwich, are often used, as illustrated in Fig. 3. Taking the A sandwich
as an example, it consists of three layers: two dense, high-strength skins separated by a
lower-density, lower-dielectric core material such as foam or honeycomb. This con-
figuration provides much higher strength for a given weight than a monolithic wall
of homogeneous dielectric material. The construction of the other two wall
configurations follows accordingly. In our demonstrative application we use the
A sandwich, which is the most frequently used wall configuration.
Another important aspect of the radome structural analysis is the structural loads
and constraints. For common airborne radomes (non-high-speed radomes), the aerody-
namic load due to airflow is the main cause of mechanical stress. The radome is
tightly attached to the airframe with special bolts.
The radome structural analysis can be performed in Patran/Nastran. The A sandwich
configuration is modeled in the Patran Laminate Modeler by treating the sandwich layer
as an orthotropic layer with equivalent material properties. The structural validity
of the radome can be assessed by the failure analysis of the composite materials. Fre-
quently used failure criteria include the Hill, Hoffman, Tsai-Wu, maximum stress
and maximum strain criteria.
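As an illustration of how one such criterion could be evaluated, the sketch below computes the Tsai-Wu failure index for a single ply under plane stress; it is only a generic sketch, and the stress and strength values are hypothetical placeholders rather than data from this study.

import numpy as np

def tsai_wu_index(s1, s2, t12, Xt, Xc, Yt, Yc, S):
    """Tsai-Wu failure index for one orthotropic ply under plane stress.
    Stresses s1, s2, t12 and strengths Xt, Xc, Yt, Yc, S share the same units.
    An index below 1.0 indicates the ply is predicted not to fail."""
    F1 = 1.0 / Xt - 1.0 / Xc
    F2 = 1.0 / Yt - 1.0 / Yc
    F11 = 1.0 / (Xt * Xc)
    F22 = 1.0 / (Yt * Yc)
    F66 = 1.0 / S ** 2
    F12 = -0.5 * np.sqrt(F11 * F22)      # common empirical interaction term
    return (F1 * s1 + F2 * s2 + F11 * s1 ** 2 + F22 * s2 ** 2
            + F66 * t12 ** 2 + 2.0 * F12 * s1 * s2)

# hypothetical skin-ply stresses (MPa) and strengths (MPa)
print(tsai_wu_index(s1=300.0, s2=20.0, t12=15.0,
                    Xt=1000.0, Xc=600.0, Yt=30.0, Yc=120.0, S=70.0))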
Based on the equivalence principle, the PWS-SI method can be divided
into three steps: 1) a transmitting antenna is considered and the fields radiated to
the radome inner surface S1 are computed by the Plane Wave Spectrum method; 2) trans-
mission through the radome wall is then calculated by the transmission line analogy to
obtain the fields over the radome outer surface S2; 3) the radiation to the far field is then
determined by the surface integration of the equivalent currents obtained from the
tangential field components over the outer surface S2.
The antenna-radome system model is illustrated in Fig.4. According to the planar
slab approximation technique, the radome wall is treated as being locally planar and
modeled as an infinite planar slab with complex transmission coefficients, lying in the
tangent plane at the incident points of the radiation rays.
(Fig. 4: antenna-radome system model, showing the inner surface S1, the outer surface S2, the surface normal n, the equivalent currents J and M, and the incident, reflected and transmitted fields Ei/Hi, Er/Hr and Et/Ht)
According to the PWS technique, the transmitted electric field at the near-field point
P can be calculated by

E(x, y, z) = (1 / 2π) ∫∫ A(kx, ky) e^(−j k0·r) dkx dky        (1)

where (x, y, z) are the coordinates of P in the antenna coordinate system,
k0 = i kx + j ky + k kz is the wave vector and r = i x + j y + k z is the position vector of point
P. A(kx, ky) is the so-called plane wave spectrum, the Fourier transform of the
antenna aperture field, which can be formulated as

A(kx, ky) = (1 / 2π) ∫∫_S E(x, y, 0) e^(j(kx x + ky y)) dξ dη        (2)
If the antenna aperture is circularly symmetric, the surface integral (2) will regress to
a linear integral. Thus the computational complexity will be greatly reduced.
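As a rough illustration of how the spectrum in (2) might be evaluated numerically, the sketch below approximates A(kx, ky) by a 2-D FFT of a sampled aperture field; the aperture size, sampling density and uniform circular illumination are assumptions made for the example only, not the antenna used in this paper.

import numpy as np

# Sampled aperture field E(x, y, 0): a uniformly illuminated circular aperture
N = 256                       # samples per side (assumed)
a = 0.5                       # aperture radius in metres (assumed)
L = 4.0 * a                   # side length of the sampling window
x = np.linspace(-L / 2, L / 2, N)
X, Y = np.meshgrid(x, x)
E_aperture = (X ** 2 + Y ** 2 <= a ** 2).astype(float)

# Plane wave spectrum A(kx, ky) ~ 2-D Fourier transform of the aperture field
dx = x[1] - x[0]
A = np.fft.fftshift(np.fft.fft2(E_aperture)) * dx ** 2 / (2.0 * np.pi)
kx = 2.0 * np.pi * np.fft.fftshift(np.fft.fftfreq(N, d=dx))

# magnitude of the spectrum near the kx axis (ky = 0)
print(np.abs(A[N // 2, N // 2 - 3:N // 2 + 4]))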
The transmission coefficients for perpendicular polarization T⊥ (θ P ) and parallel
polarization T// (θ P ) of the multilayered radome wall can be determined by the trans-
mission line analogy [11]. Thus the fields over the radome outer surface can be
formulated by
E^t_M = (E^i_M · n_BM) T⊥(θP) n_BM + (E^i_M · t_BM) T//(θP) t_BM
H^t_M = (H^i_M · n_BM) T//(θP) n_BM + (H^i_M · t_BM) T⊥(θP) t_BM        (3)

where

n_BM = P_M × n_RM ,   t_BM = n_BM × n_RM ,   P_M(x, y, z) = Re(E × H*) / |Re(E × H*)|
Thus, the electric far field can be calculated with a surface integral technique by
E(R1) = (jk / 4π) ∫∫_S2 [ η0 (R1 × J) × R1 − R1 × M ] e^(j k r′·R1) dS2 ,   with J = n × H^t and M = E^t × n        (4)
where R1 is the unit vector of the observation direction, r′ is the position vector of the
incidence point on the outer surface S2, and n is the normal vector of the outer sur-
face S2 at the incidence point.
With the far-field distribution of the antenna-radome system, the electromagnetic
performance of the radome, such as the transmission ratio and the boresight error, can
be evaluated. The optimization then maximizes the overall transmission coefficient
Tran subject to the structural failure constraints and to the thickness bounds

ti,min ≤ ti ≤ ti,max

where ti denotes the thickness of the ith layer, Tran denotes the overall transmission
coefficient, and fi,j is the failure index of the ith layer of the jth element.
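The sketch below illustrates the kind of constrained genetic-algorithm loop this formulation implies, with the layer thicknesses as real-valued genes and penalty handling of the failure constraint; the transmission() and max_failure_index() functions are hypothetical stand-ins for the PO and FEM analyses, and the population size, rates and bounds are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
n_layers, pop_size, n_gen = 7, 30, 50
lo = np.array([0.1, 0.1, 0.1, 1.0, 0.1, 0.1, 0.1])    # hypothetical lower bounds on layer thicknesses
hi = np.array([0.5, 0.5, 0.5, 10.0, 0.5, 0.5, 0.5])   # hypothetical upper bounds

def transmission(t):       # hypothetical stand-in for the PWS-SI electromagnetic analysis
    return 1.0 / (1.0 + np.sum(t[:3]) + np.sum(t[4:]))

def max_failure_index(t):  # hypothetical stand-in for the FEM failure analysis
    return 1.2 / np.sum(t)

def fitness(t):
    violation = max(0.0, max_failure_index(t) - 1.0)   # structural constraint: failure index <= 1
    return transmission(t) - 10.0 * violation           # penalized objective

pop = lo + rng.random((pop_size, n_layers)) * (hi - lo)
for _ in range(n_gen):
    f = np.array([fitness(ind) for ind in pop])
    w = f - f.min() + 1e-9
    parents = pop[rng.choice(pop_size, pop_size, p=w / w.sum())]       # fitness-proportionate selection
    children = 0.5 * (parents + parents[rng.permutation(pop_size)])    # arithmetic crossover
    children += rng.normal(0.0, 0.02, children.shape) * (hi - lo)      # Gaussian mutation
    pop = np.clip(children, lo, hi)

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("best layer thicknesses:", np.round(best, 2))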
4 Numerical Results
Design variable   t1     t2     t3     t4      t5     t6     t7
Lower limit       0.1    0.1    0.1    1       0.1    0.1    0.1
Upper limit       0.5    0.5    0.5    10      0.5    0.5    0.5
Initial value     0.2    0.2    0.2    8.8     0.2    0.2    0.2
Optimal value     0.1    0.1    0.1    9.96    0.1    0.1    0.1
(Figure: far-field E field pattern in dB, with radome and without radome)
5 Conclusion
This paper proposes a multidisciplinary optimization scheme for airborne radome
design. The design procedure considers the structural and electromagnetic performance
of the radome simultaneously. Genetic Algorithm is employed for the optimization to
maximize the overall transmission coefficient under constraints of the structural failure
of the radome material. The optimization scheme is successfully validated by the de-
sign optimization of a paraboloidal radome.
This work follows the new trend in radome optimization, which will be more effi-
cient and applicable to radome design engineering. Even though the results of the
demonstration are preliminary, the proposed optimization scheme and solution ap-
proach can easily be extended to more complicated applications.
References
1. Tricoles, G.: Radiation Patterns and Boresight Error of a Microwave Antenna Enclosed in
an Axially Symmetric Dielectric Shell. J. Opt. Soc. Am. 54, 1094–1101 (1964)
2. Paris, D.T.: Computer-Aided Radome Analysis. IEEE Trans. on AP 18, 7–15 (1970)
3. Wu, D.C.F., Rudduck, R.C.: Wave Spectrum Surface Integration Technique for Radome
Analysis. IEEE Trans. on AP 22, 497–500 (1974)
4. Povinelli, M.J., Angelo, J.D.: Finite Element Analysis of Large Wavelength Antenna Ra-
dome Problems for Leading Edge and Radar Phased Arrays. IEEE Trans. on Magnetics 27,
4299–4302 (1991)
5. Chang, D.C.: A Comparison of Computed and Measured Transmission Data for the
AGM-88 Harm Radome. Master’s thesis, Naval Postgraduate School, Monterey, California,
AD-A274868 (1993)
6. Chan, K.K., Chang, P.R., Hsu, F.: Radome design by simulated annealing technique. In:
IEEE International Symposium of Antennas and Propagation Society, pp. 1401–1404.
IEEE Press, New York (1992)
7. Chiba, H., Inasawa, Y., Miyashita, H., Konishi, Y.: Optimal radome design with particle
swarm optimization. In: IEEE International Symposium of Antennas and Propagation So-
ciety, pp. 1–4. IEEE Press, New York (2008)
8. Meng, H., Dou, W., Yin, K.: Optimization of Radome Boresight Error Using Genetic Al-
gorithm. In: 2008 China-Japan Joint Microwave Conference, pp. 27–30. IEEE Press, New
York (2008)
9. Meng, H., Dou, W.: Multi-objective optimization of radome performance with the struc-
ture of local uniform thickness. IEICE Electronics Express 5, 882–887 (2008)
10. Baker, M.L., Roughen, K.M.: Structural Optimization with Probabilistic Fracture Con-
straints in the Multidisciplinary Radome Optimization System (MROS). In: 48th
AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Confer-
ence, Honolulu, Hawaii (2007); AIAA-2007-2311
11. Ishimaru, A.: Electromagnetic wave propagation, radiation, and scattering. Prentice Hall,
Englewood Cliffs (1991)
12. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press,
Michigan (1975)
13. Riolo, R., Soule, T., Worzel, B.: Genetic programming theory and practice IV. Springer,
New York (2007)
Global Similarity and Local Variance in Human Gene
Coexpression Networks
1 Introduction
In the last few years, gene coexpression networks have attracted the attention of many re-
searchers [1,2]. According to previous studies, these networks are represented as a
graph in which each node represents a gene and edges represent statistically strong rela-
tionships between genes; an interaction between two genes in a gene network does
not necessarily imply a physical interaction [4]. For the study presented here, we per-
formed a comparative analysis of whole-genome gene expression variation in 210
unrelated HapMap individuals [7] to assess the extent of expression divergence be-
tween four human populations and to explore the connection between the variation of
gene expression and function. The gene coexpression networks were constructed by
using the GeneChip expression profile data in NCBI GEO (the Gene Expression Omni-
bus repository, a database of gene expression data) [3].
Expression profiles of human gene pairs were compared in order to evaluate the di-
vergence of human gene expression patterns. A total of 47,294 human transcripts for
every population were considered. All-against-all gene expression profile compari-
sons for human populations’ matrices (47294*60 CEU, 47294*45 CHB, 47294*45
JPT, and 47294*60 YRI) were used to generate population-specific coexpression
networks. For coexpression networks, nodes correspond to genes, and edges link two
genes from the same population if their expression profiles are considered sufficiently
similar.
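As a minimal sketch of how such a population-specific network could be assembled from an expression matrix, the code below links two genes when the Pearson correlation of their profiles exceeds a threshold; the matrix shape, the threshold value and the use of NetworkX are illustrative assumptions, not the exact procedure of this study.

import numpy as np
import networkx as nx

def coexpression_network(expr, gene_ids, threshold=0.9):
    """expr: genes x samples expression matrix; returns a coexpression graph."""
    corr = np.corrcoef(expr)                      # all-against-all Pearson correlation
    G = nx.Graph()
    G.add_nodes_from(gene_ids)
    n = len(gene_ids)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= threshold:      # sufficiently similar profiles
                G.add_edge(gene_ids[i], gene_ids[j])
    return G

# toy example: 5 genes measured in 60 individuals (e.g. one population)
rng = np.random.default_rng(1)
expr = rng.normal(size=(5, 60))
G = coexpression_network(expr, ["g%d" % i for i in range(5)], threshold=0.3)
print(G.number_of_nodes(), G.number_of_edges())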
Parameter                 Population
                          CEU      CHB       JPT      YRI
Clustering coefficient    0.350    0.272     0.304    0.320
Network diameter          16       19        24       26
Network centralization    0.047    0.032     0.066    0.042
Average degree            4.576    10.721    16.46    13.781
As described above, the human population gene coexpression networks are closely
similar in terms of their global topological characteristics; they share similar node
degree (k) distributions and C(k) distributions as well as similar average node degrees
(<k>), clustering coefficients (<C>) and path lengths (<l>). Other parameters related
to neighborhood, such as network density, network centralization and network het-
erogeneity are closely similar.
We further sought to evaluate the similarity between the population-specific coex-
pression networks at a local level. There is as yet no general method for assessing
local network similarity (or graph isomorphism). However, in the case of the human
population gene coexpression networks generated here, the use of orthologous gene
pairs results in a one-to-one mapping between the nodes of the two networks. In this
sense, the networks can be considered to be defined over the same set of nodes N, and
thus can be directly compared by generating an intersection network. The human
population intersection network is defined as the network over the set of nodes N
where there is a link between two nodes i and j if i and j denote two pairs of ortholo-
gous genes which are connected in every human population network. Thus, the inter-
section network captures the coexpressed gene pairs conserved between 4 human
populations.
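Because all population networks are defined over the same node set, the intersection network can be obtained directly as the common edge set, as in the following sketch with hypothetical toy graphs.

import networkx as nx

def intersection_network(networks):
    """Keep only the edges present in every population-specific network."""
    common = set(frozenset(e) for e in networks[0].edges())
    for G in networks[1:]:
        common &= set(frozenset(e) for e in G.edges())
    H = nx.Graph()
    H.add_edges_from(tuple(e) for e in common)
    return H

# toy CEU/CHB/JPT/YRI networks over the same nodes
G1 = nx.Graph([("a", "b"), ("b", "c"), ("c", "d")])
G2 = nx.Graph([("a", "b"), ("c", "d")])
G3 = nx.Graph([("a", "b"), ("b", "c"), ("c", "d")])
G4 = nx.Graph([("a", "b"), ("c", "d"), ("a", "d")])
print(intersection_network([G1, G2, G3, G4]).edges())   # the two conserved edges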
The global characteristics of the intersection network are shown in Table 2. The in-
tersection network node degree and C(k) distributions are clearly similar to those of
the population-specific networks as are the average clustering coefficient (<C> =
0.213) and average path length (<l> = 3.04). Network diameter equals 10. The net-
work diameter and the average shortest path length, also known as the characteristic
path length, indicate small-world properties of the analyzed network. Taken together,
these findings indicate that the global structure of the population-specific coexpres-
sion networks is preserved in the intersection network. However, the most striking
feature of the intersection network is the small fraction of genes (~20%) and edges
(~4–16%) that are conserved between the population networks (Table 3). Accordingly,
the average node degree is lower (<k> = 7.518) in the intersection network than it is
in each of the population-specific networks.
                        Nodes         Edges
Intersection network    713           2680
CEU                     5546 (13%)    72073 (4%)
CHB                     3180 (22%)    17047 (16%)
JPT                     3572 (20%)    29398 (9%)
YRI                     3061 (23%)    21092 (13%)
Parameter                    Intersection network    Random network
Clustering coefficient       0.213                   0.001
Network diameter             10                      10
Network centralization       0.061                   -
Average degree               7.518                   0.0
Number of nodes              713                     713
Number of edges              2680                    2680
Network density              0.011                   -
Network heterogeneity        1.592                   -
Characteristic path length   3.040                   0.005
Genes in the networks were functionally categorized using their Gene Ontology (GO)
biological process annotation terms. Overrepresented GO terms were identified with
BINGO[14] by comparing the relative frequencies of GO terms in specific clusters with
the frequencies of randomly selected GO-terms. The Hypergeometric test was used to
do this with the Benjamini and Hochberg false discovery rate correction for multiple
tests and a P-value threshold of 0.001. Pairwise similarities between gene GO terms
were measured using the semantic similarity method, which computes the relative dis-
tance between any two terms along the GO graph. The results are shown in Table 4.
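A minimal sketch of this kind of test is given below: a hypergeometric p-value for over-representation of each GO term, followed by a Benjamini-Hochberg correction. The term counts are hypothetical and the code is not the BINGO implementation.

from scipy.stats import hypergeom
import numpy as np

def go_enrichment(term_counts, cluster_size, genome_size):
    """term_counts: dict term -> (genes with term in cluster, genes with term in genome)."""
    terms, pvals = [], []
    for term, (k, K) in term_counts.items():
        # P(X >= k) for X ~ Hypergeom(genome_size, K, cluster_size)
        pvals.append(hypergeom.sf(k - 1, genome_size, K, cluster_size))
        terms.append(term)
    # Benjamini-Hochberg false discovery rate correction
    order = np.argsort(pvals)
    m = len(pvals)
    adj = np.empty(m)
    prev = 1.0
    for rank, idx in enumerate(order[::-1]):
        prev = min(prev, pvals[idx] * m / (m - rank))
        adj[idx] = prev
    return dict(zip(terms, adj))

counts = {"biopolymer metabolic process": (40, 900),
          "macromolecule metabolic process": (55, 2000),
          "transport": (10, 1500)}
print(go_enrichment(counts, cluster_size=200, genome_size=20000))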
The graph (Figure 1) visualizes the GO categories that were found significantly
over-represented in the context of the GO hierarchy. The size (area) of the nodes is
proportional to the number of genes in the test set which are annotated to that node.
The color of the node represents the (corrected) p-value. White nodes are not sig-
nificantly over-represented; the other nodes are, with a color scale ranging from yel-
low (p-value equal to the significance level, e.g. 0.001) to dark orange (p-value 5 orders of
magnitude smaller than the significance level, e.g. 10^-5 × 0.001). The color saturates at
dark orange for p-values that are more than 5 orders of magnitude smaller than
the chosen significance level.
In fact, it can be seen from the figure that the category 'biopolymer metabolic
process' is the important one, and that the over-representation of the 'macromolecule
metabolic process' and 'metabolic process' categories is merely a result of the pres-
ence of those 'protein modification' genes. The fact that both categories are colored
equally dark is due to the saturation of the node color for very low p-values.
4 Conclusion
The global topological properties of the human population gene coexpression networks
studied here are very similar but the specific architectures that underlie these properties
are drastically different. The actual pairs of orthologous genes found to be co-
expressed in the different populations are highly divergent, although we did detect a
substantial conserved component of the coexpression network. One of the most preva-
lent functional classes showing clear function-expression coherence is the set of genes
involved in biopolymer metabolism. An example of such a cluster is shown in Figure 2.
The biological relevance of the global network topological properties appears ques-
tionable[10]. Of course, this does not prevent network analysis from being a powerful
approach, possibly the most appropriate one for the quantitative study of complex
systems made up of numerous interacting parts.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China
(Grant Nos. 30871394, 30370798 and 30571034), the National High Tech Develop-
ment Project of China, the 863 Program (Grant Nos. 2007AA02Z329), the National
Basic Research Program of China, the 973 Program (Grant Nos. 2008CB517302) and
the National Science Foundation of Heilongjiang Province (Grant Nos. ZJG0501,
1055HG009, GB03C602-4, BMFH060044, and D200650).
References
1. Horvath, S., Dong, J.: Geometric interpretation of gene coexpression network analysis.
PLoS Comput. Biol. 4(8), e1000117 (2008)
2. Carter, S.L., et al.: Gene co-expression network topology provides a framework for mo-
lecular characterization of cellular state. Bioinformatics 20(14), 2242–2250 (2004)
3. Bansal, M., et al.: How to infer gene networks from expression profiles. Mol. Syst. Biol. 3,
78 (2007)
4. Potapov, A.P., et al.: Topology of mammalian transcription networks. Genome
Inform. 16(2), 270–278 (2005)
5. Yu, H., et al.: The importance of bottlenecks in protein networks: correlation with gene
essentiality and expression dynamics. PLoS Comput. Biol. 3(4), e59 (2007)
6. Stranger, B.E., et al.: Relative impact of nucleotide and copy number variation on gene
expression phenotypes. Science 315(5813), 848–853 (2007)
7. The International HapMap Project. Nature 426(6968), 789–796 (2003)
8. Margolin, A.A., et al.: ARACNE: an algorithm for the reconstruction of gene regulatory
networks in a mammalian cellular context. BMC Bioinformatics 7(suppl. 1), S7 (2006)
9. Vlasblom, J., et al.: GenePro: a Cytoscape plug-in for advanced visualization and analysis
of interaction networks. Bioinformatics 22(17), 2178–2179 (2006)
10. Tsaparas, P., et al.: Global similarity and local divergence in human and mouse gene
co-expression networks. BMC Evol. Biol. 6, 70 (2006)
11. Khaitovich, P., et al.: A neutral model of transcriptome evolution. PLoS Biol. 2(5), E132
(2004)
12. Yanai, I., Graur, D., Ophir, R.: Incongruent expression profiles between human and mouse
orthologous genes suggest widespread neutral evolution of transcription control.
OMICS 8(1), 15–24 (2004)
13. Jordan, I.K., Marino-Ramirez, L., Koonin, E.V.: Evolutionary significance of gene expres-
sion divergence. Gene 345(1), 119–126 (2005)
14. Babu, M.M.: Introduction to microarray data analysis. In: Grant, R.P. (ed.) Computational
Genomics: Theory and Application. Horizon Press, Norwich (2004)
A Grid Based Cooperative Co-evolutionary
Multi-Objective Algorithm
1 Introduction
The basic idea of Evolutionary Algorithms (EAs) [1] is to encode candidate solutions
for a specific problem into chromosomes, evolve these chromosomes through iterative
variation and selection phases to obtain the best possible chromosome, and decode it
as the resulting solution.
Genetic algorithms (GAs) are a well-known family of EAs, designed by John Holland
in the 1960s and developed further by Holland and his colleagues and students at the
University of Michigan during the 1960s and 1970s [2]. In this method, chromosomes are
strings over the alphabet {0, 1}. There are three main phases in the evolution process of a
GA: crossover (or recombination), mutation and selection, where crossover and mutation
are responsible for producing new chromosomes and selection tries to select the best of them.
Genetic algorithms are well-known problem solvers in the area of multi-objective
optimization. Fonseca and Fleming [3] presented the idea of the relationship be-
tween the fitness function used in a GA to discover good chromosomes and the Pareto
optimality concept used in multi-objective optimization. Since then, several methods
that use genetic algorithms to solve multi-objective problems have been proposed,
such as those mentioned in [1].
2 Multi-Objective Optimization
Mathematical foundations of multi-objective optimization were introduced during
1895-1906 [4]. The concept of vector maximum problem was introduced by Harold
W. Kuhn and Albert W. Tucker [5].
Suppose F = ( f1 ( x), f 2 ( x),..., f m ( x)) : X → Y is a multi-objective function
where x = ( x1 , x 2 ,..., x n ) ∈ X . Now we can define a multi-objective optimization
problem as follow:
min F(x) = (f1(x), f2(x), ..., fm(x))   subject to   x ∈ X        (1)

where X is called the decision space and Y is called the objective space.
Another important concept is domination which is defined as follows: A vector x ′
is said to dominate a vector x″ with respect to problem F if, for each j = 1, 2, ..., m,
fj(x′) ≤ fj(x″), and there exists at least one j ∈ {1, 2, ..., m} such that fj(x′) < fj(x″).
A non-dominated solution is not dominated by any other solution in the search space.
So, in the process of solving multi-objective optimization problems, the goal is to find
as many as possible of the existing non-dominated solutions. Vilfredo Pareto [6] intro-
duced the concept of optimality in multi-objective optimization. A vector x p ∈ X is
said to be Pareto Optimal with respect to X if there is no vector xd ∈ X such that
F ( x d ) = ( f1 ( x d ), f 2 ( xd ),..., f m ( x d )) dominates F ( x p ) = ( f1 ( x p ), f 2 ( x p ),..., f m ( x p )) .
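A minimal sketch of this dominance test (assuming minimization, as in (1)) is given below.

def dominates(f_a, f_b):
    """True if objective vector f_a dominates f_b (minimization)."""
    not_worse = all(a <= b for a, b in zip(f_a, f_b))
    strictly_better = any(a < b for a, b in zip(f_a, f_b))
    return not_worse and strictly_better

print(dominates((1.0, 2.0), (1.0, 3.0)))   # True
print(dominates((1.0, 2.0), (0.5, 3.0)))   # False: neither vector dominates the other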
3 Related Works
The cooperative co-evolutionary approach was originally used in the Cooperative Co-
evolutionary Genetic Algorithm (CCGA), designed by Potter and De Jong
[12]. In CCGA there are several sub-populations, each of which contains par-
tial solutions. An individual from each sub-population is selected and combined
with individuals from the other sub-populations to form a complete solution. The fitness
of each individual is then evaluated based on the fitness of the combined solution. After
that, each sub-population is evolved using a traditional genetic algorithm.
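The following sketch illustrates the collaboration step described above: an individual is combined with the current best partners of the other sub-populations and is credited with the fitness of the assembled solution. The two-variable sphere objective and the population sizes are placeholders for this example, not part of the original CCGA.

import numpy as np

def sphere(x):                          # placeholder objective to be minimized
    return float(np.sum(np.asarray(x) ** 2))

rng = np.random.default_rng(2)
# one sub-population per decision variable (2 variables, 5 individuals each)
subpops = [rng.uniform(-5, 5, 5).tolist() for _ in range(2)]
best = [min(sp, key=lambda v: sphere([v])) for sp in subpops]   # crude initial partners

def cooperative_fitness(sp_index, individual, best):
    """Assemble a full solution using the best partners from the other sub-populations."""
    solution = list(best)
    solution[sp_index] = individual
    return sphere(solution)

for i, sp in enumerate(subpops):
    fitnesses = [cooperative_fitness(i, ind, best) for ind in sp]
    best[i] = sp[int(np.argmin(fitnesses))]        # credit assignment per sub-population
print("assembled best solution:", best, "fitness:", sphere(best))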
Keerativuttitumrong et al. [8] employed CCGA [12] in the MOEA field and combined
it with MOGA [3] to form the Multi-Objective Co-operative Co-evolutionary Genetic
Algorithm (MOCCGA) [8]. This approach is described in the next section in more
detail.
As the next work in this line, Srinivas and Deb [9] introduced the non-dominated
sorting genetic algorithm (NSGA), which is based on sorting a population according to
the level of non-domination.
Deb et al. [10] proposed a modified version of NSGA, named NSGA-II,
which demonstrates better performance through several modifications. Finally,
Iorio and Li [13] designed a cooperative co-evolutionary multi-objective genetic
algorithm using non-dominated sorting, by combining NSGA-II with CCGA,
and called it NSCCGA. To the best of our knowledge, NSCCGA is the current state of
the art based on CCGA in the family of MOEAs, but like the previous methods it
suffers from high computational complexity due to the non-domination sorting
mechanism. In this paper, we introduce a mechanism based on CCGA that achieves
comparable performance to NSCCGA but with much lower computational complexity.
In this paper, MOCCGA has been modified by employing a new niching technique
adapted, with some modifications, from PAES [11]. This new approach is called the Grid
Based multi-objective Co-operative Co-evolutionary Genetic Algorithm, or GBCCGA.
To describe this new method, we first describe PAES and MOCCGA in more
detail.
PAES has a novel diversity technique proposed in [11] based on an archiving
mechanism with low computational complexity. This archive stores and maintains some of
the non-dominated solutions that have been found by the algorithm from the beginning
of the evolution. Also, PAES utilizes a novel fitness sharing method in which the
objective space is divided recursively into different hypercubes. Each solu-
tion, in accordance with its objectives, is inserted into one of those hypercubes and
its fitness is decreased based on the number of previously found solutions residing in
the same hypercube. This diversity technique has a lower computational complexity
than other diversity techniques, such as fitness sharing in MOGA [3] and clus-
tering in NSGA-II [10], while showing promising performance in keeping the diversity of an
evolving population. In this technique, the objective space is divided recursively to
form a hyper grid. This hyper grid divides the objective space into hypercubes,
where each hypercube has width d_r / 2^l, where d_r is the range of values in objec-
tive d of the current solutions and l is an arbitrary value selected by the user.
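A minimal sketch of this adaptive grid is given below: each objective range is divided into 2^l intervals, a solution is mapped to a hypercube index, and the number of archived solutions already occupying that hypercube can serve as its penalty. The objective ranges, the value of l and the toy archive are assumptions for the example.

import numpy as np
from collections import Counter

def hypercube_index(objectives, lower, upper, l=4):
    """Map an objective vector to its hypercube coordinates in a 2^l-per-axis grid."""
    obj = np.asarray(objectives, dtype=float)
    width = (np.asarray(upper) - np.asarray(lower)) / (2 ** l)   # d_r / 2^l per objective
    cells = np.floor((obj - np.asarray(lower)) / width).astype(int)
    return tuple(np.clip(cells, 0, 2 ** l - 1))

# crowding counts over a small hypothetical archive (two objectives scaled to [0, 1])
archive = [(0.10, 0.90), (0.12, 0.88), (0.55, 0.40), (0.90, 0.05)]
lower, upper = np.zeros(2), np.ones(2)
counts = Counter(hypercube_index(s, lower, upper) for s in archive)
penalty = counts[hypercube_index((0.11, 0.89), lower, upper)]    # crowded cell -> larger penalty
print(penalty)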
MOGA [3] and CCGA [12] were integrated to form MOCCGA in [8]. This
algorithm decomposes the problem into sub-populations according to the dimension
of the search space. MOCCGA assigns a rank to each individual of every sub-
population with respect to its sub-population; the ranking scheme is similar to
MOGA's ranking scheme [3]. Each candidate individual of a sub-population is com-
bined with the best individuals of the other sub-populations to form a complete solution.
After that, this solution is compared with other solutions and ranked based on Pareto
dominance. Finally, this rank is assigned to the candidate individual. For the purpose of
diversity, a fitness sharing mechanism is applied in the objective space. Then
each sub-population is evolved separately using a traditional genetic algorithm.
Considering the results reported for NSGA-II [10] and NSCCGA [13], both perform better
than MOCCGA; both use the same niching technique and a computationally
complex domination-level mechanism. Based on our analysis, however, we hypothesize that
the performance of MOCCGA depends heavily on its niching technique. We therefore decided
to adapt PAES's diversity mechanism, as a well-known and well-performing niching
mechanism, into MOCCGA and to investigate its performance. GBCCGA is thus developed
from the original MOCCGA and a fitness sharing mechanism inspired by PAES.
Like MOCCGA, GBCCGA generates the initial population randomly and decomposes this
population into several sub-populations according to the dimension of the search space.
Each candidate individual of every sub-population is combined with the best individu-
als of the other sub-populations to form a complete solution.
It is also combined with randomly chosen individuals of the other sub-populations
to form another complete solution. Then all of these solutions are ranked based on
the Pareto non-domination scheme [3].
Now, the niching mechanism is activated to calculate the penalty for each
candidate solution. The main difference between GBCCGA and MOCCGA lies in this
step, where the latter uses the fitness sharing mechanism from MOGA [3] and the former uses
the modified PAES niching technique. The first step in adapting this technique for
GBCCGA is to define a policy that allows the fitness of individuals in different
sub-populations to be shared based on one archive.
To achieve this, we compare the two combined solutions, the one based on
selection of the best partners and the random-based one, according to the Pareto ranking
scheme. The better one is then located in the archive grid, based on the values of its
objective functions. After that, for each archived solution residing in the same hyper-
cube of the grid, one unit of penalty is applied to the rank of that combined solution.
After applying the penalty, the rank is assigned as the fitness value to the candidate
individual of the current sub-population that participates in the combined solution.
Also, if the rank of the candidate individual is one, the corresponding combined solu-
tion is stored in the proper location in the archive grid. In this way, each candidate
individual in any sub-population can participate in forming the same archive.
Now, it can be seen that in GBCCGA the original fitness sharing of MOCCGA is
replaced with the PAES-based one described above. In Fig. 1, the overall procedure
of GBCCGA is depicted.
1 The implementation is available online at http://www.cse.shirazu.ac.ir/~ali/GBCCGA/matlab-code.zip
Fig. 2. Non-dominated solutions for ZDT1 by (a) MOCCGA, (b) GBCCGA and (c) NSCCGA
Fig. 3. Non-dominated solutions for ZDT2 by (a) MOCCGA, (b) GBCCGA and (c) NSCCGA
Fig. 4. Non-dominated solutions for ZDT3 by (a) MOCCGA, (b) GBCCGA and (c) NSCCGA
Fig. 5. Non-dominated solutions for ZDT4 by (a) MOCCGA, (b) GBCCGA and (c) NSCCGA
6 Conclusion
This paper has presented a modified version of MOCCGA in which its
niching technique is replaced by a novel diversity technique adapted from PAES, forming
GBCCGA, which has lower computational complexity and remains competitive with the cur-
rent state of the art, NSCCGA [13]. Since the adopted niching mechanism is parameter-free,
it is not necessary to select the σ_share factor, which strongly affects the
performance of MOCCGA. As another advantage, GBCCGA has much lower
computational complexity than NSCCGA while producing roughly the
same performance. It can therefore be concluded that the main weakness of MOCCGA is its
niching mechanism, which this work shows can be repaired without heavy com-
putational complexity such as that of NSCCGA.
References
[1] Coello Coello, C.A., Lamont, G.B., Van Veldhuizen, D.A.: Evolutionary Algorithms for
Solving Multi-Objective Problems, 2nd edn. Springer Science+Business Media, LLC,
New York (2007)
[2] Holland, J.H.: Adaptation in Natural and Artificial Systems. An Introductory Analysis
with Applications to Biology. Control and Artificial Intelligence. University of Michigan
Press, Ann Arbor (1975)
[3] Fonseca, C.M., Fleming, P.J.: Genetic Algorithms for Multi-objective Optimization:
Formulation, Discussion and Generalization. In: Forrest, S. (ed.) Proceedings of the Fifth
International Conference on Genetic Algorithms, San Mateo, California, University of Il-
linois at Urbana-Champaign, pp. 416–423. Morgan Kaufmann Publishers, San Francisco
(1993)
[4] Stadler, W.: Initiators of Multicriteria Optimization. In: Jahn, J., Krabs, W. (eds.) Recent
Advances and Historical Development of Vector Optimization, pp. 3–47. Springer, Berlin
(1986)
[5] Kuhn, H.W., Tucker, A.W.: Nonlinear Programming. In: Neyman, J. (ed.) Proceedings of
the Second Berkeley Symposium on Mathematical Statistics and Probability, pp. 481–492.
University of California Press, California (1951)
[6] Pareto, V.: Cours D’Economie Politique, vol. I, II. F. Rouge, Lausanne (1896)
[7] Mahfoud, S.W.: Niching Methods for Genetic Algorithms. IlliGAL Report No. 95001 (May 1995)
[8] Keerativuttitumrong, N., Chaiyaratana, N., Varavithya, V.: Multi-objective Co-operative
Co-evolutionary Genetic Algorithm. In: Guervós, J.J.M., Adamidis, P.A., Beyer, H.-G.,
Fernández-Villacañas, J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS, vol. 2439, pp.
288–297. Springer, Heidelberg (2002)
[9] Srinivas, N., Deb, K.: Multiobjective Optimization Using Nondominated Sorting in
Genetic Algorithms. Evolutionary Computation 2(3), 221–248 (Fall 1994)
[10] Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multiobjective
Genetic Algorithm: NSGA–II. IEEE Transactions on Evolutionary Computation 6(2),
182–197 (2002)
[11] Knowles, J.D., Corne, D.W.: Approximating the Nondominated Front Using the Pareto
Archived Evolution Strategy. Evolutionary Computation 8(2), 149–172 (2000)
[12] Potter, M.A., de Jong, K.: A Cooperative Coevolutionary Approach to Function Optimi-
zation. In: Davidor, Y., Männer, R., Schwefel, H.-P. (eds.) PPSN 1994. LNCS, vol. 866,
pp. 249–257. Springer, Heidelberg (1994)
[13] Iorio, A.W., Li, X.: A Cooperative Coevolutionary Multiobjective Algorithm Using Non-
dominated Sorting. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3102, pp. 537–548.
Springer, Heidelberg (2004)
[14] Zitzler, E., Deb, K., Thiele, L.: Comparison of Multiobjective Evolutionary Algorithms:
Empirical Results. Evolutionary Computation 8(2), 173–195 (Summer 2000)
Fuzzy Modeling for Analysis of Load Curve in Power
System
Abstract. The main purpose of this paper is to study the use of fuzzy modeling
for the analysis of customer load characteristics in power system. A fuzzy
model is a collection of fuzzy IF-THEN rules for describing the features or be-
haviors of the data set or system under study. In view of the nonlinear charac-
teristics of customer load demand with respect to time, the method of fuzzy
modeling is adopted for analyzing the studied daily load curves. Based on the
Sugeno-type fuzzy model, various models with different numbers of modeling
rules have been constructed for describing the investigated power curve. Sam-
ple results are demonstrated for illustrating the effectiveness of the fuzzy model
in the study of power system load curves.
Keywords: power system, load curve, fuzzy model, Sugeno-type fuzzy model.
1 Introduction
Understanding of system load characteristics is a primary concern in power system
planning and operations. The load curve, which is developed from the measured data
showing load demand with respect to time, plays an important role for the analysis of
an electric power customer. Based on the survey results of system load characteristics,
system engineers and operators are able to carry out the studies of load forecasting,
generation planning, system expansion, cost analysis, tariff design, and so forth. Due
to the inherent nonlinearity, many methods have been proposed for examining cus-
tomer load curves in power systems [1]-[12].
A fuzzy system model is basically a collection of fuzzy IF-THEN rules that are
combined via fuzzy reasoning for describing the features of a system under study
[13]-[21]. The method of fuzzy modeling has been proven to be well-suited for mod-
eling nonlinear industrial processes described by input-output data. The fuzzy model
describes the essential features of a system by using linguistic expressions. The fuzzy
model not only offers the accurate expression of the quantitative information for a
studied nonlinear system, but also can provide a qualitative description of the physical
feature [22]-[25]. In view of the nonlinear characteristics of the load curve, the
fuzzy model is employed for analyzing the curve, since the method of fuzzy model-
ing is suitable for modeling nonlinear processes described by input-output data.
The main purpose of this paper is to investigate the application of the Sugeno-type
fuzzy modeling method [14] for analyzing a power system daily load curve which
depicts the data of a customer load demand versus the time in a day. Different numbers
of modeling rules are taken into account in constructing the fuzzy models for repre-
senting a daily load curve. The parameters of the Sugeno-type fuzzy model are calcu-
lated by employing the algorithm of ANFIS (Adaptive Neuro-Fuzzy Inference System)
[15]-[17]. The IF-THEN structure of a fuzzy model can yield both a numerical (quanti-
tative) approximation and a linguistic (qualitative) description for the studied load
curve. It is shown from the computational results that the obtained fuzzy models are
capable of providing both quantitative and qualitative descriptions for the power sys-
tem load curve.
2 Load Curve
A load curve is obtained from the measurements of load demand of a certain cus-
tomer, region, or utility system with respect to time and thus provides a way of under-
standing how much electric energy is being consumed at different times of day. Based
on the time duration considered, there are various types of load curves: daily load
curves, weekly load curves, seasonal load curves, and annual load curves. In the load
study of power system, the daily load curve is the first concern since it reveals the
most fundamental features of a specific load [1]-[6].
The original power value of each data point is converted into its percentage (%) value by
choosing the highest (peak) power demand in the curve as the base power value. The
highest (peak) demand occurs around the evening period, after all the family members have
returned from work or school. It is evident that the load curve shown in Fig. 1 is a nonlinear
graph, and the method of fuzzy modeling is therefore applied for analyzing the load curve.
3 Fuzzy Modeling
The characteristic (membership) function of a crisp (conventional) set assigns each
element of the set a status of membership or non-membership. On the other
hand, a set consisting of elements with varying degrees of membership in
the set is referred to as a fuzzy set [13]. Based on fuzzy set theory, the fuzzy model,
which consists of a collection of fuzzy IF-THEN rules, is used for describing the
behavior or characteristics of the data or system under study [14]-[21]. The fuzzy
model expresses an inference mechanism such that if we know a premise, then we can
infer or derive a conclusion. In modeling nonlinear systems, various types of fuzzy
rule-based system could be described by a collection of fuzzy IF-THEN rules. The
objective of this paper is to study an application of the fuzzy model to describe the
nonlinear load curve in power system.
Among various types of fuzzy models, the Sugeno-type fuzzy model has recently
become one of the major topics in theoretical studies and practical applications of
fuzzy modeling and control [14]-[17]. The basic idea of Sugeno-type fuzzy model is
to decompose the input space into fuzzy regions and then to approximate the system
in every region by a simple linear model. The overall fuzzy model is implemented by
combining all the linear relations constructed in each fuzzy region of the input space.
The main advantage of the Sugeno-type fuzzy model lies in its form with a linear
function representation in each subregion of the system.
In this study, the single-input-single-output (SISO) Sugeno-type fuzzy model con-
sisting of n modeling rules, as described in (1), is adopted for describing a load
curve:
Ri : IF T is Ai THEN P = ai T + bi ,   i = 1, 2, ..., n.        (1)
where T (time in a day, Hour) denotes the system input and P (load demand, %)
represents the output in the i th subregion. It is noted that the input domain is parti-
tioned into n fuzzy subregions (time periods), each of which is described by a fuzzy
set Ai , and the system behavior in each subregion is modeled by a linear equation
P = aiT + bi in which ai is the coefficient related with time variation and bi is a
constant. Let mi denote the membership function of the fuzzy set Ai . If the system
input T is at a specific time T 0 then the overall system output P is obtained by
computing a weighted average of all the outputs Pi ’s from each modeling rule Ri as
shown in (2), (3) and (4).
wi = mi(T0) ,   i = 1, 2, ..., n.        (2)
Pi = ai T0 + bi ,   i = 1, 2, ..., n.        (3)
P = ( Σ_{i=1}^{n} wi Pi ) / ( Σ_{i=1}^{n} wi ).        (4)
mi(x) = exp[ −(x − ci)² / (2σi²) ].        (5)
where the variable x stands for the time in a day and the parameters ci and
σ i define the shape of each membership function mi ( x) for the fuzzy set Ai .
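The following sketch evaluates the SISO Sugeno model of (1)-(5) at a given hour: the Gaussian memberships give the rule firing strengths, and the output is the weighted average of the local linear models. The first rule loosely mirrors (6), while the remaining parameters are made up for illustration and are not the ANFIS-identified values reported in this paper.

import numpy as np

def sugeno_output(T0, c, sigma, a, b):
    """Evaluate the Sugeno-type model of (1)-(5) at time T0 (hours)."""
    w = np.exp(-((T0 - c) ** 2) / (2.0 * sigma ** 2))    # rule firing strengths, Eqs. (2)/(5)
    P_i = a * T0 + b                                      # local linear outputs, Eq. (3)
    return np.sum(w * P_i) / np.sum(w)                    # weighted average, Eq. (4)

# three hypothetical rules (rule centres in hours)
c     = np.array([4.0, 13.0, 20.0])
sigma = np.array([3.0, 3.5, 2.5])
a     = np.array([-4.39, 1.20, -2.10])
b     = np.array([63.10, 20.0, 140.0])
print(sugeno_output(18.0, c, sigma, a, b))    # estimated load demand (%) at 18:00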
Then the data points of load demand measurements are fed into an adaptive
network structure, namely the “Adaptive Neuro-Fuzzy Inference System”
(ANFIS) [15]-[17], for the purpose of model training to calculate the parame-
ters (ci , σ i ) of the membership function in the IF-part and the coefficients
(ai , bi ) of the linear function in the THEN-part.
4 Computational Results
The residential daily load curve shown in Fig. 1 is utilized as the modeling study
object. The input variable T (Hour) is the time in a day and the output P (%) is the
power demand of the customer. Note that the load curve reveals a nonlinear time-
dependent characteristic and the approach of fuzzy modeling will be suitable in the
analysis of the load curve.
The (SISO) Sugeno-type fuzzy model described in (1) is adopted for describing
this load curve. The Gaussian type membership function in (5) is employed for repre-
senting the fuzzy set A i in the IF-part of the modeling rule and the THEN-part is a
linear model. Under different preset numbers of modeling rules, the ANFIS algorithm
is used for model training to calculate the model parameters. The number of iterations
for model training is set to be 1000.
Number of rules    Root-mean-squared error
2                  3.7874
3                  2.4922
4                  1.8353
5                  1.4501
6                  0.3892

Rule    (ci, σi)    (ai, bi)
respectively. The fuzzy sets shown in Fig. 2 and Fig. 3 represent the membership
functions for the three-rule and six-rule fuzzy models. Moreover, the measured data
together with fuzzy modeling output for the three-rule and the six-rule fuzzy models
are shown in Fig. 4 and Fig. 5 for the purpose of comparison.
From the root-mean-squared errors in Table 1, it is found that more modeling rules
yield a model with less modeling error as the number of rules varies from two
to six. It is also observed from Fig. 4 and Fig. 5 that the fitting accuracy of the six-rule
fuzzy model is better than that of the three-rule fuzzy model. Both the three-rule fuzzy
model with parameters in Table 2 and the six-rule fuzzy model with parameters in
Table 3 can provide IF-THEN rules for representing the studied load curve.
The n-rule fuzzy models in (1) can be converted into models with linguistic rules
by giving a proper time label to each fuzzy set Ai in the IF-part of the ith rule. For
example, from the parameters in Table 2 and the fuzzy sets in Fig. 2, the three-rule
fuzzy model can be converted into the following three linguistic rules in (6)-(8):
IF T is early morning, THEN P = −4.39 × T + 63.10 , (6)
5 Conclusion
The main purpose of this paper is to present the description and analysis of the power
system load curve by fuzzy modeling. In view of the nonlinear characteristic of the
power system load curve which depicts the time variations of load demand of a cus-
tomer, the method of fuzzy modeling is employed for representing the curve. A typi-
cal residential daily load curve is used as the study object. Based on the
Sugeno-type fuzzy model, various models with different numbers of modeling rules
have been identified to describe the load curve. It is found that such fuzzy models are
capable of providing both quantitative and qualitative descriptions of the load curve.
The validity of the Sugeno-type fuzzy model in the analysis of the power system load
curve has been verified in this work.
Acknowledgments. The authors would like to thank Prof. Shu-Chen Wang, Mr.
Muh-Guay Yang, and Mr. Min-En Wu for all their help with this work.
References
1. Walker, C.F., Pokoski, J.J.: Residential load shape modeling based on customer behavior.
IEEE Trans. on Power Systems and Apparatus 104, 1703–1711 (1985)
2. Talukdar, S., Gellings, C.W.: Load Management. IEEE Press, Los Alamitos (1987)
3. Bjork, C.O.: Industrial Load Management-Theory, Practice and Simulations. Elsevier,
Amsterdam (1989)
4. Tweed Jr., N.B., Stites, B.E.: Managing load data at Virginia Power. IEEE Computer
Applications in Power 5, 25–29 (1992)
5. Schrock, D.: Load Shape Development. PennWell Publishing Co. (1997)
6. Pansini, A.J., Smalling, K.D.: Guide to Electric Load Management. PennWell Publishing
Co. (1998)
7. Chen, C.S., Hwang, J.C., Tzeng, Y.M., Huang, C.W., Cho, M.Y.: Determination of cus-
tomer load characteristics by load survey system at Taipower. IEEE Trans. on Power
Delivery 11, 1430–1435 (1996)
8. Chen, C.S., Hwang, J.C., Huang, C.W.: Application of load survey systems to proper tariff
design. IEEE Trans. on Power System 12, 1746–1751 (1997)
9. Senjyu, T., Higa, S., Uezato, K.: Future load curve shaping based on similarity using fuzzy
logic approach. IEE Proceedings-Generation, Transmission and Distribution 145, 375–380
(1998)
10. Konjic, T., Miranda, V., Kapetanovic, I.: Prediction of LV substation load curves with
fuzzy inference systems. In: 8th International Conference on Probabilistic Methods
Applied to Power Systems, pp. 129–134. Iowa State University (2004)
11. Zhang, C.Q., Wang, T.: Clustering analysis of electric power user based on the similarity
degree of load curve. In: 4th International Conference on Machine Learning and Cybernet-
ics, vol. 3, pp. 1513–1517. IEEE Press, Los Alamitos (2005)
12. Zhang, J., Yan, A., Chen, Z., Gao, K.: Dynamic synthesis load modeling approach based
on load survey and load curves analysis. In: 3rd International Conference on Electric Util-
ity Deregulation and Restructuring and Power Technologies (DRPT 2008), pp. 1067–1071.
IEEE Press, Los Alamitos (2008)
13. Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965)
14. Sugeno, M., Kang, G.: Structure identification of fuzzy model. Fuzzy Sets and Systems 28,
15–23 (1988)
15. Jang, J.S.R.: ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. on
Systems, Man and Cybernetics 23, 665–685 (1993)
16. Jang, J.S.R., Sun, C.T.: Neuro-fuzzy modeling and control. Proceedings of the IEEE 83,
378–406 (1995)
17. Jang, J.S.R., Sun, C.T., Mizutani, E.: Neuro-Fuzzy and Soft Computing. Prentice-Hall,
Englewood Cliffs (1997)
18. Bezdek, J.C.: Fuzzy models-What are they, and why. IEEE Trans. on Fuzzy Systems 1, 1–6
(1993)
19. Sugeno, M., Yasukawa, T.: A fuzzy-logic-based approach to qualitative modeling. IEEE
Trans. on Fuzzy Systems 1, 7–31 (1993)
20. Yager, R., Filev, D.: Essentials of Fuzzy Modeling and Control. John Wiley & Sons,
Chichester (1994)
21. Ross, T.: Fuzzy Logic with Engineering Applications. Wiley, Chichester (2004)
22. Huang, P.H., Chang, Y.S.: Fuzzy rules based qualitative modeling. In: 5th IEEE Interna-
tional Conference on Fuzzy Systems, pp. 1261–1265. IEEE Press, Los Alamitos (1996)
23. Huang, P.H., Jang, J.S.: Fuzzy modeling for magnetization curve. In: 1996 North Ameri-
can Power Symposium, pp. 121–126. Mass. Inst. Tech. (1996)
24. Huang, P.H., Chang, Y.S.: Qualitative modeling for magnetization curve. Journal of Ma-
rine Science and Technology 8, 65–70 (2000)
25. Wang, S.C., Huang, P.H.: Description of wind turbine power curve via fuzzy modeling.
WSEAS Trans. on Power Systems 1, 786–792 (2006)
Honey Bee Mating Optimization Vector Quantization
Scheme in Image Compression
Ming-Huwi Horng
1 Introduction
Vector quantization techniques have been used for a number of years for data com-
pression. VQ divides the image to be compressed into vectors (or blocks), and each
vector is compared with the codewords of a codebook to find its reproduction vector.
The codeword that is most similar to an input vector is called the reproduction vector of
that input vector. In the encoding process, an index that points to the closest codeword of
an input vector is determined. Normally, the size of the codebook is much smaller than
the original image data set; therefore, the purpose of image compression is achieved. In
the decoding process, the associated sub-image is retrieved exactly by the same codebook
that has been used in the encoding phase. When each sub-image is completely
reconstructed, the decoding is completed.
Vector quantization (VQ) algorithms have been developed by many researchers, and
new algorithms continue to appear. The generation of the codebook is known as the most
important process of VQ. The k-means based algorithms are designed to minimize the
distortion error by selecting a suitable codebook. A well-known method is the LBG
algorithm [1]. However, the LBG algorithm is a local search procedure and suffers
from the serious drawback that its performance depends heavily on the initial starting
conditions. Many studies have been undertaken to solve this problem. Chen, Yang
and Gou proposed an improvement based on particle swarm optimization (PSO)
[2]: the result of the LBG algorithm is used to initialize the global best particle, which
speeds up the convergence of PSO. In addition, Wang et al. proposed a quantum par-
ticle swarm algorithm (QPSO) to solve the 0-1 knapsack problem [3], and Zhao et al.
employed quantum particle swarm optimization to select the thresholds for multi-
level thresholding [4].
Over the last decade, modeling the behavior of social insects, such as ants and
bees, for the purpose of search and problem solving has been the context of the
emerging area of swarm intelligence. Honey-bee mating may thus be
considered a typical swarm-based approach for searching for the optimal solution
in many application domains, such as clustering [5] and multilevel image threshold
selection [6]. In this paper, the honey bee mating optimization (HBMO) algorithm combined
with the LBG algorithm is proposed to search for the optimal codebook that
minimizes the distortion between the training set and the codebook. In other words, the
HBMO algorithm is a search technique that finds the optimal codebook for the input
vectors. Experimental results demonstrate that the HBMO-LBG algorithm
consistently performs better than the LBG and PSO-based algorithms.
This work is organized as follows. Section 2 introduces the vector quantization and
LBG algorithm. Section 3 presents this proposed method which searches for the
optimal codebook using the HBMO algorithm. Performance evaluation is discussed in
detail in Section 4. Conclusions are presented in Section 5.
2 Vector Quantization
This section provides some basic concepts of vectors quantization and introduces the
traditional LBG algorithm.
2.1 Definition
and

Lk ≤ cjk ≤ Uk ,   k = 1, 2, ..., L.        (4)

where Lk is the minimum of the kth components over all training vectors, Uk is the
maximum of the kth components over all training vectors, and ||x − c|| is the Euclid-
ean distance between the vectors x and c.
Two necessary conditions exist for an optimal vector quantizer.
(1)The codewords c j must be given by the centroid of R j :
cj = (1 / Nj) Σ_{i=1}^{Nj} xi ,   xi ∈ Rj.        (5)
An algorithm for a scalar quantizer was proposed by Lloyd [7]. Linde et al. general-
ized it for vector quantization [1]. This algorithm is known as LBG or generalized
Lloyd algorithm (GLA). It applies the two following conditions to input vectors for
determining the optimal codebooks.
(2). Determine the centroids of each partition. Replace the old codewords with these
centroids:
cj(k + 1) = ( Σ_{i=1}^{Nb} μij xi ) / ( Σ_{i=1}^{Nb} μij ) ,   j = 1, ..., Nc.        (8)
(3) Repeat steps (1) and (2) until no c j , j = 1,..., N c changes anymore.
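A minimal numpy sketch of one LBG/GLA pass is given below: nearest-codeword assignment followed by the centroid update of (8). The training vectors and the codebook size are illustrative, not the data used in the experiments.

import numpy as np

def lbg_step(train, codebook):
    """One iteration of the generalized Lloyd algorithm: assignment plus centroid update."""
    # nearest-codeword assignment (partition R_j)
    d = np.linalg.norm(train[:, None, :] - codebook[None, :, :], axis=2)
    labels = np.argmin(d, axis=1)
    # centroid update: replace each codeword by the mean of its partition, Eq. (8)
    new_cb = codebook.copy()
    for j in range(codebook.shape[0]):
        members = train[labels == j]
        if len(members) > 0:
            new_cb[j] = members.mean(axis=0)
    return new_cb, labels

rng = np.random.default_rng(3)
train = rng.random((1000, 16))                            # e.g. 4x4 image blocks as 16-d vectors
codebook = train[rng.choice(1000, 8, replace=False)]      # Nc = 8 initial codewords
for _ in range(10):
    codebook, labels = lbg_step(train, codebook)
print(codebook.shape)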
some energy content during the flight mating and returns to her nest when the energy
is within some threshold from zero to full spermatheca.
In order to develop the algorithm, the capability of workers is restrained in brood
care and thus each worker may be regarded as a heuristic that acts to improve and/or
take care of a set of broods. An annealing function is used to describe the probability
of a drone (D) that successfully mates with the queen (Q) shown in Eq. (9).
P(Q, D ) = exp[−∆( f ) / S (t )] (9)
where ∆ ( f ) is the absolute difference of the fitness of D and the fitness of Q, and the
S (t ) is the speed of queen at time t. After each transition of mating, the queen’s
speed and energy are decayed according to the following equation:
S (t + 1) = α × S (t ) (10)
where α is the decreasing factor ( α ∈ [0, 1] ). Workers adopt some heuristic mecha-
nisms, such as crossover or mutation, to improve the brood's genotype. The fitness of
the resulting genotype is determined by evaluating the value of the objective function
of the brood genotype. It is important to note that a brood has only one genotype. The
popular five construction stages of the HBMO algorithm were proposed by
Fathian et al. [5]; they are also used to develop the vector quantization algorithm
in this paper. The five stages are described as follows:
(1) The algorithm starts with the mating flight, where a queen (best solution) selects
drones probabilistically to form the spermatheca (list of drones). A drone is then se-
lected from the list randomly for the creation of broods.
(2) Creation of new broods by crossing over the drones' genotypes with the queen's.
(3) Use of workers to conduct local search on broods (trial solutions).
(4) Adaptation of the workers' fitness, based on the amount of improvement achieved on
the broods.
(5) Replacement of the weaker queen by fitter broods.
This section introduces a new codebook design algorithm for vector quantization that
uses the honey bee mating optimization method. In the HBMO-LBG algorithm, the
solutions, including the best solution (queen), the candidate solutions and the trial
solutions, are represented in the form of codebooks. Figure 1 shows the structure of the
drone set (trial solutions); the same structure is used to represent the queen and the broods.
The designed algorithm is essentially based on Fathian's five-stage scheme. The
details of the algorithm follow; the fitness function used is defined in Eq. (11).
Fitness(C) = 1 / D(C) = Nb / ( Σ_{j=1}^{Nc} Σ_{i=1}^{Nb} μij · ||xi − cj||² )        (11)
where Spi is the ith sperm in the spermatheca and Spi,k is the kth codeword of
the codebook represented by Spi.
brood j = Q ± β × ( Sp j − Q ) (14)
where β is a random number ranging from 0 to 1.
δ is a random number ranging from 0 to 1, k is the index of one of the selected
codewords, and ε is a pre-defined parameter.
Step 3. The best brood, brood_best, with the maximum fitness value is selected as the
candidate queen.
Step 4. If the fitness of brood_best is superior to that of the queen, the queen is replaced by
brood_best.
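The sketch below illustrates the brood-creation and replacement steps just described, applied to codebooks: a brood is generated from the queen and a stored sperm as in (14), lightly mutated, and replaces the queen if its fitness (11) is higher. The toy training set, the mutation rate and the other parameter values are assumptions, not the settings used in the experiments.

import numpy as np

rng = np.random.default_rng(4)
train = rng.random((500, 16))                                  # toy training vectors

def fitness(codebook):
    """Eq. (11): reciprocal of the total distortion over the training set."""
    d = np.linalg.norm(train[:, None, :] - codebook[None, :, :], axis=2)
    return train.shape[0] / np.sum(np.min(d, axis=1) ** 2)

queen = train[rng.choice(500, 8, replace=False)]               # best codebook found so far
spermatheca = [train[rng.choice(500, 8, replace=False)] for _ in range(5)]

broods = []
for sp in spermatheca:
    beta = rng.random()
    brood = queen + np.sign(rng.random() - 0.5) * beta * (sp - queen)    # Eq. (14)
    mutate = rng.random(brood.shape) < 0.05                    # assumed mutation probability
    brood = np.where(mutate, brood + (rng.random(brood.shape) - 0.5) * 0.1, brood)
    broods.append(np.clip(brood, 0.0, 1.0))

best = max(broods, key=fitness)
if fitness(best) > fitness(queen):                             # replace weaker queen by fitter brood
    queen = best
print(round(fitness(queen), 3))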
Fig. 2. The test images: (a) LENA, (b) PEPPER, (c) BIRD, (d) CAMERA and (e)GOLDHILL
PSNR = 10 × log10( A² / MSE )   (dB)        (17)
where A is the maximum gray level and MSE is the mean squared error between the origi-
nal image and the decompressed image.
MSE = (1 / (M × M)) Σ_{i=1}^{M} Σ_{j=1}^{M} ( yij − ŷij )²        (18)

where M × M is the image size, and yij and ŷij denote the pixel values at location (i, j)
of the original and reconstructed images, respectively.
of original and reconstructed images, respectively. The experimental results are
shown in Table 1. Tabe 1 show the PSNR values and the execution times of test
images by using the three different vector quantization methods. Obviously, the us-
ages of the HBMO-LAB algorithm have highest PSNR value compared with other
two methods. Furthermore, the PSNR of LBG algorithm is the worst and the other
three algorithms can significantly improve the results of LBG algorithm.
Table 1. The PSNR and execution times for the five test images at different bit rates using the
four different vector quantization methods, namely the LBG, PSO-LBG, QPSO-LBG and
HBMO-LBG algorithms

Image (512×512)   Bit rate (bit/pixel)   LBG PSNR / time   PSO-LBG PSNR / time   HBMO-LBG PSNR / time
LENA              0.3125                 25.430 / 3        25.760 / 478          25.887 / 298
                  0.375                  25.657 / 4        26.567 / 932          26.830 / 568
                  0.4375                 25.740 / 6        27.508 / 1828         27.562 / 1136
                  0.5                    25.750 / 9        28.169 / 3666         28.337 / 2525
                  0.5625                 25.758 / 17       28.994 / 7406         29.198 / 4386
                  0.625                  25.760 / 36       29.863 / 14542        29.958 / 8734
5 Conclusion
This paper gives a detailed description of how the HBMO (honey bee mating optimization) algorithm can be used to implement vector quantization and enhance the performance of the LBG method. All of our experimental results showed that the proposed algorithm can significantly increase the quality of the reconstructed images compared with the other three methods, namely the traditional LBG, PSO-LBG and QPSO-LBG. The proposed HBMO-LBG algorithm can provide a better codebook with smaller distortion.
Acknowledgment
The author would like to thank the National Science Council, ROC, for supporting this work under Grant No. NSC 97-2221-E-251-001.
References
1. Linde, Y., Buzo, A., Gray, R.M.: An algorithm for vector quantizer design. IEEE Transac-
tion on Communications 28(1), 84–95 (1980)
2. Chen, Q., Yang, J., Gou, J.: Image Compression Method Using Improved PSO Vector
Quantization. In: Wang, L., Chen, K., Ong, Y.S. (eds.) ICNC 2005. LNCS, vol. 3612, pp.
490–495. Springer, Heidelberg (2005)
3. Wang, Y., Feng, X.Y., Huang, Y.X., Pu, D.B., Zhou, W.G., Liang, Y.C., Zhou, C.G.: A
novel quantum swarm evolutionary algorithm and its applications. Neurocomputing 70,
633–640 (2007)
4. Zhao, Y., Fang, Z., Wang, K., Pang, H.: Multilevel minimum cross entropy threshold selec-
tion based on quantum particle swarm optimization. In: Proceeding of Eight ACIS Interna-
tional Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/
Distributed Computing, pp. 65–69 (2007)
5. Fathian, M., Amiri, B., Maroosi, A.: Application of honey-bee mating optimization algo-
rithm on clustering. Applied Mathematics and Computation, 1502–1513 (2007)
6. Horng, M.H.: Multilevel Minimum Cross Entropy Threshold selection based on Honey Bee
Mating Optimization. In: 3rd WSEAS International conference on Circuits, systems, signal
and Telecommunications, CISST 2009, Ningbo, pp. 25–30 (2009)
7. Lloyd, S.P.: Least square quantization in PCM’s. Bell Telephone Laboratories Paper,
Murray Hill, NJ (1957)
8. Abbass, H.A.: Marriage in honey-bee optimization (HBO): a haplometrosis polygynous
swarming approach. In: The Congress on Evolutionary Computation (CEC 2001), pp. 207–214
(2001)
Towards a Population Dynamics Theory for Evolutionary
Computing: Learning from Biological Population
Dynamics in Nature
Zhanshan (Sam) Ma
1 Introduction
In much of today's evolutionary computing, the setting of the population size is still experience-based, and often a fixed population size is preset manually before the start of computing. This practice is simple, but not robust. The dilemma is that small populations may be ineffective, while big populations can be costly in terms of computing time and memory space. In particular, in genetic programming, a big population is often a culprit for the premature emergence of code bloat, which may cause failure or even crash the system. Furthermore, many of the problems we face are NP-hard; efficient population sizing may have significant impacts on the success of heuristic algorithms. Nevertheless, an optimal population size that is effective in exploring the fitness space and efficient in utilizing computing resources is theoretically very intriguing. Achieving the balance between effectiveness and efficiency is such a dilemma that it prompted Harik et al.'s (1999) approach to the problem with the Gambler's ruin random walk model.
If there is an optimal population size, it is very likely to be a moving target, which may depend on selection pressure, position in the search space, and the fitness landscape. Indeed, a population in nature, i.e., a biological population, is a moving target: a self-regulated dynamic system influenced by both intrinsic and environmental stochasticity. A natural question is: what if we emulate natural population dynamics in EC? A follow-up question is how to emulate it. The answer to the latter is actually straightforward, because we can simply adopt the mathematical models developed for natural population dynamics. The discipline that studies natural population dynamics is population ecology, which is a major branch of theoretical ecology (or mathematical ecology). Theoretical ecology is often compared with theoretical physics, and there is a huge amount of literature on population dynamics on which computer scientists may draw for developing an EC population dynamics theory. In this paper, I can only present a bird's-eye view of the potential population dynamics models. Furthermore, I limit the models to the categories that we have experimentally tested, where the test results demonstrated significant improvement with dynamic populations controlled by the mathematical models.
It should be noted that there are quite a few existing studies (e.g., [8][9][11]) that were conducted to develop alternatives to fixed-size populations. Due to space limitations, I skip the review of existing studies on dynamic populations in EC here (a review was presented in [23]). To the best of my knowledge, this is the first work that introduces, in a comprehensive manner, natural population dynamics theory to EC. The remainder of this paper is organized as follows: Section 2 is an extremely brief overview of natural population dynamics theory. Sections 4, 5, and 6 introduce three major categories of population dynamics models: deterministic, stochastic, and extended evolutionary game theory models. In Sections 4 and 5, I also show some sample experiment results with the models introduced. Section 3 briefly introduces the experiment problem used in Sections 4 and 5. Section 7 is a summary.
Overall, this article is a summary of a series of studies with the objective of developing an EC population dynamics theory by emulating natural populations and 'transplanting' mathematical models developed for natural population dynamics in theoretical ecology to EC [27][30]–[32].
Population regulation was one of the most contested theories in the history of ecology (e.g., see [16] for a brief history); the debates started in the 1950s and culminated in the 1960s, and even these days the antagonistic arguments from both schools occasionally appear in ecological publications (e.g., [1][40]). The debate sounds simple from an engineering perspective. The core of the debate lies in the fundamental question: is a population regulated by feedback mechanisms such as the density-dependent effects of natural enemies, or is it simply limited by environmental constraints? Within the regulation school, there are diverse theories on which factors (intrinsic ones such as competition, natural enemies, genes, behavior, movement, migration, etc.) regulate the population and how. Certainly, there are also mixed hypotheses combining the two schools. About a dozen hypotheses have been advanced since the 1950s. The debates were "condemned" by some critics as a "bankrupt paradigm" and "a monumental obstacle to progress" (cited in [1][40]). However, there is no doubt that the debates kept population ecology the central field of ecology for more than three decades, and they were critical in shaping population ecology into the most quantitatively studied field in ecology. Theoretical ecology is often dominated by the contents of population ecology [17]. In addition, important advances in ecology such as chaos theory, spatially explicit modeling, and agent- or individual-based modeling all originated in population ecology.
The importance of population regulation cannot be overemphasized, since it reveals the mechanisms behind population dynamics. Even more important is to treat population dynamics from the time-space paradigm, not just as temporal changes of population numbers. In addition, the concept of the metapopulation is also crucial, which implies that local population extinction and recolonization happen in nature [10]. Obviously, population regulation as a control mechanism for population size is very inspiring for the counterpart problem in evolutionary computation. What is exciting is that population regulation and population dynamics can also be unified with evolutionary game theory, and even united with population genetics, under the same mathematical modeling frameworks such as the Logistic model and Lotka–Volterra systems (e.g., [12][29][39]).
divide L evenly, the leftmost block will be of a different size. The following are four examples of bit strings and their corresponding fitness values (F). The first and third have optimum solutions, and the blocks that count in the fitness function are underlined (a small code sketch of this fitness is given after the examples).
L = 32, B = 4, F=8: 1111 1111 1111 1111 1111 1111 1111 1111
L = 32, B = 4, F=2: 0000 1111 0110 0100 0111 1010 1111 1010
L = 32, B = 7, F=5: 1111 1111111 1111111 1111111 1111111
L = 32, B = 7, F=2: 1111 0000110 0001000 1111111 0011010
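A minimal sketch of this block-counting fitness, assuming (as the examples suggest) that blocks are formed from the right so that only the leftmost block may be shorter:

def block_fitness(bits, B):
    # split the string into blocks of size B from the right;
    # the leftmost block may be shorter, as in the examples above
    blocks, i = [], len(bits)
    while i > 0:
        blocks.append(bits[max(0, i - B):i])
        i -= B
    return sum(1 for b in blocks if set(b) == {"1"})

# the four examples from the text
assert block_fitness("1111" * 8, 4) == 8
assert block_fitness("00001111011001000111101011111010", 4) == 2
assert block_fitness("1111" + "1111111" * 4, 7) == 5
assert block_fitness("1111" + "0000110" + "0001000" + "1111111" + "0011010", 7) == 2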
Three metrics are used to evaluate the population-sizing schemes or the different population dynamics models. The number of times the optimum solution is found in an experiment run (consisting of a fixed number of trials) is termed hits per experiment. The other two metrics are PopuEvalIndex and FirstHitPopuEvalIndex. Small PopuEvalIndex and FirstHitPopuEvalIndex values indicate that fewer evaluations of the fitness function and fewer associated computations are needed; they also imply that less memory space is needed for storing the individual information. Of course, a large number of hits indicates that the scheme is more robust and efficient.
For more detailed definitions of the metrics and the problem, one may refer to [27].
In the next two sections, all the sample experiment results, which are excerpted from
[30][31] to illustrate the corresponding models, are obtained with this test problem and
are based on the three metrics discussed above.
4 Deterministic Modeling
$$x = \frac{Nr}{K(1+r)} \qquad (4)$$
Substituting x into equation (3) yields the one-parameter dimensionless Logistic map, also known as the one-hump nonlinear function or one-dimensional quadratic map:
$$x_{n+1} = a x_n (1 - x_n) \qquad (5)$$
where a = r + 1. To avoid trivial dynamic behavior, the model requires 1 < a < 4 and 0 < x < 1. The population size (x) is converted to the (0, 1) interval, and the conversion also eliminates the other parameter K, which makes the analysis more convenient. The extremely rich and complex behavior represented by the deceptively simple equation (5) was discovered by Robert May in 1976 [35]; [38] contains a detailed discussion of the Logistic chaos map model. The one-parameter Logistic map model is particularly convenient for controlling EC population dynamics, and offers extremely rich dynamics. Figures 1 and 2 show the performance of dynamic populations with the Logistic chaos map model vs. that of the fixed-size population.
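As a hedged sketch of how Eq. (5) could drive a dynamic population, the following maps the Logistic-map state in (0, 1) to a population size; the rescaling interval and all parameter values are illustrative assumptions rather than the exact scheme of [30].

def logistic_population_sizes(a=3.57, x0=0.5, n_min=20, n_max=200, generations=100):
    # rescale the Logistic-map state x in (0, 1) to a population size in [n_min, n_max]
    x = x0
    for _ in range(generations):
        x = a * x * (1.0 - x)        # Eq. (5): x_{n+1} = a x_n (1 - x_n)
        yield int(n_min + x * (n_max - n_min))

sizes = list(logistic_population_sizes())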
[Figures 1 and 2: bar charts comparing the fixed-size population with Logistic-map populations at a = 1.5, 2.5, 3.57, 3.68, 3.83 and 4; the vertical axes are the number of hits to the optimum solutions (Fig. 1) and the numbers of fitness evaluations (Fig. 2).]
Fig. 1. Total number of hits for each parameter value of a and for the fixed population
Fig. 2. PopuEvalIndex and FirstHitPopuEvalIndex for BlockSize = 8
Figures 1 and 2 demonstrate that dynamic populations controlled with the Logistic
chaos map model under various parameter values all significantly outperform the
fixed-size population (the leftmost) except for a=4. Parameter a=4 is theoretically
impossible with the model; it was included to test the boundary case. The detailed
experiment results with the Logistic chaos map model are documented in [30].
5 Stochastic Modeling
There are many stochastic models for population dynamics (e.g., [18]). In this section, I
introduce a category of probability distribution models that are used to describe the
spatial distribution patterns of animal populations. Natural populations are dynamic in
both time and space. In nature, the population spatial distribution patterns are consid-
ered as the emergent expression (property) of individual behavior at the population
level and are fine-tuned or optimized by natural selection [33][34]. It is assumed that
EC populations emulating the spatial distribution patterns of natural populations should
be advantageous.
Generally, there are three types of spatial distribution patterns: aggregated, random
and regular. Random distribution (sensu ecology) can be fitted with Poisson prob-
ability distribution (sensu mathematics), and the regular distribution (also termed
uniform distribution, sensu ecology) is totally regular with even spacing among indi-
viduals. Aggregated (also termed contagious, congregated, or clustered) distribution
(sensu ecology) represents nonrandom and uneven density in space. The probability
distributions (sensu mathematics) for aggregated distributions (sensu ecology) are
strongly skewed with very long right tails. The most widely used discrete probability
distribution (sensu mathematics) for the aggregated distribution (sensu ecology) pattern
is the Negative Binomial Distribution (NBN).
The aggregated or fat-tailed distribution is a signature of a power law. Taylor (1961) discovered that the power law model fits population spatial distribution data ubiquitously well,
$$V = aM^{b}, \qquad (6)$$
where M and V are the population mean and variance, respectively, and a and b are parameters. According to Taylor (1961), b > 1 corresponds to the aggregated distribution, b = 1 to the random distribution, and b < 1 to the regular distribution. Ma (1988, 1991) [33][34] extended Taylor's power law model with his concept of the population aggregation critical density (PACD), which was derived based on the power law:
$$m_0 = \exp[\ln(a)/(1 - b)], \qquad (7)$$
where a and b are the parameters of the power law and $m_0$ is the PACD. According to Ma's (1988, 1991) reinterpreted power law, population spatial distributions are population density-dependent and form a continuum along the population density series. The PACD is the transition threshold (population density) between the aggregated, random and regular distributions [33][34].
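As a small worked illustration of Eqs. (6)-(7), the PACD can be evaluated as follows; the parameter values are arbitrary examples rather than fitted data.

import math

def pacd(a, b):
    # Eq. (7): population aggregation critical density; undefined for b = 1
    return math.exp(math.log(a) / (1.0 - b))

m0 = pacd(2.0, 1.5)   # illustrative values for an aggregated pattern (b > 1)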
[Figures 3 and 4: bar charts comparing the fixed-size population with stochastic populations controlled by the Weibull, NBN, Poisson, Normal and Uniform distributions; the vertical axes are the number of hits to the optimum solution (Fig. 3) and the fitness evaluation indexes (Fig. 4).]
Fig. 3. Number of hits for each distribution
Fig. 4. Fitness evaluation indexes
which is the probability that an individual will survive beyond time T. Various distri-
bution models (such as Weibull, lognormal, and logistic distributions) can be used as
the survivor function. These distributions are termed parametric models. In addition, semi-parametric models, which describe the conditional survivor function influenced by environmental covariates (z), can be utilized; the most well-known seems to be Cox's proportional hazards model (PHM):
$$S(t \mid z) = [S_0(t)]^{\exp(z\beta)} \qquad (9)$$
where
$$S_0(t) = \exp\left[-\int_0^t \lambda_0(u)\,du\right] \qquad (10)$$
and z is the vector of covariates, which can be any factors that influence the baseline
survivor function S 0 (t ) or the lifetime of individuals. Dedicated volumes have been
written to extend the Cox model. Therefore, models (8)–(10) offer extremely rich
modeling flexibility and power. Furthermore, more complex multivariate survival
analysis ([13][28]) and competing risks analysis ([5]) can be introduced to model EC
populations.
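As a small illustration of Eqs. (9)-(10), the following sketch uses a Weibull baseline survivor function together with Cox's proportional hazards adjustment; the Weibull choice, the covariates and the coefficients are illustrative assumptions, not values from [32].

import math

def weibull_S0(t, scale=1.0, shape=1.5):
    # an assumed Weibull baseline survivor function S0(t)
    return math.exp(-(t / scale) ** shape)

def cox_survivor(t, z, beta, scale=1.0, shape=1.5):
    # Eq. (9): S(t|z) = S0(t) ** exp(z . beta)
    zb = sum(zi * bi for zi, bi in zip(z, beta))
    return weibull_S0(t, scale, shape) ** math.exp(zb)

p = cox_survivor(0.8, z=[1.0, 0.5], beta=[0.3, -0.2])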
Besides offering extremely rich and powerful stochastic models, the three 'sister'
fields of survival analysis can be utilized to model either individual lifetime (individual
level) or population survival distribution (population level). One may conjecture that
survival-analysis-based EC populations should perform similarly to the previous stochastic populations, since in both cases probability distribution models are used to control the EC populations. My experiment results indeed confirm this conjecture [32]. However, since survival analysis was advanced to study time-to-event random variables (also known as survival or failure times) and the lifetime of an individual in EC is a typical time-to-event random variable, it does have some unique advantages, such as capturing the survival process mechanistically and uniquely dealing with censoring (incomplete
act and evolution cannot happen. Perhaps from this broad perspective, a term such as
ecological computing (computation) may be justified when the ecological principles
play more critical roles in EC.
[Figures 5 and 6: plots relating population size, mean fitness, fitness variance (log scale) and the fitness aggregation index.]
Fig. 5. The relationship between mean fitness, population size and the fitness aggregation index
Fig. 6. Taylor's power law applied to fitness variance vs. mean fitness
References
1. Berryman, A.A.: Population regulation, emergent properties, and a requiem for den-
sity-dependence. Oikos 99(3), 600–606 (2002)
2. Bollobás, B.: Random Graphs, 2nd edn., 500 p. Cambridge University Press, Cambridge (2001)
3. Bollobás, B., Riordan, O.: Percolation, 334 p. Cambridge University Press, Cambridge
(2006)
4. Caswell, H.: Matrix Population Models, 2nd edn. Sinauer, Sunderland (2001)
5. Crowder, M.J.: Classical Competing Risks Analysis, 200 p. Chapman & Hall, Boca Raton
(2001)
6. DeJong, K.A.: An analysis of the behaviors of genetic adaptive systems. Ph.D. Thesis,
University of Michigan, Ann Arbor, MI (1975)
7. Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer, Heidelberg
(2003)
8. Goldberg, D.E., Rundnick, M.: Genetic algorithms and variance of fitness. Complex Sys-
tems 5(3), 265–278 (1991)
9. Goldberg, D.E., et al.: Genetic algorithms, Noise, and the Sizing of Populations. Complex
Systems 6, 333–362 (1992)
10. Hanski, I.: Metapopulation Dynamics. Oxford University Press, Oxford (1999)
11. Harik, G., Cantú-Paz, E., Goldberg, D.E., Miller, B.L.: The Gambler’s ruin problem, genetic
algorithms, and the sizing of populations. Evol. Comput. 7(3), 231–253 (1999)
12. Hofbauer, J., Sigmund, K.: Evolutionary Games and Population Dynamics, 323 p.
Cambridge University Press, Cambridge (1998)
13. Hougaard, P.: Analysis of Multivariate Survival Data, 560 p. Springer, Heidelberg (2000)
14. Hutchinson, G.E.: The Ecological Theater and the Evolutionary Play. Yale University Press
(1966)
15. Ibrahim, J.G., Chen, M.H., Sinha, D.: Bayesian Survival Analysis, 481 p. Springer,
Heidelberg (2005)
16. Kingsland, S.E.: Modeling Nature, 2nd edn. University of Chicago Press (1995)
17. Kot, M.: Elements of Mathematical Ecology, 453 p. Cambridge University Press,
Cambridge (2001)
18. Lande, R., Engen, S.: Stochastic Population Dynamics in Ecology. Oxford University Press,
Oxford (2003)
19. Lawless, J.F.: Statistical models and methods for lifetime data, 2nd edn. Wiley, Chichester
(2003)
20. Legendre, P., Legendre, L.: Numerical Ecology, 851 p. Elsevier, Amsterdam (1998)
21. Ma, Z.S.: New Approaches to Reliability and Survivability with Survival Analysis, Dy-
namic Hybrid Fault Models, and Evolutionary Game Theory. PhD Dissertation, University
of Idaho, 177 p. (2008)
22. Ma, Z.S., Bechinski, E.J.: Survival-analysis-based Simulation Model for Russian Wheat
Aphid Population Dynamics. Ecological Modeling 216(2), 323–332 (2008)
23. Ma, Z.S.: Why Should Populations be Dynamic in Evolutionary Computation?
Eco-Inspirations from Natural Population Dynamics and Evolutionary Game Theory. In:
The Sixth International Conferences on Ecological Informatics. ICEI-6 (2008)
24. Ma, Z.S., Bechinski, E.J.: Accelerated Failure Time Modeling of the Development and
Survival of Russian Wheat Aphid. Population Ecology 51(4), 543–548 (2009)
25. Ma, Z.S., Krings, A.W.: Survival Analysis Approach to Reliability Analysis and Prognostics
and Health Management (PHM). In: Proc. 29th IEEE–AIAA AeroSpace Conference, 20 p.
(2008a)
26. Ma, Z.S., Krings, A.W.: Dynamic Hybrid Fault Models and their Applications to Wireless
Sensor Networks (WSNs). In: The 11-th ACM/IEEE MSWiM 2008, Vancouver, Canada, 9 p.
(2008)
27. Ma, Z.S., Krings, A.W.: Dynamic Populations in Genetic Algorithms. In: SIGAPP 23rd
Annual ACM Symposium on Applied Computing (ACM SAC 2008), Brazil, March 16-20,
5 p. (2008)
28. Ma, Z.S., Krings, A.W.: Multivariate Survival Analysis (I): Shared Frailty Approaches to
Reliability and Dependence Modeling. In: Proc. 29th IEEE–AIAA AeroSpace Conference,
21 p. (2008)
29. Ma, Z.S.: Towards an Extended Evolutionary Game Theory with Survival Analysis and
Agreement Algorithms for Modeling Uncertainty, Vulnerability and Deception. In: Deng,
H., Wang, L., Wang, F.L., Lei, J. (eds.) AICI 2009. LNCS (LNAI), vol. 5855, pp. 608–618.
Springer, Heidelberg (2009)
30. Ma, Z.S.: Chaotic Populations in Genetic Algorithms (submitted)
31. Ma, Z.S.: Stochastic Populations, Power Law and Fitness Aggregation in Genetic Algo-
rithms (submitted)
32. Ma, Z.S.: Survival-Analysis-Based Survival Selections in Genetic Algorithms (in preparation)
33. Ma, Z.S.: Revised Taylor’s Power Law and Population Aggregation Critical Density. In:
Proceedings of Annual National Conference of the Ecological Society of China, Nanjing,
China (1988)
34. Ma, Z.S.: Further interpreted Taylor’s Power Law and Population Aggregation Critical
Density. Trans. Ecol. Soc. China 1, 284–288 (1991)
35. May, R.M.: Simple mathematical models with very complicated dynamics. Nature 261,
459–467 (1976)
36. May, R.M., McLean, A.R.: Theoretical Ecology. Oxford University Press, Oxford (2007)
37. Pastor, J.: Mathematical Ecology of Populations and Ecosystems. Wiley-Blackwell (2008)
38. Schuster, H.H.: Deterministic Chaos: an introduction, 2nd edn., 269 p. VCH Publisher (1988)
39. Vincent, T.L., Brown, J.L.: Evolutionary Game Theory, Natural Selection and Darwinian
Dynamics, 382 p. Cambridge University Press, Cambridge (2005)
40. White, T.C.R.: Opposing paradigms: regulation of limitation of populations. Oikos 93,
148–152 (2001)
Application of Improved Particle Swarm Optimization
Algorithm in UCAV Path Planning*
1 Introduction
The unmanned combat aerial vehicle is an experimental class of unmanned aerial vehi-
cle. It is likely to become the mainstay of the air combat force. In order to improve the
overall survival probability and the operational effectiveness of UCAV, an integrated
system is needed in the UCAV flight mission and pre-flight mission to make resource
coordination and determine the flight path. Flight path planning is one of the most important parts of UCAV mission planning: it aims to generate a path between an initial location and the desired destination that has optimal or near-optimal performance under specific constraint conditions. Flight path planning in a large mission area is a typical large-scale optimization problem. A series of algorithms have been proposed to solve this complicated multi-constrained optimization problem, such as evolutionary computation [1], genetic algorithms [2], and ant colony algorithms [3]. However, those algorithms have their shortcomings; for instance, the planning results easily fall into local minima, or they need further optimization to meet the aerial vehicle's performance constraints. The particle swarm optimization (PSO) algorithm, as a new intelligent optimization algorithm, has been tried on this problem due to its merits
*
This work was partially supported by Innovation Funds of Graduate Programs, Shaanxi Nor-
mal University, China. #2009CXS018.
of rapid searching and easy implementation [4], but the results are still not ideal. In this paper, we propose a method based on the second-order oscillating particle swarm optimization (SOPSO) algorithm to improve the optimization results for UCAV path planning. The model for the path planning problem was built, then the design scheme and its specific realization were given. Finally, the rationality and validity of the algorithm were analyzed based on the simulation experiments and their results.
xi (t + 1) = vi (t + 1) + xi (t ) (2)
iter and Maxiter indicate the current iteration and the maximum iteration, respectively. This algorithm is called particle swarm optimization with linearly decreasing inertia weight (WPSO) [6].
Another classical improved PSO algorithm called particle swarm optimization with
constriction factor (CFPSO)[7] updates the evolution equation as equations (4) and (2):
vi (t + 1) = χ (vi (t ) + c1r1 ( pi − xi (t )) + c2 r2 ( pg − xi (t ))) (4)
x i (t + 1) = v i (t + 1) + x i (t ) (8)
In the second-order oscillating algorithm, the searching process is divided into two phases. We can set $\xi_1$, $\xi_2$ to different values to control whether the algorithm performs global search or local search, in accordance with the convergence laws. We can take $\xi_1 < (2\sqrt{\varphi_1} - 1)/\varphi_1$, $\xi_2 < (2\sqrt{\varphi_2} - 1)/\varphi_2$ in the early phase, for oscillatory convergence and better global search capability, and take $\xi_1 \ge (2\sqrt{\varphi_1} - 1)/\varphi_1$, $\xi_2 \ge (2\sqrt{\varphi_2} - 1)/\varphi_2$ in the later phase, for asymptotic convergence and better local search capability.
Modeling of the threat sources is the key task in UCAV optimal path planning. The main threats in the flight environment include local fires, radar detection, and so on. Assume the unmanned aerial vehicle flies at high altitude without altitude change; this simplifies the problem to two-dimensional path planning. Here we suppose that the threat source is radar detection, and all the radars in the task region are identical. The aerial vehicle's task region is shown in Figure 1. There are some circular areas around the threat points in the target region which represent the threat scope.
The flight task for the aerial vehicle is from point A to point B. Taking the starting point A as the origin and the line from the starting point to the target point as the X axis, a coordinate system is established. Divide AB into m equal sub-sections; there are then m−1 vertical lines between point A and point B. A path can be formed by connecting points on these parallel vertical lines. For example, a collection of points constitutes a route:
$$path = \{A, L_1(x_1, y_1), L_2(x_2, y_2), \dots, L_{m-1}(x_{m-1}, y_{m-1}), B\} \qquad (9)$$
where $L_i(x_i, y_i)$ denotes the point on the i-th vertical line.
Since the distances between adjacent vertical lines are equal, the longitudinal coordinates of the ordered path points are sufficient to encode a path. In the SOPSO, we use a particle to represent a path, encoded as $y_1, y_2, y_3, \dots, y_{m-1}$. In this way the search space is halved by this encoding method, which helps speed up the search process of the algorithm.
In this paper, only horizontal path optimization is considered, so the mission can be simplified to finding a path from the start point to the target point that has a high probability of not being detected by enemy radars and the shortest flight distance. According to the characteristics of radars, if $(x_t, y_t)$ denotes the location of a radar, the detection cost P for the air vehicle at $(x, y)$ is given as follows:
$$P = \begin{cases} \dfrac{R_A^4}{R^4 + R_A^4} & R \le R_A \\[2mm] 0 & R > R_A \end{cases} \qquad (10)$$
$$W_{i,i+1} = \mathrm{dis}(L_{i,i+1}) \ast \sum_{j=1}^{t} P_j \qquad (11)$$
where t indicates the number of radars and $\mathrm{dis}(L_{i,i+1})$ is the distance between adjacent waypoints. Because the evaluation of the threat cost relies on the waypoints, blind spots on the planned path easily lead to planning failure. The more waypoints, the better the threats are perceived. However, especially in PSO, increasing the number of waypoints means increasing the particle's dimension, which would reduce the accuracy of the results and complicate the algorithm. Here, a novel method is proposed to improve the algorithm's performance: instead of increasing the particle's dimension, only the number of threat-perception points is increased, by selecting a number of perceptual points on the line connecting adjacent waypoints. The threat cost expressed by equation (11) then becomes equation (12):
where k indexes the perceiving points on the sub-path connecting the i-th and (i+1)-th waypoints, and n is the number of perceiving points to be calculated.
Besides threat detection, we should also make the flight distance as short as possible. The flight distance is the sum of the line distances between points along the flight line; there are m sections in this path. The distance from point $L_i(x_i, y_i)$ on vertical line i to point $L_{i+1}(x_{i+1}, y_{i+1})$ on vertical line i+1 can be described as:
where δ is a coefficient in [0, 1]. It balances the impact of threat avoidance and flight distance: the bigger δ is, the shorter the flight distance, but the more dangerous the flight.
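As an illustration of the cost evaluation described above, the following sketch combines the detection cost of Eq. (10) (where R is taken to be the distance from the current point to the radar), perceiving points sampled along each sub-path in the spirit of Eq. (12), and the distance term weighted by δ; since the combined cost formula itself is not reproduced in the text, the weighting and the averaging used here are assumptions for illustration only.

import math

def radar_cost(x, y, radars, RA):
    # Eq. (10); R is assumed to be the distance from (x, y) to the radar
    cost = 0.0
    for (xt, yt) in radars:
        R = math.hypot(x - xt, y - yt)
        if R <= RA:
            cost += RA ** 4 / (R ** 4 + RA ** 4)
    return cost

def path_cost(waypoints, radars, RA, delta=0.5, n_perceive=5):
    threat, dist = 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(waypoints[:-1], waypoints[1:]):
        seg = math.hypot(x2 - x1, y2 - y1)
        dist += seg
        for k in range(1, n_perceive + 1):   # perceiving points along the sub-path
            t = k / (n_perceive + 1)
            threat += (seg / n_perceive) * radar_cost(x1 + t * (x2 - x1),
                                                      y1 + t * (y2 - y1), radars, RA)
    return delta * dist + (1.0 - delta) * threat

cost = path_cost([(0, 0), (2, 1), (4, 0.5), (6, 0)], radars=[(3, 1)], RA=1.5)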
After the analysis above, the implementation steps of the SOPSO algorithm are in-
troduced as follows:
STEP 1. Set the relevant parameters of the algorithm, such as the acceleration coefficients $c_1$, $c_2$, the maximum iteration Maxiter, the population size N, the particle dimension d, etc.
STEP 2. Initialize the velocity and position of each particle: generate an initial population and velocities randomly.
STEP 3. Compute each particle's fitness value, and update the personal best and global best values.
STEP 4. If the current iteration iter < Maxiter/2, update the current locations of the particles according to equations (7) and (8) with $\xi_1 < (2\sqrt{\varphi_1} - 1)/\varphi_1$, $\xi_2 < (2\sqrt{\varphi_2} - 1)/\varphi_2$; otherwise, update the current locations of the particles according to equations (7) and (8) as well, but with $\xi_1 \ge (2\sqrt{\varphi_1} - 1)/\varphi_1$, $\xi_2 \ge (2\sqrt{\varphi_2} - 1)/\varphi_2$. Set iter = iter + 1.
STEP 5. Check the ending condition. If iter > Maxiter, exit; the current global best value is the global optimal solution. Otherwise, turn to STEP 3.
STEP 6. Stop the computation and output the values associated with the optimal particle.
Figures 2–4 show the best paths found by WPSO, CFPSO and SOPSO over 50 runs. From the comparison of the results we can see that the path planned by SOPSO is more accurate and smoother than the other two results; such a path not only has a shorter flight distance but also avoids the threats effectively, while the path found by WPSO is not feasible.
The path shown in Figure 5 is planned by SOPSO with the particle's dimension set to the number of perceiving points used in the previous algorithm. The iteration number in this algorithm is also 500. Comparing the results in Figure 4 and Figure 5, the path shown in Figure 4 is obviously better than the one in Figure 5: not only is the flight distance of the former shorter than that of the latter, but it also performs better in avoiding the threats.
Table 1 gives the data comparison of path planning by the different algorithms, where fMin, fMax and fMean represent the minimum, maximum and mean of the results, i.e., of the minimum objective function values obtained by running the programs 50 times. The statistical success rate of the UCAV avoiding the radar threats is obtained by counting the number of times the threats were avoided successfully during the 50 runs.
Judging from the experimental results, it is obvious that the proposed SOPSO algorithm has a higher success rate and can find a feasible and optimal path for the aerial vehicle, while CFPSO cannot effectively escape the threat regions even though its mean objective function value is smaller. From practical considerations, the proposed method provides a new way for UCAV path planning in real applications in the future.
5 Conclusion
In order to plan the path for a UCAV, a model was built for this problem and the SOPSO algorithm was proposed to plan the path for the UCAV in flight missions. Simulation results showed that the proposed method could find a path with a shorter flight distance while avoiding the threats effectively, which is much more suitable for practical application. Our future work will focus on applied research on path planning under complex conditions in this field.
References
1. Zheng, C.W., Li, L., Xu, F.J.: Evolutionary route planner for unmanned air vehicles. IEEE
Transactions on Robotics and Automation, 609–620 (2005)
2. Wang, Y.X., Chen, Z.J.: Genetic algorithms (GA) based flight path planning with constraints.
Journal of Beijing University of Aeronautics and Astronautics, 355–358 (1999)
3. Ye, W., Ma, D.W., Fan, H.D.: Algorithm for low altitude penetration aircraft path planning
with improved ant colony algorithm. Chinese Journal of Aeronautics, 304–309 (2005)
4. Chen, D., Zhou, D.Y., Feng, Q.: Route Planning for Unmanned Aerial Vehicles Based on
Particle Swarm Optimization. Journal of Projectiles, Rockets, Missiles and Guidance,
340–342 (2007)
5. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: Proceedings of the IEEE In-
ternational Conference on Neural Network. IEEE Service Center, pp. 1942–1948. IEEE
Press, New Jersey (1995)
6. Shi, Y., Eberhart, R.: A Modified Particle Swarm Optimizer. In: Proceedings of the IEEE
International Conference on Evolutionary Computation. IEEE Service Center, pp. 69–73.
IEEE Press, Piscataway (1998)
7. Clerc, M.: The swarm and the queen: towards a deterministic and adaptive particle swarm
optimization. In: Proceedings of the Congress on Evolutionary Computation, pp. 1951–1957.
IEEE Service Center, Piscataway (1999)
8. Hu, J.X., Zeng, J.C.: Two-order Oscillating Particle Swarm Optimization. Journal of System
Simulation 19, 997–999 (2007)
Qianzhi Ma was born in March 1985. Her main research fields involve intelligent optimization and path planning. She is a graduate student at Shaanxi Normal University.
Xiujuan Lei was born in May 1975. She is an associate professor at Shaanxi Normal University. She has published more than 20 papers. Her main research fields involve intelligent optimization, especially particle swarm optimization.
Modal Analysis for Connecting Rod of Reciprocating
Mud Pump
1 Introduction
Modal analysis is an effective method to determine the vibration mode shapes and weak parts of a complex mechanical system. The connecting rod is an important component of the reciprocating mud pump dynamic system. It is not only a transmission component but also a moving part, and it must withstand variable loads such as tension, compression and bending in the working process, so its reliability is of great significance for the normal operation of the reciprocating mud pump [1]. Therefore, the study of the dynamic characteristics of the connecting rod has become an important part of its design. In this paper, a modal analysis was applied to the connecting rod of a reciprocating pump using the ANSYS software. The main purpose of the analysis is to identify the modal parameters of the connecting rod, such as frequencies and vibration mode shapes, and to provide a basis for structural dynamics analysis and the follow-up optimal design of the connecting rod.
*
Corresponding author.
In (1), [M] is the mass matrix, [C] is the damping matrix, [K] is the stiffness matrix, and {X} is the displacement response column vector of the nodes, $\{X\} = \{x_1, x_2, \dots, x_n\}^{T}$.
$$[M]\{\ddot{X}\} + [K]\{X\} = 0 \qquad (2)$$
Any free vibration can be regarded as a simple harmonic vibration, so a hypothesis is put forward as shown in the following.
Substituting (3) into (2) gives the following equations:
$$([K] - \omega^2 [M])\{\phi\} = 0 \qquad (4)$$
$$[K]\{\phi\} = \lambda [M]\{\phi\}$$
In these equations, ω is the circular frequency, {φ} is the characteristic column vector (mode shape), and λ (λ = ω²) is the eigenvalue.
In the process of free vibration the amplitudes of the nodes are not all zero, so (4) becomes (5):
$$\left|[K] - \omega^2 [M]\right| = 0 \qquad (5)$$
Equation (5) is the characteristic equation of the undamped vibration system. Modal analysis can thus be regarded as solving for the eigenvalues ω² in (4). In mechanical engineering, the mass matrix and stiffness matrix can be regarded as symmetric positive definite matrices, and the number of eigenvalues obtained from the equation equals the order N of the matrices; that is, there are N natural frequencies $\omega_i\ (i = 1, 2, \dots, N)$. For each natural frequency, a column vector consisting of the N nodal amplitudes can be determined from (4), namely the vibration mode of the structure, $\{\phi_i\}$.
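The modal analysis in this paper is carried out in ANSYS; purely as a numerical illustration of Eqs. (4)-(5), a minimal sketch of the generalized eigenvalue problem is given below, where the small 2-DOF matrices and values are illustrative assumptions, not the connecting rod model.

import numpy as np
from scipy.linalg import eigh

# illustrative 2-DOF mass and stiffness matrices
M = np.diag([2.0, 1.0])
K = np.array([[6.0, -2.0],
              [-2.0, 4.0]])
lam, phi = eigh(K, M)            # generalized eigenproblem K*phi = lambda*M*phi, Eq. (4)
omega = np.sqrt(lam)             # natural circular frequencies (lambda = omega^2)
freq_hz = omega / (2.0 * np.pi)  # natural frequencies in Hz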
The ANSYS software provides several modal extraction methods, such as the subspace, Block Lanczos, reduced, PowerDynamics, unsymmetric, damped and QR damped methods. In this paper, the subspace iteration method was chosen, which has the advantages of high accuracy and numerical stability compared with the other methods [5].
For complex mechanical systems, CAD software is commonly used to build the model in order to speed up modeling, and the model is then imported into the CAE software for analysis through the appropriate interface file. The CAD model may contain a number of design details that often have little bearing on the strength of the structure; retaining these details in the modeling process would complicate the model, inevitably increase the number of elements, and might even cover up the principal aspects of the problem and negatively affect the results of the analysis [6]. Therefore, local structures that have little impact on the results can be simplified.
Because the reciprocating mud pump commonly uses a crankshaft, the connecting rod must be manufactured as a split structure to facilitate assembly and to adjust the bushing clearance. In order to simplify the analysis, the connecting rod and rod cap can be treated as a whole when modeling with Pro/E. A three-dimensional solid model of the reciprocating pump connecting rod was built on the Pro/E software platform, saved in IGES format and imported into the finite element analysis software ANSYS.
The material of the connecting rod is ZG35CrMo; its material properties are shown in Table 1. The geometric model was meshed with 8-node Solid45 elements; the meshing precision was set to level three and the meshing was performed automatically by ANSYS. The finite element model of the connecting rod is shown in Fig. 1.
Fig. 2. Six vibration mode shapes of the connecting rod: (a) first vibration mode shape with frequency 136.6 Hz; (b) second vibration mode shape with frequency 186.8 Hz; (c) third vibration mode shape with frequency 230.5 Hz; (d) fourth vibration mode shape with frequency 283.6 Hz; (e) fifth vibration mode shape with frequency 435.0 Hz; (f) sixth vibration mode shape with frequency 637.2 Hz.
in the process of vibration. The sixth vibration mode shape is not obvious; it is mainly bending along the X and Y axes. Bending vibration of the connecting rod will cause the piston to deflect relative to the cylinder sleeve and the crank neck to deflect relative to the bearings, which can bring about additional stress and cause cracks and damage, so this situation should be given particular consideration in the design to avoid damage [9].
5 Conclusion
Dynamic characteristic analysis of the connecting rod is a supplement to and development of static design; it is an important means of achieving a rational design and increasing the service reliability of the connecting rod structure. The inherent vibration characteristics of the reciprocating pump connecting rod, including the natural frequencies and mode shapes, were analyzed with the ANSYS software, and more accurate and intuitive results were obtained. These results show that the stress of the connecting rod is commonly concentrated at the connections of the large and small ends, but stress concentration at the center of the connecting rod is also quite obvious; therefore, the traditional design concept should be changed so that the center of the connecting rod is fully taken into account in the design.
References
1. Yang, B., He, Z., Peng, X., Xing, Z.: FEM Analysis on the Valve of Linear Compressor.
Fluid Machinery 33, 24–27 (2005)
2. Yu, B., Huang, Z.: Finite Element Modal Analysis of Rotor in Centrifugal Pump. Mechani-
cal Engineers, 108–109 (June 2005)
3. Han, S., Hao, Z.: The Finite Element Modal Analysis of 4108Q Diesel Engine Connecting
Rod. Vehicles and Power Technology, 37–40 (April 2002)
4. Dai, W., Fan, W., Cheng, Z.: Modal Analysis of Diesel Engine Connecting Rod Based on
ANSYS. Mechanical Engineering and Automation, 39–41 (April 2007)
5. Zhang, G.X., Zhong, Y., Hong, X., Leng, C.L.: Microelectromechanical systems motion
measurement and modal analysis based on Doppler interferometry. J. Vac. Sci. Technol.,
1251–1255 (May 2009)
6. Chen, W.: Dynamic modeling of multilink flexible robotic manipulators. Computers and
Structures, 183–195 (February 2001)
7. Tlusty, I.F.: Dynamic structural identification task methods. In: Annals of CIRP, February
1980, pp. 260–262 (1980)
8. Kim, D.: Parametric studies on static and dynamic performance of air foil bearings with dif-
ferent top foil geometries and bump stiffness distributions. Tribology, 354–364 (February
2007)
9. Segalman Daniel, J., Roy Anthony, M., Starr Michael, J.: Modal analysis to accommodate
slap in linear structures. Vibration and Acoustics, 303–317 (March 2006)
A Time-History Analysis Algorithm of a Non-viscously
Damped System Using Gauss Precise Integration
1 Introduction
The damping characteristics and dynamic responses of a vibrating system is a key
topic in the structural dynamic analysis. The viscous damping model, first advanced
by Lord Rayleigh in 1877 [1], is usually used to simulate energy dissipation in a
structural dynamic response. It is however, well known that the classical viscous
damping model is a mathematical idealization and the true damping model is likely to
be different. Moreover, increasing use of modern composite materials and intelligent
control mechanisms in aerospace and automotive industries demands sophisticated
treatment of dissipative forces for proper analysis and design. As a result, there has
been an increase in interest in the recent years on the investigation for other damping
models which can better reflect the energy dissipation of damping and many re-
searchers also began to study the dynamic analysis method involved [2-7].
At the same time, the time-history analysis method is more and more widely used due to the rapid construction of high-rise buildings and the fast development of structural dynamic analysis theory and computational technology. At present, the widely used methods include the linear acceleration method, the Wilson-θ method [8], the Newmark-β method [9], etc. Most recently, the precise time-integration method (PIM)
proposed by Zhong et al. has attracted much attention [10-12]. As a numerical method it can obtain a nearly exact solution, and therefore offers a very useful idea for time-history analysis. However, a matrix inversion is needed when solving the inhomogeneous equations in structural dynamic analysis by PIM [13-14]. It is known that both accuracy and stability deteriorate in an inverse calculation, so the application of this method is limited.
The time-history analysis method for a convolution-integral non-viscously damped system is studied in this paper. This model was first advanced by Biot for viscous materials [15]. It is assumed that the non-viscous damping forces depend on the past history of velocities via convolution integrals over exponentially decaying kernel functions. The advantage of this model is its generality compared with traditional viscous models: it can describe various damping mechanisms by selecting different kernel function types [16] and is more accurate in expressing the time-delay effect. The dynamic analysis of the non-viscously damped system in this paper is based on the condition that the kernel function is an exponential function [17].
Obviously, conventional numerical integration methods cannot be directly used for this non-viscously damped system due to the change in its mathematical model. Adhikari and Wagner gave a state-space form of this non-viscously damped system [16], and also proposed a state-space mode superposition method (MSM) and a direct time-domain integration method (DTIM) to study the system's dynamic responses [18]. According to some studied cases, the mode superposition method has good accuracy but requires a large amount of computation; the situation for the direct time-domain integration method is just the reverse. Accordingly, it is significant to propose a method that has both high accuracy and computational efficiency. In this paper, the dynamic response state equation of the non-viscously damped system in [16] was first transformed. Then a modified precise integration method without matrix inversion is set up from the basic principles of precise integration and Gauss-Legendre quadrature. Under discrete loading (such as an earthquake wave), cubic spline interpolation is used to calculate the function values at the Gauss integration points in the recursion expressions. Finally, a numerical simulation is given to demonstrate the efficiency of the method.
The non-viscous damping model considered in this paper assumes that the damping force is related to the velocity time history through a convolution integral between the velocity and a decaying kernel function. The damping force can be expressed as
$$F_d(t) = \int_0^t G(t-\tau)\,\dot{x}(\tau)\,d\tau \qquad (1)$$
$$G(t) = \sum_{k=1}^{n} C_k\, g_k(t) = \sum_{k=1}^{n} C_k\, \mu_k\, e^{-\mu_k t}, \quad t \ge 0 \qquad (2)$$
Obviously, Eq. (3) describes a non-viscously damped dynamic system. The state-space method has been used extensively in the modal analysis of non-proportionally viscously damped systems. We expect to extend the state-space approach to this non-viscously damped system and simplify the expressions of the damping and of the equation, in order to find a step-by-step integration algorithm to analyse the system's dynamic responses. Recently Wagner and Adhikari have proposed a state-space method for this damped system [16]. Here we briefly review their main results.
$$y_k(t) = \int_0^t \mu_k\, e^{-\mu_k (t-\tau)}\, \dot{x}(\tau)\, d\tau \in \mathbb{R}^N, \quad \forall k = 1, \dots, n \qquad (5)$$
Using the additional state variables $v(t) = \dot{x}(t)$, Eq. (3) can be represented in the first-order form as
$$B\dot{z}(t) = A z(t) + r(t) \qquad (7)$$
where
$$B = \begin{bmatrix} \sum_{k=1}^{n} C_k & M & -C_1/\mu_1 & \cdots & -C_n/\mu_n \\ M & O & O & \cdots & O \\ -C_1/\mu_1 & O & C_1/\mu_1^2 & \cdots & O \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -C_n/\mu_n & O & O & \cdots & C_n/\mu_n^2 \end{bmatrix} \in \mathbb{R}^{m \times m} \qquad (8)$$
$$A = \begin{bmatrix} -K & O & O & \cdots & O \\ O & M & O & \cdots & O \\ O & O & -C_1/\mu_1 & \cdots & O \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ O & O & O & \cdots & -C_n/\mu_n \end{bmatrix} \in \mathbb{R}^{m \times m} \qquad (9)$$
$$r(t) = \begin{Bmatrix} f(t) \\ 0 \\ 0 \\ \vdots \\ 0 \end{Bmatrix} \in \mathbb{R}^{m}, \qquad z(t) = \begin{Bmatrix} x(t) \\ v(t) \\ y_1(t) \\ \vdots \\ y_n(t) \end{Bmatrix} \in \mathbb{R}^{m} \qquad (10)$$
where A and B are the system matrices in the extended state-space. They are symmetric matrices, and the order of the system is m = 2N + nN; z(t) is the extended state vector, r(t) is the force vector in the extended state-space, O is an N-order null matrix, and rank(C_k) = N, ∀k = 1, ..., n.
It can be seen that, when $\mu_k \to \infty$, Eq. (5) reduces to $y_k(t) = \dot{x}(t)$ and Eq. (3) reduces to the equation of a viscously damped system.
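As a minimal illustration of the block structure of Eqs. (8)-(10), the following sketch assembles the extended matrices for the simplest case n = 1; the 2-DOF values of M, K, C1 and mu1 are illustrative assumptions, not the model of the numerical example.

import numpy as np

N = 2
M  = np.diag([2.0, 1.0])                      # illustrative 2-DOF matrices
K  = np.array([[6.0, -2.0], [-2.0, 4.0]])
C1 = 0.1 * K                                  # single exponential kernel (n = 1)
mu1 = 5.0
O = np.zeros((N, N))

B = np.block([[C1,        M, -C1 / mu1      ],
              [M,         O,  O             ],
              [-C1 / mu1, O,  C1 / mu1 ** 2 ]])
A = np.block([[-K, O,  O        ],
              [ O, M,  O        ],
              [ O, O, -C1 / mu1 ]])
# extended state vector z = {x, v, y1}; force vector r = {f, 0, 0}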
$$R_k^{T} C_k R_k = d_k \in \mathbb{R}^{r_k \times r_k} \qquad (11)$$
$$y_k(t) = R_k \tilde{y}_k(t) \qquad (12)$$
where
$$\tilde{B} = \begin{bmatrix} \sum_{k=1}^{n} C_k & M & -C_1 R_1/\mu_1 & \cdots & -C_n R_n/\mu_n \\ M & O_{N,N} & O_{N,r_1} & \cdots & O_{N,r_n} \\ -R_1^{T} C_1/\mu_1 & O_{N,r_1}^{T} & R_1^{T} C_1 R_1/\mu_1^2 & \cdots & O_{r_1,r_n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -R_n^{T} C_n/\mu_n & O_{N,r_n}^{T} & O_{r_1,r_n}^{T} & \cdots & R_n^{T} C_n R_n/\mu_n^2 \end{bmatrix} \in \mathbb{R}^{\tilde{m} \times \tilde{m}} \qquad (14)$$
$$\tilde{A} = \begin{bmatrix} -K & O_{N,N} & O_{N,r_1} & \cdots & O_{N,r_n} \\ O_{N,N} & M & O_{N,r_1} & \cdots & O_{N,r_n} \\ O_{N,r_1}^{T} & O_{N,r_1}^{T} & -R_1^{T} C_1 R_1/\mu_1 & \cdots & O_{r_1,r_n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ O_{N,r_n}^{T} & O_{N,r_n}^{T} & O_{r_1,r_n}^{T} & \cdots & -R_n^{T} C_n R_n/\mu_n \end{bmatrix} \in \mathbb{R}^{\tilde{m} \times \tilde{m}} \qquad (15)$$
$$\tilde{r}(t) = \begin{Bmatrix} f(t) \\ 0_N \\ 0_{r_1} \\ \vdots \\ 0_{r_n} \end{Bmatrix} \in \mathbb{R}^{\tilde{m}}, \qquad \tilde{z}(t) = \begin{Bmatrix} x(t) \\ v(t) \\ \tilde{y}_1(t) \\ \vdots \\ \tilde{y}_n(t) \end{Bmatrix} \in \mathbb{R}^{\tilde{m}} \qquad (16)$$
It can be seen that the order of the system is $\tilde{m} = 2N + \sum_{k=1}^{n} r_k$, and the meaning of each symbol is the same as in case A. When all $C_k$ matrices are of full rank, each $R_k$ matrix can be chosen as the identity matrix and (13) is identical to (7), so the full-rank case can be regarded as a special case.
After obtaining the system matrices of the state-space, the eigenvalues Λ and eigenvectors Ψ of the system matrices A and B (or $\tilde{A}$ and $\tilde{B}$) are solved through the solution method of the second-order eigenvalue problem and satisfy the equation $(A\Lambda + B)\begin{Bmatrix} \Psi \\ \Psi\Lambda \end{Bmatrix} = 0$, where Λ is a diagonal matrix. The common mode superposition method is then applied to carry out the system's dynamic time-history analysis, taking the initial conditions into account. The time-history result calculated by this method under the ideal state of the
complete model is the theoretically exact solution, but the application of the method is limited by its inefficient calculation and demanding computing time.
Adhikari presented the direct time-domain integration method in order to overcome the shortcoming of the large computing time of the mode superposition method [18]. Through direct integration of (7) or (13) at every time step, the time-domain recursion expression of the state vector z(t) or z̃(t) is obtained, and explicit recursive equations for the displacement and velocity are derived. The method can significantly increase the efficiency of the system time-history analysis because it avoids the large-scale calculation of the entire system matrices and state vector, but the approximate treatment of the state equation in the direct integration reduces the calculation accuracy, as will be discussed in the following numerical example.
Considering computational efficiency and accuracy, the precise time-integration method presented by Zhong [10] is applied to treat and solve the system state-space equation. The basic idea of the precise time-integration method is to change the conventional second-order system equation into a first-order expression; the general solution of the system equation is then expressed through the exponential matrix, and the $2^N$ algorithm is adopted to calculate the exponential matrix by applying the addition theorem of the exponential function, so as to complete the system time-history analysis. The errors of the precise time-integration method originate only from the truncation of the exponential matrix expansion, besides the arithmetic errors of the matrix multiplications. A more detailed accuracy analysis indicates that the truncation error of the exponential matrix expansion is very small for a damped vibration system [11], so the numerical results of the precise time-integration method can approach the exact solution. Moreover, the main body of this method consists of the addition operations used for computing the exponential matrix, so the method suits the characteristics of computers and increases the computational efficiency of the algorithm. However, there is an inverse matrix in the precise time-integration equation for the nonhomogeneous dynamic equation of a structure under external force [13][14], and matrix inversion brings problems such as loss of calculation accuracy, instability, and possible non-existence of the inverse matrix. These problems always exist in multi-DOF systems and are particularly serious for matrices with high dimensions. In this paper, a modified precise time-integration method without matrix inversion is proposed by combining the basic principles of precise time-integration and Gauss-Legendre integration.
Considering the general case when the $C_k$ matrices are rank deficient, rearrange the state-space Eq. (13) as
$$\dot{V} = HV + F \qquad (17)$$
where $V = \tilde{z}(\tau)$, $H = \tilde{B}^{-1}\tilde{A}$, and $F = \tilde{B}^{-1}\tilde{r}(\tau)$. In order to describe the algorithm, the exponential matrix is defined as
$$e^{H\tau} \approx I + H\tau + \frac{(H\tau)^2}{2} + \frac{(H\tau)^3}{6} + \cdots + \frac{(H\tau)^n}{n!} + \cdots \qquad (18)$$
Multiplying both sides of (17) by $e^{-H\tau}$ and integrating from $t_k$ to $t$ yields, after rearrangement,
$$e^{-H\tau}(\dot{V} - HV) = e^{-H\tau}F, \qquad \int_{t_k}^{t} d\left(e^{-H\tau}V\right) = \int_{t_k}^{t} e^{-H\tau}F(\tau)\,d\tau \qquad (19)$$
then
$$V(t) = e^{H(t-t_k)}V_k + e^{Ht}\int_{t_k}^{t} e^{-H\tau}F(\tau)\,d\tau \qquad (20)$$
where $T_\tau = e^{H\tau}$ and $T_j = e^{H\frac{\tau}{2}(1 - t_j)}$. Eq. (22) is the step-by-step integral recursion expression for the solution vector $V_t$ in terms of the solution vector $V_k$ at $t_k$. It shows that no inverse matrix appears in the expression. Under discrete loading (such as an earthquake wave), cubic spline interpolation is used to calculate $F(t_j)$ at the Gauss integration points.
Now let us discuss the precise computation of the exponential matrix $T_\tau$. The addition theorem of the exponential function gives the identity
$$T_\tau = e^{H\tau} = \left(e^{H\tau/M}\right)^{M} \qquad (23)$$
where $M = 2^N$, generally N = 20. Because τ is already a small time interval, $\Delta t = \tau/M$ is an extremely small time interval, so the matrix $e^{H\Delta t}$ departs from the unit matrix I only to a very small extent, $e^{H\Delta t} = I + T_a$, and this incremental part should be kept distinguished:
$$T_\tau = [I + T_a]^{2^N} = [I + T_a]^{2^{N-1}} \times [I + T_a]^{2^{N-1}} \qquad (25)$$
Such factorization should be iterated N times, and the high-precision value of $T_\tau$ will be obtained, and so will the $T_j$. By substituting them into (22), we can realize an efficient high-precision time-history analysis of this non-viscously damped system.
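A minimal sketch of this computation (Eqs. (23)-(25)) is given below; the truncation order of the series and the test matrix are illustrative assumptions. The incremental part Ta is kept separate from the unit matrix during the squarings to preserve precision.

import numpy as np

def precise_expm(H, tau, N=20, terms=4):
    # compute T_tau = exp(H*tau): Taylor-expand the tiny increment Ta on the
    # interval tau/2**N, then square the incremental part N times
    n = H.shape[0]
    dt = tau / (2 ** N)
    Ta = np.zeros_like(H, dtype=float)
    term = np.eye(n)
    for k in range(1, terms + 1):        # Ta = H*dt + (H*dt)^2/2! + ... (truncated series)
        term = term @ (H * dt) / k
        Ta = Ta + term
    for _ in range(N):                   # (I + Ta)^2 = I + (2*Ta + Ta@Ta), kept as increment
        Ta = 2.0 * Ta + Ta @ Ta
    return np.eye(n) + Ta

T = precise_expm(np.array([[0.0, 1.0], [-4.0, -0.2]]), tau=0.1)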
4 Numerical Example
Here a two-dimensional, three-story frame is used. It has 12 DOFs: each node has a rotational DOF and each story has a lateral DOF. This example is only used to test the performance of the proposed method, and the damping model will not be discussed in detail. Therefore, the non-viscous damping model of the system is built with one exponential function; the motion equation of the system is described by (3) (with n = 1) and the mechanical model is shown in Fig. 1. Moreover, to test the calculation accuracy and the treatment of discrete loads by the proposed method, the displacement of the second floor under a sinusoidal wave $\sin(\pi t)$ and under the El-Centro wave, respectively, is considered.
For the numerical calculation we have assumed
$E = 3 \times 10^7\,\mathrm{N/m^2}$, $m_{\mathrm{column}} = 300\,\mathrm{kg}$, $m_{\mathrm{beam}} = 100\,\mathrm{kg}$, $m_{\mathrm{story}} = 2000\,\mathrm{kg}$, $A_{\mathrm{column}} = 0.6 \times 0.6\,\mathrm{m^2}$, $A_{\mathrm{beam}} = 0.2 \times 0.4\,\mathrm{m^2}$, $\mu_1 = 1/(\gamma_1 T_{\min})$, $\gamma_1 = 0.05$, $T_{\min} = 2\pi/\omega_{\max}$, $C_1 = \alpha M + \beta K$, $\alpha = 2\omega_1\omega_2\,\dfrac{\xi_1\omega_2 - \xi_2\omega_1}{\omega_2^2 - \omega_1^2}$, $\beta = \dfrac{2(\xi_2\omega_2 - \xi_1\omega_1)}{\omega_2^2 - \omega_1^2}$, $\xi_1 = 0.02$, $\xi_2 = 0.05$.
Fig. 1. Twelve-DOF steel frame with non-viscous damping
Fig. 2. Horizontal displacement of $U_2$ under the El-Centro wave (MSM vs. GPIM; $U_2$ in mm, time in s)
Firstly, three types of methods are used to calculate the displacement of the structure under the sinusoidal wave $\sin(\pi t)$, with time steps of 0.02 s and 0.5 s, respectively. The number of Gauss integration points is taken as 2, that is, m = 2. Here the results obtained by the modal superposition method (MSM) [16] are taken as
the exact solution. If a small step is taken, the results obtained by all three methods are similar, as shown in Fig. 3a. If a large step is taken, the results obtained by the direct time-domain integration method (DTIM) and the precise time-integration method (PIM) show large errors, while the proposed method (GPIM) can still obtain accurate results, as shown in Fig. 3b. It is obvious that the accuracy of GPIM depends on the number of Gauss integration points and the size of the step τ, both of which are selectable.
However, whether for function fitting or for numerical calculation, both the DTIM and PIM methods may generate large integration errors when applied to analyse the structural response under discrete loading, since it is inevitable to calculate the integration and differences of the loading function in these two methods. The proposed method can overcome the disadvantages of the above two methods because there are no integral or differential terms in the recursion expression and cubic spline interpolation is used to calculate the load at the Gauss integration points. The displacement of the second floor under the El-Centro wave is calculated by the proposed method, and the results are shown in Fig. 2.
Fig. 3. Displacement U2 (mm) of the second floor vs. time (s) under the sinusoidal wave, computed by MSM, DTIM, PIM and GPIM: (a) small time step; (b) large time step
It is obvious that the calculation accuracy and the computational cost of the proposed method depend on the number of Gauss integration points and the step size τ. Compared with conventional precise time-integration methods, the proposed method adds to the cost of computing the exponential matrices, but the accuracy is improved markedly with only m = 2, so the overall computational cost is not significantly increased. On the other hand, a relatively large integration step can be used, which evidently lightens the computational burden. As a result, the proposed method can improve both accuracy and efficiency.
5 Conclusions
The time-history analysis of a non-viscously damped system with convolution-integral damping is studied in this paper, based on the state-space form of the system given by Adhikari and Wagner. By combining the basic principles of precise integration and Gauss-Legendre quadrature, a new algorithm for analyzing the dynamic response of such damped systems in the time domain is proposed. Because there are no inverse-matrix terms in the proposed method, its advantage is more significant when applied to large structures with many DOFs, in which ill-conditioning of the system matrix easily occurs. A numerical example shows that the proposed method both ensures high accuracy and lightens the computational burden by allowing a larger integration step. For discrete loads, it is also more effective and reliable.
References
1. Rayleigh, L.: Theory of Sound, 2nd edn., vol. 2. Dover Publications, New York (1877);
(1945 re-issue)
2. Dong, J., Deng, H., Wang, Z.: Studies on the damping models for structural dynamic time
history analysis. World Information on Earthquake Engineering 16(4), 63–69 (2000)
(in Chinese)
3. Liang, C., Ou, J.: Relationship between Structural Damping and Material Damping. Earth-
quake Engineering and Engineering Vibration 26(1), 49–55 (2006) (in Chinese)
4. Woodhouse, J.: Linear damping models for structural vibration. Journal of Sound and
Vibration 215(3), 547–569 (1998)
5. Maia, N.M.M., Silva, J.M.M., Ribeiro, A.M.R.: On a general model for damping. Journal
of Sound and Vibration 218(5), 749–767 (1998)
6. Li, Q.S., Liu, D.K., Fang, J.Q., Jeary, A.P., Wong, C.K.: Damping in buildings: its neural
network model and AR model. Engineering Structures 22, 1216–1223 (2000)
7. Adhikari, S.: Damping modelling using generalized proportional damping. Journal of
Sound and Vibration 293, 156–170 (2006)
8. Wilson, E.L.: Nonlinear dynamic analysis of complex structures. Earthquake Engineering
and Structural Dynamics 1(3), 241–252 (1973)
9. Newmark, N.M.: A method of computation for structural dynamics. Journal of Engineer-
ing Mechanics 85(3), 249–260 (1959)
10. Zhong, W.: On precise time-integration method for structural dynamics. Journal of Dalian
University of Technology 34(2), 131–136 (1994) (in Chinese)
11. Zhong, W.: On precise integration method. Journal of Computational and Applied Mathe-
matics 163, 59–78 (2004)
12. Zhong, W., Zhu, J., Zhong, X.: A precise time integration algorithm for nonlinear systems.
In: Proc. of WCCM-3, vol. 1, pp. 12–17 (1994)
13. Qiu, C., Lu, H., Cai, Z.: Solving the problems of nonlinear dynamics based on Hamilto-
nian system. Chinese Journal of Computational Mechanics 17(2), 127–132 (2000)
(in Chinese)
14. Lu, H., Yu, H., Qiu, C.: An integral equation of non-linear dynamics and its solution
method. Acta Mechanica Solida Sinica 22(3), 303–308 (2001) (in Chinese)
15. Biot, M.A.: Linear thermodynamics and the mechanics of solids. In: Proc. of the third US
national congress on applied mechanics, pp. 1–18. ASME Press, New York (1958)
16. Wagner, N., Adhikari, S.: Symmetric state-space formulation for a class of non-viscously
damped systems. AIAA J. 41(5), 951–956 (2003)
17. Adhikari, S., Woodhouse, J.: Identification of damping: Part 1, viscous damping. Journal
of Sound and Vibration 243(1), 43–61 (2001)
18. Adhikari, S., Wagner, N.: Direct time-domain integration method for exponentially
damped linear systems. Computers and Structures 82, 2453–2461 (2004)
Application of LSSVM-PSO to Load Identification in
Frequency Domain
MOE key lab for strength and vibration, Xi’an Jiaotong University,
Xi’an, China
{hdkjll,maowt.mail,zjw76888,yanguir}@gmail.com
1 Introduction
Loads such as vibration and noise act on an aircraft during the transportation, launch, and flight phases. It is commonly recognized that knowledge of these loads facilitates not only the design of the aircraft's strength and rigidity and the reliability of internal instruments, but also ground environmental experiments. In practice, it is difficult to obtain accurate loads directly from experiments on the ground or from theoretical analysis [1] because of the complexity of the aircraft structure, its boundary conditions, and the corresponding loading conditions. However, structural responses such as acceleration, displacement, and strain can easily be measured by sensors in applications. Loads are therefore usually identified from the structural responses, a process frequently referred to as "load identification".
Because many problems in the aerospace industry are treated in the frequency domain, load identification in the frequency domain is of particular concern. The basic idea of the traditional frequency-domain approach to load identification is that [2] a transfer function model of the system is established and the loads are identified from the inverse of the transfer function and the responses. However, the transfer function matrix is often ill-conditioned in the resonance region; as a result, the precision and stability of load identification are not as good as expected.
It is well considered that the relationship between loads and responses only de-
pends on the structure itself and boundary conditions [1], and is inherent once the
structure and the boundary conditions are determined. Thus the load identification
process can be understood as seeking the relationship between loads and responses
when the structure and the boundary conditions are determined.
The support vector machine (SVM) has emerged in recent years as a tool for finding complex relationships between variables. It achieves the best compromise between model complexity and learning ability from limited samples. Through the kernel method, the inner-product operation in a high-dimensional space is replaced by a kernel function evaluation in the low-dimensional input space, which effectively avoids the "dimension disaster" problem. The least squares support vector machine (LSSVM) is a variant of the SVM proposed by Suykens [3]. The LSSVM uses a least-squares loss with equality constraints, which reduces the convex optimization problem of the traditional SVM to a set of linear equations. It has many advantages [3]: simple operation, fast convergence, high precision, etc., and is widely used in nonlinear function estimation.
Taking advantage of the SVM and LSSVM properties described above, this paper first uses LSSVM to seek the relationship between the loads and responses of an aircraft in the frequency domain and to establish a load identification model. Secondly, because the performance of LSSVM is determined by its hyper-parameters (the regularization parameter and the kernel parameters), particle swarm optimization (PSO) [5] is used to find the optimal parameters based on the best generalization ability of the load identification model. Finally, load identification on a simulation model and in random vibration experiments is carried out with LSSVM-PSO.
For linear deterministic (or stochastic) system, the relationship between inputs
F (ω ) (or S FF (ω ) ) and outputs x(ω ) (or S xx (ω ) ) can be described as the following
function:
$$x(\omega) = H(\omega)F(\omega), \qquad S_{xx}(\omega) = H(\omega)S_{FF}(\omega)H^{H}(\omega), \qquad (1)$$
$$H(\omega) = \left(K - \omega^2 M + i\omega C\right)^{-1}, \qquad (2)$$
where M , K and C are mass matrix, stiffness matrix and damping matrix.
When the responses x(ω ) (or S xx (ω ) ) are known, according to Eq.(1), the loads can
be obtained as the following form:
$$F(\omega) = H(\omega)^{-1}x(\omega), \qquad S_{FF}(\omega) = \left[H^{H}(\omega)H(\omega)\right]^{-1}H^{H}(\omega)S_{xx}(\omega)H(\omega)\left[H^{H}(\omega)H(\omega)\right]^{-1}. \qquad (3)$$
According to Eq. (3), load identification requires the inverse of the transfer function; if the condition number of H(ω) is large, this leads to numerical instability. Because there is noise in the measurements and there are nonlinear factors in the structure, the relationship between loads and responses is often nonlinear and can be written as

$$F(\omega) = f\!\left(\omega, x(\omega)\right), \qquad S_{FF}(\omega) = f\!\left(\omega, S_{xx}(\omega)\right). \qquad (4)$$
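Before turning to the SVM formulation, Eqs. (1)–(3) can be made concrete with a small numerical sketch: it builds H(ω) = (K − ω²M + iωC)⁻¹ for a hypothetical 2-DOF system and recovers the load from a noiseless response by direct inversion, the approach whose ill-conditioning near resonance is discussed above. The matrices and frequency below are illustrative assumptions, not data from the paper.

import numpy as np

# Hypothetical 2-DOF system (illustration only)
M = np.diag([2.0, 1.0])
K = np.array([[300.0, -100.0],
              [-100.0, 100.0]])
C = 0.01 * M + 0.002 * K                  # Rayleigh-type damping

def transfer(omega):
    """Frequency response matrix H(w) = (K - w^2 M + i w C)^(-1), Eq. (2)."""
    return np.linalg.inv(K - omega**2 * M + 1j * omega * C)

omega = 4.0                                # rad/s, away from resonance
F_true = np.array([1.0, 0.0])              # applied load
x = transfer(omega) @ F_true               # simulated response, Eq. (1)

# Direct identification, Eq. (3); unstable when cond(H) is large near resonance
F_ident = np.linalg.solve(transfer(omega), x)
print(np.allclose(F_ident, F_true), np.linalg.cond(transfer(omega)))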
The SVM transforms a nonlinear problem in the low-dimensional input space into a linear problem in a high-dimensional space through a nonlinear mapping function. According to SVM theory [6], if x(ω) (or S_xx(ω)) are the inputs of the SVM and F(ω) (or S_FF(ω)) are its outputs, then the nonlinear function f(·) can be viewed as an SVM regression model. Applying SVM to load identification thus amounts to seeking the relationship between the responses x(ω) (or S_xx(ω)) and the loads F(ω) (or S_FF(ω)) and establishing the regression model f(·).
LSSVM is a variant of the SVM whose advantages have been described above. LSSVM regression can be formulated as follows. Given a training set {x_i, y_i}_{i=1}^l, where x_i ∈ R^n is the ith input pattern and y_i ∈ R is the ith output pattern, the optimization problem is written as

$$\min_{w,\,\xi}\; J(w, \xi) = \frac{1}{2}\|w\|^{2} + \frac{C}{2}\sum_{i=1}^{l}\xi_i^{2} \qquad (5)$$
$$\text{s.t.}\quad y_i = w^{T}\phi(x_i) + b + \xi_i, \quad i = 1, 2, \ldots, l,$$

where φ(·) is a nonlinear function which maps the input space into a higher dimensional space, C is the regularization parameter, and ξ_i is the error variable.
To solve the optimization problem above, the Lagrangian is constructed as

$$L(w, b, \xi, \alpha) = J(w, \xi) - \sum_{i=1}^{l}\alpha_i\left\{w^{T}\phi(x_i) + b + \xi_i - y_i\right\}. \qquad (6)$$

Setting the partial derivatives of L to zero and eliminating w and ξ yields the linear system

$$\begin{bmatrix} K + C^{-1}I & \mathbf{1} \\ \mathbf{1}^{T} & 0 \end{bmatrix}\begin{bmatrix}\alpha \\ b\end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}, \qquad (7)$$

where $\mathbf{1} = (1, 1, \ldots, 1)^{T}$, $y = (y_1, y_2, \ldots, y_l)^{T}$, $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_l)^{T}$, and $K = \{K_{ij} = K(x_i, x_j)\}_{i,j=1}^{l}$ is the kernel matrix, with the kernel function defined by

$$K(x_i, x_j) = \phi(x_i)^{T}\phi(x_j). \qquad (8)$$
α and b can be obtained by solving the linear system (7). The decision function of the LSSVM is then

$$y(x) = f(\cdot) = \sum_{i=1}^{l}\alpha_i K(x, x_i) + b. \qquad (9)$$
According to Eqs. (8) and (9), the final result does not depend on the specific mapping function, only on the kernel function (the inner product in feature space). Generally, the kernel function is chosen first, then α and b are calculated, and the regression prediction is obtained from Eq. (9).
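A minimal LSSVM regression in the form of Eqs. (5)–(9) follows directly from the dual linear system (7) (the same block matrix Q that reappears with Eq. (10) below). The Python sketch here assumes a Gauss kernel and is an illustration, not the authors' implementation.

import numpy as np

def gauss_kernel(X1, X2, sigma):
    """K(x, x') = exp(-||x - x'||^2 / sigma) for all pairs of rows."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma)

def lssvm_fit(X, y, C, sigma):
    """Solve the LSSVM dual system (7) for the coefficients alpha and the bias b."""
    l = len(y)
    K = gauss_kernel(X, X, sigma)
    Q = np.block([[K + np.eye(l) / C, np.ones((l, 1))],
                  [np.ones((1, l)),   np.zeros((1, 1))]])
    sol = np.linalg.solve(Q, np.append(y, 0.0))
    return sol[:l], sol[l]                         # alpha, b

def lssvm_predict(Xnew, alpha, b, Xtrain, sigma):
    """Decision function of Eq. (9): y(x) = sum_i alpha_i K(x, x_i) + b."""
    return gauss_kernel(Xnew, Xtrain, sigma) @ alpha + b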
Commonly used kernel functions include the Gauss kernel

$$K(x, x') = \exp\!\left(-\|x - x'\|^{2}/\sigma\right).$$

The generalization ability of the trained model is judged by the leave-one-out criterion

$$\mathrm{PRESS}(\theta) = \sum_{i=1}^{l}\left(r_i^{(-i)}\right)^{2}, \qquad (10)$$

where

$$r_i^{(-i)} = y_i - \hat{y}_i^{(-i)} = \frac{\alpha_i}{\left[Q^{-1}\right]_{ii}}, \qquad Q = \begin{bmatrix} K + C^{-1}I & \mathbf{1} \\ \mathbf{1}^{T} & 0 \end{bmatrix}.$$
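With the same block matrix Q, the leave-one-out criterion of Eq. (10) needs no retraining: a single inversion of Q yields all the leave-one-out residuals. The short sketch below is illustrative and reuses the gauss_kernel helper above.

import numpy as np

def press(X, y, C, sigma):
    """Leave-one-out PRESS of Eq. (10): sum over i of (alpha_i / [Q^-1]_ii)^2."""
    l = len(y)
    K = gauss_kernel(X, X, sigma)
    Q = np.block([[K + np.eye(l) / C, np.ones((l, 1))],
                  [np.ones((1, l)),   np.zeros((1, 1))]])
    Qinv = np.linalg.inv(Q)
    alpha = (Qinv @ np.append(y, 0.0))[:l]
    r_loo = alpha / np.diag(Qinv)[:l]              # r_i^(-i) = alpha_i / [Q^-1]_ii
    return float((r_loo ** 2).sum())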
The aim of model selection is to choose the best hyper-parameters by optimizing the generalization ability of the LSSVM. In practice, users usually adopt grid search or gradient-based methods. The former requires re-training the model many times, and the latter commonly falls into local minima. As a popular class of methods, swarm intelligence algorithms have proved to be efficient and robust for many non-linear, non-differentiable problems. In this paper, particle swarm optimization (PSO) [5] is applied to select the optimal hyper-parameters based on the best generalization ability of the load identification model. The procedure of model selection using PSO is presented as follows:
[Block diagram of the load identification scheme: the responses x(ω) of the system under test are preprocessed and fed to the LS-SVM model, whose hyper-parameters are selected by PSO, yielding the identified load F̃(ω)]
Firstly, white noise is used to excite the structure, and the loads and responses are measured in a calibration process, because all the characteristics of the system can be excited by white noise. Secondly, the LSSVM is trained with PSO: with the responses as inputs of the LSSVM and the loads as outputs, PSO is used to find the optimal LSSVM parameters based on the best generalization ability of the load–response relationship, and the load identification model is thus established. Thirdly, the responses are measured during the operation of the aircraft. Finally, the loads are calculated by feeding the responses obtained in the third phase into the model established in the second phase.
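A bare-bones particle swarm search over the two hyper-parameters (the regularization parameter C and the kernel width σ), with the leave-one-out PRESS as the fitness, might look as follows. The swarm size, inertia and acceleration coefficients are illustrative choices under the standard PSO update rules of [5], not the authors' settings.

import numpy as np

def pso_select(fitness, lo, hi, n_particles=20, iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize fitness(theta) over the box [lo, hi] with standard PSO."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([fitness(p) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest

# Example: search log10(C) in [0, 4] and log10(sigma) in [-2, 2] using PRESS as fitness
# theta = pso_select(lambda t: press(X, y, 10**t[0], 10**t[1]),
#                    lo=np.array([0.0, -2.0]), hi=np.array([4.0, 2.0]))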
The identification then proceeds as described above, and the force can be identified, i.e. the amplitude spectrum of the load is calculated according to Eq. (10) with the parameters obtained from the training process. The amplitude spectrum of a trapezoidal-spectrum force is used to verify the identification results. The load identification results are shown in Figure 3 (the blue solid line shows the actual measured values and the red dotted line the identified values).
Fig. 2. The finite element model of the cylindrical structure
Fig. 3. The identification results
The mean square error

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left[P_{\mathrm{real}}(\omega_j) - P_{\mathrm{pred}}(\omega_j)\right]^{2}}$$

and the average relative error

$$E_r = \frac{1}{n}\sum_{i=1}^{n}\left|P_{i,\mathrm{real}} - P_{i,\mathrm{pred}}\right| / P_{i,\mathrm{real}}$$

are chosen to judge the identification error. The identification errors of the trapezoidal-spectrum force are RMSE = 0.0027 and Er = 0.0082%.
As shown in Figure 3, the blue solid line almost coincides with the red dotted line, i.e. the amplitude spectrum of the identified load corresponds with the measured load spectrum. Because there is no noise in the simulation data and the only nonlinear factor, numerical error, is very weak, the mean square error and the average relative error are both very small. The simulation tests demonstrate that the proposed approach based on LSSVM-PSO is an effective way to identify loads in the frequency domain, and the identification accuracy is high.
Fig. 5. The system of random vibration experiments: (a) The four force sensors; (b) Scheme of
the whole system
The driver spectrum shown in Figure 6 is added to drive the vibration shaker. 29
groups of data are obtained by changing the driver spectrum parameter values of
a,b,c,d,e,f,g, where a is fixed on 5Hz and g is fixed on 2000Hz. Each group’s data
contains the power spectral densities (PSD) of four acceleration responses whose
positions are located on the cylindrical structure, four forces and drive current.
The bolted connections introduce nonlinear effects during the vibration process. Moreover, the signals are influenced by the noise of the blower of the vibration shaker, which also introduces nonlinear effects. The relationship of the PSD between the drive current and one of the four acceleration responses at a specific frequency (100 Hz) across the 29 groups of data is shown in Figure 7. As shown in Figure 7, the relationship between response and drive current is nonlinear, which indicates that the relationship between responses and forces is nonlinear. Methods based on a linear system assumption will therefore lead to large identification errors in this case.
Fig. 6. The driver spectrum
Fig. 7. The relationship between current and response
The gauss kernel is also chosen to be the kernel function. The four PSDs of four
acceleration responses are used as inputs of LSSVM-PSO and the PSD of each force
is used as output of LSSVM-PSO. Four identification models of load identification
are established, where each model is used to identify one force.
Another drive spectrum, whose parameters are different from the 29 drive spectra described above, is used to drive the vibration shaker. The PSDs of the four acceleration responses at the same points and of the four forces are obtained. The acceleration responses are fed into each load identification model established above to identify the loads; the measured forces are the actual loads of the cylindrical structure and are compared with the identified forces to verify the validity and accuracy of the proposed approach.
The identification process is the same as the process described in Section 3. The identification results are shown in Figure 8. The mean square error and average relative error as described above, together with the energy relative error $E_{grms} = |grms_{\mathrm{real}} - grms_{\mathrm{pred}}| / grms_{\mathrm{real}}$, are chosen to judge the identification error. The identification errors are shown in Table 1.
Fig. 8. The identification results of four forces: (a):Force1; (b):Force2; (c):Force3; (d):Force4
As shown in Figure 8, the blue solid line almost coincides with the red dotted line; the locations and sizes of the peaks are also consistent and there are no false peaks in the identification curves, i.e. the PSD of the identified load corresponds with the measured PSD of the load. This demonstrates that the proposed approach based on LSSVM-PSO can identify the four forces effectively. The four forces are not exactly the same, and individual peaks do not appear in some of them, because the cylindrical structure deforms during the vibration process. As can be seen from the identification errors in Table 1, because the system is strongly nonlinear and the signals are heavily influenced by noise, the mean square error and the average relative error are both larger than for the simulation data, but they are still within the bounds of engineering permissible error. The identified results are also close to the measured values in terms of energy, which is important for the experiments. The experiments demonstrate that the proposed approach based on LSSVM-PSO is an effective way to identify multisource loads in the frequency domain, and the identification accuracy is high.
References
1. Cao, X., Sugiyama, Y., Mitsui, Y.: Application of artificial neural networks to load identifi-
cation. Computers and Structures 69, 63–78 (1998)
2. Xu, Z.-y., Liao, X.-h.: Dynamic State Loading Identification and Its Development. Journal
of Changzhou Institute of Technology 19, 13–18 (2006)
3. Suykens, J.A.K., Vandewalle, J., De Moor, B.: Optimal Control by Least Squares Support
Vector Machines. Neural Networks 14, 23–35 (2001)
4. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of the IEEE Inter-
national Conference on Neural Networks, Perth, Australia, pp. 1942–1948 (1995)
5. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1999)
6. Cawley, G.C.: Leave-one-out cross-validation based model selection criteria for weighted
LS-SVMs. In: Proceedings of the International Joint Conference on Neural Networks,
Vancouver, BC, Canada, pp. 2970–2977 (2006)
Local Model Networks for the
Optimization of a Tablet Production Process
1 Introduction
Due to the high drug content and chosen manufacturing procedure (direct com-
pression) changes in raw material characteristics can result in variable crushing
strength of tablets, large tablet mass variation and most problematic, intense
capping problems. If problems in direct tabletting are too severe, dry granula-
tion can be used as an alternative to improve the particle compression character-
istics. Dry granulation means aggregation of smaller particles using force. The
compacts are milled and sieved afterwards. Next step is tabletting. Three types
of mixtures for tabletting were prepared:
A mixture for direct tabletting (Type: “Direkt” - 1 sample),
B granulate prepared by slugging (precompression on a rotary-tablet press),
using different parameters of tabletting speed and compression force (Type:
“Briket” - 4 samples),
C granulate prepared by precompressing on a roller compactor using different
parameters of compacting speed and force (Type: “Kompakt” - 4 samples).
The composition of all powder mixtures was the same; milling and sieving of compacts was performed on the same equipment. After dry granulation, the particle size distribution is normally shifted toward bigger particles, and capping occurrence is often decreased due to a change in particle characteristics (flowability and compression characteristics). Nine samples were tableted on a rotary tablet press using different combinations of the following parameters: compression speed v, precompression force pp, main compression force P. Each of these parameters was set at three levels. Tablets were evaluated according to capping occurrence, crushing strength and tablet mass variability [8].
3 Modeling
To model the system characteristics a local model network approach [9], [10],
[11] was used with two to four inputs, depending on the output. For each output
(CC, standard deviation of crushing strength, standard deviation of mass) a
separate neuro-fuzzy model was constructed.
The output ŷ of a local model network with p inputs u = [u₁ u₂ ⋯ u_p]ᵀ can be calculated as the interpolation of M local model outputs ŷᵢ, i = 1, …, M [12],

$$\hat{y} = \sum_{i=1}^{M}\hat{y}_i(u)\,\Phi_i(u), \qquad (1)$$
where the Φi (·) are called interpolation or validity or weighting functions. These
validity functions describe the regions where the local models are valid; they
describe the contribution of each local model to the output. From the fuzzy logic
point of view (1) realizes a set of M fuzzy rules where the Φi (·) represent the
rule premises and the ŷi are the associated rule consequents. Because a smooth
transition (no switching) between the local models is desired here, the validity
functions are smooth functions between 0 and 1. For a reasonable interpretation
of local model networks it is furthermore necessary that the validity functions
form a partition of unity:
$$\sum_{i=1}^{M}\Phi_i(u) = 1\,. \qquad (2)$$
Thus, everywhere in the input space the contributions of all local models sum up
to 100%. In principle, the local models can be chosen of arbitrary type. If their pa-
rameters shall be estimated from data, however, it is extremely beneficial to choose
a linearly parameterized model class. The most common choice are polynomials.
Polynomials of degree 0 (constants) yield a neuro-fuzzy system with singletons
or a normalized radial basis function network. Polynomials of degree 1 (linear)
yield local linear model structures, which is by far the most popular choice. As
the degree of the polynomials increases, the number of local models required for
a certain accuracy decreases. Thus, by increasing the local models’ complexity,
at some point a polynomial of high degree with just one local model (M = 1) is
obtained, which is in fact equivalent with a global polynomial model (Φ1 (·) = 1).
Besides the possibilities of transferring parts of mature linear theory to the
nonlinear world, local linear models seem to represent a good trade-off between
the required number of local models and the complexity of the local models
themselves. This paper deals only with local models of linear type:
ŷi (u) = wi,0 + wi,1 u1 + wi,2 u2 + . . . + wi,p up . (3)
However, an extension to higher degree polynomials or other linearly parame-
terized model classes is straightforward.
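As a concrete reading of Eqs. (1)–(3), the sketch below evaluates a local model network with normalized Gaussian validity functions (one common choice; LOLIMOT derives axis-orthogonal Gaussians from its partitioning) and local linear models. The centers, widths and weights are placeholders, not the identified tablet models.

import numpy as np

def lmn_predict(u, centers, widths, W):
    """Local model network output for one input vector u of dimension p.
    centers, widths : (M, p) Gaussian centers and axis-wise widths
    W               : (M, p+1) local linear parameters [w_i0, w_i1, ..., w_ip]"""
    mu = np.exp(-0.5 * (((u - centers) / widths) ** 2).sum(axis=1))
    phi = mu / mu.sum()                    # partition of unity, Eq. (2)
    y_local = W[:, 0] + W[:, 1:] @ u       # local linear models, Eq. (3)
    return float(phi @ y_local)            # interpolated output, Eq. (1)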
Fig. 1. Correlation between standard deviation σ of the particle size distribution and the mean values µ
4 Results
First, the local model networks were designed and validated, next, the models
were used to find the optimal process parameters. Furthermore we investigated
the robustness of the optimization in terms of varying training data.
4.1 Modeling
The training of the local model networks was performed with the LOLIMOT al-
gorithm proposed in [9]. One advantage of this training algorithm among others
is that the training has only to be carried out once to get reliable and repro-
ducible results. For final training the whole data set was used, since, considering
the rather small dimensionality of the problem, the nonlinear input-output rela-
tionship could be visually monitored in order to check for potential overfitting.
The identified relations can be seen in Figs. 4 to 7. For modeling the standard deviation of the mass σm, 3 local models (LMs) were used. The standard deviation of the crushing strength σcs was modeled with 4 LMs and the capping coefficient CC with 3 LMs.
4.2 Optimization
Once the models for the tabletting process parameters have been built, the question arises of how they should be utilized for optimization. Figure 3 illustrates the optimization scheme used. Basically, the manipulated variables that are model inputs are varied by an optimization technique until their optimal values are found.
This method is called the weighted-sum method, where the coefficients wi of the
linear combination are called weights [13], [14]. All relevant model outputs are
directly incorporated and the loss function is chosen as a weighted sum of the
tablet process parameters. Due to expert knowledge the parameters were set to
w1 = 0.25, w2 = 0.25 and w3 = 0.5. For optimization the BFGS quasi-Newton
algorithm [15] was used. The results are summarized in Table 1. Figure 6 shows
the shapes of the loss functions for the three tablet types “Direkt”, “Kompakt”
and “Briket”. Figure 8 illustrates the variation of the optimal process param-
eters, if single data points are omitted from the training dataset. Because of
the sparse dataset, omitting critical data points can lead to different shapes of
the loss function. Therefore outliers can be observed in the boxplots. However,
the comparison of the median values with the results generated with the whole
dataset (Table 1) is satisfactory.
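The weighted-sum optimization described above can be reproduced in a few lines. The sketch below is illustrative (it uses SciPy rather than the authors' toolchain) and minimizes J(u) = w1·σm(u) + w2·σcs(u) + w3·CC(u) with a quasi-Newton method; three smooth surrogate functions stand in for the trained local model networks, so the numbers it returns are not the paper's results.

import numpy as np
from scipy.optimize import minimize

# Hypothetical surrogates for the three model outputs over normalized inputs u = [P, pp, v]
def sigma_m(u):  return 0.2 + (u[0] - 0.6) ** 2 + 0.1 * u[2]
def sigma_cs(u): return 0.3 + (u[1] - 0.7) ** 2
def capping(u):  return 0.5 * np.exp(-3.0 * u[0]) + 0.2 * (u[2] - 0.3) ** 2

w1, w2, w3 = 0.25, 0.25, 0.5               # expert weights from the paper

def loss(u):
    """Weighted sum of the tablet quality criteria."""
    return w1 * sigma_m(u) + w2 * sigma_cs(u) + w3 * capping(u)

res = minimize(loss, x0=np.array([0.5, 0.5, 0.5]), method='BFGS')
print(res.x, res.fun)                      # optimal (P, pp, v) under the surrogates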
Fig. 4. Model plot that shows the relations between the inputs mean value of the
tablet particle sizes µ, main compression force P, pre-compression force pp, the tablet-
ting speed v and the output standard deviation of tablet mass σm . The red dots are
measured data points.
Fig. 5. Model plot that shows the relations between the inputs mean value of the tablet
particle sizes µ, main compression force P , pre-compression force pp and the output
standard deviation of the crushing strength σcs
Fig. 6. Loss functions with respect to the raw material parameter µ = 0, 0.86, 1, main
compression force P , pre-compression force pp and tabletting speed v = 1
Fig. 7. Model plot that shows the relations between the inputs mean value of the tablet
particle sizes µ, main compression force P and the output capping coefficient CC
Fig. 8. Boxplots that illustrate the variation of the optimal process parameter values
in case of omitting single data points
Table 1. Optimal process parameters

µ      P      pp
0      0.21   0.69
0.5    0.04   0.73
0.86   0.01   0.70
1      0.03   0.74
5 Conclusion
This paper presents a local model network approach for the model-based opti-
mization of the parameters of a tablet press machine. Several experiments on
the tablet press machine with different tablet and process parameters made it
possible to create a well generalizing model. Due to the high flexibility and ro-
bustness with respect to the sparse and noisy data, local model networks are well
suited for this kind of modeling problem. Model plots show the applicability of the approach.
References
1. Frake, P., Greenhalgh, D., Gierson, S., Hempenstall, J., Rudd, D.: Process control
and end-point determination of a fluid bed granulation by application of near infra-
red spectroscopy. International Journal of Pharmaceutics 1(151), 75–80 (1997)
2. Informa: PAT - Quality by design and process improvement, Amsterdam (2007)
3. Juran, J.: Juran on Quality by Design. The Free Press, New York (1992)
4. Sebhatu, T., Ahlneck, C., Alderborn, G.: The effect of moisture content on the
compression and bond-formation properties of amorphous lactose particles. Inter-
national Journal of Pharmaceutics 1(146), 101–114 (1997)
5. Rios, M.: Developments in powder flow testing. Pharmaceutical Technology 2(30),
38–49 (2006)
6. Sorensen, A., Sonnergaard, J., Hocgaard, L.: Bulk characterization of pharmaceu-
tical powders by low-pressure compression ii: Effect of method settings and particle
size. Pharmaceutical Development and Technology 2(11), 235–241 (2006)
7. Zhang, Y., Law, Y., Chakrabarti, S.: Physical properties and compact analysis of
commonly used direct compression binders. AAPS Pharm. Sci. Tech. 4(4) (2003)
8. Belič, A., Zupančič, D., Škrjanc, I., Vrečer, F., Karba, R.: Artificial neural networks
for optimisation of tablet production
9. Nelles, O.: Axes-oblique partitioning strategies for local model networks. In: In-
ternational Symposium on Intelligent Control (ISIC), Munich, Germany (October
2006)
10. Hartmann, B., Nelles, O.: On the smoothness in local model networks. In: American
Control Conference (ACC), St. Louis, USA (June 2009)
11. Hartmann, B., Nelles, O.: Automatic adjustment of the transition between local
models in a hierarchical structure identification algorithm. In: IFAC Symposium
on System Identification (SYSID), Budapest, Hungary (August 2009)
12. Nelles, O.: Nonlinear System Identification. Springer, Berlin (2001)
13. Chong, E., Zak, S.: An Introduction to Optimization. John Wiley & Sons, New
Jersey (2008)
14. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press,
Cambridge (2004)
15. Scales, L.: Introduction to Non-Linear Optimization. Computer and Science Series.
Macmillan, London (1985)
Retracted: Implementation of On/Off Controller for
Automation of Greenhouse Using LabVIEW
Abstract. The present study is concerned with the control and monitoring of greenhouse air temperature, humidity, light intensity, CO2 concentration and irrigation. A computer-based control and monitoring system was designed and tested; to achieve this, a Supervisory Control & Data Acquisition (SCADA) system is used. The end product is expected to give the farmer or end user a kiosk-type approach: the entire greenhouse operation is governed and monitored through this kiosk. This approach is fairly novel considering the unified system design and the SCADA platform, NI LabVIEW 7.1.
1 Introduction
Environmental conditions have a significant effect on plant growth. All plants require certain conditions for their proper growth. Therefore, it is necessary to bring the environmental conditions under control in order to keep them as close to the ideal as possible. To create an optimal environment, the main climatic and environmental parameters need to be controlled. These parameters are nonlinear and strongly interrelated, rendering the management of a greenhouse rather intractable to analyze and control with classical control methods. Automated management of a greenhouse provides the precise control needed to maintain the most suitable conditions for plant growth. Greenhouse automation has evolved considerably in the past decade, with the onus shifting to factors like flexibility, time-to-market and upgradeability. SCADA systems hence form an interesting platform for developing greenhouse automation systems. They allow a 'supervisory level' of control over the greenhouse, facilitate data acquisition and create a developer-friendly environment for implementing algorithms.
The five most important parameters to consider when creating an ideal greenhouse
are temperature, relative humidity, soil moisture, light intensity and Co2 concentration.
In order to design a successful control system, it is very important to realize that the five
parameters mentioned above are nonlinear and extremely interdependent [2-1]. The
computer control system for the greenhouse includes the following components [5]:
This paper describes a solution to the second part of the system. The information is
obtained from multi-sensor stations and is transmitted through serial port to the
computer. It will then be processed and the orders will be sent to the actuation
network.
The original system includes four main subsystems, namely temperature and humidity, irrigation, light intensity and CO2 concentration. The system program contains the running algorithm, which governs all the subsystems. The real-time data are transmitted from the microcontroller to LabVIEW through the serial port of the computer, and commands are sent back from the system to the microcontroller. The designed system senses the following in real time:
Real time soil moisture
Real time Co2
Real time lighting
Real time temperature
Real time humidity
The seven actuators are as follows.
Water valve
Ventilation Fans
Cooler
Circulate Fans
Sprayer
Heater
The system can be set up for any type of plant and vegetation by simply setting cer-
tain parameters on the front panel of Lab View (fig. 1.). The entire system is simu-
lated in Lab View by National Instruments. The main block-diagram of an automated
greenhouse is shown in fig. 2.
Fig. 1. Front panel of the main system
dry matter distribution [6]. Generally, the protection given by the greenhouse is sufficient to allow the development of crops during winter without the use of heating systems. However, a greenhouse with automated heating facilities presents advantages like increased production speed, the possibility of producing products out of season and better control of diseases [5]. Uniform crop growth is very important for most production systems, and the heating and ventilation systems have a major impact on producing uniform crops [7].
effect [5].
Using the supplemental lighting system is a common way for greenhouse lighting.
However it can be done either with photoperiod lighting system or through walkway
and security lighting. Greenhouse lighting systems allow us to extend the growing
season by providing plants with an indoor equivalent to sunlight [3]. Front panel and
block diagram of the light system is shown in figure 5.
The fifth major factor, namely the Co2 concentration, plays a very important role in the
photosynthesis process. The average Co2 concentration in the atmosphere is approxi-
mately 313 ppm, which is enough for effective photosynthesis. A problem arises when a
greenhouse is kept closed in autumn or/and winter in order to retain the heat, when not
enough air is circulated to have the appropriate Co2 concentration [1, 4]. In order to
improve the growing of herbs inside the greenhouse, it is necessary to increase Co2
concentration in company with favorable conditions of temperature and light.
Fig. 5. Front panel and block diagram of the Light system
Front panel and block diagram of the Co2 system is shown in figure 6.
The irrigation system consists of a soil moisture control module. For the desired plant, the optimum value of soil moisture is set through the front panel. The soil moisture is measured by sensors in real time and displayed on the front panel. Depending on this condition, the flow control valve is operated. The RT values show the real-time values as obtained from the sensors.
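The on/off logic that governs each actuator can be sketched independently of the LabVIEW implementation. The Python fragment below is illustrative only (the set-point and deadband values are hypothetical) and shows how the soil-moisture valve would be switched with a small hysteresis band so that it does not chatter around the set-point.

def onoff_with_hysteresis(measured, setpoint, deadband, currently_on):
    """Simple on/off controller: switch ON below setpoint - deadband,
    OFF above setpoint + deadband, otherwise keep the previous state."""
    if measured < setpoint - deadband:
        return True        # e.g. open the irrigation valve
    if measured > setpoint + deadband:
        return False       # close the valve
    return currently_on    # hold the last state (hysteresis)

# Hypothetical usage: 35 % soil-moisture set-point with a 2 % deadband
valve_open = False
for moisture in [36.0, 34.0, 32.5, 33.5, 36.5, 38.0]:
    valve_open = onoff_with_hysteresis(moisture, 35.0, 2.0, valve_open)
    print(moisture, valve_open)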
i.e. only data from the sensors were recorded and monitored on the screen with no
operation of fans, sprayer or other installed environmental systems in the greenhouse.
For the second day of the experiment, inside air temperature was controlled by the
system.
Fig. 7 shows the results for the uncontrolled experiment. The data were recorded with no operation of fans and sprayer, and they show a comparison between the inside and the outside conditions. The results indicate that the outside temperature is always about 15 °C less than the temperature inside the greenhouse, because solar radiation enters the greenhouse through the transparent polycarbonate. This is a further confirmation of the greenhouse effect. It was also found that the rate of change of temperature in the upper part, i.e. near the cover, is higher than that at the height of the plants. This vertical temperature rise is due to the incident solar radiation. During a
temperature decrease, it was found that the heat exchange near the cover occurs more rapidly. The fluctuations of temperature, and to a lesser extent of RH, in the model greenhouse during the observation time were affected by natural conditions such as surface evaporation within the greenhouse, solar radiation and ambient temperature. For the controlled mode of operation, two separate experiments were carried out: (i) temperature control only (Fig. 8a), and (ii) both temperature and humidity control (Fig. 8b). When the controller was put into operation, it was found that Tout < Tmid < Tup. This could be partly due to the use of re-circulating fans.
Fig. 8. Measured temperatures and relative humidity under the action of controller: (a) Tem-
perature control (up), and (b) Temperature and RH control (down)
4 Conclusions
This paper described the design, simulation and implementation of a fully automated greenhouse using LabVIEW. For better simulation results, all five major interrelated environment variables in a greenhouse (temperature, humidity, soil moisture, light intensity and CO2 concentration) were considered together. The resulting system was designed in such a way that it can be easily controlled and monitored by an amateur user who might have little or no technical background on the subject; the main advantage of this simulation is the greater convenience it brings to the entire system. By means of this simulation, the optimal level of environment and growth factors inside the greenhouse can be achieved. Once the proposed system is designed, standardized, and implemented, it provides convenient control over greenhouse management in order to increase efficiency.
References
1. Putter, E., Gouws, J.: An Automatic Controller for a Greenhouse Using a Supervisory Expert System. In: Electrotechnical Conference, Melecon 1996, 8th Mediterranean, pp. 1160–1163 (1996)
2. Fourati, F., Chtourou, M.: A Greenhouse Control with Feed-Forward and Recurrent Neural Networks. Simulation Modelling Practice and Theory 15, 1016–1028 (2007)
3. Bowman, G.E., Weaving, G.S.: A Light Modulated Greenhouse Control System. Journal of
Agriculture Engineering Research 15(3), 255–258 (1970)
4. Klaring, H.P., Hauschild, C., Heibner, A., Bar-Yosef, B.: Model-Based Control of CO2 Concentration in Greenhouse at Ambient Levels Increases Cucumber Yield. Agricultural and Forest Meteorology 143, 208–216 (2007)
5. Metrolho, J.C., Serodio, C.M., Couto, C.A.: CAN Based Actuation System for Greenhouse Control. In: Industrial Electronics, pp. 945–950 (1999)
6. Korner, O., Challa, H.: Design for an Improved Temperature Integration Concept in Greenhouse Cultivation. Computers and Electronics in Agriculture 39, 39–59 (2003)
7. Roberts, W.J.: Creating a Master Plan for Greenhouse Operation, Rutgers, The State Uni-
versity of New Jersey (2005)
Structure Design of the 3-D Braided Composite Based on
a Hybrid Optimization Algorithm
Ke Zhang
1 Introduction
At present, textile composites are being widely used in advanced structures in avia-
tion, aerospace, automobile and marine industries [1]. Textile composite technology
by preforming is an application of textile processes to produce structured fabrics,
known as preforms. The preform is then impregnated with a selected matrix material
and consolidated into the permanent shape. Three-dimensional (3-D) braiding method
which was invented in 1980s offers a new opportunity in the development of ad-
vanced composite technology. The integrated fibre network provides stiffness and
strength in the thickness direction, thus reducing the potential of interlaminated fail-
ure, which often occurs in conventional laminated composites. Other distinct benefits
of 3-D textile composites include the potential of automated processing from preform
fabrication to matrix infiltration and their near-net-shape forming capability, resulting
in reduced machining, fastening, and scrap rate [2]. The direct formation of the struc-
tural shapes eliminates the need for cutting fibres to form joints, splices, or overlaps
with the associated local strength loss, and simplifies the laborious hand lay-up com-
posite manufacturing process.
The 3-D braided fibre construction is produced by a braiding technique which inter-
laces and orients the yarns by an orthogonal shedding motion, followed by a compact-
ing motion in the braided direction. The basic fibre structure of 4-step 3-D braided
composite is four-direction texture. An idealized unit cell structure is constructed
based upon the fibre bundles oriented in four body diagonal directions in a rectangular
parallelepiped, which is shown schematically in Fig. 1 [8][9]. The yarn orientation angles α and β are the so-called braiding angles. From the geometry of a unit cell associated with a particular fibre architecture, different systems of yarn can be identified whose fibre orientations are defined by their respective interior angle α and cross-sectional angle β, as shown in Fig. 1. According to the requirements of the braiding technology, α and β commonly range between 20° and 60°. In Fig. 1, the geometric parameters of the cross section of the composite are b1, b2, h1, and h2, respectively.
A superposition model has been chosen wherein the stiffnesses of each system of yarns are superimposed proportionally, according to the contributing volume of the four yarn systems, to determine the stiffness of the composite. Let the fibre volume fraction of the composite be V_f; the volume fraction of each system of braiding yarn is then V_f/4 [10]. Assuming each system of yarn can be represented by a comparable unidirectional lamina with an elastic matrix C and a coordinate transformation matrix T, the elastic stiffness matrix Q of this yarn system in the longitudinal direction of the composite can be expressed as Q = TCT^T.
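The volume-weighted superposition of the four yarn systems, Q_i = T_i C T_i^T combined with the V_f/4 weighting, can be illustrated with a small numerical sketch. Here the transformation is carried out in full tensor form, and the rotation matrices (the direction cosines of the four body-diagonal yarn directions, which would be built from the braiding angles α and β) are assumed to be supplied, so this is a schematic of the superposition rather than the paper's actual computation.

import numpy as np

VOIGT = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]

def rotate_stiffness(C6, R):
    """Congruence transformation Q = T C T^T, evaluated in full tensor form:
    C'_ijkl = R_ia R_jb R_kc R_ld C_abcd (the 6x6 stiffness matrix in Voigt
    notation carries no factor-of-two corrections)."""
    C = np.zeros((3, 3, 3, 3))
    for p, (i, j) in enumerate(VOIGT):
        for q, (k, l) in enumerate(VOIGT):
            C[i, j, k, l] = C[j, i, k, l] = C[i, j, l, k] = C[j, i, l, k] = C6[p, q]
    Cr = np.einsum('ia,jb,kc,ld,abcd->ijkl', R, R, R, R, C)
    return np.array([[Cr[i, j, k, l] for (k, l) in VOIGT] for (i, j) in VOIGT])

def composite_stiffness(C_lamina, rotations, Vf):
    """Superimpose the four yarn systems, each weighted by Vf/4."""
    return sum(Vf / 4.0 * rotate_stiffness(C_lamina, R) for R in rotations)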
Here, we approximately ignore the effect of the stress components σ_y, σ_z and τ_yz, and only consider the effect of σ_x, τ_zx and τ_xy. The constitutive relations of the braiding yarns of the 3-D braided composite are obtained as follows [11]:

$$[N]_i = [K]_i[\varepsilon]_i \quad (i = 1, 2, 3, 4). \qquad (1)$$
where $\lambda_i = V_{f_i}/V_f$ and

$$[K]_i = \begin{bmatrix} K_{11} & 0 & 0 & K_{14} \\ 0 & K_{22} & K_{23} & 0 \\ 0 & K_{32} & K_{33} & 0 \\ K_{41} & 0 & 0 & K_{44} \end{bmatrix},$$

in which K_{11}, K_{22}, K_{33} and K_{44} are the tensile, flexural, torsional and shear stiffness coefficients, respectively, and the others are coupling stiffness coefficients.
$$\Psi = \Delta U / U_m\,, \qquad (3)$$

Considering the influence of transverse shear deformation, reference [13] modified equation (6) by adding two dissipation-energy terms due to the transverse shearing strains, namely

$$\Delta U = \Delta U_x + \Delta U_y + \Delta U_{xy} + \Delta U_{yz} + \Delta U_{xz}. \qquad (5)$$

As before, the 4-step 3-D braided composite can be regarded as the superposition of four unidirectional fibre systems, and the yarn systems do not lie in the same plane. Thus, the unit energy dissipation of each system should be broken down into six components:

$$\Delta U = \Delta U_{x'} + \Delta U_{y'} + \Delta U_{z'} + \Delta U_{x'y'} + \Delta U_{y'z'} + \Delta U_{x'z'}, \qquad (6)$$
Ant colony optimization is a discrete algorithm. Therefore, the optimization problem must be converted into a discrete one before it can be solved by an ant colony algorithm. First, discretize each variable over its feasible region, i.e., divide each variable to be optimized over its value range. The number of divisions depends on the range of the variable, the complexity of the problem, etc. Suppose the variable to be optimized is divided into q_i nodes; denote the node values as x_ij, where i is the variable number (i = 1, 2, …, n) and j is the node number of x_i (j = 1, 2, …, q_i).
After the division, a network composed of the variables and their divisions is constructed as depicted in Fig. 2, where "●" denotes a node of a variable's division. The number of divisions of each variable can be unequal, and the division can be linear or nonlinear.
Suppose m ants are used to optimize n variables. For ant k at time t = 0, place the ant at the origin O. The ant then starts searching according to the maximum transition probability. The search starts from x_1 and proceeds in sequence. Each ant can only select one node in each variable's division for its transition. The transition of ant k from x_i to x_{i+1} is called a step and is denoted t. Ant k searching from the first variable x_1 to the last variable x_n is called an iteration and is denoted NC. An iteration includes n steps.
Fig. 2. Variable division and ant search path
$$p_{hl}^{k}(t) = \frac{[\tau_{hl}(t)]^{\alpha}\,[\eta_{hl}]^{\beta}}{\sum_{j=1}^{q_{i+1}}[\tau_{hj}(t)]^{\alpha}\,[\eta_{hj}]^{\beta}}, \qquad (11)$$

$$\Delta\tau_{ij} = \sum_{k=1}^{m}\Delta\tau_{ij}^{k}, \qquad (14)$$
where $\Delta\tau_{ij}^{k}(t)$ is the pheromone deposited on node $x_{ij}$ by ant k in this iteration. It is calculated according to equation (15):

$$\Delta\tau_{ij}^{k} = \begin{cases} Q f_k & \text{if the } k\text{th ant passes through this node in this iteration,} \\ 0 & \text{otherwise,} \end{cases} \qquad (15)$$

where $f_k$ is the value of the objective function of ant k in this iteration, calculated with equation (10), and Q is a positive constant.
Being a global search method, the ACO algorithm is expected to give its best results if it is augmented with a local search method responsible for fine search. The ACO algorithm is therefore hybridized with a gradient-based search. The "candidate" solution found by the ACO algorithm at each iteration step is used as the initial solution for a gradient-based search using the "fmincon" function in MATLAB [14], which is based on the interior-reflective Newton method. The "fmincon" function can handle the constraints of the problem here.
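The discrete ant search over the variable-division network, with the transition rule of Eq. (11) and the pheromone update of Eqs. (14)–(15), can be sketched as follows. This Python fragment is illustrative (uniform heuristic η, a positive toy objective, no constraint handling), not the author's implementation, and the gradient-based polish with fmincon is only indicated by a comment.

import numpy as np

def aco_optimize(f, grids, n_ants=20, n_iter=100, alpha=1.0, rho=0.5, Q=1.0, seed=0):
    """Maximize a positive objective f over a grid of candidate nodes per variable."""
    rng = np.random.default_rng(seed)
    tau = [np.ones(len(g)) for g in grids]          # pheromone per division node
    best_x, best_f = None, -np.inf
    for _ in range(n_iter):
        paths, values = [], []
        for _ant in range(n_ants):
            # each ant picks one node per variable, Eq. (11) with eta = 1
            idx = [rng.choice(len(g), p=t**alpha / (t**alpha).sum())
                   for g, t in zip(grids, tau)]
            x = np.array([g[j] for g, j in zip(grids, idx)])
            fx = f(x)
            paths.append(idx)
            values.append(fx)
            if fx > best_f:
                best_x, best_f = x, fx
        # evaporation plus deposit, Eqs. (14)-(15): d_tau = Q * f_k on visited nodes
        for i in range(len(grids)):
            tau[i] *= (1.0 - rho)
            for idx, fx in zip(paths, values):
                tau[i][idx[i]] += Q * fx
        # a gradient-based local search (fmincon in MATLAB, or scipy.optimize.minimize)
        # could refine best_x here, as in the hybrid algorithm described above
    return best_x, best_f

# Toy usage with a positive objective peaked near (36, 52)
grids = [np.linspace(20, 60, 41), np.linspace(20, 60, 41)]
x_best, f_best = aco_optimize(lambda x: 1.0 / (1.0 + (x[0] - 36)**2 + (x[1] - 52)**2), grids)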
5 Numerical Examples
In order to test the validity of the proposed optimization design procedure, numerical examples have been performed. Suppose the braiding yarn system (four-direction) of the 3-D braided composite is homogeneous carbon fibre, where the properties of the component materials are Ef1 = 258.2 GPa, Ef2 = 18.2 GPa, Gf21 = 36.7 GPa, µf21 = 0.25, µm = 0.35, ΨL = 0.0045, ΨT = 0.0422, ΨLT = 0.0705, ΨTT = 0.0421, Em = 3.4 GPa, Gm = 1.26 GPa [11]. The performance of the unidirectional composite for each system can be calculated by
                     α    β    h2/b2  b1/b2  h1/h2  Ψ       K22 (GPa·m²)
Preliminary design   30°  30°  1.0    0.5    0.5    0.0027  2.51×10⁻⁵
                     30°  30°  1.0    0.5    0.5    0.0027  2.51×10⁻⁵
                     30°  30°  1.0    0.5    0.5    0.0027  2.51×10⁻⁵
Optimal design       36°  52°  1.04   0.35   0.34   0.0106  1.59×10⁻⁵
                     23°  28°  1.12   0.29   0.28   0.0022  1.87×10⁻⁴
                     27°  45°  1.29   0.22   0.22   0.0079  3.92×10⁻⁵
6 Conclusions
Based on the stiffness and damping properties of the hollow-rectangular-section 3-D braided composite, a mathematical model for the structural design of the 3-D braided composite was proposed. The results of the numerical examples show that better damping and stiffness characteristics can be obtained by optimal design while satisfying the given constraints. The method proposed here is useful for the design and engineering application of this kind of member.
References
1. Huang, G.: Modern Textile Composites. Chinese Textile Press, Beijing (2000)
2. Zheng, X.T., Ye, T.Q.: Microstructure Analysis of 4-Step Three-Dimensional Braided
Composite. Chinese Journal of Aeronautics 16(3), 142–149 (2003)
3. Wang, Y.Q., Wang, A.S.D.: On the Topological Yarn Structure of 3-D Rectangular and
Tubular Braided Preforms. Composites Science and Technology 51, 575–583 (1994)
4. Colorni, A., Dorigo, M., Maniezzo, V.: Distributed Optimization by Ant Colonies. In: The
First European Conference on Artificial Life, pp. 134–142 (1991)
5. Gambardella, L.M., Taillard, E.D., Dorigo, M.: Ant Colonies for the Quadratic Assign-
ment Problem. J. Oper. Res. Soc. 50, 167–176 (1999)
6. Gamardella, L.M., Bianchi, L., Dorigo, M.: An Ant Colony Optimization Approach to the
Probabilistic Traveling Salesman Problem. In: Guervós, J.J.M., Adamidis, P.A., Beyer,
H.-G., Fernández-Villacañas, J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS, vol. 2439,
p. 883. Springer, Heidelberg (2002)
7. Dorigo, M., Stuzle, T.: Ant Colony Optimization. MIT Press, Cambridge (2004)
8. Yang, J.M.: Fiber Inclination Model of Three-dimensional Textile Structural Composites.
Journal of Composite Materials 20, 472–476 (1986)
9. Li, W., Hammad, M., EI-Shiekh, A.: Structural Analysis of 3-D Braided Preforms for
Composites, Part 1:Two-Step Preforms. J. Text. Inst. 81(4), 515–537 (1990)
10. Cai, G.W., Liao, D.X.: The Stiffness Coefficients of Rectangular Cross Section Beams
with 2-step Three-dimensional Braided Composite Beam. Journal of Huazhong University
of Science and Technology 24(12), 26–28 (1996)
11. Cai, G.W., Zhou, X.H., Liao, D.X.: Analysis of the Stiffness and Damping of the 4-step
Three-dimensional Braided Composite Links. Journal of Mechanical Strength 21(1),
18–23 (1999)
12. Adams, R.D., Bacon, D.G.C.: Effect of Orientation and Laminated Geometry on the
Dynamic Properties of CFRP. J. Composite Materials 7(10), 402–406 (1973)
13. Lin, D.X., Ni, R.G., Adams, R.D.: Prediction and Measurement of the Vibrational Damp-
ing Parameters of Carbon and Glass Fibre-Reinforced Plastics Plates. Journal of Compos-
ite Material 18(3), 132–151 (1984)
14. Grace, A.: Optimization Toolbox for Use with MATLAB, User’s Guide. Math Works Inc.
(1994)
Robot Virtual Assembly Based on Collision Detection in
Java3D
Peihua Chen1,∗, Qixin Cao2, Charles Lo1, Zhen Zhang1, and Yang Yang1
1
Research Institute of Robotics, Shanghai Jiao Tong University
cph@sjtu.edu.cn, charleslo77@gmail.com, zzh2000@sjtu.edu.cn,
iyangyang186@yahoo.com.cn
2
The State key Laboratory of Mechanical System and Vibration,
Shanghai Jiao Tong University
qxcao@sjtu.edu.cn
1 Introduction
Virtual Assembly (VA) is a key component of virtual manufacturing [1]. Presently,
virtual assembly serves mainly as a visualization tool to examine the geometrical
representation of the assembly design and provide a 3D view of the assembly process
in the field of virtual manufacturing [2-5]. However, in service robots’ simulation, we
also need to assemble many types of robots in the virtual environment (VE). There are
mainly two methods of virtual assembly in these systems: assembling every single part via hard-coding, using link parameters between the parts, or assembling a virtual robot in an XML (eXtensible Markup Language) description; both methods are non-visual and complicated.
This paper focuses on the visual implementation of making virtual assembly of service
robots in Java3D. We choose Java JDK 1.6.0 and Java 3D 1.5.2 as the platform. First we
discuss the system architecture and methodology. This is followed by its implementation.
After that, an application of this system will be presented and conclusions obtained.
2 System Architecture
2.1 System Architecture
The architecture of this system is presented in Fig.1. It includes three parts: Model
Transformation, Visual Virtual Assembly (VVA), and XML Output. Firstly, after we
∗ Corresponding author.
create the CAD models of the robot, we export these models as IGES [7] format to be
imported into 3DS Max. In 3DS Max, we transform the Global Coordinate System
(GCS) of the IGES model into the coordinate system according to the right-hand rule,
and then export the models in VRML format. Then, we load the VRML files into Java
3D. After which, we set the scale factor of the model to 1:1. Then, we set the target
model as static, and move the object model towards the target model until the colli-
sion detection between them is activated. After which, we attach the object model to
the target model, which means appending the object node to the target node. If the
virtual assembly is not completed, then import another VRML file into the VE for
assembling; if it is finished, then output the assembly result into an XML file.
The use of scene graph architecture in Java 3D can help us organize the graphical ob-
jects easily in the virtual 3D world. Scene graphs offer better options for flexible and
efficient rendering [8]. Fig.2 illustrates a scene graph architecture of Java 3D for loading
the VRML models into the VE for scaling, translation and rotation transformations. In
Fig.2, BG stands for BranchGroup Node, and TG, Transform -Group Node.
As shown in Fig.2, the Simple Universe object provides a foundation for the entire
scene graph. A BranchGroup that is contained within another sub-graph may be re-
parented or detached during run-time if the appropriate capabilities are set. The Trans-
formGroup node specifies a single spatial transformation, via a Transform3D object,
that can position, orient, and scale all of its children [9], e.g., moveTG, rotTG,
scaleTG, etc., as shown in Fig.2. The moveTG handles the translations of the VRML
model, rotTG is for rotations, and scaleTG is for scaling.
3 Implementation
Java 3D provides support for runtime loaders. In this paper, we use the VrmlLoader
class in package j3d-vrml97 to load the VRML files. In our project, we rewrite a
vrmlload class with the construction method of vrmlload (String filename). After
selecting a VRML file, the model will be seen in the 3D Viewer of our project, as
shown in the left picture of Fig.3. The coordinate system in 3D Viewer is right-hand,
with the orientation semantics such that +Y is horizontal to the right, +X is directly
towards the user, and +Z is the local gravitational direction upwards.
Fig.3 also shows how a model’s position, orientation, and size can be adjusted.
Here it shows a robot body that has been moved and rotated. In the control panel of
the model, there are several operations. They are: translations of x, y, and z axes;
rotations of the model around x, y, and z axes; scaling the model; displaying “Pos(x,
y, z)” and “Rot(x, y, z)” at runtime, as shown in Fig.3. In addition, when another
model is loaded, a corresponding control panel will be created at the same time.
In the VVA system, collision detection plays an important role. Every node in the scene graph contains a bounds field that stores the geometric extent of the node [10]. Java 3D offers two classes, WakeupOnCollisionEntry and WakeupOnCollisionExit, to carry out collision detection for 3D models. When any object in the Java 3D scene collides with another object's bounds, WakeupOnCollisionEntry is triggered, and when the collision is released, WakeupOnCollisionExit is triggered. We can therefore use them as triggering conditions and obtain the desired control behavior after a collision occurs.
This paper proposes a novel and effective algorithm to detect collisions. VRML
file format uses IndexedFaceSet node to describe shapes of faces and triangle patches
to reveal all types of shapes in 3D models. IndexedFaceSet includes a coord field that
contains a series of spatial points, which can be used in the coordIndex field to build
the surfaces of 3D models. We conserve the points' position, and establish space for
collision detection using the following method.
BoundingPolytope is chosen because it uses more than three half-spaces [11] to define a
convex, closed polyhedral space around a VRML model; we make use of it to build the
collision detection space. Each half-space α is defined by Ax + By +
Cz + D ≤ 0, where A, B, C, and D are the parameters that specify the plane. These parameters
are passed into the x, y, z, and w fields, respectively, of a Vector4d object in
the constructor BoundingPolytope(Vector4d[] planes). We therefore have to determine the
parameters A, B, C, and D of every triangle plane, which is done as follows. Suppose one
triangle is made up of the three points L, M, and N, as shown in Fig.4. Based on the vertex
sequence we obtain the vectors $\overrightarrow{LM}$ and $\overrightarrow{MN}$:

$$\overrightarrow{LM} = (X_M - X_L)\,\mathbf{i} + (Y_M - Y_L)\,\mathbf{j} + (Z_M - Z_L)\,\mathbf{k} \qquad (1)$$

$$\overrightarrow{MN} = (X_N - X_M)\,\mathbf{i} + (Y_N - Y_M)\,\mathbf{j} + (Z_N - Z_M)\,\mathbf{k} \qquad (2)$$

The normal vector of plane α is the cross product of $\overrightarrow{LM}$ and $\overrightarrow{MN}$, that is

$$\vec{n}_f = \overrightarrow{LM} \times \overrightarrow{MN} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
X_M - X_L & Y_M - Y_L & Z_M - Z_L \\
X_N - X_M & Y_N - Y_M & Z_N - Z_M
\end{vmatrix} \qquad (3)$$
Once we obtain the normal vector $\vec{n}_f$, we have the values of A, B, and C. The
constant D is then obtained by substituting the coordinates of any of the triangle's vertices into
the plane equation. After passing all plane parameters to a group of Vector4d objects, we use
BoundingPolytope to build the geometric detection space and then invoke
setCollisionBounds(Bounds bounds) to set the collision bounds. Once the collision
bounds are set, Java 3D carries out collision detection against them.
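As a sketch under these definitions, the plane parameters of all triangles could be assembled into a BoundingPolytope and attached to the model roughly as follows; the helper names trianglePlane and setCollisionSpace are illustrative, and the sign convention of each half-space is not handled here.

```java
import javax.media.j3d.BoundingPolytope;
import javax.media.j3d.Shape3D;
import javax.vecmath.Point3d;
import javax.vecmath.Vector3d;
import javax.vecmath.Vector4d;

public class CollisionSpaceBuilder {

    // Compute (A, B, C, D) of the plane through triangle L, M, N, as in Eqs. (1)-(3).
    static Vector4d trianglePlane(Point3d l, Point3d m, Point3d n) {
        Vector3d lm = new Vector3d(m.x - l.x, m.y - l.y, m.z - l.z);
        Vector3d mn = new Vector3d(n.x - m.x, n.y - m.y, n.z - m.z);
        Vector3d nf = new Vector3d();
        nf.cross(lm, mn);                                     // normal vector gives A, B, C
        double d = -(nf.x * l.x + nf.y * l.y + nf.z * l.z);   // substitute vertex L to get D
        return new Vector4d(nf.x, nf.y, nf.z, d);
    }

    // Build the detection space from all triangles and set it as the collision bounds.
    static void setCollisionSpace(Shape3D model, Point3d[][] triangles) {
        Vector4d[] planes = new Vector4d[triangles.length];
        for (int i = 0; i < triangles.length; i++) {
            planes[i] = trianglePlane(triangles[i][0], triangles[i][1], triangles[i][2]);
        }
        BoundingPolytope space = new BoundingPolytope(planes);
        model.setCollisionBounds(space);                      // Java 3D now tests against it
    }
}
```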
While assembling the object model to the target model, we have to rotate the object
model around the z-axis, y-axis, and x-axis, in this order, so that it corresponds with the target
Local Coordinate System (LCS). Then we need to move it along the x, y, and z axes in
both the positive and negative directions. In Fig.5, we show how model J4 is assembled
to model J3. Fig.5(a) shows the initial positions of the two VRML models.
Fig.5(b) presents the state after the second model (J4) has been rotated about the x-axis by
−90°. We can also see from this picture that no further translation along the y-axis is needed,
because the second model's y-axis already coincides with that of the first model.
From Fig.5(c) and Fig.5(d), we obtain the offset values in the +x and −x directions through
the collision detection between the two models. Since the collision detection threshold is set to
0.5 mm, the collision detection is activated whenever the distance between the two models
becomes smaller than 0.5 mm, and the actual offset value is obtained. Here, the offset in the +x
direction is 140.5 − 0.5 = 140 mm, and the offset in the −x direction is
−80.5 + 0.5 = −80 mm, as shown in Fig.5(d). So, in order to make the x-axis of the second
model correspond to that of the first one, the total offset along the x-axis is
(140 + (−80)) / 2 = 30 mm, and the result can be seen in Fig.5(e). The movement along the
z-axis is then carried out in the same way. The final result of this
virtual assembly is presented in Fig.5(f).
After we have moved the second model to the correct position, we attach the sec-
ond model’s node to the first model’s node.
After we have realized the virtual assembly of the robot, we save the virtual assembly
result as XML files, as shown in Fig.1. In this paper, we use JAXB (Java Architecture
for XML Binding) [12] to access and process the XML data. JAXB makes XML easy to
use by compiling an XML schema into one or more Java technology classes. The
structure of the XML schema of our project is presented in Fig.6, and the output data
conform to this structure.
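A minimal, hypothetical sketch of saving and reloading the assembly result with JAXB is shown below; the AssemblyResult class stands in for the classes generated from our schema.

```java
import java.io.File;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical stand-in for a schema-generated class.
@XmlRootElement
class AssemblyResult {
    public String robotName;
    public List<String> joints;
}

public class AssemblyXmlStore {

    // Write the finished assembly result to an XML file.
    public static void save(AssemblyResult result, File file) throws Exception {
        JAXBContext ctx = JAXBContext.newInstance(AssemblyResult.class);
        Marshaller m = ctx.createMarshaller();
        m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        m.marshal(result, file);
    }

    // Read a previously saved assembly result back into Java objects.
    public static AssemblyResult load(File file) throws Exception {
        JAXBContext ctx = JAXBContext.newInstance(AssemblyResult.class);
        Unmarshaller u = ctx.createUnmarshaller();
        return (AssemblyResult) u.unmarshal(file);
    }
}
```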
4 Application
In this paper, we create the CAD models of the real robot as shown in Fig.7, and then
convert them to VRML files according to the right-hand rule.
Then, we start the visual virtual assembly in Java 3D, as shown in Fig.8. After the
virtual assembly is completed, we save the virtual assembly result to an XML file.
Then we run the program again, and open the XML file just created, as shown in
Fig.9 (a). At the same time, we can control the motion of every joint through the cor-
responding control panel, as shown in Fig.9 (b).
Fig. 9. Opening the XML file that stores the VA data, and controlling the motions of the robot
5 Conclusions
From the application in Section 4, the virtual assembly of robots is carried out based on
collision detection, and it proves to work well. Virtual assembly of robots based on
collision detection in Java 3D allows for visual feedback and makes the process
much easier for end-users. Through the virtual assembly, users obtain an
assembly result file in XML format that complies with a dedicated XML schema. The
users can then easily load the newly assembled robot from the XML file and operate every
joint of the robot, e.g., for developing a program for workspace and kinematic analysis.
Acknowledgments. This work was supported in part by the National High Technology
Research and Development Program of China under grants 2007AA041703-1,
2006AA040203, 2008AA042602 and 2007AA041602. The authors gratefully acknowledge
YASKAWA Electric Corporation for supporting the collaborative research funds, and extend
their thanks to Mr. Ikuo agamatsu and Mr. Masaru Adachi at YASKAWA for their cooperation.
References
1. Jayaram, S., Connacher, H., Lyons, K.: Virtual Assembly Using Virtual Reality Tech-
niques. CAD 29(8), 575–584 (1997)
2. Choi, A.C.K., Chan, D.S.K., Yuen, A.M.F.: Application of Virtual Assembly Tools for
Improving Product Design. Int. J. Adv. Manuf. Technol. 19, 377–383 (2002)
3. Jayaram, S., et al.: VADE: a Virtual Assembly Design Environment. IEEE Computer
Graphics and Applications (1999)
4. Jiang-sheng, L., Ying-xue, Y., Pahlovy, S.A., Jian-guang, L.: A novel data decomposition
and information translation method from CAD system to virtual assembly application. Int.
J. Adv. Manuf. Technol. 28, 395–402 (2006)
5. Choi, A.C.K., Chan, D.S.K., Yuen, A.M.F.: Application of Virtual Assembly Tools for
Improving Product Design. Int. J. Adv. Manuf. Technol. 19, 377–383 (2002)
6. Ko, C.C., Cheng, C.D.: Interactive Web-Based Virtual Reality with Java 3D, ch. 1. IGI
Global (July 9, 2008)
7. http://ts.nist.gov/standards/iges//
8. Sowizral, H., Rushforth, K., Deering, M.: The Java 3D API specification, 2nd edn. Addi-
son-Wesley, Reading (2001)
9. Sun Microsystems. The Java 3DTM API Specification, Version 1.2 (March 2000)
10. Selman, D.: Java 3D Programming. Manning Publications (2002)
11. http://en.wikipedia.org/wiki/Half-space
12. Ort, E., Mehta, B.: Java Architecture for XML Binding (JAXB). Sun Microsystems (2003)
Robust Mobile Robot Localization by Tracking Natural
Landmarks
1 Introduction
Localization is the key to autonomous mobile robot navigation systems. It is well
known that using solely the data from odometry is not sufficient, since odometry
produces unbounded position error [1]. This problem gives rise to the solution that the
robot carries exteroceptive sensors (sonar, infrared, laser, vision, etc.) to detect landmarks
and uses these observations to update its pose estimate.
Instead of working directly with raw scan points, feature-based localization first
transforms the raw scans into geometric features. Feature-based methods increase the
efficiency and robustness of the navigation and make the data more compact. Hence,
it is necessary to automatically, and robustly, detect landmarks for data association,
and ultimately localization and mapping purposes for robots. This approach has been
studied and employed intensively in recent research on Robotics [2]. One of the key
problems of feature-based navigation is the reliable acquisition of information from
the environment. Due to the many advantages of the laser rangefinder, such as good reliabil-
ity, high sampling rate and precision, it is widely used in mobile robot navigation.
The UKF [3] is a nonlinear optimal estimator based on a deterministic sampling strategy that
avoids the linearization steps required by the Extended Kalman Filter (EKF), and it is not
restricted to Gaussian distributions. It has been shown that, with appropriate sigma-point
selection schemes, the UKF achieves better estimation by exploiting more information,
such as the first three moments of an arbitrary distribution or the first four
non-zero moments of a Gaussian distribution. In the proposed system, the UKF fuses the
information from odometry with the information from range images to estimate the robot's
position.
In this paper, we describe a feature-based localization system that is composed
of two main parts: feature extraction and probabilistic localization. For feature extraction,
we describe a geometrical feature detection method for use with a conventional 2D
laser rangefinder. The contribution of this paper is that our method employs the variance
function of the local tangent associated with the laser range scan readings to track multiple
types of natural landmarks, such as lines, curves and corners, instead of calculating a
curvature function. The extracted features and their error models are used, together with
an a priori map, by a UKF so as to obtain an optimal estimate of the robot's current pose
vector and the associated uncertainty.
Clustering is a key procedure of the landmark extraction process. It roughly segments
the range image into consecutive point clusters, which also enhances the efficiency
of the extraction procedure. Range readings belong to the same segment if
the distance between them is less than a given threshold [4].
For the range data obtained from the clustering process, the relationship between adjacent
points is, however, entirely dependent on the shape of the scanned environment. The
difference between neighboring points can be described by a function of geometric parameters,
such as the direction of the local tangent or the curvature of objects in the
environment.
The tangents of points belonging to the same feature are supposed to be the same or to
change regularly, see Fig.1(a). In this work, we employ the variance function of the local
tangent associated with the laser range scan images to break the segments into several
distinct features. The tangent at each scan point is calculated by line regression over its
k neighboring points; in our experiment, k is 7.
Fig. 1. (a) MTL of range points; (b) features extracted from (a): lines, corners and curves (marked with different symbols)
The variance between two consecutive tangents, belonging to points p(i) and
p(i+1), is computed from the slopes k(i) and k(i+1) of the two tangents at p(i) and p(i+1).
Fig.2 shows the tangent variance function with respect to Fig.1(a).
Fig.1(b) shows the detected line segments, corners and curve segments associated with
Fig.1(a).
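A sketch of this step is shown below; it assumes the scan points are available in Cartesian coordinates and k = 7, and since the paper's exact variance measure is not reproduced in this excerpt, the sketch segments on the absolute change of the regression slope (all names are illustrative).

```java
// Local tangent by least-squares line regression over k neighbouring scan points,
// followed by segmentation where the tangent slope changes abruptly.
public class TangentSegmenter {

    // Slope of the regression line fitted to points [from, from + k).
    static double localSlope(double[] x, double[] y, int from, int k) {
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = from; i < from + k; i++) {
            sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        double denom = k * sxx - sx * sx;
        return denom == 0 ? Double.POSITIVE_INFINITY : (k * sxy - sx * sy) / denom;
    }

    // Break a cluster into features at indices where the tangent changes strongly.
    static java.util.List<Integer> breakPoints(double[] x, double[] y, int k, double thresh) {
        java.util.List<Integer> cuts = new java.util.ArrayList<Integer>();
        double prev = localSlope(x, y, 0, k);
        for (int i = 1; i + k <= x.length; i++) {
            double cur = localSlope(x, y, i, k);
            if (Math.abs(cur - prev) > thresh) cuts.add(i);   // candidate feature boundary
            prev = cur;
        }
        return cuts;
    }
}
```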
3 Landmark Extraction
The line parameters are (θ, d) from the line equation $x\cos\theta + y\sin\theta = d$. Given n
points $p(r_i, \varphi_i)$, the regression parameters are given by the following expressions [5]:

$$\theta = \frac{1}{2}\arctan\!\left(\frac{-2\sum_i (\bar{y} - y_i)(\bar{x} - x_i)}{\sum_i \left[(\bar{y} - y_i)^2 - (\bar{x} - x_i)^2\right]}\right) = \frac{1}{2}\arctan\frac{N}{D}, \qquad
d = \bar{x}\cos\theta + \bar{y}\sin\theta, \qquad (3)$$

where $\bar{x} = \sum_i r_i\cos\varphi_i / n$ and $\bar{y} = \sum_i r_i\sin\varphi_i / n$.
The parameters θ and d are a function of all the measured points $p_i$, $i = 1, \ldots, n$,
in equation (3). Assuming that the individual measurements are independent, the
covariance matrix of the parameters (θ, d) can be calculated as

$$C_{\theta,d} = \sum_{i=1}^{n} J_i\, C_{xy_i}\, J_i^{T}. \qquad (4)$$
where $(x_0, y_0)$ and r are the center and the radius of the circle, respectively. For a circle-fitting
problem, the data set (x, y) is known and the circle parameters need to be estimated.
Assuming that we have obtained M measurements $p(x_i, y_i)$, our objective is to
find $p(x_0, y_0, r)$ subject to $f_i(x_0, y_0, r) = (x_i - x_0)^2 + (y_i - y_0)^2 - r^2 = 0$,
where $i = 1, \ldots, M$.
3.3 Corners
Then, real and virtual corners can be obtained from the intersection of the previously
detected line segments. Once a corner is detected, its position ( xc , yc ) is estimated
as the intersection of the two lines which generate it. Given two lines
x cos θ1 + y sin θ1 = d1 and x cos θ 2 + y sin θ 2 = d 2 , the equations to calculate
corner ( xc , yc ) is:
C xc yc = JCd1θ1d2θ2 J T . (8)
Where J is the Jacobian matrix of equation (7) to (d1 ,θ1 , d 2 , θ 2 ) . And Cd1θ1d 2θ2 is
the covariance matrix of line parameters ( d1 , θ1 , d 2 , θ 2 ) . For convenience, the line
parameters ( d1 , θ1 ) and ( d 2 , θ 2 ) for two different tangent lines are supposed to be
independent. Then Cd1θ1d 2θ 2 is simplified to be diag (Cd1θ1 , Cd2θ2 ) .
In this application the initial location of the robot is known, and the robot has an a
priori map of the locations of geometric landmarks $p_i = [p_x, p_y]_i$. Each landmark is
assumed to be known. At each time step, observations $z_j(k)$ of these landmarks are
taken. The UKF combines the observations with a model of the system dynamics, i.e., the
kinematic relationships that express the change of the robot's position as a function
of the displacements of the two wheels, to produce an optimal estimate of the robot's
position and orientation.
The system model describes how the vehicle's position x(k) changes with time in
response to a control input $u(k) = [s_l, s_r]_k$ and a noise disturbance v(k). It has
the form:

$$x(k+1) = f(x(k), u(k)) + v(k), \qquad v(k) \sim N(0, Q(k)), \qquad (9)$$

where f(x(k), u(k)) is a nonlinear state transition function and the noise v(k) is assumed
to be zero-mean Gaussian with covariance Q(k). According to the kinematics of
the two-wheel differential-drive mobile robot used in our experiment, the transition
function has the form:
$$f(\cdot,\cdot) = \begin{bmatrix} x(k) \\ y(k) \\ \theta(k) \end{bmatrix}
+ \begin{bmatrix} \Delta s \cdot \cos(\theta + \Delta\theta/2) \\ \Delta s \cdot \sin(\theta + \Delta\theta/2) \\ \Delta\theta \end{bmatrix}. \qquad (10)$$
The observation model describes how the measurements $z_j(k)$ are related to the vehicle's
position. The measurements are relative to the robot itself; therefore the observation equation
is a function of the robot state and the landmark:
$$h(\cdot,\cdot) = \begin{bmatrix} \sqrt{(p_x - x(k))^2 + (p_y - y(k))^2} \\[4pt] \arctan\!\dfrac{p_y - y(k)}{p_x - x(k)} - \theta(k) \end{bmatrix}. \qquad (12)$$
First, using the system model and the control input u(k), we predict the robot's new
location at time step k+1:

$$\bar{x}(k+1) = \sum_{i=0}^{2n} W_i\, \chi_i^{k+1}. \qquad (13)$$

The predicted observation of landmark j is

$$\bar{z}_j(k+1) = \sum_{i=0}^{2n} W_i\, \zeta_{i,j}^{k+1}, \qquad (15)$$

where $\zeta_{i,j}^{k+1} = h(\chi_i^{k+1}, p_j)$. The innovation covariance is

$$S_j(k+1) = \sum_{i=0}^{2n} W_i\,(\zeta_{i,j}^{k+1} - \bar{z}(k+1))(\zeta_{i,j}^{k+1} - \bar{z}(k+1))^{T} + R_j(k+1). \qquad (16)$$

An observation is associated with landmark j if it passes the validation gate

$$v_j(k+1)\, S_j^{-1}(k+1)\, v_j^{T}(k+1) \le \chi^2, \qquad (17)$$

where

$$P_{Z_j^{k+1}} = \sum_{i=0}^{2n} W_i\,(\zeta_{i,j}^{k+1} - \bar{z}_j(k+1))(\zeta_{i,j}^{k+1} - \bar{z}_j(k+1))^{T}, \qquad (19)$$

$$P_{X^{k+1} Z_j^{k+1}} = \sum_{i=0}^{2n} W_i\,(\chi_i^{k+1} - \bar{x}(k+1))(\zeta_{i,j}^{k+1} - \bar{z}_j(k+1))^{T}. \qquad (20)$$
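As an illustration of the prediction step, a minimal Java sketch follows; it assumes that the sigma points $\chi_i$ and weights $W_i$ are produced by a standard unscented-transform routine, that $\Delta s$ and $\Delta\theta$ have already been derived from the wheel displacements in u(k), and that the weighted angle mean is formed as a plain sum for simplicity (class and method names are illustrative).

```java
// Sketch of the UKF prediction: propagate sigma points through the differential-drive
// motion model f of Eq. (10) and form the weighted mean of Eq. (13).
public class UkfPredict {

    // f([x, y, theta], ds, dtheta) from Eq. (10)
    static double[] motionModel(double[] state, double ds, double dtheta) {
        double theta = state[2];
        return new double[] {
            state[0] + ds * Math.cos(theta + dtheta / 2.0),
            state[1] + ds * Math.sin(theta + dtheta / 2.0),
            theta + dtheta
        };
    }

    // Range-bearing observation h of Eq. (12) for landmark (px, py)
    static double[] observationModel(double[] state, double px, double py) {
        double dx = px - state[0], dy = py - state[1];
        return new double[] {
            Math.sqrt(dx * dx + dy * dy),
            Math.atan2(dy, dx) - state[2]
        };
    }

    // Eq. (13): predicted mean as the weighted sum of propagated sigma points
    // (angle averaging is simplified to a plain weighted sum here).
    static double[] predictMean(double[][] sigmaPoints, double[] weights,
                                double ds, double dtheta) {
        double[] mean = new double[3];
        for (int i = 0; i < sigmaPoints.length; i++) {
            double[] chi = motionModel(sigmaPoints[i], ds, dtheta);
            for (int d = 0; d < 3; d++) mean[d] += weights[i] * chi[d];
        }
        return mean;
    }
}
```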
5 Experimental Results
In this section, we use the mobile robot RoboSHU, which is equipped with a laser
rangefinder LMS200 as exteroceptive sensor, to test the localization capability of our
method. The rangefinder is configured to have a field of view of 180 degrees with a resolution
of nearly 0.5°. The robot runs under an RTOS on a 1.6 GHz PC and can be controlled
by a remote control module via a wireless network. All algorithms are programmed
in C++. The average processing time per scan is 0.3 s, which is suitable for
real-time applications. The localization experiment is carried out in a laboratory environment
(see Fig.4).
For simplicity, we assumed time-invariant, empirically estimated covariance matrices
$Q(k) = Q$ and $R_j(k) = R$, $j = 1, \ldots, M$, given by $R = \mathrm{diag}(50, 50, 0.01)$,
$Q(1,1) = Q(2,2) = 10$ and $Q(1,2) = Q(2,1) = 0$. The initial pose $(x_0, y_0, \theta_0)$ is
assumed Gaussian with initial covariance matrix $P_0 = \mathrm{diag}(100, 100, 0.025)$.
Fig. 3. Pose errors with respect to the ground truth, and the 1σ confidence bounds computed
from the estimated covariance of the system, in x, y and θ
Fig.3 shows the pose estimation errors of our UKF localization method in the
coordinates (x, y) and the orientation θ. The 1σ confidence bounds for the estimates
are also superimposed. It can be seen that the errors do not diverge.
Fig.4 represents the trajectory estimate obtained using our localization method.
The trajectory predicted by odometry alone deviates from the robot's true path as time
increases. The UKF is capable of tracking the robot's position fairly accurately, even
though the trajectory includes abrupt turns. We can see that the robot can localize
itself by fusing the sensor information so as to navigate successfully. This experiment
demonstrates the validity of the UKF-based localization algorithm.
6 Conclusions
This paper describes the implementation and results of a UKF-based localization
framework using a 2D laser rangefinder. A segmentation based on the
variance of the local tangent value is introduced. This algorithm can provide line
segments, corners and curve segments for mobile robot localization. Landmarks extracted
from these segments are characterized not only by their parameters but also by their
uncertainties. The system is based on a UKF that uses matches between observed
geometric landmarks and an a priori map of landmark locations to provide an optimal
pose estimate for the mobile robot. The accuracy and robustness of the proposed
method were demonstrated in an indoor environment experiment.
Acknowledgement
We are grateful for the financial support from the State High-Tech Development Plan
(863 program) of China under contract No. 2007AA041604.
References
1. Iyengar, S., Elfes, A.: Autonomous Mobile Robots, vol. 1, 2. IEEE Computer Society Press,
Los Alamitos (1991)
2. Jensfelt, P., Christensen, H.: Laser Based Position Acquisition and Tracking in an Indoor
Environment. In: IEEE Int. Proc. on Robotics and Automation, vol. 1 (1998)
3. Julier, S.J., Uhlmann, J.K., Durrant-Whyte, H.F.: A new approach for filtering nonlinear
systems. In: Proc. Am. Contr. Conf., Seattle, WA, pp. 1628–1632 (1995)
4. Arras, K.O., Tomatis, N., Jensen, B.T., Siegwart, R.: Multisensor on-the-fly localization:
Precision and reliability for applications. Robotics and Autonomous Systems 34, 131–143
(2001)
5. Arras, K.O., Siegwart, R.: Feature extraction and scene interpretation for map based naviga-
tion and map building. In: Proc. of SPIE, Mobile Robotics XII, vol. 3210 (1997)
6. Nash, J.C.: Compact numerical methods for computers: linear algebra and function minimi-
zation. Adam Hilger Ltd. (1979)
7. Nunez, P., Vazquez-Martin, R., del Toro, J.C., Bandera, A., Sandoval, F.: A Curvature
based Method to Extract Natural Landmarks for Mobile Robot Navigation. In: IEEE Int.
Symposium on Intelligent Signal Processing 2007, October 2007, pp. 1–6 (2007)
Multi-Robot Dynamic Task Allocation Using Modified
Ant Colony System
Abstract. This paper presents a dynamic task allocation algorithm for multiple
robots that must visit multiple targets. The algorithm is specifically designed for
environments where robots have dissimilar starting and ending locations, and the
constraint of balancing the number of targets visited by each robot is considered.
More importantly, this paper takes into account the dynamicity of the multi-robot
system and the obstacles in the environment. The problem is modeled as a constrained
MTSP which cannot be transformed to a TSP and cannot be solved by the
classical Ant Colony System (ACS). The Modified Ant Colony System (MACS)
is presented to solve this problem, and the unvisited targets are allocated to appropriate
robots dynamically. The simulation results show that the output of the
proposed algorithm can satisfy the constraints and the dynamicity of the
multi-robot task allocation problem.
1 Introduction
Multiple-robot systems have been widely used in planetary exploration, seafloor
survey, mine countermeasures, mapping, rescue and so on. A typical mission is visiting
targets located at different positions in a certain area. Multi-robot task allocation
(MRTA) is a critical problem in multi-robot systems. This research introduces a dynamic
task allocation algorithm for multiple robots to visit multiple targets.
The existing research on multi-robot task allocation for target visiting is limited and
immature. The waypoint reacquisition algorithm provides a computationally efficient
task allocation method [1]. This approach employs a cluster-first,
route-second heuristic technique with no feedback or iterations between the clustering
and route-building steps. However, it will generally not find high-quality solutions,
and it may leave some waypoints (i.e., targets) unvisited. The planning of an underwater
robot group is modeled as an MTSP and implemented with a genetic algorithm [2]. The
author of this paper has studied the multi-objective task allocation problem for
multi-robot systems in the literature [3]. However, none of these algorithms
take into account the dynamicity of the multi-robot system and the obstacles in the
environment. Due to the dynamicity of the multi-robot system, when some robot is damaged by
a destructive target, the system must reallocate the unvisited targets to the remaining
robots on-line. Meanwhile, each robot changes its path while avoiding
obstacles, so the initial schedule of target visiting may no longer be optimal and
rescheduling should be considered.
In this paper, the task allocation problem for multiple robots is modeled as a constrained
MTSP. The MTSP is an NP-hard problem in combinatorial optimization. It is a
generalization of the well-known TSP in which more than one salesman is allowed in
the solution. Although there is a wide body of literature on the TSP, the
MTSP has not received the same amount of attention. Finding a guaranteed optimal
solution to the MTSP by exhaustive search is only feasible for a very small number
of cities. Many of the proficient MTSP solution techniques are heuristic, such as evolutionary
algorithms [4], simulated annealing [5], tabu search [6], genetic algorithms [7],
neural networks [8] and ant systems [9].
Due to the requirements of target visiting by multiple robots, the robots typically have
dissimilar starting and ending locations and the workload of each robot should be
balanced. This problem belongs to the fixed-destination multi-depot MTSP [10], and the
constraint of balancing the number of targets visited by each vehicle should also be taken
into account. However, the heuristic methods mentioned above cannot deal with this
constrained MTSP directly. The classical Ant Colony System (ACS) algorithm is widely used
to solve MTSP instances that can be transformed to a TSP. In ACS, each ant constructs a
solution from a random starting point and visits all targets one by one. However, it cannot
solve the fixed-destination multi-depot MTSP with the constraints mentioned
above. In this paper, a new Modified Ant Colony System (MACS) algorithm is presented
to solve this constrained MTSP and applied to the multi-robot dynamic task
allocation problem.
The remainder of this paper is organized as follows. First, the multi-robot task allocation
problem is formulated as an integer linear program. In Section 3,
we give a detailed description of the Modified Ant Colony System (MACS). Multi-robot
dynamic task allocation using MACS is introduced in Section 4. Finally, simulation
results show that the proposed algorithm can find optimized solutions that satisfy
the constraints and dynamicity of the multi-robot dynamic task allocation problem.
The multi-robot task allocation problem can be stated as follows. Given n targets at
random locations in a certain area, let there be m robots which typically have dissimilar
starting and ending locations outside the area. The m robots must visit the n
targets, and each target must be visited exactly once by only one robot. In order to
save energy and time, the total distance of visiting all targets must be minimized. In
addition, the number of targets visited by each robot should be balanced due to the
requirement of workload balancing.
We propose the following integer linear programming formulation for the constrained
MTSP defined above. The distance objective function f(x) can be described
as follows:
$$f(x) = \sum_{i=1}^{m}\left(\sum_{k=1}^{n_i - 1} d(T_i^k, T_i^{k+1}) + d(S_i, T_i^1) + d(T_i^{n_i}, E_i)\right) \qquad (1)$$
The constrained task allocation problem for multiple robots is formulated as follows:
min f ( x) (3)
subject to g ( x ) (4)
The multi-robot system discussed in this paper is constructed according to the hierarchical colony
architecture [11]. Fig. 1 shows the logical hierarchical model of the multi-robot system.
3.1 Initialization
Due to the characteristic of fixed destination multi-depot MTSP, the MACS algorithm
is different from the classical ACS in initialization. The dissimilar starting depots and
ending depots of robots should be considered. Instead of putting all ants on the targets
randomly in ACS, the MACS put all ants on the starting depots or ending depots of
robots randomly. That is, we put all ants on the S i (i = 1,2, L , m) or Ei (i = 1,2,L, m)
randomly. All ants will start from one depot and search at the same space.
Similarly, initialization of the pheromone matrixes and cost matrixes of ant colony
are modified. The pheromone and the cost from one depot to all targets should be
calculated and stored.
As far as the multiple depots and workload balancing are concerned, the method of
solution construction needs to be modified. According to the theory of the classical ACS,
each ant runs the steps of solution construction and pheromone updating until all targets
have been visited. In MACS, there are three modifications to the procedure of
solution construction.
(1) A task-number allocation phase is employed to realize the task-number constraint.
Here, the task number is equal to the number of targets assigned to each robot to
visit. This phase focuses on assigning the number of tasks, while the task allocation and
optimization are performed in the following phase. If the number of targets n can
be divided exactly by the number of robots m, the task number $n_i$ for robot
i $(i = 1, 2, \ldots, m)$ should be n/m. Otherwise, let s be the quotient and r the remainder;
then $n_i$ can be defined mathematically, where u is a random uniform variable on [0, 1] and the
value $u_0$ is a parameter that influences the target numbers of the different robots, v denotes the
total number of robots whose target number is s + 1, and w denotes the total number of robots
whose target number is s. After this phase, the task number for each robot to visit is fixed, and
we can see that $\sum_{i=1}^{m} n_i = n$.
(2) Each ant starts from its initial position and then selects unvisited targets to
construct a route. In MACS, the procedure of solution construction is slightly more complex
than in ACS. Suppose that an ant starts from $S_p$, $p \in \{1, 2, \ldots, m\}$; if the
number of targets that the ant has visited equals $n_p$, the ant returns to $E_p$.
After that, the ant chooses an unvisited $S_i$ $(i = 1, 2, \ldots, p-1, p+1, \ldots, m)$, departs from
that starting depot and again selects unvisited targets iteratively. The targets that have
been visited by an ant are recorded in the ant's tabu table. A valid solution is constructed
by an ant once all targets, starting depots and ending depots have been visited. We define
the number of ants as a; thus a solutions are constructed by the a ants.
Every ant selects the next city independently. The rule for moving from target i to
target j for ant k can be formulated as follows:
$$j = \begin{cases} \arg\max_{l \in N_i^k}\left\{\tau_{il}\,[\eta_{il}]^{\beta}\right\}, & \text{if } q \le q_0 \\[4pt] J, & \text{otherwise} \end{cases} \qquad (6)$$
If q is larger than $q_0$, the probability of moving from target i to target j for ant k
can be formulated as follows:

$$p_{ij}^{k} = \frac{[\tau_{ij}]^{\alpha}[\eta_{ij}]^{\beta}}{\sum_{l \in N_i^k} [\tau_{il}]^{\alpha}[\eta_{il}]^{\beta}}, \quad \text{if } j \in N_i^k, \qquad (7)$$
where $\tau_{il}$ is the amount of pheromone on the path between the current target i
and a candidate target l, and $\eta_{il}$ is the heuristic information, defined as the inverse of
the cost (e.g., distance) between the two targets. q is a random uniform variable on [0, 1],
and the value $q_0$ is a parameter that determines the relative influence of exploitation
and exploration. α and β are parameters whose values determine the relation
between pheromone and heuristic information. $N_i^k$ denotes the targets that ant k has
not yet visited.
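A sketch of the selection rule of Eqs. (6) and (7) is shown below; array and parameter names are illustrative, and the random target J of Eq. (6) is drawn according to the probabilities of Eq. (7) via roulette-wheel selection.

```java
import java.util.List;
import java.util.Random;

// Pseudo-random-proportional target selection of Eqs. (6) and (7).
public class TransitionRule {
    static int nextTarget(int i, List<Integer> unvisited, double[][] tau, double[][] eta,
                          double alpha, double beta, double q0, Random rnd) {
        if (rnd.nextDouble() <= q0) {                        // exploitation, Eq. (6)
            int best = unvisited.get(0);
            double bestVal = -1;
            for (int l : unvisited) {
                double v = tau[i][l] * Math.pow(eta[i][l], beta);
                if (v > bestVal) { bestVal = v; best = l; }
            }
            return best;
        }
        // exploration, Eq. (7): roulette-wheel selection over the unvisited targets
        double sum = 0;
        for (int l : unvisited) sum += Math.pow(tau[i][l], alpha) * Math.pow(eta[i][l], beta);
        double r = rnd.nextDouble() * sum, acc = 0;
        for (int l : unvisited) {
            acc += Math.pow(tau[i][l], alpha) * Math.pow(eta[i][l], beta);
            if (acc >= r) return l;
        }
        return unvisited.get(unvisited.size() - 1);
    }
}
```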
(3) The objective function values of the routes constructed by all ants are computed
according to Eq. (1) and sorted in increasing order. In order to improve future solutions,
the pheromone on the best solution found so far is updated after all ants have
constructed their valid routes. The global pheromone update is done using the following
equation:

$$\tau_{ij} = (1 - \rho)\,\tau_{ij} + \rho\,\Delta\tau_{ij}^{best}, \quad \forall (i, j) \in T^{best}, \qquad (8)$$

$$\tau_0 = 1 / (n\,C^{nn}), \qquad (10)$$
where $\xi$ $(0 < \xi < 1)$ represents the speed of local pheromone evaporation, $\tau_0$ is the
initial value of the pheromone, and $C^{nn}$ is the objective value computed with the Nearest
Neighbor Algorithm. The global update is a positive-feedback mechanism,
while the local update increases the opportunity to choose targets that have
not yet been explored and avoids stagnation.
After MACS has met the termination condition (e.g., the pre-specified number of iterations,
denoted $N_c$), the optimal solution is obtained. The time complexity of
MACS is $O(N_c \cdot a \cdot n^2)$, which is equal to the time complexity of the classical ACS.
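A sketch of the pheromone updates follows. The global rule implements Eq. (8) with $\tau_0$ from Eq. (10); since $\Delta\tau_{ij}^{best}$ is not defined in this excerpt, a common choice (the inverse of the best tour length) is assumed, and the local update shown is the classical ACS rule of [12], which MACS is assumed to keep.

```java
// Pheromone update rules (sketch). bestTour lists the arcs (i, j) of the best solution,
// bestLength is its objective value f(x).
public class PheromoneUpdate {

    // Global update of Eq. (8): only the arcs of the best tour are reinforced.
    static void globalUpdate(double[][] tau, int[][] bestTour, double bestLength, double rho) {
        double deposit = 1.0 / bestLength;        // assumed choice of delta-tau
        for (int[] arc : bestTour) {
            int i = arc[0], j = arc[1];
            tau[i][j] = (1 - rho) * tau[i][j] + rho * deposit;
        }
    }

    // Local update applied while a single ant moves (classical ACS rule, assumed here).
    static void localUpdate(double[][] tau, int i, int j, double xi, double tau0) {
        tau[i][j] = (1 - xi) * tau[i][j] + xi * tau0;   // tau0 = 1 / (n * C_nn), Eq. (10)
    }
}
```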
In the first phase, the MACS algorithm is used to solve the constrained MTSP, and all
targets are allocated to appropriate robots initially. MACS is implemented in the
C programming language and embedded in the control system software of each robot.
The initial task allocation is run by a leader robot in centralized mode. The result
of the allocation is sent to each robot in the multi-robot system by communication.
In the second phase, the robots set off to visit their individual targets and run a local
path planning algorithm to avoid obstacles.
Task reallocation in centralized mode. The leader robot is responsible for online colony
supervision to handle failures. All follower robots send state messages
to the leader robot whenever a timer signal arrives during the execution of their tasks.
If a robot is found to have lost contact with the leader robot, it is considered
damaged and lost. In order to avoid leaving some targets unvisited, the
leader robot reallocates the unvisited tasks to the other remaining robots in centralized
mode. The MACS algorithm only runs in the control system of the leader robot in
this situation. The inputs of the algorithm are the current positions of the remaining robots and
the positions of the unvisited targets. The output of the task reallocation is sent to the
remaining robots. Each robot which receives this message will update its task list and
visit the new target.
Task rescheduling in distributed mode. In order to adapt to the unpredictable environment,
each robot reschedules its target list during the procedure of avoiding
obstacles in distributed mode. The MACS algorithm runs in the control system of
each robot in this situation. The inputs of the algorithm are the current position of the robot
avoiding obstacles and the positions of the targets not yet visited by this robot. The result of
the task rescheduling is the target list in a new order, which is probably different from the
former visiting order.
5 Simulation Experiments
In this section, we discuss the parameter settings for the proposed task allocation algorithm
and present simulation results. According to the MACS algorithm, the parameters
are set as follows: a = n, α = 1, β = 2, $q_0$ = 0.9, $u_0$ = 0.5, ρ = 0.5,
ξ = 0.1, $N_c$ = 200. The algorithm was tested in a scenario containing 20 targets
(n = 20) and 3 robots (m = 3). The target positions and the obstacles are generated
randomly. Assume that the respective starting and ending depot coordinates of the
three robots are $S_1$ = (20, 0), $S_2$ = (50, 0), $S_3$ = (80, 0), $E_1$ = (40, 100), $E_2$ = (50, 100),
$E_3$ = (60, 100). Two typical experiments were carried out to verify the validity and
rationality of the proposed task allocation algorithm.
Fig. 2 shows the results of experiment A, which focuses on verifying task reallocation
in centralized mode. The target layout and the result of the initial task allocation are
shown in Fig. 2 a). As can be seen, each robot obtains a valid route, and the task
number constraint is satisfied. The leader robot is R1 and it supervises the whole
multi-robot system.
In Fig. 2 b), robot R2 is damaged and out of work, and the leader robot R1 detects
this failure. MACS is then applied to task reallocation by robot R1 with the input of the
coordinates of the unvisited targets and the current positions of the remaining robots (R1 and R3).
The result of online task reallocation is illustrated in Fig. 2 c). The unvisited targets are
allocated to R1 and R3 autonomously.
The coordinates of the targets in this experiment are shown in Table 1. The
process of target visiting with a multi-robot system is unpredictable, and the layout of
the targets is random. Many other simulation experiments with various target layouts
were carried out, and similar valid solutions were obtained. These experimental results show
that the leader robot can perform online task reallocation in centralized mode using the
MACS algorithm.
Experiment B focuses on the procedure of task rescheduling in distributed mode.
Assume that there are two rectangular obstacles in the area; the layout of
targets and obstacles can be seen in Fig. 3 a). The two obstacles have sizes of
15 m × 10 m and 20 m × 5 m, respectively. The coordinates of the targets in experiment B are
shown in Table 1. According to the initial task allocation, the paths of robots R2 and R3
cross the obstacles.
Fig. 2. Results of experiment A: (a) initial task allocation of robots R1, R2 and R3; (b) robot R2 damaged; (c) online task reallocation (axes in X(m) and Y(m))
R2 and R3 run local path planning and avoid the obstacles as soon as they detect them
during the target-visiting procedure. Meanwhile, each robot independently
runs MACS online to optimize its schedule of target visiting, with its
current position and the coordinates of the unvisited targets in its task list as input. The results
of the task rescheduling of R2 and R3 are shown in Fig. 3 b). The travel distance of each robot
under the new visiting order of targets is obviously reduced.
Table 1. Coordinates of the targets in experiments A and B

Experiment A: {52,82}, {18,40}, {47,90}, {37,32}, {37,86}, {79,69}, {57,37}, {69,69}, {58,10}, {82,70}, {13,50}, {56,43}, {96,41}, {87,86}, {97,52}, {94,89}, {71,27}, {29,56}, {92,14}, {33,71}
Experiment B: {28,22}, {59,74}, {46,75}, {47,70}, {77,55}, {42,44}, {67,38}, {66,69}, {35,60}, {21,60}, {20,51}, {48,46}, {60,66}, {15,19}, {37,26}, {63,63}, {22,30}, {50,69}, {17,43}, {61,88}
The task rescheduling method allows the bulk of the computation to be distributed
among the robots over their respective target lists and further reduces the computational
burden.
The computing time of the MACS algorithm under the conditions mentioned above is
less than 1 s and satisfies the real-time requirement of the robot control system. The
simulation results show that the output of the proposed algorithm can satisfy the constraints
and the dynamicity of the multi-robot dynamic task allocation problem.
Fig. 3. Experiment B: (a) layout of targets and obstacles; (b) results of the task rescheduling of R2 and R3 (axes in X(m) and Y(m))
6 Conclusion
In this paper, we considered a situation that is closer to the real
multi-robot task allocation problem for target visiting. Besides the objective of minimizing
the total distance traveled by the multiple robots, we also took into account the constraints of
multiple depots and workload balancing.
The task allocation problem for multiple robots was modeled as a constrained
MTSP, which is also a fixed-destination multi-depot MTSP. As many proficient heuristic
methods for the MTSP cannot deal with such a problem directly, this research proposed the
Modified Ant Colony System (MACS) method to solve this constrained MTSP.
The proposed MACS algorithm was used to solve the dynamic task allocation problem for
multiple robots. Combined with the hierarchical colony architecture and the dynamicity
of the multi-robot system, the MACS algorithm was not only applied to reallocate unvisited
targets in centralized mode when some robot encountered damage, but also used
to reschedule individual target lists in distributed mode when a robot avoided
obstacles. This research made a first attempt to solve this dynamic task allocation
problem.
The simulation results show that the output of the proposed MACS algorithm can
not only satisfy the constraints of the multi-robot task allocation problem, but also satisfy the
requirement of dynamicity for online task reallocation.
References
1. Stack, J.R., Smith, C.M., Hyland, J.C.: Efficient reacquisition path planning for multiple
autonomous underwater robots. In: Ocean 2004 - MTS/IEEE Techno-Ocean 2004 Bridges
across the Oceans, pp. 1564–1569 (2004)
2. Zhong, Y., Gu, G.C., Zhang, R.B.: New way of path planning for underwater robot group.
Journal of Harbin Engineering University 24(2), 166–169 (2003)
3. Xu, Z.Z., Li, Y.P., Feng, X.S.: Constrained Multi-objective Task Assignment for UUVs
using Multiple Ant Colonies System. In: The 2008 ISECS International Colloquium on
Computing, Communication, Control, and Management, pp. 462–466 (2008)
4. Fogel, D.B.: A parallel processing approach to a multiple traveling salesman problem using
evolutionary programming. In: Proceedings of the fourth annual symposium on parallel
processing, pp. 318–326 (1990)
5. Song, C., Lee, K., Lee, W.D.: Extended simulated annealing for augmented TSP and
multi-salesmen TSP. In: Proceedings of the international joint conference on neural net-
works, vol. 3, pp. 2340–2343 (2003)
6. Ryan, J.L., Bailey, T.G., Moore, J.T., Carlton, W.B.: Reactive Tabu search in unmanned
aerial reconnaissance simulations. In: Proceedings of the 1998 winter simulation confer-
ence, vol. 1, pp. 873–879 (1998)
7. Tang, L., Liu, J., Rong, A., Yang, Z.: A multiple traveling salesman problem model for hot
rolling scheduling in Shangai Baoshan Iron & Steel Complex. European Journal of Opera-
tional Research 124, 267–282 (2000)
8. Modares, A., Somhom, S., Enkawa, T.: A self-organizing neural network approach for
multiple traveling salesman and robot routing problems. International Transactions in
Operational Research 6, 591–606 (1999)
9. Pan, J.J., Wang, D.W.: An ant colony optimization algorithm for multiple traveling sales-
man problem. In: Proceedings of the first international conference on innovative computing,
information and control (2006)
10. Kara, I., Bektas, T.: Integer linear programming formulations of multiple salesman prob-
lems and its variations. European Journal of Operational Research, 1449–1458 (2006)
11. Xu, Z.Z., Li, Y.P., Feng, X.S.: A Hierarchical control system for heterogeneous multiple
uuv cooperation task. Robot 30(2), 155–159 (2008)
12. Dorigo, M., Gambardella, L.M.: Ant colony system: A cooperative learning approach to the
traveling salesman problem. IEEE Transactions on Evolutionary Computation 1(1), 53–66
(1997)
A Novel Character Recognition Algorithm Based
on Hidden Markov Models
1 Introduction
The optical character recognition (OCR) has become an important research orienta-
tion in modern computer application domain. The character has many applications,
such as the vehicle license plate character recognition, information retrieval and un-
derstanding for video images, the character recognition for bills and documents and
etc. In the past years, many researches have been devoted to character recognition.
Independent component analysis character recognition in [1] recognizes characters
using a target function to reconstruct the character being recognized and analyzing the
errors; character recognition based on neural network in [2,3] recognize characters
with strong anti interference performance and brief program; character recognition
based on nonlinear active shape models in [4] has good performance when the charac-
ter is nonrigid and with normal variation rules. Actually, there are lots of different
uncertain information in character recognition such as the variable character shapes,
the complex character pattern, so the algorithms in [1-4] have respective limitations in
their applications.
Because of this uncertain information, it is a good approach to adopt statistical
modeling in character recognition. Hidden Markov Models (HMMs) have superior
performance in processing sequential, dynamic, non-stationary signals, and they have
been widely used in speech recognition [5]. HMMs have great potential in
character recognition because they offer better flexibility in processing variable patterns.
One-dimensional (1D) HMMs are used to recognize characters in [6], and two-dimensional
(2D) HMMs are used in [7]. Because a 2D HMM has a heavier computational load than a
1D HMM, a pseudo-2D HMM with less computation is applied to recognize characters
in [8].
The algorithm proposed in this paper is based on the Freeman chain code. It extracts
multiple character features from each character. A 1D multiple-HMM is built
according to these features to recognize characters. A large number of vehicle license
plate characters are used to test the performance of the algorithm. The experimental
results indicate that this algorithm has good recognition performance for different
kinds of characters.
An HMM is specified by the parameter set λ = (π, A, B), where π is the initial state
distribution, A is the set of state transition probabilities, and B is the probability
distribution of the observations in each of the states.
Character recognition involves two basic problems of interest in HMMs: the learning
problem (constructing the models) and the evaluation problem (recognizing a character).
The learning problem is how to adjust the HMM parameters so that a given set
of observations (called the training set) is represented by the model in the best way for
the intended application. It should be clear that the "quantity" we wish to optimize
during the learning process can differ from application to application. In
other words, there may be several optimization criteria for learning, out of which a
suitable one is selected depending on the application. This problem can be solved by the
Baum-Welch algorithm [5].
The evaluation problem is how to recognize characters with an HMM: given the HMM
model λ and the observation sequence $X = (X_1, X_2, \ldots, X_T)$, calculate the probability
$P(X \mid \lambda)$. This problem can be solved by the forward-backward algorithm [5].
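A minimal sketch of the forward algorithm used for the evaluation problem is shown below; the matrix conventions are illustrative: A[i][j] is the transition probability from state i to state j, B[i][o] the probability of observing symbol o in state i, and pi[i] the initial distribution.

```java
// Forward algorithm: computes P(X | lambda) for an observation sequence X.
public class HmmForward {
    static double evaluate(double[] pi, double[][] A, double[][] B, int[] obs) {
        int n = pi.length, T = obs.length;
        double[] alpha = new double[n];
        for (int i = 0; i < n; i++) alpha[i] = pi[i] * B[i][obs[0]];   // initialisation
        for (int t = 1; t < T; t++) {                                  // induction
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double s = 0;
                for (int i = 0; i < n; i++) s += alpha[i] * A[i][j];
                next[j] = s * B[j][obs[t]];
            }
            alpha = next;
        }
        double p = 0;                                                  // termination
        for (int i = 0; i < n; i++) p += alpha[i];
        return p;
    }
}
```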
3 Model Building
The proposed algorithm is based on the Freeman chain code. Several different features
of every character are extracted. A slope analysis method is used to extract the chain
code of each feature that has been projected. These extracted chain codes are used as
observation sequences to establish the model. The Freeman 8-direction chain code is shown in
Figure 1.
Each state in the model has a self-loop to absorb redundant strokes.
Each state can also transfer to the first and the second state behind it, to account for
missing strokes of characters. The initial state distribution π and the set of state transition
probabilities A can be defined by this model.
We suppose that the probability of the different symbols in an observation sequence is the
same, so the observation probability in each of the states can be initialized as 1/M.
from the left of the contour. These chain codes are used as the horizontal and
vertical feature observation sequences. A typical 16 × 16 character "A" and its horizontal
projection contour image are shown in Figure 4; its chain code is (122…63). In a
similar way, the vertical projection contour chain codes can be obtained.
$$a = \frac{\iint x' f(x', y')\,dx'\,dy'}{\iint f(x', y')\,dx'\,dy'} \qquad (2)$$

$$b = \frac{\iint y' f(x', y')\,dx'\,dy'}{\iint f(x', y')\,dx'\,dy'} \qquad (3)$$
An image can be obtained from the image f(x, y) after the affine transformation:
$$\begin{bmatrix} x'_a \\ y'_b \end{bmatrix} = A \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \qquad (4)$$

$$\begin{bmatrix} x'_a - d_1 \\ y'_b - d_2 \end{bmatrix} = A \begin{bmatrix} x \\ y \end{bmatrix} \qquad (5)$$

$$a' = \frac{\iint x f(x'_a, y'_b)\,dx\,dy}{\iint f(x'_a, y'_b)\,dx\,dy} \qquad (6)$$

$$b' = \frac{\iint y f(x'_a, y'_b)\,dx\,dy}{\iint f(x'_a, y'_b)\,dx\,dy} \qquad (7)$$

Let $\begin{bmatrix} x''_a \\ y''_b \end{bmatrix} = \begin{bmatrix} x'_a - d_1 \\ y'_b - d_2 \end{bmatrix}$, so

$$\begin{bmatrix} x \\ y \end{bmatrix} = A^{-1}\begin{bmatrix} x''_a \\ y''_b \end{bmatrix} = \begin{bmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{bmatrix}\begin{bmatrix} x''_a \\ y''_b \end{bmatrix} \qquad (8)$$

Hence

$$a' = \frac{\iint x f(x''_a, y''_b)\,dx\,dy}{\iint f(x''_a, y''_b)\,dx\,dy}
    = \frac{\iint (k_{11} x''_a + k_{12} y''_b) f(x''_a, y''_b)\,dx''_a\,dy''_b}{\iint f(x''_a, y''_b)\,dx''_a\,dy''_b}
    = k_{11}\frac{\iint x''_a f(x''_a, y''_b)\,dx''_a\,dy''_b}{\iint f(x''_a, y''_b)\,dx''_a\,dy''_b}
    + k_{12}\frac{\iint y''_b f(x''_a, y''_b)\,dx''_a\,dy''_b}{\iint f(x''_a, y''_b)\,dx''_a\,dy''_b}
    = 0 \qquad (9)$$
In a similar way, b' = 0.
From the expressions above, we see that the centroid position of the image is unchanged after
affine transformations, so the distance of the character target pixels from the centroid
expresses the features of the character. Figure 5 shows the 5 × 5 window with the centroid as
the origin and the target pixels' centroid-distance image.
4 Character Recognition
The character optimal model λ = {λ1 , λ2 , λ3 } can recognize characters after parameters
estimation. Three observation sequences can be got for each character being recognized.
In character recognition, forward-backward algorithm is used to calculate the probabil-
ity of each sub-model according to the corresponding observation sequence. Three val-
ues of probability P1 , P2 and P3 where P1 = P ( X 1 | λ1 ) , P2 = P ( X 2 | λ2 ) , P1 = P ( X 1 | λ1 ) . So
the probability P of each character being recognized for each character model in data-
base using the weighted method.
P = P ( X | λ ) = w1 P1 + w2 P2 + w3 P3 (10)
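A sketch of the resulting recognition decision is shown below; CharacterModel and its score method are hypothetical wrappers around the three sub-models and the forward algorithm.

```java
// Pick the character whose weighted model probability of Eq. (10) is largest.
public class Recognizer {

    interface CharacterModel {
        String label();
        double score(int[] obs, int featureIndex);   // P(X_f | lambda_f), e.g. via HmmForward
    }

    static String recognize(java.util.List<CharacterModel> database,
                            int[][] featureSequences, double[] w) {
        String best = null;
        double bestP = Double.NEGATIVE_INFINITY;
        for (CharacterModel m : database) {
            double p = 0;
            for (int f = 0; f < 3; f++) {            // three feature observation sequences
                p += w[f] * m.score(featureSequences[f], f);
            }
            if (p > bestP) { bestP = p; best = m.label(); }
        }
        return best;
    }
}
```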
The experimental character images have been preprocessed by noise removal, image binarization
and character normalization. Some of the vehicle license plates used in the experiment
and the character recognition results are shown in Figure 6. The recognition results for
different kinds of characters are shown in Table 1.
Table 1. Recognition results for different kinds of characters

Character type      Characters   Recognized   Recognition rate
Chinese character   197          180          91.3%
Alphabet            366          335          91.5%
Numeral             816          770          94.3%
Total               1379         1285         93.1%
The results in Table 1 show that the proposed algorithm achieves a high recognition rate
even though three different kinds of characters are involved, and the results in Figure 6 show
that characters can be recognized well whether they are inclined or fouled.
6 Conclusion
The character recognition algorithm based on the HMM is a worthy studying subject.
In this paper, a recognition algorithm based on the Freeman chain code and HMM is
proposed. A 1D-multiple-HMMs is built according multiple features which are ex-
tracted from the character to recognize characters. A large number of vehicle license
plate characters are used to test the performance of the algorithm. The experimental
results show that this proposed algorithm has a well performance in recognition for
different kinds of characters.
References
[1] Min, L., Wei, W., Xiaomin, Y., et al.: Independent component analysis based on licence
plate recognition. Journal of Sichuan University 43(6), 1259–1264 (2006)
[2] Qingxiong, Y.: Realization of character recognition based on neural network. Information
Technology (4), 92–95 (2005)
[3] Tindall, T.W.: Application of neural network technique to automatic license plate recog-
nition. In: Proceedings of European Convention on Security and Detection, Brighton,
England, pp. 81–85 (1995)
[4] Shi, D., Gunn, S.R., Damper, R.I.: Handwritten Chinese radical recognition using
nonlinear active shape models. IEEE Trans. on Pattern Analysis and Machine
Intelligence 25(2), 277–280 (2003)
[5] Rabiner, L.R.: A tutorial on hidden Markov models and select applications in speech rec-
ognition. Proc. of IEEE 77(2), 257–286 (1989)
[6] Awaidah, S.M., Mahmoud, S.A.: A multiple feature/resolution scheme to Arabic (Indian)
numerals recognition using hidden Markov models. Signal Processing 89(6), 1176–1184
(2009)
[7] Yin, Z., Yunhe, P.: 2D-HMM based character recognition of the engineering drawing.
Journal of Computer Aided Design and Computer Graphics 11(5), 403–406 (1999)
[8] Li, J., Wang, J., Zhao, Y., Yang, Z.: A new approach for off-line handwritten Chinese
character recognition using self-adaptive HMM. In: The Fifth World Congress on Intelli-
gent Control and Automation, 2004 (WCICA 2004), vol. 5, pp. 4165–4168 (2004)
[9] Gang, L., Honggang, Z., Jun, G.: Application of HMM in Handwritten Digit OCR.
Journal of Computer Research and Development 40(8), 1252–1256 (2003)
[10] Keou, S., Fenggang, H., Xiaoting, L.: A fast algorithm for searching and tracking object
centroids in binary images. Pattern Recognition and Artificial Intelligence 11(2), 161–168
(1988)
New Algorithms for Complex Fiber Image Recognition
Shanghai Normal University, Dept. of Computer Science, No. 100 GuiLin Road,
200234 Shanghai, China
{ma-yan,lsb}@shnu.edu.cn
1 Introduction
At present, many approaches have been presented for the fiber image recognition
using digital image processing technology. The processes of the fiber image analysis
and recognition are: image preprocessing, image analysis and image recognition. It
was presented to process data by the principal factor analysis and create experts li-
brary[1]. In [2], the method was designed to extract the image fiber edge, and use the
curve fitting to make a quantitative analysis of the fiber. The gray value morphologi-
cal method was given to process the fiber image[3].
The most above methods are designed for simple fiber image, that is, there is only
1-2 fibers and less cross-points in the image. In this paper, the fiber image being rec-
ognized is complex, that is, they are different in size, cross-cutting and adhesive in
different fibers. This paper presents the binarization method based on fiber boundaries
continuity and variance, thinning and pruning for binary fiber image, the corner detec-
tion algorithm based on chain codes, the recognition method based on the curvature
similarity in the same fiber. The experimental results show that the better results has
been obtained by using the proposed automatic recognition methods.
2 Image Preprocessing
2.1 Image Binarization
The input gray image is taken by an electron microscope (as shown in Fig. 1). As can be
seen from Fig.1, the fibers adhere to each other and differ in intensity. If we use traditional
differential operators, such as the Sobel operator or the Canny operator, for edge detection, the
results are sensitive to the threshold. When a large threshold is selected, high-contrast
edges can be detected but low-contrast edges may be neglected. When a small
threshold is selected, small stripes may be detected, which is not beneficial for the following
recognition.
The fiber image has the characteristic that the variance at boundaries is large while the
variance in non-boundary regions is small. We present an edge detection method based on the
variance and reduce the threshold based on the continuity of the fiber boundary until all
edges have been detected. Fig.2 shows the variance distribution along the horizontal direction
over part of Fig.1, where x represents the pixel location and y represents the variance. If we
encounter a valley point B during detection, the variance of B is recorded as VarB,
and a peak point in the neighborhood of B is then searched for. If we encounter
the peak point A in Fig.2, the variance of A is recorded as VarA. The initial
threshold is defined as Th0. If VarA − VarB > Th0, the point A is considered a
boundary point and recorded as 1; otherwise it is recorded as 0. With a large Th0,
other peak points with small variance, such as points C, D and E, cannot be detected.
Our experiments show that the variance distribution along the boundary of the same fiber
can differ greatly under the influence of outside light and the distance
between the fiber and the electron microscope. However, the variances of locally adjacent
boundary points are related, and the following image binarization method based on variance
and boundary continuity is therefore presented:
Fig. 2. Variance distribution along the horizontal direction (peak point A, smaller peaks C and D, valley point B)
Step 1: Define the initial threshold Th0. While the difference of variance diff_var
between a peak point and its valley point is greater than Th0, the peak point is
recorded as 1, otherwise as 0, and diff_var is recorded.
Step 2: Define the step size step and set Th0 = Th0 − step. A peak point that is still recorded
as 0, with diff_var < Th0, and with at least one point in its 8-neighborhood
already recorded as 1, is recorded as 1.
Step 3: Return to Step 2 until Th0 < Thmin, where Thmin is the predefined minimum
value.
In our experiments, Th0, step and Thmin are set to 20, 1 and 5, respectively. The
experimental results show that better results are obtained by using
the proposed image binarization method. Fig.3 is the binarization result for the image
shown in Fig.1.
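A sketch of the threshold-relaxation loop of Steps 1-3 is given below; peak/valley detection on the variance profile and the 8-neighborhood test are assumed to be provided elsewhere, and the acceptance condition in the later passes is interpreted here as requiring the contrast to exceed the current, lowered threshold while continuing an already marked boundary.

```java
// Sketch of the boundary-growing binarization: strong peaks are marked first, then the
// threshold is lowered step by step and unmarked peaks adjacent to already-marked
// boundary pixels are accepted.
public class VarianceBinarizer {

    interface Peak {
        double contrast();              // diff_var between this peak and its valley
        boolean neighborMarked();       // any of its 8 neighbours already marked as boundary?
        void mark();
        boolean isMarked();
    }

    static void binarize(java.util.List<Peak> peaks, double th0, double step, double thMin) {
        for (Peak p : peaks)            // Step 1: strong boundary points
            if (p.contrast() > th0) p.mark();
        for (double th = th0 - step; th >= thMin; th -= step) {   // Steps 2-3
            for (Peak p : peaks) {
                if (!p.isMarked() && p.contrast() > th && p.neighborMarked()) {
                    p.mark();           // weaker point continuing a marked boundary (assumed reading)
                }
            }
        }
    }
}
```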
In this paper, the binary image is thinned using the look-up-table thinning algorithm
proposed in the literature [8]. An 8-neighbor look-up table, in which each entry corresponds
to a configuration of the 8-neighborhood, is predefined. The binary image is
scanned from top to bottom and left to right, and the value in the look-up table determines
whether to delete the point.
The thinned fiber image will have burrs, and wrongly recognized fibers may occur
because of them. The deburring method starts from an endpoint, walks along the
connected pixels and accumulates the walk length. The algorithm continues until
it encounters a cross point or another endpoint. When the walk length is lower than
a threshold (usually set to 10), the branch is considered a burr and the values of the
pixels walked through are set to 0.
After binarization, thinning and deburring, it may happen that one
continuous fiber is wrongly segmented into two parts. So a filling algorithm is required
to reduce the measurement error. In this paper, the angle between adjacent fiber endpoints
is calculated and the filling algorithm proposed in the literature [9] is applied.
The thinned, pruned fiber image may contain crossings and adhesions between different fibers,
which produce wrong connecting points (corners). These should be detected and removed in order
to ensure the accuracy of recognition.
In this paper, a fiber from its starting point to its terminal point is represented by an 8-neighbor
chain code. The chain code values 0-7 are shown in Fig.4.
Fig. 4. The values of the 8-direction chain codes
Define a chain code as the array $c = \{c_1, c_2, \ldots, c_n\}$ (here we only consider the
chain code values, regardless of the starting point coordinates). Define the difference
between adjacent chain code values in the array as $d_i$ (that is, the relative chain
code); $d_i$ is calculated using the following equation:

$$d_i = \begin{cases} c_{i+1} - c_i, & |c_{i+1} - c_i| \le 2 \\ c_{i+1} - c_i - 8, & c_{i+1} - c_i \ge 6 \\ c_{i+1} - c_i + 8, & c_{i+1} - c_i \le -6 \end{cases} \qquad (1)$$
Since the largest difference between the directions represented by adjacent chain codes is ±90°, the possible values of d_i are 0, ±1 and ±2. If the value is zero, the direction of the fiber is unchanged; if positive, it turns right; if negative, it turns left. A point with d_i equal to 2 may in theory be considered a corner, but it may not be a real corner because of noise in the fiber image. To address this problem, the corner detection algorithm based on chain codes is given as follows:
(1) Calculate the difference d_i between chain code values according to formula (2). The difference between Eq. (2) and Eq. (1) is that d_i is now computed between the values of chain codes separated by length1 (length1 can be set to about 20 based on our experiments), which avoids the impact of local noise in the fiber image.

$$d_i = \begin{cases} c_{i+length_1} - c_i, & |c_{i+length_1} - c_i| \le 2 \\ c_{i+length_1} - c_i - 8, & c_{i+length_1} - c_i \ge 6 \\ c_{i+length_1} - c_i + 8, & c_{i+length_1} - c_i \le -6 \end{cases} \qquad (2)$$
that is, n1 is the number of points with d_i equal to 2 among the following n points after i. Let s be the scale value. When n1/n > s, the point corresponding to d_i is considered a corner. The feature of the proposed corner detection algorithm is that a point with d_i equal to 2 is not immediately taken as a corner; instead, a point is taken as a corner only if the proportion of d_i equal to 2 among its following points exceeds the predefined value s (s can be set to 0.4). This avoids the impact of local noise.
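A hedged sketch of this corner detector is given below; length1 and s follow the text, while the window size n and the use of |d_i| = 2 (treating left and right 90° turns symmetrically) are our assumptions.

import numpy as np

def detect_corners(chain, length1=20, n=10, s=0.4):
    """Sketch of the chain-code corner detector described above.

    chain:   list of Freeman chain-code values (0-7) along one fiber.
    length1: separation between the compared codes, Eq. (2).
    n, s:    a point is reported as a corner only when more than a fraction s
             of the n following points also have |d_i| = 2.
    """
    c = np.asarray(chain)
    m = len(c) - length1
    d = np.zeros(m, dtype=int)
    for i in range(m):                      # relative chain code, Eq. (2)
        diff = int(c[i + length1] - c[i])
        if diff >= 6:
            diff -= 8
        elif diff <= -6:
            diff += 8
        d[i] = diff

    corners = []
    for i in range(m):
        if abs(d[i]) != 2:
            continue
        window = d[i + 1: i + 1 + n]
        if window.size and np.sum(np.abs(window) == 2) / n > s:
            corners.append(i)
    return d, corners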
3 Fiber Recognition
The fiber boundaries are represented by chain codes, which must be matched against each other for the purpose of recognition. Since there exist many chain codes of different lengths in an image, and two boundaries belonging to the same fiber have similar curvature, the fast recognition algorithm is given as follows:
(1) Select a chain code ci = {ci1, ci2, …, cin} and assume the coordinates of its initial point p0 are (xi, yi); calculate the slope k at the point p0. When k ≥ 1, the direction at p0 is up-down and the search zone is [xi − h, xi + h]; when k < 1, the direction at p0 is left-right and the search zone is [yi − h, yi + h]. h can be defined as the width of the thickest fiber, generally about 50.
(2) Assume a chain code cj = {cj1, cj2, …, cjm} is found in the search zone. The chain codes are of unequal length, i.e., n ≠ m. Suppose n ≥ m; select m chain code values from the start of ci and compare them with cj one by one. Each time the chain code values are equal, the match value match increases by 1. When one round of comparison stops, the chain code ci is shifted right by one position and the algorithm continues to compare it with cj until reaching position n − m + 1. In this way n − m + 1 match values are obtained, and the largest one is denoted matchmax. Normalize matchmax by letting matchmax = matchmax/m. Define the largest match values of all chain codes as M = {matchmax1, matchmax2, …, matchmaxk}. Find the maximum in M; if the maximum is greater than the predefined value, ci is considered to be matched with cj, and they are recognized as the same fiber.
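The sliding comparison can be sketched as follows; the threshold value is an assumption, since the paper only states that the maximum match value must exceed a predefined value.

import numpy as np

def match_score(ci, cj):
    """Sketch of the sliding chain-code comparison described above: the shorter
    code is slid along the longer one; at each of the n-m+1 offsets the number
    of equal values is counted, and the largest count, normalized by m, is
    returned as match_max."""
    ci, cj = np.asarray(ci), np.asarray(cj)
    if len(ci) < len(cj):
        ci, cj = cj, ci
    n, m = len(ci), len(cj)
    best = 0
    for offset in range(n - m + 1):
        best = max(best, int(np.sum(ci[offset:offset + m] == cj)))
    return best / m

def same_fiber(ci, candidates, threshold=0.8):
    """Return the index of the candidate chain code recognized as the same
    fiber as ci, or None; threshold is an assumed value."""
    if not candidates:
        return None
    scores = [match_score(ci, cj) for cj in candidates]
    k = int(np.argmax(scores))
    return k if scores[k] > threshold else None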
[Fig. 5: recognition result with fibers labeled 1–5]
Most fibers are recognized correctly by the proposed algorithm, as shown in Fig. 5. For example, No. 1, No. 2 and No. 3 are cross-cutting in Fig. 5; No. 1 is recognized as one fiber, while No. 2 and No. 3 are recognized as the same fiber, and the results are consistent with human visual judgment. No. 1 and No. 4 are considered the same fiber at position No. 5, since the bifurcation of the two fibers is not obvious at that position.
5 Summary
This paper presents binarization, pruning, corner detection and recognition methods for the recognition of cross-cutting and winding complex fibers. The experimental results show that most fibers can be recognized by using the proposed methods. The diameter and length of fibers can be further measured with the help of the recognized fibers.
Acknowledgments. Supported by the Leading Academic Discipline Project of Shanghai Normal University (Project Numbers DZL805, DCL200802).
References
1. Jinhu, S., XiaoHong, W.: The Application of Image Processing in Specialty Animal Hair
Fibers Recognition. Journal of Textile Research 25(4), 26–27 (2004)
2. ZaiHua, Y., Yuhe, L.: Animal Fiber Length Measurement Based on Image Processing.
Computer Engineering and Applications 41(8), 180–181 (2005)
3. Jian, M., WenHua, Z., WeiGuo, C.: The Application of Gray Value Morphology in Animal
Fiber Image Processing. Computer Engineering and Applications 25(5), 42–44 (2004)
4. ShuaiJie, R., WenSheng, Z.: Measuring Diameter and Curvature of Fibers Based on Image
Analysis. Journal of Image and Graphics 13(6), 1153–1158 (2008)
5. SuPing, Y., PeiFeng, Z., JianPing, C.: Segmentation on Cotton Fiber Cross Sections Based on Mask. Computer Engineering 33(13), 188–191 (2007)
6. Bribiesca, E.: A Geometric Structure for Two-Dimensional Shapes and Three-Dimensional
Surfaces. Pattern Recognition 25(5), 483–496 (1992)
7. Canny, J.: A Computational Approach to Edge Detection. IEEE Trans. on PAMI 8(6), 679–698 (1986)
8. Wei, Y., Ke, G., YiKun, W.: An Efficient Index Thinning Algorithm of Fingerprint Image
Based on Eight Neighbourhood Points. Journal of SiChuan University of Science & Engi-
neering 21(2), 61–63 (2008)
9. XiaoFeng, H., WenYao, L., ShouDong, X.: The Computer Measurement for Cotton Fiber
Length. Cotton Science 15(6), 339–343 (2003)
Laplacian Discriminant Projection Based on Affinity
Propagation
Abstract. The paper proposes a new algorithm for supervised dimensionality re-
duction, called Laplacian Discriminant Projection based on Affinity Propagation
(APLDP). APLDP defines three scatter matrices using similarities based on rep-
resentative exemplars which are found by Affinity Propagation Clustering. After
linear transformation, the considered pairwise samples within the same exemplar
subset and the same class are as close as possible, while those exemplars between
classes are as far as possible. The experiments on several data sets demonstrate
the competence of APLDP.
1 Introduction
Linear dimensionality reduction is widely used because of its simplicity and effectiveness. Principal component analysis (PCA), as a classic linear
method for unsupervised dimensionality reduction, aims at learning a kind of subspaces
where the maximum covariance of all training samples are preserved [2]. Locality Pre-
serving Projections(LPP), as another typical approach for unsupervised dimensionality
reduction, seeks projections to preserve the local structure of the sample space [3]. How-
ever, unsupervised learning algorithms can not properly model the underlying structures
and characteristics of different classes [5]. Discriminant features are often obtained by
supervised dimensionality reduction. Linear discriminant analysis(LDA) is one of the
most popular supervised techniques for classification [6,7]. LDA aims at learning dis-
criminant subspace where the within-class scatter is minimized and the between-class
scatter of samples is maximized at the same time. Many improved LDAs up to date
have demonstrated competitive performance in object classification [10,11,12,14,16].
The similarity measure of the scatter matrices of traditional LDA is based on the
distances between sample vectors and the corresponding center vectors. Frey proposed
a new clustering method, called affinity propagation clustering(APC), to identify a sub-
set of representative examples which is important for detecting patterns and processing
sensory signals in data [21]. Compared with the center vector, the exemplars are much more representative, because the samples within the same subset are more similar to their exemplar than to the center vector. Motivated by APC, traditional LDA, Laplacian Eigenmaps (LE) and the near-
est neighborhood selection strategy [4,8,9], we propose a new dimensionality reduction
algorithm, Laplacian discriminant projection based on affinity propagation (APLDP),
for discriminant feature extraction. In our algorithm, we place much emphasis on the exemplar-based scatter similarity, which can be viewed as an extension of the within-class and between-class scatter similarities. We formulate the exemplar-based scatter by means of the similarity criteria commonly used in LE and LPP. The extended exemplar scatters are governed by different Laplacian matrices. Generally, LDA
can be regarded as a special case of APLDP. Therefore, APLDP not only conquers the
non-Euclidean space problem[5], but also provides an alternative way to find potential
better discriminant subspaces.
2 Related Work
2.1 Linear Discriminant Analysis
Let X = [x1 , x2 , . . . , xn ] ∈ RD×n denote a data set matrix which consists of n samples
{xi }ni=1 ∈ RD . Linear dimensionality reduction algorithms focus on constructing a small
number, d, of features by applying a linear transformation W ∈ RD×d that maps each
sample data {xi } of X to the corresponding vector {yi } ∈ Rd in d−dimensional space as
follows:
$$W : x_i \in \mathbb{R}^D \rightarrow y_i = W^T x_i \in \mathbb{R}^d. \qquad (1)$$
Assume that the matrix X contains c classes, and is ordered such that samples appear
by class
$$X = [X_1, X_2, \ldots, X_c] = [x_1^1, \ldots, x_{c_1}^1, \ldots, x_1^c, \ldots, x_{c_c}^c]. \qquad (2)$$
In traditional LDA, two scatter matrices, i.e., within-class matrix and between-class
matrix are defined as follows [6]:
$$S_w = \frac{1}{n}\sum_{i=1}^{c}\sum_{x \in X_i} (x - m^{(i)})(x - m^{(i)})^T \qquad (3)$$

$$S_b = \frac{1}{n}\sum_{i=1}^{c} n_i (m^{(i)} - m)(m^{(i)} - m)^T, \qquad (4)$$

where n_i is the number of samples in the i-th class X_i, m^(i) is the mean vector of the i-th class, and m is the mean vector of all samples. It follows from the definition that trace(S_w) measures the within-class compactness, and trace(S_b) measures the between-class separation.
The optimal transformation matrix W obtained by traditional LDA is computed as
follows [6]:
$$W_{opt} = \arg\max_{W} \frac{\operatorname{tr}(W^T S_b W)}{\operatorname{tr}(W^T S_w W)}. \qquad (5)$$
To solve the above optimization problem, the traditional LDA computes the following
generalized eigenvalue equation
$$S_b w_i = \lambda S_w w_i, \qquad (6)$$
and takes the d eigenvectors that are associated with the d largest eigenvalues λi , i =
1, . . . , d.
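For reference, a minimal NumPy/SciPy sketch of Eqs. (3)–(6) is given below; the small ridge term added to S_w is our assumption to keep the generalized eigenproblem well posed and is not part of the traditional formulation.

import numpy as np
from scipy.linalg import eigh

def lda_transform(X, labels, d):
    """Build S_w and S_b from a data matrix X in R^{D x n} with class labels,
    then take the d leading generalized eigenvectors of S_b w = lambda S_w w
    as the columns of W (a sketch of Eqs. (3)-(6))."""
    labels = np.asarray(labels)
    D, n = X.shape
    m = X.mean(axis=1, keepdims=True)
    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mc) @ (Xc - mc).T
        Sb += Xc.shape[1] * (mc - m) @ (mc - m).T
    Sw /= n
    Sb /= n
    # generalized symmetric eigenproblem; the ridge keeps S_w invertible
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(D))
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:d]]          # project with Y = W.T @ X, Eq. (1)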
Messages are updated according to the above simple formulas, which search for minima of an appropriately chosen energy function. The message-passing procedure may be terminated after a fixed number of iterations, or after the local decisions have remained constant for some number of iterations.

$$\alpha_i^s = \exp\left(-\frac{\|x_i^s - \kappa_i^s\|^2}{t}\right). \qquad (14)$$
To obtain a compact expression of Eq. (13), let $A_i^s = \operatorname{diag}(\alpha_1^{is}, \ldots, \alpha_{n_s}^{is})$ be a diagonal matrix and $Y_i^s = [y_1^{is}, \ldots, y_{n_s}^{is}]$. In addition, let $e_{n_s}$ denote the all-one column vector of length $n_s$. Then $\gamma_i^s = \frac{1}{n_s} Y_i^s e_{n_s}$, and Eq. (13) can be reformulated as:

$$\begin{aligned} a_i^s &= \sum_j \alpha_j^{is}\, \operatorname{tr}\{(y_j^{is} - \gamma_i^s)(y_j^{is} - \gamma_i^s)^T\} \\ &= \operatorname{tr}\Big\{\sum_j \alpha_j^{is} y_j^{is} (y_j^{is})^T\Big\} - 2\operatorname{tr}\Big\{\sum_j \alpha_j^{is} y_j^{is} (\gamma_i^s)^T\Big\} + \operatorname{tr}\Big\{\sum_j \alpha_j^{is} \gamma_i^s (\gamma_i^s)^T\Big\} \\ &= \operatorname{tr}\{Y_i^s A_i^s (Y_i^s)^T\} - \frac{2}{n_s}\operatorname{tr}\{Y_i^s A_i^s e_{n_s} (e_{n_s})^T (Y_i^s)^T\} + \frac{(e_{n_s})^T A_i^s e_{n_s}}{n_s^2}\operatorname{tr}\{Y_i^s e_{n_s} (e_{n_s})^T (Y_i^s)^T\} \\ &= \operatorname{tr}\{Y_i^s L_i^s (Y_i^s)^T\} \end{aligned} \qquad (15)$$

where

$$L_i^s = A_i^s - \frac{2}{n_s} A_i^s e_{n_s} (e_{n_s})^T + \frac{(e_{n_s})^T A_i^s e_{n_s}}{n_s^2} e_{n_s} (e_{n_s})^T. \qquad (16)$$

The within-exemplar scatter of class $i$ is:

$$a_i = \sum_{s=1}^{k_s} a_i^s = \sum_{s=1}^{k_s} \operatorname{tr}\{Y_i^s L_i^s (Y_i^s)^T\}. \qquad (17)$$

There also exists a 0-1 indicator matrix $P_i^s$ satisfying $Y_i^s = Y_i P_i^s$. Each column of $P_i^s$ records the exemplar information, which is derived from the clustering process.

$$a_i = \sum_{s=1}^{k_s} \operatorname{tr}\{Y_i P_i^s L_i^s (P_i^s)^T Y_i^T\} = \operatorname{tr}\{Y_i L_i Y_i^T\} \qquad (18)$$

where $L_i = \sum_{s=1}^{k_s} P_i^s L_i^s (P_i^s)^T$. The total within-exemplar scatter of all classes is:

$$A = \sum_{i=1}^{c} a_i = \sum_{i=1}^{c} \operatorname{tr}\{Y_i L_i (Y_i)^T\}. \qquad (19)$$

There exists a 0-1 indicator matrix $Q_i$ satisfying $Y_i = Y Q_i$. Each column of $Q_i$ records the class information, which is known for supervised learning. Then Eq. (19) can be reformulated as

$$A = \sum_{i=1}^{c} \operatorname{tr}\{Y Q_i L_i (Q_i)^T Y^T\} = \operatorname{tr}\{Y L_{Exem} Y^T\}, \qquad (20)$$

where $L_{Exem} = \sum_{i=1}^{c} Q_i L_i (Q_i)^T$ can be called the within-exemplar Laplacian matrix.
Plugging the expression $Y = W^T X$ into Eq. (20), we obtain the final form of the total within-exemplar scatter:

$$A = \operatorname{tr}(W^T D_{Exem} W) \qquad (21)$$

where $D_{Exem} = X L_{Exem} X^T$ is the within-exemplar scatter matrix.
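A small sketch of Eq. (16) is given below, assuming the similarity weights of Eq. (14) for the samples of one exemplar subset are already available; variable names are ours.

import numpy as np

def exemplar_laplacian(alpha):
    """Given the similarity weights alpha of the n_s samples assigned to one
    exemplar (Eq. (14)), build the diagonal matrix A_i^s and the corresponding
    within-exemplar Laplacian L_i^s of Eq. (16)."""
    alpha = np.asarray(alpha, dtype=float)
    ns = alpha.size
    A = np.diag(alpha)
    e = np.ones((ns, 1))
    L = A - (2.0 / ns) * A @ e @ e.T + (e.T @ A @ e)[0, 0] / ns**2 * e @ e.T
    return L

The class-level matrix L_i of Eq. (18) is then assembled by embedding each L_i^s into the rows and columns of its subset's samples, which is the role of the indicator matrix P_i^s.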
$$\alpha_i^j = \exp\left(-\frac{\|x_i^j - x\|^2}{t}\right). \qquad (27)$$

Let $\bar{Y} = [y_{\kappa_1^1}, \ldots, y_{\kappa_{k_1}^1}, \ldots, y_{\kappa_1^c}, \ldots, y_{\kappa_{k_c}^c}]$ consist of the exemplar center vectors of all classes. By a similar deduction, $B$ can be formulated as follows

$$B = \operatorname{tr}\{\bar{Y} L_b \bar{Y}^T\}, \qquad (28)$$

where

$$L_b = A_b - \frac{2}{n_{exem}} A_b e_{exem} e_{exem}^T + \frac{e_{exem}^T A_b e_{exem}}{n_{exem}^2} e_{exem} e_{exem}^T \qquad (29)$$

is the exemplar-based between-class Laplacian matrix.
Taking $\bar{Y} = W^T \bar{X}$ into account, we re-write Eq. (28) as follows

$$B = \operatorname{tr}(W^T D_B W), \qquad (30)$$

where $D_B = \bar{X} L_b \bar{X}^T$ is the total exemplar-based between-class scatter matrix.
4 Experiments
In this section, we investigate the use of APLDP on several data sets, including UCI and the PIE-CMU face data set [13]. The data sets used in the paper belong to different fields in order to test the performance of the APLDP algorithm. We compare our proposed algorithm with PCA [2], LDA [7], LPP [3] and Marginal Fisher Analysis (MFA) [1].
(a) PCA (b) LDA (c) LPP (d) MFA (e) APLDP1 (f) APLDP2
Fig. 1. Embedding results in 2-D space of PCA, LDA, LPP, MFA and APLDP1,2 (with and without AP)
The PIE-CMU face data set consists of 68 subjects with 41, 368 face images [13]. In
this experiment, we select 40 subjects with 120 face images for each from CMU data
set, 60 images for training, and the other 60 images for testing. Before the experiment,
faces in the images are detected by the face detection system described in [17][18]. The
detected faces are converted to gray scale images and resized to 32 × 32. Some samples
are shown in Fig. 2. In total, there are 2,400 images each in the training set and the testing set.
It should be mentioned that we take PCA as a preprocessing step for APLDP. The number
of principal components is a free parameter to choose. As pointed out in [19,20], the
dimension of principal subspaces significantly affects the performance of recognition
tasks. Besides, they confirmed that the optimal number lies in the interval [50, 200].
Fig. 3. From the top row to the bottom row, the face-like images are Eigenfaces, Fisherfaces,
Laplacianfaces, MFAfaces and APLDPfaces, respectively
Based on their work, we find the best dimension of PCA is 182. Therefore, we take 182
as the number of principal components in the following experiments.
For the sake of visualization, we illustrate algorithmic-faces derived from different
algorithms, such as Eigenfaces from PCA, Fisherfaces from LDA and Laplacianfaces
from LPP, in Fig. 3. The special face-like images derived from MFA and APLDP can be called MFAfaces and APLDPfaces, respectively.
The average results with the corresponding reduced dimensions are obtained over 50 random splits. The classification is also based on a k-nearest neighbor classifier. The experimental results are shown in Table 2. In the experiment, the parameters of the APLDP algorithm are set as k = 3 and the time variable t = 10.
5 Conclusions
In this paper, based on affinity propagation and LDA algorithms, we propose a new
method, exemplar based Laplacian Discriminant Projection(APLDP) for supervised di-
mensionality reduction. Using similarity-weighted discriminant criteria, we define the
exemplar based within-class Laplacian matrix and between-class Laplacian matrix. In
comparison with the traditional LDA, APLDP focuses more on the enhancement of the
discriminability while keeping local structures. Therefore, APLDP has the flexibility of
finding optimal discriminant subspaces.
Acknowledgement
The authors confirm that the research was supported by National Science Foundation
of China (No.60805001).
References
1. Yan, S., Xu, D., Zhang, B., Zhang, H., Yang, Q., Lin, S.: Graph Embedding and Extension: A
General Framework for Dimensionality Reduction. IEEE Transactions on Pattern Analysis
and Machine Intelligence 29(1), 40–51 (2007)
2. Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuroscience 3(1),
71–86 (1991)
3. He, X., Yan, S., Hu, Y.X., Niyogi, P., Zhang, H.: Face recognition using Laplacianfaces.
IEEE Transactions on Pattern Analysis and Machine Intelligence 27(3), 328–340 (2005)
4. Belkin, M., Niyogi, P.: Laplacian Eigenmaps for dimensionality reduction and data represen-
tation. Neural Computation 15, 1373–1396 (2003)
5. Zhao, D., Lin, Z., Xiao, R., Tang, X.: Linear Laplacian Discrimination for Feature Extraction.
In: CVPR (2007)
6. Fukunaga, K.: Introduction to Statistical Pattern Recognition, 2nd edn. Academic Press,
Boston (1990)
7. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. fisherfaces: Recognition
using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine
Intelligence 19(7), 711–720 (1997)
8. Weinberger, K., Blitzer, J., Saul, L.: Distance metric learning for large margin nearest neigh-
bor classification. In: NIPS, pp. 1475–1482 (2006)
9. Nie, F., Xiang, S., Zhang, C.: Neighborhood MinMax Projections. In: IJCAI, pp. 993–998
(2007)
10. Howland, P., Park, H.: Generalizing discriminant analysis using the generalized singular
value decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(8),
995–1006 (2004)
11. Liu, C.: Capitalize on dimensionality increasing techniques for improving face recognition
grand challenge performance. IEEE Transactions on Pattern Analysis and Machine Intelli-
gence 28(5), 725–737 (2007)
12. Martinez, A., Zhu, M.: Where are linear feature extraction methods applicable. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 27(12), 1934–1944 (2006)
13. Sim, T., Baker, S., Bsat, M.: The CMU Pose, illumination, and expression (PIE) database.
In: IEEE International Conference of Automatic Face and Gesture Recognition (2002)
14. Wang, X., Tang, X.: Dual-space linear discriminant analysis for face recognition. In: CVPR,
pp. 564–569 (2004)
15. Yan, S., Xu, D., Zhang, B., Zhang, H.: Graph embedding: A general framework for dimen-
sionality reduction. In: CVPR (2005)
16. Yang, J., Frangi, A., Yang, J., Zhang, D., Jin, Z.: KPCA plus LDA: a complete kernel Fisher
discriminant framework for feature extraction and recognition. IEEE Transactions on Pattern
Analysis and Machine Intelligence 27(2), 230–244 (2005)
17. Zheng, Z.L., Yang, J., Zhu, Y.: Face detection and recognition using colour sequential im-
ages. Journal of Research and Practice in Information Technology 38(2), 135–149 (2006)
18. Zheng, Z.L., Yang, J.: Supervised Locality Pursuit Embedding for Pattern Classification.
Image and Vision Computing 24, 819–826 (2006)
19. Wang, X., Tang, X.: A unified framework for subspace face recognition. IEEE Transactions
on Pattern Analysis and Machine Intelligence 26(9), 1222–1228 (2004)
20. Wang, X., Tang, X.: Random sampling for subspace face recognition. International Journal
of Computer Vision 70(1), 91–104 (2006)
21. Frey, B.J., Dueck, D.: Clustering by Passing Messages Between Data Points. Science 315, 972–976 (2007)
An Improved Fast ICA Algorithm for IR Objects
Recognition
Abstract. In this paper, an improved algorithm based on fast ICA and optimum selection for IR object recognition is proposed. To address the problem that the Newton iteration is rather sensitive to the selection of the initial value, this paper introduces a one-dimensional search to improve the optimization learning algorithm, so that the convergence of the results becomes independent of the choice of the initial value. Meanwhile, we design a novel rule based on a distance function to retain the features of the independent components having a major contribution to object recognition. It overcomes the problem of declining recognition rate and robustness associated with the increase in the number of training image samples. Compared with traditional methods, the proposed algorithm reaches a higher recognition rate with fewer IR object features and is more robust across different classes.
1 Introduction
Object recognition is one of the key techniques in the field of pattern recognition, and its core is feature extraction and dimension reduction. Because the characteristics of IR images lead to a lower recognition rate, it is necessary to look for a new method of image information processing and feature extraction which can remove redundant information in the object image data and represent the object image using a feature vector with invariant features. Several characteristic subspace methods have been developed in the field of object recognition, such as the Principal Component Analysis (PCA) characteristic subspace algorithm [1][2], the Independent Component Analysis (ICA) characteristic subspace algorithm [3][4] and so on. ICA differs from PCA in that it does not focus on the second-order statistical correlation of signals; instead, it is based on the higher-order statistics of signals, i.e., it studies the independence relationships between signals. Therefore the ICA method can reveal the essential structure of image data. Hyvarinen et al. [5][6] proposed a fast independent component analysis (Fast ICA) algorithm, a fast iterative optimization algorithm with a fast convergence rate that does not require a learning step size to be chosen. In recent years, fast ICA has obtained universal attention and has been widely used in feature extraction [7], blind source separation [8], speech signal processing [9], target detection [10] and face recognition [11][12], etc.
To address the problem that the Newton iteration in the fast ICA algorithm is rather sensitive to the selection of the initial value, and the problem of declining recognition rate and robustness with an increasing number of training image samples, this paper presents an improved fast ICA and feature optimization algorithm which can be applied to IR multi-object recognition. The paper is organized as follows. Section 2 introduces ICA theory and the fast fixed-point algorithm for ICA. Section 3 describes IR multi-object classification and recognition based on the improved fast ICA and a feature optimization algorithm, and gives the rule for feature optimization. Section 4 gives experimental results produced by using the improved fast ICA algorithm and conducts a performance evaluation. Section 5 concludes the paper.
2 Related Works
where i = 1, 2, …, n and j = 1, 2, …, m.
The linear model of ICA can be expressed as

$$X = AS = \sum_{j=1}^{m} a_j s_j \qquad (2)$$
$$J(x) = H(x_{gauss}) - H(x) \qquad (4)$$

where x_gauss is a random variable having a Gaussian distribution and the same variance as x.
From (4), we can see that the value of the negative entropy is always non-negative, and it is zero only when the random variable x has a Gaussian distribution. The stronger the non-Gaussianity of the variable x, the larger J(x) is. A more effective approximate negative entropy formula is
where G(·) is some non-quadratic function and v is a random variable with a standard normal distribution.
By finding the projection direction w_i which maximizes J(w_i), an independent component can be extracted. If J(w_i) takes its maximum value, E{G(w_i^T X)} also takes its maximum value, and the extreme point w_i of E{G(w_i^T X)} is the solution of the equation

$$E\{X g(w_i^T X)\} = 0 \qquad (6)$$

$$w_i^+ = w_i - \frac{E\{X g(w_i^T X)\}}{E\{X^T X g'(w_i^T X)\}} \approx w_i - \frac{E\{X g(w_i^T X)\}}{E\{g'(w_i^T X)\}} \qquad (7)$$

where E{X^T X g'(w_i^T X)} ≈ E{X^T X} E{g'(w_i^T X)} = E{g'(w_i^T X)}. Multiplying both sides of (7) by −E{g'(w_i^T X)} and letting w_i* = −E{g'(w_i^T X)} w_i^+, we have

$$w_i^* = E\{X g(w_i^T X)\} - E\{g'(w_i^T X)\}\, w_i. \qquad (8)$$

By iterating according to (8), the w_i obtained at convergence corresponds to a row vector of the separation matrix W, and then an independent component s_i can be extracted.
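A minimal sketch of the one-unit fixed-point iteration of Eq. (8) is shown below, assuming centered and whitened data and the common choice g(u) = tanh(u); it is an illustration, not the authors' implementation.

import numpy as np

def fastica_one_unit(X, max_iter=200, tol=1e-6):
    """Fixed-point iteration of Eq. (8) for one independent component.
    X is assumed centered and whitened, with shape (D, n)."""
    D, n = X.shape
    w = np.random.randn(D)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        u = w @ X                                        # projections w^T x
        g, g_prime = np.tanh(u), 1.0 - np.tanh(u) ** 2
        w_new = (X * g).mean(axis=1) - g_prime.mean() * w    # Eq. (8)
        w_new /= np.linalg.norm(w_new)                   # keep ||w|| = 1
        if abs(abs(w_new @ w) - 1.0) < tol:              # converged up to sign
            return w_new
        w = w_new
    return w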
The fast ICA algorithm utilizes the principle of Newton iteration. However, since the Newton direction is not necessarily a descent direction, when the initial value is far from the minimum point the objective function obtained by iteration may fail to reach the optimum or may not even converge. To address this problem, this paper proposes a strategy that adds a one-dimensional search along the Newton direction, and designs the new iteration formula

$$\begin{cases} x^{(k+1)} = x^{(k)} + \lambda_k d^{(k)} \\ d^{(k)} = -\nabla^2 f(x^{(k)})^{-1} \nabla f(x^{(k)}) \\ f(x^{(k)} + \lambda_k d^{(k)}) = \min\limits_{\lambda} f(x^{(k)} + \lambda d^{(k)}) \end{cases} \qquad (9)$$

where λ is the damping factor, ∇f is the gradient (first-order derivative) of the function, and ∇²f is the Hessian (second-order derivative) of the function. The optimum solution of the function can be obtained by iteration.
Under the constraint ||w_i|| = 1, the problem is transformed into an unconstrained extreme-value problem, and the new cost function leads to the update

$$w_i^+ = w_i - \lambda_i \frac{E\{X g(w_i^T X)\} - c\, w_i}{E\{g'(w_i^T X)\} - c} \qquad (11)$$

where λ_i is the one-dimensional search factor, which makes the cost function enter the convergence region of the Newton iteration starting from any w_i in a certain range and ensures the convergence of the algorithm. When λ_i = 1, it reduces to the original fast ICA algorithm.
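A heavily hedged sketch of the modified update of Eq. (11) follows; the coarse grid search over λ and the use of E{G(w^T x)} with G(u) = log cosh(u) as the ranking criterion are our assumptions, since the paper does not specify the one-dimensional search procedure or the cost function in detail.

import numpy as np

def improved_update(w, X, c=1.0, lambdas=np.linspace(0.1, 2.0, 20)):
    """One step of the line-searched fixed-point update of Eq. (11).
    X is assumed centered and whitened; lambdas is an assumed search grid."""
    u = w @ X
    g, g_prime = np.tanh(u), 1.0 - np.tanh(u) ** 2
    # Newton-style step of Eq. (11) (with lambda factored out)
    direction = ((X * g).mean(axis=1) - c * w) / (g_prime.mean() - c)
    best_w, best_J = w, -np.inf
    for lam in lambdas:
        w_try = w - lam * direction
        w_try /= np.linalg.norm(w_try)
        J = np.mean(np.log(np.cosh(w_try @ X)))   # surrogate of E{G(w^T x)}
        if J > best_J:
            best_J, best_w = J, w_try
    return best_w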
The optimal selection of features is a key issue that affects the correctness of object classification and recognition. In the feature subspace, the more concentrated the feature vectors of the same class and the farther apart the feature vectors of different classes, the easier classification and recognition become. Therefore, when extracting features from IR object images, it is desirable to extract features that differ strongly between different classes of objects and are almost the same within one class. As the number of observation images increases, the number of independent components and corresponding features also increases. Among these added features, some contribute little to classification because they are similar to existing features; others may even reduce the recognition rate because of feature distortion. In order to ensure the correctness and efficiency of classification and recognition, we select a minimum number of features from the original features that are most effective for classification. For this purpose, we propose a new feature selection criterion based on intra-class and inter-class distance functions.
Suppose the observation image matrix F is composed of M classes and there are N_i (i = 1, 2, …, M) observations for each class. The column vector A_j of the mixing matrix A corresponds to the independent component s_j, and a_ij is the jth feature of the ith observation. U_j is defined as the mean intra-class distance of the jth feature over the M classes, which can be expressed as

$$U_j = \frac{1}{M N_c (N_c - 1)} \sum_{c=1}^{M} \sum_{s=1}^{N_c} \sum_{\substack{p=1 \\ p \ne s}}^{N_c} \left| a_{q_c + s,\, j} - a_{q_c + p,\, j} \right|, \qquad q_c = \sum_{i=0}^{c-1} N_i,\ N_0 = 0,\ i = 1, 2, \ldots, M \qquad (13)$$

Similarly, V_j is called the mean inter-class distance of the jth feature over the M classes, and can be expressed as

$$V_j = \frac{1}{M(M-1)} \sum_{l=1}^{M} \sum_{\substack{k=1 \\ k \ne l}}^{M} \left| \frac{1}{N_l} \sum_{s=1}^{N_l} a_{q_l + s,\, j} - \frac{1}{N_k} \sum_{p=1}^{N_k} a_{q_k + p,\, j} \right|, \qquad q_l = \sum_{i=0}^{l-1} N_i,\ q_k = \sum_{i=0}^{k-1} N_i \qquad (14)$$

$$\beta_j = \frac{U_j}{V_j} \qquad (15)$$
which reflects how difficult it is to separate the M classes using the jth feature. The smaller β_j is, the more sensitive the jth feature is, and the easier and more effective the classification of the M classes becomes.
We also have the following relationship in practical situations:

$$0 < U_j < V_j \qquad (16)$$

In accordance with the definition of the effective classification factor, we can further obtain

$$0 < \beta_j < 1 \qquad (17)$$
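The effective classification factor can be computed as in the following sketch of Eqs. (13)–(15); the feature matrix layout (rows ordered class by class) matches the definitions above, while the function name is ours.

import numpy as np

def effective_classification_factors(A, class_sizes):
    """A has one row per observation (ordered class by class) and one column
    per feature a_{.,j}; class_sizes lists N_1,...,N_M.  Returns beta_j = U_j/V_j
    for every feature; smaller values indicate more discriminative features."""
    A = np.asarray(A, dtype=float)
    M = len(class_sizes)
    starts = np.concatenate(([0], np.cumsum(class_sizes)))   # q_c offsets
    U = np.zeros(A.shape[1])
    V = np.zeros(A.shape[1])
    class_means = []
    for c in range(M):
        block = A[starts[c]:starts[c + 1]]                   # samples of class c
        Nc = class_sizes[c]
        # mean pairwise intra-class distance per feature, Eq. (13)
        diffs = np.abs(block[:, None, :] - block[None, :, :])
        U += diffs.sum(axis=(0, 1)) / (M * Nc * (Nc - 1))
        class_means.append(block.mean(axis=0))
    class_means = np.array(class_means)
    # mean pairwise inter-class distance of the class means, Eq. (14)
    for l in range(M):
        for k in range(M):
            if k != l:
                V += np.abs(class_means[l] - class_means[k]) / (M * (M - 1))
    return U / V                                              # Eq. (15)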
The experiment verifies the feasibility and effectiveness of the algorithm on an IR database built by ourselves. We use the PCA feature extraction algorithm (Traditional PCA) in [2], the Fast ICA feature extraction algorithm (Traditional FastICA) in [6] and our proposed object recognition algorithm based on improved fast ICA and feature
(a) Classification rate in different dimension (b) Classification rate in different classes
5 Conclusions
In this paper, we develop an IR object recognition technique employing an improved fast ICA and feature optimization algorithm. To address the problem that the Newton iteration in the fast ICA algorithm is sensitive to the selection of the initial value, a one-dimensional search is imposed along the Newton iteration direction in order to ensure the convergence of the results and robustness to initialization. Meanwhile, a novel rule based on a distance function is designed to select the optimal features favorable for recognition according to the characteristics of infrared images. It overcomes the problem of declining recognition rate and robustness as the number of training image samples increases. Experimental comparison on the IR object database under similar conditions shows that our proposed method obtains the highest recognition rate. Theoretical derivation and experimental results show that the improved fast ICA feature optimization algorithm can improve recognition accuracy and efficiency. In the future, we aim to further improve the recognition rate by combining other feature information.
References
1. Oja, E.: Neural Networks, Principle Components and Subspaces. International Journal of
Neural Systems 1(1), 61–68 (1989)
2. Jolliffe, I.T.: Principal Component Analysis, 2nd edn. American Statistical Association,
American (2003)
3. Hyvarinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York
(2001)
4. Karhunen, J., Hyvarinen, A., Vigario, R., et al.: Applications of Neural Blind Separation to
Signal and Image Processing. In: IEEE International Conference on ICASSP, vol. 1(1), pp.
131–134 (1997)
5. Hyvarinen, A.: Blind Source Separation by Nonstationarity of Variance: a Cumulant-based
Approach. IEEE Transactions on Neural Networks 12(6), 1471–1474 (2001)
6. Hyvarinen, A.: Fast and Robust Fixed-point Algorithms for Independent Component
Analysis. IEEE Transactions on Neural Networks 10(3), 626–634 (1999)
7. Jouan, A.: FastICA (MNF) for Feature Generation in Hyperspectral Imagery. In: 10th In-
ternational Conference on Information Fusion, pp. 1–8 (2007)
8. Shyu, K.-K., Lee, M.-H., Wu, Y.-T., et al.: Implementation of Pipelined FastICA on FPGA
for Real-Time Blind Source Separation. IEEE Transactions on Neural Networks 19(6),
958–970 (2008)
9. Shen, H., Huper, K.: Generalised Fastica for Independent Subspace Analysis. In: IEEE In-
ternational Conference on Acoustics, Speech and Signal Processing, vol. 4, pp. 1409–1412
(2007)
10. Ming-xiang, W., Yu-long, M.: A New Method for The Detection of Moving Target Based
on Fast Independent Component Analysis. Computer Engineering 30(3), 58–60 (2004)
11. Mu-chun, Z.: Face Recognition Based on FastICA and RBF Neural Networks. In: 2008 In-
ternational Symposium on Information Science and Engineering, vol. 1, pp. 588–592
(2008)
12. Li, M., Wu, F., Liu, X.: Face Recognition Based on WT, FastICA and RBF Neural Net-
work. In: Third International Conference on Natural Computation, vol. 2, pp. 3–7 (2007)
Facial Feature Extraction Based on Wavelet Transform
1 Introduction
In recent years, computer vision has developed rapidly. Computer vision is concerned with developing systems that can interpret the content of natural scenes [6, 7]. Like all applications that aim to make computers more user-friendly and to increase their ability to identify, learn and gain knowledge, computer vision systems begin with the simple process of detecting and locating objects, continue with the higher-level process of detecting and locating features to gain more information, and then proceed to the most advanced process of extracting meaningful information to be used in intelligent systems.
Face recognition and facial expression analysis are part of computer vision. To recognize a person, three main processes are required, one of which is facial feature extraction. In the computer vision community, a feature is defined as a function of one or more measurements, the values of some quantifiable property of an object, computed so that it quantifies some significant characteristics of the object [6, 7]. One
of the biggest advantages of feature extraction lies in that it significantly reduces the
information (compared to the original image) to represent an image for understanding
the content of the image [12]. Basically, the extraction of facial feature points (eyes,
mouth, nose, chin, inner boundary) plays an important role in various applications
such as face detection, face recognition, model based image coding, and expression
recognition, and has become a popular area of research due to emerging applications
in human-computer interface, surveillance systems, secure access control, video con-
ferencing, financial transaction, forensic applications, pedestrian detection, driver
alertness monitoring systems, image database management system and so on.
Facial feature extraction faces several obstacles: input images may be blurred and contain a lot of noise, and face sizes and orientations vary. Sometimes facial features may be covered by other things, such as a hat, a pair of glasses, a hand, etc.
Many approaches to facial feature extraction have been reported in literature over
the past few decades, ranging from the geometrical description of salient facial fea-
tures to the expansion of digitized images of the face on appropriate basis of images
[8]. Different techniques have been introduced recently, for example, principal com-
ponent analysis [1], geometric modeling [2], auto-correlation [13], deformable tem-
plate [14], neural networks [15], elastic bunch graph matching [16], color analysis
[17] and so on. Lam and Yan [3] have used snake model for detecting face boundary.
Although the snake provides good results in boundary detection, the main problem is
to find the initial position [4].
We can divide these approaches into four main categories: geometry-based, template-based, color segmentation, and appearance-based approaches. Generally, geometry-based
approaches extract features using geometric information such as relative positions
and sizes of the face components. Template-based approaches match facial compo-
nents to previously designed templates using appropriate energy functional. The
best match of a template in the facial image will yield the minimum energy. Color
segmentation techniques make use of skin color to isolate the face. Any non-skin
color region within the face is viewed as a candidate for eyes and mouth. The per-
formance of such techniques on facial image databases is rather limited, due to the
diversity of ethnical backgrounds [18]. In appearance-based approaches, the con-
cept of “feature” differs from simple facial features such as eyes and mouth. Any
extracted characteristic from the image is referred to as a feature [5]. Methods such
as principal component analysis (PCA), independent component analysis, and Ga-
bor wavelets [21] are used to extract the feature vector.
This paper explores the extraction of facial features such as the eyes, mouth, chin, and inner boundary. Facial feature extraction is performed using two approaches: (i) edge detection based on the wavelet transform and (ii) geometric properties. Based on wavelet properties, we can make the edges of an image bolder and reduce its noise. Applying geometric properties of the human face, eyes, and mouth, we define several thresholds. Experimental results indicate that the algorithm can be applied to noisy images, faces with beards, and faces showing expressions.
2.1 Wavelets
In wavelets, we have

$$f(t) = \sum_{j}\sum_{k} a_{j,k}\, \psi_{j,k}(t). \qquad (6)$$

where

$$\psi_{j,k}(t) = 2^{j/2}\, \psi(2^j t - k), \qquad k, j \in \mathbb{Z}. \qquad (7)$$

where

$$c(k) = c_0(k) = \langle g(t), \varphi_k(t) \rangle = \int g(t)\, \varphi_k(t)\, dt. \qquad (9)$$

$$d(j,k) = d_j(k) = \langle g(t), \psi_{j,k}(t) \rangle = \int g(t)\, \psi_{j,k}(t)\, dt. \qquad (10)$$

$$\psi_{j,k}(t) = 2^{j/2}\, \psi(2^j t - k), \qquad \varphi_{j,k}(t) = 2^{j/2}\, \varphi(2^j t - k). \qquad (11)$$
The edge of an image is a boundary which the brightness of image changes abruptly.
In image processing, an edge is often interpreted as a class of singularities. In a func-
tion, singularities can be presented easily as discontinuities where the gradient ap-
proaches infinity. However, image data is discrete, so edges in an image are often
defined as the local maxima of the gradient [10].
Edge detection is a technique related to gradient operators. Because an edge is
characterized by having a gradient of large magnitude, edge detectors are approxima-
tions of gradient operators. Noise influences the accuracy of the computation of gra-
dients. Usually an edge detector is a combination of a smooth filter and a gradient
operator. First, an image is smoothed by the smooth filter and then its gradient is
computed by the gradient operator.
$$\|\nabla f\|_2 = \sqrt{f_x^2 + f_y^2}. \qquad (12)$$

In order to simplify computation in image processing, we often use the 1-norm instead:

$$\|\nabla f\|_1 = |f_x| + |f_y|. \qquad (13)$$

$$\|\nabla f\|_2 = \frac{1}{2}\sqrt{\big(f(i-1,j) - f(i+1,j)\big)^2 + \big(f(i,j-1) - f(i,j+1)\big)^2}. \qquad (14)$$

and

$$\|\nabla f\|_1 = \frac{1}{2}\Big(\big|f(i-1,j) - f(i+1,j)\big| + \big|f(i,j-1) - f(i,j+1)\big|\Big). \qquad (15)$$

$$\max_{(x,y) \in \Omega} \|\nabla f(x,y)\| = M. \qquad (16)$$
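A minimal sketch of Eqs. (14)–(16) is given below; the threshold ratio applied to the maximum gradient M is our assumption, since the text only defines M.

import numpy as np

def gradient_edges(img, ratio=0.2):
    """1-norm gradient magnitude from central differences (Eq. (15)),
    thresholded at a fraction of its maximum M (Eq. (16))."""
    f = img.astype(float)
    fx = np.zeros_like(f)
    fy = np.zeros_like(f)
    fx[1:-1, :] = 0.5 * np.abs(f[:-2, :] - f[2:, :])   # |f(i-1,j) - f(i+1,j)| / 2
    fy[:, 1:-1] = 0.5 * np.abs(f[:, :-2] - f[:, 2:])   # |f(i,j-1) - f(i,j+1)| / 2
    grad = fx + fy                                     # Eq. (15)
    M = grad.max()                                     # Eq. (16)
    return grad > ratio * M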
D = RIGHT_EYE.X − LEFT_EYE.X
Build an array arWeight[] whose length is InputMatrix.width
Assign arWeight[i] = sum{ InputMatrix(j, i), j = Left_eye.y .. Left_eye.y + D }
Determine Average = sum(arWeight[i], where i < Left_eye.x or i > Right_eye.x) / (intW − D)
Normalize the noise in the array arWeight.
Find the first left point:
  cLeft is the abscissa of the first left point
  Assign cLeft = i, the first i satisfying arWeight(i) > Average, for i = Left_eye.x − D/4 down to 0
Find the first right point:
  cRight is the abscissa of the first right point
  Assign cRight = i, the first i satisfying arWeight(i) > Average, for i = Right_eye.x + D/4 to intW
d = D/(intCountDetect − 1)
Build two arrays arLeft and arRight to save the left and right inner boundary points
for i = 0 to intCountDetect − 1
  Assign start = Left_eye.y + d*i; end = Left_eye.y + d*(i+1)
  Apply SelectBorderSegment(matrix arW, start to end, cLeft, cRight)
  This method returns the next left and right coordinates of the inner boundary.
Normalize the data in arLeft[] and arRight[] again.
4 Experimental Results
The facial feature extraction method was implemented in C#, Visual Studio 2007,
Windows XP and Dot.net framework 2.0.
To evaluate the accuracy in finding these facial features, we compare the facial features obtained by hand with the facial features obtained by these algorithms. The mean point-to-point error is calculated by the formula

$$m_e = \frac{1}{n s} \sum_{i=1}^{n} d_i$$

where d_i is a point-to-point error, s is the distance between the two eyes and n is the number of features.
The accuracy for the eyes is 93.54% with me2 = 0.0534 on 124 Yale images (frontal human faces) and 91.17% with me2 = 0.043 on 34 student images.
The accuracy for the mouth is 91.93% with me1 = 0.068 on 124 Yale images and 88.23% with me1 = 0.054 on 34 student images.
The accuracy for the chin is 91.93% with me1 = 0.0017 on 124 Yale images and 88.23% with me1 = 0.0034 on 34 student images.
The agreement between the inner boundary obtained by these algorithms and by hand is 90.32% on 124 Yale images and 85.29% on 34 student images.
References
1. Turk, M., Pentland, A.: Eigenfaces for Recognition. Journal of Cognitive Neurosci-
ence 3(1), 71–86 (1991)
2. Ho, Y.Y., Ling, H.: Facial Modeling from an Uncalibrated Face Image Using a Coarse-to-
Fine Genetic Algorithm. Pattern Recognition 34(8), 1015–1031 (2001)
3. Lam, K.M., Yan, H.: Locating and Extracting the Eye in Human Face Images. Pattern Recognition 29(5), 771–779 (1996)
4. Bhuiyan, A., Ampornaramveth, V., Muto, S., Ueno, H.: Face Detection and Facial Feature
Localization for Human-Machine Interface. NII Journal 5 (2003)
5. Bagherian, E., Rahmat, R., Udzir, N.: Extract of Facial Feature Point. JCSNS 9(1) (Janu-
ary 2009)
6. Castleman, R.: Digital Image Processing. Prentice-Hall, Englewood Cliffs (1996)
7. Van Vliet, J.: Grey-Scale Measurements in Multi-Dimensional Digitized Images. Doctor
Thesis. Delft University Press (1993)
8. Hjelmas, E., Low, K.B.: Face detection: A Survey. Computer Vision and Image Under-
standing 83(3), 236–274 (2001)
9. Tang, Y., Yang, L., Liu, J., Ma, H.: Wavelet Theory and Its Application to Pattern Recog-
nition. World Science Publishing Co. (2000) ISBN 981-02-3819-3
10. Li, J.: A Wavelet Approach to Edge Detection. Master Thesis of Science in the Subject of
Mathematics Sam Houston State University, Huntsville, Texas (August 2003)
11. Nixon, S., Aguado, S.: Feature Extraction and Image Processing (2002) ISBN 0750650788
12. Lei, B., Hendriks, A., Reinders, M.: On Feature Extraction from Images. Technical Report
on Inventory Properties for MCCWS
13. Goudail, F., Lange, E., Iwamoto, T., Kyuma, K., Otsu, N.: Face Recognition System Using
Local Autocorrelations and Multiscale Integration. IEEE Trans. Pattern Anal. Machine In-
tell. 18(10), 1024–1028 (1996)
14. Yuille, A., Cohen, D., Hallinan, P.: Feature Extraction from Faces Using Deformable
Templates. In: Proc. IEEE Computer Soc. Conf. on computer Vision and Pattern Recogni-
tion, pp. 104–109 (1989)
15. Rowley, H., Baluja, S., Kanade, T.: Neural Network-Based Face Detection. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 20(1), 23–37 (1998)
16. Wiskott, L., Fellous, J.-M., Kruger, N., Von der Malsburg, C.: Face Recognition by Elastic
Bunch Graph Matching. IEEE Trans. Pattern Anal. & Machine Intelligence 19(7),
775–779 (1997)
17. Bhuiyan, M., Ampornaramveth, V., Muto, S., Ueno, H.: Face Detection and Facial Feature
Extraction. In: Int. Conf. on Computer and Information Technology, pp. 270–274 (2001)
18. Chang, T., Huang, T., Novak, C.: Facial Feature Extraction from Colour Images. In: Pro-
ceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 2,
pp. 39–43 (1994)
19. Mallat, S., Zhong, S.: Characterization of Signals from Multiscale Edges. IEEE Trans.
Pattern Anal. Machine Intell. 14(7), 710–732 (1992)
20. Mallat, S., Hwang, W.L.: Singularity Detection and Processing with Wavelets. IEEE
Trans. Inform. Theory 38(2), 617–643 (1992)
21. Tian, Y., Kanade, T., Cohn, J.: Evaluation of Gabor Wavelet-Based Facial Action Unit
Recognition in Image Sequences of Increasing Complexity. In: Proceedings of the Fifth
IEEE International Conference on Automatic Face and Gesture Recognition, pp. 218–223
(2002)
22. Meyer, Y.: Ondelettes, fonctions splines et analyses graduées. Rapport Ceremade 8703
(1987)
23. Meyer, Y.: Ondelettes et Opérateurs. Hermann, Paris (1987)
24. The Yale database, http://cvc.yale.edu/
Fast Iris Segmentation by Rotation Average Analysis
of Intensity-Inversed Image
1 Introduction
Iris recognition is known as one of the most reliable and accurate biometric tech-
niques in recent years [1-5]. The iris, suspended between the cornea and the lens and perforated by the pupil, is a round contractile membrane of the eye (a typical iris image and related medical terminology are shown in Figure 1a). As a valuable candidate for highly reliable identification systems, the iris has many advantages [4]: the patterns of the iris are rather stable and do not change easily throughout one's whole life; iris patterns are quite complex and can contain many distinctive features; an image of the iris is easy to capture, it can be collected from a distance of up to one meter, and the process is much less invasive. With all these advantages, iris recognition is more and more widely used in modern personal identification systems.
The process of iris recognition mainly consists of six steps: image acquisition,
segmentation, normalization, encoding, matching, and evaluation.
∗ Corresponding Author.
∗∗ Both authors contributed equally to this work.
Iris segmentation is the first substantial image processing step, which extracts the effective texture region of the iris from the acquired images. The later steps, such as iris feature encoding and evaluation, are directly based on the result of iris segmentation.
The main task of iris segmentation is to locate the inner and outer boundaries of the iris, which is the so-called iris location. Dozens of papers (e.g. [6-11]) can be found in this active research field. In most cases, eyelids and eyelashes strongly affect the final accuracy of iris recognition, so for a complete segmentation, eyelids and eyelashes also need to be detected and removed. Besides the problem of iris location, eyelid and eyelash detection are also major challenges for effective iris segmentation. Eyelid detection is a challenging task due to the irregular shape of eyelids and eyelashes: normally it requires considerable computational cost and has a higher chance of failure. The corresponding research was addressed in many works (e.g. [12-15]); however, it is less discussed and resolved compared to the research on iris location.
In the remainder of this paper, we introduce the new method for iris segmentation by rotation average analysis of the intensity-inversed image in detail. For iris location, which is in fact circular edge detection, we first locate the inner boundary (as a circle) and then fit the outer boundary (as a circle) by least-squares non-linear circular regression. A simplified mathematical model of an arc is created to speed up the conventional process of eyelid detection. Other conventional methods are compared and discussed.
2 Segmentation Methods
There are two steps to complete the iris segmentation. The first is locating the circular boundaries of the iris, which includes fitting the inner boundary of the iris (the pupillary boundary) and fitting the outer boundary of the iris (the limbus boundary). The two fitted circles are not necessarily concentric. Since the distance between their centers is rather small, it is much easier to locate one after the other.
Normally the iris images in use are near infrared (NIR) images, which contain abundant texture features even in dark and brown irises. In NIR iris images the sclera is not white; a high contrast level can be found at the boundary between the pupil and the iris (the inner boundary), while the contrast at the boundary between the iris and the sclera (the limbus, or outer boundary) is much lower (Figure 1a). So, we started the whole process by locating the pupil.
Fig. 1. (a) A NIR Iris image and related medical terminology. (b) Locate the pupil in the inten-
sity-inversed iris image. Calculate the center coordinates by averaging Hx and Vx, as well as
Hy and Vy. Scan the image vertically to get the most consecutive high intensity pixels on thick
red line (Vx). Do the same horizontally showing as the thick blue line (Hy).
defined threshold, which is normally close to the global maximum of intensity, e.g.
85% of the maximum.
First, scan the image vertically: for each fixed x, scan along the y axis. The x-coordinate for which the most consecutive high-intensity pixels in the y-direction are found is stored (thick red line, Vx). This step is repeated for each fixed y, scanning along the x axis, and a y-coordinate is stored (thick blue line, Hy).
It is not necessarily true that (Vx, Hy) is the correct center. Therefore the second step is to calculate the middle point of the thick red and blue lines. For example, if the first high-intensity pixel on the thick red line is found at (Vx, 100) and the last high-intensity pixel at (Vx, 180), then the middle point (Vx, Vy) is (Vx, 140).
In a final step, Vx is averaged with Hx (thin blue line), and Hy is averaged with Vy (thin red line), which results in the estimated center coordinate. We call it the pseudo center, because it may be affected by reflections, noise, or an irregular edge of the pupil. It is necessary to refine the center in the following process. The average length of the consecutive high-intensity runs is the diameter of the pupil.
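The pseudo-center search can be sketched as follows, assuming an intensity-inversed image and a threshold of 85% of the maximum as suggested above; the variable names (Vx, Hy, etc.) follow the description, and the helper function is ours.

import numpy as np

def longest_run_center(inv_img, thresh_ratio=0.85):
    """Sketch of the pseudo-center search: for every column (and row) count the
    longest run of pixels above the threshold, keep the column Vx / row Hy with
    the longest run, take the middle of that run, and average the estimates."""
    mask = inv_img > thresh_ratio * inv_img.max()

    def longest_run(v):
        """Return (length, start) of the longest run of True values in v."""
        best, cur, start, best_start = 0, 0, 0, 0
        for i, val in enumerate(v):
            if val:
                if cur == 0:
                    start = i
                cur += 1
                if cur > best:
                    best, best_start = cur, start
            else:
                cur = 0
        return best, best_start

    col_runs = [longest_run(mask[:, x]) for x in range(mask.shape[1])]
    row_runs = [longest_run(mask[y, :]) for y in range(mask.shape[0])]
    Vx = int(np.argmax([r[0] for r in col_runs]))   # column with longest vertical run
    Hy = int(np.argmax([r[0] for r in row_runs]))   # row with longest horizontal run
    Vlen, Vstart = col_runs[Vx]
    Hlen, Hstart = row_runs[Hy]
    Vy = Vstart + Vlen // 2                         # middle of the vertical run
    Hx = Hstart + Hlen // 2                         # middle of the horizontal run
    center = ((Vx + Hx) / 2.0, (Hy + Vy) / 2.0)     # averaged pseudo center (x, y)
    diameter = (Vlen + Hlen) / 2.0                  # average run length ~ pupil diameter
    return center, diameter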
$$M = \frac{1}{N} \sum_{i,j}^{m,n} I(i,j) \cdot X(i,j) \qquad (1)$$

where M is the coordinate of the centroid, N is the total number of pixels, I is the intensity of each pixel, and X is the coordinate of each pixel. The area of summation can be a limited region near the estimation.
Normally the pseudo center of the pupil deviates slightly from the real center, and this refinement may finely calibrate it within several pixels. Figure 2a shows the inversed image, centered on the detected coordinate of the center of the pupil.
Fig. 2. (a) The intensity-inversed iris image is centered in a square panel. (b) The rotation aver-
age image of (a).
The pupil is thus quickly located. The time-consuming procedure of ‘integro- dif-
ferential’ of intensities [2] and Canny edge detection [16] are not needed at all.
After an accurate center of the pupil and its radius are determined, we can next
center the iris image in a larger-size square panel so that a rotation average image and
curve can be calculated. Figure 2b shows the rotation average image according to the
refined center, and figure 3a shows the rotation average curve.
Different averaging method can be used to calculate the rotation average curve:
A) Mean average
It is the most common way to calculate the mean average of all the pixels in a cir-
cular shell. It runs rather quickly, and most of the time, the result is stable.
B) Median average
The median is the middle value when all numbers are listed in order. The statistical nature of the median average makes it more stable and less sensitive to singular points, so disturbances from reflection spots and spike noise can be reduced.
Mean and median are both types of averages, although mean is the most common
type of average and usually refers to the arithmetic mean. Here, we choose the median
average in calculating the rotation average curve.
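A minimal sketch of the rotation (median) average computation follows; the one-pixel shell width and the function name are our assumptions.

import numpy as np

def rotation_average_curve(inv_img, center, r_max):
    """For every radius r, compute the median intensity of the pixels lying in
    the circular shell of radius r around the refined pupil center, giving the
    1-D rotation average curve analysed in the text.  center is (cy, cx)."""
    h, w = inv_img.shape
    cy, cx = center
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2).astype(int)   # shell index per pixel
    curve = np.zeros(r_max)
    for radius in range(r_max):
        shell = inv_img[r == radius]
        if shell.size:
            curve[radius] = np.median(shell)     # median average of the shell
    return curve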
From the rotation average curve (figure 3a), we clearly see an oscillation section
near the middle-peak, which corresponds to the iris region containing abundant tex-
ture information. The curve is smooth and continuous, so that it is easy to find the
middle-valley and middle-peak on the curve. The mean value of the middle-peak and the maximum intensity on the curve corresponds to the pupillary boundary, while the mean value of the middle-peak and the minimum intensity can be used to roughly segment the limbus boundary.
This limbus boundary calculated by the rotation average analysis is a rough value, obtained under the assumption that the pupil and limbus share the same center. Since the limbus boundary and the pupillary boundary are not necessarily concentric, the pupillary boundary center we have found cannot be used directly as the center of the limbus boundary (though it is very close to the truth). A refinement step of fitting the outer boundary is necessary.
Fig. 3. (a) Rotation average curve and analysis of the intensity-inversed image. (b) Intermediate
result of iris segmentation.
$$(x - x_c)^2 + (y - y_c)^2 = r^2 \qquad (2)$$
The threshold used for roughly segmenting the limbus boundary can be utilized to
binarize the iris image and generate a simple edge map for this circle fitting (figure 4a).
This process is much faster than classical Canny edge detection.
Direct mathematical calculation of circular regression requires a good initial guess and not too much gross noise (e.g. eyelashes). Since we already have a very close estimate of the outer boundary, it is much easier to accurately locate the limbus boundary by limiting the circle fitting to a certain range around the estimate (for instance, ±10%).
The circle fitting is based on orthogonal distance regression. Most of the known fit-
ting routines rely on the Levenberg-Marquardt optimization routine. The Levenberg-
Marquardt algorithm requires an initial guess as well as the first derivatives of the
distance function. In practice it converges quickly and accurately even with a wide
range of initial guesses. For two-dimensional circle fitting, we give the distance, ob-
jective and derivatives functions:
Distance equation:

$$d(x, y) = \sqrt{(x - x_c)^2 + (y - y_c)^2} - r \qquad (3)$$

Objective function:

$$E(x_c, y_c, r) = \sum \left( \sqrt{(x - x_c)^2 + (y - y_c)^2} - r \right)^2 \qquad (4)$$

Derivatives:

$$\frac{\partial d}{\partial x_c} = -\frac{x - x_c}{d + r}; \qquad \frac{\partial d}{\partial y_c} = -\frac{y - y_c}{d + r}; \qquad \frac{\partial d}{\partial r} = -1 \qquad (5)$$
Good circular regression algorithms can readily be found to minimize the objective
function E. Usually such an algorithm requires an initial guess, along with partial
derivatives, either of E itself or of the distance function d. Of course, one can imple-
ment a least-squares algorithm which uses a different optimization algorithm.
We recommend and use Taubin’s method [17] for circle fitting (see figure 4 for the
result). It is a robust and accurate circle fitting method, which works well even if data
points are observed only within a small arc. It is more stable than the other simple
circle fitting methods, e.g. by Kasa [18].
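For illustration, the following sketch fits the circle with the residual and derivatives of Eqs. (3)–(5) using SciPy's Levenberg–Marquardt routine; it is not Taubin's method, which the text recommends, but the orthogonal-distance formulation described above.

import numpy as np
from scipy.optimize import least_squares

def fit_circle(points, x0):
    """Least-squares circle fit of Eqs. (3)-(5).  points is an (N, 2) array of
    edge pixels (x, y); x0 = (xc, yc, r) is the initial guess, e.g. the refined
    pupil center and the rough limbus radius from the rotation average curve."""
    x, y = points[:, 0], points[:, 1]

    def residuals(p):
        xc, yc, r = p
        return np.sqrt((x - xc) ** 2 + (y - yc) ** 2) - r      # Eq. (3)

    def jacobian(p):
        xc, yc, r = p
        d = residuals(p)
        J = np.empty((len(x), 3))
        J[:, 0] = -(x - xc) / (d + r)                           # Eq. (5)
        J[:, 1] = -(y - yc) / (d + r)
        J[:, 2] = -1.0
        return J

    sol = least_squares(residuals, x0, jac=jacobian, method='lm')
    return sol.x    # fitted (xc, yc, r); Eq. (4) is minimized internally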
Fig. 4. (a) Simple binarized edge map for circle fitting. (b) Circular regression and result of fitting the outer boundary
By comparison, the well-known Hough transform algorithm [19] needs to calculate a Canny edge map first, and then uses the edge points to vote for a particular model. The idea of both the Hough transform and non-linear circular regression is to find the contours which fit a circle with center (xc, yc) and radius r. The Hough transform uses a voting mechanism, while circular regression uses direct mathematical calculation.
Another method is to use the integro-differential operator to search for the circle of the outer boundary; due to the limited search range, this method is also fast. Here we do not go into the details.
The integro-differential approach can be adapted to detect the eyelid, when the contour
model of circle is replaced by an arc model. This arc edge detector can be described as:
$$\max_{d, a, \theta} \; G_\sigma(r) * \int_{arc(d, a, \theta)} \frac{I(x, y)}{L(arc)}\, ds \qquad (6)$$
where G_σ is a Gaussian low-pass filter, I is the intensity of the pixels on the arc, and L is the length of the arc. d, a, and θ are the three parameters of our new arc model, which will be introduced later.
The contours of the upper and lower eyelids can be regarded approximately as pieces of parabolic arcs. Both the Hough transform and the integro-differential method could be applied to this task; the methods used in finding the upper and lower eyelid arcs are in fact pattern fitting algorithms. In our implementation, we chose the integro-differential method to avoid the calculation of edge maps. A standard mathematical model of an arc (Equation 7 and Figure 5a) has four parameters for the j-th image: the apex coordinates (h_j, k_j), the curvature a_j, and the rotation angle θ_j. Fitting the data with this mathematical model of an arc is a time-consuming search problem in a 4D space.

$$\big( -(x - h_j)\sin\theta_j + (y - k_j)\cos\theta_j \big)^2 = a_j \big( (x - h_j)\cos\theta_j + (y - k_j)\sin\theta_j \big) \qquad (7)$$
Fig. 5. (a) Conventional mathematical model of parabolic arc with 4 parameters. (b) Novel
model of arc with 3 parameters and central axis passing through the center of pupil.
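One plausible reading of the simplified 3-parameter arc model is sketched below: a parabolic arc whose symmetry axis passes through the pupil center at angle θ, with its apex at distance d along that axis and curvature a. The exact equation of the model is not reproduced in the text, so the parameterization and function names here are assumptions.

import numpy as np

def arc_points(pupil_center, d, a, theta, half_width=60, n=121):
    """Sample points of a parabolic arc whose central axis passes through the
    pupil center at angle theta, with apex at distance d and curvature a."""
    cx, cy = pupil_center
    hx = cx + d * np.cos(theta)            # apex on the central axis
    hy = cy + d * np.sin(theta)
    u = np.linspace(-half_width, half_width, n)   # local coordinate along the arc
    v = a * u ** 2                                 # parabola v = a u^2 in the local frame
    # rotate the local frame so that the v-axis points along theta
    x = hx + u * (-np.sin(theta)) + v * np.cos(theta)
    y = hy + u * np.cos(theta) + v * np.sin(theta)
    return np.stack([x, y], axis=1)

def arc_mean_intensity(img, pts):
    """Average image intensity along the arc (the integrand of Eq. (6)); the
    eyelid search then maximizes a Gaussian-smoothed version of this quantity
    over the three parameters d, a and theta."""
    pts = np.round(pts).astype(int)
    h, w = img.shape
    ok = (pts[:, 0] >= 0) & (pts[:, 0] < w) & (pts[:, 1] >= 0) & (pts[:, 1] < h)
    vals = img[pts[ok, 1], pts[ok, 0]]
    return vals.mean() if vals.size else 0.0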
Fig. 6. Examples of iris segmentation results of the proposed method. (a)-(c) Iris segmentation
of images from CASIA iris image database. (d) Segmentation of self-obtained iris image. A
dummy arc of lower eyelid is drawn on the edge, if the eyelid and eyelash do not cover the iris.
Table 1. Typical time cost of each step of segmentation for the new method presented
Table 2. Comparison of the performance of iris boundary location (not including eyelid detection)
refining the pupillary center with the centroid (the center of mass) of the pupil. If the intensity is not inversed, the calculated center of mass will not be stable: since the surrounding area has higher intensity than the central area, the center of mass easily moves away from the real center.
4 Conclusions
In this paper, we present a novel iris segmentation method. The effective iris is extracted in two steps: iris boundary location and eyelid detection. Intensity analysis of
rotation average of inversed image is applied to quickly locate the pupil and give a
close initial guess in circular regression of the outer boundary. The outer boundary is
refined by least-square non-linear circular regression. A new simplified mathematical
model of arc is created and implemented, which also works fast and efficiently com-
pared to the conventional methods. Our method provides a new idea to do iris seg-
mentation based on statistic analysis of rotation average.
In the experiment based on both self-acquired images and the public database
(CASIA iris image database), the new method used in the iris segmentation shows
fast and robust performance. Prospectively, there are still spaces to be improved in
our method: automatic eyelashes detection and removal.
Acknowledgments. This work was mainly done in the Catholic University of Leuven
(K.U.Leuven), Belgium. Thanks very much for the instruction of Prof. Dr. Dirk Van-
dermeulen in the Center for Processing Speech and Images (PSI) at the Department of
Electrical Engineering (ESAT). Thanks for Ying Zhang in the Terry Fox Laboratory,
University of British Columbia, Canada for acquisition of iris images.Parts of the
experimental data were supplied by the Institute of Automation, Chinese Academy of
Sciences (CASIA) [20].
References
1. Daugman, J.: New Methods in Iris Recognition. IEEE Trans. System, Man, and Cybernet-
ics–Part B: Cybernetics 37(5), 1167–1175 (2007)
2. Daugman, J.: How Iris Recognition Works. IEEE Trans. Circuits and Systems for Video
Technology 14(1), 21–30 (2004)
3. Daugman, J.: The Importance of being Random: Statistical Principles of Iris Recognition.
Pattern Recognition 36(2), 279–291 (2003)
4. Daugman, J.: Statistical Richness of Visual Phase Information: Update on Recognizing
Persons by Iris Patterns. Int. J. Computer Vision 45(1), 25–38 (2001)
5. Wildes, R.: Iris Recognition: An Emerging Biometric Technology. Proc. IEEE 85(9),
1348–1365 (1997)
6. Trucco, E., Razeto, M.: Robust iris location in close-up images of the eye. Patter Anal.
Applic. 8, 247–255 (2005)
7. Tang, R., Han, J., Zhang, X.: An Effective Iris Location Method with High Robustness.
Optica Applicata. 37(3), 295–303 (2007)
8. He, Z., Tan, T., Sun, Z.: Iris Localization via Pulling and Pushing. In: ICPR, vol. 4,
pp. 366–369 (2006)
Fast Iris Segmentation by Rotation Average Analysis 349
9. Yuan, W., Xu, L., Lin, Z.: An Accurate and Fast Iris Location Method Based on the Fea-
tures of Human Eyes. In: Wang, L., Jin, Y. (eds.) FSKD 2005. LNCS (LNAI), vol. 3614,
pp. 306–315. Springer, Heidelberg (2005)
10. Sun, C., Zhou, C., Liang, Y., Liu, X.: Study and Improvement of Iris Location Algorithm.
In: Zhang, D., Jain, A.K. (eds.) ICB 2005. LNCS, vol. 3832, pp. 436–442. Springer,
Heidelberg (2005)
11. Lee, J.C., Huang, P.S., Chang, C.P., Tu, T.M.: Novel and Fast Approach for Iris Location.
IIHMSP 1, 139–142 (2007)
12. He, X.F., Shi, P.F.: An efficient iris segmentation method for recognition. In: Singh, S.,
Singh, M., Apte, C., Perner, P. (eds.) ICAPR 2005. LNCS, vol. 3687, pp. 120–126.
Springer, Heidelberg (2005)
13. Arvacheh, E.M., Tizhoosh, H.R.: Iris Segmentation: Detecting Pupil, Limbus and Eyelids.
In: ICIP, pp. 2453–2456 (2006)
14. He, Z., Tan, T., Sun, Z., Qiu, X.: Towards Accurate and Fast Iris Segmentation for Iris
Biometrics. IEEE Trans. Pattern Analysis and Machine Intelligence 99(2008),
doi:10.1109/TPAMI.2008.183
15. Jang, Y.K., Kang, B.J., Park, K.R.: Study on eyelid localization considering image focus
for iris recognition. Pattern Recognition Letters 29(11), 1698–1704 (2008)
16. Canny, J.: A Computational Approach to Edge Detection. IEEE Trans. Pattern Analysis
and Machine Intelligence 8, 679–714 (1986)
17. Taubin, G.: Estimation Of Planar Curves, Surfaces And Nonplanar Space Curves Defined
By Implicit Equations, With Applications To Edge And Range Image Segmentation. IEEE
Trans. PAMI 13, 1115–1138 (1991)
18. Kasa, I.: A curve fitting procedure and its error analysis. IEEE Trans. Inst. Meas. 25, 8–14
(1976)
19. Duda, R.O., Hart, P.E.: Use of the Hough Transformation to Detect Lines and Curves in
Pictures. Comm. ACM 15, 11–15 (1972)
20. CASIA iris image database, http://www.sinobiometrics.com
A New Criterion for Global Asymptotic
Stability of Multi-delayed Neural Networks∗
Abstract. This Letter presents some new sufficient conditions for the unique-
ness and global asymptotic stability (GAS) of the equilibrium point for a class
of neural networks with multiple constant time delays . It is shown that the use
of a more general type of Lyapunov-Krasovskii functional enables us to estab-
lish global asymptotic stability of a class of delayed neural networks than those
considered in some previous papers. Our results generalize or improve the pre-
vious results given in the literature.
1 Introduction
In the last few years, stability of different classes of neural networks with time delays,
such as Hopfield neural networks, cellular neural networks, bidirectional associative
neural networks, has been extensively studied[1-7,10-12], particularly regarding their
stability analysis. Recently, LMI-based techniques have been successfully used to
tackle various stability problems for neural networks,mainly with a single delay(see,
for example[8,9,15]) and a few with multi-delays[13,14].In the present paper, moti-
vated by [14],by employing a more general Lyapunov functional and the LMI
approach , we establish some sufficient conditions for the uniqueness and global
asymptotic stability for neural networks with multiple delays . The conditions given in
the literature are generalized or improved .
Consider the following neural networks with multiple delays
m
u& (t ) = − Au (t ) + Wg (u (t )) + ∑ W j g (u (t − τ j )) + I , (1)
j =1
where u (t ) = [u1 (t ),L , u n (t )]T is the neuron state vector, A = diag (a1 ,L , a n ) is
a positive diagonal matrix, W , W j = (ω ik( j ) ) n×n , j = 1,2,L, m are the interconnection
∗
Project supported Partially by NNSF of China (No:10571032).
H. Deng et al. (Eds.): AICI 2009, LNAI 5855, pp. 350–360, 2009.
© Springer-Verlag Berlin Heidelberg 2009
A New Criterion for Global Asymptotic Stability 351
g j (ξ1 ) − g j (ξ 2 )
(H ) 0≤ ≤σ j, j = 1,2,L , n ,
ξ1 − ξ 2
for each ξ1 , ξ 2 ∈ R, ξ1 ≠ ξ 2 , where σ j are positive constants. In the following, we
will shift the equilibrium point u ∗ = [u1∗ ,L , u n∗ ]T of system (2) to the origin. The
transformation x(⋅) = u (⋅) − u ∗ puts system (3) into the following form
m
x& (t ) = − Ax (t ) + Wf ( x (t )) + ∑ W j f ( x (t − τ j )), (2)
j =1
f j (ξ1 ) − f j (ξ 2 )
0≤ ≤σ j, j = 1,2,L , n (3)
ξ1 − ξ 2
for each ξ1 , ξ 2 ∈ R, ξ1 ≠ ξ 2 , where σ j are positive constants.
⎛S S12 ⎞
S = ⎜⎜ 11 ⎟>0
⎝ S 21 S 22 ⎟⎠
Theorem 1. The The origin of (2) is the unique equilibrium point and it is globally
asymptotically stable if there exists a positive diagonal matrix
P = diag ( pi > 0), and positive definite matrix Q j ( j = 1,2,L , m) such that
⎡ Ω
L − Wm ⎤ − W1
⎢− W T
L 0 ⎥⎥Q1
⎢ 1 > 0, (4)
⎢ M
L M ⎥ M
⎢ ⎥
⎣ − Wm
T
L Qm ⎦ 0
where Ω = 2 A ∑ P − WP − PW − P ∑ j =1 Q j P ( ∑ = diag (σ i > 0) ).
−1 T m
Proof. We will first prove the uniqueness of the equilibrium point. To this end, let us
consider the equilibrium equation of (2) as follows
m
Ax ∗ − Wf ( x ∗ ) − ∑ W j f ( x ∗ ) = 0, (5)
j =1
Where x ∗ is the equilibrium point. We have to prove x ∗ = 0 , for this purpose, mul-
T ∗ −1
tiplying both sides of (5) by 2 f ( x ) P , we obtain
m
2 f T ( x ∗ ) P −1 Ax ∗ − 2 f T ( x ∗ ) P −1Wf ( x ∗ ) − 2∑ f T ( x ∗ ) P −1W j f ( x ∗ ) = 0, (6)
j =1
Using (3) in (6) results in
f T ( x ∗ )(2 P −1 A ∑ −1 − P −1W − W T P −1 ) f ( x ∗ )
m
− 2∑ f T ( x ∗ ) P −1W j f ( x ∗ ) ≤ 0.
j =1
m
Adding and subtracting the term f T ( x ∗ )∑ Q j f ( x ∗ ) in the left side of the above
j =1
inequality yields
UBU T ≤ 0, (7)
where U = [ f T ( x ∗ ) P −1 , f T ( x ∗ ),L , f T ( x ∗ )] ,
⎡ Ω − W1 L − Wm ⎤
⎢− W T Q1 L 0 ⎥⎥
B=⎢ 1 . (8)
⎢ M M L M ⎥
⎢ ⎥
⎣ − Wm
T
0 L Qm ⎦
A New Criterion for Global Asymptotic Stability 353
()
Condition 4 in Theorem 1 is B > 0 . Thus it follows from (7) that f (x∗ ) = 0 ,
∗ ∗
substitutes this into (5), we get Ax = 0 which implies x = 0 . This has shown that
the origin is the unique equilibrium point for every I .
We will now prove the GAS of the origin of (2). To this end, let us consider the
following Lyapunov-Kraasovskii functional
n
V ( x(t )) = x T (t )Cx(t ) + 2∑ p i−1 ∫
xi ( t )
f i ( s )ds
0
i =1
m
+ ∑∫
t
f T ( x( s ))Q j f ( x( s ))ds ,
t −τ j
j =1
m
+ 2∑ f T ( x(t )) P −1W j f ( x(t − τ j ))
j =1
m
− ∑ f T ( x(t − τ j ))Q j f ( x(t − τ j )) ,
j =1
354 K. Liu and H. Zhang
⎡ AC −1 + C −1 A − WP − W1 L − Wm ⎤
⎢ ⎥
⎢ − PW Ω − W1 L − Wm ⎥
T
S=⎢ − W1 T
− W1T Q1 L 0 ⎥.
⎢ ⎥
⎢ M M M M ⎥
⎢ − Wm T
− WmT L Qm ⎥⎦
⎣ 0
In the following , we will choose a positive-definite matrix C such that S > 0 . For
this purpose, let
⎛ AC −1 + C −1 A D ⎞
S = ⎜⎜ ⎟, (10)
⎝ DT B ⎟⎠
whereD = [−W1 ,L ,Wm ] and B is given in (8). In view of (4) and (8), we have
B > 0 . Now, choose a positive definite matrix G such that
G > DB −1 D T . (11)
Hence, V& ( x(t )) is negative definite and therefore, the origin of (2) or equivalently
∗
the equilibrium x of (1) is globally asymptotically stable. This completes the proofs.
In view of Lemma 1, the following Theorem 1' and Theorem 1'' are both equiva-
lent descriptions of Theorem 1.
Theorem 1'. The origin of (2) is the unique equilibrium point and it is globally as-
ymptotically stable if there exists a positive diagonal matrix P and positive definite
matrices Q j ( j = 1,2,L , m) such that
m m
2 A ∑ −1 P − WP − PW T − P ∑ Q j P − ∑ WQ −j 1W jT > 0 .
j =1 j =1
Theorem 1''. The origin of (2) is the unique equilibrium point which is globally as-
ymptotically stable if there exists a positive diagonal matrix P and positive definite
matrices Q j ( j = 1,2,L , m) such that
m
B1 = 2 A ∑ −1 P − WP − PW T − P ∑ Q j P > 0
j =1
and
⎡Q1 − W1T B1−1W1 L − W1T B1T Wm ⎤
⎢ ⎥
⎢ M M M ⎥ > 0.
⎢ − Wm B1 W1
T T
L Qm − Wm B1 Wm ⎥⎦
T T
⎣
In the following, we will give further result for GAS of the origin of (2)(or equiva-
lently the equilibrium point of (1)). We need the following
and
m
S −1 > ∑ S −j 1 . (15)
j =1
356 K. Liu and H. Zhang
Proof. Let A denote the matrix given in the left side of (14), i.e.,
⎛ Q1 ⎞ ⎛W1 SW1 L W1 SWm ⎞
T T
⎜ ⎟ ⎜ ⎟
A=⎜ O ⎟ − ⎜ M L M ⎟,
⎜
⎝ Qm ⎟⎠ ⎝Wm SW1 L Wm SWm ⎟⎠
⎜ T T
where x = ( x , x ,L , x ) ∈ R
T
1
T
2
T
m
nm
with x j ∈ R n , j = 1,2,L , m .Adding and
m m m
subtracting the term ∑ x Tj W jT S jW j x j + ∑∑ x Tj W jT SWi xi , we have
j =1 j =1 i =1
m
f ( x) = x T Ax = ∑ x Tj (Q j − W jT S jW j ) x j
j =1
m m m m
+ ∑ [ x Tj W jT S jW j x j - 2 x Tj W jT S ∑ Wi xi ] + ∑ x Tj W jT S ∑ Wi xi . (16)
j =1 i =1 j =1 i =1
Note that
m
x Tj W jT S jW j x j − 2 x Tj W jT S ∑ Wi xi
i =1
m m
= [ S jW j x j − ∑ SWi xi ]T ⋅ S −j 1 ⋅ [ S jW j x j − ∑ SWi xi ]
i =1 i =1
m
− ∑ xiT WiT SS −j 1 SWi xi
i =1
for i = 1,2,L , m . This implies that
m m
∑[x W
j =1
T
j
T
j S jW j x j - 2 x Tj W jT S ∑ Wi xi ]
i =1
m m
≥ −∑ xiT Wi T S ∑ S −j 1 SWi xi .
i =1 j =1
Substituting this into (16) result in
m
f ( x) = x T Ax ≥ ∑ x Tj (Q j − W jT S jW j ) x j
j =1
m m m
+ ∑ [ xiT WiT [ S (−∑ S −j 1 + S −1 ) S ]∑ Wi xi ,
i =1 i =1 i =1
A New Criterion for Global Asymptotic Stability 357
Which, by using the condition (14) and (15), implies that f ( x) = x T Ax > 0 for all
x ≠ 0 . Therefore we have A > 0 and the proofs is completed.
Theorem 2. The origin of (2) is the unique equilibrium point which is globally as-
ymptotically stable if there exists a positive diagonal matrix P and positive definite
matrices S j ( j = 1,2,L , m) such that
m m
2 A ∑ −1 P − WP − PW T − P ∑ W jT S jW j P − ∑ S −j 1 > 0 . (17)
j =1 j =1
Proof. In view of (17), there exists a sufficiently small positive constant δ >0 such
that
m m
2 A ∑ −1 P − WP − PW T − P ∑ (W jT S jW j + δE ) P − ∑ S −j 1 > 0 , (18)
j =1 j =1
Q j = W jT S jW j + δE , j = 1,2,L , m (19)
and
m
B2 = 2 A ∑ −1 P − WP − PW T − P ∑ Q j P . (20)
j =1
Theorem 3. The origin of (2) is the unique equilibrium and it is globally asymptoti-
cally stable if there exists a positive constant α > 0 such that the following condi-
tions hold
(a) − (2 A ∑ −1 +W + W T + αE ) is positive definite;
m
α
(b) ∑W
j =1
j 2 ≤
2
.
Proof. In view of condition (a), there must be a positive constant δ > 0 such that
− (2 A ∑ +W + W + αE ) > 2δE .
−1 T
(23)
Take P = λE and S j = λ j E ( j = 1,2,L , m) ,where λ and λ j are positive constant
and observe that
m m
2 A ∑ −1 P − WP − PW T − ∑ S −j 1 − P ∑ (W jT S jW j + δ 2 E ) P
j =1 j =1
m m
= 2λA ∑ −1 −λW − λW T − ∑ λ−j1 E − λ2 ∑ (λ jW jT W j + δ 2 E )
j =1 j =1
m
≥ 2λA ∑ −1 −λW − λW T − λ2 ∑ (λ j W j
2
+ δ 2 + λ− 2 λ−j1 )E
2
j =1
= −λ (−2 A ∑ +W + W )−1 T
m
− λ2 ∑ [λ j ( W j + δ − λ−1λ−j1 ) 2 + 2λ ( W j + δ )]E .
2 2
j =1
m m
2 A ∑ −1 P − WP − PW T − ∑ S −j 1 − P ∑ (W jT S jW j + δ 2 E ) P
j =1 j =1
m
≥ −λ (−2 A ∑ −1 +W + W T ) − 2λ ∑ ( W j + δ )E
2
j =1
α m
= −λ[(−2 A ∑ −1 +W + W T + αE ) − 2δE ] + 2λ ( − ∑ W j )E
2 j =1
2
Moreover we have
m m
2 A ∑ −1 P − WP − PW T − ∑ S −j 1 − P ∑ W jT S jW j P > 0 .
j =1 j =1
Theorem 4. The origin of (24) is the unique equilibrium point which is globally and
asymptotically stable if there exists a real constant β > −2 such that following
conditions hold
(I) − (W + W T + βE ) is positive definite;
m
β
(II) ∑W
j =1
j 2 ≤ 1+
2
.
3 Conclusions
The equilibrium and global asymptotic stability properties of neural networks with
multiple constant delays have been studied. Some new stability criteria have been
derived by employing a more general type of Lyapunov-Krasovskii functional. The
conditions are expressed in terms of the linear matrix inequality, which can be veri-
fied efficiently with the Matlab LMI Control Toolbox.
References
1. Hopfield, J.: Neurons with Graded Response Have Collective Computational Properties
Like Those of Two-State Neurons. Proc. Natl. Acad. Sci. USA 81, 3088–3092 (1984)
2. Marcus, C.M., Westervelt, R.M.: Stability of Analog Neural Network with Delay. Phys.
Rev. A 39, 347–359 (1989)
3. Belair, J.: Stability in A Model of A Delayed Neural Network. J. Dynam. Differential
Equations 5, 607–623 (1993)
4. Gopalsamy, K., Leung, I.: Delay Iinduced Periodicity in A Neural Network of Excitation
and Inhibition. Physica D 89, 395–426 (1996)
5. Ye, H., Michel, N., Wang, A.: Global Stability of Local Stability of Hopfield Neural
Networks with Delays. Phys. Rev. E 50, 4206–4213 (1994)
6. Liao, X.F., Yu, J., Chen, G.: Novel Stability Criteria for Bi-directional Associative Mem-
ory Neural Networks with Time Delays. Int. J. Circuit Theory Appl. 30, 519–546 (2002)
7. Gopalsamy, K., He, X.Z.: Stability in A Asymmetric Hopfield Networks with Transmis-
sion Delays. Physica D 76, 344–358 (1994)
360 K. Liu and H. Zhang
8. Liao, X.F., Chen, G., Sanchez, E.N.: LMI-Based Approach for Asymptotically Stability
Analysis of Delayed Neural Networks. IEEE Trans. Circuits Syst. 49(7), 1033–1039
(2002)
9. Liao, X., Chen, G., Sanchez, N.: Delayed-Dependent Exponential Stability Analysis of De-
layed Neural Networks: an LMI Approach. Neural Networks 15, 855–866 (2002)
10. Liao, X.F., Wu, Z., Yu, J.: Stability Analysis of Cellular Neural Networks with Continuous
Time Dlays. J. Comput. Appl. Math. 143, 29–47 (2002)
11. Arik, S.: Stability Analysis of Delayed Neural Networks. IEEE Trans. Circuits Syst. I. 47,
1089–1092 (2000)
12. Liao, X.X., Wang, J.: Algebraic Criteria for Global Exponential Stability of Cellular Neu-
ral Networks with Multiple Time Delays. IEEE Trans. Circuits Syst. 50, 268–275 (2003)
13. Wu, W., Cui, B.T.: Global Robust Exponential Stability of Delayed Neural Networks.
Chaos, Solitons and Fractals 35, 747–754 (2008)
14. Liao, X.F., Li, C.: An LMI Approach to Asymptotical Stability of Multi-Delayed Neural
Networks. Physica D 200, 139–155 (2005)
15. Arik, S.: Global Asymptotic Stability of A Larger Class of Neural Networks with Constant
Time Delay. Phys. Lett. A 311, 504–511 (2003)
An Improved Approach Combining Random PSO
with BP for Feedforward Neural Networks
1 Introduction
Feedforward neural networks(FNNs) have been widely used to approximate arbitrary
continuous functions [1,2], since a neural network with a single nonliner hidden layer
is capable of forming an arbitrarily close approximation of any continous nonliner
mapping [3]. There have been many algorithms used to train the FNN, such as back-
propagation algorithm (BP), particle swarm optimization algorithm (PSO) [4], simulat-
ing annealing algorithm (SSA) [5], gentic algorithm(GA) [6,7] and so on. Regarding
the FNN training algorithms, the BP algorithm is easily trapped into a local minima
and converge slowly [8]. To solve this problem, many improved BP algorithm has
been proposed [9-11]. However these algorithms have not removed the disadvantages
of the BP algorithm in essence.
In 1995, inspired from complex social behavior shown by the natural species like
flock of birds, PSO was proposed by James Kennedy and Russell Eberhart. Different
∗
Corresponding author.
H. Deng et al. (Eds.): AICI 2009, LNAI 5855, pp. 361–368, 2009.
© Springer-Verlag Berlin Heidelberg 2009
362 Y. Cui et al.
from the BP algorithm, the PSO algorithm has good ability of global search. Although
the PSO algorithm has shown a very good performance in solving many problems, it
suffers from the problem of premature convergence like most of the stochastic search
techniques.
In order to make use of the advantages of PSO and BP, many researchers have
concentrated on hybrid algorithm such as PSO-BP, CPSO-BP and so on [12-14]. In
these algorithms, the PSO is used to do global search first and then switch to gradient
descending searching to do local search around global optimum. It is proved to be
better in convergence rate and convergence accuracy. However, premature conver-
gence still exists in the PSO-BP and CPSO-BP. During the global search, every parti-
cle has to follow both it's best historic position (pbest) and the best position of all the
swarm (gbest). Therefore, pbest will get close to gbest, and then the current particle
swarm may lose its diversity.
In this paper an improved PSO-BP approach named RPSO-BP is proposed to solve
the problems mentioned above. In this algorithm, there are three methods to update
each particle and each particle selects one randomly. In another word, each particle
does not have only one update method that following its best historic position and the
best position of the swarm, any more. In this way the gbest can avoid getting closed to
pbest and the likelihood of premature convergence can be reduced. Moreover, the
experiment results show that the proposed algorithm has better generalization per-
formance and convergence performance than other traditional algorithms.
X i
( t + 1) = X i
( t ) + V i ( t + 1) (2)
where Vi is the velocity of the ith particle; Xi is the position of the ith particle; Pi is the
best position achieved by the particle so far ; Pg is the best position among all parti-
cles in the population; r1 and r2 are two independently and uniformly distributed
random variables with the range of [0,1]; c1 and c2 are positive constant parameters
called accelerated coefficients, which control the maximum step size.
The adaptive particle swarm optimization (APSO) [15,16] algorithm is proposed
by Shi & Eberhart in 1998. This algorithm can stated as follows:
V I
(t + 1) = w * V I (t ) + c1 * r1 * ( P I (t ) − X I
(t )) + c2 * r 2 * ( P g (t ) − X I
(t )) (3)
X i
( t + 1) = X i
( t ) + V i ( t + 1) (4)
where w is called the inertia weight that controls the impact of the previous velocity
of the particle on its current. Several selection strategies of inertial weight w have
An Improved Approach Combining Random PSO with BP 363
been given. Generally, in the beginning stages of algorithm, the inertial weight w
should be reduced rapidly, and the inertial weight w should be reduced slowly when
around global optimum.
Another important variant of standard PSO is the CPSO, which was proposed by
Clerc and Kennedy [17]. The CPSO ensures the convergence of the search producers
and can generate higher-quality solutions than standard PSO with inertia weight on
some studied problems.
(3) If f2 is less than f1 and the pbest of the particle h is nearest to the gbest, the par-
ticle will be updated according to Eqs.(6) and (4).
V I
(t + 1) = w * V I (t ) + c1 * r1 * ( P h (t ) − X I
(t )) + c 2 * r 2 * ( P g (t ) − X I
(t )) (6)
where Pj and Ph are the personal (local) best position achieved by the particle j and
particle h so far respectively.
Similar to the PSO, the parameter w in the above RPSO-BP algorithm reduces
gradually as the iterative generation increases as follows[16]:
(W max − W min ) (7)
W (iter ) = W max −
iterma ∗ iter
where Wmax is the initial inertial weight, Wmin is the inertial weight of liner section
ending, iterma is the used generations that inertial weight, is reduced linearly, iter is a
variable whose range is [1,iterma].
4 Experimental Results
In this paper, some experiments are conducted to verify the effectiveness of the pro-
posed learning approach. In the following experiments, the performances of the BP,
PSO, CPSO, PSO-BP and CPSO-BP are compared with that of the RPSO-BP. All the
experiments are carried out in MATLAB 6.5 environment running in a Pentium 4,
2.40 GHZ CPU.
364 Y. Cui et al.
Table 1. The approximation accuracies and iteration number with six learning algorithms for
approximating the function f(x)
Fig. 1. The approximating result of the RPSO-BP algorithm for approximating the function f(x)
Second, we set the maximal iteration number as 30000 for the BP algorithm, 20000
for the PSO and CPSO algorithms. In the later three algorithms, the maximal iteration
number as 300 for PSO, and the one as 15000 for BP algorithm. The corresponding
results are summarized in Table 1 and Figs. 1-2.
It can be found from Table 1 and Figs.1-2 that the values of MSE of the new algo-
rithm are always less than those of the other five algorithms for approximating the func-
tion. The results also support the conclusion that the generalization performance of the
An Improved Approach Combining Random PSO with BP 365
BP
2.5 PSO
CPSO
2 PSO-BP
CPSO-BP
RPSO-BP
1.5
t h e t e s t i n g e r ro r
1
0.5
-0.5
-1
0 0.5 1 1.5 2 2.5 3 3.5
the function range
Fig. 2. The testing error curves of six learning algorithms for approximating the function f(x)
Fig. 3. The relationship between the number of hidden neurons and testing error for approxi-
mating the function f(x) with three algorithms
Fig. 4. The relationship between the number of PSO and testing error for approximating the
function f(x) with three learning algorithms
366 Y. Cui et al.
proposed algorithm is better than the other algorithms. Moreover, the hybrid algorithms
such as PSO-BP, CPSO-BP and RPSO-BP converge faster than other algorithms.
Fig.3. shows the relationship between the number of hidden neurons and testing er-
ror for approximating the function f(x) with PSO-BP, CPSO-BP and RPSO-BP. It can
be seen that the testing error has a downward trend as the number of hidden neurons
increases. Fig. 4 shows the relationship between the number of PSO and testing error
for approximating the function f(x) with PSO-BP, CPSO-BP and RPSO-BP. The most
suitable number of PSO can be selected as 60 for RPSO-BP.
In petrochemical industry, the true boiling point curve of crude oil reflects the composi-
tion of the distilled crude oil. To build a model that takes the mass percentage of distilled
component as the independent variable and the distilled temperature as the dependent
variable is an important problem in petrochemical industry. In this subsection, we use
FNN to build nonparametric models to resolve this problem. In this subsection, the num-
ber of hidden neurons and the size of PSO we chose are 10 and 50 respectively.
Regarding the three hybrid algorithms in this example, the maximal iteration num-
ber of the corresponding PSO is set as 500, and the one for BP is set as 5000. For the
BP, the maximal iteration is assumed as 10000. For the PSO and CPSO, the maximal
iteration number is assumed as 10000. The simulation results are summarized in the
table 2 and the comparison result of the three hybrid algorithms are showed in Fig. 5.
0.4
CPSO-BP
0.35 PSO-BP
RPSO-BP
0.3
0.25
the testing error
0.2
0.15
0.1
0.05
0
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
the normalized data
Fig. 5. The testing error curves of six learning algorithms for modeling the true boiling point
curve of crude oil
Table 2. The approximation accuracies and iteration number with six learning algorithms for
modeling the true boiling point curve of crude oil with six algorithms
From the Table 2 and Fig.5 the same conclusions as the experiment of the function
approximation problem can be drawn.
5 Conclusions
In this paper, an improved PSO-BP approach which combines RPSO with BP is pro-
posed to train FNN. PSO has good global search ability, but the swarm loses its diver-
sity easily. However, the gradient descent algorithm has good local search ability. In
this paper, global search algorithm—RPSO is combined with BP reasonably. More-
over, in order to improve the diversity of the swarm in the PSO, RPSO are proposed
in the paper. The RPSO-BP could not only improve the diversity of the swarm but
also reduce the likelihood of the particles being trapped into local minima on the error
surface. Finally, the experimental results were given to verify that the proposed algo-
rithm has better generalization performance and faster convergence rate than other
traditional algorithms. In future research works, we shall focus on how to apply this
improved algorithm to solve more practical problems.
References
1. Homik, K.: Mulitilayer feedforward networks are universal approximators. Neural
Networks 2, 359–366 (1989)
2. Chen, D.S., Jain, R.C.: A robust Back-propagation Algorithm for Function Approximation.
IEEE Trans. on Neural Network 5, 467–479 (1994)
3. Meng, J.L., Sun, Z.Y.: Application of combined neural networks in nonlinear function
approximation. In: Proceedings of the Third World Congress on Intelligent Control and
Automation, Hefei, pp. 839–841 (2000)
4. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. In: IEEE International Conference
on Neural Networks, Perth, IEEE Service Cente, Piscataway, NJ, pp. 1942–1948 (1995)
5. Li, X.P., Zhang, H.Y.: Improvement of Simulated Annealing Algorithm. Software
Guide 7(4), 47–48 (2000)
6. Angeline, P.J., Sauders, G.M., Pollack, J.B.: An evolutionary algorithm that constructs re-
current neural networks. IEEE Trans. Neural Networks 5(1), 54–65 (1994)
7. Yao, X.: A review of evolutionary artifical neural networks. Int. J. Intell. Syst. 8(4),
539–567 (1993)
8. Han, F., Ling, Q.H., Huang, D.S.: Modified Constrained Learning Algorithms Incorporat-
ing Additional Functional Constraints Into Neural Networks. Information Sciences 178(3),
907–919 (2008)
9. Wang, X.G., Tang, Z., Tamura, H., Ishii, M.: A modified error function for the backpropa-
gation algorithm. Neurocomputing 57, 477–484 (2004)
10. Liu, Y.H., Chen, R., Peng, W., Zhou, L.: Optimal Design for Learning Rate of BP Neutral
Network. Journal of Hubei University of Technology 22(2), 1–3 (2007)
368 Y. Cui et al.
11. Ma, Y.Q., Huo, Z.Y., Yang, Z.: The Implementation of the Improved BP Algorithm by
Adding the Item of the Momentum. Sci-Tech Information Development & Econ-
omy 16(8), 157–158 (2006)
12. Zhang, J.-R., Zhang, J., Lok, T.-M., Lyu, M.R.: A hybrid particle swarm optimization-
back-propagation algorithm for feedforward neural network training. Applied Mathematics
and Computation 185(2), 1026–1037 (2007)
13. Guo, W., Qiao, Y.Z., Hou, H.Y.: BP neural network optimized with PSO algorithm and its
application in forecasting. In: Proc. 2006 IEEE International Conference on Information
Acquisition, Weihai, Shandong,China, August 2006, pp. 617–621 (2006)
14. Han, F., Ling, Q.H.: A New Approach for Function Approximation Incorporating Adap-
tive Particle Swarm Optimization And A Priori Information. Applied Mathematics and
Computation 205(2), 792–798 (2008)
15. Shi, Y.H., Eberhat, R.C.: Parameter selection in particle swarm optimization. In: Proc. of
1998 Annual conference on Evolutionary Programming, San Diego, pp. 591–600 (1998)
16. Shi, Y.H., Eberhat, R.C.: A modified particle swarm optimizer. In: Proc.of IEEE World
Conf. on Computation Intelligence, pp. 69–73 (1998)
17. Clerc, M., Kennedy, J.: The particle swarm:explosion,stability,and convergence in a multi-
dimensional complex space. IEEE Trans. Evolut. Comput. 6(1), 58–73 (2002)
Fuzzy Multiresolution Neural Networks
1 Introduction
Artificial neural networks (ANN) and wavelet theory become popular tools in
various applications such as engineering problems, pattern recognition and non-
linear system control. Incorporating ANN with wavelet theory, wavelet neural
networks (WNN) was first proposed by Zhang and Benveniste [1] to approximate
nonlinear functions. WNNs are feedforward neural networks with one hidden
layer, where wavelets were introduced as activation functions of the hidden neu-
rons instead of the usual sigmoid functions. As a result of the excellent properties
of wavelet theory and the adaptive learning ability of ANN, WNN can make an
remarkable improvement for some complex nonlinear system control and identi-
fication. Consequently, WNN was received considerable attention [2,3,4,5,6,7].
Especially, inspired by the fuzzy model , Daniel [8] put forward a fuzzy wavelet
model. The FWN consists of a set of fuzzy rules. Each rule corresponding to a
H. Deng et al. (Eds.): AICI 2009, LNAI 5855, pp. 369–378, 2009.
c Springer-Verlag Berlin Heidelberg 2009
370 L. Ying, S. Qigang, and L. Na
sub-wavelet neural network consists of single fixed scaling wavelets, where the
translation parameters are variable. Such sub-WNNs at different resolution level
can capture the different behaviors of the approximated function. The role of the
fuzzy set is to determine the contribution of different resolution of function to
the whole approximation. The fuzzy model can improve function approximation
accuracy given the number of wavelet bases. But such FWNN require a complex
initialization and training algorithm. During the initialization step, the dilation
values of such FWNN is fixed beforehand through experience and a wavelet
candidate library should be provided by analysis the training samples.
As we know, the scaling function corresponds to the global behavior (low
frequency information) of the original function, while the wavelets corresponds
to the local behavior (high frequency information) of the original function. The
FWNN in [8] loses sight of the low frequency, though which may be approximated
by wavelet functions. But due to the theory of MRA, it would need much more
wavelet nodes.
Incorporating the idea of FWNN with pre-wavelet, we propose a fuzzy pre-
wavelet neural network. The hidden of layer of FMRANN consist of not only the
scaling function nodes, but also wavelet function nodes. The structure of FM-
RANN includes two sub-neural networks: sub-scaling function neural networks
and sub wavelet neural networks. The contribution of such sub-neural networks
is determined fuzzy rules.
In the learning of artificial neural network (ANN), the back propagation algo-
rithm by Rumelhart, Hinton, and Williams has long been viewed as a landmark
event. With the increasing of complexity of the practical problems, the standard
back propagation algorithm based the single gradient has ability not equal to
its ambition, which motivated many researchers to develop enhanced training
procedures with exhibit superior capabilities in terms of training speed, map-
ping accuracy, generalization, and overall performance than the standard back
propagation algorithm. The training method based upon second-order derivative
information exhibit better efficient and promising. Primary second-order meth-
ods are the back propagation algorithm based on quasi-Newton, Levenburg-
Marquardt, and conjugate gradient techniques. Although these methods have
shown promise, they often lead to poor local optima partially attributed to the
lack of a stochastic component in training procedure. Particle swarm optimiza-
tion (PSO) in [9,13] is a stochastic population-based optimization technique.
The simplicity of implementation and weak dependence on the optimized model
of PSO make it a popular tool for a wide range of optimization problems.
A fuzzy multi-resolution neural network (FMRANN) based on modified par-
ticle swarm algorithm is proposed to approximate arbitrary nonlinear function
in this paper. The basic concepts of Multi-resolution are introduced in Section 1.
In section 2, the two different structure of fuzzy multi-resolution neural network
(FMRANN) is presented and Aiming at the features of FMRANN, a modified
particle swarm optimization is introduced to update the parameters. Two simu-
lation examples are utilized to illustrate the good performance of the FMRANN
compared with other methods in Section 4 .
Fuzzy Multiresolution Neural Networks 371
where
µi (x)
µ̂i (x) =
c ,
µi (x)
i=0
0
M
ŷ 0 (x) = wk0 Φk (x),
k=1
1
M
ŷ 1 (x) = wk0 Ψk (x).
k=1
From figure 1 to figure 2, we respectively show the graphs of scaling functions
and wavelet functions with different support length and vanishing moment which
are used for our FMRANN.
1.4 2
1.2
1.5
1
1
0.8
0.6 0.5
0.4 0
0.2
−0.5
0
−1
−0.2
−0.4 −1.5
0 2 4 6 0 2 4 6
Fig. 1. The graph of scaling function φ and wavelet function ψ according to Daubecies
wavelet with support width 5 and vanishing moment 3
1.2 1.5
1
0.8
0.6
0.5
0.4
0
0.2
0
−0.5
−0.2
−0.4 −1
0 2 4 6 8 0 2 4 6 8
Fig. 2. The graph of scaling function φ and wavelet function ψ according to Daubecies
wavelet with support width 7 and vanishing moment 4
1995. Particle swarm optimization method mimics the development of this tech-
nique was inspired by the animal social behaviors such as school of fish, flock
of birds etc and mimics the way they find food sources. PSO has gained much
attention and wide applications in different fields. Compared with genetic algo-
rithm (GA), PSO employs a cooperative strategy while GA utilizes a competitive
strategy. In nature, cooperation is sometimes more benefit for survival than com-
petition. For example,the PSO has shown its superior ability than CA in some
problems such as optimizing the parameters of artificial neural network (ANN)
[14]. Gradient decent is another popular algorithm for the learning of ANN,
which is a good choice when the cost function is unimodal. But the majority of
the cost functions are multimodal, gradient decent might lead to convergence to
the nearest local minimum. But PSO has the ability to escape from local minima
traps due to its stochastic nature.
In PSO, each particle represents an alternative solution in the multi-dimensional
search space. PSO can find the global best solution by simply adjusting the trajec-
tory of each particle towards its own best location and towards the best location
of the swarm at each time step (generation).
Fuzzy Multiresolution Neural Networks 375
The implement of PSO can be categorized into the following main steps:
(1). Define the Solution Space and a Fitness Function:
(2). Initialize Random Swarm Position and Velocities:
(3). Updating particle Velocities and Position
v(k + 1) = w ∗ v(k) + c1 ∗ rand()(pbest − x)
+c2 ∗ rand() ∗ (gbest − x);
x(k + 1) = x(k) + v(k + 1)
where wis inertial weight, c1 and c2 are constriction factors. Use of inertia weight
and constriction factors has made the original implementation of the technique
very efficient [9,13].
The inertial weight factor provides the necessary diversity to the swarm by
changing the momentum of particles and hence avoids the stagnation of particles
at local optima. The empirical investigations shows the importance of decreasing
the value of from a higher valve (usually 0.9 to 0.4) during the search.
The majority problem of PSO is premature convergence of particle swarm.
How to efficiently avoid such case is considerably important. Various mechanisms
have been designed to increase the diversity among the particles of a swarm.
Here we propose an mechanism called forgetfulness to improve the the diver-
sity of particle the premature convergence. The basic ideal is that we impose each
particle and the swarm slightly amnesia when the particle swarm trend to pre-
mature convergence. This mechanism give the particles restarting chance based
on the existing better condition. Firstly, define a merit to reflect the convergent
status of the particle swarm as follows:
σf2
τ=
max1≤j≤N {(fi − f¯)2 }
Where
N N
1 1
f¯ = fi , σf2 = (fi − f¯)2
N i=1
N i=1
Where fi is the fitness of the i-th particle, N is the size of the particle swarm, f¯ is
the average fitness of all particles, and σf2 is the covariance of fitness. For a given
small threshold, if τ is less than this threshold and the expected solution haven’t
reached, then we think that this particle swarm tends to premature convergence.
Under this condition, we impose each particle and the swarm slightly amnesia.
Each particle forget its historical best position and consider the current position
as its best position. Similarly, the swarm don’t remember its historical global best
position and choose the best position from the current positions of all particles.
4 Simulation
In the following, the concrete numerical examples are given to demonstrate the
validity of the presented FMRANN. In this section, for evaluating the perfor-
mance of the network, we take the following merits defined in [1] as a criterion
to compare the performance of various methods.
376 L. Ying, S. Qigang, and L. Na
10 1
original function error data
8 estimated result from FMRANN 0.8 zero line
6 0.6
4 0.4
2 0.2
0 0
−2 −0.2
−4 −0.4
−6 −0.6
−8 −0.8
−10 −1
−10 −8 −6 −4 −2 0 2 4 6 8 10 −10 −8 −6 −4 −2 0 2 4 6 8 10
Fig. 3. The left figure is the comparison between original function and estimated re-
sult from FMRANN with performance index 0.019912. The right figure is the error
between original function and estimated result from FMRANN with performance in-
dex 0.019912.
10 1
original function error data
8 estimated result from FMRANN 0.8 zero line
6 0.6
4 0.4
2 0.2
0 0
−2 −0.2
−4 −0.4
−6 −0.6
−8 −0.8
−10 −1
−10 −8 −6 −4 −2 0 2 4 6 8 10 −10 −8 −6 −4 −2 0 2 4 6 8 10
Fig. 4. The left figure is the comparison between original function and estimated re-
sult from FMRANN with performance index 0.015591. The right figure is the error
between original function and estimated result from FMRANN with performance in-
dex 0.015591.
N N
1
N
J = ( yl − ŷl )/( yl − ȳ), ȳ = yl
N
l=1 l=1 l=1
Where yl is the desired output and ŷl is the estimated output from the con-
structed neural networks.
We respectively selected spline wavelets with order 3 and Daubechies wavelet
with support width 2N-1 and vanishing moment N. We compare our FMRANN
with other works in [1], [7] and [8], which take the Mexican Hat function as
wavelet function.
Example I - Approximation of univariate piecewise function:
⎧
⎨ −2.186x − 12.864, −10 ≤ x < −2
f (x) = 4.246x, −2 ≤ x < 0 . (4)
⎩
10e−0.05x−0.5sin[(0.03x + 0.7)x], 0 ≤ x ≤ 10
Fuzzy Multiresolution Neural Networks 377
References
1. Zhang, Q.H., Benveniste, A.: Wavelet networks. IEEE Trans. Neural Netw. 3(6),
889–898 (1992)
2. Pati, Y.C., Krishnaprasad, P.S.: Analysis and synthesis of feedforward neural net-
works using discrete affine wavelet transformation. IEEE Trans. Neural Networks 4,
73–85 (1993)
378 L. Ying, S. Qigang, and L. Na
3. Zhang, J., Walter, G.G., Lee, W.N.W.: Wavelet neural networks for function learn-
ing. IEEE Trans. Signal Processing 43, 1485–1497 (1995)
4. Zhang, Q.: Using wavelet networks in nonparametric estimation. IEEE Trans. Neu-
ral Networks 8, 227–236 (1997)
5. Alonge, F., Dippolito, F., Mantione, S., Raimondi, F.M.: A new method for optimal
synthesis of wavelet-based neural networks suitable for identification purposes. In:
Proc. 14th IFAC, Beijing, P.R. China, June 1999, pp. 445–450 (1999)
6. Li, X., Wang, Z., Xu, L., Liu, J.: Combined construction of wavelet neural networks
for nonlinear system modeling. In: Proc. 14th IFAC, Beijing, P.R. China, June 1999,
pp. 451–456 (1999)
7. Chen, J., Bruns, D.D.: WaveARX neural network development for system identi-
fication using a systematic design synthesis. Ind. Eng. Chem. Res. 34, 4420–4435
(1995)
8. Daniel, J., Ho, W.C., et al.: Fuzzy wavelet networks for function learning. IEEE
Trans. on Fuzzy Systems 9(1), 200–211 (2001)
9. Eberhart, R.C., Kennedy, J.: A new optimizer using particle swarm theory.
In: Proc. 6th Int. Symp. Micro Machine and Human Science, Nagoya, Japan,
pp. 39–43 (1995)
10. Daubechies, I.: Ten lectures on wavelets CBMS. SIAM 61, 194–202 (1994)
11. Strang, G., Nguyen, T.: Wavelets and Filter Banks. Wellesley-Cambridge Press
(1996)
12. Chui, C., Wang, J.: A general framework for compactly supported splines and
wavelets. Proceedings of Americal Mathematical Soceity 113, 785–793
13. Eberhart, R.C., Simpson, P., Dobbins, R.: Computational Intelligence PC Tools:
Academic, ch. 6, pp. 212–226 (1996)
14. Eberhart, R.C., Shi, Y.: Comparison between genetic algorithms and particle
swarm optimization. In: Porto, V.W., Saravanan, N., Waagen, D., Eiben, A.E.
(eds.) EP 1998. LNCS, vol. 1447, pp. 611–618. Springer, Heidelberg (1998)
Research on Nonlinear Time Series Forecasting of
Time-Delay NN Embedded with Bayesian Regularization
1 Introduction
Basically, the traditional time serial forecasting model was built on the foundation of
statistic technology, they are mainly subjected to linear model, which has advantage
of simple nature and strong explanation ability, but as for the forecasting of tremen-
dous system’s evolution serial, it fails, especially when the change of real economic
system is strong, high level nonlinear characteristic and classical chaos appears in the
system, the common forecasting method can hardly handle it. Chaos theory is an
important discovery in nonlinear dynamic, which come into being from 1960’s, until
1980’s, it have developed to be a new study with special concept system and method
frame. Because of this, people have some new acknowledgement on the complexity
of time serial. According to utilize the phase space reconstruction technology of cha-
otic theory, economic time serial is embedded in the reconstructed phase space, and
with the help of fractal theory and symbol dynamics, the complex moving characteris-
tic of economic dynamic system can be described, furthermore, the inner rule can be
found and the forecasting report is produced. Therefore, it has a realistic meaning and
scientific value[1-6] to find the evolution rule of economic time serial on the stand of
nonlinear dynamics and study the chaotic characteristic and nonlinear forecasting of
complex economic dynamic system.
H. Deng et al. (Eds.): AICI 2009, LNAI 5855, pp. 379–388, 2009.
© Springer-Verlag Berlin Heidelberg 2009
380 W. Jiang et al.
besides, it has faster response capability and smaller number of iterate than other simi-
lar methods.
As for the nonlinear characteristic of trade serial, this paper put the established
TDBP model into some certain occupation’s trade forecasting, according to adopt the
actual amount of imp&exp of 1/1989-5/2003 as the ample of training network, a new
model was founded[9,10], which successfully forecast and combine this occupation’s
international trade affairs[8]. Meanwhile, on the foundation of analyzing forecasting
precision, this paper gave a capability evaluation for the nonlinear forecasting model
according the maximum Lyapunov index and the relation dimension between com-
bined serial and original data.
⎡X ( j +τ ) ⎤ ⎛ ⎡ X ( j) ⎤⎞
⎢ X ( j) ⎥ ⎜⎢ ⎥⎟
⎢ ⎜
⎥=F ⎢ X ( j − τ ) ⎥⎟
(2)
⎢M ⎥ ⎜ ⎢M ⎥⎟
⎢ ⎥ ⎜ ⎟
⎜ ⎢ X ( j − (m − 1)τ ⎥ ⎟
⎣ X ( j − (m − 2)τ ⎦ ⎝ ⎣ ⎦ ⎠
Then forecasting reflection F : Rm→Rm can be denoted as:
X(j+ τ )= F (X(j),X(j- τ )…,X(j-(m-1) τ )) (3)
And sse denotes the error’s square sum, ssw denotes the square sum of all power
coefficient’s valve value in the network, α , β are the coefficient of objective func-
tion. Suppose that network’s power value is random variable, after the output data is
figured out, we can use Bayesian formula to revamp the ratio density function of
power value:
P ( D | w, β , M ) P ( D | w | α , M )
P ( w | D, α , β , M ) = (5)
P( D | α , β , M )
And w denotes network power value’s vector, D denotes data set, M denotes network
model. P ( D | α , β , M ) is full probability, P ( w | α , M ) denotes the first check
density function of power vector, P ( D | w, β , M ) is similar function of timing
output, commonly supposed that noise and power vector exist in data follow the
Gauss distribution, it means:
P ( D | w, β , M ) = e − βsse / Z D ( β ) (6)
P( w | α , M ) = e −αssw / Z w (α ) (7)
e − (αssw+ βsse) /[ Z w (α ) Z D ( β )]
P ( w | D, α , β , M ) = (8)
P( D | α , β , M )
= e − F ( w) / Z F (α , β )
From formula (8), we can see that the optimal power value posses the biggest last
check probability, and maximum last validation probability equals to the minimum
regularization objective function F( w ) = βsse + αssw . If α << β , network train-
ing can reduce error to minimum; and if α >> β , network training will automati-
cally reduce the efficient network parameters, which is to make up to big network
error. On the condition that the error of network training is as small as possible, it is
evident that new capability objective function is efficient, which can minimize the
efficient parameter and lessen the scale of network.
Here, we adopt Bayesian method to certain the optimal regularization parame-
ter α , β , suppose α , β ‘s first validation function is P (α , β , M ) , then the last is:
P(D | α, β, M)P(α, β, M)
P(α, β | D, M) = (9)
P(D | M)
Research on Nonlinear Time Series Forecasting of Time-Delay NN 383
Formula (9) illustrate that the last validation of maximum regularization parame-
ter α , β is equal to maximum similar function P ( D | α , β , M ) . Combining (5) and
(8), formula (10) can be concluded:
Z F (α , β )
P( D | α , β , M ) = (10)
Z w (α ) Z D ( β )
()
In order to figure Z F (α , β ) out, unfolding F w on the minimum point w*, because
gradient is 0, then:
F(w)=F(w*)+(w-w*)TH(w*)(w-w*)/2 (11)
(2π ) N / 2 − F ( w*)
= e [det( H ( w*)−1 )]1/ 2
Z F (α , β )
Then:
Putting formula (13) into formula(10), obtaining logarithm from both sides, besides,
utilizing the one-power condition of optimal value to obtain the optimal regularization
parameter:
γ n−γ
α* = , β* = (14)
2 E w ( w*) 2 E D ( w*)
reflect the actual scale of network. Paper [8] pointed out that: as for Hison matrix H,
when using quick algorithm Levenberg-Marquardt to train the network, it was easy to
approach by Gauss-Network method. In the process of training, sse and ssw can be
the factor to figure out the number of nerve cell of hidden level (marked as Nhl) by
following the efficient parameter γ , after certain times of iterate, when these three
parameters is in a forever status or smaller, it gives a sign that this network is in con-
vergence status, and it is the time to stop training; selecting a smaller Nhl, continu-
ously increasing the number of hidden nerve cell until current Nhl makes network
convergence, then the state of sse and ssw will be unchangeable, as a result, this Nhl
can be regarded as the spot number of hidden-level.
This paper adopts single-output three-level BP neural network, separately, the
stimulating function of these three levels is logsig, logsig, pureline. The nerve cell
384 W. Jiang et al.
number of input level equals to the embedded dimension number of nonlinear time
serial, besides, many study has proved that the number of maturity embedded dimen-
sion of time serial can be the nerve cell number of network input level. Here, we ob-
tained the maturity embedded dimension number by utilizing G-P algorithm. we use
MatLab language to build the concrete model and on the aid of tool box function
trainbr() Bayesian Regularization algorithm is realized. Fig.1 gives out the basic
frame of the model.
Nhl=Nhl
Figure +1
the Find
Num- N
The
ber Build Result
Of Y num- Ana-
Giv- ber. net-
Input
Level
spot
Study
Sam-
ple
ing
train-
ing
,
doesγ
sseey
Of
Hid-
work
Fore-
cast
Emu-
lator
And
lyze
And
Fore-
And den
Origi- com- ssw is level Mode Fore- cast
nal Tutor bine-d steady l cast evalua-
Data signal Nl/L- ? tion
Selec- M
tion And
BR
Algo-
Data rithm
uni-
tary
Application
Data dealing in Founding the forecasting /evaluation
advance model
comes to 16, relation dimension is still mature. The time-delay τ is 1. When training
the network, we adopted unitary data of imp&exp trade’s month data from 1/1989 to
5/2003 to do the phase space reconstruction, and then respectively obtained 13-
dimension matrix or 16-dimension matrix which is be the study sample; the tutor
signal of network model is the unitary data of imp&exp trade’s month data from
2/1990 to 6/2003 and from 5/1990 to 6/2003 separately. As for different Nhl, utilizing
these study samples and tutor signals to do network training until the network is in the
status of convergence, the obtained parameter is illustrated by table.1 and table 2.
The network training parameter in table.1 demonstrates that: when network model
of imp&exp trade is under the condition that N hl ≥ 6 or N hl ≥ 8 , γ , sse and ssw
is steady, as a results, the hidden level nerve cell number of nonlinear forecasting
model of imp&exp trade are 6 and 8. On the foundation of these parameters, TDBP
network model of imp&exp trade can be established, which can give nonlinear
forecast for future development.
4 Example Verification
This paper used multi-steps forecasting method, which feedback the forecasting value
to the input end, reconstruct input matrix and do next step forecasting. Here the 12-
step of 7/2003-6/2004 is given out: as for the data of 6/2003-12/2003, we can com-
pare it directly from the original data, from which we can judge the generalization
capability of neural network and the forecasting precision of non-linear model; the
forecasting results of 1/2004-6/2004 shows the develop trend of our country’s
386 W. Jiang et al.
Table 3. The actual month data of imp&exp trade and forecasting results from 7/2003 to
12/2003 (10 thousand dollar)
Table 4. Imp&exp trade forecasting results from 1/2004 to 6/2004 (10 thousand dollar)
Table 5. Nonlinear characteristic calculation result of original month serial and forecast serial
import export
characteristic sample
Original serial Forecast serial Original serial Forecast serial
Relation dimension 1.4576 1.4206 1.6819 1.9697
Maximum Lyapunov index 0.0061 0.0060 0.0115 0.0167
imp&exp trade. The actual month data of imp&exp trade and forecasting results is
generalized in table.3 and table.4.
The results of table.3 shows that forecasting value is very close to the actual value
except the data of 1/2003’s export trade, and the relative error is under 10 percent,
single step forecasting relative error is below 1 percent. Meanwhile combining the
figure and forecasting data, apparently, the seasonal change trend of imp&exp trade
can be predicted, and the forecasting point it out that the amount of imp&exp trade of
the second half year of 2003 is larger than the first half one, when it come to Decem-
ber, the volume steps to the maximum number, then imp&exp volume of January and
February suffer a slow down, but January is little bigger than February, all the change
is in accordance with the actual development. Usually, the corporation like to finish the
good exchange at the end of one year, and the custom’s statistic is balanced before the
new year, because China’s spring festival is always in January or February, as a results,
the amount of December is usually the largest one, January and February the least,
owing to the lag factor of last year, amount of January will bigger than February. The
Research on Nonlinear Time Series Forecasting of Time-Delay NN 387
forecasting demonstrated that: there will be a trade deficit partly because of model
factor; actually, it is also the results of environment change. On the stand of interna-
tional, the value of dollar may rise, which can hoist the value of RMB, unfortunately,
this can sharpen down the capability of China’s export trade and prompts import deals.
After the Cancun meeting, EU and USA will give more attention on business coordina-
tion of bilateral and distribute economic, which can prompt the coordination of distrib-
ute economic, strengthen the international trade protectionism and descent a blow for
our country’s strategy of broadening export volume and absorbing foreign investment.
As for the domestic aspect, the drop of custom will promote China’s import volume;
every blade has two curves, proper unfavorable balance of trade will buffer the pres-
sure of RMB appreciation. Base on these factors, the forecasting that the amount of
export trade in 2004 will change more, and the import trade may increase faster than
export, In a word, emulation and forecasting basically combined the training data and
develop trend.
Table.5 shows that the nonlinear characteristic value of the two import serials al-
most identical; as for export trade, the forecasting serial and original serial have tiny
difference, but it is not a big deal, besides, when the embedded dimension is 16, these
two serial’s relation dimension reach the point of maturity. Actually, many scientific
studies had proved that there are more factors are influencing the export trade than
import, which produces more noise in the export trade serial and fades the nonlinear
characteristic of actual system. Fortunately, the model of this paper is insensitive of
noise, and it has a good capability of generalization, which can enable the model
obtain a direct nonlinear expression form from the dynamic system, produce the com-
bination serial and efficiently ‘catch’ the dynamics characteristic of nonlinear system
that produced the original serial.
5 Conclusions
As a kind of nonlinear dynamics method, neural network has a strong capability to
close nonlinear reflection, and that is why it is so suitable in modeling and forecasting
of nonlinear economic system. Up to this day, the scholar has reaped enormous actual
fruit from applying neural network in the problems of economic management. Based
on nonlinear forecasting theory of phase reconstruction, this paper has built a time-
delay BP neural network model. Considering the weak generalization capability of
common BP neural network, this paper combined the Bayesian Regularization to
improve the objective function and the generalization capability of neural network. In
order to explain more vividly, this paper then adopted some certain occupation’s ac-
tual monthly imp&exp data of 1/1989-5/2003 as training sample, finally, it built a
multi-steps nonlinear forecasting model. The results of imp&exp trade’s multi-step
forecast showed that the model could not only reasonably forecast the develop trend
of actual serial, but also combined the actual data, besides, the relative error of single
forecast is in the scope of 1 percent. In addition, nowadays, the forecast evaluation
only focus on the improvement of the combination of comparing data point and the
forecasting precision, which lead to the overlook of the difference between combined
serial and original serial. When the system is nonlinear, it got a problem, because in
nonlinear system, even two identical serial may come from two different sub-systems.
388 W. Jiang et al.
For this problem, this paper made an improvement: utilizing the relation dimension of
combined serial and original data and the maximum Lyapunov index, comparing the
memory structure of combined serial and original serial, and then finding that the
model can efficiently catch up the inner structure and rule of the nonlinear system that
produce the original serial. The model of this paper has something really special from
others, its free parameter is fewer, it has a strong capability of robustness and gener-
alization, which make it very suitable for the forecasting of nonlinear chaos time
serial.
Acknowledgement
This paper is supported by the Society Science Foundation of Hunan Province of
China No. 07YBB239.
An Adaptive Learning Algorithm for Supervised Neural
Network with Contour Preserving Classification
1 Introduction
It is known that repetitive feeding of training samples is required for a supervised learning algorithm to converge. If the training samples effectively represent the population of the targeted data, the classifier can be regarded as approximately generalized. However, it is often impractical to obtain such a truly representative training set. Many classification applications are acceptable with convergence to a local optimum. For example, a voice recognition system may be customized for effectively recognizing the voices of a limited group of users, so the system may not be practical for recognizing arbitrary speakers. As a consequence, this kind of application needs occasional retraining when the actual context locality shifts.
Our focus is on the case where only part of the context changes, thereby establishing some new cases while inhibiting some old cases, under the assumption of a constant system complexity. The classifier will be required to handle some old cases as well as the new cases effectively. Assuming that this kind of situation occurs occasionally, it is expected that the oldest cases will age out, the medium-old cases will be handled accurately to a certain degree, and the new cases will be handled most accurately. Since the existing knowledge is lost while retraining on new samples, an approach to maintaining old knowledge is required. While the typical solution uses both prior samples and new samples for retraining, the major drawback of this approach is that all the prior training samples must be maintained.
Research related to the proposed algorithm lies in the fields of adaptive learning [1], [2], incremental learning [16], and contour preserving classification [3]. The first of these can be categorized into three strategies [3]. The first strategy (increasing neurons) [4], [5], [6], [7] increases the number of hidden nodes when the error is excessively high. These algorithms adapt only the weights of the neuron closest to the input sample and its neighbors. However, the increasing size of the neural network comes at a cost in accuracy. The second strategy (rule extraction) [8], [9], [10], [11] converts between rules and neuron weights. The accuracy of the network depends on the discovered rules, and the translation of weight vectors into rules also partly suppresses certain inherited statistical information. The last strategy (aggregation) [12], [13], [14], [15] allows existing weights to change within a bounded range and always adds new neurons for learning samples of the new context. This method requires the two contexts to be similar, and the network size grows incrementally.
In this paper, an alternative algorithm is proposed for solving the adaptive learning problem for supervised neural networks. The algorithm, improved from [17], [18], is able to learn new knowledge while maintaining old knowledge through the decay rate, while allowing the adjustment of the number of new samples. In addition, improved classification and noise tolerance are achieved by utilizing the contour preserving classification algorithm, which helps expand the territory of both classes while maintaining the shape of both classes.
Following this section, section 2 summarizes the outpost vector model. Section 3 describes the methodology. Section 4 demonstrates the experimental results of a 2-dimension partition problem. Section 5 concludes the paper.
[Figure: outpost vector of Ak taking B*(Ak) as the nearest, optional outpost vector of B*(Ak) taking Aj as the nearest, shown at the boundary between Ak's territory and B*(Ak)'s territory]
3 Methodology
The algorithm [17], [18] utilizes the concepts of adaptive learning and outpost vectors in modeling the new training samples. The adaptive learning algorithm maintains a limited number of training samples from the previous training session (decayed prior samples) to be used in the next training session, while the outpost vectors help expand the territory of both classes and maintain the shape of the boundary between them.
There are three parameters in the algorithm: the new sample rate, the outpost vector rate, and the decay rate. Firstly, the new sample rate is the ratio of the number of selected new samples to the number of new samples. It determines the number of selected new samples to be included in the final training set. A larger new sample rate causes the network to learn new knowledge more accurately. The number of selected new samples is calculated by the formula:
nss = nw × ns (1)
nov = ov × ns (2)
ndc = dc × ps (3)
be moved across their boundary. However, the new location must also be outside the
territory of the other class.
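To make the composition of a final sub training set concrete, the following is a minimal Python sketch based on formulas (1)–(3); the function and variable names (new_samples, prior_samples, outpost_vectors, and the sampling helpers) are hypothetical and not taken from the paper, and drawing decayed prior samples with replacement is an assumption made so that a decay rate larger than 1.0 can be illustrated.

```python
import random

def compose_final_sub_training_set(new_samples, prior_samples, outpost_vectors,
                                   new_sample_rate, outpost_vector_rate, decay_rate):
    """Sketch: build one final sub training set from the three parts in formulas (1)-(3)."""
    nss = int(new_sample_rate * len(new_samples))        # (1) nss = nw x ns
    nov = int(outpost_vector_rate * len(new_samples))    # (2) nov = ov x ns
    ndc = int(decay_rate * len(prior_samples))           # (3) ndc = dc x ps

    selected_new = random.sample(new_samples, min(nss, len(new_samples)))
    selected_outposts = random.sample(outpost_vectors, min(nov, len(outpost_vectors)))
    # decay rate may exceed 1.0 (e.g. 2.0), so prior samples are drawn with replacement here
    decayed_prior = random.choices(prior_samples, k=ndc) if prior_samples else []
    return selected_new + selected_outposts + decayed_prior
```

Calling this function once per context, with the previous session's final sub training set passed in as prior_samples, mirrors the sub training procedure described in the experiment section.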
4 Experiment
The experiment was conducted on a machine with an Intel® Pentium D™ 820 2.4 GHz processor and 2.0 GB of main memory running Microsoft® Windows™ XP SP3. A feed-forward backpropagation neural network running under MATLAB 2007a was used as the classifier.
The proposed algorithm was tested on a 2-dimension partition problem. The distribution of samples was created in a limited region of a 2-dimension donut ring as shown in Fig. 2. This partition had three parameters: Inner Radius (R1), Middle Radius (R2) and Outer Radius (R3). The class of a sample depended on its geometric position. There were two classes, designated as one and zero.
The context of the problem was assumed to shift from one angular location to another while maintaining some overlap between consecutive contexts, as shown in Fig. 3. The set numbers shown in Fig. 3 identify the sequence of training and testing sessions.
In the experiment, each training and testing set consisted of eight sub sets of samples generated from eight problem contexts. Each sub set consisted of 400 new samples (200 samples from class 0 and 200 samples from class 1). The samples from class 0 were placed in the outer and inner regions, and the samples from class 1 were placed in the middle region.
In the sample generation process, the radius of the donut ring was set to 100.
Placement of the samples from both classes was restricted by the gap or empty space
introduced between both classes to test the noise tolerance of the network.
There were two training sets having the gap of size 5 (Set A) and 10 (Set B) as
shown in Fig. 4. The gap of size 0 was not used to generate the training set because
there was no available space at the boundary between both classes to generate outpost
vectors.
[Fig. 2. The 2-dimension donut ring with class 0 / class 1 / class 0 regions separated by radii R1, R2 and R3, Θ = 0°]
[Fig. 4. Sample sub training sets (Class 0, Class 1, Class 0): Set A (Gap 5) and Set B (Gap 10)]
There were six main testing sets having the gap and noise radius 5:0 (Set M), 5:5
(Set N), 5:10 (Set O), 10:0 (Set P), 10:10 (Set Q), and 10:20 (Set R) as shown in
Fig. 5. To test the noise tolerance of the training sets, four testing sets having the gap
and noise radius 5:5 (Set S), 5:10 (Set T), 10:10 (Set U), and 10:20 (Set V) were gen-
erated by adding noise radius to all samples in the two training sets. The noise sam-
ples introduced into the testing set (Set N, O, Q, R, S, T, U, V) were intended to test
the noise tolerance of the network when outpost vector was applied.
(a) Set M, Gap 5, Noise 0 (b) Set N, Gap 5, Noise 5 (c) Set O, Gap 5, Noise 10
(d) Set P, Gap 10, Noise 0 (e) Set Q, Gap 10, Noise 10 (f) Set R, Gap 10, Noise 20
Fig. 5. Sample Sub Testing Sets in S8
In the training process, sub training (training on a sub set of the training set) was conducted eight times with eight sub training sets to cover the eight problem contexts in a training set. Each final sub training set was composed of three components:
1. Selected new samples taken from the sub training set
2. Outpost vectors generated from the sub training set
3. Decayed prior samples randomly selected from the final sub training set of the previous training session
The number of vectors in each part of the final sub training set was determined by the new sample rate, the outpost vector rate, and the decay rate. Table 1 shows the number of vectors in a sample final sub training set when the new sample set consisted of 200 samples, the new sample rate was 1.0, the outpost vector rate was 0.5, and the decay rate was 1.0. Fig. 6 shows samples of the final training set for the last training session when the decay rate equals 1.0 and 2.0, respectively.
(a) 1200 Samples, Gap 5, Decay 1 (b) 1600 Samples, Gap 5, Decay 2
Fig. 6. Sample Last Sub Training Set
Because the prior sample set is constructed at the end of the algorithm, there was no prior sample set from which to draw decayed prior samples for the first sub training session. An additional step solves this problem by also using the new sample set as the prior sample set.
The experimental procedure started with the feed-forward back-propagation neural network being trained with the following parameters:
1. network size = [10 1]
2. transfer function for hidden layer = “logsig”
3. transfer function for output layer = “logsig”
4. max epochs = 500
5. goal = 0.001
After the first training session (S1), seven sub training sessions followed. At the end of the eighth sub training session (S8), the supervised neural network was tested with the sub testing samples from every context to evaluate its performance on the testing data from each context.
The testing results are shown in Tables 2–8. For the testing sets without noise samples (Set M, P), applying outpost vectors lowers the mean square error (MSE) effectively. For the testing sets with noise samples and a small gap (Set N, O), a medium outpost vector rate (OV 0.5) gives better results. For the testing sets with noise samples and a large gap (Set Q, R), a large outpost vector rate (OV 1.0) gives better results. For the testing sets with noise samples generated from the training sets (Set S, T, U, V), a large outpost vector rate (OV 1.0) generally gives better results.
The testing results show that the proposed algorithm can classify samples in the
newer contexts (S1, S7, S8) accurately while the accuracy of classifying samples from
the older contexts (S2, S3, S4, S5, S6) is lower because its old knowledge is decaying.
The proposed algorithm presents some level of noise tolerance because the difference
between the mean square errors (MSEs) of the classification of testing sets with and
without noise samples is insignificant.
Table 2. MSEs from Training Set A with Testing Set M, N, O and Decay Rate 1
SET OV NS S1 S2 S3 S4 S5 S6 S7 S8
M 0.0 00 0.13 0.21 0.17 0.16 0.25 0.14 0.01 0.03
N 0.0 05 0.14 0.19 0.12 0.17 0.26 0.14 0.01 0.04
O 0.0 10 0.14 0.21 0.16 0.19 0.25 0.14 0.03 0.04
M 0.5 00 0.02 0.14 0.15 0.02 0.00 0.00 0.00 0.00
N 0.5 05 0.02 0.14 0.15 0.03 0.00 0.00 0.00 0.00
O 0.5 10 0.02 0.15 0.14 0.03 0.01 0.00 0.00 0.00
M 1.0 00 0.20 0.38 0.27 0.14 0.06 0.02 0.01 0.01
N 1.0 05 0.21 0.38 0.27 0.13 0.06 0.03 0.01 0.01
O 1.0 10 0.21 0.37 0.27 0.14 0.07 0.03 0.02 0.01
Table 3. MSEs from Training Set A with Testing Set M, N, O and Decay Rate 2
SET OV NS S1 S2 S3 S4 S5 S6 S7 S8
M 0.0 00 0.27 0.30 0.28 0.28 0.28 0.23 0.21 0.21
N 0.0 05 0.28 0.30 0.28 0.27 0.28 0.25 0.20 0.22
O 0.0 10 0.27 0.30 0.27 0.28 0.27 0.23 0.21 0.21
M 0.5 00 0.12 0.23 0.13 0.02 0.01 0.00 0.00 0.00
N 0.5 05 0.12 0.23 0.13 0.03 0.01 0.00 0.00 0.00
O 0.5 10 0.12 0.24 0.13 0.03 0.02 0.01 0.01 0.01
M 1.0 00 0.18 0.32 0.31 0.20 0.11 0.08 0.07 0.05
N 1.0 05 0.18 0.32 0.31 0.20 0.10 0.09 0.07 0.06
O 1.0 10 0.18 0.32 0.31 0.20 0.11 0.09 0.08 0.06
Table 4. MSEs from Training Set B with Testing Set P, Q, R and Decay Rate 1
SET OV NS S1 S2 S3 S4 S5 S6 S7 S8
P 0.0 00 0.17 0.30 0.26 0.19 0.21 0.16 0.08 0.06
Q 0.0 10 0.18 0.31 0.27 0.20 0.21 0.16 0.08 0.07
R 0.0 20 0.17 0.30 0.27 0.20 0.21 0.17 0.10 0.09
P 0.5 00 0.27 0.45 0.38 0.39 0.35 0.19 0.08 0.04
Q 0.5 10 0.27 0.45 0.38 0.39 0.34 0.20 0.08 0.04
R 0.5 20 0.27 0.45 0.37 0.38 0.33 0.21 0.09 0.07
P 1.0 00 0.22 0.31 0.22 0.26 0.25 0.23 0.12 0.01
Q 1.0 10 0.25 0.34 0.22 0.27 0.25 0.23 0.13 0.01
R 1.0 20 0.24 0.34 0.25 0.29 0.24 0.25 0.14 0.05
Table 5. MSEs from Training Set B with Testing Set P, Q, R and Decay Rate 2
SET OV NS S1 S2 S3 S4 S5 S6 S7 S8
P 0.0 00 0.22 0.32 0.35 0.33 0.28 0.25 0.20 0.17
Q 0.0 10 0.24 0.32 0.35 0.33 0.29 0.25 0.21 0.18
R 0.0 20 0.25 0.32 0.35 0.34 0.29 0.25 0.21 0.20
P 0.5 00 0.16 0.31 0.34 0.29 0.19 0.08 0.03 0.03
Q 0.5 10 0.16 0.31 0.34 0.29 0.19 0.08 0.03 0.04
R 0.5 20 0.16 0.31 0.33 0.29 0.20 0.10 0.05 0.06
P 1.0 00 0.17 0.28 0.15 0.03 0.09 0.10 0.01 0.00
Q 1.0 10 0.17 0.30 0.16 0.04 0.09 0.09 0.01 0.01
R 1.0 20 0.18 0.30 0.17 0.08 0.11 0.11 0.02 0.02
Table 6. MSEs from Training Set A with Testing Set S, T and Decay Rate 1
SET OV NS S1 S2 S3 S4 S5 S6 S7 S8
S 0.0 05 0.13 0.19 0.14 0.15 0.24 0.14 0.01 0.02
T 0.0 10 0.13 0.25 0.18 0.14 0.24 0.14 0.08 0.02
S 0.5 05 0.01 0.14 0.15 0.02 0.00 0.00 0.00 0.00
T 0.5 10 0.02 0.15 0.14 0.02 0.00 0.00 0.02 0.00
S 1.0 05 0.19 0.37 0.27 0.14 0.06 0.03 0.01 0.01
T 1.0 10 0.18 0.39 0.26 0.14 0.07 0.02 0.02 0.01
Table 7. MSEs from Training Set B with Testing Set U, V and Decay Rate 1
SET OV NS S1 S2 S3 S4 S5 S6 S7 S8
U 0.0 10 0.18 0.30 0.27 0.20 0.21 0.16 0.07 0.08
V 0.0 20 0.20 0.30 0.26 0.21 0.21 0.16 0.09 0.07
U 0.5 10 0.29 0.43 0.41 0.39 0.35 0.17 0.07 0.05
V 0.5 20 0.31 0.44 0.37 0.42 0.35 0.17 0.08 0.06
U 1.0 10 0.24 0.32 0.22 0.27 0.24 0.21 0.11 0.01
V 1.0 20 0.28 0.29 0.23 0.27 0.25 0.21 0.14 0.03
Table 8. MSEs from Training Set B with Testing Set U, V and Decay Rate 2
SET OV NS S1 S2 S3 S4 S5 S6 S7 S8
U 0.0 10 0.23 0.37 0.30 0.33 0.29 0.24 0.20 0.19
V 0.0 20 0.28 0.26 0.37 0.29 0.28 0.24 0.21 0.18
U 0.5 10 0.18 0.33 0.34 0.29 0.19 0.07 0.03 0.04
V 0.5 20 0.20 0.30 0.33 0.29 0.19 0.07 0.03 0.05
U 1.0 10 0.18 0.31 0.17 0.03 0.08 0.10 0.01 0.01
V 1.0 20 0.21 0.26 0.15 0.08 0.09 0.10 0.02 0.01
5 Conclusion
A study of the noise tolerance characteristics of an adaptive learning algorithm for supervised neural networks is presented. Noise samples are used to test the noise tolerance of the algorithm. The overall results show that combining the adaptive learning algorithm with contour preserving classification yields effective noise tolerance, better learning capability, and higher accuracy.
References
[1] Tanprasert, T., Kripruksawan, T.: An approach to control aging rate of neural networks
under adaptation to gradually changing context. In: ICONIP 2002 (2002)
[2] Tanprasert, T., Kaitikunkajorn, S.: Improving synthesis process of decayed prior sam-
pling technique. In: Tech 2005 (2005)
[3] Tanprasert, T., Tanprasert, C., Lursinsap, C.: Contour preserving classification for maxi-
mal reliability. In: IJCNN 1998 (1998)
[4] Burzevski, V., Mohan, C.K.: Hierarchical growing cell structures. In: ICNN 1996 (1996)
[5] Fritzke, B.: Vector quantization with a growing and splitting elastic net. In: ICANN 1993
(1993)
[6] Fritzke, B.: Incremental learning of local linear mappings. In: ICANN 1995 (1995)
[7] Martinez, T.M., Berkovich, S.G., Schulten, K.J.: Neural-gas network for vector quantiza-
tion and its application to time-series prediction. IEEE Transactions on Neural Networks
(1993)
[8] Chalup, S., Hayward, R., Joachi, D.: Rule extraction from artificial neural networks
trained on elementary number classification tasks. In: Proceedings of the 9th Australian
Conference on Neural Networks (1998)
[9] Craven, M.W., Shavlik, J.W.: Using sampling and queries to extract rules from trained
neural networks. In: ICML 1994 (1994)
[10] Setiono, R.: Extracting rules from neural networks by pruning and hidden-unit splitting.
Neural Computation (1997)
[11] Sun, R.: Beyond simple rule extraction: Acquiring planning knowledge from neural net-
works. In: ICONIP 2001 (2001)
[12] Thrun, S., Mitchell, T.M.: Integrating inductive neural network learning and explanation
based learning. In: IJCAI 1993 (1993)
[13] Towell, G.G., Shavlik, J.W.: Knowledge based artificial neural networks. Artificial Intel-
ligence (1994)
[14] Mitchell, T., Thrun, S.B.: Learning analytically and inductively. Mind Matters: A Tribute
to Allen Newell (1996)
[15] Fasconi, P., Gori, M., Maggini, M., Soda, G.: Unified integration of explicit knowledge
and learning by example in recurrent networks. IEEE Transactions on Knowledge and
Data Engineering (1995)
[16] Polikar, R., Udpa, L., Udpa, S.S., Honavar, V.: Learn++: An incremental learning algo-
rithm for supervised neural networks. IEEE Transactions on Systems, Man, and
Cybernetics (2001)
[17] Tanprasert, T., Fuangkhon, P., Tanprasert, C.: An Improved Technique for Retraining
Neural Networks In Adaptive Environment. In: INTECH 2008 (2008)
[18] Fuangkhon, P., Tanprasert, T.: An Incremental Learning Algorithm for Supervised Neu-
ral Network with Contour Preserving Classification. In: ECTI-CON 2009 (2009)
Application Study of Hidden Markov Model and
Maximum Entropy in Text Information Extraction
1 Introduction
The universal application of the WWW has caused the quantity of on-line text to increase exponentially; how to deal with this huge amount of on-line text information has therefore become an important research subject. Automatic text information extraction is an important part of text information processing [1]. Text information extraction refers to automatically extracting related or specific types of information from text. At present, there are mainly three kinds of text information extraction models: the dictionary-based extraction model [2], the rule-based extraction model [3] and the extraction model based on the Hidden Markov Model (HMM) [4-8].
Text information extraction using HMM is a kind of information extraction method based on statistical machine learning. HMM is easy to establish and does not need
∗ This work is supported by the Natural Science Foundation of China (Grant #60775041).
a large-scale dictionary or rule set, its adaptability is good and its extraction precision is high, so HMM has attracted researchers' attention. For example, in Reference [4], Kristie Seymore et al. used HMM to extract header information, such as title, author and abstract, from computer science research papers. In Reference [5], Dayne Freitag and Andrew McCallum adopted a "shrinkage" technique to improve the probability estimates of the HMM information extraction model. In Reference [6], Freitag D and McCallum A used a stochastic optimization technique to automatically select the most suitable HMM structure for information extraction. Souyma Ray and Mark Craven [7] applied phrase structure analysis from natural language processing to HMM text information extraction. T. Scheffer et al. [8] used active learning to reduce the labeled data needed to train an HMM information extraction model. The HMM method does not consider the characteristic information of the text context or the characteristic information contained in the text words themselves, but this information is very useful for correct text information extraction. Freitag D et al. [9] proposed a Maximum Entropy Markov Model (MEMM) for the segmentation of questions and answers in FAQ (Frequently Asked Questions) texts. MEMM is a kind of exponential model. It mainly takes abstract text characteristics as input and selects the next state on the basis of Markov state transitions; in this respect it is close to a finite state automaton. Since MEMM incorporates the text context characteristic information and the characteristic information contained in the text words themselves into the Markov model, it can improve the performance of information extraction. However, it does not use concrete word statistics and considers only abstract characteristics, which causes its performance to be inferior to that of HMM in certain circumstances.
In this paper, we combine the advantage of the maximum entropy model, which can integrate and process rules and knowledge efficiently, with that of the hidden Markov model, which has a powerful technical foundation for sequence representation and statistical problems, and present a Maximum Entropy-based Hidden Markov Model (ME-HMM) for text information extraction. The algorithm uses the weighted sum of all features to adjust the transition parameters of the hidden Markov model. Experimental results show that, compared with the simple hidden Markov model, the new algorithm improves the performance in precision and recall.
An HMM includes two layers: an observation layer and a hidden layer. The observation layer is the observation sequence to be recognized, and the hidden layer is a Markov process (i.e., a finite state machine) in which each state transition has a transition probability.
An HMM is specified by a five-tuple (S, V, A, B, Π):
S = {S1, S2, …, SN} is the set of hidden states;
V = {V1, V2, …, VM} is the set of observation symbols;
A = {a_ij = p(q_{t+1} = Sj | q_t = Si), 1 ≤ i, j ≤ N} is the state transition probability matrix;
B = {b_j(V_k) = p(o_t = V_k | q_t = Sj), 1 ≤ j ≤ N, 1 ≤ k ≤ M} is the observation probability matrix;
Π = {π_i = p(q_1 = Si), 1 ≤ i ≤ N} is the initial state distribution.
The hidden Markov model is mainly used to solve the following three fundamental problems: the evaluation problem, the learning problem and the decoding problem. For the commonly used algorithms, see References [10-12]. Text information extraction needs to solve the learning problem and the decoding problem of HMM. The purpose of text information extraction is to extract specific information of interest from a large quantity of information, namely to extract items such as author, publisher, publication time, affiliation and so on from the "word string" sequences composed of different text information, which is similar to part-of-speech tagging in the field of Chinese information processing. When extracting text information using HMM, the Maximum Likelihood (ML) algorithm for a labeled training sample set or the Baum-Welch algorithm for an unlabeled training sample set is generally adopted to obtain the HMM parameters. Then the Viterbi algorithm is used to find the state label sequence with the maximum probability for the input text, and the state labels are the content labels to be extracted, which are defined beforehand. In this paper, manually labeled training samples are adopted to train the parameters of the HMM-based text information extraction model. Information extraction is a two-stage process:
1) Obtain the HMM parameters from the training samples by statistical methods. The ML algorithm is adopted to construct the HMM and obtain the model parameters a_ij, b_j(V_k), π_i by counting. The ML algorithm requires a sufficient number of labeled training sequences, in which each word carries its corresponding class label. The number of transitions from state Si to Sj can then be counted and denoted c_ij; the number of times word V_k is output in state j is denoted E_j(k); similarly, the number of times a sequence starts from a specific state i is denoted Init(i). The probabilities can then be described as follows:
ability can be described as follows:
Init (i)
πi = N
,1 ≤ i ≤ N (1)
∑
j =1
Init ( j )
cij
aij = N ,1 ≤ i, j ≤ N (2)
∑ cik
k =1
E j (k )
bj (k ) = M
,1 ≤ i ≤ N , 1 ≤ j ≤ M (3)
∑
i =1
E j (i )
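As a rough illustration of formulas (1)–(3), here is a minimal Python sketch of the counting-based ML estimation; the data layout (sequences of (word, state_label) pairs) is an assumption made for the example rather than the paper's actual data format.

```python
from collections import defaultdict

def estimate_hmm_parameters(labeled_sequences):
    """ML estimation of (pi, A, B) by counting, following formulas (1)-(3).
    Each sequence is a list of (word, state_label) pairs."""
    init = defaultdict(int)                            # Init(i): sequences starting in state i
    trans = defaultdict(lambda: defaultdict(int))      # c_ij: transitions i -> j
    emit = defaultdict(lambda: defaultdict(int))       # E_j(k): word k emitted in state j

    for seq in labeled_sequences:
        init[seq[0][1]] += 1
        for word, state in seq:
            emit[state][word] += 1
        for (_, s1), (_, s2) in zip(seq, seq[1:]):
            trans[s1][s2] += 1

    total_init = sum(init.values())
    pi = {i: c / total_init for i, c in init.items()}                      # formula (1)
    A = {i: {j: c / sum(row.values()) for j, c in row.items()}
         for i, row in trans.items()}                                      # formula (2)
    B = {j: {w: c / sum(row.values()) for w, c in row.items()}
         for j, row in emit.items()}                                       # formula (3)
    return pi, A, B
```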
2) Apply the established HMM to text information extraction. The HMM-based text information extraction process seeks, given the HMM and a symbol sequence, the state sequence Q* that generates the symbol sequence with the maximum probability; the observation text marked with the target state labels is then the content of the information extraction. The Viterbi algorithm is the classical approach to the HMM decoding problem. To avoid the data underflow problem, an improved Viterbi algorithm is put forward in this paper. The improvement is that all the probabilities in the Viterbi formulas are multiplied by a proportion factor of 10², and then the logarithm is taken on both sides of the formulas to obtain the improved Viterbi formulas.
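A minimal log-domain Viterbi sketch in Python along these lines is shown below; it illustrates the idea just described (scale each probability by the proportion factor, then work with sums of logarithms), not the paper's exact formulation, and the dictionary-based parameter layout is an assumption.

```python
import math

def viterbi_log(obs, states, pi, A, B, scale=100.0, floor=1e-12):
    """Decode the most likely state sequence using log-scaled probabilities to avoid underflow."""
    def lg(p):
        # multiply by the proportion factor (10^2) before taking the logarithm
        return math.log(scale * max(p, floor))

    delta = [{s: lg(pi.get(s, 0.0)) + lg(B.get(s, {}).get(obs[0], 0.0)) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        delta.append({})
        back.append({})
        for j in states:
            best_i = max(states, key=lambda i: delta[t - 1][i] + lg(A.get(i, {}).get(j, 0.0)))
            delta[t][j] = (delta[t - 1][best_i] + lg(A.get(best_i, {}).get(j, 0.0))
                           + lg(B.get(j, {}).get(obs[t], 0.0)))
            back[t][j] = best_i
    # backtrack from the best final state
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path
```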
where C is the set of models satisfying the constraints. The remaining problem is to find p* in C. p* can be written in the following form:
p*(y | x) = (1 / Z(x)) exp[ Σ_i λ_i f_i(x, y) ]  (6)
where λ_i is a model parameter, which can also be seen as the weight of the characteristic. In 1972, Darroch and Ratcliff proposed the GIS (generalized iterative scaling) algorithm, a general iterative algorithm [16]. An advantage of the ME method is that the experimenter only needs to concentrate on which characteristics should be selected, not on how to use them.
In the maximum entropy method, we call a rule a characteristic. The idea of the maximum entropy method is to find a characteristic set and determine the degree of importance of each characteristic. The maximum entropy model can integrate each kind of characteristic and rule into a unified framework, and the hidden Markov model has a strong advantage in sequence representation and statistical learning. Therefore, if the maximum entropy method is incorporated into the hidden Markov model for text information extraction, we can both solve the knowledge representation problem and add newly gained language knowledge to the model at any time; the combination of the two methods is a union of the rule-based method and the statistical method. Following this idea, the paper adopts a hidden Markov model based on the maximum entropy principle for text information extraction. For ease of comparison, the three models HMM, MEMM and the ME-HMM proposed in this paper are shown in Figure 1.
[Fig. 1. Structures of the three models, showing the dependencies among state, observation and characteristic: (a) HMM, (b) MEMM, (c) ME-HMM]
In the above formula, NF is the number of chosen characteristics and NS is the number of states of the model. In the training stage, feature extraction is performed on each observation. Each observation in the training data set implicitly corresponds to a labeled state, so we count the relations between states and characteristics. By using the GIS algorithm [16], the probability matrix of the characteristic–state transition can be obtained. The GIS algorithm is as follows:
2) Take the random parameters as the 0th iteration of the GIS algorithm and set M_{i,j}^{(0)} = 1.
3) In the nth iteration, use the current value of M_{i,j}^{(n)} and calculate the expected value of each characteristic-state pair:
E_{i,j}^{(n)} = (1 / m_s) Σ_{k=1}^{m_s} Σ_{s ∈ S} P_{s'}^{(n)}(s | o_{t_k}) f_{i,j}(o_{t_k}, s)  (11)
P_{s'}^{(n)}(s | o_{t_k}) = (1 / Z(o, s')) exp[ Σ_i M_{i,j}^{(n)} f_{i,j}(o_{t_k}, s) ]  (12)
Z(o, s') is the normalization constant that guarantees the values sum to one over all next states s of state s'.
4) Subject to the constraints, compare the expected value with the average value over the training data to decide how to adjust the parameters. With a designated constant C, the adjustment formula is as follows:
M_{i,j}^{(n+1)} = M_{i,j}^{(n)} + (1 / C) log[ F_{i,j} / E_{i,j}^{(n)} ]  (13)
5) If parameter values converge, then terminate the algorithm, else go to Step (3).
In order to reduce the number of iterations, we use a statistical method to assign the initial value of M_{i,j}^{(0)}. When utilizing the Viterbi algorithm, the state at time t is determined jointly by the probability of the state at time t−1 and the observation characteristics at time t:
p(s_t = s_j | s_{t−1}, o_t) = (1 / γ) (λ·α_{t−1,j} + (1 − λ) Σ_i M_{i,j} f_{i,j}(o_t, s_t))  (14)
where α_{t−1,j} is the transition probability from the known state at time t−1 to state j, γ is a normalization parameter, and λ is a weight adjusting the relative importance of the characteristic-state transition probability and the state transition probability.
4 Experiments
For ease of comparison, we conduct the experiment using the standard data set provided by Carnegie Mellon University for extracting header information from computer science research papers. In order to reduce the time complexity of the characteristic selection process in the maximum entropy method, we manually select some useful characteristics. The characteristics can be divided into positive and negative characteristics according to the contribution of the characteristic attribute to the state.
A positive characteristic of a state indicates that when an observation presents this characteristic, the model tends to shift to this state. A negative characteristic of a state indicates that when an observation presents this characteristic, the probability of shifting to the state decreases. When analyzing whether an observation contains the personal name characteristic, we match it against an American personal name dictionary downloaded from the Web.
When extracting information from the headers of computer science research papers, in order to take into account useful information such as the typesetting format, newline characters and separators, text preprocessing is first conducted by adopting an HMM text information method based on text blocks. In the training stage, the initial probability and the transition probability take the block as the fundamental unit, and formulas (1) and (2) are used. The output probability takes the word as the fundamental unit, and formula (3) is used. In the extraction stage, after the text to be extracted is partitioned into small blocks, we calculate the output probability of each block. The text sequence to be extracted is converted into a block sequence, and the block emission probability is the sum of the emission probabilities of the words in the block. Suppose the block observation sequence is O = O1 O2 … OT; if the length of the t-th block is K, i.e., it contains K words, denoted Ot1 Ot2 … OtK, then the probability that state j outputs the t-th block is:
b_j(o_t) = Σ_{k=1}^{K} b_j(o_{tk})  (16)
We only use a few characteristics, such as personal name, all digits, month or its abbreviation, and the email symbol "@" (a block that does not contain this symbol cannot be an email address).
In the experiment, the value of the parameter λ adjusts the relative importance of the characteristic-state transition probability and the state transition probability. When λ = 0.6 and the training set contains 400 samples, the experimental results are as shown in Table 1.
When MEMM is used for text information extraction without considering the statistical probability of words, its performance on the experimental data set of this paper is poor and the results are not suitable for comparison, so we do not list them. As shown in Table 1, the ME-HMM based text information extraction algorithm improves the precision and recall rate to some degree. When the personal name dictionary is used for matching, Table 1 shows that the recall rate of the state "author" increases by approximately 8%, which illustrates that the added knowledge plays a large role.
Table 1. Precision and recall comparison of information extraction for each concrete domain using HMM and ME-HMM
domain  HMM Precision  HMM Recall  ME-HMM Precision  ME-HMM Recall
title 0.820347 0.830480 0.838308 0.864868
author 0.823954 0.916533 0.898551 0.955185
affiliation 0.872078 0.914193 0.887361 0.916536
address 0.902686 0.840000 0.925010 0.853007
email 0.888176 1.000000 0.988436 1.000000
note 0.880784 0.715228 0.921323 0.692785
web 1.000000 0.546508 1.000000 0.546508
phone 0.986667 0.888878 0.986301 0.912470
date 0.660505 0.990808 0.697607 0.990808
abstract 0.953434 1.000000 0.910071 1.000000
Intro 0.872307 1.000000 0.872307 1.000000
keyword 0.824168 0.932762 0.824168 0.931662
degree 0.512413 0.838224 0.616536 0.812032
pubnum 0.884603 0.642828 0.954045 0.703434
page 1.000000 1.000000 1.000000 1.000000
References
1. Lawrence, S., Giles, L., Bollacker, K.: Digital libraries and autonomous citation indexing.
Computer 32(6), 67–71 (1999)
2. Riloff, E., Jones, R.: Learning dictionaries for information extraction by multi-level boot-
strapping. In: Proceedings of the Sixteenth National Conference on Artificial Intelligence,
pp. 811–816. AAAI Press, Orlando (1999)
3. Kushmerick, N.: Wrapper induction: Efficiency and expressiveness. Artificial Intelli-
gence 118(12), 15–68 (2000)
4. Seymore, K., McCallum, A., Rosenfeld, R.: Learning hidden Markov model structure for in-
formation extraction. In: Proceedings of the AAAI 1999 Workshop on Machine Learning
for Information Extraction, pp. 37–42. AAAI Press, Orlando (1999)
5. Frietag, D., McCallum, A.: Information extraction with HMMs and shrinkage. In: Proceed-
ings of the AAAI 1999 Workshop on Machine Learning for Information Extraction,
pp. 31–36. AAAI Press, Orlando (1999)
6. Freitag, D., McCallum, A.: Information extraction with HMM structures learned by sto-
chastic optimization. In: Proceedings of the Eighteenth Conference on Artificial Intelli-
gence, pp. 584–589. AAAI Press, Edmonton (2002)
7. Ray, S., Craven, M.: Representing sentence structure in hidden Markov models for infor-
mation extraction. In: Proceedings of the Eighteenth Conference on Artificial Intelligence,
pp. 584–589. AAAI Press, Edmonton (2002)
8. Scheffer, T., Decomain, C., Wrobel, S.: Active Hidden Markov Models for Information
Extraction. In: Hoffmann, F., Adams, N., Fisher, D., Guimarães, G., Hand, D.J. (eds.) IDA
2001. LNCS, vol. 2189, p. 309. Springer, Heidelberg (2001)
9. Freitag, D., McCallum, A., Pereira, F.: Maximum entropy Markov models for information
extraction and segmentation. In: Proceedings of The Seventeenth International Conference
on Machine Learning, pp. 591–598. Morgan Kaufmann, San Francisco (2000)
10. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech rec-
ognition. Proceedings of the IEEE 77(2), 257–286 (1989)
11. Kwong, S., Chan, C.W., Man, K.F.: Optimization of HMM topology and its model
parameters by genetic algorithms. Pattern Recognition, 509–522 (2001)
12. Hong, Q.Y., Kwong, S.: A Training Method for Hidden Markov Model with Maximum
Model Distance and Genetic Algorithm. In: IEEE International Conference on Neural
Network & Signal Processing (ICNNSP 2003), Nanjing, P.R. China, pp. 465–468 (2003)
13. Berger, A.L., Della Pietra, S.A., Della Pietra, V.J.: A maximum entropy approach to natural lan-
guage processing. Computational Linguistics 22(1), 39–71 (1996)
14. Chieu, H.L., Ng, W.T.: Named entity recognition with a maximum entropy approach. In:
Daelemans, W., Osborne, M. (eds.) Proc. of the CoNLL 2003, pp. 160–163. ACL, Edmon-
ton (2003)
15. Xiao, J.-y., Zhu, D.-h., Zou, L.-m.: Web information extraction based on hybrid condi-
tional model. Journal of Zhengzhou University 40(3), 52–55 (2008)
16. Darroch, J., Ratcliff, D.: Generalized iterative scaling for log-linear models. Annals of
Mathematical Statistics 43(5), 1470–1480 (1972)
Automatic Expansion of Chinese Abbreviations
by Web Mining
1 Introduction
Abbreviations play an important part in the modern Chinese language. They are so common that native Chinese speakers will not even notice them in a newspaper or a book. But for natural language processing, the widespread use of abbreviations is not ideal. For computers, abbreviations are unrecognizable strings until they are further processed. Even worse, the formation of Chinese abbreviations is not quite like that of English ones [1]. Therefore, we cannot simply port an English analyzer to Chinese.
Many works [2][3][4] treat abbreviations as out-of-vocabulary words, which means they put an emphasis on how to tag a string as an abbreviation, just like the recognition of a location name, a person name, etc. But for practical applications like information retrieval and machine translation [5], we should learn what an abbreviation stands for: its original word form (called the full name). This relationship is crucial for better relevance evaluation of two documents.
This paper is supported in part by Chinese 863 project No. 2009AA01Z334 and
the Shanghai Municipal Education Commission Foundation for Excellent Young
University Teachers.
1. Truncation;
2. Condensation;
3. Summarization.
The crux of the expansion problem is to find possible candidates for full names. In scientific writing, a full name, together with the proposed abbreviation, must be presented before the abbreviation is used. In this case, if we have captured the related context of both the full name and the abbreviation, we can recover the full name. Unfortunately, this is not the case in daily usage such as newspaper articles, since some abbreviations are far more common than their full names. For example, we will hardly find "dddddd" (The United States of America) in news except in government documents, but the use of "dd" (America) is very extensive. Full names and abbreviations do co-occur in some contexts, but the co-occurrence is hard to find in articles or a static corpus because it is scarce. The Web, on the contrary, contains thousands of terabytes of information, so chances are that we can find the co-occurrence in some web documents. Moreover, with the ever wider usage of the Web, new abbreviations will definitely be recorded on the Web.
Our approach naturally consists of two parts (Fig. 1). First, we extract the possible contexts from the Web. Second, we recognize the full name from the context. Since we have only limited access to the Web, that is to say, through a search engine like Google or Baidu1, which gives us fewer than 1000 results per query and in most cases even fewer, we cannot expect a full name to emerge from a single query like "dd" (America). Instead, we have to tweak the queries so that the results contain possible full names. After the snippets are extracted, we move on to an extraction module in which we use linguistic heuristics to generate a list of possible candidates. The list is further pruned using co-occurrence data (local Web data) in the snippets for efficiency's sake. Finally, we rank the candidates in the pruned list to find the optimal result using features acquired from the Web (global Web data) and the k nearest neighbor method [9].
1 http://www.google.cn, http://www.baidu.com
Note that if w occurs more than γ times in some query's results, we increment h(w) by 1, rather than counting the occurrences of w directly over all returned snippets, because for a specific abbreviation there may be some special word that appears again and again in the results. For generality's sake, we consider only the number of related queries.
With all the help words H, the queries are constructed as
Putting these queries through a search engine, we obtain a list of returned snippets that may contain full names. We then extract all the possible candidates for full names. The strategy is simple: according to the linguistic observations in section 2.1, we assume that a full name always includes all the characters in the abbreviation.2 Therefore, by locating all the characters of the abbreviation in the snippets, we can find the first candidate by simply extracting the shortest string containing them. For example, for "dd" we will first find "dddddd". For "dd" we will first extract "dddd". Starting from the first candidate, we scan in both directions to construct more candidates until we meet a "boundary word". A boundary word is either a word in a given stop word list, or a word that has appeared next to the abbreviation in question. Namely, the candidate set C is the set of word sequences defined by this scanning procedure.
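A minimal Python sketch of this candidate-construction strategy is given below; it works at the character level and treats the boundary check as a simple membership test, which is an approximation of the word-level rule described above, and all names in it are hypothetical.

```python
def shortest_covering_span(snippet, abbr):
    """Shortest substring of snippet containing all abbreviation characters in order."""
    best = None
    for i, ch in enumerate(snippet):
        if ch != abbr[0]:
            continue
        j = 0
        for k in range(i, len(snippet)):
            if snippet[k] == abbr[j]:
                j += 1
                if j == len(abbr):
                    if best is None or (k - i) < (best[1] - best[0]):
                        best = (i, k + 1)
                    break
    return best

def extract_candidates(snippet, abbr, boundary_words, max_extend=8):
    """Start from the shortest covering string, then extend left and right one
    character at a time, stopping at boundary words (stop words merged with
    words already seen next to the abbreviation)."""
    span = shortest_covering_span(snippet, abbr)
    if span is None:
        return []
    start, end = span
    candidates = [snippet[start:end]]
    left, right = start, end
    for _ in range(max_extend):
        if left > 0 and snippet[left - 1] not in boundary_words:
            left -= 1
        if right < len(snippet) and snippet[right] not in boundary_words:
            right += 1
        candidates.append(snippet[left:right])
    return list(dict.fromkeys(candidates))   # deduplicate, preserving order
```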
After the candidates are extracted, we have to decide which one is the full name. Before selection, we first prune the list to exclude less likely candidates, since the selection approach, which is based on Web features, is quite time intensive.
In a straightforward manner we could just count the occurrences of the candidate strings in the snippets and delete the ones with the lowest counts. This approach is somewhat clumsy in the sense that a candidate c1 which is a substring of c2 will always have a higher count than c2, because each occurrence of c2 contains a c1.
Though the occurrence count is not directly applicable as the sole evidence, we can use it as a basis for further judgment. We can model the scenario as a comparison problem. If two strings do not have a subsumption relation between them, the above count is the measurement. Otherwise, we subtract the occurrence count of the super-string c2 from that of the substring c1.
For any s1 and s2 we can draw the relation between them according to the following:
2 The assumption is not always true, as we show in section 4.
Proposition 1. ∀ s1, s2 ∈ S, s1 ≻ s2 iff
  O(s1) > 2·O(s2)   if s1 ⊆ s2,
  O(s1) > O(s2)/2   if s2 ⊆ s1,
  O(s1) > O(s2)     otherwise.
With ≻ defined on any pair of strings in S, we know that ≻ is a total order. Therefore, we can keep some top candidates for further selection.
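A small Python sketch of this pruning step follows; occ is an assumed dictionary of occurrence counts O(s) gathered from the snippets, and the function names are hypothetical.

```python
from functools import cmp_to_key

def prefer(s1, s2, occ):
    """Pairwise comparison implementing Proposition 1: substrings must beat their
    super-strings by a factor of two to be preferred."""
    def better(a, b):
        if a in b:                        # a is a substring of b
            return occ[a] > 2 * occ[b]
        if b in a:                        # b is a substring of a
            return occ[a] > occ[b] / 2.0
        return occ[a] > occ[b]
    if better(s1, s2):
        return -1
    if better(s2, s1):
        return 1
    return 0

def top_candidates(candidates, occ, k=5):
    """Keep the k best candidates under the ordering above for the kNN ranker."""
    return sorted(candidates, key=cmp_to_key(lambda a, b: prefer(a, b, occ)))[:k]
```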
The selection of an optimal full name from the pruned list is treated as a ranking problem. The ranking values are obtained using kNN. The training examples are transformed into real-valued feature vectors x. The ranking value of a candidate c for an abbreviation a is the average Euclidean distance from its k nearest neighbors among the examples. The candidate with the lowest value (the nearest) is taken as the final result.
Two kinds of real-valued features are used here: structural features and Web-based features. The structural features represent how the abbreviation is constructed, i.e., the distribution of the abbreviation characters (characters in the abbreviation word) in the full names. The Web-based features show the occurrences of the candidates in the Web context. Table 1 shows the features and some formulae for computing them. In the table, Len is a function giving a string's length, Ch is the set of abbreviation characters, I(t) is a function that maps an abbreviation character t to its index in the full name, and E is the set of characters that are expanded into 2-character words in the full name.
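The kNN-based ranking just described can be sketched in Python as follows; the feature extraction itself is assumed to be done elsewhere, and the function names are hypothetical.

```python
import math

def knn_rank_value(candidate_features, training_examples, k=3):
    """Ranking value of a candidate: average Euclidean distance to its k nearest
    training examples; lower is better."""
    def dist(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    distances = sorted(dist(candidate_features, ex) for ex in training_examples)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)

def select_full_name(candidates_with_features, training_examples, k=3):
    # candidates_with_features: list of (candidate_string, feature_vector) pairs
    return min(candidates_with_features,
               key=lambda cf: knn_rank_value(cf[1], training_examples, k))[0]
```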
Half of the data (Data Set 2) are taken from the PFR People's Daily Corpus3, which contains articles from People's Daily in 1998. We take 2-character words tagged as "j" as abbreviations. We do not include single-character words because they are often covered in an abbreviation dictionary. The other half of the data (Data Set 1) are hand selected from the Web by an editor who is specialized in neither computer science nor linguistics. This part of the data represents the usage of abbreviations in our daily life.
The training examples contain 50 abbreviations from the PFR corpus, together with their full names. In each query throughout the expansion we retrieve 100 snippets from Baidu. Since Web-based expansion is time intensive, we only use 5 help words obtained from learning. We evaluate the accuracy of expansion on Data set 1, Data set 2 and the two sets combined. The overall results are shown in Table 2. Apart from our final system, we present several baseline systems: B DICT, B NOKNN and B NOHELPWORD. B DICT relies on an abbreviation dictionary for expansion. The dictionary is generated from the Modern Chinese Standardized Dictionary [10]. The other baseline systems, B NOKNN and B NOHELPWORD, only utilize parts of the final system. B NOKNN uses only the local Web count (occurrence counts from the snippets of candidate generation) for selecting the optimal result, without the kNN ranker, while B NOHELPWORD does not use help words but only the abbreviation itself as a query to generate snippets.
Our approach does not perform as well on data set 2 as on data set 1. This is partly due to the fact that the PFR corpus contains texts from ten years ago, so the abbreviations are somewhat too "old-fashioned" to appear on the Web. Moreover, some abbreviations tagged in the corpus are now used as common words. From Table 2 we can also see that B DICT has the lowest performance, since the dictionary does not cover many abbreviations in use. Both B NOKNN and B NOHELPWORD have similar performance, about 0.3 lower than our final system. This inferiority shows the indispensability of the two main components: the help words and the kNN-based selector.
We also look into the contribution of each feature used in the kNN-based
selector. We evaluate the contribution in a ”negative” way, i.e. excluding one
feature each time to see the impact on the evaluation result. The results are
shown in Table 3. We can see that all the features contribute to the final result,
3 Available at http://icl.pku.edu.cn/icl groups/corpus/dwldform1.asp
with the contribution ranging from 8% to more than 30%. For the Web-generated features, one interesting thing is that the features about the titles of the snippets (titlecount and titledensity) contribute less than the others. This may be due to the fact that for some long full names, people tend to use their abbreviations in titles. Another interesting finding is that all the features except one contribute more to data set 2 than to data set 1. This is because it is more difficult to generate full-name candidates from the Web for the abbreviations in Data set 2 than for those in Data set 1. Therefore, the candidate list of full names is more error prone, which leads to the larger contribution of the kNN-based selector and the features involved.
involving features.
One surprise for us is that our approach can handle abbreviations for very
complex and long slogans, for example, ”dddd”’s full name is ”ddddd
dddddddddddddddddddddddddddddddddd
ddddd”. This shows the merit of help words, for if we search ”dddd”
only, we will not find the complete slogan in the snippet.
The expansion errors fall into several categories. First, some abbreviations coincide with common expressions. For example, "dd" is the abbreviation of "ddddd" (Shanghai Library), while it also means "the above graph". Therefore, we are not able to get the desired full name through Web search because the more common usage dominates the results. Second, for some long full names the later parts do not appear in the abbreviations, and they are thus often neglected in both the snippets and our processing. For example, "dd" is expanded into "dddddd" instead of "dddddddd" (Chinese Football League A). Third, some abbreviations, especially those that are not named entities, are always used as a single word. Consequently, it is hard to find their original form, even for a human. For example, "ddd", which is labeled as an abbreviation in the PFR corpus, is the abbreviation of "ddddd" (the young and the middle aged). However, "ddd" is so widely used that we can hardly find the expression "ddddd".
5 Conclusion
In this paper we propose a novel method to expand Chinese abbreviations into their full names. Unlike previous works, we use the World Wide Web as the main source of information. The approach consists of two parts: first, we construct queries using learned help words to extract related Web content containing both the abbreviation and the full name through a search engine. Second, we extract candidates of full names from the content and select the best result. The extraction is based on linguistic heuristics. The final selection consists of two parts: a pruner based on occurrence information within the obtained snippets and a kNN-based selector utilizing a set of Web-based and structural features. Experiments show that our approach achieves satisfactory results. We also show the indispensability of the help words and the kNN-based ranker through experimental results. Moreover, the contributions of the different features are discussed and compared.
For further work, we plan to compare the role of search engines in our task. We are curious whether different search engines will have different impacts on our task, or on other Web mining tasks. Another interesting topic is to apply content information for refined expansion. In this paper we have not considered polysemy, which also exists in abbreviations. Disambiguation is a tough topic to deal with in the future.
References
1. Chang, J., Teng, W.: Mining atomic Chinese abbreviation pairs: A probabilis-
tic model for single character word recovery. Language Resources and Evalua-
tion 40(3/4), 367–374 (2007)
2. Chen, K., Bai, M.: Unknown word detection for Chinese by a corpus-based learning
method. Computational Linguistics 3(1), 27–44 (1998)
3. Sun, J., Gao, J., Zhang, L., Zhou, M., Huang, C.: Chinese named entity identifi-
cation using class-based language model. In: COLING 2002, pp. 24–25 (2002)
4. Sun, X., Wang, H.: Chinese abbreviation identification using abbreviation-template
features and context information. In: Matsumoto, Y., Sproat, R.W., Wong, K.-F.,
Zhang, M. (eds.) ICCPOL 2006. LNCS (LNAI), vol. 4285, pp. 245–255. Springer,
Heidelberg (2006)
5. Li, Z., Yarowsky, D.: Unsupervised Translation Induction for Chinese Abbrevia-
tions using Monolingual Corpora. In: Proceedings of ACL, pp. 425–433 (2008)
6. Chang, J., Lai, Y.: A preliminary study on probabilistic models for Chinese abbre-
viations. In: Proceedings of the Third SIGHAN Workshop on Chinese Language
Learning, pp. 9–16 (2004)
7. Fu, G., Luke, K., Zhang, M., Zhou, G.: A hybrid approach to Chinese abbrevia-
tion expansion. In: Matsumoto, Y., Sproat, R.W., Wong, K.-F., Zhang, M. (eds.)
ICCPOL 2006. LNCS (LNAI), vol. 4285, pp. 277–287. Springer, Heidelberg (2006)
8. Huang, L.: More on the construction of modern Chinese abbreviations. Journal of
Suihua University (004) (2008)
9. Mitchell, T.: Machine Learning 48(1) (1997)
10. Li, X.: Modern Chinese Standardized Dictionary. Foreign Language Teaching and
Researching Press, Language and Literature Press, Beijing (2004)
Adaptive Maximum Marginal Relevance Based
Multi-email Summarization
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001,
China
{bxwang,liubq,cjsun,wangxl,lib}@insun.hit.edu.cn
1 Introduction
With the popularization of email services, multi-email summarization is in high demand. As an information fusion technique based on email contents, multi-email summarization can supply users with direct and compact summaries. Since users are accustomed to discussing a specific subject via emails [1], a set of emails on the same subject can be clustered together in the email client, and users on the client side usually would like to obtain a panoramic view of these emails for further analysis and decision making.
Multi-email summarization is a kind of operation on email contents, which have some characteristics that web page contents do not possess. We take the following two facts as examples: firstly, email contents are usually very short, so there is much less information in them for language understanding. Secondly, when writing emails, people tend to use a casual tone that is closer to their spoken habits, so parsing the contents with a set of rules becomes very difficult. These two aspects introduce the main difficulty of multi-email summarization. Besides, some other techniques must be considered in order to build a multi-email summarization system, such as Base64 and Quoted-Printable decoding, time and title extraction, and so on.
In this paper, we present an adaptive maximum marginal relevance (AMMR) based approach to generate multi-email summaries, which takes the extractive summarization approach and is able to automatically adjust its parameters according to the content cohesion of
the email collections. Experimental results show that our method can improve the
average quality of multi-email summaries and meet the need of system application.
The rest of this paper is organized as follows: Section 2 discusses the related work.
Section 3 presents the proposed adaptive MMR model and its application in multi-
email summarization. We evaluate our work in section 4. Section 5 concludes this
paper and discusses the future work.
2 Related Work
Research on multi-email summarization has only just begun, so most people take the extraction approach. Wan et al. [2] extract question-answer pairs to form email summaries, and a similar method is also discussed in [3], in which Shrestha et al. generate email summaries by joining extracted question-answer sentences. Since QA pairs can reflect the semantic clue of a given email thread, such simple strategies improve the performance of summarization systems to a certain extent. However, the disadvantage of this kind of approach is obvious: email summaries based on this method lack coherence. Moreover, not all the QA sentences are related to the topic of the email thread.
Rambow et al. [4] introduce machine learning techniques to multi-email summarization. They first extract features for every sentence and then decide whether a given sentence should be chosen as part of the summary. In fact, this strategy has already been applied in traditional multi-document summarization. Ntt [5] employs SVM to extract summary sentences, and the basic idea is to find the hyperplane separating summary sentences from the rest of the sentence set. A summary produced by classification methods has higher precision, but its redundancy is also higher, because the sentences extracted by the classifier are similar to each other.
Compared with multi-email summarization, multi-document summarization is better developed, and many approaches have been used to solve the problem. For summarization based on the extraction strategy, besides the method in [4], O'Leary et al. [6] compute the probability of a sentence becoming a member of the summary by using an HMM. Kraaij et al. [7] extract summary sentences based on term frequency. By introducing deeper mining of document relations, researchers have explored information-fusion based summarization. Radev [8] presents a cross-document structure so as to fuse information from multiple documents at every semantic level. Xu [9] proposes a multi-document rhetorical structure (MRS) and has done much research on summary sentence extraction, ranking, redundancy reduction, and summary generation. Although multi-document summarization differs from multi-email summarization in the object to be summarized, the latter can still learn from this experience.
Our approach employs an improved extraction strategy to generate multi-email summaries. By adopting the maximum marginal relevance (MMR) model, the sentences extracted to form a summary are more relevant to the subject, and the redundancy of our summary system can be reduced. However, MMR has an inherent disadvantage: it is not able to adjust its parameters automatically for different email sets, which makes the average quality of email summaries decline. In this paper, we analyze the inner relationship between MMR and the content cohesion of emails on the same subject, and propose an AMMR model, which improves the average quality of email summaries.
Adaptive Maximum Marginal Relevance Based Multi-email Summarization 419
with different contributions. Assuming that there is a way to model the subject cor-
rectly and the reasonable features of sentences can be extracted to convert them to
vectors, we can regard content cohesion as the variance of the sentence vectors.
However, the content cohesion is difficult to compute since the variance is hard to
compute. Thus, we present an approximate method and the idea is as follows. The
content cohesion is closed to sentence selection during summary generation. Suppos-
ing that the content cohesion of a given email selection is large, we can draw the con-
clusion that the percentage of sentences related to their subject is large, so we only
have to select one or two most relative sentences as representative, and extract proper
sentences which are not very relative as the supplement of the summary content. Oth-
erwise we should choose more sentences that are related to the subject to generate a
summary, and reduce the number of irrelative sentences. In this paper, we introduce
content cohesion into MMR to build an adaptive model, which is defined as follows:
(2)
s i ∈R \ S ⎢⎣ s j ∈S ⎥⎦
Where N q stands for the number of sentences containing the words in the query, and
N denotes the number of all the sentences in the email collection. We estimate the
content cohesion by N q N , for the more query words a sentence containing, the
more relative it is to the subject. And the estimated content cohesion is adopted to
replace λ , in order to adjust the parameter of the model according to different email
collections automatically.
The proposed AMMR based multi-email summarization algorithm is depicted in
Figure 1.
Sentence similarity computing plays an important role in both AMMR and the evalua-
tion of summary results. VSM is widely employed to compute similarity in NLP,
however it does not perform very well in our experiments. According to our investi-
gation, this problem is caused by the individuation and informality of people while
writing emails, which directly lead to the existence of word variants. As we know, the
fundamental idea of VSM is the words’ cooccurrence, but in email contents there are
many words with the same meaning but totally different morphology.
In this paper, we adopt the method based on HowNet to compute semantic similar-
ity of sentences, which is built on the computation of word semantic similarity [11].
Our strategy is to convert the sentence similarity to the weighted mean of word simi-
larities. A greedy algorithm is employed to match the most similar words in two sen-
tences to collect word pairs, and take inverse document frequency as the weight for
computing the weighted mean of the words. Experimental results have shown that the
average quality of summary has increased by 4% to 6% after introducing this strategy,
which will be given in detail in section 4.
4 Experiments
4.1 Evaluation
In this paper we adopt direct evaluation [12] to evaluate the performance of an email
summarization system. We first choose several typical user queries as the subjects,
and then acquire the top k emails respectively from the email retrieval system to form
email collections. For every group of emails, 5 summaries are generated manually by
different people as benchmark, with which we can evaluate the machine summaries.
We compute the precision and the redundancy for each email summary and get the
quality of the summary according to them. Precision can be computed as follows:
Where SP stands for the summary generated by human, and SA stands for the sum-
mary generated automatically. Sim is a similarity computing function.
We consider redundancy as the semantic similarity of the sentences within the
summary, so it can be obtained by computing the average similarity of the sentence
pairs within the machine summary, which is defined by:
max_ sim ( si , s j ) .
1
redundancy = ∑
n i , j =1,...n ,i ≠ j
(4)
We choose 6 groups of real emails as test corpus, and each group contains 15 to 20
members. Since the emails are obtained from our email retrieval system, the subject
of every group can be expressed by the user’s query. The experimental results are
shown in figure 2, where the horizontal axis represents the value of linear interpola-
tion factor λ , and vertical axis stands for the quality of summaries defined in (5).
`
Adaptive Maximum Marginal Relevance Based Multi-email Summarization 423
From the results we can see the general trend is that the quality can reach its
maximum value while λ ranging from 0 to 1. But for different subjects, λ is not at a
fixed value or interval when the maximum value appears, which is reasonable accord-
ing to our discussion on the content cohesion of emails, so the summarization system
based on traditional MMR model is unable to achieve a high average quality.
In figure 2, the position of λ marked by the dashed line represents the value of
linear interpolation introduced by AMMR, and the corresponding value on vertical
axis is the quality AMMR can reach. We can see that our method is able to approach
the maximum value of the summary quality on all the 6 groups, especially for the last
group, the maximum quality appears when λ is equal to about 0.4, which signifi-
cantly deviates from the rest 5 groups, and the AMMR model performs well despite
of this situation.
In Figure 2, the red dotted line represents the results of the model taking HowNet
based sentence similarity computing method, and the black squared line stands for
that using VSM, we can see the quality has increased by 4% to 6%, which means that
the summary sentences selected tend to be more reasonable after introducing this
semantic based strategy.
5 Conclusion
Aiming at increasing the average quality of multi-email summaries, we propose a
novel multi-email summarization technique based on adaptive MMR model. We ana-
lyze the relationship between content cohesion and the parameters of MMR and find
out that the content cohesion is one of the main factors that lead to the decline of
summary quality. This motivates us to present an adaptive summarization model to
improve the quality of email summaries. The experimental results show that our
model can effectively improve the average quality of email summaries. Our work can
be applied to build an automatic summary module on the email client.
In the future, we will focus on the following problems: 1) to model email content
cohesion in an exact way; 2) to adopt more linguistic features to make the computa-
tion of sentence relevance reasonable; 3) to develop algorithms for improving the
readability of summary generated.
Acknowledgments. This investigation was supported by the project of the High
Technology Research and Development Program of China (grants No.
2006AA01Z197 and 2007AA01Z172), the project of the National Natural Science
Foundation of China (grant No. 60673037) and the project of the Natural Science
Foundation of Heilongjiang Province (grant No. E200635).
References
1. Fisher, D., Moody, P.: Studies of automated collection of email records. University of
Irvine ISR Technical Report UCI-ISR-02-4 (2002)
2. Wan, S., McKeown, K.: Generating overview summaries of ongoing email thread discus-
sions. In: Proceedings of COLING 2004, the 20th International Conference on Computa-
tional Linguistics, pp. 745–751 (2004)
424 B. Wang et al.
3. Shrestha, L., McKeown, K.: Detection of question-answer pairs in email conversations. In:
Proceedings of COLING 2004, pp. 889–895 (2004)
4. Rambow, O., Shrestha, L., Chen, J., et al.: Summarizing email threads. In: Proceedings of
HLT/NAACL, Boston, USA (2004)
5. Hirao, T., Sasaki, Y., Isozaki, H., et al.: NTT’s Text Summarization System for DUC
2002. In: Proceedings of the workshop on automatic summarization, Philadelphia, Penn-
sylvania, USA, pp. 104–107 (2002)
6. Zajic, D.M., O’Leary, D.P.: Sentence Trimming and Selection: Mixing and Matching. In:
Proceedings of the 2006 Document Understanding, New York (2006)
7. Kraaij, W., Spitters, M., van der Heijden, M.: Combining a Mixture Language Model and
Naive Bayes for Multi-document Summarization. In: Proceedings of the DUC 2001 work-
shop (SIGIR 2001), New Orleans (2001)
8. Radev, D.R.: A Common Theory of Information Fusion from Multiple Text Sources Step
One: Cross-Document Structure. In: Proceedings of the 1st ACL SIGDIAL Workshop on
Discourse and Dialogue, Hong Kong, pp. 74–83 (2000)
9. Xu, Y.: Research of Multi Document Automatic Summarization. Dissertation of the Doc-
toral Degree in Engineering, Harbin Institute of Technology (2007)
10. Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering
documents and producing summaries. In: Proceedings of the 21st annual international
ACM SIGIR conference on research and development in information retrieval, Melbourne,
Australia, pp. 335–336 (1998)
11. Li, S.J., et al.: Semantic computation in a Chinese question-answering system. Journal of
Computer Science & Technology 17(6), 933–939 (2002)
12. Jones, K.S., et al.: Automatic Summarizing Factors and Directions. In: Advance in Auto-
matic Text Summarization. MIT Press, Cambridge (1998)
Semantic Relation Extraction by Automatically
Constructed Rules
Rongfeng Huang
1 Introduction
H. Deng et al. (Eds.): AICI 2009, LNAI 5855, pp. 425–434, 2009.
c Springer-Verlag Berlin Heidelberg 2009
426 R. Huang
2 Identification Method
We use various types of information called features and we uniform their repre-
sentation to make them work well with each others. And then we construct rules
with a list of features.
2.1 Features
There are five types of features, including words, part of speeches, syntactic
information, positions, and extracted semantic relations. And we use a ”Feature-
Value” structure to uniform their representations.
words with part of speeches to only part of speeches. Therefore, it’s necessary to
decide which words should be kept in their forms while others are better to be
kept as part of speeches only. In this paper, all decisions about these are made
depending on the distribution of words in corpus.
Secondly, result of syntactic analysis is another kind of important useful in-
formation. But unfortunately, a reliable syntactic parser like the broad-coverage
parser used in MindNet is inaccessible at most of the time, especially when we
deal with Chinese. So, we should decide which syntactic information provided by
a not very reliable parser is useful for us. All relevant decisions about these are
also made depending on the distribution of result of syntactic analysis in corpus.
In this paper, a dependency parser is used. Based on the syntactic results, a
further analysis is applied. For example, ”Is X a subject of a sentence? (X is a
definition word)”.
Thirdly, the positions of definition words are also taken into account. This
is an assistant strategy in fact. One reason is the unreliability of the syntactic
parser. And our suspicions will be raised when the number of syntactic relation-
ships of the path linking from a definition word to a candidate word is getting
bigger. Another reason is that a certain word may play different syntactic roles
in different sentences and its roles are related to its positions more or less. For
example, a subject tends to be in a position of the first several ones of a sentence
while an object prefers to appear in the tail of a sentence.
Finally, as a special resource, we use extracted semantic relations to aid in the
identification of other semantic relations. In this paper, about half of extracted
relations are type of Hypernym and the precision of it is 95% with a margin
error of +/-5% with 99.5% confidence. So a double-step pass strategy is applied.
Hypernym relations are identified first and then they are encoded as a type of
feature to be used in a second pass. Making use of extracted semantic relations
to improve precisions of other semantic relations’ identification is also mentioned
in [9].
Uniform Representation. In this paper, any feature is attached with a value
derived from a definition to form a ”Feature-Value” structure. And then, any
candidate word can be attached with a list of such structures. In fact, if we take
”Feature-Value” structure as a type of ”Feature”, value range of which is true,
false, then regardless of the types and value ranges of features in a ”Feature-
Value” structures, we can treat them as the same. We refer ”Feature-Value”
structures as features in the rest of this paper.
Firstly, select the best feature from features of candidate words by means of
a kind of evaluation way which is described later. And then candidate words
will be split into two sets, words in one of which own this feature and words in
another don’t.
Secondly, if the precision of the former is higher than a predefined value, then
a new rule is constructed. Otherwise, select more features and split candidate
words into more sets like the first step until the precision is acceptable, or give
up if the number of selected features exceeds a predefined value.
Thirdly, after a new rule is constructed, any candidate words satisfying this
new rule will be removed. And then repeat the procedure until no more new
rules can have a precision higher than the predefined value.
Feature Evaluation. There has been a lot of effective ways to choose features.
However, different evaluation ways have different trends on precision and recall.
And we notice that very few candidate words have relevant semantic relation
to a head word. For example, a candidate word has a probability of only 1.5%
, estimated from a random small set, to hold Part Of relation with the head
word. And this fact causes severer different trends of different evaluation ways.
In experiments of this paper, Information Gain and X 2 Statistic usually give
good recalls while perform poorly in precisions. In contrast, Odds Ratio shows
an excellent precision but gives an unsatisfactory recall. Therefore, it’s necessary
to combine them to get a good precision as well as an acceptable recall. In order
to make it, we choose Information Gain (see ( 1),( 2)) as the evaluation way for
selection of the first feature of a rule.
Entropy(C) = −p+ log2 p+ − p− log2 p− (1)
Where C is a set of candidate words, p+ is the proportion of words holding target
relation with head words and p− is the proportion of the rest words.
|Cv |
Inf oGain(C, F ) = Entropy(C) − Entropy(Cv ) (2)
|C|
v∈{true,false}
Where F is a feature, Ctrue is the set of words with feature F in C and Cfalse
is the set of the rest words in C .
And we choose odds ratio (see ( 3)) to evaluate the rest features of a rule.
P (F |C+ )(1 − P (F |C− ))
OddsRatio(C, F ) = log (3)
P (F |C− )(1 − P (F |C+ ))
Where C+ is the set of words holding target relation with head words, C− is the
set of the rest words in C, P (F |C+ ) is the proportion of words holding Feature
F in C+ and P (F |C− ) is similar to P (F |C+ ).
3 Semantic Relation
We apply our approach to identify ten types of semantic relations ( see Table 1.
in Sect. 4.2). And we construct the identified relations as a net to make them
more useful.
Semantic Relation Extraction by Automatically Constructed Rules 429
Three examples, covering from abstract concept to concrete concept and from
substance to creature, are showed in Fig. 1, Fig. 2 and Fig. 3.
Head word:
¯Ô¤ká5 §é¯Ô5 G¹ÚuÐåû
(essence)
½^ y «O
Definition: , ,
( ” ” ) (The inherent nature of a thing or a class of things,
determines the character, state and development of that thing or that class of
things. (opposite to phenomena))
Head word: Ð
j§/G&|§fõãÚ§¶ õkxÚ½ãÚ
(culver)
:"
Definition:
(A bird, looking like homing pigeon, usually has taupe feathers and a neck
with white or filemot spots.)
v(titanium)
Head word:
7á§ÎÒTi"ÕÚ§M §òÐ5r§L:p§
F@¡"^5EA«Ü7g"(A kind of metal element, signed with ”Ti”,
Definition:
is silver gray, light, hard, ductile and corrosion-resistant. It can be use to make
special alloy steel.)
as a net. Every word with its part of speech has and only has one node in the
net. In other words, every word has not only semantic relations to some words
in its definition, but can have additional semantic relations to other words once
it is used to explain them. That is because when we look up a word, we may
not only consult the definition of this word, but also the definitions of any word
which mentions it [9]. Chodorow et al. [10] exploit such insight in developing a
tool for helping human users disambiguate hyper/hyponym links among pairs of
lexical items. MindNet implements it with full inversion of structures [5].
4 Evaluation
4.1 Experimental Settings
Preprocessing. Before extracting semantic relations, it’s necessary to process
raw text first, including word segment, part of speech tagging and dependency
parsing. All these are done by LTP v2.02 . Precision of dependency parsing is
78.23% without labeling type of dependency and 73.91% with labeling.
4.2 Results
After a double-step pass, 43000 relations are identified. For each type of semantic
relations, a random sample of 400 relations is hand-checked and the precisions of
all types are listed in Table 1. Using common statistical techniques, we estimate
that these rates are representatives of each type with a margin error of +/-5%
with 99.5% confidence. From this result, not only a large number of relations
have been extracted but also high precisions of each type have been achieved.
However, such a ”hand-check” evaluation way may not be objective enough as
some of the identified relations are somewhat fuzzy to discriminate. Also, the
recall is hard to estimate. Further, it’s necessary to see the relations and the
net they form are really useful. Therefore, we suggest a more objective way of
evaluation.
2
LTP is a language platform based on XML presentation, http://ir.hit.edu.cn/
demo/ltp/Sharing Plan.htm
Semantic Relation Extraction by Automatically Constructed Rules 431
A thesaurus3 is used to generate similar word pairs and dissimilar word pairs.
Similar strategy is taken by Stephen [11]. And here, we generate six similarity
levels of word pairs.
Words in the thesaurus are classified in five levels, including Big Class, Middle
Class, Tiny Class, Word Group and Atom Group. Every item in this thesaurus
consists of a class code and one or more words. For example:
The class code in the head of the first item in Fig. 4 means that these three
words are classified in A [Big Class] a [Middle Class] 01[Tiny Class] A [Word
Group] 08 [Atom Group]. And ”=” means that these three words are the same.
If they are not completely the same, then they are denoted by ”#”. If only one
word contained in an item, then a tag ”@” is used. The procedure of generating
word pairs is showed below:
Firstly, randomly select word pairs from the thesaurus.
Secondly, look up the already selected word pairs to ensure the newly selected
pair doesn’t exist in them.
Thirdly, look up the net to find out if they exist and connect to each other
within 18 arcs.
Fourthly, look up the thesaurus to see if they belong to the same atom group,
then they are labeled with ”Atom Group”, if they belong to the same word group
but not the same atom group, then they are labeled with ”Word Group”, or they
are labeled with Tiny Class, Middle Class or Big Class by analogy. And finally
if they are from different Big Classes then they are labeled with ”General”.
Fifthly, repeat the above procedure until 20000 word pairs for each level are
available. And then they are split into two sets equally. One set is used for
training and another is used for testing.
3
HIT-IR Tongyici Cilin (Extended), http://ir.hit.edu.cn/demo/ltp/Sharing Plan.htm
432 R. Huang
Stephen [11] also uses path patterns to determine similarity. However, the
method adopted here is different. In this paper, if the precision and frequency of
a pattern are both high, then it will be used to identify similar word pairs later.
5.3 Result
In experiments of this paper, there are three different definitions of similarity.
In the first experiment, only word pairs labeled with ”Atom Group” are consid-
ered as similar pairs. In the second experiment, word pairs labeled with ”Word
Group” are also considered. In the third experiment, word pairs labeled with
”Tiny Class” are also considered. And the result is showed in Table 2.
The result shows high precision and low recall. The possible reasons may be:
Firstly, the high precision is ensured by the high precision of extraction.
Secondly, the low recall of determining word similarity reflects the low recall
of the semantic relations extraction. And there is no doubt that the insufficient
semantic relations cause two similar words can’t find a stable and short path to
reach each other although they may be still connected by a long path. And the
longer the path is, the more instable and unreliable the path will be. Also it will
become sparser when the length of path grows up.
Thirdly, the way of determining similarity may still need to be improved.
Finally, the knowledge contained in the MCSD may not consist very well with
the one contained in the thesaurus.
6 Discussion
Although a large number of semantic relations have been identified, the recall
still may be low, reflected by the result of determining word similarity. And
lots of other knowledge in definitions is still unexploited. Take the definition of
culver (see Fig. 2) for example. In Fig. 6, only top level semantic relations linking
from head words to words in definitions are identified, and lower level semantic
relations are still unexploited.
7 Conclusion
Acknowledgement
References
1. Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.J.: Introduction to
wordnet: An on-line lexical database. Int. J. Lexicography 3(4), 235–244 (1990)
2. Zhendong, D., Qiang, D.: Hownet, http://www.keenage.com
3. Zelenko, D., Aone, C., Richardella, A.: Kernel methods for relation extraction. J.
Mach. Learn. Res. 3, 1083–1106 (2003)
434 R. Huang
4. Aone, C., Ramos-Santacruz, M.: Rees: a large-scale relation and event extraction
system. In: Proceedings of the sixth conference on Applied natural language pro-
cessing, Morristown, NJ, USA, pp. 76–83. Association for Computational Linguis-
tics (2000)
5. Richardson, S.D., Dolan, W.B., Vanderwende, L.: Mindnet: acquiring and struc-
turing semantic information from text. In: Proceedings of the 17th international
conference on Computational linguistics, Morristown, NJ, USA, pp. 1098–1102.
Association for Computational Linguistics (1998)
6. Barriere, C.: From a children’s first dictionary to a lexical knowledge base of con-
ceptual graphs. PhD thesis, Burnaby, BC, Canada, Adviser-Popowich, Fred. (1997)
7. Markowitz, J., Ahlswede, T., Evens, M.: Semantically significant patterns in dic-
tionary definitions. In: Proceedings of the 24th annual meeting on Association for
Computational Linguistics, Morristown, NJ, USA, pp. 112–119. Association for
Computational Linguistics (1986)
8. Vanderwende, L.H.: The analysis of noun sequences using semantic information
extracted from on-line dictionaries. PhD thesis, Washington, DC, USA, Mentor-
Loritz, Donald (1996)
9. Dolan, W., Vanderwende, L., Richardson, S.D.: Automatically deriving structured
knowledge bases from on-line dictionaries. In: Proceedings of the First Conference
of the Pacific Association for Computational Linguistics, pp. 5–14 (1993)
10. Chodorow, M.S., Byrd, R.J., Heidorn, G.E.: Extracting semantic hierarchies from a
large on-line dictionary. In: Proceedings of the 23rd annual meeting on Association
for Computational Linguistics, Morristown, NJ, USA, pp. 299–304. Association for
Computational Linguistics (1985)
11. Richardson, S.D.: Determining similarity and inferring relations in a lexical knowl-
edge base. PhD thesis, New York, USA (1997)
Object Recognition Based on Efficient Sub-window
Search
1 Introduction
Object class recognition is a key issue in computer vision. In the last few years,
Object class recognition has received a lot of attentions. It has been approached in
many ways in literatures. These approaches can typically be divided into two types.
One type is training a part-based shape model, the model describes objects by simpler,
robust and stable features connected in a deformable configuration. These features are
usually generic, which can be edge points, lines or even visual words, and features
must be grouped to describe a given object. Some typical part-based shape models
include constellation model [1], implicit shape model (ISM) [2], k-fans model [3],
pair-wise relationships model [4] and star model [5,6]. When detect an object, it first
detect individual object parts, which be group together to reason about the position of
the entire objects. This is accomplished via generalized Hough transform, where the
detections of individual object parts cast probabilistic votes for possible locations of
the whole object. These votes are summed up into a Hough image, the peaks of it
being considered as detection hypotheses.
A shortcoming of part-based shape model is the need for clean training shapes.
Most of this type approaches need pre-segmentation training images or label the ob-
H. Deng et al. (Eds.): AICI 2009, LNAI 5855, pp. 435–443, 2009.
© Springer-Verlag Berlin Heidelberg 2009
436 Q. Nie, S. Zhan, and W. Li
ject boundary box. This quickly becomes intractable when hundreds of categories are
considered. Some weakly supervised training methods exist, but with high computa-
tion cost.
Another type for object localization is commonly performed using sliding window
classifiers [7]. This approach trains a quality function (classifier) and then scans over
the image and predicts that the object is present in sub-windows with high score.
Siding window object localization has been shown to be very effective in many situa-
tions, but suffers a main disadvantage: it is computationally inefficient to scan over
the entire image and test every possible object location. To overcome this shortcom-
ing, researchers typically use heuristics to speed up the search, which introduces the
risk of missing target object. Lampert C.H. et al.[8] proposes an Efficient Sub-
window Search (ESS) to solve this problem. It relies on a branch-and-bound scheme
to find the global optimum of the quality function over all possible sub-images, thus
returning the same object locations that an exhaustive sliding window approach
would. At the same time it requires less classifier evaluations and typically runs in
linear time or faster.
Motivated by Lampert C.H.’ work, we proposed a multi-object recognition
method. This method integrates bag of features model with efficient sub-window
search technology. Bag of features model is a very powerful and effective classifica-
tion model. It allows weakly supervised training method. And ESS is applied in our
recognition frame for fast localization. In order to improve the recognition perform-
ance, we also designed a new local feature descriptor suitable for our recognition task.
The rest of the paper is organized as follows. We first introduce a modified spatial
PACT for our local feature descriptor in Section 2. Then in Section 3 we explain how
to use an efficient sub-window search method to perform localization. In Section 4 we
present our experiments results on PASCAL 2007 dataset. And come to the conclu-
sions in section 5.
tures in the image. SPACT demonstrates high performance on commonly used datasets.
Besides, it need not pre-segment image and evaluates extremely fast.
Fig. 1. Examples of CT images. (a) Original images (b) CT images(c) Modified CT images.
Our local feature descriptor is motivated by Wu J.’s work. But we have modified it
to fit our situation. And trim it to fit well in our bag of feature model. According to
Wu J.’s analysis, the powerful representation capacity of CT histogram is mostly
depend on its successful encoding shapes information. But from some examples of
CT images shown in fig.1(b), we find that Census Transform is sensitive to neighbor
pixels variation. The CT image is unnecessary noisy even in consistent region. In
order to remove these noises and keep shape information more clearly, we modified
the CT by introducing a threshold CTVALUE. Only when the difference between two
neighbor pixels great than CTVALUE, the CT value is 1, otherwise the CT value is 0.
The modified CT images are shown in fig.1(c). They are obviously clearer than stan-
dard CT images and keep the shape structures better.
In order to remove pixels correlation effects and get a more compact representa-
tion, Principal Component Analysis (PCA) operation is used to CT histograms data.
①
The key steps for PCA operation are: Random select some training images, con-
②
vert them to gray images, and calculate the modified CT values. Use regular grids
,
sample patches. The sampling interval set to 32 the size of image patches set to
③
32X32. Calculate CT histograms of these patches and normalize the CT histo-
grams. When calculate CT histogram, remove two bins with CT = 0, 255. Perform ④
PCA operation on these CT histograms. The PCA results show that the first 40 ei-
genvectors are enough to cover 92% information. So we get the first 40 eigenvectors
as main components.
To add spatial information, an image patch is split into 2x2 blocks, add the
center block, the normalized PACT histograms of these blocks are concatenated to
form a feature vector which has 40x5= 200 dimensions. Together with the gray
438 Q. Nie, S. Zhan, and W. Li
histogram of image patch, which result in a feature vector with an overall 200+256=
456 dimension.
The above contents are how to represent local image patches. In order to use bag of
feature model, the second step need create a codebook. There are various methods for
creating visual codebooks. K-means clustering, mean-shift and hierarchical k-means
clustering are the most common methods. Yet Extremely Random clustering forests
(ERC forests)[12,13] have recently attracted a lot of attention. ERC Forests are very
efficient at runtime, since matching a sample against a tree is logarithmic in the
number of leaves. And they can be trained on large, very high-dimensional datasets
without significant over fitting and within a reasonable amount of time (hours). In
practice, the average-case time complexity for building ERC forest is O( DN logk) ,
where D is the feature dimension, N is the number of patches and k is the number of
leaf nodes. In contrast, k-means has a complexity of O(DNk) . In our recognition
system, ERC forest is applied to create visual codebook.
After creating a codebook, we can represent an image as histogram of visual
words. And the training images’ histograms of visual words are fed to SVM to train a
classifier model. LIBSVM[14] is selected for training classifier model. Linear kernels
with C=2−5 is used to ensure real-time classification. The trained classifier model can
be used for later localization.
take on any of the rectangles in the set are calculated. And this highest score is the
upper bound of the rectangle set.
Before giving the algorithm of Efficient Sub-window Search, we first discuss how
to construct a bound function fˆ . Denoting rectangles by R and sets of rectangles
by ℜ , the bound function fˆ has to fulfill the following two conditions:
1) fˆ ( ℜ ) ≥ max f (R )
R ∈ℜ
where 〈 ., 〉 denotes the scalar product. hi is the histogram of the training examples i,
and α i and β are the weight vectors and bias that were learned during SVM training.
Because of the linearity of the scalar product, we can rewrite this expression as a sum
over per-point contribution with weights w j = ∑ α i h ij
i
f (I ) = β + ∑
n
w cj (2)
j =1
Here cj is the cluster index belonging to the feature point xj and n is the total number
⊂
of feature points in I. This form allows us to evaluate f over sub-images R I by sum-
ming over the feature points that lie within R. Since we are only interested in the
⊂
argmax of f over all R I, we can ignore the bias term β . Set f = f + + f − , where f +
contains only the positive summands of Equation 2 and f − only the negative ones. If
we denote by Rmax the largest rectangle and by Rmin the smallest rectangle contained
in a parameter region ℜ , then formula (3)
+ −
fˆ ( ℜ ) = f ( R max ) + f ( R min ) (3)
has the desired properties 1) and 2). Using integral images we can evaluate f + and
f − at constant time complexity, thus making the evaluation of fˆ a constant time
operation.
Suppose we use a priority queue P to hold the search states, the pseudo-code which
use ESS to locate an object class instance is like follows.
① Initialize P as empty priority queue
set [T,B,L,R] = [0, n] × [0, n] × [0,m] × [0,m] (m, n is the size of image I)
440 Q. Nie, S. Zhan, and W. Li
② repeat
split [T,B,L,R]→[T1,B1,L1,R1] ∪[T2,B2,L2,R2]
push([T1,B1,L1,R1], fˆ ([T1,B1,L1,R1] ) into P
push([T2,B2,L2,R2], fˆ ([T2,B2,L2,R2] ) into P
retrieve top state [T,B,L,R] from P
until [T,B,L,R] consists of only one rectangle
③ set (t max , bmax , lmax , rmax ) = [T,B,L,R]
This algorithm always examines the most promising(with highest upper bound) rec-
tangle set. The candidate set is split along its largest coordinate interval into halves,
thus forming two smaller disjoint candidate sets. The search is stopped if the most
promising set contains only a single rectangle with the guarantee that this is the
rectangle of globally maximal score.
4 Experiments
We experiments our methods on PASCAL VOC 2007 dataset [15]. The images in this
dataset contain objects at a variety of scales and in varying context. Multiple objects
from multiple classes may be present in the same image. The complexity of these
images is very high. The data has been split into 50% for training/validation and 50%
for testing. We choose ten object classes for our recognition task. We use approximate
1000 training images train classification models, and use approximate 2500 test
images test detection performance.
The experiment has three major steps. The first step is codebook creation. ERC
forest is applied to create visual codebook. Regular grids sampling method is used to
,
extract features from training images. The sampling interval is set to 32 the size of
image patches is set to 32. After extract local features in all training images, describe
them using modified sPCAT descriptors as discussed in section 2. These feature vec-
tors are used to train an ERC Forest. We use 4 assembling trees and they contain 4000
leaf nodes in total. The leave nodes are numbered and each leaf node is assigned a
distinct visual word index.
The second step is training a classifier. Regular grids sampling method also be se-
lected to extract local features. But in this phase, more patches need be sampled than
,
cookbook creation phase. The sampling interval is set to 8 the size of image patches
set to different scales. After getting the feature descriptors, query the corresponding
visual words. During a query, for each descriptor tested, each tree is traversed from
the root down to a leaf and the returned leaf index is the visual word index. After
getting all visual words in an image, calculate the 4000-D histogram of visual words.
These histograms are fed to SVM to train classifier model. After get SVM decision
function, calculate the weight of each word w j = ∑ α i h ij , rewrite decision function
i
f and f − . Then we can localize object instance using the algorithm discussed in
+
section 3. To find multiple object locations in an image, the best-first search can be
performed repeatedly. Whenever an object is found, the corresponding region is re-
moved from the image and the search is restarted until a threshold is not satisfied.
In our experiments, we adapt average precision (AP) scores to compare the per-
formance of detection methods. Detections are considered true or false positives
based on the area of overlap with ground truth bounding boxes. To be considered a
correct detection, the area of overlap a0 between the predicted bounding box BB p and
ground truth bounding box BB g must exceed 50% by the formula:
area ( BB ∩ BB g )
a0 =
p
area ( BB ∪ BB g )
(4)
p
Fig. 2. shows the AP scores on the test data for 10 of the categories..
For illustration, we also compare our AP scores with the maximum results and
median results in the PASCAL VOC 2007 challenge. From the table we can see that
although our results are not prior than best results, our results are better than median
results of PASCAL VOC 2007 in most of 10 categories. We believe that our method
is general for object categories with different features.
Average Precision
sofa
sheep sheep
person
aeroplane aeroplane
tvmonitor AP
train train
horse
motorbike motorbike
bicycle
car car
0 10 20 30 40 50
AP(%)
5 Conclusions
We have proposed a new method for object localization in natural images. It uses a
weakly supervised training method, yet can get state-of-the-art localization result. It
achieves this in several ways. First, we designed a modified sPACT patch descriptor
which is suitable for representing local features. Then, ERC forest is applied to
speedup both training and testing process. And we further improve the localization
performance by employing an Efficient Sub-window Search technical. In future work,
we will explore more powerful local feature descriptors, and adapt our method to
different SVM kernels.
Acknowledgments. The authors also would like to thank the researchers who kindly
published their datasets and software packages online.
References
1. Fergus, R., Perona, P., Zisserman, A.: A visual category filter for google images. In:
Pajdla, T., Matas, J.G. (eds.) ECCV 2004. LNCS, vol. 3021, pp. 242–256. Springer,
Heidelberg (2004)
2. Leibe, B., Leonardis, A., Schiele, B.: Combined object categorization and segmentation
with an implicit shape model. In: 8th European Conference on Computer Vision,
pp. 17–32. Springer, Heidelberg (2004)
3. Crandall, D., Felzenszwalb, P., Huttenlocher, D.: Spatial priors for part-based recognition
using statistical models. In: IEEE Computer Vision and Pattern Recognition 2005,
pp. 10–17. IEEE Press, San Diego (2005)
4. Leordeanu, M., Heber, M., Sukthankar, R.: Beyond Local Appearance: Category Recogni-
tion from Pairwise Interactions of Simple Features. In: IEEE Computer Vision and Pattern
Recognition 2007, pp. 1–8. IEEE Press, Minnesota (2007)
5. Shotton, J., Blake, A., Cipolla, R.: Contour-Based Learning for Object Detection. In: 10th
International Conference on Computer Vision, pp. 503–510. IEEE Press, Beijing (2005)
6. Opelt, A., Pinz, A., Zisserman, A.: A Boundary-Fragment-Model for Object Detection. In:
Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 575–588.
Springer, Heidelberg (2006)
7. Dalal, N., Triggs, B.: Histograms of Oriented Gradients for Human Detection. In: IEEE
Computer Vision and Pattern Recognition 2005, pp. 886–893. IEEE Press, San Diego
(2005)
8. Lampert, C.H., Blaschko, M.B., Hofmann, T.: Beyond Sliding Windows: Object Localiza-
tion by Efficient Subwindow Search. In: IEEE Computer Vision and Pattern Recognition
2008, pp. 1–8. IEEE Press, Anchorage (2008)
9. Deng, H.L., Zhang, W., Mortensen, E.: Principal Curvature-Based Region Detector for
Object Recognition. In: IEEE Computer Vision and Pattern Recognition 2007, pp. 1–8.
IEEE Press, Minnesota (2007)
10. Ferrari, V., Fevrier, L., Jurie, F., Schmid, C.: Groups of Adjacent Contour Segments for
Object Detection. IEEE Trans. Pattern Anal. Machine Intell. 30, 36–51 (2008)
11. Wu, J., James, M.R.: Where am I: Place instance and category recognition using spatial
PACT. In: IEEE Computer Vision and Pattern Recognition 2008, pp. 1–8. IEEE Press,
Anchorage (2008)
Object Recognition Based on Efficient Sub-window Search 443
12. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Machine Learning Jour-
nal 63, 3–42 (2006)
13. Moosmann, F., Triggs, B., Jurie, F.: Fast Discriminative Visual Codebooks using Random-
ized Clustering Forests. In: Advances in Neural Information Processing Systems, vol. 19,
pp. 985–992 (2006)
14. LIBSVM: a library for support vector machines,
http://www.csie.ntu.edu.tw/cjlin/libsvm
15. PASCAL 2007 VOC dataset, The PASCAL Visual Object Classes Challenge (2007),
http://www.pascal-network.org/challenges/VOC/voc2007/
A Multi-Scale Algorithm for Graffito Advertisement
Detection from Images of Real Estate
Abstract. There is a significant need to detect and extract the graffito adver-
tisement embedded in the housing images automatically. However, it is a hard
job to separate the advertisement region well since housing images generally
have complex background. In this paper, a detecting algorithm which uses
multi-scale Gabor filters to identify graffito regions is proposed. Firstly, multi-
scale Gabor filters with different directions are applied to housing images, then
the approach uses these frequency data to find likely graffito regions using the
relationship of different channels, it exploits the ability of different filters tech-
nique to solve the detection problem with low computational efforts. Lastly, the
method is tested on several real estate images which are embedded graffito ad-
vertisement to verify its robustness and efficiency. The experiments demon-
strate graffito regions can be detected quite well.
1 Introduction
With the development of network and multimedia technology, people are used to
retrieval housing images from Internet to get real estate information instead of text
description. However, a large number of housing images are embedded graffito adver-
tisement by real estate agents, for instance, an agent often write a contact telephone
number or embed a logo in the images, it is a tedious job for people to select the pol-
luted images one by one from a large scale web image database, therefore, there is
great demand for detecting and extracting the graffito advertisement from the images
automatically by computer.
Since the background of housing images are very complex, it is difficult to detect
and extract these polluted information well. Fortunately, from consistency and ro-
bustness of observations from these images, we find that images include graffito ad-
vertisement are more relevant than other images and the graffito advertisement is
composed of some connected region that we can make good use of these traits to
detect and extract the polluted information.
Multi-scale frequency filter is a popular method in image analysis. This technique was inspired
by a biological system [1] which has been studied in the past decades. The results of physiological
studies have supported that visual processing in the biological visual system at the cortical level
H. Deng et al. (Eds.): AICI 2009, LNAI 5855, pp. 444–452, 2009.
© Springer-Verlag Berlin Heidelberg 2009
A Multi-Scale Algorithm for Graffito Advertisement Detection 445
involves a set of different scale filtering mechanisms [2][3]. Some frequency mathematic models
have been studied, such as Differences of Gaussian[4] , Gabor functions[5] and Wavelet func-
tions[6].In the frequency domain, it is generally observed that a clutter field in an image has higher
frequency content compared to other scatter regions. Therefore, filtering with frequency analysis
is an attractive approach for the task of image region analysis.
In this paper, firstly terms used in the proposed multi-scale frequency analysis are
defined and invariant operators are introduced to represent a housing image. Since
multi-scale method has been studied for its localized frequency filters alike human’s
visual cortical neurons [3] and used for representation of an image’s opponent fea-
tures [7], a multi-scale algorithm using frequency domain filters to detect polluted
regions in a housing image is then proposed. In this approach, because frequency
domain features of an image are divided into different subgroups by analyzing re-
gions’ distribution, it is more suitable and robust than other feature detection models
and it can be used for detecting words, numbers and company’s logo which embedded
by a real estate agent without any explicit knowledge. The experiments results indi-
cate that images have graffito advertisement with complex background can be
detected quite well.
The remainder of this paper is organized as follows: necessary definitions and
operators for images which have graffito advertisement are presented in section 2,
then a multi-scale algorithm is proposed and some important issues are discussed in
section 3. Section 4 provides the experimental comparison results between the pro-
posed algorithm with others algorithm and the concluding remarks are given in
Section 5.
Here I(x, y) denotes a pixel point position, L is gray level and Thr presents a specific
threshold between 0 and L.
Definition 2. The region which excludes advertisement is presented as B, so the pos-
sible advertisement region in an image is defined as
∑S
i =1.. n
i =I −B (2)
When σ F < σ o , Fm has isotropy we can get σ Fθ at the same θvalue with different
m
frequency.
A Multi-Scale Algorithm for Graffito Advertisement Detection 447
⎧ ∆
⎪1 1/ M | σ Fm|θ − σ Fm|θ |< σ m|θ
P=⎨ (7)
⎪⎩0 otherwise
∆
Where σ m|θ is an assigned threshold value.
The other property of the polluted region is that the region is continuous not only
in the image but also in its corresponding frequency domain. We can verify the region
by comparing it with the x-y axes after getting the candidate regions by multi-scale
transformation in frequency domain.
For implement, Gabor filter is chosen for transforming an image to a frequency do-
main. The transformation of a 2-D image is modulated by a 2-D Gabor function. A
Gabor filter is consisted of two functions having phase by 90 degree, conveniently
located in the real and imaginary parts in complex field. Gabor is a function modu-
θ
lated by a complex sinusoid which can be rotated in the x-y plan by , it can be sym-
bolized as x' and y' .
1 x' y'
)
G(x,y,θ ,f ) = exp(− [( )2 + ( )2 ])cos(2*pi *f *x'
2 sx ' sy '
(10)
sx and sy are variances along x and y-axes respectively, f is the frequency of the sinu-
soidal function, θg ives the orientation of Gabor filter. By changing the radial fre-
quency and the orientation, the filter makes different target features. The result of
spatial domain by Gabor functions is shown in Fig. 2.
After filtering the input image with filters under various frequency and orientation
of a channel, these paralleling channels can be analyzed. Since the polluted regions
have some common properties under different channels, we can extract these special
regions by analyzing the relationship of the channels. The proposed paradigm is
shown in Fig.3.
The analysis can be performed in each filter channel using different scale and theta,
so it is possible to mark polluted regions in an output image.
448 J. Yang and S.-j. Zhu
Filter(f1 ,T1 )
Filter(f1 ,T 2 )
Filter(f1 , T N )
Image Image
Filter(fM ,T1 ) Region
Procession Combining
Filter(f M ,T 2 )
Various Filters
Filter(f M ,T N )
⎧1 ( Pm∩θ = 1) ∧ (Λ = 1)
Region = ⎨ (12)
⎩0 otherwise
By formula (12), we can get a 0-1 array which can describe the region is polluted
region or not well.
In addition, many other features can be identified using transformed. In our imple-
mentation, we used the orientation and frequency dimensions. In order to prove the
effectiveness of the method, in the next section, we detect graffito regions from
housing images downloaded from Internet.
450 J. Yang and S.-j. Zhu
4 Experimental Results
In order to verify the correctness and effectiveness of our proposed solution, a prototype
system is developed by MATLAB. The scheme is tested on the real housing images
included advertisement from Web page. The theta equals 0o,30o,45o,90o,120o,135o respec-
tively, the frequency is selected by f = (2 w - 0.5) / W , w=[0...log(2W/8) ] , where the
test W of image is scaled the range of 0 to 255. The comparison results between the
proposed method and Summing Squared Response of filters (SSR) (where threshold is
0.5 of range of total summing squared value) are shown in Figure.4. The method of
summing squared response is to find a threshold of filters to separate the original image
[8]. From Fig.4, we can find that the different level of multi-scale has its own character-
istic. When performed by the algorithm, image (d) is obtained which contains graffito
features. From the comparison shown in Fig.5, we can find that multi-scale method can
gives more graffito regions than the summing squared responses, which can gives more
supports for the next step of classification. The result indicates that the detecting of
regions from an image having embedded advertisement based on multi-scale algorithm
is more suitable for housing images and it is robust for these types of images with more
complex background. In the point of image procession, the method also has the ability
of eliminating light shadows. Therefore, the result of filtered image cannot give more
details of buildings which often have more lines in an image. Additionally, the computa-
tion complexity of this method can performed using fast FFT technique which is a
critical factor for some real-time applications.
(a) Original image (b) Result of proposed method (c) Result of SSR
5 Conclusions
In this paper, we present a method for detecting graffito polluted regions from an
image based on multi-scale algorithm. The main contribution of this paper is that the
proposed algorithm can find the relationship of channels in different frequency, there-
fore, more effective method can be proposed based on its characteristics. On the other
hand, the method using fast FFT transform can make algorithm more efficient which
can solve problems in the real application of housing images embedded advertise-
ment. Experimental results indicate our method is superior in terms of accuracy, ro-
bustness and stability in detection of advertisement in housing images. More studies
in the aspect of basic theory and its application should be studied in the near future.
Acknowledgement
This work is supported by Shanghai Education Commission Research Project under
Grant 09YZ344, and by Shanghai Special Research Fund of Young College Teacher
under Grant sdl08026. The authors are grateful for the anonymous reviewers who
made constructive comments.
References
1. De Valois, R.J., Albracht, D.G., Thorell, L.G.: Spatial-frequency selectivity of cells in ma-
caque visual cortex. Vision Research 22, 545–559 (1982)
2. Wilson, H.R.: Psychological evidence for spatial channels. In: Braddick, O.J., Sleigh, A.C.
(eds.) Physical and Biological Processing of Images. Springer, Berlin (1983)
3. Watt, R.J., Morgan, M.J.: Spatial lters and the localisation of luminance changes in human
vision. Vision Res. 24(24), 1387–1397 (1984)
4. Petrosino, A., Ceccarelli, M.: A scale-space approach to preattentive texture discrimination.
In: International Conference on Image Analysis and Processing, 1999. Proceedings,
pp. 162–167 (1999)
5. Kamarainen, J.-K., Kyrki, V., Kalviainen, H.: Invariance properties of Gabor filter-based
features-overview and applications. IEEE Trans. on Image Processing 15(5), 1088–1099
(2006)
6. Jain, A., Healey, G.: A multiscale rep resentation including opponent color features for tex-
ture recognition. IEEE Transactions on Image Processing 7(1), 124–128 (1998)
452 J. Yang and S.-j. Zhu
7. Pollen, D.A., Ronner, S.E.: Visual cortical neurons as localized spatial frequency filters.
IEEE Transactions on System, Man and Cybernetics 13(15), 907–916 (1983)
8. Jung, K.: Neural network-based text location in color images. Pattern Recognition
Letter 22(14), 1503–1515 (2001)
An Improved Path-Based Transductive Support Vector
Machines Algorithm for Blind Steganalysis Classification
1 Introduction
Techniques for semi-supervised learning (SSL) are becoming increasingly sophisti-
cated and widespread. Semi-supervised learning is halfway between supervised and
unsupervised learning, using a relatively small labeled data set and a large unlabeled
data set to obtain the classification. With the rising popular application of support
vector machines (SVM), one of the semi-supervised method, transductive SVM
[1](TSVM), becomes more popular nowadays. TSVM, maximizes the margin using
both labeled data set and unlabeled data set, which is not used in standard SVM.
However, the drawback of TSVM is that the objective function is non-convex and
thus difficult to minimize. Joachims designs a heuristically optimization algorithm for
TSVM called SVMlight [2], which has a wide range of applications in diverse do-
mains. Chapelle and Zien in [3] represent a path-based gradient descent method on
the primal formulation of TSVM objective function, called Low Density Separation
algorithm (LDS algorithm for short). A novel large margin methodology SSVM and
SPSI is proposed by Wang and Shen [4].
Information hiding techniques have recently received a lot of attention. With digi-
tal images as carriers, the goal of blind steganalysis is to classify testing images by
any steganographic algorithm into two categories: cover images and stego images.
Nowadays, most blind steganalysis methods use supervised learning technologies,
H. Deng et al. (Eds.): AICI 2009, LNAI 5855, pp. 453–462, 2009.
© Springer-Verlag Berlin Heidelberg 2009
454 X. Zhang and S. Zhong
which require a large amount of labeled data for training. In fact, labeled samples are
often difficult, expensive, or time consuming to obtain, while unlabeled data may be
relatively easy to collect: we can download a large amount of unlabeled images from
the internet. It is of great interest both in theory and in practice because semi-
supervised learning requires less human effort and gives higher accuracy [5]. Re-
searches for semi-supervised blind steganalysis classification are still on the initial
stage. Our research group once proposed semi-supervised blind steganalysis classifi-
cation [6] based on SVMlight, but it has high complexity. Thus we consider another
semi-supervised algorithm, path-based TSVM algorithm (LDS algorithm [3]) for
blind steganalysis. As far as we know, this algorithm is not yet applied to blind stega-
nalysis classification.
In this paper, we build up on the work on LDS algorithm for blind steganalysis
classification. But its classification accuracy is not high enough, partly because of the
special distribution of steganalysis feature data set, and the lack of utilizing labeled
data set information to build the graph. Thus, an improved LDS algorithm for blind
steganalysis classification is proposed. It uses Mahalanobis distance instead of
Euclidean distance to measure the distance between samples, and modifies the edges
between labeled samples when modeling initial weight-graph.
The rest of this paper is organized as follows. Section 2 reviews LDS algorithm
and shows its drawbacks. In Section 3, we propose an improved LDS algorithm to
overcome those drawbacks. Experimental results on blind steganalysis classification
are presented in Section 4. Final conclusions are in Section 5.
Path-Based TSVM algorithm is a way to enforce the decision boundary lying in low
density regions, and thus not crossing high density clusters, that is why it is equally
called Low Density Separation (LDS) algorithm. LDS algorithm first builds a weight-
graph derived from the data, using path-based dissimilarity measure to compose con-
nectivity kernel matrix; and then takes manifold mapping method for dimensionality
reduction; finally, minimizes TSVM objective function by gradient descent. The first
part is the most significant process, and we will show some details.
2
⎛ p −1
⎞
ρ
ρ p∈Pij
(
d ( i, j ) = 2 ln ⎜⎜1 + min ∑ exp ( ρ e ( pk , pk +1 ) ) − 1
1
) ⎟⎟ . (1)
⎝ k =1 ⎠
dij denotes the minimum of all path-specific maximum weight. If two nodes are in the
same cluster, their pairwise dissimilarity will be small; by contrast, if two nodes are in
different clusters, at least one large edge of all paths from node xi to node x j crosses
low density regions, which will cause large dissimilarity value. Then we can get con-
nectivity kernel K as [3],
⎛ d ρ ( i, j ) ⎞
K ( i, j ) = exp ⎜ − ⎟ . (2)
⎝ 2 σ 2
⎠
Chapelle and Zien [3] represent a path-based gradient descent on the primal formula-
tion of TSVM objective function, which can be rewritten as follows,
min_{w,b}  (1/2)‖w‖² + C Σ_{i=1}^{l} L( y_i (w·x_i + b) ) + C* Σ_{i=l+1}^{l+u} L*( w·x_i + b )

s.t.  L(t) = max(0, 1 − t),
      L*(t) = exp(−3t²),
      (1/u) Σ_{i=l+1}^{l+u} (w·x_i + b) = (1/l) Σ_{i=1}^{l} y_i .  (3)
The last constraint enforces that not all unlabeled data are put into the same class. Unlike traditional SVM learning algorithms, Chapelle and Zien solve the problem by gradient descent on equation (3), and this primal method has clearly lower time complexity than the SVMlight algorithm. The comparative experimental results are given in Section 4. The implementation details of the original LDS algorithm can be found in [3].
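To make the primal formulation concrete, the following is a minimal sketch (illustrative Python, not the implementation used in this paper) of gradient descent on the objective (3): the hinge loss L(t) = max(0, 1 − t) is used for the labeled terms, the smooth surrogate L*(t) = exp(−3t²) for the unlabeled terms, and the balancing constraint is enforced by re-solving it for the bias b after every update of w. All names and step sizes are assumptions.

import numpy as np

def tsvm_primal_gd(X_l, y, X_u, C=1.0, C_star=1.0, lr=1e-3, n_iter=1000):
    """Gradient descent on the primal TSVM objective (3) -- illustrative sketch."""
    w = np.zeros(X_l.shape[1])
    # balancing constraint (1/u) sum(w.x_i + b) = (1/l) sum(y_i), solved for b
    b = y.mean() - X_u.dot(w).mean()
    for _ in range(n_iter):
        # labeled part: subgradient of the hinge loss L(t) = max(0, 1 - t)
        margins = y * (X_l.dot(w) + b)
        active = margins < 1.0
        grad = w - C * (y[active, None] * X_l[active]).sum(axis=0)
        # unlabeled part: derivative of L*(t) = exp(-3 t^2) is -6 t exp(-3 t^2)
        t = X_u.dot(w) + b
        grad += C_star * X_u.T.dot(-6.0 * t * np.exp(-3.0 * t ** 2))
        w -= lr * grad
        b = y.mean() - X_u.dot(w).mean()
    return w, b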
Although the LDS algorithm has the advantages of low time complexity and easy implementation, its classification accuracy is not high enough. As we mentioned in Section 2.1, the graph is built as if all nodes were equal, whether labeled or unlabeled. Consider a synthetic data set constructed from the Delta data set and its mirror points, with four labeled samples, see Fig. 1(a). There is a wide gap between the Delta data set and its mirror points; thus, if we do not use the labeled-sample information to build the graph, the classification is done as in Fig. 1(b).

Fig. 1. An example showing the drawback of the LDS algorithm: (a) a synthetic data set constructed from the Delta data set and its mirror points; (b) the classification result obtained by the LDS algorithm
Fig. 1 shows that although two labeled samples are in the same cluster, they sometimes produce a large dissimilarity when it is calculated with the Euclidean distance. Chang and Yeung [8] present a robust path-based clustering method, which we can adopt in the LDS algorithm. In the synthetic Delta data set, samples of the same cluster can be linked together by shrinking the distances marked by the purple-red dotted lines in Fig. 1(a), and the distances between samples of different clusters (green dotted line in Fig. 1(a)) can be enlarged so that the clusters are clearly distinguished.
Furthermore, the LDS algorithm uses the common Euclidean distance to build the original graph, but results obtained with the Euclidean distance often deviate from the actual distance because of the different weights of different features and the mutual influences among the features [9]. In contrast, the Mahalanobis distance can overcome these shortcomings. The Mahalanobis distance uses the covariance matrix to eliminate the mutual influences among the features, and thus captures the distance more correctly than the Euclidean distance. We can use the Within-and-Between-Class Distribution Graph (B-W Graph) [10] to compare the difference between the Euclidean distance and the Mahalanobis distance for steganalysis features. In the B-W Graph, the horizontal axis denotes the between-class distribution B, which provides the main information for classification, and the vertical axis denotes the within-class distribution W. B ≥ 0 corresponds to correctly classified samples, and B < 0 to misclassified samples.
Fig. 2(a) shows the B-W Graph of the steganalysis feature set (extracted from 300 original samples and 300 stego samples with Shi's method [11]) obtained with the Euclidean distance. In this figure, many stego and original samples fall in the region B < 0, which degrades the accuracy of the classifier. If the Mahalanobis distance is used instead of the Euclidean distance, the result is better, see Fig. 2(b).
Fig. 2. The B-W Graph of the steganalysis feature set: (a) B-W Graph drawn using the Euclidean distance; (b) B-W Graph drawn using the Mahalanobis distance
e_m(i, j) = (x_i − x_j)^T S^{−1} (x_i − x_j) .  (4)
where S denotes the sample covariance matrix of all samples x ∈ X_l ∪ X_u. Then we add the information of the labeled samples by shrinking the edges between two samples in the same cluster to the minimum of e_m, and expanding the edges between two samples in different clusters to the maximum of e_m.
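As a concrete illustration, the following small sketch (illustrative Python) computes the Mahalanobis distances of equation (4) for all pairs of samples and then applies the labeled-edge modification described above, shrinking same-class labeled edges to the minimum pairwise distance and expanding different-class labeled edges to the maximum; the exact modified measure e'_m used later is not reproduced here, and a pseudo-inverse of the covariance matrix is used for robustness.

import numpy as np

def mahalanobis_graph(X, labels):
    """Initial weight graph with Mahalanobis distances, Eq. (4).

    X      : (N, d) array of all (labeled + unlabeled) feature vectors
    labels : length-N array, class in {+1, -1} for labeled samples, 0 for unlabeled
    """
    S_inv = np.linalg.pinv(np.cov(X, rowvar=False))     # pseudo-inverse for robustness
    diff = X[:, None, :] - X[None, :, :]                # pairwise differences
    E = np.einsum('ijk,kl,ijl->ij', diff, S_inv, diff)  # e_m(i, j)

    e_min, e_max = E[E > 0].min(), E.max()
    lab = np.flatnonzero(labels != 0)
    for i in lab:                                       # modify edges between labeled samples
        for j in lab:
            if i != j:
                E[i, j] = e_min if labels[i] == labels[j] else e_max
    return E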
Then we use the new e'_m to calculate the connectivity kernel in the linear case (σ = ∞),

d_m^ρ(i, j) = (1/ρ²) ln²( 1 + min_{p∈P_ij} Σ_{k=1}^{|p|−1} [ exp( ρ·e'_m(p_k, p_{k+1}) ) − 1 ] ) ,  (6)

K_m(i, j) = d_m^ρ(i, j) .
The time complexity of our proposed algorithm is analyzed as follows. The algorithm starts with the l × m matrix X_l and the u × m matrix X_u, and then builds a fully connected graph with the standard Mahalanobis distance. Computing the covariance matrix costs O((l+u)²m). For each pair of samples, the Mahalanobis distance costs O(m² + (l+u)²m). Next, modifying the edge lengths costs O(l²). Dijkstra's algorithm with a priority queue based on a binary heap allows equation (6) to be evaluated in O((l+u)²[(l+u) + log(l+u)]) time [3]. That is, step 1 costs O(m² + (l+u)²m + (l+u)³).

The time complexity of MDS in step 2 is approximately O((l+u)³). The kernel matrix K'_m calculated by MDS sharply reduces the time complexity of the gradient descent algorithm, which is approximately O((l+u)³). Thus, the time complexity of the whole algorithm is O(m² + (l+u)²m + (l+u)³), which is nearly equal to the time complexity of the original LDS algorithm.
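The inner minimization in equations (1) and (6) is an ordinary shortest-path problem once every edge length e(u, v) is replaced by the transformed weight exp(ρ·e(u, v)) − 1, so Dijkstra's algorithm with a binary heap applies directly, as assumed in the complexity analysis above. The following sketch (illustrative Python, dense-graph version) computes d^ρ from a full edge-length matrix such as the one produced by equation (4).

import heapq
import numpy as np

def rho_path_distances(E, rho=1.0):
    """Path-based dissimilarity d^rho(i, j) of Eqs. (1)/(6) from an edge-length matrix E."""
    n = E.shape[0]
    W = np.expm1(rho * E)            # transformed edge weights exp(rho * e) - 1
    D = np.empty((n, n))
    for src in range(n):             # one Dijkstra run per source node
        dist = np.full(n, np.inf)
        dist[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v in range(n):
                nd = d + W[u, v]
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        D[src] = dist
    return np.log1p(D) ** 2 / rho ** 2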
4 Experimental Results
The original image database comprises all 1096 sample images contained in CD#3 of the CorelDraw Version 10.0 software. The stego image database comes from those 1096 images embedded by five typical data embedding methods [11]: the non-blind spread spectrum (SS) method (α=0.1) by Cox et al., the blind SS method by Piva et al., the 8×8 block based SS method by Huang and Shi, a generic quantization index modulation (QIM, 0.1 bpp (bit per pixel)), and a generic LSB (0.3 bpp). The hidden data are a random number sequence obeying a Gaussian distribution with zero mean and unit variance.
We randomly choose 300 original images as positive samples and 300 stego images as negative samples (600 in total) for testing in our experiment. They also compose the unlabeled data set X_u. We then randomly choose some other samples from the original image database and the stego image database, respectively, for the labeled data set X_l. In the experiments, we set the default values of the parameters as recommended in [3], while setting the soft margin parameter C = 50 by cross-validation.
In this experiment, we randomly choose 10 to 100 labeled samples and combine them with the 600-sample unlabeled data set for training. The experiment is repeated 20 times, and the mean accuracy and variance are calculated. The results of the semi-supervised learning algorithms (the LDS algorithm and our proposed algorithm) and of the supervised learning algorithm (SVM) are shown below.
Fig. 5. Comparative experiment for supervised and semi-supervised learning: (a) the accuracy of the three algorithms; (b) the variance of the three algorithms
Fig. 5(a) shows that our proposed algorithm achieves the highest accuracy; even with only 10 labeled samples, it reaches almost 90% accuracy. Fig. 5(b) shows the variance of the three algorithms. An algorithm with larger variance is usually less stable. The variance becomes much larger when the number of labeled samples is small, especially for the LDS algorithm. This phenomenon might be caused by a biased selection of labeled samples, but we can see that our proposed algorithm still has the best stability.
The SVMlight algorithm has a wide range of applications despite its high time complexity, while our proposed algorithm has lower time complexity. The SVMlight software tool can be obtained at http://www.cs.cornell.edu/People/tj/. In this experiment, we again randomly choose 10 to 100 labeled samples and combine them with the 600-sample unlabeled data set for training. The experiment is repeated 10 times, and the mean accuracy and running time are calculated. It is worth mentioning that the running time of building the graph in our proposed method is included. The test results are shown in Fig. 6 and Table 1. From the results, we can conclude that our proposed algorithm has much lower time complexity and higher accuracy than the SVMlight algorithm.
Fig. 6. The accuracy of SVMlight and proposed algorithm with 10 to 100 labeled samples
No. of labeled   Proposed (s)   SVMlight (s)   No. of labeled   Proposed (s)   SVMlight (s)
10               4.50           277            60               7.12           601
20               5.25           357            70               7.39           532
30               5.51           520            80               7.54           553
40               6.06           541            90               7.74           408
50               6.52           593            100              8.12           436
average          5.568          457.6          average          7.582          506
5 Conclusions
In this paper, an improved path-based TSVM algorithm is proposed for blind steganalysis classification. It uses the Mahalanobis distance instead of the Euclidean distance to measure the distance between samples, and adds the information of the labeled samples by shrinking the edges between two samples in the same cluster and expanding the edges between two samples in different clusters. It improves both the accuracy and the time complexity. Experiments show that our proposed path-based TSVM algorithm performs well even with a small labeled data set. However, it still has some drawbacks. As the size of the labeled data set increases, the accuracy does not increase significantly; this might be caused by the gradient descent method, which sometimes converges to a local minimum. Another drawback is that when the number of unlabeled samples becomes very large, the time complexity will be extremely high.
In the future, we will carry out more comparative experiments with other semi-supervised algorithms and look for a low-time-complexity algorithm that can deal with a large number of unlabeled samples.
Acknowledgments. This paper is supported by the National Natural Science Foundation of China under Grant No. 10871221 and the Fujian Provincial Young Scientists Foundation under Grant No. 2006F3076.
References
1. Chapelle, O., Scholkopf, B., Zien, A. (eds.): Semi-Supervised Learning. MIT Press,
Cambridge (2006)
2. Joachims, T.: Transductive inference for text classification using support vector machines.
In: Proceeding of ICML, Bled, pp. 200–209 (1999)
3. Chapelle, O., Zien, A.: Semi-Supervised Classification by Low Density Separation. In:
Proceeding of the AISTAT, Barbados, pp. 57–64 (2005)
4. Wang, J., Shen, X.: Large margin semi-supervised learning. J. Mach. Learn. Res. 8,
1867–1891 (2007)
5. Zhu, X.: Semi-Supervised Learning Literature Survey. Technical report, University of
Wisconsin-Madison,
http://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
6. Zhong, S.P., Lin, J.: An Universal Steganalysis Method for GIF Image Based On TSVM.
J. Beijing Jiaotong University 33, 122–126 (2009)
7. Fischer, B., Roth, V., Buhmann, J.M.: Clustering with the connectivity kernel. In: Thrun,
S., Saul, L., Scholkopf, B. (eds.) Advances in Neural Information Processing Systems,
vol. 16. MIT Press, Cambridge (2004)
8. Chang, H., Yeung, D.Y.: Robust Path-Based Spectral Clustering. J. Pattern Recogni-
tion 41, 191–203 (2008)
9. Chen, Y.L., Wang, S.T.: Application of WMD Gaussian kernel in spectral partitioning. J.
Computer Applications 28, 1738–1741 (2008)
10. Tong, X.F., Teng, J.Z., Xuan, G.R., Cui, X.: JPEG Image Steganalysis Based on Markov
Model. J. Computer Engineering 34, 217–219 (2008)
11. Shi, Y.Q., Xuan, G.R., Zou, D., Gao, J.J.: Steganalysis Based on Moments of Characteris-
tic Functions Using Wavelet Decomposition, Prediction-Error Image, and Neural Network.
In: Proceeding of ICME 2005, Netherlands (2005)
12. Chen, Y.W., Lin, C.J.: Combining SVMs with various feature selection strategies,
http://www.csie.ntu.edu.tw/~cjlin/
Resolution with Limited Factoring
Dafa Li
1 Introduction
L_1^(1), L_1^(2), ..., and L_1^(m_1) have an mgu α, and C_1 and C_2 have no common variables. Then the binary resolvent of the factor C_1α of clause C_1 and clause C_2, where the factored literal L_1^(1)α is the literal resolved upon, is an instance of the resolvent obtained by linear binary resolution with C_1 as a top clause and C_2 as side clauses. See Fig. 1.
Proof.
Let α be an mgu of L_1^(1), L_1^(2), ..., and L_1^(m_1). Then,

L_1^(1)α = L_1^(2)α = ... = L_1^(m_1)α.  (1)

literal resolved upon. Let σ be an mgu of L_1^(1)α and ¬L_2. Then

L_1^(1)ασ = ¬L_2σ.  (2)

ασ = βγ.  (5)
Fig. 1.
We take C_1 as a top clause and C_2 as a side clause and let R_1 be the binary resolvent of C_1 and C_2, where L_1^(1) and L_2 are the literals resolved upon. Then

R_1 = L_1^(2)β ∨ ... ∨ L_1^(m_1)β ∨ C_1β ∨ C_2β.  (6)

Now R_1 is considered as a center clause and C_2 is taken as a side clause again; let us compute a binary resolvent of R_1 and C_2. Notice that from Eq. (1) and Eq. (4), L_1^(1)ασ = L_1^(2)ασ = ... = L_1^(m_1)ασ = ¬L_2ασ. From Eq. (5),

L_1^(2)βγ = ¬L_2βγ.  (7)
By notation in [1], we write C2 as L2 [x1 , ..., xl ] ∨ C2 [x1 , ..., xl ], where x1 , ..., xl
are all the variables which occur in C2 . After renaming all the variables in C2 ,
we obtain C2∗ = L∗2 ∨ C2∗ = L2 [y1 , ..., yl ] ∨ C2 [y1 , ..., yl ], where y1 , ... and yl are
new variables which do not occur in C1 , C2 , β, γ, α or σ. Clearly, L2 βγ =
L2 [x1 , ..., xl ]βγ = L2 [x1 βγ, ..., xl βγ]. Let λ = {x1 βγ/y1 , ..., xl βγ/yl }. Then,
γλ = ηρ. (10)
Let R_2 be the binary resolvent of R_1 and C_2*, where L_1^(2)β and L_2* are the literals

and L_2^(m_2) have an mgu β, and C_1 and C_2 have no common variables. Then the binary resolvent of the factor C_1α of C_1 and the factor C_2β of C_2, where L_1α and L_2β are the literals resolved upon, is an instance of the binary resolvent of C_1 and C_2, where L_1 and L_2 are the literals resolved upon.
Proof.
Let µ = α ∪ β. Then the factor C_1α = C_1µ = L_1^(1)µ ∨ L_1µ ∨ C_1µ and the factor C_2β =
instance of R_b.
Example 2. Let C1 = P (x) ∨ ¬Q(x) ∨ ¬Q(y) and C2 = ¬P (a) ∨ R(u) ∨ R(v).
Then instead of computing the resolvent of factors of the two clauses, we just
compute the binary resolvent of the two clauses.
and L_2^(m_2) have an mgu α_2, and C_1 and C_2 have no common variables. Then the binary resolvent of the factor C_1α_1 of C_1 and the factor C_2α_2 of C_2, where the literal L_2α_2 and the factored literal L_1^(1)α_1 are the literals resolved upon, is an instance of the resolvent obtained by linear binary resolution with C_1 as a top clause and C_2 as side clauses. See Fig. 1.
Proof
Let α = α_1 ∪ α_2. Then L_1^(1)α ∨ C_1α is a factor of C_1 and L_2^(1)α ∨ L_2α ∨ C_2α is a factor of C_2. Let σ be an mgu of L_1^(1)α and ¬L_2α, and let R be the resolvent of the factors of C_1 and C_2. Then R = C_1ασ ∨ L_2^(1)ασ ∨ C_2ασ. For simplicity, here let m_1 = 2. Then, by using the method used in Lemma 1, we can obtain a deduction of R_2 by linear binary resolution with C_1 as a top clause and C_2 as side clauses, and we can show that for some substitution ρ, R_2ρ = C_1ασ ∨ (L_2^(1) ∨ L_2^(2) ∨ ... ∨ L_2^(m_2) ∨ C_2)ασ = C_1ασ ∨ L_2^(1)ασ ∨ C_2ασ = R. This means that R is an instance of R_2.
Example 3. Let C1 = ¬P (a) ∨ ¬P (u) ∨ R(v) and C2 = P (x) ∨ ¬Q(x) ∨ ¬Q(y).
Then instead of computing the resolvent of factors of the two clauses, we derive
the deduction by linear binary resolution with C1 as a top clause and C2 as side
clauses.
and L_2^(m_2) have an mgu α_2, and C_1 and C_2 have no common variables. Then the binary resolvent of the two factors, where the factored literals L_1^(1)α_1 and L_2^(1)α_2 are the literals resolved upon, is an instance of the resolvent obtained by linear binary resolution with C_1 as a top clause and the factor C_2α_2 of C_2 as side clauses. See Fig. 1.
Proof
Let α = α_1 ∪ α_2. Then C_1α = L_1^(1)α ∨ C_1α and C_2α = L_2^(1)α ∨ C_2α. Let σ be an mgu of L_1^(1)α and ¬L_2^(1)α, and let R be the binary resolvent of the factors C_1α and C_2α, where the factored literals L_1^(1)α_1 and L_2^(1)α_2 are the literals resolved upon. Then R = C_1ασ ∨ C_2ασ. Now let us consider C_1 as a top clause and the factor C_2α of C_2 as side clauses. For simplicity, here let m_1 = 2. By Lemma 1, we can derive R_2ρ = C_1ασ ∨ C_2ασ = R for some substitution ρ, where the deduction of R_2 is obtained by linear binary resolution with C_1 as a top clause and the factor C_2α_2 of C_2 as side clauses. See Fig. 1. This implies that R is an instance of R_2.
We can find proofs of the following examples by using the limited factoring.
Example 4. S = {(1). P (x) ∨ P (y), (2). ¬P (x) ∨ ¬P (y)}.
Proof
(3). ¬P (z) a factor of (2)
(4). P (y) a binary resolvent of (1) and (3)
(5). □ a binary resolvent of (1) and (3) with (4)
Example 5. S = {(1). P (z, y)∨P (x, g(x)),(2). Q(b)∨¬P (u, v), (3). ¬P (w, g(b))∨
¬Q(w)}[7].
Proof
(4). ¬P (u, v) ∨ ¬P (b, g(b)) a binary resolvent of (2) and (3)
(5). ¬P (b, g(b)) a factor of (4)
(6). P (z, y) a binary resolvent of (1) and (5)
(7). □ a binary resolvent of (5) and (6)
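To make the factoring and binary resolution steps concrete, the following small sketch (illustrative Python, not part of the paper) replays the refutation of Example 4: variables are plain strings, compound terms are tuples, clauses are lists of (sign, predicate, arguments) literals assumed to be standardized apart, and the occurs check is omitted for brevity.

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def subst(t, s):
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(subst(a, s) for a in t[1:])

def unify(t1, t2, s):
    """Return an mgu extending s, or None (occurs check omitted)."""
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if isinstance(t1, str):
        out = dict(s); out[t1] = t2; return out
    if isinstance(t2, str):
        out = dict(s); out[t2] = t1; return out
    if t1[0] != t2[0] or len(t1) != len(t2):
        return None
    for a, b in zip(t1[1:], t2[1:]):
        s = unify(a, b, s)
        if s is None:
            return None
    return s

def apply_clause(clause, s):
    out = []
    for sign, pred, args in clause:
        lit = (sign, pred, tuple(subst(a, s) for a in args))
        if lit not in out:            # merging identical literals (factoring effect)
            out.append(lit)
    return out

def binary_resolvents(c1, c2):
    """All binary resolvents of two variable-disjoint clauses."""
    res = []
    for i, (s1, p1, a1) in enumerate(c1):
        for j, (s2, p2, a2) in enumerate(c2):
            if p1 == p2 and s1 != s2:
                mgu = unify(('args',) + a1, ('args',) + a2, {})
                if mgu is not None:
                    rest = c1[:i] + c1[i + 1:] + c2[:j] + c2[j + 1:]
                    res.append(apply_clause(rest, mgu))
    return res

# Example 4: S = {P(x) v P(y), ~P(u) v ~P(v)}  (variables renamed to be disjoint)
C1 = [(True, 'P', ('x',)), (True, 'P', ('y',))]
C2 = [(False, 'P', ('u',)), (False, 'P', ('v',))]
C3 = apply_clause(C2, unify('u', 'v', {}))     # (3) a factor of (2): ~P(v)
C4 = binary_resolvents(C1, C3)[0]              # (4) a binary resolvent: P(y)
C5 = binary_resolvents(C3, C4)                 # (5) contains the empty clause []
print(C3, C4, C5)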
4 Summary
Noll [5] showed that it is sufficient to factor only one of the two parent clauses of a resolvent. He called this refinement of resolution the resolution rule with half-factoring. In this paper, we go further and demonstrate how to eliminate the half-factoring. We show that if two clauses can be binary resolved on literals that could also both be factored in their clauses, then it suffices to allow factoring of only one of the clauses on those literals; otherwise, factoring can be ignored. For example, Lemmas 2 and 3 state that it is not necessary to factor the literals which are not resolved upon.
References
[1] Chang, C.C., Lee, R.C.T.: Symbolic logic and mechanical theorem proving. Aca-
demic Press, San Diego (1973)
[2] Duffy, D.: Principles of automated theorem proving. John Wiley & Sons, Chichester
(1991)
[3] Fitting, M.: First-order logic and automated theorem proving. Springer, New York
(1990)
[4] Kowalski, R.: Studies in completeness and efficiency of theorem proving by resolution. Ph.D. thesis, University of Edinburgh (1970)
[5] Noll, H.: A Note on Resolution: How to get rid of factoring without loosing com-
pleteness. In: Bibel, W. (ed.) CADE 1980. LNCS, vol. 87, pp. 250–263. Springer,
Heidelberg (1980)
[6] Robinson, J.A.: A machine-oriented logic based on the resolution principle. J. ACM 12(1), 23–41 (1965)
[7] Socher-Ambrosius, R., Johann, P.: Deduction systems. Springer, New York (1997)
Formal Analysis of an Airplane Accident in
N Σ-Labeled Calculus
1 Introduction
2 N Σ-Labeled Calculus
A brief explanation of the N Σ-labeled calculus is given following [9] and [12]; its semantics and soundness are described in [9]. In this paper we partially follow [14], in which PA stands for Peano arithmetic; an occurrence of the variable x in a formula A will be indicated explicitly as A[x], and µxA[x] designates the least number satisfying A.
Peano arithmetic (PA) is extended to PA(∞) called pseudo-arithmetic [8],
including the infinity (∞) and the minimalization (µ).
An infinite number of special constants J, J_1, J_2, . . . and of labels in Σ are added, where Σ is the set of labels.
program variables taking natural number values and possibly ∞, and the change
of their values along the natural number time is expressed by the change of local
models, or worlds, and vice versa. It must be noticed that a program variable is
guarded against quantification in the calculus, since each J is not a variable but
a constant. A label indicates a personality that is a generalization of an agent of
multi-agent systems, or an observer in physics, including notion of subjectivity.
A tense means the time relative to a reference ‘observation time’ called now,
which is taken to be 0 throughout the calculus, where ∞ indicates the tense
when false holds. The value of a special constant may change along with tense.
The @-mark, as a logical symbol, is called the coincidental operator. A formula of the form A@a intuitively means that the personality designated by the label believes, at the tense designated by a, the fact that "A holds now", while A@a and A@ mean, respectively, that A holds at tense a and that the labeled personality believes that A holds now.
now, respectively. The logical symbol “ ; ” called the futurity operator moves
the observation time toward a future time-point. a; b is the tense of b observed
at the tense designated by a. a; A means the least, i.e. the earliest, time when
A comes to hold, or rises, after or exactly at the tense a, which will be called
the ascent of A at a.
Definition 4. For a term a[x], Σ_{x=b}^{b+t} a[x], which is an approximation of ∫_{x=b}^{b+t} a[x] dx, is defined as follows:

Σ_{x=b}^{b+t} a[x]  =def  h · Σ_{0 ≤ y < n+1} a[b + yh],   where t = nh.

It must be noted that the error of this approximation can be made arbitrarily small by choosing a sufficiently small h.
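As a quick numerical check of Definition 4, the following sketch (illustrative Python) evaluates the sum h·Σ_{0≤y<n+1} a[b + yh] with t = nh and shows the approximation error of the integral shrinking as h decreases.

def approx_integral(a, b, t, n):
    """Definition 4: h * sum_{0 <= y < n+1} a(b + y*h), with t = n*h."""
    h = t / n
    return h * sum(a(b + y * h) for y in range(n + 1))

# the error decreases as h = t/n gets smaller (exact integral of x**2 on [0, 1] is 1/3)
for n in (10, 100, 1000):
    print(n, approx_integral(lambda x: x * x, 0.0, 1.0, n) - 1.0 / 3.0)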
5.2 Formalization
In this paper, the decision and the order of the controller at Tokyo ACC, the decisions of the crews of the two airplanes, and the orders from the TCASs installed in the airplanes that caused the near miss are formalized, verified and analyzed in the calculus.
For simplicity, only the changes of the vertical positions of the airplanes are dealt with in the analysis. To do so, the projection of the positions of the airplanes onto a vertical plane is considered.
Separation(P_1, P_2, v, h)  =def  v[FL] < |p_1v − p_2v| ∨ h[nm] < |p_1h − p_2h| ,  (3)
180 A ≡ (180[s], A) ,  (4)
Separation_{7,5}  =def  Separation(P_1, P_2, 7, 5) ,  (5)
anticipateNM(P_1, P_2)  =def  hold(P_1) & hold(P_2) ⊃ 180 ¬Separation_{7,5} ,  (6)
Table 2. Facts

index  condition/prefix                  action                                         tense          label
12                                       ↑ anticipateNM(A, B) = 15:54'15"               S              *, Monitor
13                                       ↑ cntl(A, ⊥, 350[FL]) = 15:54'27"              S              *, A
14                                       ↑ (cntl(A, ⊥, 350[FL]) ≡ Ecntl) = 15:54'27"    S              *, A
15                                       ↑ RA(A, 1500ft/min, ⊥) = 15:54'35"             S              *, A
16                                       ↑ RA(B, −1500ft/min, ⊥) = 15:54'34"            S              *, B
17                                       ¬Ecntl, ¬CNF                                   S              *, A, B, ACC
18                                       ↑ cntl(B, ⊥, 350[FL]) = 15:54'27"              S              ACC
19     ope(B, ⊥, 350[FL])@D6, hold(A)    180 Separation_{7,5}                           S; 15:54'27"   *, ACC
20                                       hold(A)                                                       ACC
• Axiom 2: The controller recognizes that anticipating the near miss and
the corresponding alert will occur simultaneously.
• Axiom 3: The controller recognizes the alert D1 seconds after it actually
occurs.
• Axiom 4: The controller issues some emergency control order, indicated by Ecntl, to each airplane A or B with a delay of D2 seconds after he notices the near miss danger.
• Axiom 5: There exist some suitable orders such that the two airplanes will not be in the near miss state, namely, they will keep a safe separation for 180 seconds.
– Axiom 6: The order will be voided after 15 seconds.
– Axiom 7: The TCAS of airplane A outputs a resolution advisory and orders a climb or descent by some value X, D3 seconds after it recognizes the near miss danger.
– Axiom 8 is similar. It is for the airplane B.
– Axiom 9: Each airplane operates to climb or descend some value X with D5
seconds delay after the controller at the ACC sends the control order.
– Axiom 10: The airplane must follow the order from its own TCAS if there
is no emergency control from ACC.
– Axiom 11 is the differential equation of the actual movement when the air-
plane operates the value X, i.e. the pair of values, v and h, where v is the
vertical speed to climb or descend and h is vertical position to go.
– Fact 12: The monitor on the panel of the controller anticipated the danger of a near miss at 15:54'15", where S indicates the actual starting time of the whole system.
– Fact 13: The pilot, or the crew, of airplane A recognized at 15:54'27" that the controller had ordered a descent to FL350.
– Fact 14: The crew of the airplane recognized that the order was an emergency one.
– Fact 15: The crew, on the other hand, recognized at 15:54'35" that the TCAS of A output a resolution advisory to climb at 1500 ft per minute. Of course these two operations contradict each other, and the crew followed that of the ACC according to axioms 9 and 10.
– Fact 16: At the same time, the crew of B recognized the order from its own
TCAS to descend at 1500 ft per minute.
– Fact 17 is the initial condition. At the start time of the system, there are no
emergency control and no conflict alert.
– Facts 18-20 are recognition by the controller.
• Fact 18: Despite Fact 13 (and 14), the controller believed, or misrecognized, that he had ordered B, instead of A, to descend to FL350.
• Fact 19: The controller believed that if he had ordered as in Fact 18, the near miss accident would occur.
• Fact 20: The two airplanes held their vertical and horizontal speeds.
In these axioms and facts, the program labels exist implicitly and are not represented explicitly.
6 Conclusion
This paper has demonstrated an analysis of a concrete serious near miss acci-
dent of airplanes by N Σ-labeled calculus that is a formal system for complicated
control systems involving human factor, especially misunderstanding and inap-
propriate decision.
If the order from the controller at the ACC contradicts that from the TCAS, the report of the Investigating Committee for Avion and Traffic Accident [10] recommends that the crew must follow the latter, as mentioned in Section 5.4. Moreover, on Jul. 1, 2002, a similar airplane accident occurred at Überlingen [2]. The crew of one airplane followed the descending order from the controller, although there was a climbing order from its own TCAS. On the other hand, the crew of the other airplane descended following the order from its own TCAS. Consequently, the two airplanes collided.
From the investigations of these accidents, every crew now must follow the orders from the TCAS if the orders from it and from the controllers contradict each other. Namely, axioms 9' and 10' in Table 3 are substituted for axioms 9 and 10. Following these new rules, it is proved that there is no near miss accident caused by such a conflict.
One of our future studies is to construct some safety designs to control air-
planes, trains, vehicles, etc., against misunderstanding and incorrect decision.
References
1. Avion Safety Network (ed.): ASN Aircraft accident McDonnell Douglas DC-10-40
JA8546 off Shizuoka Prefecture (2005),
http://aviation-safety.net/database/record.php?id=20010131-2
2. Avion Safety Network (ed.): ASN Aircraft accident Tupolev 154M RA-85816
Überlingen (2007),
http://aviation-safety.net/database/record.php?id=20020701-0
3. Curzon, P., Ruksenas, R., Blandford, A.: An Approach to Formal Verification of
Human-Computer Interaction. Formal Aspect of Computing 19, 513–550 (2007)
4. Damm, W., Hungar, H., Olderog, E.-R.: Verification of Cooperating Traffic Agents.
International Journal of Control 79, 395–421 (2006)
5. Fagin, R., Halpern, J.Y., Moses, Y., Vardi, M.Y.: Reasoning About Knowledge.
The MIT Press, Cambridge (1995)
6. Gentzen, G.: Untersuchungen über das logische Schließen. Mathematische
Zeitschrift 39, 176–210, 405–431 (1935); Investigations into Logical Deduction. In:
Szabo, M.E. (ed.) The Collected Papers of Gerhard Gentzen, Series of Studies in
Logic and the Foundations of Mathematics, pp. 68–131. North-Holland Publ. Co.,
Amsterdam (1969)
7. Halpern, J.Y., Vardi, M.Y.: The Complexity of Reasoning about Knowledge and
Time. I. Lower Bounds. Journal of Computer and System Sciences 38, 195–237
(1989)
8. Igarashi, S., Mizutani, T., Ikeda, Y., Shio, M.: Tense Arithmetic II: @-Calculus as
an Adaptation for Formal Number Theory. Tensor, N. S. 64, 12–33 (2003)
9. Mizutani, T., Igarashi, S., Ikeda, Y., Shio, M.: Formal Analysis of an Airplane
Accident in N Σ-labeled Calculus. In: Deng, H., Wang, L., Wang, F.L., Lei, J. (eds.)
AICI 2009. LNCS (LNAI), vol. 5855, pp. 469–478. Springer, Heidelberg (2009)
10. Investigating Committee for Avion and Traffic Accident (ed.): Report of the Near
Miss Accident by JA8904 belonging to Japan Airlines with JA8546 belonging to
the Same Company (2002) (in Japanese)
11. McCarthy, J., Sato, M., Hayashi, T., Igarashi, S.: On the Model Theory of Knowl-
edge. Stanford University Technical Report, STN-CS-78-657 (1979)
12. Mizutani, T., Igarashi, S., Shio, M., Ikeda, Y.: Human Factors in Continuous Time-
Concerned Cooperative Systems Represented by N Σ-labeled Calculus. Frontiers
of Computer Science in China 2, 22–28 (2008)
13. Presburger, M.: Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. In: Comptes-Rendus du Congrès des Mathématiciens des Pays Slaves, pp. 92–101 (1930)
14. Shoenfield, J.R.: Mathematical Logic. Addison-Wesley Publishing Company, Read-
ing (1967)
Using Concept Space to Verify Hyponymy in Building a
Hyponymy Lexicon
Lei Liu1, Sen Zhang1, Lu Hong Diao1, Shu Ying Yan2, and Cun Gen Cao2
1
College of Applied Sciences, Beijing University of Technology
2
Institute of Computing Technology, Chinese Academy of Sciences
{liuliu_leilei,zhansen,diaoluhong}@bjut.edu.cn,
{yanshuying,caocungen}@ict.edu.cn
1 Introduction
Automatic acquisition of semantic relations from text has received much attention in
the last ten years. Especially, hyponymy relations are important in accuracy verifica-
tion of ontologies, knowledge bases and lexicons [1][2].
Hyponymy is a semantic relation between concepts. Given two concepts X and Y,
there is the hyponymy between X and Y if the sentence “X is a (kind of) Y” is accept-
able. X is a hyponym of Y, and Y is a hypernym of X. We denote a hyponymy rela-
tion as hr(X, Y), as in the following example:
中国是一个发展中国家 ---hr(中国,发展中国家)
(China is a developing country ---hr(China, developing country) )
Human knowledge is mainly presented in the form of free text at present, so processing free text has become a crucial yet challenging research problem. Erroneous hyponymy relations produced in the phase of acquiring hyponymy from free text will affect the building of the hyponymy lexicon.
In our research, the problem of hyponymy verification is described as follows: given a set of candidate hyponymy relations acquired based on patterns or statistics, denoted as CHR = {(c1, c2), (c3, c4), (c5, c6), …}, where each ci is a concept constituting a candidate hyponymy relation, the problem of hyponymy verification is how to identify the correct hyponymy relations in CHR using some specific verification methods.
2 Related Work
There are two main approaches to automatic/semi-automatic hyponymy acquisition. One is pattern-based (also called rule-based), and the other is statistics-based. The former uses linguistics and natural language processing techniques (such as lexical and parsing analysis) to obtain hyponymy patterns and then uses pattern matching to acquire hyponymy; the latter is based on a corpus and a statistical language model, and uses clustering algorithms to acquire hyponymy.
At present the pattern-based approach is dominant, and its main idea is that hyponymy relations can be extracted from text as they occur in detectable syntactic patterns. The so-called patterns include special idiomatic expressions, lexical features, phrasing features, and semantic features of sentences. Patterns are acquired by using linguistics and natural language processing techniques.
There have been many attempts to develop automatic methods to acquire hy-
ponymy from text corpora. One of the first studies was done by Hearst [3]. Hearst
proposed a method for retrieving concept relations from unannotated text (Grolier’s
Encyclopedia) by using predefined lexico-syntactic patterns, such as
…NP1 is a NP2… ---hr(NP1, NP2)
…NP1 such as NP2… ---hr(NP2, NP1)
…NP1 {, NP2}*{,} or other NP3 … ---hr (NP1, NP3), hr (NP2, NP3)
Other researchers also developed other ways to obtain hyponymy. Most of these tech-
niques are based on particular linguistic patterns.
Morin and Jacquemin produced partial hyponymy hierarchies guided by transitiv-
ity in the relation, but the method works on a domain-specific corpus [4].
Llorens and Astudillo presented a technique based on linguistic algorithms, to con-
struct hierarchical taxonomies from free text. These hierarchies, as well as other rela-
tionships, are extracted from free text by identifying verbal structures with semantic
meaning [5].
Sánchez presented a novel approach that adapted to the Web environment, for
composing taxonomies in an automatic and unsupervised way. It uses a combination
of different types of linguistic patterns for hyponymy extraction and carefully de-
signed statistical measures to infer information relevance [6].
Elghamry showed how a corpus-based hyponymy lexicon with partial hierarchical
structure for Arabic can be created directly from the Web with minimal human super-
vision. His method bootstraps the acquisition process by searching the Web for the
lexico-syntactic patterns [7].
3 Concept Space
In Chinese, one may find several hundred different hyponymy relation patterns based on different quantifiers and synonymous words, which are equivalent to a single hyponymy pattern in English (e.g., (<?C1> is a <?C2>) or (<?C3> such as <?C1>, <?C2>)). Fig. 1 depicts a few typical Chinese hyponymy relation patterns.
First, we acquire a set of candidate hyponymy relations from a large amount of Chinese free text based on Chinese lexico-syntactic patterns. Then we build the concept space using those candidate hyponymy relations.
Definition 1: The concept space is a directed graph G = (V, E, W) where nodes in V
represent concepts of the hyponymy and edges in E represent relationships between
concepts. A directed edge (c1, c2) from c1 to c2 corresponds to a hyponymy from con-
cept c1 to concept c2. Edge weights in E are used to represent varying degrees of cer-
tainty.
For a node c in a graph, we denote by I(c) and O(c) the sets of in-neighbors and out-neighbors of c, respectively. Individual in-neighbors are denoted as Ii(c), for 1 <= i <= |I(c)|, and individual out-neighbors are denoted as Oi(c), for 1 <= i <= |O(c)|.
The basic control process about building concept space is shown in Algorithm 1.
------------------------------------------------------------------------------------------------
Algorithm 1. The basic process of building concept space
Input: the set of candidate hyponymy relations CHR from large Chinese free text
based on Chinese lexico-syntactic patterns;
Output: the concept space G.
Step1: Initialize G = (V, E, W), and let V = ∅, E = ∅, W = ∅;
Step2: For each (c1, c2) ∈ CHR, execute Step3–Step4;
Step3: If c1 ∉ V, then V = V ∪ {c1}; if c2 ∉ V, then V = V ∪ {c2}; E = E ∪ {(c1, c2)};
Step4: CHR = CHR − {(c1, c2)};
Step5: For each r ∈ E, set its weight w(r) ∈ W to 0;
Step6: Return G;
----------------------------------------------------------------------------------------------------------
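A minimal sketch of Algorithm 1 (illustrative Python; candidate relations are given as (hyponym, hypernym) pairs) is shown below, together with helpers for the in- and out-neighbor sets I(c) and O(c); edge weights are initialized to 0 as in Step 5.

def build_concept_space(chr_pairs):
    """Algorithm 1: build the concept space G = (V, E, W) from candidate relations."""
    V, E, W = set(), set(), {}
    for c1, c2 in chr_pairs:          # each candidate hyponymy relation (c1, c2)
        V.add(c1)
        V.add(c2)
        E.add((c1, c2))               # directed edge: hyponym c1 -> hypernym c2
    for edge in E:
        W[edge] = 0.0                 # Step 5: initialize every edge weight to 0
    return V, E, W

def in_neighbors(E, c):
    """I(c): concepts with an edge pointing to c (its hyponyms)."""
    return {u for (u, v) in E if v == c}

def out_neighbors(E, c):
    """O(c): concepts that c points to (its hypernyms)."""
    return {v for (u, v) in E if u == c}

V, E, W = build_concept_space([("China", "developing country"),
                               ("tomato", "vegetable"), ("tomato", "foodstuff")])
print(out_neighbors(E, "tomato"))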
c = 食品, {c1, …, cm} = {牛肉饼, 蛋糕, 面包, 奶油}, c' = 产品, {c'1, …, c'n} = {牛肉饼, 牛肉干, 牛肉汤}, {c1, …, cm} ∩ {c'1, …, c'n} = {牛肉饼}
(c = foodstuff, {c1, …, cm} = {hamburger, cake, bread, butter}, c' = product, {c'1, …, c'n} = {hamburger, beef jerky, brewis}, {c1, …, cm} ∩ {c'1, …, c'n} = {hamburger})

Structure (f): (c, c1), (c, c2), …, (c, cm), (c, c'1), (c, c'2), …, (c, c'n), {c1, c2, …, cm} ∩ {c'1, c'2, …, c'n} ≠ ∅.
For example:
c = 西红柿, {c1, …, cm} = {植物, 蔬菜, 食品, 果实}, c' = 茄子, {c'1, …, c'n} = {蔬菜, 食品, 食材}, {c1, …, cm} ∩ {c'1, …, c'n} = {蔬菜, 食品}
(c = tomato, {c1, …, cm} = {plant, vegetable, foodstuff, fruit}, c' = aubergine, {c'1, …, c'n} = {vegetable, foodstuff, food for cooking}, {c1, …, cm} ∩ {c'1, …, c'n} = {vegetable, foodstuff})
The structure features of hyponymy are converted into a set of production rules used in uncertainty reasoning. We use CF (certainty factors), which is the most common approach in rule-based expert systems. The CF formula is as follows:

CF(CHR, f) = ( P(CHR|f) − P(CHR) ) / ( 1 − P(CHR) ),   if P(CHR|f) ≥ P(CHR)
CF(CHR, f) = ( P(CHR|f) − P(CHR) ) / P(CHR),           if P(CHR|f) < P(CHR)     (1)
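The certainty-factor formula (1) can be computed directly from the prior P(CHR) and the conditional P(CHR|f); the following is a small sketch with made-up probabilities for illustration.

def certainty_factor(p_chr, p_chr_given_f):
    """CF(CHR, f) of Eq. (1) from the prior P(CHR) and conditional P(CHR|f)."""
    if p_chr_given_f >= p_chr:
        return (p_chr_given_f - p_chr) / (1.0 - p_chr)   # evidence supports CHR
    return (p_chr_given_f - p_chr) / p_chr               # evidence opposes CHR

# hypothetical numbers: prior 0.4, raised to 0.7 (or lowered to 0.1) by a structure feature f
print(certainty_factor(0.4, 0.7))   # 0.5
print(certainty_factor(0.4, 0.1))   # -0.75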
5 Evaluation
5.1 Evaluation Method
We used about 8 GB of raw corpus from Chinese Web pages. The raw corpus is preprocessed in a few steps, including word segmentation, part-of-speech tagging, and splitting sentences according to periods. Then we acquired candidate hyponymy relations CHR from the processed corpus by matching Chinese hyponymy patterns. For analyzing the influence of the threshold α, we choose several different values. We manually evaluated 10% of the initial set CHR and 10% of the final result. The detailed result is shown in Table 1.
As we can see from Table 1, there are 62,265 hyponymy relations in the concept space initially. With the increase of the threshold α, the precision also increases. If we want to increase the precision, we can raise the value of α. For example, when α = 0.8, the precision is up to 95%, but the recall decreases to 27%. That is to say, when the threshold α is small, our method can discard many erroneous hyponymy relations while skipping only a few correct relations; but when the threshold α is large, our method discards many erroneous hyponymy relations and also skips many correct relations at the same time.
6 Conclusion
References
1. Beeferman, D.: Lexical discovery with an enriched semantic network. In: Proceedings of
the Workshop on Applications of WordNet in Natural Language Processing Systems,
ACL/COLING, pp. 358–364 (1998)
2. Cao, C., Shi, Q.: Acquiring Chinese Historical Knowledge from Encyclopedic Texts. In:
Proceedings of the International Conference for Young Computer Scientists, pp. 1194–1198
(2001)
3. Hearst, M.A.: Automated Discovery of WordNet Relations. In: Fellbaum, C. (ed.) To Ap-
pear in WordNet: An Electronic Lexical Database and Some of its Applications, pp. 131–
153. MIT Press, Cambridge (1998)
4. Morin, E., Jacquemin, C.: Projecting corpus-based semantic links on a thesaurus. In: Pro-
ceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pp.
389–396 (1999)
5. Lloréns, J., Astudillo, H.: Automatic generation of hierarchical taxonomies from free text
using linguistic algorithms. In: Advances in Object-Oriented Information Systems, OOIS
2002 Workshops, Montpellier, France, pp. 74–83 (2002)
6. Sánchez, D., Moreno, A.: Pattern-ed automatic taxonomy learning from the Web. AI Com-
munications 21(3), 27–48 (2008)
7. Elghamry, K.: Using the Web in Building a Corpus-Based Hypernymy-Hyponymy Lexicon
with Hierarchical Structure for Arabic. Faculty of Computers and Information, 157–165
(2008)
8. Zhang, C.-x., Hao, T.-y.: The State of the Art and Difficulties in Automatic Chinese Word
Segmentation. Journal of System simulation 17(1), 138–143 (2005)
Advanced Self-adaptation Learning and Inference
Techniques for Fuzzy Petri Net Expert System Units∗
1 Introduction
In intelligent diagnosis systems, most knowledge is fuzzy and general logic rules cannot describe it effectively. Fuzzy Petri nets (FPN), which are based on fuzzy production rules (FPRs), provide a useful way to properly represent fuzzy knowledge [1], [2]. Many results show that FPN are suitable for representing and reasoning about fuzzy logic implication relations [3], [4].
However, knowledge in expert systems is updated or modified frequently, and expert systems may be regarded as dynamic systems. The models must have the ability to adjust themselves according to the systems' changes. But a general FPN, which lacks learning ability, cannot cope with potential changes of actual systems [5]. Artificial neural networks (ANN) have strong adaptation and learning ability [6]. It is effective to combine the strengths of ANN with FPN so that the FPN acquires a self-adaptive learning ability suited to dynamic expert systems.
∗
This work is supported by NNSFC Grant 50539140 & 50779020 and NSF of Hubei Grant
2008CDB395.
In order to overcome the disadvantages of slow learning speed and local optima in the general back-propagation learning method, adaptive learning techniques are used to learn and train the parameters of the FPRs in the FPN. After a training process, an excellent input-output map of the knowledge system can be obtained.
A FPN is a bipartite directed graph containing two types of nodes: places and transitions, where circles represent places and bars represent transitions. Every FPR may be expressed as a transition of the FPN, and the propositions of a production rule are correspondingly expressed as places. In a FPN, the relationships from places to transitions and from transitions to places are represented by directed arcs.
Definition 1: A FPN has n layers of nodes and every layer has different places and transitions. The FPN may be described using a 10-tuple [10], [11]:

FPN = (P, T, D, I, O, f, M, W, Th, β)  (1)
where P = {p_1^1, …, p_{a1}^1, p_1^2, …, p_{a2}^2, …, p_1^n, …, p_{an}^n} is a finite set of places;
T = {t_1^1, …, t_{b1}^1, t_1^2, …, t_{b2}^2, …, t_1^n, …, t_{bn}^n} is a finite set of transitions;
D = {d_1^1, …, d_{a1}^1, d_1^2, …, d_{a2}^2, …, d_1^n, …, d_{an}^n} is a finite set of propositions, with P ∩ T ∩ D = Φ and |P| = |D|;
I: T → P^∞ is the input function, a mapping from transitions to bags of places, which determines the input places of a transition, where P^∞ denotes bags of places;
O: T → P^∞ is the output function, a mapping from transitions to bags of places, which determines the output places of a transition;
f: T → [0,1] is an association function, a mapping from transitions to real values between 0 and 1, with f(t_b^j) = µ_b^j;
M: P → [0,1] is an association function; every place p_i ∈ P has a sign m(p_i), which represents its truth degree;
W = {ω_1^1, …, ω_{b1}^1, ω_1^2, …, ω_{b2}^2, …, ω_1^n, …, ω_{bn}^n} is a set of input and output weights, which assigns weights to all the arcs of the net;
Th(t_b^j) = {λ_1^1, …, λ_{b1}^1, λ_1^2, …, λ_{b2}^2, …, λ_1^n, …, λ_{bn}^n} is a set of threshold values between zero and one assigned to the transitions;
β: P → D is an association function, a relationship mapping from places to propositions.
Firstly, some basic definitions are given to explain the transition firing rule of the FPN.
Definition 2: If ∀p_i^j ∈ I(t^j), m(p_i^j) > 0 (i = 1, 2, …, a; j = 1, 2, …, n), then ∀t ∈ T, t is enabled.
Definition 3: ∀t^j ∈ T, ∀p_i^j ∈ I(t^j), if Σ_i m(p_i^j)·ω_i^j > Th(t^j), then the transition fires and, at the same time, token transmission takes place.
The sigmoid function F(x) is used to approximate the threshold of t^j.
F(x) = 1 / (1 + e^{−b(x − Th(t^j))})  (3)

where x = Σ_i m(p_i^j)·ω_i^j and b is a constant. When b is chosen properly, if x > Th(t^j), then e^{−b(x − Th(t^j))} ≈ 0 and F(x) = 1, which denotes that transition t^j is enabled. On the other hand, if x < Th(t^j), then e^{−b(x − Th(t^j))} ≈ 1 and F(x) = 0, which denotes that transition t^j is not enabled.
Definition 5: If a transition fires, the tokens of the input places do not vary and the output place produces a new token.
1) If a place has only one input transition, a new token with certainty factor CF is put into each output place; the new token sign is given as:

m(p^{j+1}) = f(t^j) × CF(t^j)(x^j)  (5)

where p_i^j ∈ I(t^j), i = 1, 2, …, m, and p^{j+1} ∈ O(t^j).
2) If a place has more than one input transition t_z^j (z = 1, 2, …, c) and more than one route is active at the same time, then the new certainty factor is decided by the maximal passed sign of the fired transitions:

m(p^{j+1}) = max( f(t_1^j) × CF(t_1^j)(x_1^j), f(t_2^j) × CF(t_2^j)(x_2^j), …, f(t_c^j) × CF(t_c^j)(x_c^j) )  (6)
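The firing and token-propagation rules can be prototyped directly from equations (3), (5) and (6). The following sketch (illustrative Python) uses the continuous approximation F(x) as the firing condition and, following the example equations (7)–(10) below, takes x·F(x) as the certainty factor carried by the tokens; all place names, weights and thresholds are made up for illustration.

import math

def fire(x, threshold, b=100.0):
    """Continuous approximation F(x) of the firing condition, Eq. (3)."""
    return 1.0 / (1.0 + math.exp(-b * (x - threshold)))

def propagate(input_tokens, transitions):
    """New token of one output place fed by several transitions, Eqs. (5)-(6).

    transitions: list of dicts with input weights, a threshold and a truth degree mu.
    """
    candidates = []
    for t in transitions:
        x = sum(input_tokens[p] * w for p, w in t["weights"].items())
        cf = x * fire(x, t["threshold"])        # certainty factor carried by the tokens
        candidates.append(t["mu"] * cf)
    return max(candidates)                      # Eq. (6): keep the maximally supported route

tokens = {"p1": 0.9, "p2": 0.7}
ts = [{"weights": {"p1": 0.35, "p2": 0.65}, "threshold": 0.5, "mu": 0.9},
      {"weights": {"p1": 0.25, "p2": 0.75}, "threshold": 0.5, "mu": 0.8}]
print(propagate(tokens, ts))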
Definition 6 (Source Places, Sink Places): A place p is called a source place if it has
no input transitions. It is called a sink place if it has no output transitions. A source
place corresponds to a precondition proposition in FPN, and a sink place corresponds
to a consequent.
Definition 7 (input place, output place): The set of places P is divided into three parts P = P_UI ∪ P_int ∪ P_O, where P is the set of places of the FPN, P_UI = {p ∈ P | .p = Φ}, and p ∈ P_UI is called a user input place; P_int = {p ∈ P | .p ≠ Φ and p. ≠ Φ}, and p ∈ P_int is
Fig. 2.
1) If m(p_1^1) > λ_1^1 then t_1^1 fires, or if m(p_2^1) > λ_2^1 then t_2^1 fires, and then the truth degree of p_1^2 may be calculated using the following formula:

m(p_1^2) = max( f(t_1^1) × x_1^1 × F(x_1^1), f(t_2^1) × x_2^1 × F(x_2^1) )  (7)

m(p_2^3) = x_2^2 × F(x_2^2)  (8)

m(p_4^3) = x_4^2 × F(x_4^2)  (9)

3) The last layer transition fires; the truth degree of p^4 may be calculated through the following formula:

m(p^4) = x^3 × F(x^3)  (10)

where ω_1^3 + ω_2^3 + ω_3^3 = 1 and F(x^3) = µ^3 / (1 + e^{−b(x^3 − λ^3)}).
An ANN has strong learning and adaptation abilities. When the weights cannot be ascertained via expert knowledge, the FPN structure may be further translated into a neural-network-like structure. The back-propagation method for multi-layered feed-forward networks is used to train the FPN model. The trained weights, threshold values, and truth degrees of the FPRs may then be used to analyze the fuzzy reasoning system.
Suppose the network has n layers; if t^j is a jth-layer transition of the FPN, the weights of its input arcs are ω_1^j, ω_2^j, …, ω_m^j, the threshold value of t^j is λ^j, and its truth degree is µ^j. The network output of the nth-layer node is expressed as:
G(x^j) := F(x^j) · x^j  (11)

where x^j = W^T × M = Σ_{i=1}^{m} m(p_i^j)·ω_i^j, j = 1, 2, …, n,
F(x^j) = µ^j / (1 + e^{−b(x^j − λ^j)}), W = [ω_1^j, ω_2^j, …, ω_m^j], M = [m(p_1^j), m(p_2^j), …, m(p_m^j)].
The last layer output may be computed via the last node of the network:

O(R) = G(x^n) = M(p^n)  (12)
E = (1/2) Σ_{l=1}^{r} Σ_{o=1}^{b} ( O(R) − O* )²  (13)

where r is the number of samples and b is the number of output places. The learning rules are given as follows.
ω_i^j(k+1) = ω_i^j(k) − η · dE/dω_i^j,   i = 1, 2, …, m−1, j = 1, 2, …, n  (14)

ω_m^j(k+1) = 1 − Σ_{i=1}^{m−1} ω_i^j(k+1)  (15)
where dE/dω_i^j may be computed as follows. If the FPN model has n layers, the last-layer weight variation dE/dω_i^n may be computed as

dE/dω_i^n = dE/dO(R) × dO(R)/dω_i^n = dE/dG(x^n) × dG(x^n)/dω_i^n
          = Σ_{l=1}^{r} Σ_{o=1}^{b} ( O(R) − O* ) × dG(x^n)/dx^n × dx^n/dω_i^n  (16)
where dG(x^n)/dx^n and dx^n/dω_i^n may be calculated using the following formulas:

dG(x^n)/dx^n = µ^n / (1 + e^{−b(x^n − λ^n)}) + µ^n x^n b e^{−b(x^n − λ^n)} / (1 + e^{−b(x^n − λ^n)})²  (17)
dx^n/dω_i^n = m(p_i^n)  (18)
The error term of the other layers may be computed through the same back-
propagation method.
In training, the choice of the learning rate η has an important effect on weight convergence. If it is set too small, too many steps are needed to reach an acceptable solution. On the contrary, a large learning rate will possibly lead to oscillation, preventing the error from falling below a certain value. To overcome the inherent disadvantages of pure gradient descent, an adaptation of the weight updates according to the behavior of the error function is applied when training the weights.
Here, an individual update-value ∆_i(k) is introduced for the weight updating. The adaptive update-value ∆_i(k) evolves during the learning process based on its local sight on the error function, dE/dω_i^j(k) and dE/dω_i^j(k−1). ∆ω_i^j(k) and ω_i^j(k+1) are

ω_i^j(k+1) = ω_i^j(k) + ∆ω_i^j(k)  (20)
At the beginning, an initial value ∆_i^j(0) should be set for all update-values ∆_i^j(k). Since ∆_i^j(0) directly determines the size of the first weight step, it is preferably chosen in reasonable proportion to the size of the initial weights. Here, ∆_i^j(0) = 0.19 works very well.
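The adaptation rule itself (equation (19)) is not reproduced above, so the following sketch (illustrative Python) uses a standard sign-based update of the kind described: the individual update-value ∆_i grows when successive gradients agree in sign and shrinks when they disagree, and the weight moves against the sign of the gradient as in equation (20). The increase/decrease factors 1.2 and 0.5 and the toy error function are assumptions, and the weight-normalization constraint (15) is not enforced here.

import numpy as np

def adaptive_step(w, grad, prev_grad, delta,
                  eta_plus=1.2, eta_minus=0.5, d_min=1e-6, d_max=1.0):
    """One sign-based adaptive weight update in the spirit of Eqs. (19)-(20).

    delta holds the individual update-values Delta_i(k); all factors are assumptions.
    """
    sign_change = grad * prev_grad
    delta = np.where(sign_change > 0, np.minimum(delta * eta_plus, d_max), delta)
    delta = np.where(sign_change < 0, np.maximum(delta * eta_minus, d_min), delta)
    w = w - np.sign(grad) * delta          # Eq. (20) with Delta w = -sign(dE/dw) * Delta
    return w, delta

# usage: start with Delta_i(0) = 0.19 as recommended above
w = np.array([0.4, 0.3, 0.3])
delta = np.full_like(w, 0.19)
prev_grad = np.zeros_like(w)
for _ in range(5):
    grad = 2.0 * (w - np.array([0.35, 0.65, 0.0]))   # gradient of a toy quadratic error
    w, delta = adaptive_step(w, grad, prev_grad, delta)
    prev_grad = grad
print(w)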
5 Training Example
In this section, the fuzzy expert reasoning system given above (shown in Fig. 2) is used as an example to illustrate the learning effect of the method introduced above.
In the example, assume the ideal parameters are given as:
λ_1^1 = λ_2^1 = λ_1^2 = λ_2^2 = λ^3 = 0.5, µ_1^1 = 0.8, µ_2^1 = 0.9, µ_1^2 = 0.9, µ_2^2 = 0.8, µ^3 = 0.9,
ω_11^2 = 0.35, ω_12^2 = 0.65, ω_21^2 = 0.25, ω_22^2 = 0.75, ω_1^3 = 0.15, ω_2^3 = 0.55, ω_3^3 = 0.3.
If the weights are unknown, neural network techniques are used to estimate them. The learning part of the FPN may be formed as a standard single-layer sub-network and a two-layer sub-network. Fig. 3 and Fig. 4 show the performance of the adaptive learning technique.
From the simulation example, we can see that the fuzzy reasoning algorithm and the ANN training algorithm are very effective when the weights of the FPN are not known. Fig. 3 and Fig. 4 show that the self-adaptive learning method quickly finds the optimal weights that minimize the error function E. The self-adaptive learning method overcomes the disadvantages of slow learning speed and local optima of the common learning method.
6 Conclusion
In expert systems, it is difficult for the fuzzy reasoning process based on FPRs to obtain effective knowledge. In order to provide effective reasoning and learning ability, a loop-free FPN is transformed into a hierarchical model in this paper. By building continuous functions, the approximate transition firing and the fuzzy reasoning can be performed effectively, which provides a powerful facility for the FPN to reason and learn both forward and backward. In training, a self-adaptive learning method has been used to learn and train the parameters of the FPN. Simulation results show that the designed self-adaptive learning method finds optimal weights very quickly. Therefore, the designed reasoning and learning method of the FPN is effective for dynamic knowledge inference and learning in expert systems.
References
[1] Chen, S.M., Ke, J.S., Chang, J.F.: Knowledge representation using fuzzy Petri nets. IEEE
Trans. Knowledge Data Engineering 2, 311–319 (1990)
[2] Cao, T., Sanderson, A.C.: Representation and Analysis of Uncertainty Using Fuzzy Petri
Nets. Fuzzy System 3, 3–19 (1995)
[3] Lee, J., Liu, K.F.R., Chiang, W.: A Fuzzy Petri Net-Based Expert System and Its Appli-
cation to Damage Assessment of Bridges. IEEE Transactions on Systems, Man, and Cy-
bernetics -Part B: Cybernetics 29(3), 350–369 (1999)
[4] Scarpelli, H., Gomide, F., Yager, R.R.: A Reasoning Algorithm for High-level Fuzzy
Petri Nets. IEEE Trans. Fuzzy System 4(3), 282–293 (1996)
[5] Li, X., Lara-Rosano, F.: Adaptive Fuzzy Petri Nets for Dynamic Knowledge Representa-
tion and Inference. Expert Systems with Applications 19, 235–241 (2000)
[6] Zengren, Y.: Artificial Neural Network Application. Qinghua University Press (1999)
[7] Bugarn, A.J., Barro, S.: Fuzzy reasoning supported by Petri nets. IEEE Trans. Fuzzy Sys-
tem 2(2), 135–150 (1994)
[8] Pedrycz, W., Gomide, F.: A Generalized Fuzzy Petri Nets Model. IEEE Trans. Fuzzy
System 2, 295–301 (1994)
[9] Li, X., Yu, W.: Object Oriented Fuzzy Petri Net for Complex Knowledge System Model-
ing. In: IEEE international conference on control applications, September 2001, pp. 476–
481 (2001)
[10] Yeung, D.S., Tsang, E.C.C.: A Multilevel Weighted Fuzzy Reasoning Algorithm for Ex-
pert Systems. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and
Humans 28(2), 149–158 (1998)
[11] Scarpelli, H., Gomide, F.: A High Level Net Approach for Discovering Potential Inconsistencies in Fuzzy Knowledge Bases. Fuzzy Sets and Systems 62(2), 175–193 (1994)
MIMO Instantaneous Blind Identification Based on
Second-Order Temporal Structure and Newton’s Method
1 Introduction
Blind signal processing (BSP) is an important task for numerous applications such as speech separation, dereverberation, communications, signal processing and control, etc. Its research has attracted extensive interest all over the world and many exciting results have been reported; see [1] and references therein. Its task is to recover the source signals given only the observations, up to some indeterminacies such as scaling, permutation and/or delay.
Multiple-input multiple-output (MIMO) instantaneous blind identification (MIBI) is one of the attractive BSP problems, where a number of source signals are mixed by an unknown MIMO instantaneous mixing system and only the mixed signals are available, i.e., both the mixing system and the original source signals are unknown.
The goal of MIBI is to recover the instantaneous MIMO mixing system from the
observed mixtures of the source signals [1][2]. In this paper, we focus on developing
a new algorithm to solve the MIBI problem by using second-order statistics and New-
ton’s method.
*
Shen Xizhong, Ph.D., born in 1968, is a professor at Shanghai Institute of Technology and a visitor/post-doc at Shanghai Jiao Tong University. His ongoing research covers blind signal processing, neural networks, etc.
x(t) = A s(t) + ν(t).  (1)

where A = [a_1, …, a_m] ∈ R^{n×m} is an unknown mixing matrix with its
ν(t) = [ν_1(t), …, ν_n(t)]^T is the vector of noises, and
Without knowing the source signals and the mixing matrix, the MIBI problem is to identify the mixing matrix from the observations by estimating A as Â; if we apply the following linear transformation,

y(t) = Â⁺ x(t).  (2)

time t, whose elements are the estimates of the sources, and Â⁺ is the Moore-Penrose inverse of Â, which is called a demixing matrix in BSP.
The mixing matrix is identifiable only up to two indeterminacies: the unknown permutation of the columns of the matrix and their unknown magnitudes. When signal s_i is multiplied by a scalar, this is equivalent to rescaling the corresponding column of A by that scalar. Therefore, the scale of each column remains undetermined. The usual convention is to assume that each column satisfies the normalization condition, i.e., lies on the unit sphere,

Σ_{i=1}^{n} a_ij² = 1,  j = 1, 2, …, m.  (3)

S_j(a_j) = Σ_{i=1}^{n} a_ij² − 1 = 0;  j = 1, 2, …, m.  (4)

It should also be noted that the m sources cannot be determined in their exact order. It is unknown which is the first column of A and which is the second, and thus the permutation of the columns of the matrix is indeterminate.
To solve the MIBI problem, we first define the following concepts Def 1~2 for the
derivation of the algorithm, and then make the following assumptions AS 1~4 [2].
r_u(t, τ) ≜ E[ u(t) u(t − τ) ],  ∀t, τ.  (5)

r_{u,v}(t, τ) ≜ E[ u(t) v(t − τ) ],  ∀t, τ.  (6)

Σ_{j=1}^{m} ξ_j r_{s,jj}(t, τ) = 0  ⇒  ξ_j = 0,  ∀j = 1, 2, …, m  (8)

AS 3. The noise signals have zero auto- and cross-correlation functions on the noise-free ROS Ω:

r_{n,j1 j2}(t, τ) = 0,  ∀ 1 ≤ j_1, j_2 ≤ m.  (9)
AS 4. The cross-correlation functions between the source and noise signals are zero on the noise-free ROS Ω:
Here, no assumptions are made on the mixing matrix, and even a rank-deficient one can be identified. This is a significant advantage with respect to other methods for two reasons: (1) most IBSS methods not only require that the number of sensors be larger than the number of sources, but also that the mixing matrix be full rank; (2) no assumptions are made on the probability density functions of the noise and source signals.
The procedure of our proposed algorithm includes two steps: in step 1, the MIBI problem is formulated as the problem of solving a system of homogeneous polynomial equations; in step 2, Newton's method is applied to solve this system of polynomial equations. We detail these two steps in Sections 3 and 4, respectively.
R_{x,◊} ≜ [ r_x(t_1, τ_1)  …  r_x(t_N, τ_N) ],  (11)

R_{x,◊} = A_◊ R_{s,◊}.  (12)

Φ A_◊ = 0.  (13)

Here, Φ = (φ_{q,i1 i2}), q = 1, …, Q; i_1, i_2 = 1, …, n; i_1 ≤ i_2, is a matrix of dimensions ( (1/2) n(n+1) − rank[R_{x,◊}] ) × (n²), whose rows form a basis for the nonzero left null space N(R_{x,◊}). Therefore, there are Q equations for each column of A in (13).
R_{x,◊} is split into signal and noise subspace parts as

M_max = (1/2) n(n+1) − (n − 1).  (14)

M_max is the maximum number of sources that can be identified using SOTS.
4 Newton’s Method
In this section, we summarize the main ideas behind the so-called Newton’s method
that transforms a system of nonlinear equations into a convergent fixed-point problem
in a general way.
Newton’s method for nonlinear system is able to give quadratic convergence, pro-
vided that a sufficiently accurate starting value is known and the inversion of the
Jacobian matrix of the nonlinear equations exists [7]. Certainly, we could use the
quasi-Newton’s method to avoid the calculation of inversion; however we use New-
ton’s method in our algorithm for the number of equations and variables involved are
small and its inversion of the corresponding Jacobian matrix is simple.
We expand the expression in (13) as
fq (a j ) = ∑ ϕq;i i ai j ai j = 0;
12 1 2
i1 ≤i2 ;i1i2 =1, , n . (15)
q = 1, , Q; ∀j = 1, ,m
and set $F\!\left(a_j^{(k)}\right) = \left[\, f_1(a_j) \;\; \cdots \;\; f_Q(a_j) \,\right]^{T}$. By Newton's method, for each column of the mixing matrix we have
$$a_j^{(k+1)} = a_j^{(k)} - J^{-1}\!\left(a_j^{(k)}\right) F\!\left(a_j^{(k)}\right). \quad (16)$$
We take the initial solutions to be equally distributed vectors on the hypersphere defined in the space of $a_j$; for example, in our simulation with a $3 \times 4$ mixing matrix,
$$A = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}. \quad (17)$$
Thus, the different solutions are easily obtained. Alternatively, to obtain all the solutions of (16), we set the initial values to $2^{N+1}$ vectors whose entries are random, normally distributed values in $[-1, 1]$. After running the algorithm we obtain $2^{N+1}$ solutions and select the centers of all the solutions.
The procedure of Newton's method for the nonlinear equations can be described as follows:
Step 1. Set the initial approximations $a_j^{(0)},\ j = 1, 2, \ldots, m$, as in (17);
Step 2. Set the tolerance TOL and the maximum number of iterations maxIteration;
Step 3. For each solution $a_j^{(k)}$, compute $a_j^{(k+1)}$ by (16) until $\left\| a_j^{(k+1)} - a_j^{(k)} \right\|_2$ is less than the tolerance TOL or the number of iterations reaches maxIteration.
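As an illustration only, the following NumPy sketch iterates (16) for a single column of the mixing matrix. It assumes the coefficients φ_{q;i1i2} of (15) have been arranged into symmetric matrices Phi[q] so that f_q(a) = aᵀ Phi[q] a; the function name, the stopping test and the use of a pseudo-inverse (to cover Q ≠ n) are our own choices rather than details given in the paper.

import numpy as np

def newton_column(Phi, a0, tol=1e-10, max_iter=100):
    # Phi: array of shape (Q, n, n); Phi[q] is the symmetrized coefficient matrix of f_q.
    # a0 : initial approximation a_j^(0) of shape (n,), e.g. a column of (17).
    a = np.asarray(a0, dtype=float)
    for _ in range(max_iter):
        F = np.array([a @ P @ a for P in Phi])        # residuals f_q(a_j^(k)) of (15)
        J = np.array([2.0 * (P @ a) for P in Phi])    # Jacobian rows: grad f_q = 2 Phi[q] a
        a_new = a - np.linalg.pinv(J) @ F             # update (16); pinv in case Q != n
        if np.linalg.norm(a_new - a) < tol:           # stopping test of Step 3
            a = a_new
            break
        a = a_new
    return a / np.linalg.norm(a)                      # impose the normalization (3)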
estimated and employed, i.e., the employed noise-free ROS in the domain of block-
lag pairs is given by
(3-D plot of the true mixing-matrix columns A1-A4 and their estimates, axes x, y, z.)
(Panels plotting the included angle/° against the compared numbers; see the caption of Fig. 2.)
Fig. 2. Comparisons of MIBI_NWT with MIBI_SD and MIBI_Homotopy. TL, TR, BL and BR show, respectively, the estimated included angles over different runs between the first, second, third and fourth columns and their estimates. Red dots indicate the results of MIBI_NWT, magenta pluses the results of MIBI_SD, and blue circles the results of MIBI_Homotopy.
6 Conclusion
In this paper, we further develop the algorithm proposed in [2] to obtain a more accurate solution. The SOTS is considered only on a noise-free region of support (ROS). We project the MIBI problem in (1) onto the nonlinear system of homogeneous polynomial equations of degree two in (13). The nonlinear system is solved by Newton's method, which differs from the algorithm in [2], where the homotopy method was applied. Our algorithm allows estimating the mixing matrix for scenarios with, e.g., 4 sources and 3 sensors. Simulations and comparisons show that it is effective and gives more accurate solutions than the algorithm based on the homotopy method.
Acknowledgments. This work is supported by the National Natural Science Foundation of China under project number 10732060, the Shanghai Leading Academic Discipline Project directed by Hu Dachao under project number J51501, and Shanghai Education under No. ZX2006-01.
References
1. Cichocki, A., Amari, S.I.: Adaptive Blind Signal and Image Processing: Learning Algo-
rithms and Applications. Wiley, New York (2002)
2. van de Laar, J., Moonen, M., Sommen, P.C.W.: MIMO Instantaneous Blind Identification
Based on Second-Order Temporal Structure. IEEE Transactions on Signal Processing 56(9),
4354–4364 (2008)
3. Shen, X., Shi, X.: On-line Blind Equalization Algorithm of an FIR MIMO channel system
for Non-Stationary Signals. IEE Proceedings Vision, Image & Signal Processing 152(5),
575–581 (2005)
4. Shen, X., Haixiang, X., Cong, F., et al.: Blind Equalization Algorithm of FIR MIMO Sys-
tem in Frequency Domain. IEE Proceedings Vision, Image & Signal Processing 153(5),
703–710 (2006)
5. Hua, Y., Tugnait, J.K.: Blind identifiability of FIR-MIMO systems with colored input using
second order statistics. IEEE Signal Processing Letters 7(12), 348–350 (2000)
6. Lindgren, U., van der Veen, A.-J.: Source separation based on second order statistics—An
algebraic approach. In: Proc. IEEE SP Workshop Statistical Signal Array Processing, Corfu,
Greece, June 1996, pp. 324–327 (1996)
7. Burden, R.L., Faires, J.D.: Numerical Analysis, pp. 600–635. Thomson Learning, Inc.
(2001)
Seed Point Detection of Multiple Cancers Based
on Empirical Domain Knowledge and K-means
in Ultrasound Breast Image
Lock-Jo Koo∗, Min-Suk Ko, Hee-Won Jo, Sang-Chul Park, and Gi-Nam Wang
Abstract. The objective of this paper is to remove noise from images using a heuristic noise filter and to automatically detect seed points of tumor regions in breast ultrasound images using K-MEANS. The proposed method uses four different processes. In the first process, the pixel values, which indicate the light and shade of the image, are acquired as a matrix. The second process is an image preprocessing phase aimed at maximizing the contrast of the image and preventing a leak of personal information. The next process applies a heuristic noise filter, based on the opinion of medical specialists, to remove noise. The last process detects seed points automatically by applying the K-MEANS algorithm. As a result, noise is effectively eliminated in all images and automated detection is possible by determining seed points on each tumor.
1 Introduction
Breast cancer has been growing at a rate of 2% worldwide since 1990. According to the Ministry of Health & Welfare, South Korea, and the American Cancer Society, breast cancer ranks second in the list of women's cancers. Moreover, in most countries of the world, it occupies a high position among women's cancers [1]-[3].
Early detection of breast cancer is important for effective treatment. However, the segmentation and interpretation of ultrasound images is based on the experience and knowledge of a medical specialist, and speckle makes it more difficult to distinguish a ROI (Region Of Interest). As a result, more than 30% of masses referred for surgical breast biopsy were actually malignant [4, 5].
CAD (Computer-Aided Diagnosis) is defined as a diagnosis made by a radiologist who uses the output of a computerized analysis of medical images as a 'second opinion' in detecting lesions and making diagnostic decisions. Most of the previous research related to CAD of breast cancer has focused on segmentation of lesions and elimination of noise in images. To remove noise, anisotropic diffusion filters,
∗
Corresponding author.
second-order Butterworth filters, watershed transforms, etc. are used [4-7]. Each removal method is applied by considering five features of ultrasound images, which consist of area, circularity, protuberance, homogeneity, and acoustic shadow. For segmentation of lesions, algorithms based on the region growing method are used because most tumors in breast ultrasound images have a multiplicity of shapes and intensities, with a difference in pixel value between the tumor and the background image. In region growing methods, when an objective pixel and its adjacent pixels have an identical feature, based on the histogram or gradient of the pixel values, the sequentially executed processing integrates the pixels to form an area. Areas having the same feature are gradually grown to achieve segmentation of the entire image. However, most of the previous research has some limitations, as follows.
1. Only the pixel value and tumor shape in the ultrasound image are considered for noise removal, not the location of the tumor in the image.
2. Only one ROI (Region of Interest) is segmented, or manual tasks are needed for the detection of plural tumors, because only one seed point is considered in the region growing method.
In this paper, a noise removal method based on heuristic noise filtering related to the lesion position in the image, and a detection method for one or more seed points using the K-MEANS algorithm, are proposed. The paper is organized as follows. Section 2 introduces each step used for noise removal. The detection of seed points in the breast ultrasound image is presented in Section 3. The experimental results and conclusion are discussed in Section 4.
2 Noise Removal
$$\mu = \frac{1}{n_i n_j} \sum_{i=1}^{n_i} \sum_{j=1}^{n_j} I(i,j) \quad (1)$$
$$\sigma^{2} = \frac{1}{n_i n_j} \sum_{i=1}^{n_i} \sum_{j=1}^{n_j} \left| I(i,j) - \mu \right|^{2} \quad (2)$$
$$\operatorname{Min}(I) = \min_{i \in n_i,\; j \in n_j} \big( I(i,j) \big) \quad (3)$$
The original breast ultrasound image used in this work is shown in Fig. 1(a). The pixel value information from each original image is transformed into a matrix for applying CAD. The transformed matrix shown in Fig. 1(b) represents the light and shade value of each pixel of the original image, as well as extra information such as the outpatient's name, sex, and size of the lesion. However, the extra information must be cut out to protect personal information. Therefore, the extra information in the images used in this work was removed, so that only breast ultrasound images without extra information were used.
Most pixel values of an ultrasound image cover the whole range of light and shade, from 0 to 255, but the distribution is concentrated too heavily on specific pixel values. Therefore, in this paper, an ends-in search, which makes the difference between the background and the Region Of Interest (ROI) more vivid, is used for image enhancement. The formula of this method is as follows, and the result is shown in Fig. 2.
$$Low = \operatorname{Min}(I_{org}) + \sigma_{org}, \qquad High = \operatorname{Max}(I_{org}) - \sigma_{org} \quad (4)$$
$$I_{Endin}(i,j) = \begin{cases} 0 & I_{org}(i,j) \le Low \\ 255 \times \dfrac{I_{org}(i,j) - Low}{High - Low} & Low < I_{org}(i,j) < High \\ 255 & I_{org}(i,j) \ge High \end{cases} \quad (5)$$
where $I_{org}$ is the $n_i$ by $n_j$ matrix transformed from the original image and $\sigma_{org}$ is the standard deviation of $I_{org}$ acquired by Eq. (2).
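A minimal NumPy sketch of the ends-in search of Eqs. (4)-(5) is given below, assuming a grayscale image array; the plain sample standard deviation stands in for σ_org of Eq. (2), whose normalization may differ, and the function name is ours.

import numpy as np

def ends_in_search(I_org):
    # Ends-in contrast stretch of Eq. (5): values below Low map to 0, values above High
    # map to 255, and values in between are rescaled linearly into [0, 255].
    I = I_org.astype(float)
    sigma = I.std()                       # assumption: ordinary standard deviation
    low = I.min() + sigma                 # Low of Eq. (4)
    high = I.max() - sigma                # High of Eq. (4)
    out = np.empty_like(I)
    out[I <= low] = 0
    out[I >= high] = 255
    mid = (I > low) & (I < high)
    out[mid] = 255.0 * (I[mid] - low) / (high - low)
    return out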
The result of the ends-in search was used for unnecessary-image removal in the domain-based heuristic noise filtering. The domain-based heuristic noise filtering was basically designed from two items of domain knowledge about ultrasound images, as follows:
1. The lesion shown in a breast ultrasound image is represented as a whole feature.
2. The pixel values of noise related to unnecessary image regions and of the lesion lie in a similar range.
The proposed noise filter is based on a distinct-block operation with 13 by 13 blocks, as shown in Fig. 2, and the process is operated by Eq. (6) in the following steps.
Step 1: The starting points of the distinct blocks are the four corners shown in Fig. 3.
Step 2: The mean of each distinct block and the standard deviation of the whole image after the ends-in search are computed and compared.
Step 3: If the former is smaller than the latter, all pixel values of the distinct block are set to 255; otherwise, they are not changed, according to Eq. (6).
Step 4: If the pixel values mentioned above are changed, the distinct blocks starting from corners 1 and 4 of Fig. 3 shift in the row direction; otherwise, the progress direction of the blocks is reversed. Blocks with other starting points are operated in the reverse manner.
$$\mu_{Endin} = \frac{1}{NM} \sum_{i,j \in \eta} I_{Endin}(i,j), \qquad
I_{Noi}(i,j) = \begin{cases} 255 & \mu_{Endin} < \sigma_{Endin} \\ I_{Endin}(i,j) & \mu_{Endin} \ge \sigma_{Endin} \end{cases} \quad (6)$$
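As a rough sketch of this distinct-block filter, the following NumPy code visits the image in 13-by-13 blocks and applies the rule of Eq. (6); the corner-dependent scan order of Steps 1-4 is simplified to a plain row-by-row visit, and the function name and use of the plain standard deviation are assumptions.

import numpy as np

def heuristic_block_filter(I_endin, block=13):
    # Any 13x13 block whose mean is below the global standard deviation of the
    # ends-in image is treated as unnecessary background and set to 255 (Eq. 6).
    out = I_endin.astype(float).copy()
    sigma = out.std()                     # sigma_Endin of the whole image
    rows, cols = out.shape
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            patch = out[r:r + block, c:c + block]
            if patch.mean() < sigma:
                out[r:r + block, c:c + block] = 255
    return out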
Generally, the pixel values of the lesion and of the surgical breast biopsy region lie in a low range, while the other, non-interfering image regions have high values. Therefore, we carry out an image negative transformation for easier and clearer computation. The basic formula of the image negative transformation is as follows.
$$I_{Neg}(i,j) = 255 - I_{Noi}(i,j) \quad (7)$$
By executing matrix search filtering, small noise in the negative-transformed image is removed. This filter uses a Sliding Neighborhood Operation with 13 by 13 neighborhood blocks. The algorithm of this method is as follows, and the result is shown in Fig. 5.
$$\mu_{Nei} = \frac{1}{RC} \sum_{i,j \in \rho} I_{Noi}(i,j), \qquad
C_{low} = \mu_{Neg} - \sigma_{Neg}, \qquad
C_{high} = \mu_{Neg} + \sigma_{Neg} \quad (8)$$
where $\rho$ is the $R$ by $C$ local neighborhood of each pixel in $I_{Noi}$ and $\sigma_{Neg}$ is the standard deviation of $I_{Neg}$ acquired by Eq. (2).
The differential image and a final filtering stage are also used for noise removal. The differential image is the difference of the two images produced by matrix search filtering and by the ends-in search. In the result, noise that existed near the lesion is eliminated, because the pixel values of the lesion in the matrix-search-filtered image are high (near 255) while those of the lesion in the ends-in-search image are low (near 0). In this paper, the ends-in-search image is multiplied by 1.5 to eliminate noise around the lesion.
The pixel values of the ROI and the background become clearly distinct through the differential image. Consequently, a two-dimensional (2D) adaptive noise removal filter and a local-mean filter are applied for noise removal in this study. The 2D adaptive noise removal filter is a methodology for removing noise using the local mean and standard deviation [8]. In this paper, a pixel-wise Wiener filter with 100 by 100 neighborhood blocks is applied. The reason for applying a 100 by 100 matrix is that noise is scattered while the ROI is concentrated. The pixel-wise Wiener filter is applied by Eq. (9), and the result is shown in Fig. 6.
$$\mu_{Diff} = \frac{1}{RC} \sum_{i,j \in \rho} I_{Diff}(i,j), \qquad
\alpha^{2}_{Diff} = \frac{1}{RC} \sum_{i,j \in \rho} I_{Diff}(i,j)^{2} - \mu_{Diff}^{2}, \qquad
I_{2D}(i,j) = \mu_{Diff} + \frac{\alpha^{2}_{Diff} - \nu^{2}}{\alpha^{2}_{Diff}} \big( I_{Diff}(i,j) - \mu_{Diff} \big) \quad (9)$$
where $\nu^{2}$ is the average of all the local estimated variances and $\rho$ is the $R$ by $C$ local neighborhood of each pixel in $I_{Diff}$.
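As a hedged sketch, SciPy's adaptive Wiener filter performs the same local-statistics shrinkage as Eq. (9): it estimates the local mean and variance in a sliding window and pulls each pixel toward the local mean according to the estimated noise power (the average of the local variances, ν²). The 100 by 100 window follows the paper; the wrapper name and the choice of scipy.signal.wiener are ours.

import numpy as np
from scipy.signal import wiener

def adaptive_wiener(I_diff, window=100):
    # Pixel-wise Wiener filtering of the differential image, in the spirit of Eq. (9).
    return wiener(np.asarray(I_diff, dtype=float), mysize=window)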
Not all noise is removed by the pixel-wise Wiener filter, so a local-mean filter is applied once more. The local-mean filter is operated by computing the local mean and standard deviation of the pixels whose values are higher than 1 in the image filtered by the 2D adaptive noise removal. The sum ($C_S$) and difference ($C_D$) of this mean and standard deviation are compared with each pixel value of the image, and the pixel value is then determined based on the comparison result. The rule for determining the pixel value is given by Eq. (10), and the result is shown in Fig. 7.
$$I_{Lcl}(i,j) = \begin{cases} 0 & I_{2D}(i,j) < C_D \ \text{or}\ I_{2D}(i,j) > C_S \\ 255 & C_S \ge I_{2D}(i,j) \ge C_D \end{cases} \quad (10)$$
In Dilation case, the origin of the structuring element is placed over the first white
pixel in the image, and the pixels in the structuring element are copied into their cor-
responding positions in the result image. Then the structuring element is placed over
the next white pixel in the image and the process is repeated. This is done for every
white pixel in the image.
In Erosion case, the structuring element is translated to the position of a white pixel
in the image. In this case, all members of the structuring element correspond to white
image pixels so the result is a white pixel. Now the structuring element is translated to
the next white pixel in the image, and there is one pixel that does not match. The
result is a black pixel. The remaining image pixels are black and could not match the
origin of the structuring element.
The result of the morphology operations is shown in Fig. 8.
In this section, seed points for automatic segmentation of the lesion are detected. Previous studies related to the detection of seed points had the limitations that the number of lesions was one and that the lesion was located at the center of the breast ultrasound image. Therefore, the K-MEANS algorithm was applied to overcome the limitations mentioned above, because this clustering method can best be described as a partitioning algorithm and is suitable for automatic detection of seed points if the number of tumors is set. The method is applied under the following assumptions [9]:
1. The number of tumors in a breast ultrasound image is less than 10.
2. The pixel values of tumors after noise removal are close to 0.
The result is shown in Fig. 9. In this paper, the seed points obtained above are used to improve the boundary between the ROI and the background image, because segmentation of the ROI from seed points without additional processing takes a long computing time.
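A minimal sketch of this seed-point detection with scikit-learn's KMeans follows; treating the dark (value 0) pixels of the noise-removed binary image as tumor candidates and taking cluster centres as seed points is our reading of the description, and the function name and the requirement that the number of tumors be supplied are assumptions.

import numpy as np
from sklearn.cluster import KMeans

def detect_seed_points(I_bin, n_tumors):
    # Cluster the coordinates of candidate tumor pixels and return one seed per cluster.
    ys, xs = np.nonzero(I_bin == 0)                  # candidate tumor pixels (dark after filtering)
    coords = np.column_stack([xs, ys]).astype(float)
    km = KMeans(n_clusters=n_tumors, n_init=10, random_state=0).fit(coords)
    return km.cluster_centers_                       # one (x, y) seed point per tumor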
(Figure panels: a) case 1, b) case 2, c) case 3, d) case 4.)
The objective of this paper is to design noise removal methods and automatically detect seed points so that breast ultrasound images can be applied to CAD. The whole process was carried out in discussion with a medical specialist, and we presented automatic detection of the seed points that precede segmentation, using the K-means algorithm to find the estimated central area of each tumor. Since it is sometimes difficult to differentiate between the background and tumor intensities, we applied the domain-based heuristic noise filter. As a result of applying the proposed algorithm, noise is effectively eliminated in all images, and automated detection is possible by determining seed points on each tumor.
References
1. Ministry for Health, Welfare and Family Affairs (MIHWAF): Statistics of Cancer, Republic of Korea (2007)
2. Paulo, S.R., Gilson, A.G., Marcia, P., Marcelo, D.F., Chang, R.F., Suri, J.S.: A New Meth-
odology Based on q-Entropy for Breast Lesion Classification in 3-D Ultrasound Images. In:
Conf. Proc. IEEE Eng. Med. Biol. Soc., New York, vol. 1, pp. 1048–1051 (2006)
3. Moore, K.S.: Better Breast Cancer Detection. IEEE Spectrum 38, 51–54 (2001)
4. Madabhushi, A., Metaxas, D.N.: Automatic boundary extraction of ultrasonic breast lesions.
In: 2002 IEEE Int. Symp. Biomedical Imaging, pp. 601–604 (2002)
5. Huang, Y.L., Chen, D.R.: Automatic Contouring for Breast Tumors in 2-D Sonography. In:
Conf. Proc. IEEE Eng. Med. Biol. Soc., vol. 3, pp. 3225–3228 (2005)
6. Madabhushi, A., Metaxas, D.N.: Combining Low-, High-Level and Empirical Domain
knowledge for Automated Segmentation of Ultrasonic Breast Lesions. IEEE Trans. Medical
Imaging 22(2), 155–169 (2003)
7. Pietro, P., Jitendra, M.: Scale-Space and Edge Detection Using Anisotropic Diffusion. IEEE
Trans. Pattern Anal. Machine Intell. 12(7), 629–639 (1990)
8. Lim, J.S.: Two-Dimensional Signal and Image Processing, pp. 536–540. Prentice-Hall,
Englewood Cliffs (1990)
9. Guralnik, V., Karypis, G.: A scalable algorithm for clustering protein sequences. In: Proc.
Workshop Data Mining in Bioinformatics (BIOKDD), pp. 73–80 (2001)
A Controlled Scheduling Algorithm Decreasing
the Incidence of Starvation in Grid Environments
Abstract. A fair scheduling algorithm accounting for the weight and execution time of tasks is critical in the Grid environment. MTWCT (Minimize Total Weighted Completion Time) has been proved to minimize the total weighted completion time of a set of independent tasks on a processor, but it leads to another problem: the response time of some tasks becomes far longer. To decrease the incidence of the starvation phenomenon, an improved algorithm named CSA (Controlled Scheduling Algorithm) based on MTWCT is proposed, which computes the ρ factors of tasks from the execution time and weight of step chains, and selects the unexecuted step chain in terms of the ρ factor and the executed time of the task. Experimental results show that, compared with MTWCT, CSA decreases the completion time of short tasks and the average turnaround time while sacrificing a little of the total weighted completion time.
1 Introduction
Efficient task scheduling can contribute significantly to the overall performance of the
Grid and Cluster computing systems [1] [2] since an inappropriate scheduling of tasks
can fail to exploit the true potential of the system [3]. Scheduling a set of independent
tasks in grid environment is NP-hard [4]. Among the algorithms that have been pro-
posed in the past, the Sufferage algorithm [5] has better performance. However, it
deals with only one objective, which is minimizing the makespan of the set of independent tasks [6]. In a complex grid environment, the weights and execution times of tasks differ, and it is unreasonable to assume that the importance and urgency of all tasks are the same. A scheduling algorithm called MTWCT (Minimize Total Weighted Completion Time) [7], based on the fact that a task is made up of step chains with precedence constraints, divides the steps of each task into step chains by the ratios of the weight to the execution time of the step chains, and each step chain is executed as a whole. The ratio of the weight to the execution time of a step chain accounts for the importance of the task and the length of its execution time, which overcomes the limitation of Sufferage's one-sided pursuit of minimal makespan regardless of the importance of tasks; but it results in another problem: short tasks with slightly smaller ratios have to wait for a long time. That is because MTWCT deals with only one objective, which is minimizing the total
weighted completion time, neglecting the average turnaround time of tasks. When the number of long tasks is very large, this exacerbates the extent of starvation [8]. Obviously, this is unfair to short tasks with slightly smaller ratios of weight to execution time. In addition, it is inefficient for the system if short tasks with slightly smaller ratios have no chance to be executed until all long tasks have completed, especially when the execution time of the long tasks is much longer than that of the short tasks.
According to the above analysis, we propose a modified algorithm based on MTWCT, named CSA (Controlled Scheduling Algorithm), which is able to leverage the strengths of MTWCT while avoiding its weaknesses. CSA supports the following features:
(1) It avoids short tasks with slightly smaller ratios having to wait for a long time.
(2) It decreases the average turnaround time of all tasks, which benefits the majority of users and the system.
(3) It keeps the total weighted completion time approximately optimal.
The rest of the paper is organized as follows. In Section 2, we describe the strategies for setting the thresholds, introduce the Controlled Scheduling Algorithm, and present its analysis. Section 3 provides numerical results comparing the task completion time, the average turnaround time and the total weighted completion time under CSA and under MTWCT. Conclusions are drawn in Section 4.
In this paper, we employ the triple notation first used by Graham [9] to depict the scheduling problem: $1\,|\,\mathrm{chains}\,|\,\big(\sum w_j p_j\big)_{\pi} \cup \frac{1}{n}\sum t_i$, where 1 denotes that all tasks are supposed to be executed on a single processor and chains denotes a set of independent tasks. $\big(\sum w_j p_j\big)_{\pi} \cup \frac{1}{n}\sum t_i$ is the objective of CSA, in which both the total weighted completion time and the average turnaround time are taken into account. Here $\big(\sum w_j p_j\big)_{\pi}$ denotes the total weighted completion time and $\frac{1}{n}\sum t_i$ denotes the average turnaround time.
We summarize the notation used throughout the paper in Table 1.
We summarize the notation used throughout the paper in Table 1.
Table 1. Notation
Parameter   Description
n           Number of tasks in the task queue
k           Number of steps in each task
T_ij        The jth step of the ith task
p_ij        Execution time required by step T_ij
w_ij        Weight of step T_ij
ρ           Ratio of a step chain's weight to its execution time
t           Limit on the executed time of a task
∆ρ          Range of the ρ factor
In this section, two definitions and two theorems related to CSA are given.
Definition 1: For a task composed of at most k steps, the rule for its ρ factors is defined as follows:
$$\frac{\sum_{j=1}^{l^{*}} w_{ij}}{\sum_{j=1}^{l^{*}} p_{ij}} = \max_{1 \le l \le k} \left\{ \frac{\sum_{j=1}^{l} w_{ij}}{\sum_{j=1}^{l} p_{ij}} \right\} \quad (1)$$
namely $\rho_{i1}(1,2,\ldots,l^{*})$, where $l^{*}$ is the deadline step of $\rho_{i1}$. In a similar manner, $\rho_{i2}, \ldots, \rho_{ik}$ can be obtained.
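A small Python sketch of this rule is given below; it assumes that, after the first ρ factor is fixed by its deadline step l*, the same maximization is re-applied to the remaining steps to obtain ρ_{i2}, ρ_{i3}, and so on. The function name and return format are illustrative only.

def rho_factors(weights, times):
    # weights, times: the w_{i1..ik} and p_{i1..ik} of one task's steps, in order.
    # Returns a list of (rho, chain_length) pairs, one per step chain of the task.
    factors = []
    start, k = 0, len(weights)
    while start < k:
        best_ratio, best_len = -1.0, 1
        w_sum = p_sum = 0.0
        for l in range(start, k):                    # try every prefix of the remaining steps
            w_sum += weights[l]
            p_sum += times[l]
            if w_sum / p_sum > best_ratio:
                best_ratio, best_len = w_sum / p_sum, l - start + 1
        factors.append((best_ratio, best_len))       # rho factor and its deadline step count
        start += best_len                            # continue with the remaining steps
    return factors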
Definition 2: Assume there are two tasks i and j; task i contains k steps denoted by $T_{i1} \to T_{i2} \to \ldots \to T_{ik}$, and task j is made up of k steps denoted by $T_{j1} \to T_{j2} \to \ldots \to T_{jk}$. Letting their step sequence be denoted by $\pi(T_{i1}, T_{i2}, \ldots, T_{ik}, T_{j1}, \ldots, T_{jk})$, the total weighted completion time is defined to be:
$$\Big(\sum p_{ij} w_{ij}\Big)_{\pi} = w_{i1} p_{i1} + \ldots + w_{ik} \sum_{m=1}^{k} p_{im} + w_{j1}\Big( p_{j1} + \sum_{m=1}^{k} p_{im} \Big) + \ldots + w_{jk}\Big( \sum_{m=1}^{k} p_{im} + \sum_{f=1}^{k} p_{jf} \Big) \quad (2)$$
Theorem 1: If $\dfrac{\sum_{j=1}^{m} w_{ij}}{\sum_{j=1}^{m} p_{ij}} > \dfrac{\sum_{a=1}^{r} w_{fa}}{\sum_{a=1}^{r} p_{fa}}$, then the step chain made up of $T_{i1}, T_{i2}, \ldots, T_{im}$ is executed prior to the step chain made up of $T_{f1}, T_{f2}, \ldots, T_{fr}$, which makes the total weighted completion time of the steps $T_{i1}, T_{i2}, \ldots, T_{im}, T_{f1}, T_{f2}, \ldots, T_{fr}$ minimal.
Theorem 2: If $T_{l^{*}}$ is the deadline step of the ρ factor of the step chain $T_{i1}, T_{i2}, \ldots, T_{ik}$, then the step chain $T_{i1}, T_{i2}, \ldots, T_{il^{*}}$ must be executed without interruption.
Appropriate t and ∆ρ are critical to CSA. The objective of setting t is to give a step chain represented by a smaller ρ factor a chance to be executed earlier than one with a bigger ρ factor. Setting ∆ρ aims to choose other ρ factors in $[\max\_\rho - \Delta\rho,\ \max\_\rho]$, where $\max\_\rho$ denotes the maximum ρ factor, so as to keep the total weighted completion time as optimal as possible.
We define the following arrays, which are used in the description of setting the t thresholds.
t[][] : array of step-chain execution times, where t[i][j] denotes the execution time required by the jth step chain of the ith task.
sum_t[][] : array of cumulative execution times of step chains, where $sum\_t[i][j] = \sum_{f=1}^{j} t[i][f]$.
Step 1: choose two critical values ∆ρ [i ] and ∆ρ [ j ] from the one-dimensional vector
∆ρ [] arbitrarily, and select a constant from the interval (∆ρ [i ], ∆ρ [ j ]) .
Step 2: Similar to step 1, set the initial limit on the executed time allocated to a task, which is called t.
Step 3: Sort all the ρ factors in non-ascending order, and then save them in the one-dimensional vector ρρ[].
Step 4: Set a flag to every ρ factor and initialize flag=0. Where flag=0 denotes the
step chain represented by ρ factor hasn’t been executed; otherwise, means it has
been executed.
Step 5: Set a temporary limit executed time temp_t, and initialize temp_t=t.
Step 6: Choose the maximum ρ factor whose value of flag is zero from one-
dimensional vector ρρ [] , which is called max_ ρ .
Step 7: Save all the ρ factors between max_ ρ − ∆ρ and max_ ρ whose values
of flags are zero in a one-dimensional vector candi _ ρ[] .
Step 8: If the executed time of the task represented by max_ρ is less than temp_t, execute the step chain represented by max_ρ, set the value of max_ρ's flag to 1, and then go to step 9. Otherwise, if candi_ρ[] is an empty vector, execute the step chain represented by max_ρ; else select the candi_ρ[i] with the minimum task executed time from candi_ρ[] and process the step chain represented by candi_ρ[i].
Step 9: Set temp _ t = t , go to step 11.
Step 10: Update temp _ t = temp _ t + min_ t , go to step 11.
Step 11: Repeat the whole procedure from step 5 through step 10 until all step chains have been executed.
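The selection logic of Steps 4-11 could be sketched as the following Python loop. The data layout (one (ρ, execution time, task id) tuple per step chain) and the simplified handling of the temp_t updates of Steps 9-10 (the limit is simply reset each round) are our own assumptions, not details fixed by the paper.

def csa_order(chains, t_limit, delta_rho):
    # chains: list of (rho, exec_time, task_id) tuples, one per step chain.
    executed = [False] * len(chains)
    run_time = {}                                     # executed time accumulated per task
    order = []
    while not all(executed):
        pending = [i for i, done in enumerate(executed) if not done]
        best = max(pending, key=lambda i: chains[i][0])              # max_rho (Step 6)
        max_rho = chains[best][0]
        cand = [i for i in pending
                if max_rho - delta_rho <= chains[i][0] <= max_rho]   # candi_rho[] (Step 7)
        task_of_best = chains[best][2]
        if run_time.get(task_of_best, 0.0) >= t_limit and cand:      # Step 8: give other tasks a chance
            best = min(cand, key=lambda i: run_time.get(chains[i][2], 0.0))
        rho, p, tid = chains[best]
        executed[best] = True                                        # flag = 1
        run_time[tid] = run_time.get(tid, 0.0) + p
        order.append(best)
    return order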
The time complexity of CSA is mainly composed of three parts: setting the t thresholds, setting the ∆ρ thresholds and the choice of step chains. It takes O(2nk) to set the t thresholds and O(mk+hk) to set the ∆ρ thresholds according to (3.3); the choice of step chains costs O((mk+hk)·(mk+hk)). CSA costs O(2nk)+O(mk+hk) more than MTWCT, which can be ignored when m and h are very big.
According to the analysis of the CSA, we can obtain the following deductions:
Deduction 1: if t → +∞ , then CSA is equivalent to MTWCT.
Proof: According to step 8 of the description of CSA, if t → +∞, the executed time of the task represented by max_ρ is always less than temp_t, so the step chain represented by max_ρ is always executed. ∆ρ and t then have no impact on selecting unexecuted step chains, and CSA is equivalent to MTWCT.
Deduction 2: if ∆ρ → −∞ , then CSA is equivalent to MTWCT.
The execution times and weights of steps cannot be identified in advance and need to be predicted. In this paper, the experimental parameters are assumed to be given. We focus on comparing MTWCT and CSA in terms of the task completion time, the average turnaround time and the total weighted completion time.
The execution time, weights and ρ factors of step chains of every task are given
in Table 2.
From Table 3, we can see that for t = 0.5, ∆ρ = 0.32; t = 0.5, ∆ρ = 0.005; and t = 0.06, ∆ρ = 0.02, CSA is equivalent to MTWCT. From Deduction 1, it is obvious that t is equivalent to +∞ whenever t > 0.489, because the maximum threshold of t is 0.489 in this experiment. According to Deduction 2, when ∆ρ < 0.05, ∆ρ is equivalent to −∞, because the minimum threshold of ∆ρ is 0.05 in this experiment.
t, ∆ρ                         Sequence of ρ factors
MTWCT                         ρ11, ρ21, ρ12, ρ22, ρ13, ρ23, ρ14, ρ31, ρ32
CSA   t = 0.06, ∆ρ = 0.02     ρ11, ρ21, ρ12, ρ22, ρ13, ρ23, ρ14, ρ31, ρ32
      t = 0.06, ∆ρ = 0.3      ρ11, ρ21, ρ31, ρ32, ρ22, ρ12, ρ13, ρ23, ρ14
      t = 0.11, ∆ρ = 0.15     ρ11, ρ21, ρ22, ρ23, ρ12, ρ31, ρ32, ρ13, ρ14
      t = 0.11, ∆ρ = 0.1      ρ11, ρ21, ρ22, ρ12, ρ31, ρ23, ρ13, ρ32, ρ14
      t = 0.162, ∆ρ = 0.25    ρ11, ρ21, ρ31, ρ12, ρ22, ρ32, ρ23, ρ13, ρ14
      t = 0.5, ∆ρ = 0.32      ρ11, ρ21, ρ12, ρ22, ρ13, ρ23, ρ14, ρ31, ρ32
      t = 0.5, ∆ρ = 0.005     ρ11, ρ21, ρ12, ρ22, ρ13, ρ23, ρ14, ρ31, ρ32
The number of tasks is fixed at 100; the category-1 tasks are assumed to be long tasks and the category-2 tasks are assumed to be short tasks. The percentage of category-1 tasks is in [10%, 90%]. Since for t = 0.5, ∆ρ = 0.32; t = 0.5, ∆ρ = 0.005; and t = 0.06, ∆ρ = 0.02, CSA is equivalent to MTWCT, only t = 0.06, ∆ρ = 0.03; t = 0.11, ∆ρ = 0.15; t = 0.11, ∆ρ = 0.1; and t = 0.152, ∆ρ = 0.25 are adopted to test the performance of CSA.
As the number of long tasks increases, the resulting short-task completion times are depicted in Fig. 1. According to the simulation results, the performance of CSA varies with different t and ∆ρ.
From Fig. 1, note that the short-task completion time under CSA is obviously smaller than that under MTWCT.
From Fig. 1 it can also easily be seen that when short tasks are executed earlier, long tasks are inevitably delayed. As we know, the average turnaround time can reflect the overall performance of a scheduling algorithm. A short average turnaround time favors the majority of users and also reflects high system throughput and high system utilization. The simulation results are shown in Fig. 2.
From Fig. 2, we can see that both CSA and MTWCT increase the average turnaround time as the number of long tasks increases. The values of t and ∆ρ have a great impact on the average turnaround time performance of CSA. For t = 0.11, ∆ρ = 0.15, CSA is extremely efficient, and when the percentage of long tasks is over 13%, CSA is superior to MTWCT.
Fig. 3. Comparison of the weighted total completion time using two algorithms
From the previous theoretical analysis, we can see that the weighted total completion time using CSA is definitely not better than that using MTWCT. The comparison of CSA and MTWCT in terms of the total weighted completion time is depicted in Fig. 3.
From Fig. 3, we can see that both CSA and MTWCT increase the weighted total completion time as the number of long tasks increases. The weighted total completion time using CSA is a bit greater than that using MTWCT.
4 Conclusion
In this paper, we presented a modified scheduling algorithm named the Controlled Scheduling Algorithm for a set of independent tasks on a processor of heterogeneous systems, which supports the three features proposed in Section 1. CSA takes into account both fairness and effectiveness, decreasing the incidence of the starvation phenomenon produced by MTWCT. In addition, CSA has the flexibility to set t and ∆ρ to meet various applications. A large number of experimental results indicate that, in general, compared with MTWCT the average turnaround time under CSA decreases by around 20% at the price of a 3% increase in the total weighted completion time.
References
1. Foster, I., Kesselman, C.: The Grid2, Blueprint for a New Computing Infrastructure. Mor-
gan Kaufmann, San Francisco (2004)
2. Abraham, A., Buyya, R., Nath, B.: Nature’s Heuristics for scheduling Jobs on Computa-
tional Grids. In: ADCOM 2000, Cochin, India, pp. 45–52 (2000)
3. Kwok, Y.-K., Ahmad, I.: Static Scheduling Algorithms for Allocating Directed Task Graphs
to Multiprocessors. In: ACM Computing Surveys, New York, pp. 406–471 (1999)
4. Maheswaran, M., Ali, S., Siegel, H.: Dynamic Matching and Scheduling of a Class of In-
dependent Tasks onto Heterogeneous Computing Systems. In: Eigth Heterogeneous Com-
puting Workshop, San Juan, Pueto Rico (April 1999)
5. Maheswaran, M., Ali, S., Siegel, H.J., Hensgen, D., Freund, R.F.: Dynamic Mapping of a
Class of Independent Tasks onto Heterogeneous Computing Systems. Journal of Parallel
and Distributed Computing 59(2), 107–131 (1999)
6. SaiRanga, P.C., Baskiyar, S.: A low complexity algorithm for dynamic scheduling of inde-
pendent tasks onto heterogeneous computing systems. In: Proceedings of the 43rd annual
Southeast regional conference, New York, pp. 63–68 (2005)
7. Pinedo, M.: Scheduling-theory, algorithms, and systems, ch. 4. Prentice Hall, Englewood
Cliffs (1995)
8. Cuesta, B., Robles, A., Duato, J.: An Effective Starvation Avoidance Mechanism to En-
hance the Token Coherence Protocol. In: IEICE Transactions on Fundamentals of Electron-
ics, Communications and Computer Sciences, Washington, pp. 47–54 (2007)
9. Graham, R.L., Lawler, E.L., Lenstra, J.K., et al.: Optimization and Approximation in Deterministic Sequencing and Scheduling. Annals of Discrete Mathematics 5, 287–326 (1979)
A Space Allocation Algorithm for Minimal Makespan in
Space Scheduling Problems
Abstract. The factory space is one of the critical resources for the machine as-
sembly industry. In machinery industry, space utilizations are critical to the ef-
ficiency of a schedule. The higher the utilization of a schedule is, the quicker
the jobs can be done. Therefore, the main purpose of this research is to derive a
method to allocate jobs into the shop floor to minimize the makespan for the
machinery industry. In this research, we develop an algorithm, the Longest Contact Edge Algorithm, to schedule jobs onto the shop floor. We employed the algorithm to allocate space for jobs and found that the Longest Contact Edge Algorithm outperforms the Northwest Algorithm by obtaining better allocations. However, the Longest Contact Edge Algorithm has higher time complexity than the Northwest Algorithm.
1 Introduction
Scheduling is an important tool for manufacturing and engineering, as it can have a
major impact on the productivity of a process. In manufacturing, the purpose of
scheduling is to minimize production time and costs by arranging a production facility
regarding what and when to make, with which staff, and on which equipment. Pro-
duction scheduling aims to maximize the efficiency of the operation and reduce costs
[13]. In general, the scheduling problem could be very complicated. Perng et al.
[2][3][4][5][6] proposed a space resource constrained job scheduling problem. In their
researches, utilization of a shop floor space was an important issue in a machinery
assembly factory. The assembly process of a machine required a certain amount of
space on a shop floor in the factory for a period of time. The sizes of the shop floor
and machines would determine the number of machines which can be assembled at
the same time on the shop floor. The sequences of jobs would affect the utilization of
the shop floor. The space on the shop floor could be divided into several chunks and
some of these jobs could be allocated for simultaneous assembling. When a new job
arrived, the factory has to offer a complete space to allocate the new job based on its
space requirement. If the factory was at its capacity, the new order must wait for the
space taken by jobs previously assigned to complete production. Due to space capac-
ity constraints, we have to schedule the sequence of jobs carefully in order to maxi-
mize space utilization.
The makespan is important when the number of jobs is finite. The makespan is de-
fined as the time the last job leaves the system, that is, the completion time of the last
job [8]. The makespan shows within how many days the factory can complete these
jobs. Nichols et al. [14] was the first paper to discuss scheduling with minimum
makespan as an objective. In their research, n independent, single operation jobs, all
available at time zero, on m identical processors; each job must be processed by ex-
actly one of these processors. Computational experience with the procedure indicated
that good solutions to large scale problems were easily obtained. Gallo and Scutella
[9] tried to minimize the makespan of scheduling problem with tree-type precedence
constraints. The assembly factories wanted to select a sequence from which all jobs
can be completed in time. Liao and Lin [1] considered the two-uniform-parallel-
machine problem with the objective of minimizing makespan. The problem can be
transformed into a special problem of two identical parallel machines from the view-
point of workload instead of completion time. An optimal algorithm was developed
for the transformed special problem. Although the proposed algorithm had an expo-
nential time complexity, the results showed that it could find the optimal solution for
large scale problems in a short time. Lian et al. [15] presented a swarm optimization
algorithm to solve a job shop scheduling problem. The objective was to find a sched-
ule that minimizes the makespan. Perng et al. [5] proposed an algorithm based on a
container loading heuristic (CLH) approach [7] to a space scheduling problem with
minimal makespan. However, the original CLH could not find all the possible free
space, while this research could. Zobolas et al. [10] proposed a hybrid metaheuristic
for the minimization of makespan in permutation flow shop scheduling problems. A
genetic algorithm (GA) for solutions was selected. Computational experiments on
benchmark data sets demonstrated that the proposed hybrid metaheuristic reached
high-quality solutions in short computational times. Although numerous scheduling
problems discussed makespan, there is only a limited amount of literature available on
the space scheduling field.
One of the real-world applications is an efficient space allocation algorithm. This research proposes a new space allocation algorithm to minimize the makespan for a space scheduling problem. In other words, the main purpose of this research is to derive a method to allocate jobs onto a shop floor more efficiently.
2 Problem Formulation
Let N denote a set of n jobs. Let s1, s2,…, sn denote the start time for each job. Let p1,
p2,…, pn denote the processing time for each job. Let dd1, dd2,…,ddn denote the due
date for each job. Let σ denote an arbitrary sequence. The rest of the notation is shown as follows:
In LCEA, we tally the number of grids along the perimeter of the new job that, if the job is allocated from the reference point, are in contact with previously allocated jobs. We examine each grid (i, j) around the new job's working space. Let grid(i, j) = 1 if grid (i, j) is occupied by any job or obstacle on the shop floor, and grid(i, j) = 0 if grid (i, j) is available space. The concept of the longest contact edge is shown in Figure 2, which shows the perimeter of the new job's working space surrounded by unavailable grids. Equation (2) tallies the number of contact grids if the job is allocated at the reference point (northwest corner).
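Our reading of this contact-edge count can be sketched as the short Python function below; the grid encoding, the rectangle parameters, and the decision to let the factory boundary count as contact are assumptions made for illustration.

def contact_edge_length(grid, top, left, height, width):
    # grid[i][j] = 1 if the cell is occupied by a job or obstacle, 0 if free.
    # The candidate job would occupy rows [top, top+height) and columns [left, left+width).
    rows, cols = len(grid), len(grid[0])

    def occupied(i, j):
        if i < 0 or j < 0 or i >= rows or j >= cols:
            return 1                                  # the factory wall also supports the job
        return grid[i][j]

    contact = 0
    for j in range(left, left + width):               # cells just above and below the job
        contact += occupied(top - 1, j) + occupied(top + height, j)
    for i in range(top, top + height):                # cells just left and right of the job
        contact += occupied(i, left - 1) + occupied(i, left + width)
    return contact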
In order to evaluate the performance of the proposed algorithm under traditional dis-
patching rules, namely SPT, LPT, EDD, FCFS, SSR, and LSR, the algorithm was
developed by using PHP and Microsoft Visual Basic languages. Integrated software
was also developed for solving the space allocation problem. The current version of
the software includes both Northwest and Longest Contact Edge Algorithms. The
interface of the software is shown in Figure 3. We used a Pentium IV (Celeron CPU
2.40GHz) computer for the computations. We computed the makespans for different
numbers of jobs (25, 50 and 75). Thirty different data sets were obtained from the OR-
Library [1] [2] due to real data being too few for statistical analysis. The space re-
quirements of a job and order information were obtained from a company located in
central Taiwan. The results are listed in Table 1 to 3.
We employed a t-test for matched samples of dispatching rules to compare the makespans between NWA and LCEA. The results indicate that LCEA outperforms NWA for every dispatching rule in terms of makespan. However, LCEA has higher time complexity T [5]. All results can be obtained within 5 seconds. The LCEA can find any possible free grids in the factory; hence, it shortens the makespan in a space scheduling problem.
Sequencing rule   NWA     LCEA    T1         T2
SPT               85.20   75.67   142455     1221082
LPT               67.43   63.90   140084.4   1232975
EDD               72.53   67.90   124962.9   1293922
FCFS              79.93   72.40   143633.4   1238529
SSR               80.37   75.47   141535.8   1679210
LSR               77.30   72.60   171410.2   921304.1
(T1: NWA, T2: LCEA)
Sequencing rule   NWA      LCEA     T1         T2
SPT               306.33   232.73   324420.2   2012006
LPT               244.63   199.53   298566.8   2150939
EDD               262.53   198.60   296033.1   2056968
FCFS              282.63   211.43   308262     2081974
SSR               249.77   218.10   43540.9    2347393
LSR               314.60   259.60   304970.1   2039592
(T1: NWA, T2: LCEA)
Sequencing rule   NWA      LCEA     T1         T2
SPT               151.5    130.83   345015.2   1873230
LPT               111.8    95.63    304846.5   1923400
EDD               119.37   103.77   315741.6   1919576
FCFS              134.57   115.57   327115.7   1888276
SSR               134.30   108.90   400728     2655211
LSR               130.80   108.90   369289.9   2655211
(T1: NWA, T2: LCEA)
5 Concluding Remark
A space scheduling problem is an important issue in a machinery assembly factory. In
this study, we compared the NWA with the proposed LCEA, using the makespan as the performance measure in the space scheduling problem. We employed the LCEA to allocate space for jobs and found that the LCEA makes better assignments than the NWA for all schedules obtained from traditional dispatching rules. However, the computing complexity of the LCEA is a little worse than that of the NWA.
There are some assumptions in this study: for example, all jobs are available on day one, the shapes of the orders are all rectangles, a job will not be moved until it is done, there is no constraint on a job's height, and the buffer or storage is available to fit any number or any shape of jobs. Relaxing some of these assumptions may lead to different conclusions.
References
1. Liao, C.J., Lin, C.H.: Makespan minimization for two uniform parallel machines. Interna-
tional Journal of Production Economics 84(2), 205–213 (2003)
2. Perng, C., Lai, Y.C., Zhuang, Z.Y., Ho, Z.P.: Application of scheduling technique to job
space allocation problem in an assembly factory. In: The Third Conference on Operations
Research of Taiwan, vol. 59, pp. 1–7. Yuan-Zhe University, Tao-Yuan County (2006)
3. Perng, C., Lai, Y.C., Zhuang, Z.Y., Ho, Z.P.: Job scheduling in machinery industry with
space constrain. System Analysis Section. In: The Fourth Conference on Operations Re-
search of Taiwan, vol. 5, pp. 1–11. National Dong-Hwa University, Hwa-Lian City (2007)
4. Perng, C., Lai, Y.C., Ho, Z.P.: Jobs scheduling in an assembly factory with space obsta-
cles. In: The 18th International Conference on Flexible Automation and Intelligent Manu-
facturing, Skovde, Sweden, vol. 4B, pp. 1–9 (2008)
5. Perng, C., Lin, S.S., Ho, Z.P.: On space resource constrained job scheduling problems- A
container loading heuristic approach. In: The 4th International Conference on Natural
Computation, vol. 7, pp. 202–206. Shandong University, Jinan (2008),
doi:10.1109/ICNC.2008.419
6. Perng, C., Lai, Y.C., Ho, Z.P.: A space allocation algorithm for minimal early and tardy
costs in space scheduling. In: 3rd International Conference on New Trends in Information
and Service Science papers (NISS), Sec.1(6), Beijing Friendship Hotel, Beijing, China, pp.
1–4 (2009)
7. Pisinger, D.: Heuristics for the container loading problem. European Journal of Opera-
tional Research 141(2), 382–392 (2002)
8. Sule, D.R., Vijayasundaram, K.: A heuristic procedure for makespan minimization in job
shops with multiple identical processors. Computers and Industrial Engineering 35(3-4),
399–402 (1998)
9. Gallo, G., Scutella, M.G.: A note on minimum makespan assembly plans. European Jour-
nal of Operational Research 142(2), 309–320 (2002)
10. Zobolas, G.I., Tarantilis, C.D., Ioannou, G.: Minimizing makespan in permutation flow
shop scheduling problems using a hybrid metaheuristic algorithm. Computers & Opera-
tions Research 36(4), 1249–1267 (2009)
11. Beasley, J.E.: OR-Library: distributing test problems by electronic mail. Journal of Opera-
tional Research Society 41(11), 1069–1072 (1990)
Biography of Authors
Chyuan Perng is an associate professor in the Department of Industrial Engineering and En-
terprise Information at Tunghai University, Taiwan. He received his Ph.D. degree in Industrial
Engineering from Texas Tech University, USA. He has also participated in numerous industrial
and governmental projects in Taiwan.
Chin-Lun Ouyang graduated from the Department of Industrial Engineering and Enterprise
Information at Tunghai University, Taiwan. He received a Master and Bachelor degree. He
currently serves at the Conscription Agency Ministry of the Interior in Taiwan.
A New Approach for Chest CT Image Retrieval*
Abstract. A new approach for chest CT image retrieval is presented. The proposed algorithm is based on a combination of low-level visual features and high-level semantic information. According to the new algorithm, wavelet coefficients of the image are first computed using a wavelet transform as texture feature vectors. The Zernike moment is then used as an effective descriptor of the global shape of the chest CT images in the database, and semantic information is extracted to improve the accuracy of retrieval. Finally, index vectors are constructed by combining texture, shape and semantic information, and the technique of relevance feedback is used in the algorithm to enhance the effectiveness of retrieval. The retrieval results obtained by applying our new method demonstrate an improvement in effectiveness compared to other kinds of retrieval techniques.
1 Introduction
With DICOM (Digital Imaging and Communications in Medicine), patient information can be stored with the actual images. Digitally produced medical images are generated in ever-increasing quantities and used for therapy and diagnostics. The medical imaging field has generated additional interest in methods and tools for the management, analysis, and communication of these medical images. Many diagnostic imaging modalities are routinely used to support clinical decision making. It is important to extend such applications by supporting the retrieval of medical images by content.
In recent years, much research has been done on specific medical image retrieval systems [1-4], and research on medical image retrieval is based on the same anatomic region. Chest CT images are gray-scale, so retrieval methods based on color features cannot attain effective performance. Meanwhile, extensive experiments with content-based image retrieval (CBIR) systems have demonstrated that low-level features cannot always describe the high-level semantic concepts in the user's mind [5]. Therefore, this paper presents a new approach based on a combination of low-level visual features and high-level semantic information. Firstly, we extract the texture features of the images by wavelet transform.
* Sponsored by the Science and Technology Foundation of Hangzhou Normal University (2009XJ065).
The wavelet-transform-based texture analysis method has proven to be efficient for texture analysis due to its property of localization in both space and frequency. Then a region-based shape descriptor [6] is presented, which utilizes a set of magnitudes of Zernike moments. We show that the Zernike moment descriptor can be used effectively as a global shape descriptor of an image, especially for a large medical image database, and describe the process of extracting the Zernike moment descriptor. Finally, in order to improve the retrieval performance, texture and shape features are combined with the high-level semantic feature to reduce the "semantic gap" between visual features and the richness of human semantics, and the technique of relevance feedback is used in the algorithm to enhance the effectiveness of retrieval.
Thus, based on the above techniques, a simple prototype system is developed to compare the retrieval accuracies. Experimental results show that the method discussed in this paper is much more effective.
The paper is organized as follows. Section 2 gives a review of feature extraction,
including wavelet transform, Zernike moments and semantic feature. In section 3, we
proposed a method to combine texture, shape features with semantic information. In
section 4, experiments have been conducted and the performance of the proposed al-
gorithm is analyzed. The conclusions are presented in section 5.
2 Features
It is found that approaches to texture analysis are very diverse; in this respect, four categories can be defined, namely statistical, geometrical, model-based and signal processing. The first three texture analysis methods are suitable for regular and near-regular textures, while the texture information of chest CT images is concentrated in the middle region. Therefore, we utilize the signal processing approach. The wavelet-transform-based technique performs a space-frequency decomposition with low computational complexity and has been proven to be an efficient method for texture analysis [7]. In this paper, Daubechies' wavelet is selected as the basis due to its orthonormal characteristics.
The procedure for chest CT image texture feature extraction is as follows:
(1) Chest CT images are preprocessed and their sizes are made the same to make the system computationally less complex.
(2) A two-dimensional Daubechies wavelet transform is applied to decompose each image into its sub-band images. A three-level pyramidal decomposition is used in this work.
(3) All the sub-band images are stored for calculating the statistical features from the sub-bands.
Fig. 1 shows the three levels of sub-bands created by the Daubechies wavelet decomposition of a 512×512 chest CT image. To extract the texture feature, all sub-band images from the three levels at different resolutions are used to calculate the feature vector. During statistical feature extraction, the following measures are computed from each sub-band:
$$\text{Mean} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} f(x,y) \quad (1)$$
$$\text{Variance} = \frac{1}{(MN)^{2}} \sum_{x=1}^{M} \sum_{y=1}^{N} \big( f(x,y) - \text{Mean} \big)^{2} \quad (2)$$
$$\text{Energy} = \frac{1}{(MN)^{2}} \sum_{x=1}^{M} \sum_{y=1}^{N} f(x,y)^{2} \quad (3)$$
$$\text{Entropy} = -\sum_{x=1}^{M} \sum_{y=1}^{N} f(x,y) \log f(x,y) \quad (4)$$
These four values are considered beneficial as texture features and are calculated from all the sub-bands of the chest CT images. The similarity between two images can then be measured by Euclidean distance.
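A possible implementation of this feature extraction with the PyWavelets package is sketched below; 'db4' is one assumed member of the Daubechies family (the paper only names Daubechies wavelets), and normalizing the coefficients before taking the logarithm in the entropy term is our own choice to keep the expression well defined.

import numpy as np
import pywt

def wavelet_texture_features(image, levels=3, wavelet="db4"):
    # Three-level 2-D wavelet decomposition; mean, variance, energy and entropy
    # (Eqs. (1)-(4)) are computed for every sub-band and concatenated.
    coeffs = pywt.wavedec2(np.asarray(image, dtype=float), wavelet, level=levels)
    subbands = [coeffs[0]] + [band for detail in coeffs[1:] for band in detail]
    feats = []
    for f in subbands:
        mn = f.size
        mean = f.sum() / mn
        variance = ((f - mean) ** 2).sum() / mn ** 2          # (MN)^2 normalization of Eq. (2)
        energy = (f ** 2).sum() / mn ** 2                     # Eq. (3)
        p = np.abs(f) / (np.abs(f).sum() + 1e-12)             # assumption: normalize before the log
        entropy = -(p * np.log(p + 1e-12)).sum()              # Eq. (4)
        feats.extend([mean, variance, energy, entropy])
    return np.array(feats)

Two images can then be compared by the Euclidean distance between their feature vectors, e.g. np.linalg.norm(f1 - f2).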
The Zernike moment descriptor has such desirable properties: rotation invariance,
robustness to noise, expression efficiency, fast computation and multi-level represen-
tation for describing the various shapes of patterns[9-10]. Therefore, we apply the Zernike
moments to chest CT image retrieval for shape feature extraction. We can calculate the
Zernike moment feature by (7) and (8).
$$C_{nl} = \frac{2n+2}{\pi} \int_{0}^{1}\!\!\int_{-\pi}^{\pi} R_{nl}(r) \cos(l\theta)\, f(r,\theta)\, r\, dr\, d\theta \quad (7)$$
$$S_{nl} = \frac{2n+2}{\pi} \int_{0}^{1}\!\!\int_{-\pi}^{\pi} R_{nl}(r) \sin(l\theta)\, f(r,\theta)\, r\, dr\, d\theta \quad (8)$$
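As a hedged sketch, the mahotas library provides the magnitudes of Zernike moments, which correspond to the rotation-invariant quantities built from C_nl and S_nl in Eqs. (7)-(8); the radius and degree below are illustrative values, not parameters stated in the paper.

import mahotas

def zernike_shape_descriptor(gray_image, radius=200, degree=8):
    # Magnitudes of Zernike moments up to the given degree, computed on the disk of
    # the given radius around the image's centre of mass.
    return mahotas.features.zernike_moments(gray_image, radius, degree=degree)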
Images in a medical image database are highly similar, and it is difficult to retrieve images with the same pathologic features; low-level feature extraction alone cannot attain effective results. In order to improve the retrieval accuracy of the system, we try to reduce the "semantic gap" between visual features and the richness of human semantics. In this paper, we propose a method to extract semantic information.
Because the descriptions are subjective, different doctors use different diagnostic words for the same image, but with similar meanings. Considering the doctors' diagnostic descriptions (Table 1) of the images, we extract keywords from these descriptions under the doctors' instruction. According to medical knowledge, we derive the keyword sets (Table 2) as semantic information. Let $L_i$ be the semantic feature vector, $L_i = (W_{i1}, W_{i2}, \ldots, W_{im})$, where m represents the number of keywords derived from the descriptions and $W_{ij}$ is the weight of keyword j in the semantic information of image i. In order to reduce the computational complexity, the weight equals the number of times the keyword appears in the semantic information; if it does not appear, $W_{ij} = 0$.
Let p, q represent the semantic feature vectors of the query image and the database image, respectively. The distance between them can be computed as:
$$D(p,q) = 1 - \frac{\sum_{k=1}^{m} W_{qk} \times W_{pk}}{\sqrt{\Big(\sum_{k=1}^{m} W_{qk}^{2}\Big)\Big(\sum_{k=1}^{m} W_{pk}^{2}\Big)}} \quad (9)$$
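A direct sketch of Eq. (9) in NumPy follows; the weight vectors are assumed to be aligned to the same keyword set, and returning the maximum distance when one vector is all zeros is our own convention.

import numpy as np

def semantic_distance(W_q, W_p):
    # 1 minus the cosine similarity between the keyword-weight vectors, per Eq. (9).
    W_q, W_p = np.asarray(W_q, dtype=float), np.asarray(W_p, dtype=float)
    denom = np.sqrt((W_q ** 2).sum() * (W_p ** 2).sum())
    if denom == 0:
        return 1.0                       # no semantic information to compare
    return 1.0 - (W_q * W_p).sum() / denom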
3 Feature Fusion
The retrieval method based on low-level features alone cannot attain satisfactory performance. To improve the effectiveness and accuracy, we propose a method that combines the low-level features with the high-level feature. The low-level features comprise texture and shape. In
feature with high-level feature. The low-level features contain texture and shape. In
order to enhance the effectiveness of retrieval, the technique of relevance feedback is
used in the algorithm. The Rocchio algorithm[12], which uses user feedback on the
relevancy of retrieved images, has been shown to improve query results. Fig.2 illus-
trates the major steps of the integrated retrieval.
(Fig. 2 blocks: chest CT image; Zernike moments; similarity measure computation; similarity measures; normalization.)
Since the distances of different features have different scales, a normalization should be performed before the integrated retrieval.
where a represents the number of similar images and b the number of dissimilar images in the results; that is, the precision is the percentage of similar images retrieved with respect to the number of retrieved images.
$$\text{Average-}r = \frac{1}{m} \sum_{r=1}^{m} \rho_r \quad (13)$$
where m is a given constant (in this paper m = 8) and $\rho_r$ is the rank of the rth image among the first m similar images. Therefore the ideal value of Average-r is 4.5, and lower values indicate better results. In our case, the retrieval returns 15, 30 or 45 images for each query. Tab. 3-5 show the average precision and rank of the different methods.
Tab. 3 compares the results obtained by three texture extraction algorithms: co-occurrence matrix, texture spectrum and wavelet transform. A total average of 84.5% matched retrieved images is achievable using the wavelet transform algorithm. The same experiment, performed using the co-occurrence matrix or texture spectrum, demonstrated lower performance in average precision and average-r.
Tab.4 shows the quantitative results obtained by the application of integrated re-
trieval and shape algorithms. As illustrated, the feature fusion algorithm has a fairly
high precision and small average-r, while demonstrating higher retrieval accuracy.
Meanwhile, Zernike moment performed better than Hu moment. This is expected and
mostly due to its property of multi-level representation for describing the various
shapes of patterns and its rotation invariance.
Tab. 5 shows that the precision of the second retrieval is higher than that of the original retrieval, and the average rank approaches the ideal value. Therefore the relevance feedback method is successful, with high retrieval efficiency.
5 Conclusion
In this paper, a new approach for chest CT image retrieval is presented. Texture and shape features are combined with semantic information for image retrieval. The experimental results show that the proposed algorithm is effective. Applying this method to medical image retrieval will support clinical decision making and has enormous potential.
However, further developments should be made in order to improve the retrieval
accuracy. The semantic feature vector should be modified to reduce its size and im-
prove its discriminative property.
Acknowledgement
The authors would like to thank the Radiotherapy Department, the Interventional Radiotherapy Department and the Endoscopy Department of Inner Mongolia Hospital.
References
[1] Wan, H.-L., et al.: Texture feature and its application in CBIR. Journal of Computer-Aided Design and Computer Graphics 15(2), 195–199 (2003)
[2] Shyu, C., Brodley, C., Kak, A., et al.: ASSERT, A physician-in-the-loop content-based
image retrieval system for HRCT image databases. Computer Vision and Image Under-
standing 75(1/2), 111–132 (1999)
[3] Aisen, A.M., Broderick, L.S., Winer-Muram, H., et al.: Automated storage and retrieval of
thin section CT images to assist diagnosis: System description and preliminary assessment.
Radiology 228(1), 265–270 (2003)
[4] Sun, J., Zhang, X., Cui, J., Zhou, L.: Image retrieval based on color distribution entropy.
Pattern Recognition Letters 27(10), 1122–1126 (2006)
[5] Liu, Y., Zhang, D., Lu, G., Ma, W.-Y.: A survey of content-based image retrieval with
high-level semantics. Pattern Recognition (2006)
[6] Wen, C.-Y., Yao, J.-Y.: Pistol image retrieval by shape representation. Forensic Science
International 155(1), 35–50 (2005)
[7] Paschos, G.: Fast Color: Texture Recognition Using Chromaticity Moments. Pattern
Recognition Letters 21, 837–841 (2000)
[8] Borah, S., Hines, E.L., Bhuyan, M.: Wavelet transform based image texture analysis for
size estimation applied to the sorting of tea granules. Journal of Food Engineering 79(2),
629–639 (2006)
[9] Mehrotra, R., Gary, J.E.: Similar-shape retrieval in shape data management. IEEE Com-
put. 28(9), 57–62 (1995)
[10] Kim, W.-Y., Kim, Y.-S.: A region-based shape descriptor using Zernike moments. Signal
Processing: Image Communication 16, 95–102 (2000)
[11] Ortega, M., et al.: Supporting similarity queries in MARS. In: ACM Conf. on Multimedia,
pp. 403–413 (1997)
[12] Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
Depth Extension for Multiple Focused Images by
Adaptive Point Spread Function*
Box 1209, Department of Electronic Science & Engineering, Nanjing University No. 22,
Hankou Road, Nanjing 210093, P.R. China
yqwang@nju.edu.cn
1 Introduction
Depth of field is the region in front of and behind the specimen that is in acceptable focus. In any imaging system (e.g., video camera, light optical microscope), image blurring is inevitable because every lens has a finite depth of field, which is related to the numerical aperture (NA); the system cannot image objects at different distances with the same clarity. Everything immediately in front of or behind the focusing distance begins to lose sharpness.
Depth fusion can be performed at three different processing levels, according to the stage at which the fusion takes place: pixel, feature and decision level [1]. In general, the techniques for multi-focus image fusion can be grouped into two classes: (1) color-related techniques and (2) statistical/numerical methods. Selecting the appropriate approach depends strongly on the actual application. Some commonly
* This work is funded by the key program of the National Natural Science Foundation of China (60832003).
According to the isoplanatism condition, PSF has the same shape over the whole field
of view at the same focal plane. The complicated PSF may be simplified to a concise
expression as:
P(r) = \frac{1}{4 r^{2}}      (3)

where r is the radius from the center of the blur circle. In the discrete case the PSF may be denoted by a symmetric matrix h, whose diagonal elements are defined as:

h_{ii} = \begin{cases} i^{2}, & i \le (n+1)/2 \\ (n+1-i)^{2}, & i > (n+1)/2 \end{cases}      (4)

a_{i} = h_{ii}      (5)

Then, the matrix of the discrete PSF is:

h = k\, a^{T} a      (6)

where k is the normalization coefficient, and

k = 1 \Big/ \Big(\sum_{i=1}^{n} a_{i}\Big)^{2}      (7)
Table 1. Candidate PSF vectors and their parameters

PSF     P0          P1          P2         P3         P4         P5         P6         P7
a       1,3,11,3,1  1,3,10,3,1  1,3,9,3,1  1,3,8,3,1  1,4,8,4,1  1,3,6,3,1  1,4,6,4,1  1,2,3,2,1
k       441         400         289        256        361        196        256        81
WH(1)   0.52        0.61        0.70       0.85       0.91       1.1        1.3        1.4
Sr(2)   0.4187      0.3460      0.2803     0.2500     0.2244     0.1837     0.1406     0.1111

(1) WH: the width of the PSF, i.e., the full width at half maximum of the PSF. (2) Sr = P(r)/P0(r), i.e., the central intensity of the PSF of a defocused image relative to that of a focused image.
As shown in Fig. 2, the original images g1 and g2 are two images with different focused regions. To describe our idea more easily, the scene is simplified so that it consists of only two objects, imaged respectively within a circular and a triangular region (see Fig. 2). The focused region is drawn in black and the defocused region in dark gray. In image g1 the triangular region is within the focused range and the circular region is defocused; in image g2 the triangular region is out of focus and the circular region is in focus.
The blurry image fi(k, l) may be obtained as below:
f_i = g_i \ast h_i      (10)
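As a hedged illustration of Eqs. (4)-(7) and (10), the Python sketch below builds a discrete PSF mask from one of the candidate vectors a and blurs an image with it; the use of scipy's 2-D convolution is our own choice, not something prescribed by the paper:

```python
import numpy as np
from scipy.ndimage import convolve

def psf_from_vector(a):
    """Build the normalized discrete PSF h = k * a^T a of Eqs. (6)-(7)."""
    a = np.asarray(a, float).reshape(1, -1)
    k = 1.0 / a.sum() ** 2          # Eq. (7)
    return k * (a.T @ a)            # Eq. (6)

def blur(image, a):
    """Blurry image f = g * h of Eq. (10), with * the 2-D convolution."""
    return convolve(np.asarray(image, float), psf_from_vector(a), mode='reflect')

# candidate vector P6 from Table 1
h = psf_from_vector([1, 4, 6, 4, 1])
print(h.sum())   # ~1.0: the PSF is normalized
```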
[Fig. 2. Processing flow: the original images g1 and g2 are convolved with the PSF H to give f1 and f2, and subtraction yields the candidate graphics D1 and D2]
In the next step, we obtain the candidate graphics Di, as shown in Fig. 2, by the following function:
D_1 = \Big(\sum_{j\in S} g_{2j} + \sum_{j\in B} g_{2j}\Big) - \Big(\sum_{k\in S} g_{1k} + \sum_{k\in B} g_{1k}\Big) \ast h_1
D_2 = \Big(\sum_{k\in S} g_{1k} + \sum_{k\in B} g_{1k}\Big) - \Big(\sum_{j\in S} g_{2j} + \sum_{j\in B} g_{2j}\Big) \ast h_2      (11)
where symbols, S and B, respectively denote the sharp and blur regions in the images.
Equation (11) may be rewritten as:
D_1 = \Big(\sum_{j\in S} g_{2j} - \sum_{k\in B} g_{1k} \ast h_1\Big) + \Big(\sum_{j\in B} g_{2j} - \sum_{k\in S} g_{1k} \ast h_1\Big)
D_2 = \Big(\sum_{k\in S} g_{1k} - \sum_{j\in B} g_{2j} \ast h_2\Big) + \Big(\sum_{k\in B} g_{1k} - \sum_{j\in S} g_{2j} \ast h_2\Big)      (12)
In order to enhance the absolute dispersion and avoid negative values, we replace the direct subtraction with its square, as below:

D_1 = \Big(\sum_{j\in S} g_{2j} - \sum_{k\in B} g_{1k} \ast h_1\Big)^{2} + \Big(\sum_{j\in B} g_{2j} - \sum_{k\in S} g_{1k} \ast h_1\Big)^{2}      (13)
The focused region in one image generally appears as a blurry, out-of-focus region in the other image. We simply treat the area of the sharp region in one image as equal to the area of the blurry region in the other image, i.e.:

Area\Big(\sum_{j\in S} g_{2j}\Big) \approx Area\Big(\sum_{k\in B} g_{1k}\Big), \qquad Area\Big(\sum_{k\in S} g_{1k}\Big) \approx Area\Big(\sum_{j\in B} g_{2j}\Big)      (14)
It is apparent that the more suitable the PSF mask coefficients are, the lower the value of the second term in expression (13) will be. If we set an identification threshold Vth that is larger than the values of the first term in Equation (13) and smaller than those of the second term, then all pixels of g1 and g2 whose values in the candidate graphics D2 and D1 exceed the identification threshold may be treated as belonging to the in-focus imaging region. The identification threshold depends on the scene, e.g., its content, texture and illumination; for a specific application field it may be determined from experimental statistics or derived from experience. A binary feature image, F1 or F2, is then obtained in which the high-gray-level pixels are the sharp pixels of the corresponding original image. The desirable PSF may be chosen from the candidate vectors listed in Table 1 according to the blur degree of the image.
In fact, if the discrete PSF is an m×m matrix, the area of high-gray-level pixels in the binary feature images F1 and F2 will be slightly larger than that of the sharp regions in the original images g1 and g2. Considering this, we use the following fusion principle to obtain the fused image I(x, y):

I(x, y) = \begin{cases} g_1(x, y), & F_2(x, y) = 1,\ F_1(x, y) = 0 \\ g_2(x, y), & F_2(x, y) = 0,\ F_1(x, y) = 1 \\ [g_1(x, y) + g_2(x, y)]/2, & \text{otherwise} \end{cases}      (15)
For the case of more than two original multi-focus images, an iterative algorithm may be applied. Assuming there are n images g1, g2, …, gn, the steps of the iterative algorithm are as follows:
STEP 0: set g1 = g1, g2 = g2, i = 2;
STEP 1: determine the PSF mask according to the status of g1 and g2;
STEP 2: obtain the binary feature images F1 and F2;
STEP 3: obtain the fusion image I(x, y) as below:
I(x, y) = \begin{cases} g_1(x, y), & F_2(x, y) = 1,\ F_1(x, y) = 0 \\ g_2(x, y), & F_2(x, y) = 0,\ F_1(x, y) = 1 \\ g_1(x, y), & F_2(x, y) = 0,\ F_1(x, y) = 0 \\ [g_1(x, y) + g_2(x, y)]/2, & F_2(x, y) = 1,\ F_1(x, y) = 1 \end{cases}      (16)
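A minimal Python sketch of the pixel-wise fusion rule of Eq. (16), assuming g1 and g2 are registered grayscale arrays and F1, F2 are binary masks of the same shape (the function name is ours, not the paper's):

```python
import numpy as np

def fuse(g1, g2, F1, F2):
    """Pixel-wise fusion rule of Eq. (16) for two registered multi-focus images."""
    g1, g2 = np.asarray(g1, float), np.asarray(g2, float)
    F1, F2 = np.asarray(F1, bool), np.asarray(F2, bool)
    avg = (g1 + g2) / 2.0
    out = np.where(F2 & ~F1, g1,            # case F2 = 1, F1 = 0
          np.where(~F2 & F1, g2,            # case F2 = 0, F1 = 1
          np.where(~F2 & ~F1, g1, avg)))    # F2 = F1 = 0 -> g1; F2 = F1 = 1 -> average
    return out
```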
The selection of an appropriate PSF is closely connected with the blur degree of the region, so it is important to estimate the image blur by a suitable method. Suppose Ii(x, y) and Ij(x, y) are two differently focused images in a series of multi-focus images; a differential image is defined as:

I_{i,j}(x, y) = | I_i(x, y) - I_j(x, y) |      (17)

where I_k(x, y) is the pixel intensity at position (x, y) in image k. Generally we set j = i+1, corresponding to the image adjacent to the i-th one.
The value of I_{i,j}(x, y) may indicate the blur degree. However, using only adjacent pixel-level indicators makes the decisions vulnerable to wide fluctuations caused by factors such as noise, brightness and local contrast. Hence, corroboration from neighboring pixels becomes necessary to maintain the robustness of the algorithm against these adverse effects. Adding this corroboration while maintaining pixel-level decisions requires summing the values I_{i,j}(x, y) over an l×l region surrounding each decision point. This yields an indicator for the focus measure:

B_{i,j}(x, y) = \sum_{\tau = -l/2}^{l/2} I_{i,j}(x + \tau,\ y + \tau)      (18)
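A hedged Python sketch of Eqs. (17)-(18); note that, as written, Eq. (18) sums the differential image along a diagonal of the l×l neighbourhood, and it is implemented literally here (a full window sum would be an alternative reading):

```python
import numpy as np

def focus_measure(I_i, I_j, x, y, l=8):
    """Focus indicator B_{i,j}(x, y) of Eqs. (17)-(18) at one decision point."""
    diff = np.abs(np.asarray(I_i, float) - np.asarray(I_j, float))   # Eq. (17)
    h, w = diff.shape
    total = 0.0
    for tau in range(-l // 2, l // 2 + 1):                           # Eq. (18)
        xx, yy = x + tau, y + tau
        if 0 <= xx < h and 0 <= yy < w:
            total += diff[xx, yy]
    return total
```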
4 Experiment Result
In this section, we experimentally demonstrate the effectiveness of the proposed algorithm. Experiments were performed on 256-level original images of size 800×600 using Matlab 6.0 on a PC with a Pentium 1.4 GHz CPU and 628 MB RAM. Two actual scenarios were used, each containing multiple objects at different distances from the camera, so that one or more objects naturally become out of focus when the image is taken. For example, the focus is on the trunk in Fig. 3(a), while in Fig. 3(b) it is on the shrub and the bench.
Convolved with the PSF, the original images are blurred as shown in Fig. 4. We use two PSFs to blur the original images: the PSF with vector a = (1, 4, 6, 4, 1) gives blurry image B1 and the one with a = (1, 2, 3, 2, 1) gives blurry image B2, as illustrated in Fig. 4.
Fig. 4. (a) blurry image B1 from image A2; (b) blurry image B2 from image A1
Fig. 6 shows the result of image fusion applied to two images having different depths of focus: the focus is on the scrip in Fig. 6(a) and on the book in Fig. 6(b). Note that in Fig. 6(b) the scrip is degraded by Gaussian blur and a slight motion blur. The fused image in Fig. 6(c) is obtained by convolving the original images twice in succession with the PSF of vector a = (1, 4, 6, 4, 1), with an identification threshold of 178. The whole process takes 1.322 s.
5 Conclusion
The defocused portions of different images exhibit different degrees of blur, so the PSF vector differs from one original image to another. The ideal way to choose the appropriate vector is to accurately evaluate the blur degree of each image region. The blur degree may be determined in different ways, such as gray-gradient difference or the wavelet transform. In mathematics, the local regularity of the image function is often measured by the Lipschitz exponent. Our method for blur estimation is based on calculating the Lipschitz exponent at all points where a change in intensity is found in either the horizontal or the vertical direction. The strategy for detecting and characterizing singularities of the original images is important for this fusion scheme; the detailed method will be described in another paper. In fact, assisted by related techniques, the proposed algorithm can provide acceptable results in most cases.
References
1. Zhang, Z., Blum, R.S.: A categorization of multiscale-decomposition-based image fusion
schemes with a performance study for a digital camera application. Proc. IEEE 87(8), 1315–
1326 (1999)
2. Piella, G.: A general framework for multiresolution image fusion: from pixels to regions.
Information Fusion 4, 259–280 (2003)
3. Maik, V., Shin, J., Paik, J.: Pattern Selective Image Fusion for Multi-focus Image Recon-
struction. In: Gagalowicz, A., Philips, W. (eds.) CAIP 2005. LNCS, vol. 3691, pp. 677–684.
Springer, Heidelberg (2005)
4. Pajares, G., de la Cruz, J.M.: A wavelet-based image fusion tutorial. Pattern Recognition
(2004)
5. Forster, B., Van De Ville, D., et al.: Complex Wavelets for Extended Depth-of-Field: A New Method for the Fusion of Multichannel Microscopy Images. Microscopy Research and Technique 65, 33–42 (2004)
6. Li, S., Kwok, J.T., Tsang, I.W., et al.: Fusing Images with Different Focuses Using Support Vector Machines. IEEE Transactions on Neural Networks 15(6), 1555–1561 (2004)
7. Maik, V., Shin, J., Paik, J.: Pattern Selective Image Fusion for Multi-focus Image Recon-
struction. In: Gagalowicz, A., Philips, W. (eds.) CAIP 2005. LNCS, vol. 3691, pp. 677–684.
Springer, Heidelberg (2005)
8. Aizawa, K., Kodama, K., Kubota, A.: Producing object-based special effects by fusing mul-
tiple differently focused images. IEEE Transactions on Circuits and Systems for Video
Technology 10(2), 323–330 (2000)
Jingsheng Lei
Abstract. This paper proposes an image annotation algorithm that uses the local energy of color correlograms, i.e., the sub-block energy of the color correlogram. The sub-block energy is defined over sub-windows of the color correlogram, which encodes the color distribution of the original image. The annotation model computes a correlogram-based histogram and analyzes its sub-block characteristics; the sub-block energy is then used to annotate the image's class, with satisfactory results. The model is fast and invariant to image size and rotation. An experimental comparison with SVM demonstrates that the model is quite successful at image annotation.
1 Introduction
With the development of network and multimedia technology, the storage of image
information is expanding quickly and image retrieval has become a hotspot of image
research [1-2]. Content-based image retrieval (CBIR), which uses image content such as color and texture to compute the similarity of images, has succeeded in fingerprint and logo recognition, etc. However, there is a huge gulf, called the "semantic gap," between the low-level features extracted from the visual content and high-level semantic concepts; as a result, CBIR cannot always provide meaningful results, and current state-of-the-art computer vision technology lags far behind the human ability to assimilate information at a semantic level [3].
Image annotation uses different models and machine learning methods to find the relation between image visual features and keywords from labeled images, and then propagates keywords to unlabeled images. There are three main approaches to image annotation: annotation based on image segmentation, on fixed-size blocks, and on image classification. Annotation based on image segmentation [4] depends on image visual features and on precise segmentation results [5-7]; the main issue is how to map the features of the regions to keywords. Ideally, every segmented region corresponds to one object. However, the outcome of image segmentation is not yet satisfactory, so there is a large gap between the object-level representation of the image and the human vision system. The same problem also exists for fixed-size image division, which may divide one object into
several blocks or put several objects into one block. Compared with these two methods, image annotation based on image classification avoids the low accuracy caused by wrong image division. Cusano et al. [8] categorized images into groups with Support Vector Machines (SVMs), counted the co-occurrence frequencies of the keywords in each group, and used them to annotate an input image. Gauld et al. [9] treated features separately and trained the optimal classifier for each feature in advance. However, these methods use features extracted from whole images, which incurs a high computing cost and is not suitable for large data sets. To avoid this problem, this paper realizes an annotation procedure based on the sub-block energy of color correlograms [10], which directly reflects the color and spatial distributions of the image.
To address the problems mentioned above, a fast algorithm for image annotation based on color correlograms is proposed in this work. Using the sub-block energy of correlograms, the algorithm can find the relevant energy blocks in Euclidean space. Unlike traditional annotation approaches based on whole-image features, the proposed method analyzes the central area associated with the image semantics and extracts visual features only from that area; LIBSVM is then used for image classification to obtain the relationship between image visual features and semantics, and finally image annotation is carried out.
The remainder of this paper is organized as follows. The sub-block energy of the color correlogram is introduced in Section 2, and a fast image annotation approach based on central-area analysis together with its key issues is discussed in Section 3. Experimental results are presented in Section 4, and concluding remarks are given in the last section.
The correlogram describes how the color histogram changes with the spatial distribution of pixels. The color correlogram is selected to implement the sub-block energy model because it is a color histogram enriched with spatial information and signals the local color distribution. In the sub-block model, the correlogram feature is first computed for a given image; the correlogram is then divided into sub-blocks. The sub-blocks are
[Figure: example image categories — (a) dinosaurs, (b) horses]
For the implementation of image annotation, each type of image has uniquely identifiable features that discriminate it from the others. Other features could certainly be combined for annotation, but here we only map the sub-block vectors into a high-dimensional space. When a test sample has been reduced to its sub-block energy, we can compute its Euclidean distance in this high-dimensional space.
Input: the set of training images S_training and the set of testing images S_testing; the window size wh × wv; levelwidth, the quantization step size for the histogram; the sub-block size blocksize. L = 256, C = 256 / levelwidth. The image size is W × H, and often W = H.
Output: the sub-block energies of the images.

Begin
  For each I ∈ [S_training, S_testing]
    I ← I / levelwidth;  C ← 256 / levelwidth;
    TempC(1:C, 1:C) ← 0;
    For each i ∈ [1 : H − wv]
      For each j ∈ [1 : W − wh]
        m ← I(i, j) + 1;  n ← I(i + wv, j + wh) + 1;
        TempC(m, n) ← TempC(m, n) + 1;  TempC(n, m) ← TempC(n, m) + 1;
      End
    End
    For each i ∈ [1 : H − wv]
      For each j ∈ [wh + 1 : W]
        m ← I(i, j) + 1;  n ← I(i + wv, j − wh) + 1;
        TempC(m, n) ← TempC(m, n) + 1;  TempC(n, m) ← TempC(n, m) + 1;
      End
    End
    For each i ∈ [1 : C]
      hc(i) ← TempC(i, i) / Σ_{i=1}^{C} TempC(i, i);
    End
    For each i ∈ [1 : C]
      Cov(i) ← hc(i) − mean(hc);
    End
    For each k ∈ [1 : C]
      E_k ← Σ_{i=(k−1)·blocksize}^{k·blocksize} Cov(i);
    End
    E ← Sort(E);
  End
End
After obtaining the sub-block energies, we can use them to annotate a testing set. When an unknown image is input into the model, its sub-block energy is first calculated by the algorithm above. Because the energies are sorted, there is no need to align them by location. Each energy vector is regarded as a point in a space of dimension C / blocksize, and the Euclidean distance is used to determine the class.
Input: the sub-block energies of the training images (with annotations) and of the testing images (without annotations).
Output: the set of testing images with annotations.

Begin
  For each e_testing ∈ E_testing
    D ← [ ];
    For each e_training ∈ E_training
      D ← [D, EuclideanDistance(e_testing, e_training)];
    End
    D ← Sort(D);
    L(e_testing) ← L(the e_training with the minimum distance in D);
  End
End
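A hedged, self-contained Python sketch of the two procedures above (the gray-level correlogram here is the simplified single-offset version described in the pseudocode; function and variable names are ours):

```python
import numpy as np

def sub_block_energy(img, wh=1, wv=1, levelwidth=8, blocksize=4):
    """Sorted sub-block energies of a simplified gray-level correlogram."""
    I = np.asarray(img, int) // levelwidth
    C = 256 // levelwidth
    temp = np.zeros((C, C))
    H, W = I.shape
    for di, dj in [(wv, wh), (wv, -wh)]:                 # the two offsets of the pseudocode
        for i in range(H - wv):
            for j in range(max(0, -dj), min(W, W - dj)):
                m, n = I[i, j], I[i + di, j + dj]
                temp[m, n] += 1
                temp[n, m] += 1
    diag = np.diag(temp)
    hc = diag / diag.sum() if diag.sum() else diag       # normalized diagonal histogram
    cov = hc - hc.mean()                                 # deviation from the mean
    n_blocks = C // blocksize
    energies = np.array([cov[k * blocksize:(k + 1) * blocksize].sum()
                         for k in range(n_blocks)])
    return np.sort(energies)

def annotate(test_energy, train_energies, train_labels):
    """Nearest-neighbour annotation in the sub-block-energy space."""
    dists = [np.linalg.norm(test_energy - e) for e in train_energies]
    return train_labels[int(np.argmin(dists))]
```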
After the annotation step, the images are labeled. To improve performance, the training-set energies are represented as vectors. In addition, the algorithm can be executed in parallel to speed up the computation.
4 Experimental Results
In order to verify the correctness and effectiveness of the proposed solution, an annotation system was developed. The scheme is tested on real images with different semantics. The color correlogram parameter levelwidth is set to 8, while the others (wh, wv, blocksize) are variable. For LIBSVM [11], the parameters are s = c-svc, kernel type = sigmoid, gamma = 1/k, degree = 3. First, the testing image is transformed into gray scale, with pixel values ranging from 0 to 255.

Fig. 3. Annotation result: (a) image to be annotated; (b) first match image; (c) match image after the first; (d) match image after the second

Using the proposed annotation, we can label the testing image properly and also obtain a list of matched images ranked by the distance from the testing image to the training images; this list also provides information about similar images. From Fig. 3 we can see that the testing image (dinosaurs) is labeled correctly and that the returned list contains some similar images (buses appear as the second and third match). Therefore, the sub-block energy algorithm is suitable for image annotation. We also compare the annotation results of the sub-block algorithm and LIBSVM. The test is performed using 80 items for training and 20 for testing in each category, with cross validation. Table 1 shows the results: the sub-block algorithm outperforms LIBSVM. This indicates that the sub-block energy algorithm exploits the characteristics of the features and can achieve better annotation accuracy for images with complex backgrounds. Moreover, the small sub-block energy vector yields higher speed on large datasets.
Table 1. Annotation comparison of Sub-Block Energy and SVM (E: Sub-Block Energy; S: SVM)
Train=240 BlockSize
Test = 40 16 8 6 4 2
The sub-block energy approach is also able to find similar images from the ranked energy list. Therefore, the annotation results are satisfactory and the computing cost of the proposed algorithm is low.
5 Conclusions
In this paper, we present an algorithm for annotation of images. The contributions of
this paper can be summarized as follows:
(1) The approach uses the image's histogram and its spatial information with a small feature vector, which greatly reduces the computing cost; the proposed algorithm can be applied to large image datasets.
(2) This paper proposes a model using sub-block energy based on color correlograms. The sub-block representation is a small vector that can be computed quickly and serves as a criterion for image annotation using spatial information. The algorithm can annotate images with various complex backgrounds and is robust.
Experimental results indicate that our method is superior to the SVM method in terms of accuracy, robustness and stability. The novelty of this solution lies in exploiting the local relationships within an image for annotation; in addition, the proposed method reduces computing cost. Further studies of the image color correlogram and its applications should be carried out in the future.
References
1. Smeulders, A., et al.: Content-based image retrieval at the end of the early years. IEEE
Trans. PAMI 22(12), 1349–1380 (2000)
2. Zhang, R., Zhang, Z.: Effective image retrieval based on hidden concept discovery in im-
age database. IEEE Trans. on Image Processing 16(2), 562–572 (2007)
3. Naphade, M., Huang, T.: Extracting semantics from audiovisual content: The final frontier
in multimedia retrieval. IEEE Trans. on Neural Networks 13(4), 793–809 (2002)
4. Xuelong, H., Yuhui, Z., Li, Y.: A New Method for Semi-Automatic Image Annotation. In:
The Eighth International Conference on Electronic Measurement and Instruments, vol. 2,
pp. 866–869 (2007)
5. Mori, Y., Takahashi, H., Oka, R.: Image-to-word transformation based on dividing and
vector quantizing images with words. In: Proceedings of the International Workshop on
Multimedia Intelligent Storage and Retrieval Management (1999)
6. Duygulu, P., Barnard, K., Freitas, J., Forsyth, D.: Object recognition as machine transla-
tion: Learning a lexicon for a fixed image vocabulary. In: Heyden, A., Sparr, G., Nielsen,
M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 97–112. Springer, Heidelberg
(2002)
7. Jeon, J., Lavrenko, V., Manmatha, R.: Automatic image annotation and retrieval using
cross media relevance models. In: Proceedings of the ACM SIGIR Conference on Re-
search and Development in Information Retrieval (2003)
8. Cusano, C., Ciocca, G., Scettini, R.: Image annotation using SVM. In: Proceedings of
Internet Imaging IV (2004)
9. Gauld, M., Thies, C., Fischer, B., Lehmann, T.: Combining global features for content-
based retrieval of medical images. Cross Language Evaluation Forum (2005)
10. Huang, J., Kumar, S.R., Mitra, M., Zhu, W.J., Zabih, R.: Image indexing using color corre-
lograms. In: Proc. 16th IEEE Conf. on Computer Vision and Pattern Recognition, pp. 762–
768 (1997)
11. Chang, C., Lin, C.-J.: LIBSVM: A Library for Support Vector Machines [EB/OL], http://www.csie.ntu.edu.tw/~cjlin/libsvm
Optimal Evolutionary-Based Deployment of Mobile
Sensor Networks
1 Introduction
Mobile sensor networks consist of sensor nodes that are deployed over a large area to collect important information from the sensor field; the communication between the nodes is wireless. Since energy is an important resource [4], the nodes' energy consumption must be kept to a minimum. Optimum placement of the sensors also involves their coverage, meaning the sensors should be placed in the best positions, as assigned by a coverage function, to achieve the best functionality. Some random deployments [1] do not yield a uniform distribution over the surface, which can be important in some cases [3].
This paper uses FIPSO (the fully informed particle swarm optimizer) to find the optimal placement and energy usage of mobile sensor networks. Section 2 discusses FIPSO, Section 3 describes the application of FIPSO to the deployment of mobile sensor networks, Section 4 presents the results, and Section 5 gives the conclusion.
\vec{v}_i(t) = \varphi_1 \vec{v}_i(t-1) + r_1 c_1 \otimes (\vec{x}_{personalbest} - \vec{x}_i(t)) + r_2 c_2 \otimes (\vec{x}_{globalbest} - \vec{x}_i(t))
\vec{x}_i(t) = \vec{v}_i(t) + \vec{x}_i(t-1)      (1)
\varphi_1 = 1 - 0.5 \times \frac{1 - t}{1 - t_{max}}
As is clear from Eq. (1), the experience of every member of the population is taken into consideration, and this method performs better than the simple PSO.
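A minimal Python sketch of the particle update in Eq. (1); the swarm bookkeeping (how the personal and global bests are maintained) is our own illustrative scaffolding, not taken from the paper:

```python
import numpy as np

def update_particle(x, v, p_best, g_best, t, t_max, c1=2.0, c2=2.0):
    """One velocity/position update per Eq. (1)."""
    phi1 = 1.0 - 0.5 * (1.0 - t) / (1.0 - t_max)        # time-varying inertia weight
    r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
    v_new = phi1 * v + r1 * c1 * (p_best - x) + r2 * c2 * (g_best - x)
    x_new = v_new + x
    return x_new, v_new

# toy 2-D particle
x = np.array([10.0, 20.0]); v = np.zeros(2)
x, v = update_particle(x, v, p_best=np.array([12.0, 18.0]),
                       g_best=np.array([15.0, 25.0]), t=1, t_max=100)
```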
In the binary sensor model, the detection probability of the event of interest is 1 within the sensing range and zero otherwise. Although the binary sensor model is simpler, it is not realistic, as it assumes that sensor readings have no associated uncertainty. In reality sensor detections are imprecise, hence the coverage has to be described in probabilistic terms. The probabilistic sensor model used in this work is given in Eq. (2):

c_{ij}(x, y) = \begin{cases} 0, & r + r_e \le d_{ij}(x, y) \\ e^{-\lambda a^{\beta}}, & r - r_e < d_{ij}(x, y) < r + r_e \\ 1, & r - r_e \ge d_{ij}(x, y) \end{cases}      (2)
The sensor field is represented by an n × m grid, where an individual sensor S is placed at grid point (x, y). Each sensor has a detection range r. For any grid point P at (i, j), the Euclidean distance between P and the grid point (x, y) is d_{ij}(x, y) = \sqrt{(x-i)^2 + (y-j)^2}. Equation (2) expresses the coverage c_{ij}(x, y) of a grid point (i, j) by a sensor S at (x, y). In this equation λ and β are the
parameters that govern the detection probability when a target is at a distance greater than r − r_e but within r + r_e of the sensor, and a = d_{ij}(x, y) − (r − r_e).
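The following short Python sketch evaluates the probabilistic coverage of Eq. (2) for one sensor and one grid point; λ, β, r and r_e are free parameters whose values here are placeholders:

```python
import math

def coverage(x, y, i, j, r=5.0, r_e=1.5, lam=0.5, beta=1.0):
    """Probabilistic coverage c_ij(x, y) of grid point (i, j) by a sensor at (x, y), Eq. (2)."""
    d = math.hypot(x - i, y - j)
    if d >= r + r_e:
        return 0.0
    if d <= r - r_e:
        return 1.0
    a = d - (r - r_e)
    return math.exp(-lam * a ** beta)

print(coverage(0, 0, 4, 3))   # point at distance 5: inside the uncertain band
```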
After the coverage optimization, all deployed sensor nodes move to their assigned positions. The next goal is to minimize the energy usage in a cluster-based sensor network topology by finding the optimal cluster-head positions. For this purpose we optimize a power consumption model [5] of the radio hardware dissipation, in which the transmitter dissipates energy to run the radio electronics and the power amplifier, and the receiver dissipates energy to run the radio electronics. The fitness function to be minimized is given in Eq. (3):
f = \sum_{j=1}^{m} \sum_{i=1}^{n} \left( 0.01\, dis_{ij} + \frac{1.3 \times 10^{-6}\, Dis_j^{4}}{n_j^{2}} \right)      (3)
For this approach both the free-space (distance² power loss) and the multi-path fading (distance⁴ power loss) channel models were used. The sensor nodes inside a cluster are assumed to be at a short distance dis from their cluster head, whereas each cluster head is at a long distance Dis from the base station. The base station is situated at position (25, 80).
The red spots denote the chosen head for each of the four selected clusters.
The error of the first phase is shown in Fig. 5; it demonstrates that the nodes have been placed efficiently at the grid positions.
The results are achieved in two phases: first, the particles' coverage cost function is minimized using FIPSO; then FIPSO is applied to the resulting set of particles to find the cluster head and the best position of the node chosen as the head. FIPSO gives better results in comparison with the previous works [1-5].
5 Conclusion
The results show that the proposed method (FIPSO) has successfully placed the mobile sensors and then chosen an appropriate member of each cluster as its head according to the cost functions. This method is more efficient than the simple PSO, as it uses information from all the members rather than only some of the particles. The results are obtained quickly, and the coverage and energy functions are optimized simultaneously.
References
1. Chakrabarty, K., Iyengar, S., Qi, H., Cho, E.: Grid coverage for surveillance and target loca-
tion in distributed sensor networks. IEEE Transactions on Computers 51, 1448–1453 (2002)
2. Jourdan, B., Weck, O.: Layout optimization for a wireless sensor network using multi-
objective genetic algorithm. In: IEEE VTC 2004 Conference, vol. 5, pp. 2466–2470 (2004)
3. Howard, A., Mataric, M.J., Sukhatme, G.S.: Mobile sensor network deployment using po-
tential field: a distributed, scalable solution to the area coverage problem. In: Proc. Int.
Conf. on Distributed Autonomous Robotics Systems, pp. 299–308 (2002)
4. Heo, N., Varshney, P.K.: Energy-efficient deployment of intelligent mobile sensor net-
works. IEEE Transaction on Systems, Man and Cybernetics 35(1), 78–92 (2005)
5. Heinzelman, W.B., Chandrakasan, A.P., Balakrishnan, H.: An application specific protocol
architecture for wireless microsensor networks. IEEE Transactions on Wireless Communi-
cations 1(4), 660–670 (2002)
6. Mendes, R., Kennedy, J., Neves, J.: The fully informed particle swarm. IEEE Transactions
on Evolutionary Computation 1(1) (January 2005)
Research of Current Control Strategy of Hybrid Active
Power Filter Based on Fuzzy Recursive Integral PI
Control Algorithm
Abstract. According to the current control characteristics of the hybrid active power filter (HAPF), a current control model of the HAPF is designed. A fuzzy recursive integral PI control algorithm is presented and compared with the conventional PI control method. The algorithm auto-regulates the proportional and integral parameters of the PI controller; thus the robustness and response speed are enhanced and the dynamic performance of the HAPF device is improved. A fuzzy recursive integral PI controller is designed in Matlab/Simulink and applied to a HAPF model in PSCAD/EMTDC. The results prove the feasibility and effectiveness of this fuzzy recursive integral PI control algorithm.
Keywords: Hybrid active power filter, Recursive integral PI, Fuzzy control,
PSCAD/EMTDC.
1 Introduction
Combining the advantages of the passive power filter (PPF) and the active power filter (APF), the hybrid active power filter (HAPF) is a powerful apparatus for reactive power compensation and harmonic suppression, and an effective device for resolving power quality problems [1-3]. Keeping the DC voltage stable, generating the compensating current signal in time, and tracking the instruction current correctly are key links in active power filtering technology [4].
Related work. Recently, many current control methods have been proposed, mainly including linear current control, digital deadbeat control, triangular-wave control, hysteresis control and one-cycle control [5-6]. However, because of the limited detection precision, the phase shift of the output filter and the time delay of the control method, the tracking ability of the system becomes non-ideal. Thanks to its simple algorithm, good robustness and high reliability, the PI control method has been widely used in industrial control systems [7]. However, conventional PI control requires the model parameters to be invariant, and the proportional and integral parameters are hard to determine; because of the periodic characteristics of the error signal in a HAPF system, the application of conventional PI control is seriously limited. Fuzzy logic control has good dynamic performance, is insensitive to plant parameters, has high robustness and can overcome the influence of non-linear factors, so it has been widely used in control systems [8-10].
In this paper, fuzzy logic control is applied to adjust the parameters of the recursive integral PI control method, so that the response speed is enhanced and the dynamic performance is improved. Based on this fuzzy recursive integral PI control strategy, a HAPF current controller is designed and applied to auto-regulate the proportional and integral parameters. The simulation results prove the feasibility and effectiveness of the control algorithm.
The single-phase equivalent circuit of the HAPF is shown in Fig. 2, in which the APF is controlled as an ideal harmonic voltage source U_F and the load is modeled as a harmonic current source i_L. Here I_S and I_L denote the source current and the load current; I_PF and I_APF denote the current of the PPF circuit and the injected current of the APF; Z_S and Z_PF denote the equivalent impedances of the source and the PPF branch; Z_C and Z_R denote the equivalent impedances of the injected capacitance and the FRC branch; Z_C0 and L_C0 are the capacitive reactance and inductance of the output filter branch. The ratio of the coupling transformer is n:1.
\begin{cases}
U_S = U_L + I_S Z_S \\
U_L = I_{PF} Z_{PF} \\
I_{APF} Z_C + I_1 Z_1 = U_L \\
I_{L0}\, n^{2} Z_{L0} = I_1 Z_1 + n U_F \\
I_1 + I_{L0} = I_{APF} \\
I_S = I_{APF} + I_{PF} + I_L
\end{cases}      (1)
When the HAPF suppresses and compensates harmonic currents, the harmonic voltage distortion is very small and can be neglected, so from Equation (1) we obtain

I_S = \frac{(Z_C Z_{PF} + K_2 Z_{PF})\, I_L + n K_1 Z_{PF} U_F}{Z_C Z_{PF} + Z_S Z_C + Z_S Z_{PF} + K_2 (Z_{PF} + Z_S)}      (2)
where

K_1 = \frac{Z_1}{Z_1 + n^{2} Z_{L0}}, \qquad K_2 = \frac{n^{2} Z_1 Z_{L0}}{Z_1 + n^{2} Z_{L0}}
If we assume
G_1(S) = \frac{n K_1 Z_{PF}}{Z_C Z_{PF} + Z_S Z_C + Z_S Z_{PF} + K_2 (Z_{PF} + Z_S)}      (3)
G_2(S) = \frac{Z_C Z_{PF} + K_2 Z_{PF}}{Z_C Z_{PF} + Z_S Z_C + Z_S Z_{PF} + K_2 (Z_{PF} + Z_S)}      (4)
G_L(S) = \frac{I_{Sh}(S)}{I_{Lh}(S)} = \frac{G_2(S)}{1 + G_{con}(S) G_{inv}(S) G_1(S)}      (6)
[Fig. 3. Block diagram of the HAPF current control loop: the reference current I_C*(S) and the disturbance term G_2(S) I_Lh(S) act on the forward path G_con(S) G_inv(S) G_1(S), producing the source harmonic current I_Sh(S)]
U_R(K) = K_P e(K) + \sum_{i=0}^{C} K_I\, e(K - iN), \qquad C = \mathrm{ent}\!\left(\frac{K}{N}\right)      (7)
U_R(K) = U_R(K - N) + K_P e(K) - K_P e(K - N) + K_I e(K)      (8)
From Equation (8) we can obtain the pulse transfer function of the recursive integral PI controller:

G_{con}(S) = \frac{U_R(S)}{E(S)} = K_P + \frac{K_I}{1 - e^{-SNT}}      (9)
where N is the number of sampling points per cycle of the controlled object and T is the sampling time. Using the recursive integral PI controller of Fig. 3, we obtain

G_L(S) = \frac{G_2(S)\,(1 - e^{-SNT})}{\left(1 + K_P G_{inv}(S) G_1(S)\right)\left(1 - e^{-SNT}\right) + K_I G_{inv}(S) G_1(S)}      (11)
So the frequency characteristic equations of the two functions are

G_S(jn \cdot 2\pi f) = G_L(jn \cdot 2\pi f) = 1      (14)
Equation (14) illustrates that the amplitude of the closed-loop system transfer function and of the HAPF disturbance transfer function is 1 and the phase is 0; the output of the system can therefore track the reference signal accurately, and the influence of the load current tends to 0 as time increases when its frequency is the power frequency or an integral multiple of the power frequency.
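A hedged Python sketch of the recursive integral PI law of Eq. (8); the controller keeps a one-cycle history of errors and outputs (the class structure and the default gains are ours, for illustration only):

```python
from collections import deque

class RecursiveIntegralPI:
    """Recursive integral PI controller: U(k) = U(k-N) + Kp*e(k) - Kp*e(k-N) + Ki*e(k)."""
    def __init__(self, kp, ki, n_samples_per_cycle):
        self.kp, self.ki, self.n = kp, ki, n_samples_per_cycle
        self.err_hist = deque([0.0] * n_samples_per_cycle, maxlen=n_samples_per_cycle)
        self.out_hist = deque([0.0] * n_samples_per_cycle, maxlen=n_samples_per_cycle)

    def step(self, error):
        e_old = self.err_hist[0]        # e(k - N)
        u_old = self.out_hist[0]        # U(k - N)
        u = u_old + self.kp * (error - e_old) + self.ki * error   # Eq. (8)
        self.err_hist.append(error)
        self.out_hist.append(u)
        return u

ctrl = RecursiveIntegralPI(kp=0.8, ki=0.05, n_samples_per_cycle=200)
u = ctrl.step(error=0.1)
```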
The steady-state error of the system can be eliminated by the recursive integral PI control algorithm, but its robustness and dynamic performance are not ideal. Therefore, in this paper a fuzzy logic algorithm is applied to enhance the response speed by auto-regulating the K_P and K_I parameters of the recursive integral PI control method [9]. The compound fuzzy recursive integral PI control is shown in Fig. 4, where the error e and the error rate ec are fuzzified into the corresponding fuzzy variables E and EC. First, the fuzzy relations between K_P, K_I, E and EC are established, and E and EC are continuously monitored during operation. Then, according to the fuzzy logic control algorithm, K_P and K_I are auto-regulated online to meet the varying requirements of the recursive integral PI controller for different E and EC. This gives the controlled object better robustness and dynamic performance.
[Fig. 4. Block diagram of the compound fuzzy recursive integral PI control loop]
First, the inputs are fuzzified by the fuzzy controller; the linguistic values of the input and output variables are equally divided into 7 levels: {NB, NM, NS, 0, PS, PM, PB}. According to engineering experience, triangular membership functions are used and the universe of discourse of e and ec is [-6, 6]. The proportional term reflects the system error signal e proportionally, while the integral term improves the system's steady-state accuracy. When E is larger, K_P should be larger in order to obtain better tracking performance; at the same time, in order to avoid a large overshoot of the system response, the effect of
E ec
NB NM NS 0 PS PM PB
NB PB PB NB PM PS PM 0
NM PB PB NM PM PS 0 0
NS PM PM NS PS 0 NS NM
0 PM PS 0 0 NS NM NM
PS PS PS 0 NS NS NM NM
PM 0 0 NS NM NM NM NB
PB 0 NS NS NM NM NB NB
E ec
NB NM NS 0 PS PM PB
NB 0 0 NB NM NM 0 0
NM 0 0 NM NM NS 0 0
NS 0 0 NS NS 0 0 0
0 0 0 NS NM PS 0 0
PS 0 0 0 PS PS 0 0
PM 0 0 PS PM PM 0 0
PB 0 0 NS PM PB 0 0
The fuzzy query table reflects the final results of the fuzzy control algorithm. It is calculated off-line beforehand and stored in computer memory. In the real-time control system, the on-line adjustment of the recursive integral PI control parameters is thus reduced to a fast look-up of the control rule table.
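A small Python sketch of this table look-up (the numeric scaling of the linguistic levels NB…PB to gain increments is our own assumption; the rule tables themselves are the ones listed above):

```python
import numpy as np

LEVELS = ['NB', 'NM', 'NS', '0', 'PS', 'PM', 'PB']           # linguistic levels
STEP = {lbl: v for lbl, v in zip(LEVELS, np.linspace(-3, 3, 7))}

def quantize(value, lo=-6.0, hi=6.0):
    """Map a crisp e or ec in [-6, 6] to one of the 7 level indices."""
    idx = int(round((value - lo) / (hi - lo) * 6))
    return min(max(idx, 0), 6)

def gain_increment(e, ec, rule_table, scale=0.01):
    """Look up the linguistic rule and return a numeric gain increment (illustrative scaling)."""
    label = rule_table[quantize(e)][quantize(ec)]
    return scale * STEP[label]

# first row of the (presumed) Kp rule table from the paper, repeated as a placeholder
kp_rules = [['PB', 'PB', 'NB', 'PM', 'PS', 'PM', '0']] * 7    # placeholder: full table omitted
print(gain_increment(-5.5, -5.0, kp_rules))
```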
In this paper a HAPF model controlled by the current tracking strategy is set up and simulated; the current control uses the fuzzy recursive integral PI algorithm, and the results are compared with those of the conventional PI control algorithm. A fuzzy recursive integral PI controller is built with the Matlab/Simulink toolbox and the corresponding functions, and applied to the HAPF model set up in PSCAD/EMTDC V4.2 of the Manitoba HVDC Research Centre.
In the simulation model, the line voltage of the source is 10 kV and the source capacity is 100 MVA; the capacity of the coupling transformer is 4 MVA and its ratio is 10000:380. The parameters of the FRC are C1 = 960 uF and L1 = 15.47 mH, and the injecting capacitance is CG = 19.65 uF. A single-tuned passive power filter for suppressing the 6th harmonic current consists of the injecting capacitance and the FRC branch. A six-pulse rectifier bridge is used as the harmonic source and the power frequency is 50 Hz. The parameters of the PPF are shown in Table 3, and the high-pass filter quality factor is Q = 1.25. The simulation circuit and configuration of the HAPF system are shown in Fig. 5.
[Fig. 5. Simulation circuit and configuration of the HAPF system (PSCAD/EMTDC schematic)]
Table 3. Parameters of the PPF

Branch        C                 L               R
5th branch    C5 = 117.03 uF    L5 = 3.46 mH    R5 = 0.11 Ω
7th branch    C7 = 70.46 uF     L7 = 2.93 mH    R7 = 0.15 Ω
(a) source current and active power filter compensated current waveform using conventional PI
control
Fig. 6. The harmonic suppression and compensation results of HAPF using conventional PI
control
(a) source current and active power filter compensated current waveform using fuzzy recursive
integral PI control
Fig. 7. The harmonic suppression and compensation results of HAPF using fuzzy recursive
integral PI control
Research of Current Control Strategy of Hybrid Active Power Filter 577
(b) the frequency spectrum of load current using fuzzy recursive integral PI control
(c) the frequency spectrum of source current using fuzzy recursive integral PI control
Fig. 7. (Continued)
Fig. 6 shows the harmonic suppression and compensation results of the HAPF system using the conventional PI control method, and Fig. 7 shows the results using the fuzzy recursive integral PI control algorithm. ISa, Ia and IaP represent the phase-A source current, the phase-A load current and the phase-A APF compensating current, respectively. Table 4 lists the values of the 5th, 7th, 11th and 13th harmonic currents for the conventional PI control method and for the fuzzy recursive integral PI control algorithm. Analysis of Fig. 6, Fig. 7 and Table 4 proves that the fuzzy recursive integral PI control algorithm achieves better harmonic suppression and compensation in the HAPF.
5 Conclusions
In this paper, aiming at the harmonic detection and control problems of the HAPF system, a fuzzy recursive integral PI control algorithm is proposed on the basis of the conventional PI control method; it effectively enhances the filtering performance, robustness and dynamic response of the HAPF system. A fuzzy recursive integral PI controller is built with the Matlab/Simulink toolbox and the corresponding functions and applied to a HAPF model simulated in PSCAD/EMTDC V4.2 of the Manitoba HVDC Research Centre. The results illustrate the correctness and effectiveness of this control algorithm.
Acknowledgement
The authors would like to thank the Natural Capacity Discipline Project of Shanghai Local High Institutions (No. 071605125) and the Postgraduate Innovation Fund of Shanghai University of Electric Power (No. D08116).
References
1. Luo, A.: Harmonic Suppression and Reactive Power Compensation Equipment and Tech-
nology. China Electric Power System Press, Beijing (2006)
2. Darwin, R., Luis, M., Juan, W.: Improving passive filter compensation performance with
active techniques. IEEE Trans on Industrial Electronics 50(1), 161–170 (2003)
3. He, N., Huang, L.: Multi-objective optimal design for passive part of hybrid active power
filter based on particle swarm optimization. Proceeding of the CSEE 28(27), 63–69 (2008)
4. Luo, A., Fu, Q., Wang, L.: High-capacity hybrid power filter for harmonic suppression and
reactive power compensation in the power substation. Proceeding of the CSEE 24(9), 115–
223 (2004)
5. Buso, S., Malesani, L.: Design and Fully Digital Control of Parallel Active Power Filters
for Thyristor Rectifiers to Comply with IEC 1000-3-2. IEEE Trans. on Industry Applica-
tion 34(2), 508–517 (1998)
6. Tang, X., Luo, A., Tu, C.: Recursive Integral PI for Current Control of Hybrid Active
Power Filter. Proceedings of the CSEE 23(10), 38–41 (2003)
7. Sun, M., Huang, B.: Iterative Learning Control. National Defense Industry Press, Beijing
(1999)
8. Zhou, K., Luo, A., Tang, J.: PI iterative learning for current-tracking control of active
power filter. Power Electronics 40(4), 53–55 (2006)
9. Xu, W.-f., Luo, A., Wang, L.: Development of hybrid active power filter using intelligent
controller. Automation of Electric Power Systems 27(10), 49–52 (2003)
10. Fukuda, S., Sugawa, S.: Adaptive signal processing based control of active power filters.
In: Proceeding of IEEE IAS Annual Meeting (1996)
A Fuzzy Query Mechanism for Human Resource
Websites
Abstract. Users’ preferences often contain imprecision and uncertainty that are
difficult for traditional human resource websites to deal with. In this paper, we
apply the fuzzy logic theory to develop a fuzzy query mechanism for human re-
source websites. First, a storing mechanism is proposed to store fuzzy data into
conventional database management systems without modifying DBMS models.
Second, a fuzzy query language is proposed for users to make fuzzy queries on
fuzzy databases. User’s fuzzy requirement can be expressed by a fuzzy query
which consists of a set of fuzzy conditions. Third, each fuzzy condition associ-
ates with a fuzzy importance to differentiate between fuzzy conditions according
to their degrees of importance. Fourth, the fuzzy weighted average is utilized to
aggregate all fuzzy conditions based on their degrees of importance and degrees
of matching. Through the mutual compensation of all fuzzy conditions, the or-
dering of query results can be obtained according to user’s preference.
1 Introduction
In traditional human resource websites [1,2,3,4,5], users must state clear and definite
conditions to make database queries. Unfortunately, users’ preferences often contain
imprecision and uncertainty that are difficult for traditional SQL queries to deal with.
For example, when a user hopes to find a job which is near Taipei City and pays good
salary, he can only make a SQL query like “SELECT * FROM Job WHERE (Location = ‘Taipei City’ OR Location = ‘Taipei County’) AND Salary ≧ 40000”. However,
both ‘near Taipei City’ and ‘good salary’ are fuzzy terms and cannot be expressed
appropriately by merely crisp values. A job which locates in ‘Taoyuan County’ with
salary of 50000 may be acceptable in user’s original intention, but it would be ex-
cluded by the traditional SQL query. SQL queries fail to deal with the compensation
between different conditions. Moreover, traditional database queries cannot effec-
tively differentiate between the retrieved jobs according to the degrees of satisfaction.
The results to a query are very often a large amount of data, and the problem of the
information overload makes it difficult for users to find really useful information.
H. Deng et al. (Eds.): AICI 2009, LNAI 5855, pp. 579–589, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Hence, it is required to sort results based on the degrees of satisfaction to the retrieved
jobs. Computing the degree of satisfaction to a job needs to aggregate all matching
degrees on individual conditions (e.g. location, salary, industry type, experience,
education etc.). It is insufficient for merely using the ORDER BY clause in SQL to
sort results based on some attribute. In addition, traditional database queries do not
differentiate between conditions according to the degrees of importance. One condi-
tion may be more important than another condition for some user (e.g. salary is more
important than location in someone’s opinion). Both the degree of importance and the
degree of matching to every condition should be considered to compute the degree of
satisfaction to a job. We summarize the problems of traditional human resource web-
sites as follows.
• Users’ preferences are usually imprecise and uncertain. Traditional database
queries are based on total matching which is limited in its ability to come to
grips with the issues of fuzziness.
• In users’ opinions, different conditions may have different degrees of impor-
tance. Traditional database queries treat all conditions as the same importance
and can not differentiate the importance of one condition from that of another.
• The problem of information overload makes it difficult for users to find really
useful information from a large amount of query results. Traditional database
queries do not support the ordering of query results by aggregating the degrees
of matching to all conditions (i.e. no compensation between conditions).
To solve the mentioned problems, we apply the fuzzy logic theory [15] to develop a
fuzzy query mechanism for human resource websites. First, a storing mechanism is
proposed to store fuzzy data into conventional database management systems without
modifying DBMS models. Second, a fuzzy query language is proposed for users to
make fuzzy queries on fuzzy databases. User’s fuzzy requirement can be expressed by
a fuzzy query which consists of a set of fuzzy conditions. Third, each fuzzy condition
associates with a fuzzy importance to differentiate between fuzzy conditions accord-
ing to their degrees of importance. Fourth, the fuzzy weighted average is utilized to
aggregate all fuzzy conditions based on their degrees of importance and degrees of
matching. Through the mutual compensation of all fuzzy conditions, the ordering of
query results can be obtained according to user’s preference.
Galindo et al. [8] classify fuzzy data into four types: (1) Type 1 contains attributes
with precise data. This type of attributes is represented in the same way as crisp data,
but can be transformed or manipulated using fuzzy conditions. (2) Type 2 contains
attributes that gather imprecise data over an ordered referential. These attributes admit
both crisp and fuzzy data, in the form of possibility distributions over an underlying
ordered domain (fuzzy sets). (3) Type 3 contains attributes over data of discrete non-
ordered dominion with analogy. In these attributes, some labels are defined with a
similarity relationship defined over them. The similarity relationship indicates to a
degree that each pair of labels resembles each other. (4) Type 4 contains attributes
that are defined in the same way as Type 3 attributes, without being necessary for a
similarity relationship to exist between the labels.
By analyzing several popular human resource websites [1,2,3,4,5], we sum up that
the major fuzzy data needed to be stored are location preference, salary preference,
industry preference, job category preference, experience preference, education prefer-
ence, department preference, job seeker’s profile, and hiring company’s profile. We
adopt the notions of [6] to classify these fuzzy data into three types: (1) Discrete fuzzy
data include location preference, industry preference, job category preference, educa-
tion preference, and department preference. (2) Continuous fuzzy data include salary
preference and experience preference. (3) Crisp data include job seeker’s profile and
hiring company’s profile.
[Figure: triangular membership functions of the linguistic satisfaction degrees (unsatisfactory, rather unsatisfactory, moderately satisfactory, rather satisfactory, very satisfactory) over the universe [0, 1]]
A fuzzy number Ã can be defined by a triplet (a, b, c), and its membership function μ_Ã(x) is defined as:

\mu_{\tilde{A}}(x) = \begin{cases} 0, & x < a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ \dfrac{c - x}{c - b}, & b \le x \le c \\ 0, & x > c \end{cases}
Therefore, the membership function corresponding to the given ‘good salary’ can be
constructed by {(50000,1), (45000,0.92), (30000,0)} (see Figure 4).
[Figure: degree of satisfaction for 'good salary' — 0 at a monthly salary of 30000, 0.92 at 45000, and 1 at 50000]
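A short Python sketch of evaluating such a piecewise-linear membership function, using the 'good salary' points {(30000, 0), (45000, 0.92), (50000, 1)} from the example; the use of np.interp is our implementation choice:

```python
import numpy as np

def membership(x, points):
    """Piecewise-linear membership degree defined by (value, degree) breakpoints."""
    xs, ys = zip(*sorted(points))
    return float(np.interp(x, xs, ys))

good_salary = [(30000, 0.0), (45000, 0.92), (50000, 1.0)]
print(membership(40000, good_salary))   # ~0.61: partial satisfaction
print(membership(55000, good_salary))   # 1.0: clipped at the last breakpoint
```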
To store the continuous fuzzy data in conventional databases, a new table is needed
to store the set of continuous data items with their degrees of conformity. A Sal-
ary_Preference table is used to store the corresponding membership function in which
the degrees of conformity have been defuzzified (see Figure 5). In the Salary
_Preference table, the RID attribute serves as a foreign key to reference the primary
key of the Resume table.
To deal with fuzzy queries on web databases, three major tasks are required to be
accomplished: (1) a fuzzy query language for users to make fuzzy queries on fuzzy
databases, (2) matching fuzzy conditions in a fuzzy query with fuzzy data in fuzzy
databases, and (3) aggregating all fuzzy conditions based on their degrees of impor-
tance and degrees of matching.
attribute values. We use linguistic degrees of importance (i.e. don’t care, unimportant,
rather unimportant, moderately important, rather important, very important, and most
important) to make it easier for users to grade relative importance. Each linguistic de-
gree of importance can be mapped to a triangular fuzzy number as shown in Figure 7.
[Fig. 7. Triangular fuzzy numbers for the linguistic degrees of importance (unimportant, rather unimportant, moderately important, rather important, very important) over the universe [0, 1]]
The fuzzy set that defines a fuzzy condition could be discrete, continuous, or crisp. In
the example shown in Figure 8, a fuzzy query for the resume search consists of 7 fuzzy
conditions with their degrees of importance. The hiring company may consider that job
seeker’s department and salary preference are the most important, experiences and
location preference are very important, the education level is rather important, and the
company doesn’t care about other fuzzy conditions. In Figure 8, experience preference
is defined by a continuous fuzzy set {(5,1), (2,0.92), (0,0)}, while education level pref-
erence is defined by a discrete fuzzy set {(Master, very satisfactory), (Bachelor, rather
satisfactory)}. Job seekers and hiring companies can make their own fuzzy queries to
search jobs and resumes via selecting options on web pages. In addition, users can set
the least matching degree and the most amount of data listing to reduce the search space
and to avoid the information overload.
For each element the minimum of the two membership degrees is taken, and the degree of matching is the maximum of these minimums. As fuzzy conditions and fuzzy data may be continuous, discrete or crisp, the matching of a fuzzy condition with a fuzzy data item can be classified into 9 types according to the possibility measure (see Table 1).
• In the case of the fuzzy condition and the fuzzy data both being continuous fuzzy sets, the intersection point of the two membership functions gives the result of matching. For example, when a job seeker's salary preference is {(60000,1), (50000,0.92), (30000,0)} and a job's salary offer is {(20000,1), (30000,0.75), (40000,0)}, the degree of matching is 0.285 (see Figure 9; a computational sketch follows this list).
• In the case of one continuous fuzzy set and one crisp value, the mapping of the crisp value onto the continuous fuzzy set is the result of matching. For example, when a hiring company's experience preference is {(5,1), (4,0.92), (1,0)} and a job seeker's experience is 3 years, the degree of matching is 0.613 (see Figure 10).
Fig. 9. Matching of two continuous fuzzy sets Fig. 10. Matching of a continuous fuzzy set
with a crisp value
• The case of one continuous fuzzy set and one discrete fuzzy set does not exist,
since an attribute cannot contain both continuous and discrete fuzzy data simul-
taneously.
• In the case of two discrete fuzzy sets, the possibility measure is a triangular
fuzzy number. For example, when a job seeker’s location preference is {(Taipei
City, totally satisfactory), (Taipei County, very satisfactory), (Taoyuan County,
moderately satisfactory)} and a job’s location offer is {(Taichung City, very sat-
isfactory), (Taipei city, rather satisfactory), (Taoyuan County, rather satisfac-
tory)}, we can obtain the minimum degree for each data item {(Taipei City,
rather satisfactory), (Taipei County, 0), (Taoyuan County, moderately satisfac-
tory), (Taichung City, 0)}. Hence, the result of matching is the maximum ‘rather
satisfactory’ which can be represented by a triangular fuzzy number
(0.5,0.75,1).
• In the case of one discrete fuzzy set and one crisp value, the result of matching
is the mapping of the crisp value to the discrete fuzzy set. For example, when a
hiring company’s education preference is {(Master, very satisfactory), (Bache-
lor, rather satisfactory)} and a job seeker’s education is ‘Master’, we obtain the
result of matching is ‘very satisfactory’ (0.75,1,1).
• In the case of two crisp values, the result of matching is either 1 (the fuzzy
data satisfies the fuzzy condition) or 0 (the fuzzy data doesn’t satisfy the fuzzy
condition).
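The Python sketch below illustrates the two most common of these matching cases: it treats a continuous fuzzy set as a piecewise-linear membership function and approximates the possibility measure sup_x min(µ_condition(x), µ_data(x)) on a grid. The function names are ours; the two calls at the end reproduce the 0.285 and 0.613 matching degrees of Figures 9 and 10.

```python
def piecewise_membership(points, x):
    """Membership of x in a continuous fuzzy set given as [(value, degree), ...],
    interpreted piecewise-linearly and as 0 outside the listed range."""
    pts = sorted(points)
    if x <= pts[0][0]:
        return pts[0][1] if x == pts[0][0] else 0.0
    if x >= pts[-1][0]:
        return pts[-1][1] if x == pts[-1][0] else 0.0
    for (x0, m0), (x1, m1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)

def match_continuous_continuous(cond, data, steps=10000):
    """Possibility measure sup_x min(mu_cond(x), mu_data(x)), approximated on a grid."""
    xs = [p[0] for p in cond] + [p[0] for p in data]
    lo, hi = min(xs), max(xs)
    best = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        best = max(best, min(piecewise_membership(cond, x),
                             piecewise_membership(data, x)))
    return best

def match_continuous_crisp(cond, crisp):
    """Matching of a continuous fuzzy condition with a crisp data value."""
    return piecewise_membership(cond, crisp)

# Worked examples corresponding to Figures 9 and 10:
salary_pref  = [(30000, 0.0), (50000, 0.92), (60000, 1.0)]
salary_offer = [(20000, 1.0), (30000, 0.75), (40000, 0.0)]
print(round(match_continuous_continuous(salary_pref, salary_offer), 3))  # ≈ 0.285

experience_pref = [(1, 0.0), (4, 0.92), (5, 1.0)]
print(round(match_continuous_crisp(experience_pref, 3), 3))              # ≈ 0.613
```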
on ⊕ and ⊗ operators for the computation of L-R fuzzy numbers, which is suggested
by Dubois and Prade [7]. Consider this example: (1) A job seeker makes a fuzzy
query containing a salary preference {(60000,1), (50000,0.92), (30000,0)} with
‘rather important’, a location preference {(Taipei City, totally satisfactory), (Taipei
County, very satisfactory), (Taoyuan County, moderately satisfactory)} with ‘most
important’, his education ‘Master’ with ‘moderately important’, and other fuzzy con-
ditions with the default importance ‘don’t care’. (2) A job stored in databases contains
the salary offer {(20000,1), (30000,0.75), (40000,0)}, the location offer {(Taichung
City, very satisfactory), (Taipei city, rather satisfactory), (Taoyuan County, rather
satisfactory)}, the education preference {(Master, very satisfactory), (Bachelor, rather
satisfactory)}, and a set of attributes with fuzzy data. We apply FWA to calculate the
overall degree of satisfaction between the fuzzy query and the fuzzy data as follows.
y = [(0.5, 0.75, 1) ⊗ 0.285 ⊕ (1, 1, 1) ⊗ (0.5, 0.75, 1) ⊕ (0.25, 0.5, 0.75) ⊗ (0.75, 1, 1)]
      / [(0.5, 0.75, 1) ⊕ (1, 1, 1) ⊕ (0.25, 0.5, 0.75)]
  = [(0.1425, 0.21375, 0.285) ⊕ (0.5, 0.75, 1) ⊕ (0.125, 0.5, 0.75)]
      / [(0.5, 0.75, 1) ⊕ (1, 1, 1) ⊕ (0.25, 0.5, 0.75)]
  = (0.7675, 1.46375, 2.035) / (1.75, 2.25, 2.75) ≈ 1.431765 / 2.25 = 0.63634
Applying the mathematical operations on fuzzy numbers [7,12,16], we get two fuzzy
numbers (0.7675, 1.46375, 2.035) and (1.75, 2.25, 2.75). The center of gravity is
adopted to defuzzify a fuzzy number [14], which is achieved by mathematical inte-
gral. Therefore, the overall degree of satisfaction between the fuzzy query and the
fuzzy data is 0.63634 (i.e. 63.634%).
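As a simple illustration of this arithmetic, the sketch below implements component-wise ⊕ and ⊗ for positive triangular (L-R) fuzzy numbers and a plain centre-of-gravity defuzzification. This is only an approximation of the discrete FWA procedure of [6] and of the defuzzification used in [14], so it is not guaranteed to reproduce 0.63634 exactly; the single term shown, however, matches the first term of the numerator above.

```python
def tfn_add(p, q):
    """(a1, b1, c1) ⊕ (a2, b2, c2) for triangular fuzzy numbers."""
    return tuple(x + y for x, y in zip(p, q))

def tfn_mul(p, q):
    """Approximate (a1, b1, c1) ⊗ (a2, b2, c2) for positive triangular fuzzy numbers."""
    return tuple(x * y for x, y in zip(p, q))

def tfn_scale(p, k):
    """Multiply a triangular fuzzy number by a crisp matching degree k >= 0."""
    return tuple(k * x for x in p)

def centroid(p):
    """Centre-of-gravity defuzzification of a triangular fuzzy number."""
    a, b, c = p
    return (a + b + c) / 3.0

# 'rather important' weight (0.5, 0.75, 1) times the salary matching degree 0.285:
print(tfn_scale((0.5, 0.75, 1.0), 0.285))   # (0.1425, 0.21375, 0.285)
```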
3 Conclusion
In this paper, we apply the fuzzy logic theory to develop a fuzzy query mechanism for
human resource websites. The advantages of the proposed approach are as follows.
• Users’ preferences often contain imprecision and uncertainty. Our approach
provides a mechanism to express fuzzy data in human resource websites and to
store fuzzy data into conventional database management systems without modi-
fying DBMS models.
• Traditional SQL queries are based on total matching which is limited in its abil-
ity to come to grips with the issues of fuzziness. Our approach provides a
mechanism to state fuzzy queries by fuzzy conditions and to differentiate be-
tween fuzzy conditions according to their degrees of importance.
• Traditional SQL queries fail to deal with the compensation between different
conditions. Our approach provides a mechanism to aggregate all fuzzy conditions
based on their degrees of importance and degrees of matching. Ordering the query results via the mutual compensation of all fuzzy conditions helps to alleviate the problem of information overload.
References
1. http://hotjobs.yahoo.com/
2. http://www.104.com.tw/
3. http://www.1111.com.tw/
4. http://www.find-job.net/
5. http://www.monster.com/
6. Chang, P.T., Hung, K.C., Lin, K.P., Chang, C.H.: A Comparison of Discrete Algorithms
for Fuzzy Weighted Average. IEEE Transactions on Fuzzy Systems 14(5), 663–675 (2006)
7. Dubois, D., Prade, H.: Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York (1980)
8. Galindo, J., Urrutia, A., Piattini, M.: Fuzzy Databases: Modeling, Design and Implementa-
tion. Idea Group Publishing, Hershey (2005)
9. Guu, S.M.: Fuzzy Weighted Averages Revisited. Fuzzy Sets and Systems 126, 411–414
(2002)
10. Kao, C., Liu, S.T.: Competitiveness of Manufacturing Firms: An Application of Fuzzy
Weighted Average. IEEE Transactions on Systems, Man, and Cybernetics – Part A: Sys-
tems and Humans 29(6), 661–667 (1999)
11. Kaufmann, A., Gupta, M.M.: Introduction to Fuzzy Arithmetic: Theory and Applications.
Van Nostrand Reinhold, New York (1985)
12. Lai, Y.J., Hwang, C.L.: Fuzzy Mathematical Programming, Methods and Applications.
Springer, Heidelberg (1992)
13. Ngai, E.W.T., Wat, F.K.T.: Fuzzy Decision Support System for Risk Analysis in E-
Commerce Development. Decision Support Systems 40(2), 235–255 (2005)
14. Tseng, T.Y., Klein, C.M.: A New Algorithm for Fuzzy Multicriteria Decision Making. In-
ternational Journal of Approximate Reasoning 6, 45–66 (1992)
15. Zadeh, L.A.: Fuzzy Sets. Information and Control 8, 338–353 (1965)
16. Zimmermann, H.J.: Fuzzy Set Theory and Its Applications, 2nd revised edn. Kluwer Aca-
demic Publishers, Dordrecht (1991)
Selecting Cooperative Enterprise in Dynamic Enterprise
Alliance Based on Fuzzy Comprehensive Evaluation
1 Introduction
Dynamic enterprise alliance is a complex organization system. It is an organization of both cooperation and competition formed by two or more enterprises that share common strategic interests and pool their resources in order to achieve their business strategies and specific objectives, and that are bound to each other by various agreements and contracts for a certain period [1-3]. The development of network manufacturing technology is constrained by resource management in the network manufacturing environment, and a suitable cooperative partner is a necessary condition for the efficient functioning of the dynamic enterprise alliance. Comprehensively evaluating candidate partners and scientifically selecting the optimum cooperative partner is therefore the key technology for realizing network manufacturing.
Selecting the optimum cooperative partner is a typical combinatorial optimization problem, and it is also a key problem in the process of establishing the dynamic enterprise alliance. It is very difficult to select the optimum cooperative partner simply on the basis of experience with the influence factors. Fuzzy comprehensive evaluation is an effective multi-factor decision method for synthetically evaluating an object affected by various factors, and the influence factors involved in selecting a cooperative partner are usually uncertain, so the fuzzy comprehensive
The influence factor set is an ordinary set comprised of the various factors influencing the fuzzy comprehensive evaluation of the cooperative partner. When the sales enterprise is evaluated, six factors are mainly taken into account: economic strength, marketing strategy, economic benefit, development prospect, staff quality and sales channels.
The economic strength factor mainly includes registered capital, permanent assets,
bank loans and liquid assets. The marketing strategy factor mainly includes marketing
purposes, marketing planning, market research and market positioning. The economic
benefit factor mainly includes return on assets, sales net profit rate, profit-tax rate of
cost and velocity of liquid assets. The development prospect factor mainly includes
enterprise culture, customer relationship, sales achievement, service quality and strain
capacity. The staff quality factor mainly includes professional dedication, professional
skill, insight, psychology bearing capacity and decision-making ability. And the sales
channels factor mainly includes the economic benefits of sales channels, the ability of
enterprise to control the sales channels and the adaptability of the sales channels to
the market environment. Thus, the influence factor set U can be represented as
U = ( u1 u2 u3 u4 u5 u6 ) (1)
In (1), the subset u1 indicates economic strength, the subset u 2 indicates marketing
strategy, the subset u 3 indicates economic benefit, the subset u4 indicates develop-
ment prospect, the subset u5 indicates staff quality and the subset u6 indicates sales
channels.
The economic strength subset u1 can be represented as
u1 = ( u11 u12 u13 u14 ) (2)
In (2), u11 indicates registered capital, u12 indicates permanent assets, u13 indicates
bank loans and u14 indicates liquid assets.
The marketing strategy subset u2 can be represented as
u2 = ( u21 u22 u23 u24 ) (3)
In (3), u21 indicates marketing purposes, u22 indicates marketing planning, u23 indi-
cates market research and u24 indicates market positioning.
The economic benefit subset u3 can be represented as
u3 = ( u31 u32 u33 u34 ) (4)
In (4), u31 indicates return on assets, u32 indicates sales net profit rate, u33 indicates
profit-tax rate of cost and u34 indicates velocity of liquid assets.
The development prospect subset u4 can be represented as
u4 = ( u41 u42 u43 u44 u45 ) (5)
In (5), u41 indicates enterprise culture, u42 indicates customer relationship, u43 indicates
sales achievement, u44 indicates quality of service and u45 indicates strain capacity.
The quality of staff subset u5 can be represented as
u5 = ( u51 u52 u53 u54 u55 ) (6)
In (6), u51 indicates professional dedication, u52 indicates professional skill, u53 indi-
cates insight, u54 indicates psychology bearing capacity and u55 indicates decision-
making ability.
The sales channels subset u6 can be represented as
u6 = ( u61 u62 u63 ) (7)
In (7), u61 indicates economic benefits of sales channels, u62 indicates ability of en-
terprise to control the sales channels and u63 indicates adaptability of the sales chan-
nels to the market environment.
The evaluation set is comprised of the various possible evaluation results made by the deci-
sion makers. The economic strength subset u1 , marketing strategy subset u2 , eco-
nomic benefit subset u3 , development prospect subset u4 , staff quality subset u5 and
sales channels subset u6 can be separately evaluated by sets such as {good, relatively good, general, bad, very bad}, {high, relatively high, general, low, very low} and {long, relatively long, general, short, very short}. In this paper, the evaluation sets
of economic strength subset u1 , marketing strategy subset u2 , economic benefit
subset u3 , development prospect subset u4 , staff quality subset u5 and sales channels
subset u6 are unified into one evaluation set V, which can be represented as
V = ( v1 v2 v3 v4 v5 ) (8)
In (8), v1 means excellent, v2 means good, v3 means middle, v4 means passable and v5 means bad.
The weight set is comprised of every influence factor’s weight number. It can reflect
every influence factor’s importance. Assuming ai is the weight number of the influ-
ence factor ui , thus, the weight set A can be represented as
A = ( a1 a2 … am ) (9)
Usually, every influence factor's weight number should satisfy the normalization and non-negativity constraints, i.e.
∑_{i=1}^{m} ai = 1 ,  0 ≤ ai ≤ 1 (10)
Different evaluators may hold different attitudes toward the same thing, and the weight numbers they offer also differ. In this paper, the weighted statistics method is adopted to determine the weight number of every influence factor [8]. Firstly, a weight distribution questionnaire is prepared (shown in Table 1); then some experts or related people are asked to fill in the weight numbers they believe to be optimal; after the questionnaires are taken back, the weighted statistics method is adopted to calculate the weight set A.
Table 1. The weight distribution questionnaire
influence factor ui    u1    u2    u3    ∑
weight number ai       a1    a2    a3    1
The weight set A can be calculated through a statistical investigation of the influence degree of the six subsets u1, u2, u3, u4, u5 and u6 on the decision of selecting the cooperative partner.
A = ( 0.2 0.23 0.12 0.08 0.15 0.22 ) (11)
In the same way, the weight number sets of the six subsets u1 , u 2 , u3 , u 4 , u 5 and
u6 also can be calculated as
A1 = ( 0.23 0.18 0.27 0.32 ) (12)
Based on Table 2, the membership functions can be structured and represented as
µi1(x) = 1                                      (ai0 ≤ x ≤ ai1)
       = (ai2 − x) / (ai2 − ai1)                (ai1 ≤ x ≤ ai2)          (18)
       = 0                                      (ai2 ≤ x ≤ aim)
In (18), i = 1, 2, …, n.
µij(x) = 0                                      (ai0 ≤ x ≤ ai,j−2)
       = (x − ai,j−2) / (ai,j−1 − ai,j−2)       (ai,j−2 ≤ x ≤ ai,j−1)
       = 1                                      (ai,j−1 ≤ x ≤ aij)       (19)
       = (ai,j+1 − x) / (ai,j+1 − aij)          (aij ≤ x ≤ ai,j+1)
       = 0                                      (ai,j+1 ≤ x ≤ aim)
µim(x) = 0                                      (ai0 ≤ x ≤ ai,m−2)
       = (x − ai,m−2) / (ai,m−1 − ai,m−2)       (ai,m−2 ≤ x ≤ ai,m−1)    (20)
       = 1                                      (ai,m−1 ≤ x ≤ aim)
In (20), i = 1, 2, …, n.
The membership degree µij of the j-th evaluating indicator of the influence factor
ui on one certain technological design scheme can be calculated through the mem-
bership function.
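A small sketch of how the graded membership functions (18)–(20) for one influence factor might be implemented; the grade boundaries a_i0 ≤ a_i1 ≤ … ≤ a_im come from Table 2, which is not reproduced here, so the boundary values in the example call are hypothetical.

```python
def grade_membership(bounds, j, x):
    """Membership of x in the j-th evaluation grade (1-indexed), following (18)-(20).
    bounds = [a_i0, a_i1, ..., a_im] are the grade boundaries for factor u_i."""
    a, m = bounds, len(bounds) - 1
    if j == 1:                                      # equation (18): lowest grade
        if x <= a[1]:
            return 1.0
        return (a[2] - x) / (a[2] - a[1]) if x <= a[2] else 0.0
    if j == m:                                      # equation (20): highest grade
        if x <= a[m - 2]:
            return 0.0
        return (x - a[m - 2]) / (a[m - 1] - a[m - 2]) if x <= a[m - 1] else 1.0
    # equation (19): intermediate grades (trapezoidal)
    if x <= a[j - 2] or x >= a[j + 1]:
        return 0.0
    if x <= a[j - 1]:
        return (x - a[j - 2]) / (a[j - 1] - a[j - 2])
    if x <= a[j]:
        return 1.0
    return (a[j + 1] - x) / (a[j + 1] - a[j])

# Hypothetical boundaries for one evaluating indicator, only to show the call:
bounds = [0, 20, 40, 60, 80, 100]
print([round(grade_membership(bounds, j, 55), 2) for j in range(1, 6)])
```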
Assuming the primary fuzzy comprehensive evaluation is carried out on the influence factor ui in the influence factor set U, the membership degree µij of the j-th evaluating indicator of the influence factor ui can be calculated through the membership function, and the evaluation result of the single factor ui can be represented by the fuzzy set Rij
Rij = µi1 / v1 + µi2 / v2 + … + µim / vn (21)
In (21), Rij is the single factor evaluation set; it can be simply represented as
Rij = ( µi1 µi2 … µim ) (22)
In the same way, the evaluation set corresponding to every influence factor ui can be obtained, and the single factor evaluation matrix Ri can be represented as
Ri = [ Ri1 Ri2 … Ri4 ]T (23)
With the primary evaluation sets B1, B2, …, Bk, the single factor evaluation matrix R of the influence factor set U can be represented as
Through the normalization of the fuzzy comprehensive evaluation set B , the fuzzy
comprehensive evaluation on one certain technological design scheme can be carried
out with the maximum membership degree method, the weighted average method or
the fuzzy distribution method.
The comprehensive evaluation result of the subset u1 can be calculated and repre-
sented as B1
The comprehensive evaluation result of the factor set U can be calculated and repre-
sented as B
B = A ∘ R = ( 0.20 0.21 0.22 0.09 0.09 ) (30)
To assign values to the evaluation set V, assume that the value of the evaluation set V is
V = (1 0.85 0.75 0.6 0.5 ) (32)
Taking bj as the weight of the evaluation target vj, the final evaluation result of the certain sales enterprise can be calculated with the weighted average method:
V = ∑_{j=1}^{5} bj · vj
  = 1 × 0.247 + 0.85 × 0.259 + 0.75 × 0.272 + 0.6 × 0.111 + 0.5 × 0.111 = 0.793 (33)
It means the fuzzy comprehensive evaluation value of the certain sales enterprise is
0.793.
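For illustration, the short sketch below repeats the final step numerically: B from (30) is normalized so that the weights bj sum to one, and then combined with the values assigned to V in (32); it reproduces the 0.793 result up to rounding.

```python
B = [0.20, 0.21, 0.22, 0.09, 0.09]   # comprehensive evaluation set from (30)
V = [1.0, 0.85, 0.75, 0.6, 0.5]      # values assigned to v1..v5 in (32)

b = [x / sum(B) for x in B]          # normalization: approx. (0.247, 0.259, 0.272, 0.111, 0.111)
score = sum(bj * vj for bj, vj in zip(b, V))
print(round(score, 3))               # ≈ 0.793
```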
In the same way, the fuzzy comprehensive evaluation values of all the sales enterprises that have the intent to cooperate can be obtained, and the sales enterprise with the maximal fuzzy comprehensive evaluation value can be selected as the optimum cooperative partner.
6 Conclusion
The fuzzy comprehensive evaluation method can give full play to the role of experts, reduce the harm caused by personal subjective assumptions, and provide a scientific basis for evaluating the enterprises that have the intent to cooperate, based on a comprehensive evaluation of the cooperative partners in the dynamic enterprise alliance. It can thus improve the level of assessment of the cooperative partners and make the evaluation result more scientific.
References
1. Wanshan, W., Yadong, G., Peili, Y.: Networked manufacturing. Northeast University Press,
Shenyang (2003) (in Chinese)
2. Jicheng, L., Jianxun, Q.: Dynamic alliance synergistic decision-making model based on
business intelligence center. In: International Symposium on Information Processing, ISIP
2008 and International Pacific Workshop on Web Mining and Web-Based Application,
WMWA 2008, Moscow, Russia, May 23-25, pp. 219–223 (2008)
3. Congqian, Q., Yi, G.: Partner optimization of enterprises dynamic alliance. Journal of
Tongji University 35(12), 1674–1679 (2007) (in Chinese)
4. Lixin, W.: Fuzzy System & Fuzzy Control Tutorial. Tsinghua University Press, Beijing
(2003) (in Chinese)
5. Baoqing, H.: The basis of fuzzy theory. Wuhan University Press, Wuhan (2004)
6. Bing, Z., Zhang, R.: Research on fuzzy-grey comprehensive evaluation of software process
modeling methods. In: Proceedings - 2008 International Symposium on Knowledge Acqui-
sition and Modeling, KAM 2008, Wuhan, China, December 21-22 (2008)
7. Xulin, L., Baowei, S.: Three level fuzzy comprehensive evaluation based on Grey Rela-
tional Analysis and Entropy weights. In: 2008 International Symposium on Information
Science and Engineering, ISISE 2008, Shanghai, China, December 20-22, vol. 2, pp. 32–35
(2008)
8. Korotchenko, M.A., Mikhailov, G.A., Rogazinskii, S.V.: Value modifications of weighted
statistical modelling for solving nonlinear kinetic equations. Russian Journal of Numerical
Analysis and Mathematical Modelling 22(5), 471–486 (2007)
9. Shinguang, C., Yikuei, L.: On performance evaluation of ERP systems with fuzzy mathe-
matics. Expert Systems with Applications 36(3 Part 2), 6362–6367 (2009)
An Evolutionary Solution for Cooperative
and Competitive Mobile Agents
Abstract. The cooperation and competition among mobile agents using an evolutionary strategy is an important domain in agent theory and application. With the evolutionary strategy, the cooperation process is achieved by training and iterating many times. In the evolutionary solution for cooperative and competitive mobile agents (CCMA), a group of mobile agents is partitioned into two populations: a cooperative agent group and a competitive agent group. Cooperative agents are treated as several pursuers, while a competitive agent is viewed as the pursuers' competitor, called the evader. Cooperation takes place among the pursuers in order to capture the evader as rapidly as possible. An agent individual (chromosome) is encoded based on a kind of two-dimensional random moving: the next moving direction is encoded as the chromosome. Chromosomes can be crossed over and mutated according to the designed operators and fitness function. An evolutionary algorithm for the cooperation and competition of mobile agents is proposed. The experiments show that the algorithm for this evolutionary solution is effective and has good time performance and convergence.
1 Introduction
A mobile agent is a kind of software agent. Mobile agents have many characteristics, such as autonomy, sociality, self-learning, and, importantly, mobility [1]. A mobile agent can move from one position to another; the moving involves state transitions and changes of the agent's state. Under the current context and state, a mobile agent autonomously determines when to move and where to move to [2,3].
Cooperative mobile agents have the ability of cooperation and adaptability[4,5].
Firstly, agents can cooperate by exchanging data and/or code when they meet on a
given condition. Secondly, the behavior of an agent can change based on its current
state and the information it has gathered while traveling.
In this paper the cooperation among mobile agents is studied using an evolutionary strategy. The cooperation process and its form are determined by training and iterating many times. The result of the evolution is that a reasonable and stable cooperation strategy is obtained.
2 Related Work
The combination of evolutionary algorithm and cooperative mobile agents has been
studied in some aspects, most of which use agents to study or implement evolutionary
computation processes. The literature [6,7] presented an agent-based version of coop-
erative co-evolutionary algorithm. This type of systems has been already applied to
multi-objective optimization. Evolution learning has also been studied for multi-agent
with strategic coalition [8], which used the iterated prisoner’s dilemma game to model
the dynamic system in an evolutionary learning environment. In literature [9], an evo-
lution strategy is introduced on the basis of cooperative behaviors in each group of
agents. The evolution strategy helps each agent to be self-defendable and self-
maintainable, and agents in same group cooperate with each other. This method use
reinforcement learning, enhance neural network and artificial life. In literature [10] a
neural network is used for the behavior decision controller. The input of the neural
network is decided by the existence of other agents and the distance to the other agents.
The output determines the directions in which the agent moves. The connection weight
values of this neural network are encoded as genes, and the fitness of individuals is
determined using a genetic algorithm. There are also other studies from the perspec-
tives of game theory [11,12,13]. Other studies concentrating on neural computing can
be obtained in literatures [14,15,16].
(3) If there don’t exist reflection barriers around the Agents, the Agents move to the
adjacent areas with probability 1/8.
If the Agents have preferences under particular conditions,
(4) The Agents move to the adjacent areas with probability λi P(di), where
λi is called preference coefficient. λi denotes a measure of Agenti’s customs or cogni-
tive styles of moving directions,
di denotes moving direction of Agenti at particular moment, and at this moment
there exist reflection barriers adjacent to Agenti, then
∑i=1 to 3 λi P(di) =1 . or ∑i=1 to5 λi P(di) =1 . (1)
else,
∑i=1 to 8 λi P(di) =1 . (2)
The moving is called two-dimensional random moving if it accords with the above items.
Example 1. Three agents are shown in Figure 1 (a), (b) and (c) respectively. (a) shows an agent located at a place with no reflection barriers around it; there are 8 alternative directions to move towards, and it may, for example, prefer to move vertically upward. (b) shows an agent whose left side is a reflection barrier; there are 5 directions to choose from. (c) shows an agent with only three directions to choose from, because there are two reflection barriers around it.
Fig. 1. (a) no reflection barriers; (b) reflection barriers at one side; (c) reflection barriers at two sides
right correspond to the locations 0 to 7. A bit value of 1 denotes that the agent moves in that direction, and 0 that it does not. Under the no-barrier circumstance the encoding resembles the above; the only difference is that the directions with barriers are coded with a character that is neither 1 nor 0, such as an asterisk.
Fig. 2. Encoding of the moving directions: the eight directions are numbered 0–7 around an agent (left); with barriers at one side only the remaining directions are available (right)
Selection. The chromosomes compete against each other and the chromosome with
the highest fitness value is selected. In this paper distance-based function is adopted
as the fitness function, that is, the Euclid distance between pursuer and evader is the
fitness value. The smaller fitness value is better for pursuing agent while the larger
fitness value is better for evading agent.
Crossover. The crossover operation exchanges parts of a pair of chromosomes, creat-
ing new chromosomes called children or offspring (If crossover does not succeed in
the probability test, then the children will be identical copies of their parents). The
crossover operation needs two chromosomes, so the population with over two agents
can be applied with crossover operator. An agent moves to only one direction. There
should be strict limitation that the offspring must have one bit 1 only. Figure 3 illus-
trates a crossover operation in the solution.
Parents A: 1 0 0 0 0 0 0 0 B: 0 0 0 1 0 0 0 0
Offspring C: 0 1 0 0 0 0 0 0 D: 0 0 1 0 0 0 0 0
Mutation. The mutation operator aims to increase the variability of the population,
allowing the evolution to simultaneously search different areas of the solution space.
This operator changes at random the value of a chromosome gene, also randomly
chosen with a given probability (named mutation rate). Also, an agent moves to only
one direction. If one bit 1 becomes 0, some other bit 0 should become 1. Figure 4
shows a mutation operation that changes the value of the fourth gene from 0 to 1.
Before mutation 0 0 1 0 0 0 0 0
After mutation 0 0 0 0 1 0 0 0
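A minimal sketch of the chromosome encoding and of crossover and mutation operators that respect the one-direction-per-chromosome constraint described above. The helper names are ours, the barrier-coded (asterisk) genes are ignored for brevity, and the crossover rule shown is just one possible way to produce offspring consistent with the example of Figure 3.

```python
import random

def random_chromosome():
    """8 genes for the 8 moving directions; exactly one bit is set to 1."""
    genes = [0] * 8
    genes[random.randrange(8)] = 1
    return genes

def crossover(parent_a, parent_b):
    """Produce two offspring that each still encode a single moving direction.
    Here the parents' directions are shifted towards each other, which matches
    the example of Figure 3 (directions 0 and 3 give offspring 1 and 2)."""
    da, db = parent_a.index(1), parent_b.index(1)
    child1, child2 = [0] * 8, [0] * 8
    child1[(da + 1) % 8] = 1
    child2[(db - 1) % 8] = 1
    return child1, child2

def mutate(chromosome):
    """Move the single 1 bit to another randomly chosen direction (Figure 4)."""
    new = [0] * 8
    new[random.choice([i for i in range(8) if chromosome[i] == 0])] = 1
    return new
```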
Let k cooperative agents (a cooperative agent abbr. cooAg) and a competitive agent
(abbr. comAg) consist of a niche in which k cooAgs pursue a comAg. In the process
of all agents’ two-dimensional random moving cooAgs pursue comAg and want to
capture comAg as rapidly as possible. The comAg tries its best not to be captured.
Once one of the cooAgs captures comAg, the algorithm halts. A penalty factor is
designed in the algorithm in order to evaluate the quality of cooAgs in pursuing co-
mAg. The penalty factor α is defined as formula (3).
Let
x′ = P(cooAgi).x − P(comAg).x
y′ = P(cooAgi).y − P(comAg).y
D = (x′^2 + y′^2)^(1/2)
Then
α(cooAgi) = α        (D ≤ θ0)
          = α + 1    (θ0 < D ≤ θ1)          (3)
          = α + 2    (θ1 < D ≤ θ2)
(1) Initialize distance thresholds θ0, θ1, θ2, time interval T, capture distance d;
    // initialize the penalty factor α of every cooperative agent
(2) for (int i = 0; i < k; i++)
(3)     α(cooAg[i]) = 0;
(4) t = 0; // start from time 0
(5) while (true) {
        t = t + T;
(6)     for (int i = 0; i < k; i++) {
(7)         Compute the distance D(cooAg[i], comAg);
(8)         if (D <= d) {
(9)             Capture succeeds; break; }
(10)        else {
(11)            Update α(cooAg[i]) according to D and formula (3); } }
(12)    Select the chromosome with the smallest α into the next generation;
(13)    Cross over the chromosomes with bigger α to produce offspring;
(14)    Mutate the chromosome with the largest α;
(15)    if (t exceeds the longest time threshold) break;
(16) } // the algorithm halts
In formula (3), P( · ) is agent’s position in coordinate system. θ0, θ1, θ2 are threshold
values of distance. The distances between cooAgs and comAg are computed at a fixed
interval T. From time 0, at time kT (k=1, 2, …) the distance D between each cooAg
and comAg is computed and the penalty factor α is computed according to D. The
chromosome with smaller α has better fitness. The chromosomes of cooAgs with
bigger α are crossed over or mutated. The chromosome with the smallest α is copied
to the next generation.
In our algorithm, it is assumed that if the distance D between some cooAg and co-
mAg is less than a threshold value d, the cooperative capture succeeds. The threshold
d is called capture distance. The algorithm description is given by figure 5.
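The sketch below illustrates the penalty factor of formula (3) and the capture test used in the algorithm; the threshold values and the handling of distances beyond θ2 are placeholder assumptions of ours.

```python
import math

def update_penalty(coo_pos, com_pos, alpha, thresholds=(50, 100, 200)):
    """Update the penalty factor of one cooperative agent according to formula (3).
    coo_pos / com_pos are (x, y) positions; thresholds = (theta0, theta1, theta2)."""
    d = math.dist(coo_pos, com_pos)
    theta0, theta1, theta2 = thresholds
    if d <= theta0:
        return alpha
    if d <= theta1:
        return alpha + 1
    if d <= theta2:
        return alpha + 2
    return alpha + 2   # assumption: beyond theta2 the largest penalty still applies

def captured(coo_pos, com_pos, capture_distance):
    """Capture succeeds when the distance falls below the capture distance d."""
    return math.dist(coo_pos, com_pos) <= capture_distance
```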
5 Experimental Analysis
Assume that XOY shown in figure 1 is participants’ active area. The whole active
area is discrete, and the participants perform two-dimensional random moving in
XOY. In our experiments, each cell is represented by coordinate value, that is, a cell
as a coordinate point. Although the agents move randomly, they have their own pref-
erences, which are illustrated in figure 6 and 7. In figure 6 two persuading agents and
Fig. 6. Two pursuers P1, P2 and the evader E
Fig. 7. Three pursuers P1, P2, P3 and the evader E
an evading agent are interacting in some niche. In Figure 6(a) the evader prefers to move in the direction indicated by the arrow, but the two cooperative pursuers P1 and P2 search only within the scope of the area they prefer. Under this circumstance, P1 and P2 cannot capture the evader however hard they try. On the contrary, Figure 6(b) shows that the evader E has a preferential moving direction that meets the pursuer P1's. Figure 7 shows a similar situation but with three pursuing agents.
In these experiments, the evading agent moves at random. The algorithm proposed in this paper is mainly used to solve the problems shown in Figure 6(a) and Figure 7(a).
During evolution, the point whose next moving direction is towards the evading agent's preferential direction and which has the nearest distance to the evading agent is selected to continue the pursuer's moving. Under other circumstances the agents' chromosomes are crossed over or mutated.
Table 1 shows the comparison of random capture times and evolutionary capture times with three cooperative agents. The capture distance d takes the values 50, 100 and 200. Maximized moving times are designed with six values. For each d and each designed number of moving times, 50 mobile experiments are carried out in the random and evolutionary modes respectively. The experimental results show that the number of successful captures by the evolutionary cooperative mobile agents is obviously larger than that by the random mobile agents.
The average iterative generations and average capture times according to the different capture distances 50, 100 and 200 are listed in Table 2. From the table we can see that at each distance both fewer and more moving times lead to more average iterative generations and longer capture times than a medium number of moving times. This is because fewer moving times make the agents' mobility cease much faster, and more moving times make their mobile preferences more diverse.
Table 2. Average iterative generations and average capture time according to different capture
distances
6 Conclusions
References
1. Rao, A.S., Georgeff, M.P.: BDI agents: From theory to practice. In: Proc. the First Interna-
tional Conference on Multi-Agent Systems, San Franciso, CA, USA, pp. 312–319 (1995)
2. Lange, D.B., Oshima, M.: Seven good reasons for mobile agents. Communications of the
ACM 42(3), 88–89 (1999)
3. Kotz, D., Gray, R.S.: Mobile Agents and the Future on the Internet. ACM Operating Sys-
tems Review 33(3), 7–13 (1999)
4. Bassett, J., De Jong, K.: Evolving behaviors for cooperating agents. In: Ras, Z. (ed.) Pro-
ceedings from the Twelfth International Symposium on Methodologies for Intelligent Sys-
tems, Charlotte, NC, pp. 157–165. Springer, Heidelberg (2000)
5. Berenji, H., Vengerov, D.: Learning, cooperation, and coordination in multi-agent systems,
Technical Report IIS-00-10, Intelligent Inference Systems Corp., 333 W. Maude Avennue,
Suite 107, Sunnyvale, CA 94085–4367 (2000)
6. Dreżewski, R., Siwik, L.: Agent-based co-operative co-evolutionary algorithm for multi-
objective optimization. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M.
(eds.) ICAISC 2008. LNCS (LNAI), vol. 5097, pp. 388–397. Springer, Heidelberg (2008)
7. Dreżewski, R., Siwik, L.: Multi-objective optimization technique based on co-evolutionary
interactions in multi-agent system. In: Giacobini, M. (ed.) EvoWorkshops 2007. LNCS,
vol. 4448, pp. 179–188. Springer, Heidelberg (2007)
8. Yang, S.-R., Cho, S.-B.: Co-evolutionary learning with strategic coalition for multiagents.
Appl. Soft Comput. 5(2), 193–203 (2005)
9. Lee, M.: A study of evolution strategy based cooperative behavior in collective agents. Ar-
tificial Intelligence Review 25(3), 195–209 (2006)
10. Lee, M., Chang, O.-b., Yoo, C.-J., et al.: Behavior evolution of multiple mobile agents un-
der solving a continuous pursuit problem using artificial life concept. Journal of Intelligent
and Robotic Systems 39(4), 433–445 (2004)
11. Nitschke, G.: Designing emergent cooperation: a pursuit- evasion game case study. Artifi-
cial Life Robotics 9(4), 222–233 (2005)
12. Tanev, I., Shimohara, K.: Effects of learning to interact on the evolution of social behavior
of agents in continuous predators-prey pursuit problem. In: Banzhaf, W., Ziegler, J.,
Christaller, T., Dittrich, P., Kim, J.T. (eds.) ECAL 2003. LNCS (LNAI), vol. 2801, pp.
138–145. Springer, Heidelberg (2003)
13. Tanev, I., Brzozowski, M., Shimohara, K.: Evolution, generality and robustness of
emerged surrounding behavior in continuous predators-prey pursuit problem. Genetic Pro-
gramming and Evolvable Machines 6(3), 301–318 (2005)
14. Lee, M., kang, E.-K.: Learning enabled cooperative agent behavior in an evolutionary and
competitive environment. Neural compu. & applic. 15(2), 124–135 (2006)
15. Lee, M.: Evolution of behaviors in autonomous robot using artificial neural network and
genetic algorithm. Information Sciences 155(1-2), 43–60 (2003)
16. Caleanu, C.-D., Tiponut, V., et al.: Emergent behavior evolution in collective autonomous
mobile robots. In: 12th WSEAS international conference on system, Heraklion, Greece,
July 2008, pp. 428–433 (2008)
Towards an Extended Evolutionary Game Theory with
Survival Analysis and Agreement Algorithms for
Modeling Uncertainty, Vulnerability, and Deception
Zhanshan (Sam) Ma
1 Background
The endeavor to develop evolutionary game theory (EGT) was initiated in the 1970s by John Maynard Smith and George Price with the application of game theory to the
modeling of animal conflict resolution [21]. Their pioneering research not only
provided a refreshing mathematical theory for modeling animal behavior, but also
revealed novel insights into evolutionary theory and greatly enriched evolutionary ecology. It also quickly became one of the most important modeling approaches for the emerging field of behavioral ecology in the 1980s. The well-known prisoner's dilemma (PD)
game (especially its latter versions), which plays a critical role in the study of coop-
eration or altruism, is essentially an evolutionary game [1][3]. Today, the applications
of EGT have been expanded to fields well beyond behavioral ecology.
Evolutionary game is a hybrid of continuous static and continuous dynamic games
(differential games) in the sense that its fitness (payoff) functions are continuous, but
the strategies are inheritable phenotypes, which usually are discrete. In the study of
animal communication or engineering communication (such as wireless sensor net-
works), the strategy can be the behavior of animals or communication nodes. Another
feature is that evolutionary game is played by a population of players repeatedly.
Although the latter is not a unique feature of evolutionary games, the integration with
population dynamics theory is somewhat unique and makes it particularly powerful
for modeling biological systems or any systems where population paradigm is impor-
tant [4][9][21][23]. One of the most frequently used models is the replicator dynam-
ics model [4][23].
In evolutionary game theory, replicator dynamics is described with differential
equations. For example, suppose a population consists of n types E1, E2, ..., En with frequencies x1, x2, ..., xn. The fitness fi(x) of Ei will be a function of the population structure, i.e. of the vector x = (x1, x2, ..., xn). Following the basic tenet of Darwinism, one may define the success of Ei as the difference between its fitness fi(x) and the average fitness of the population, which is defined as
f̄(x) = ∑i xi fi(x) (1)
for i = 1, 2, ..., n. The population state x(t) ∈ Sn, where Sn is a simplex, which is the space
for population composition, is similar to mixed strategies in traditional games [4][23].
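To make the replicator dynamics concrete, the sketch below iterates the standard replicator equation x_i' = x_i (f_i(x) − f̄(x)), with f̄(x) the average fitness of equation (1), using simple Euler steps; the Hawk–Dove payoff values are illustrative only and not taken from the paper.

```python
def replicator_step(x, payoff, dt=0.01):
    """One Euler step of x_i' = x_i * (f_i(x) - fbar(x)), where
    f_i(x) = sum_j payoff[i][j] * x_j and fbar(x) is equation (1)."""
    f = [sum(payoff[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    fbar = sum(xi * fi for xi, fi in zip(x, f))
    return [xi + dt * xi * (fi - fbar) for xi, fi in zip(x, f)]

# Illustrative Hawk-Dove payoffs with value V = 2 and cost C = 4 (rows/cols: Hawk, Dove).
payoff = [[(2 - 4) / 2, 2],
          [0, 1]]
x = [0.1, 0.9]                      # initial frequencies of Hawk and Dove
for _ in range(5000):
    x = replicator_step(x, payoff)
print([round(v, 2) for v in x])     # approaches the mixed equilibrium V/C = 0.5
```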
Another formulation is the Fitness Generating Function (G-function), invented by Vincent and Brown (2005) [23] to specify groups of individuals within a popula-
tion. Individuals are assigned the same G-function if they possess the same set of
evolutionarily feasible strategies and experience the same fitness consequences within
a given environment [23].
EGT still exposes certain limitations in dealing with dynamic (time-, space-, and/or
covariate-dependent) uncertainty, vulnerability, and deception of game players. For
example, the so-called UUUR (Unknown, latent, Unobserved, or Unobservable Risks)
events exist in many problems such as the study of reliability, security and survivabil-
ity of computer networks, and UUUR events are largely associated with uncertainty,
vulnerability, or their combinations (the frailty) [9][19]. Another challenge is the
modeling of deception in communication among the players: the players not only
communicate, but also need to reach consensus (agreement) or make decisions with
the existence of dynamic frailty and deception [9][10].
In this paper, I introduce two extensions to EGT to primarily deal with the above
mentioned limitations. In the first extension, I introduce survival analysis and its
'sister' subjects (competing risks analysis and multivariate survival analysis), which
have some uniquely powerful features in dealing with UUUR events, to deal with
dynamic uncertainty and vulnerability. The second extension is designed to address
deception of game players: the consequence of dynamic deception and frailty on the
game strategies, as well as the capability for game players to reach an agreement
under the influences of deception and frailty. There is a third extension—using hedg-
ing principle from mathematical finance for decision-making, which is necessary for
some applications such as survivability analysis [9], prognostic and health manage-
ment in aerospace engineering [19], and strategic information warfare [10] [18], but is
skipped here.
The significance of EGT is probably best revealed by a brief analysis of the fun-
damental elements that Darwin's evolutionary theory addresses. Competition, coop-
eration and communication were clearly in Darwin's primary concerns. According to
a historical analysis by Dugatkin [3], in his landmark volume "Origin of Species" (1859) Darwin focused on competition, or the struggle for life, but Darwin was also clearly concerned with cooperation or altruism in nature. In the "Origin of Species," Darwin's solution to altruism was that selection may be applied to the family, and the individual may ultimately get the desired benefit. It took nearly a century for scientists to formalize Darwin's idea mathematically with a simple equation, known as Hamilton's Rule, first formulated by William Hamilton (1964). About
two decades later, Hamilton's collaborative work with political scientist Robert Ax-
elrod ([1]), which implemented Robert Trivers' (1971) suggestion of using PD game
to study altruism, led to the eruption of the study of cooperation in the last three dec-
ades [3]. Today, there is hardly a major scientific field to which PD game has not
been applied. The PD game, especially its latter versions, is essentially the evolution-
ary game. Besides competition and cooperation, Darwin also published a volume on
communication, titled "The Expression of the Emotions in Man and Animals" in 1872. A
century later, "The Handicap principle: a missing piece of Darwin's Puzzle" by Zahavi
(1997) opened a new chapter in the study of animal communication. It was also the
EGT modeling that led to the wide acceptance of the Handicap principle. The sim-
plest EGT model for this principle is the Sir Philip Sydney (SPS) game, which has
been used to elucidate the handicap principle—that animal communication is honest
(reliable) as long as the signaling is costly in a proper way. Therefore, the phenome-
non that EGT seems to offer the most lucid modeling elucidation for both cooperation
and communication shows its power. Of course, the fact that EGT is a 'marriage' be-
tween game theory and Darwin's evolution theory is the best evidence of its impor-
tance in the study of competition (the struggle for life).
The far reaching cross-disciplinary influences of Darwin's evolution theory and PD
game also suggest that the study of communication, especially with EGT (e.g., SPS
game) may spawn important interdisciplinary research topics and generate significant
cross-disciplinary ramifications that are comparable to the study of cooperation and
PD games [9][16]. It is hoped that the extensions to EGT in this paper will also be
helpful to the cross-disciplinary expansion because these extensions address some
critical issues that exist across several fields from reliability and survivability of dis-
tributed networks, prognostics and health management in aerospace engineering,
machine learning, to animal communication networks [9][10][16]–[20].
In the following, we assume that the players of an evolutionary game form a popu-
lation, similar to a biological population. Population dynamics over space and time is
influenced by environmental covariates (factors). The covariates often affect the life-
time or fitness of game players by introducing uncertainty and/or frailty. Players may
have stochastically variable vulnerabilities. In addition, the individual players may
form various spatial structures or topologies, which can be captured with a graph
model or evolutionary games on graphs. Players can form a complex communication
network; in the networks, some act as eavesdroppers, bystanders or audience. The
communication signals may be honest or deceitful depending on circumstance, or the
so-called dynamic deception. Finally, the 'failure' (termination of lifetime or a proc-
ess) of players in engineering applications is often more complex than biological
death in the sense that a failing computer node may send conflicting signals to its
neighbor nodes, or the so-called asymmetric faults. Indeed, the challenge in dealing
with somewhat arbitrary Byzantine fault as well as dynamic deception is the major
motivation why we introduce agreement algorithm extension to EGT.
The remainder of this paper is organized as follows: Section 2 and 3 summarize the
first extension with survival analysis, and the second extension with agreement algo-
rithms, to EGT, respectively. Section 4 discusses implementation issues related to the
EEGT, with the modeling of animal communication networks as an example. Due to
the complexity involved, the EEGT has to be implemented as a simulation environ-
ment in the form of software. The page space limitation makes it only feasible to
sketch a framework of the EEGT. In the following, I often use the network settings
for survivability analysis, strategic information warfare, or animal communication as
examples, and use the terms node, individual, and player interchangeably.
This definition is exactly the same as the traditional reliability function, but survival analysis has much richer models and statistical procedures, such as Cox's proportional
hazard modeling [equ. (4) & (5)] and accelerated failure time modeling. Dedicated
volumes have been written to extend the Cox model (see citations in [11]).
The Cox proportional hazard model represents the conditional survivor function as
S(t | z) = [S0(t)]^exp(βz) (4)
where S0(t) = exp[ − ∫0^t λ0(u) du ] (5)
and z is the vector of covariates, which can be any factors that influence the survival
or lifetime of game players.
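A brief numerical sketch of the conditional survivor function in (4)–(5), assuming a constant baseline hazard so that the integral in (5) has a closed form; the covariate values and coefficients below are placeholders, not estimates from any data.

```python
import math

def cox_survivor(t, z, beta, baseline_hazard=0.05):
    """S(t | z) = S0(t) ** exp(beta . z), with S0(t) = exp(-lambda0 * t)
    for an assumed constant baseline hazard lambda0."""
    s0 = math.exp(-baseline_hazard * t)
    risk_score = math.exp(sum(b * zi for b, zi in zip(beta, z)))
    return s0 ** risk_score

# Hypothetical game player with two covariates (e.g. load and exposure):
print(round(cox_survivor(t=10.0, z=[1.0, 0.5], beta=[0.3, -0.2]), 3))
```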
Competing risks analysis is concerned with the scenario in which multiple risks exist but only one of them leads to the failure ([2][13]). The single risk that leads to the failure becomes, after the fact, the failure cause, and the other risks that competed for the 'cause' are just latent risks. Competing risks analysis is neither univariate nor truly multivariate, given its univariate cause and multivariate latent risks. Multivariate survival analysis deals with truly multivariate systems where multiple failure causes and multiple failures (modes or failure times) may exist or occur simultaneously [5][14]. Furthermore, the multiple failure causes (risks) may be mutually dependent, and so may the multiple failures (modes or times). Indeed, multivariate survival analysis often offers the most effective statistical approaches to study dependent failures [5][9][14].
Observation (or information) censoring refers to the incomplete observation of ei-
ther failure events or failure causes or both. Censoring is often unavoidable in the
studies of time-to-event random variables such as reliability analysis. However, tradi-
tional reliability analysis does not have a rigorous procedure to handle censored ob-
servations; either including or excluding the censored individuals may cause bias in
statistical inference.
In network reliability and survivability analysis, Ma (2008) proposed to utilize sur-
vival analysis to assess the consequences of the UUUR (Unpredictable, latent, Unob-
served or Unobservable Risks) events [9]. Mathematically, although the probabilities
of UUUR events are unknown, survival analysis does provide procedures to assess
their consequences, e.g., in the form of the survivor function variations at various
censoring level, the influences of latent risks on failure time and/or modes, or the
effects of shared frailty (the unobserved or unobservable risks) on failure times,
modes, and/or dependencies. From the perspective of application problems such as
network reliability and survivability, UUUR covers a class of risks that are particu-
larly difficult to characterize, such as malicious intrusions, virus or worm infections,
software vulnerabilities that are poorly understood. The event probabilities associated
with those kinds of risks are often impossible to obtain in practice. Furthermore, these
risks are time, space and covariate dependent or dynamic, and this further complicates
the problem. On the other hand, survival analysis models such as Cox proportional
hazard models are designed to deal with covariate dependent dynamic hazards (risks).
Therefore, survival analysis provides a set of ideal tools to quantitatively assess the
consequences of UUUR events, which is recognized as one of the biggest challenges
in network security and survivability research.
In the context of evolutionary games, the concept of UUUR events can be used to
describe 'risk' factors that affect the lifetime or fitness of game players. Of course,
those factors may affect the fitness positively, negatively, or variably (nonlinearly) at different times. Therefore, the term 'risk' may bear a neutral consequence, rather than
consistently negative. In general, UUUR captures uncertainty or frailty. The latter can
be considered as the combinations or mixture of uncertainty and vulnerability. For
example, information can dispense uncertainty and consequently eliminate some
perceived vulnerability [9]–[11]. In addition, deception in some applications such as
the study of animal communication networks or strategic information warfare can also
be described with UUUR risks from the perspective of the players [10][16][18].
From above discussion it can be seen that survival analysis models can be har-
nessed to describe the lifetime and/or survivor probability of individual players,
which can often be transformed into the individual fitness. An alternative modeling
strategy is to utilize survival analysis models for population dynamics of the game
players, or the dynamics of meta-populations. For example, in the Dove-Hawk game,
there are two populations: one is the hawk population and another is the dove popula-
tion. Both populations can be described with survival analysis models. The population
modeling with survival analysis should be similar to the modeling of biological popu-
lation dynamics [9].
In summary, the advantages from introducing survival analysis include: (i) flexible
time-, space- and covariate-dependent fitness function; (ii) unique approaches to as-
sess the consequences of UUUR events; (iii) deception modeling, which can be recast
as a reliability problem; (iv) effective modeling of the dependency between 'failure'
events or between the 'risks' that influence the 'failure' events.
Each of the generals may have different (inhomogeneous) time-variant (not constant)
hazard functions, λi(t), i =1, 2, ..., g, where g is the number of generals.
To overcome this limitation of lacking real time notion in tradition hybrid fault
models, Ma and Krings (2008) extended the traditional hybrid fault models with the
so-called Dynamic Hybrid Fault models (DHF) [9][12]. Essentially, there are two
limitations with the traditional hybrid fault models, when they are applied to reliabil-
ity analysis. The first is the lack of the notion of real time, as explained in the previ-
ous paragraph, and the second is the lack of approaches to incorporate hybrid fault
models into reliability analysis after the issues associated with the first limitation are resolved. The latter depends on the integration of the dynamic hybrid fault models
with the evolutionary game theory.
The solution to the first aspect, the missing notion of real time, or the Agreement-
algorithm aspect of the problem, is to introduce survival analysis. In particular, time- and covariate-dependent survivor functions or hazard functions can be used to de-
scribe the survival of the Byzantine generals. For example, the constraint of the BGP
under oral message assumption is,
N ≥ 3m + 1 (6)
is replaced with the following model in the dynamic hybrid fault models:
N(t) ≥ 3m(t) + 1 (7)
Further assuming that the survivor function of generals is S(t|z), a simplified concep-
tual model can be:
N(t) = N(t − 1) · S(t | z) (8)
m(t) = m(t − 1) · Sm(t | z) (9)
where N(t) and m(t) are the number of total generals and treacherous generals (trai-
tors) at time t, respectively. S(t|z) and Sm(t|z) are the corresponding conditional survi-
vor functions for the total number of generals and traitors, respectively. They can use
any major survival analysis models, such as Cox proportional hazard model, ex-
pressed in equation (4). One immediate benefit of the dynamic hybrid fault models is
that it is now possible to predict the real-time fault tolerance level in a system.
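As an illustration of this benefit, the sketch below iterates equations (8)–(9) with assumed exponential per-step conditional survivor functions and checks the oral-message condition (7) at every step; the initial counts and decay rates are placeholders only.

```python
import math

def simulate_bgp(n0=30.0, m0=5.0, steps=20, lam_all=0.02, lam_traitor=0.05):
    """Equations (8)-(9) with S(t|z) = exp(-lam_all) and Sm(t|z) = exp(-lam_traitor)
    per step, combined with the oral-message condition (7): N(t) >= 3 m(t) + 1."""
    n, m = n0, m0
    for t in range(1, steps + 1):
        n *= math.exp(-lam_all)        # equation (8)
        m *= math.exp(-lam_traitor)    # equation (9)
        ok = n >= 3 * m + 1            # condition (7)
        print(f"t={t:2d}  N={n:5.1f}  m={m:4.1f}  agreement possible: {ok}")

simulate_bgp()
```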
The above extension with survival analysis models is necessary, but not sufficient
for applying the dynamic hybrid fault models to reliability analysis, except for ex-
tremely simple cases. The difficulty arises when there are multiple types of failure
behaviors. This is typical in real world dynamic hybrid fault models. For example, the
failure modes could be symmetric vs. asymmetric, transmissive vs. omissive, benign
vs. malicious, etc. Besides different failure modes, node behaviors can also include:
cooperative vs. non-cooperative, mobile nodes vs. access points (which might be
sessile), etc. To model the different behaviors, we will need multiple groups, or sys-
tem of equations (8) and (9). The challenge is that we lack an approach to synthesize
the models to study reliability and survivability. This is the second limitation (aspect)
of traditional hybrid fault models. The solution to the second limitation (aspect), is a
new notion termed 'Byzantine Generals Playing Evolutionary Games,' first outlined in
Ma (2007, unpublished dissertation proposal).
may form complex coalitions. Beyond those complexities, there are conditional com-
mitments, which evolve daily. To deal with all the complexity, the 'Byzantine generals
playing evolutionary games' paradigm, which turns the problem into an evolutionary
game, is necessary. Different problems may have different function g, but the principle
should be similar.
In principle, there are four major types of function g: (i) borrow from traditional
games; e.g., Hawk-Dove game; (ii) utilize the replicator dynamics modeling [4]; (iii)
Fitness Generating Function (G-function) [23]; (iv) 'Byzantine Generals Playing
EG'—the approaches outlined above.
Actually, there is a third extension to the EGT with the principle of hedging, which
is necessary when UUUR events dominantly influence decision-making, e.g., dealing
with malicious actions in survivability analysis. The three-layer survivability analysis
is essentially the application of the EEGT to survivable network systems at the tacti-
cal and strategic levels, and further extended the EGT with the hedging principle at
the operational level, which is responsible for the decision-making in managing sur-
vivable network systems or planning information warfare [9][10][18][19].
audience, etc. The second row of five boxes lists the major issues of animal communi-
cation. The third row is the major models (algorithms) involved, such as those from the
EEGT, models for behavior adaptation and evolution (e.g., the Handicap principle),
and other constraints (competition and cooperation). The last row shows the major
results obtainable from the EEGT modeling.
In perspective, it is my opinion that the study of animal communication networks and their reliability is on the verge of spawning cross-disciplinary research similar to that generated by the study of cooperation and PD games. Honesty or deception is an issue of potential interest to researchers from many disciplines, such as psychology, sociology, criminal justice study, political and military sciences, economics, and
computer science. Deception can create uncertainty, vulnerability and frailty, but what
makes it particularly hard to model is its asymmetric or Byzantine nature. Further-
more, Byzantine deception and frailty are likely to be the major factors that often
contribute to the unexpected but catastrophic 'crashes.' This may explain the phe-
nomenon that the fraud and corruption in social and economic systems, as well as
deception in information warfare are often among the most illusive and difficult ob-
jects to describe in any quantitative modeling research. It is hoped that the extensions
to EGT introduced in this paper will relieve some of the difficulties.
Fig. 1. The major modules of modeling animal communication networks with the EEGT
References
1. Axelrod, R., Hamilton, W.D.: The evolution of cooperation. Science 211, 1390–1396
(1981)
2. Crowder, M.J.: Classical Competing Risks Analysis, p. 200. Chapman & Hall, Boca Raton
(2001)
3. Dugatkin, L.A.: The Altruism Equation. Princeton University Press, Princeton (2006)
4. Hofbauer, J., Sigmund, K.: Evolutionary Games and Population Dynamics, p. 323. Cam-
bridge University Press, Cambridge (1998)
5. Hougaard, P.: Analysis of Multivariate Survival Data, p. 560. Springer, Heidelberg (2000)
6. Lamport, L., Shostak, R., Pease, M.: The Byzantine Generals Problem. ACM Transactions
on Programming Languages and Systems 4(3), 382–401 (1982)
7. Lawless, J.F.: Statistical models and methods for lifetime data, 2nd edn. Wiley, Chichester
(2003)
8. Lynch, N.: Distributed Algorithms. Morgan Kaufmann Press, San Francisco (1997)
9. Ma, Z.S.: New Approaches to Reliability and Survivability with Survival Analysis, Dy-
namic Hybrid Fault Models, and Evolutionary Game Theory. PhD Dissertation, University
of Idaho (2008)
10. Ma, Z.S.: Extended Evolutionary Game Theory Approach to Strategic Information War-
fare Research. Journal of Information Warfare 8(2), 25–43 (2009)
11. Ma, Z.S., Krings, A.W.: Survival Analysis Approach to Reliability Analysis and Prognos-
tics and Health Management (PHM). In: Proc. 29th IEEE–AIAA AeroSpace Conference,
p. 20 (2008a)
12. Ma, Z.S., Krings, A.W.: Dynamic Hybrid Fault Models and their Applications to Wireless
Sensor Networks (WSNs). In: The 11-th ACM/IEEE International Symposium on Model-
ing, Analysis and Simulation of Wireless and Mobile Systems, ACM MSWiM 2008, Van-
couver, Canada, p. 9 (2008)
13. Ma, Z.S., Krings, A.W.: Competing Risks Analysis of Reliability, Survivability, and Prog-
nostics and Health Management (PHM). In: Proc. 29th IEEE–AIAA AeroSpace Confer-
ence, p. 21 (2008)
14. Ma, Z.S., Krings, A.W.: Multivariate Survival Analysis (I): Shared Frailty Approaches to
Reliability and Dependence Modeling. In: Proc. 29th IEEE–AIAA AeroSpace Conference,
p. 21 (2008)
15. Ma, Z.S., Krings, A.W.: Dynamic Populations in Genetic Algorithms, SIGAPP. In: 23rd
Annual ACM Symposium on Applied Computing (ACM SAC 2008), Brazil, March 16-20,
p. 5 (2008)
16. Ma, Z.S.: The Handicap Principle for Trust in Computer Security, the Semantic Web and
Social Networking. In: Liu, W., Luo, X., Wang, F.L., Lei, J. (eds.) WISM 2009. LNCS,
vol. 5854, pp. 458–468. Springer, Heidelberg (2009)
17. Ma, Z.S.: Towards a Population Dynamics Theory for Evolutionary Computing: Learning
from Biological Population Dynamics in Nature. In: Deng, H., Wang, L., Wang, F.L., Lei,
J. (eds.) AICI 2009. LNCS (LNAI), vol. 5855, pp. 195–205. Springer, Heidelberg (2009)
18. Ma, Z.S., Krings, A.W., Sheldon, F.T.: An outline of the three-layer survivability analysis
architecture for modeling strategic information warfare. In: Fifth ACM CSIIRW, Oak
Ridge National Lab (2009)
19. Ma, Z.S.: A New Life System Approach to the Prognostic and Health Management (PHM)
with the Three-Layer Survivability Analysis. In: The 30th IEEE-AIAA AeroSpace Confer-
ence, p. 20 (2009)
20. Ma, Z.S., Krings, A.W.: Insect Sensory Systems Inspired Computing and Communica-
tions. Ad Hoc Networks 7(4), 742–755 (2009)
21. Maynard Smith, J., Price, G.R.: The logic of animal conflict. Nature 246, 15–18 (1973)
22. McGregor, P.K. (ed.): Animal Communication Networks. Cambridge University Press,
Cambridge (2005)
23. Vincent, T.L., Brown, J.L.: Evolutionary Game Theory, Natural Selection and Darwinian
Dynamics, p. 382. Cambridge University Press, Cambridge (2005)
24. Zahavi, A., Zahavi, A.: The Handicap Principle: A Missing Piece of Darwin’s Puzzle. Ox-
ford University Press, Oxford (1997)
Uncovering Overlap Community Structure in
Complex Networks Using Particle Competition
1 Introduction
In recent years, the advances and convergence of computing and communication
have rapidly increased our capacity to generate and collect data. However, most of
this data is in raw form, and it is not useful until the implicit knowledge in it is
discovered and articulated. Data mining is the process of extracting implicit,
potentially useful information from data. It is a multidisciplinary field, drawing
on areas including statistics, machine learning, artificial intelligence, data
management and databases, pattern recognition, information retrieval, neural
networks, data visualization, and others [1,2,3,4,5].
Community detection is one of the data mining problems that arose with the advances
in computing and the increasing interest in complex networks, the field that studies
large-scale networks with non-trivial topological structures, such as social networks,
computer networks, telecommunication networks, transportation networks, and biological
networks [6,7,8]. Many of these networks are found to be divided naturally into
communities or modules, so discovering their community structure has become an
important topic of study [9,10,11,12,13]. Recently, a particle competition approach
was successfully applied to detect communities modeled in networks [14].
The notion of a community in a network is straightforward: it is defined as a
subgraph whose nodes are densely connected among themselves but sparsely
connected with the rest of the network. However, in practice there are common
cases where some nodes in a network can belong to more than one community.
For example, in a friendship social network, individuals often belong to several
communities: their families, their colleagues, their classmates, etc. These nodes
are often called overlap nodes, and most known community detection algorithms
cannot detect them. Therefore, uncovering the overlapping community structure
of complex networks has become an important topic in data mining [15,16,17].
In this paper we present a new clustering technique based on particle walking
and competition. We have extended the model proposed in [14] to output not
only hard labels, but also a fuzzy output (soft labels) for each node in the
network. The continuous-valued output can be seen as the level of membership
of each node in each community. Therefore, the new model is able to uncover
the overlap community structure in complex networks.
The rest of this paper is organized as follows: Section 2 describes the model in
detail, Section 3 shows some experimental results from computer simulations,
and in Section 4 we draw some conclusions.
2 Model Description
The network is represented by an adjacency matrix W, built by using Eq. 1, with W_ij = 1 if there is an edge between nodes v_i and v_j, W_ij = 0 otherwise, and W_ii = 0.
Then, we create a set of particles P = (ρ_1, ρ_2, ..., ρ_c), in which each particle
corresponds to a different community. Each particle ρ_j has a variable ρ_j^ω(t) ∈
[ω_min, ω_max], the particle potential, which characterizes how much the particle can
affect a node at time t; in this paper we set the constants ω_min = 0 and ω_max = 1.
Each node v_i has two variables, v_i^ω(t) and v_i^λ(t). The first variable is a vector
v_i^ω(t) = {v_i^{ω_1}(t), v_i^{ω_2}(t), ..., v_i^{ω_c}(t)} of the same size as P, where each element
v_i^{ω_j}(t) ∈ [ω_min, ω_max] corresponds to the instantaneous level of ownership of node v_i by
particle ρ_j. The sum of the ownership levels of each node is always constant, because a
particle increases its own ownership level and, at the same time, decreases the other
particles' ownership levels. Thus, the following equation always holds:

    Σ_{j=1}^{c} v_i^{ω_j} = ω_max + ω_min (c − 1).    (3)
The second variable is also a vector v_i^λ(t) = {v_i^{λ_1}(t), v_i^{λ_2}(t), ..., v_i^{λ_c}(t)} of the
same size as P, and it also represents ownership levels; but unlike v_i^ω(t), which
denotes the instantaneous ownership levels, v_i^{λ_j}(t) ∈ [0, ∞) denotes long-term
ownership levels, accumulated through the whole process. The particle with the
highest ownership level in a given non-overlap node after the last iteration of the
algorithm is usually the particle which has visited that node the most times, but
that does not always apply to overlap nodes, where the dominant particle could
easily change in the last iterations and would then not correspond to the particle
which has dominated that node for most of the time. Therefore, the new variable
v_i^λ(t) was introduced in order to define the ownership of nodes considering the
whole process. Using a simple analogy, we can say that the champion is not the one
who has won the last games, but rather the one who has won the most games in the
whole championship. Notice that the long-term ownership levels only increase and
their sum is not constant; they are normalized only at the end of the iterations.
We begin the algorithm by setting the initial level of the instantaneous ownership
vector v_i^ω for each particle ρ_j as follows:

    v_i^{ω_j}(0) = ω_min + (ω_max − ω_min)/c,    (4)

which means that all nodes start with the instantaneous ownership levels of all
particles set equally. Meanwhile, the long-term ownership levels v_i^λ(t) are all set to
zero:

    v_i^{λ_j}(0) = 0.    (5)
The initial position of each particle ρ_j^v(0) is set randomly to any node in V, and
the initial potential of each particle is set to its minimum value, as follows:

    ρ_j^ω(0) = ω_min.    (6)
where k is the index of the node being visited by particle ρ_j, so W_ki = 1
if there is an edge between the current node and v_i, and W_ki = 0 otherwise.
The deterministic walk means that the particle will try to move to a neighbor with
probabilities according to the nodes' instantaneous ownership levels, i.e., particle
ρ_j will try to move to a neighbor v_i chosen with probability defined by:

    p(v_i | ρ_j) = (W_ki v_i^{ω_j}) / (Σ_{q=1}^{n} W_kq v_q^{ω_j}),    (8)
where f_ij represents the final membership level of node v_i in community j.
In summary, our algorithm works as follows:
1. Build the adjacency matrix W by using Eq. 1;
2. Set the nodes' ownership levels by using Eq. 4 and Eq. 5;
3. Set the particles' initial positions randomly and their potentials by using Eq. 6;
4. Repeat steps 5 to 8 until convergence or for a pre-defined number of steps;
5. Select the target node for each particle by using Eq. 8 or Eq. 7, for deterministic or random movement respectively;
6. Update the nodes' ownership levels by using Eq. 9;
7. If the random movement was chosen, update the long-term ownership levels by using Eq. 11;
8. Update the particles' potentials by using Eq. 10;
9. Calculate the membership levels (fuzzy classification) by using Eq. 12.
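To make the steps above concrete, the sketch below (in Python) implements the initialization of Eqs. 4-6 and the preferential target selection of Eq. 8. It is only a minimal illustration under the definitions given above, not the authors' implementation: the update rules of Eqs. 7 and 9-11 are not reproduced in this excerpt, and all function and variable names are ours.

import numpy as np

def init_state(W, c, w_min=0.0, w_max=1.0, seed=0):
    """Initial ownership levels (Eqs. 4-5), particle positions and potentials (Eq. 6)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    v_omega = np.full((n, c), w_min + (w_max - w_min) / c)   # Eq. 4: equal instantaneous ownership
    v_lambda = np.zeros((n, c))                              # Eq. 5: long-term ownership starts at zero
    positions = rng.integers(0, n, size=c)                   # each particle starts at a random node
    potentials = np.full(c, w_min)                           # Eq. 6: minimum initial potential
    return v_omega, v_lambda, positions, potentials, rng

def preferential_target(W, v_omega, k, j, rng):
    """Deterministic walk (Eq. 8): particle j at node k picks neighbor v_i with
    probability proportional to W[k, i] * v_omega[i, j]."""
    weights = W[k] * v_omega[:, j]
    total = weights.sum()
    if total == 0:                                           # no neighbors: stay on the current node
        return k
    return rng.choice(len(weights), p=weights / total)

A full run would then alternate random and preferential moves and apply the ownership and potential updates of Eqs. 9-11, which are not shown in this excerpt.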
3 Computer Simulations
In order to test the overlap detection capabilities of our algorithm, we generate
a set of networks with community structure using the method proposed by [13].
Here, all the generated networks have n = 128 nodes, split into four communities
containing 32 nodes each. Pairs of nodes which belongs to the same community
are linked with probability pin , whereas pairs belonging to different communities
are joined with probability pout . The total average node degree k is constant
and set to 16. The value of pout is taken so the average number of links a node
has to nodes of any other community, zout , can be controlled. Meanwhile, the
value of pin is chosen to keep the average node degree k constant. Therefore,
zout /k defines the mixture of the communities, and as zout /k increases from zero,
the communities become more diffuse and harder to identify. In each of these
generated networks we have added a 129th node and created 16 links between
the new node and nodes from the communities, so we could easily determine an
expected “fuzzy” classification for this new node based on the count of its links
with each community.
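For reference, the benchmark just described (four equal communities, average degree k = 16, controlled zout) can be generated roughly as follows. This is only a sketch of the construction of [13] as summarized above, with our own function and variable names.

import numpy as np

def community_benchmark(n=128, n_comm=4, k=16, zout=2.0, seed=0):
    """Four equal communities; each node has on average `zout` links to other
    communities and `k - zout` links inside its own community."""
    rng = np.random.default_rng(seed)
    size = n // n_comm
    comm = np.repeat(np.arange(n_comm), size)
    p_in = (k - zout) / (size - 1)        # intra-community link probability
    p_out = zout / (n - size)             # inter-community link probability
    W = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if comm[i] == comm[j] else p_out
            if rng.random() < p:
                W[i, j] = W[j, i] = 1
    return W, comm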
The networks were generated with zout /k = 0.125, 0.250, and 0.375 and the
results are shown in Tables 1, 2, and 3 respectively. The first column of these
tables shows the number of links the 129th node has to communities A, B, C, and
D, respectively. Notice that in each configuration the 129th node has different
overlap levels, varying from the case where it fully belongs to a single community
up to the case where it belongs to the four communities almost equally. From the
2nd to the 5th column we have the fuzzy degrees of membership of the 129th node
in communities A, B, C, and D respectively, obtained by our algorithm.
The presented values are the average of 100 realizations with different networks.
For these simulations, the parameters were set as follows: pdet = 0.5, ∆v = 0.4
and ∆ρ = 0.9.
The results show that the method was able to accurately identify the fuzzy
communities of the overlap nodes. The accuracy decreases as zout/k increases;
this was expected, since a higher zout/k means that the communities are more
diffuse and the observed node can be connected to nodes that are overlap nodes
themselves.
Based on these data, we have created an overlap measure in order to easily
illustrate the application of the algorithm to more complex networks with many
overlap nodes. The overlap index oi for a node vi is defined as follows:

    oi = f_{ij**} / f_{ij*},    (13)

where j* = arg max_j f_ij, j** = arg max_{j, j ≠ j*} f_ij, and oi ∈ [0, 1];
oi = 0 means complete confidence that the node belongs to a single community,
while oi = 1 means the node is completely undefined between two or more
communities.
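Eq. 13 is straightforward to compute from a node's fuzzy membership vector; a small helper in Python (names are ours):

import numpy as np

def overlap_index(f_i):
    """Overlap index o_i of Eq. 13: second-largest membership divided by the largest.
    Returns 0 for a node confidently in one community, 1 for a fully undefined node."""
    f = np.sort(np.asarray(f_i, dtype=float))[::-1]
    return 0.0 if f[0] == 0 else float(f[1] / f[0])

# e.g. memberships (0.7, 0.2, 0.1, 0.0) give o_i = 0.2 / 0.7 = 0.29 (approximately)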
Fig. 1. Problem with 1000 elements split into four communities; colors represent the
overlap index of each node, as detected by the proposed method
Fig. 2. The karate club network; colors represent the overlap index of each node, as
detected by the proposed method
Then, we applied the algorithm to a problem with 1000 elements, split into four
communities with 250 elements each. There are four Gaussian kernels in a
two-dimensional plane and the elements are distributed around them. To build
the network, each element is transformed into a network node. Two elements
i and j are connected if their Euclidean distance d(i, j) < 1. The algorithm
parameters were set as follows: pdet = 0.6, ∆v = 0.4 and ∆ρ = 0.9. In Figure 1
the overlap index of each node is indicated by its color. It is easy to see that
the closer the nodes are to the community borders, the higher their respective
overlap indices.
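The network construction used in this experiment (connect two elements when their Euclidean distance is below 1) can be sketched as follows, assuming the 1000 elements are given as 2-D points; the function and variable names are ours.

import numpy as np

def build_distance_graph(points, threshold=1.0):
    """Adjacency matrix W: W[i, j] = 1 iff the Euclidean distance d(i, j) < threshold."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)  # pairwise distances
    W = (d < threshold).astype(int)
    np.fill_diagonal(W, 0)  # no self-loops (W_ii = 0)
    return W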
Finally, the algorithm was applied to the well-known Zachary's Karate Club network
[18] and the results are shown in Figure 2. The algorithm parameters were
set as follows: pdet = 0.6, ∆v = 0.4 and ∆ρ = 0.9. Again, the overlap index of
each node is indicated by its color. Table 4 shows the fuzzy classification of all
the nodes of this network.
Table 4. Fuzzy classification of the Zachary’s Karate Club Network achieved by the
proposed method
4 Conclusions
This paper presents a new clustering technique using combined random-deterministic
walking and competition among particles, where each particle corresponds to a class
of the problem. The algorithm outputs not only hard labels, but also soft labels
(fuzzy values) for each node in the network, which correspond to the levels of
membership of that node in each community. Computer simulations were performed on
both synthetic and real data, and the results show that our model is a promising
mechanism to uncover overlap community structure in complex networks.
Acknowledgements
This work is supported by the State of São Paulo Research Foundation (FAPESP)
and the Brazilian National Council of Technological and Scientific Development
(CNPq).
References
1. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan
Kaufmann, San Francisco (2006)
2. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Tech-
niques, 2nd edn. Morgan Kauffman, San Francisco (2005)
3. Hand, D.J., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press, Cam-
bridge (2001)
4. Weiss, S.M., Indurkhya, N.: Predictive Data Mining: A Practical Guide. Morgan
Kaufmann, San Francisco (1998)
5. Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Pear-
son/Addison Wesley (2005)
6. Newman, M.E.J.: The structure and function of complex networks. SIAM Re-
view 45, 167–256 (2003)
7. Dorogovtsev, S., Mendes, F.: Evolution of Networks: From Biological Nets to the
Internet and WWW. Oxford University Press, Oxford (2003)
8. Bornholdt, S., Schuster, H.: Handbook of Graphs and Networks: From the Genome
to the Internet. Wiley-VCH (2006)
9. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in net-
works. Physical Review E 69, 026113 (1–15) (2004)
10. Newman, M.: Modularity and community structure in networks. Proceedings of
the National Academy of Science of the United States of America 103, 8577–8582
(2006)
11. Duch, J., Arenas, A.: Community detection in complex networks using extremal
optimization. Physical Review E 72, 027104 (1–4) (2006)
12. Reichardt, J., Bornholdt, S.: Detecting fuzzy community structures in complex
networks with a Potts model. Physical Review Letters 93, 218701 (1–4) (2004)
13. Danon, L., Díaz-Guilera, A., Duch, J., Arenas, A.: Comparing community structure
identification. Journal of Statistical Mechanics: Theory and Experiment 9, P09008
(1–10) (2005)
14. Quiles, M.G., Zhao, L., Alonso, R.L., Romero, R.A.F.: Particle competition for
complex network community detection. Chaos 18, 033107 (1–10) (2008)
15. Zhang, S., Wang, R.S., Zhang, X.S.: Identification of overlapping community struc-
ture in complex networks using fuzzy c-means clustering. Physica A Statistical
Mechanics and its Applications 374, 483–490 (2007)
16. Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community
structure of complex networks in nature and society. Nature 435, 814–818 (2005)
17. Zhang, S., Wang, R.S., Zhang, X.S.: Uncovering fuzzy community structure in
complex networks. Physical Review E 76, 046103 (1–7) (2007)
18. Zachary, W.W.: An information flow model for conflict and fission in small groups.
Journal of Anthropological Research 33, 452–473 (1977)
Semi-supervised Classification Based on Clustering
Ensembles
Abstract. In many real-world applications, there exist only very few labeled
samples, while a large number of unlabeled samples are available. Therefore, it
is difficult for some traditional semi-supervised algorithms to generate useful
classifiers to evaluate the labeling confidence of unlabeled samples. In this paper,
a new semi-supervised classification method based on clustering ensembles, named
SSCCE, is proposed. It takes advantage of clustering ensembles to generate multiple
partitions for a given dataset, and then uses the clustering consistency index
to determine the labeling confidence of unlabeled samples. The algorithm can
overcome some defects of traditional semi-supervised classification algorithms,
and enhance the performance of the hypothesis trained on very few labeled
samples by exploiting a large number of unlabeled samples. Experiments
carried out on ten public data sets from the UCI machine learning repository show
that this method is effective and feasible.
1 Introduction
In traditional supervised learning, a large number of labeled samples are required to
learn a well-performing hypothesis, which is used to predict the class labels of
unlabeled samples. In many real-world applications, such as toxicity prediction of
chemical products and computer-aided medical diagnosis, labeled samples are very
difficult to obtain because labeling samples requires much time and effort [1].
However, a large number of unlabeled samples are easily available. If only very few
labeled samples are used, it is difficult to generate a good classification model,
and a lot of available unlabeled samples are wasted. Thus, when the labeled samples
are very limited, how to enhance the performance of the hypothesis trained on very
few labeled samples, by making use of a large number of unlabeled samples, has
become a very hot topic in machine learning and data mining [1].
Semi-supervised learning is one of the main approaches to exploiting unlabeled
samples [1]. This paper mainly focuses on semi-supervised classification algorithms,
which aim to improve the performance of the classifier by making full use of the
labeled and unlabeled data. One kind of semi-supervised classification learning
approach is called the pre-labelling approach [2]. Such a method firstly trains an initial
classifier on a few labeled samples, and then the unlabeled samples are utilized to
improve the performance of the classifier. For example, self-training [5] is a
well-known pre-labelling method, in which only one classifier is trained on the
labeled training set. In some pre-labelling methods such as co-training [7],
tri-training [9], and co-forest [6], more classifiers are used in the training process.
However, when there exist only very few labeled samples, it is difficult for some
pre-labelling approaches [5, 6, 7, 9] to generate a useful initial classifier to
estimate the labeling confidence of unlabeled samples, so that a certain number of
unlabeled samples may be misclassified. Additionally, some algorithms place
constraints on the base learner [6, 8], or require sufficient and redundant attribute
sets [7].
Another kind of semi-supervised classification learning method is called the
post-labelling approach [2, 17]. Such a method firstly generates a data model on all
the available data by applying a data density estimation method or a clustering
algorithm, and then labels the samples in every cluster or estimates the class
conditional densities. However, using only a single clustering algorithm may generate
a poorly performing data model, and thus affect the performance of the final classifier.
In this paper, a new post-labelling algorithm named SSCCE, i.e., Semi-Supervised
Classification based on Clustering Ensembles, is proposed. Based on clustering
ensembles, SSCCE firstly generates multiple partitions of all the samples, and matches
the clusters in different partitions. Then, by analyzing the multiple clustering
results, the unlabeled samples with high labeling confidence are selected, labeled,
and added into the labeled training set. Finally, a learner is trained on the enlarged
labeled training set. Experiments carried out on ten UCI data sets show that this
method is effective and feasible. SSCCE is easy to understand: it uses the clustering
consistency index to estimate the labeling confidence of unlabeled samples, and
requires neither special classifiers nor sufficient and redundant attribute sets.
The rest of the paper is organized as follows: Section 2 gives a brief review on
semi-supervised learning and clustering ensembles. Section 3 presents the SSCCE
algorithm. Section 4 reports the experimental results carried out on ten UCI data sets.
Section 5 concludes this paper and raises several issues for future work.
2 Related Work
2.1 Semi-supervised Learning
initial labeled samples are not large enough to train a high-accuracy classifier, which
is likely to lead to the misclassification of a certain number of unlabeled samples.
In [6], Li et al. proposed the co-forest algorithm, which incorporates the Random
Forest [18] algorithm to extend the co-training paradigm. But this algorithm tends to
under-estimate the error rates of the concomitant ensembles, and it places constraints
on the base learner and the ensemble size.
The post-labelling approach [2, 17] firstly generates a data model on all the
available data by applying a data density estimation method or a clustering algorithm,
and then labels the samples in every cluster or estimates the class conditional
densities. An unlabeled sample is labeled mainly according to its relative position
with respect to the labeled data. This kind of algorithm is formally described in [2].
In [17], Parzen windows are used for estimating the class conditional distribution and
a genetic algorithm is applied to maximize the posterior classification of the labeled
samples.
…, where C_j^i is the j-th cluster in the partition π_i, and n_i is the number of clusters in π_i.
3 SSCCE Algorithm
SSCCE firstly generates multiple different partitions of all the samples using the
k-means algorithm [10], and then matches the clusters in different partitions.
Secondly, the samples with high labeling confidence are selected, and the cluster
labels of these selected unlabeled samples are matched with the real class labels of
the labeled samples. Then the selected unlabeled samples are labeled and added into
the initial labeled training set. Finally, a hypothesis is trained on the enlarged
training set. The SSCCE algorithm is summarized as follows.
Algorithm: SSCCE
Input: labeled training set L = {(x1, y1), (x2, y2), ..., (x|L|, y|L|)};
       unlabeled training set U = {x1′, x2′, ..., x|U|′};
       X = L ∪ U;
       H: number of partitions (i.e., ensemble size);
       πa: reference partition;
       α: threshold value of the clustering consistency index;
       k: number of cluster centers of the k-means algorithm;
Output: a hypothesis h.
Step 1: for i = 1 to H
        1.1. Randomly select k initial cluster centers from the data set X;
        1.2. Use the k-means algorithm to generate partition πi on the data set X;
        end for
Step 2: Match the clusters in different partitions according to the cluster matching
        method described in Section 3.2.
Step 3: Compute the clustering consistency index (CI) and clustering consistency label
        (CL) of each sample in the data set X.
Step 4: Select the samples with CI > α from the data set X; these samples are divided
        into clusters C1, C2, ..., Cm according to their corresponding clustering
        consistency labels CL1, CL2, ..., CLm (m is the number of labels).
Step 5: Using the method described in Section 3.4, match the corresponding clustering
        consistency labels CL1, CL2, ..., CLm with the real class labels of the
        labeled training set L.
Step 6: Assign the re-matched clustering consistency labels to all the selected
        unlabeled samples in clusters C1, C2, ..., Cm, and add them into the labeled
        training set L.
Step 7: A hypothesis h is trained on the enlarged labeled training set L.
According to the above outline of SSCCE, the following three main issues need to be
addressed: 1) how to match the clusters in different partitions; 2) how to define the
clustering consistency index (CI) and the clustering consistency label (CL); and
3) how to match the cluster labels of unlabeled samples with the real class labels.
We will discuss these issues in the following sections.
In the multiple different partitions, similar clusters in different partitions may
be assigned different labels. For example, suppose the cluster labels of five samples
are (1, 2, 2, 1, 2) and (2, 1, 1, 2, 1) in two different partitions, respectively. In
fact, these two partitions are exactly the same, and are only assigned different
cluster labels. How to match clusters in different partitions is one of the most
important problems that needs to be resolved.
Based on the methods proposed in [12, 19], SSCCE uses the Jaccard index to determine
the best matching pair of clusters in different partitions after converting each
cluster C_j^i (the j-th cluster in partition π_i) into a binary-valued vector X_j^i.
The l-th entry of X_j^i, 0 or 1, denotes the absence or presence of the l-th sample
in the cluster C_j^i. Note that any partition can be arbitrarily selected from the H
partitions as the reference partition, so that the clusters of the other partitions
are matched with those of the selected reference partition.
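The matching step can be sketched as follows, assuming every cluster has already been converted to the binary indicator vector described above. The greedy highest-Jaccard-first pairing and the names are ours; the exact procedure of [12, 19] is not reproduced in this excerpt.

import numpy as np

def jaccard(x, y):
    """Jaccard index of two binary indicator vectors."""
    union = np.logical_or(x, y).sum()
    return np.logical_and(x, y).sum() / union if union else 0.0

def match_to_reference(ref_clusters, clusters):
    """Map each cluster of a partition to its best-matching reference cluster
    (greedy, highest Jaccard first). Returns {cluster index: reference index}."""
    pairs = sorted(((jaccard(r, c), ri, ci)
                    for ri, r in enumerate(ref_clusters)
                    for ci, c in enumerate(clusters)), reverse=True)
    mapping, used_ref = {}, set()
    for _, ri, ci in pairs:
        if ri not in used_ref and ci not in mapping:
            mapping[ci] = ri
            used_ref.add(ri)
    return mapping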
After matching the clusters in different partitions, it is easy to find that some
samples are steadily assigned to the same cluster, whereas for other samples the
assigned cluster changes frequently across the H partitions. In this paper, the
samples with a stable cluster assignment are considered to be the ones with high
labeling confidence, and these unlabeled samples are then selected for labeling.
Therefore, SSCCE introduces the clustering consistency index (CI) to measure the
labeling confidence. The clustering consistency index (CI) [16] is defined as the
ratio of the maximal number of times a sample is assigned to a certain cluster to
the total number of partitions H. The clustering consistency index CI(x) of a sample
x is defined by Eq. (1).
    CI(x) = max_{C ∈ cluster labels} { (1/H) Σ_{i=1}^{H} δ(π_i(x), C) },
    with δ(a, b) = 1 if a = b, and 0 otherwise.    (1)
By analyzing these partitions π 1 , π 2 ,..., π H , the samples with the clustering consis-
tency index greater than the given threshold value α are considered to be steadily
assigned to a certain cluster. The label of this cluster is defined as clustering consis-
tency label (CL). For a sample x , if CI( x ) > α , then its corresponding clustering
consistency label CL( x ) is defined by Eq. (2).
    CL(x) = arg max_{C ∈ cluster labels} Σ_{i=1}^{H} δ(π_i(x), C).    (2)
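Once all partitions have been relabeled against the reference partition, Eqs. (1) and (2) reduce to a per-sample majority count over the H cluster labels. A small sketch with our own names:

import numpy as np

def consistency_index_and_label(labels):
    """labels: (H, N) array of matched cluster labels for N samples over H partitions.
    Returns CI (Eq. 1) and CL (Eq. 2) for each sample."""
    H, N = labels.shape
    ci = np.empty(N)
    cl = np.empty(N, dtype=labels.dtype)
    for s in range(N):
        values, counts = np.unique(labels[:, s], return_counts=True)
        best = counts.argmax()
        ci[s] = counts[best] / H      # fraction of partitions agreeing on the majority cluster
        cl[s] = values[best]          # that majority cluster label
    return ci, cl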
The greater the clustering consistency index of a sample, the higher the labeling
confidence of the sample. SSCCE needs to label the samples with a high clustering
consistency index. However, their corresponding clustering consistency labels do not
necessarily match the real class labels of the labeled samples. Therefore, we should
match the clustering consistency labels with the real class labels.
Suppose that the clustering consistency labels of the selected samples with high
clustering consistency index are CL1, CL2, ..., CLm, respectively (m is the total
number of CLs). Then the clusters C1, C2, ..., Cm containing only the selected
samples, corresponding to the clustering consistency labels CL1, CL2, ..., CLm, may
fall into the following three cases: (1) both labeled samples and unlabeled samples
are contained; (2) only unlabeled samples are contained; (3) only labeled samples are
contained. The third case will not be taken into account because there are no
unlabeled samples in such clusters. The first two cases are analyzed as follows.
In the first case, according to the principle of majority vote, if cluster Ci
( i ∈ {1,...., m} ) contains both labeled samples and unlabeled samples, then its corre-
sponding label CLi is re-matched with the majority label of the labeled samples in
cluster Ci . In the second case, according to k-nearest-neighbor [13], if cluster Ci
contains only the unlabeled samples, then its corresponding label CLi is re-matched
with the majority label of the labeled nearest neighbors of all the unlabeled samples in
cluster Ci . Finally, all the unlabeled samples in cluster Ci are labeled with the re-
matched clustering consistency labels, and then added into the labeled training set.
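The two cases above can be sketched in a few lines, under the assumption that the class labels of the labeled samples are stored in a dictionary and that plain Euclidean distance is used for the 1-nearest-neighbor case; all names are ours.

import numpy as np
from collections import Counter

def rematch_cluster_label(cluster_idx, labeled_idx, y, X):
    """Class label assigned to one selected cluster.
    cluster_idx: sample indices in the cluster; labeled_idx: indices of labeled samples;
    y: {index: class label}; X: feature matrix."""
    labeled_in_cluster = [i for i in cluster_idx if i in y]
    if labeled_in_cluster:                               # case 1: majority vote of labeled members
        return Counter(y[i] for i in labeled_in_cluster).most_common(1)[0][0]
    votes = []                                           # case 2: 1-NN label of each unlabeled member
    for i in cluster_idx:
        nearest = min(labeled_idx, key=lambda l: np.linalg.norm(X[i] - X[l]))
        votes.append(y[nearest])
    return Counter(votes).most_common(1)[0][0]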
4 Experiments
In order to test the performance of SSCCE, we use ten public data sets from the UCI
machine learning repository [4], named solar-flare, breast-w, Australian, credit-a,
tic-tac-toe, Liver, heart-statlog, heart-c, colic, and auto-mpg, respectively. For
each data set, 10-fold cross-validation is employed for evaluation. In each fold, the
training data are randomly partitioned into a labeled training set L and an unlabeled
training set U under a label rate of only 1%, i.e., just 1% of the training data (of
the 90% of the data) are used as labeled samples while the remaining 99% of the
training data are used as unlabeled samples. The entire 10-fold cross-validation
process is repeated ten times. Each time, the ordering of the samples is randomized,
and then each tested algorithm is run to generate an experimental result using 10-fold
cross-validation. Finally, the ten results of 10-fold cross-validation are averaged
for each test.
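The labeled/unlabeled split used in this protocol can be reproduced roughly as below. This is only a sketch written with scikit-learn, which is not the toolkit used in the paper (the experiments are run in Weka); the names and the stratification choice are ours.

import numpy as np
from sklearn.model_selection import StratifiedKFold

def label_rate_splits(X, y, label_rate=0.01, n_splits=10, seed=0):
    """For each CV fold, keep labels for only `label_rate` of the training data;
    yields (labeled indices L, unlabeled indices U, test indices)."""
    rng = np.random.default_rng(seed)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in skf.split(X, y):
        n_labeled = max(1, int(round(label_rate * len(train_idx))))
        perm = rng.permutation(train_idx)
        yield perm[:n_labeled], perm[n_labeled:], test_idx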
For SSCCE, the SimpleKMeans algorithm from the Weka software package [3] is used to
generate multiple partitions, and the number of clusters is set to the number of
classes of the labeled training data. When using the cluster matching algorithm, the
reference partition is set to partition π1. 1-nearest-neighbor is used to deal with
the second case of Section 3.4, in which a cluster contains only unlabeled samples.
For comparison, we evaluate the performance of SSCCE, the supervised learning
algorithm trained only on the labeled training set L, co-forest [6], and
self-training [5]. For SSCCE, the ensemble size H of the clustering ensemble is set
to 100. In co-forest, the number of classifiers is set to 10. For a fair comparison,
Random Forest [18] in WEKA [3] is used as the classifier in all the above algorithms.
In both co-forest and self-training, the threshold value of labeling confidence is
set to 0.75. For each algorithm, the average classification accuracy, i.e., the ratio
of the number of test samples correctly classified to the total number of test
samples, is obtained with 10 runs of 10-fold cross-validation on each data set.
Table 1 shows the average classification accuracies of the supervised learning
algorithm using Random Forest (denoted by RF), SSCCE using Random Forest (denoted by
SSCCE (RF)), co-forest, and self-training using Random Forest (denoted by
self-training (RF)). The last row, Average, shows the average results over all the
experimental data sets using Random Forest. The highest average accuracy is shown in
bold on each data set.
Table 1 shows that SSCCE benefits much from the unlabeled samples, since the
performance of SSCCE (RF) is better than RF, co-forest, and self-training (RF) on 7
data sets. Furthermore, Table 1 also shows that SSCCE outperforms self-training on
all the data sets, and the average result of SSCCE (RF) over all the experimental
data sets is the highest. Therefore, Table 1 supports that SSCCE is effective and
feasible, and can improve the performance of the classifier trained on very few
labeled samples by utilizing a large number of unlabeled data under a label rate of 1%.
For further analysis of SSCCE, Table 2 tabulates the average threshold value α of
the clustering consistency index. In the experiments, the threshold value α is set
to the arithmetic mean of the clustering consistency indices of all the training data
on each data set. The column add_ratio shows the ratio of the number of unlabeled
samples added into the labeled training set L to the total number of unlabeled
samples. In addition, the column one-k-means shows the average classification accuracy
of the extreme case of SSCCE using Random Forest when H = 1, i.e., the k-means
algorithm is run only once to generate a partition, and then all the unlabeled samples
are given the matched class labels and added into the training set L. In Table 2, the
SSCCE (RF) results are the same as those in Table 1. All the parameters of the
algorithms remain unchanged.
From Table 2, it can be observed that on average 47.71% of the unlabeled samples are
added into the labeled training set L, and when H = 100 the average threshold value
α is about 0.46. It can be concluded that SSCCE can effectively select a certain
number of unlabeled samples with high labeling confidence. Moreover, the average
accuracies of SSCCE when H = 100 are better than those of one-k-means on 9 data sets.
Therefore, this validates that clustering ensembles can outperform individual
clustering and thus enhance the performance of SSCCE.
In the experiments, we also use J48 [32] from the Weka software package [3] to
observe the impact of different classifiers on the average classification accuracies
of the compared algorithms. The results show that the performance of SSCCE (J48) is
better than the supervised algorithm J48 and self-training using J48 on 6 of the 10
data sets, and the average result of SSCCE using J48 over all the experimental data
sets is the highest.
Therefore, SSCCE is effective when using either Random Forest or J48, whereas
co-forest and self-training fail to perform well on most data sets. One possible
explanation is that when the number of labeled samples is very small, i.e., the label
rate is only 1%, it is difficult to generate a useful initial classifier to estimate
the labeling confidence of unlabeled samples, and a certain number of noisy samples
can be added into the initial labeled training set by co-forest and self-training.
However, SSCCE can benefit much from a large number of unlabeled samples based on
clustering ensembles, which can select a certain number of unlabeled samples with
high labeling confidence to label, and thus enhance the performance of the initial
classifier.
Table 3. Average classification accuracies of SSCCE (RF) over different ensemble sizes (%)
Note that in the previous experiments, the ensemble size H of SSCCE was fixed to 100,
but a different ensemble size might affect the performance of SSCCE. Table 3
tabulates the performance of SSCCE using Random Forest with H = 50, H = 100, and
H = 150. From Table 3, it can be seen that the average accuracy (last row) for H = 50
is better than that for H = 100 and H = 150. The ensemble sizes giving the highest
average accuracies of SSCCE (RF) are the same on most data sets, except heart-statlog.
5 Conclusion
In this paper, a new semi-supervised classification method based on clustering
ensembles, named SSCCE, is proposed. It firstly generates multiple different
partitions of all the samples using the k-means algorithm with different initial
cluster centers each time, and then matches the clusters in different partitions.
Moreover, the samples with a high clustering consistency index are selected, and
their corresponding clustering consistency labels are matched with the real class
labels. Then the selected unlabeled samples are labeled with the re-matched
clustering consistency labels. Finally, these unlabeled samples are added into the
initial labeled training set, and a hypothesis is trained on the enlarged labeled
training set.
Using an easily understandable method of estimating the labeling confidence of
unlabeled samples, SSCCE can make full use of the labeled and unlabeled samples,
and overcome some defects of traditional semi-supervised learning algorithms.
Experiments carried out on ten UCI data sets show that SSCCE outperforms the
supervised learning algorithm trained only on the labeled training data. Compared
with traditional semi-supervised learning algorithms such as co-forest and
self-training, SSCCE can also perform better on most data sets when the number of
labeled training data is very small.
The ensemble size H, i.e., the number of partitions, might impact the performance
of SSCCE. Therefore, how to overcome the effect of the ensemble size and further
improve the performance of SSCCE will be studied in future work.
References
1. Zhou, Z.H., Li, M.: Semi-Supervised Regression with Co-Training Style Algorithms. IEEE
Transactions on Knowledge and Data Engineering 19, 1479–1493 (2007)
2. Gabrys, B., Petrakieva, L.: Combining Labelled and Unlabelled Data in the Design of Pat-
tern Classification Systems. International Journal of Approximate Reasoning, 251–273
(2004)
3. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques,
2nd edn. Morgan Kaufmann, San Francisco (2005)
4. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. School of Information
and Computer Science. University of California, Irvine, CA,
http://www.ics.uci.edu/~mlearn/MLRepository.html
5. Nigam, K., Ghani, R.: Analyzing the Effectiveness and Applicability of Co-Training. In:
Proceedings of the 9th International Conference on Information and Knowledge Manage-
ment, pp. 86–93 (2000)
6. Li, M., Zhou, Z.H.: Improve Computer-Aided Diagnosis with Machine Learning Tech-
niques Using Undiagnosed Samples. IEEE Transactions on Systems, Man and Cybernetics
– Part A: Systems and Humans 37, 1088–1098 (2007)
7. Blum, A., Mitchell, T.: Combining Labeled and Unlabeled Data with Co-Training. In: Pro-
ceedings of the 11th Annual Conference on Computational Learning Theory (COLT
1998), MI, Wisconsin, pp. 92–100 (1998)
8. Goldman, S., Zhou, Y.: Enhancing Supervised Learning with Unlabeled Data. In: Proceed-
ings of the 17th International Conference on Machine Learning (ICML 2000), CA, San
Francisco, pp. 327–334 (2000)
9. Zhou, Z.H., Li, M.: Tri-training: Exploiting Unlabeled Data Using Three Classifiers. IEEE
Transactions on Knowledge and Data Engineering 17, 1529–1541 (2005)
10. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kauf-
mann, San Francisco (2006)
11. Topchy, A., Jain, A.K., Punch, W.: A Mixture Model for Clustering Ensembles. In: Pro-
ceeding of the 4th SIAM International Conference on Data Mining, pp. 379–390 (2004)
12. Fred, A.: Finding Consistent Clusters in Data Partitions. In: Kittler, J., Roli, F. (eds.) MCS
2001. LNCS, vol. 2096, pp. 309–318. Springer, Heidelberg (2001)
13. Roussopoulos, N., Kelly, S., Vincent, F.: Nearest Neighbor Queries. In: Proceedings of the
1995 ACM SIGMOD International Conference on Management of Data, pp. 71–79 (1995)
14. Li, M., Zhou, Z.H.: SETRED: Self-Training with Editing. In: Ho, T.-B., Cheung, D., Liu,
H. (eds.) PAKDD 2005. LNCS (LNAI), vol. 3518, pp. 611–621. Springer, Heidelberg
(2005)
15. Dubes, R., Jain, A.K.: Clustering Techniques: the User’s Dilemma. Pattern Recogni-
tion 41, 578–588 (1998)
16. Topchy, A., Minaei-Bidgoli, B., Jain, A.K., Punch, W.F.: Adaptive Clustering Ensembles.
In: Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004),
vol. 1, pp. 272–275 (2004)
17. Kothari, R., Jain, V.: Learning from Labeled and Unlabeled Data. In: IEEE World Con-
gress on Computational Intelligence, IEEE International Joint Conference on Neural Net-
works, HI, Honolulu, USA, pp. 1468–1474 (2002)
18. Breiman, L.: Random Forests. Machine Learning 45, 5–32 (2001)
19. Zhou, Z.H., Tang, W.: Clusterer Ensemble. Knowledge-Based Systems 9, 77–83 (2006)
Research on a Novel Data Mining Method Based on the
Rough Sets and Neural Network
Weijin Jiang1, Yusheng Xu2, Jing He1, Dejia Shi1, and Yuhui Xu1
1 School of Computer, Hunan University of Commerce, Changsha 410205, China
jwjnudt@163.com
2 China North Optical-Electrical Technology, Beijing 100000, China
yshxu520@163.com
Abstract. As both rough set theory and neural networks have special advantages and
existing problems in data mining, this paper presents a combined algorithm based on
rough set theory and BP neural networks. The algorithm reduces the data from the data
warehouse by using the reduction capability of rough sets, and then passes the
reduced data to the BP neural network as training data. Through data reduction, the
training data become clearer and the scale of the neural network can be simplified;
at the same time, the neural network can ease the sensitivity of rough sets to noisy
data. This paper also presents a cost function to express the relationship between
the amount of training data and the precision of the neural network, and to supply a
standard for the transition from rough set reduction to neural network training.
1 Introduction
As a product of the combination of many kinds of knowledge and technology, data
mining (DM) is a technology that arose in the 1990s. In August 1989, in Detroit, USA,
the concept of KDD (Knowledge Discovery in Databases) was first presented at the
Eleventh International Joint Conference on Artificial Intelligence [1, 2]. In 1995,
the ACM proposed the term Data Mining, vividly describing a large database as a
valuable mineral resource from which useful information can be found by efficient
knowledge-discovery technology [3, 4]. Because data mining is the key step of KDD,
people usually do not distinguish Knowledge Discovery in Databases (KDD) from Data
Mining (DM).
Data mining is a technology used to analyze observed data sets, usually very large
ones, with the intention of finding unknown relations and summarizing the data in an
efficient way that can be understood by the data owner. The common algorithms and
theories of data mining include rough sets [5], artificial neural networks, decision
trees, genetic algorithms, etc. This paper mainly discusses rough set theory and BP
neural networks [6].
Both rough sets and BP neural networks can perform classification in data mining.
The advantage of rough sets is that they are good at parallel execution, at
describing uncertain information, and at handling redundant data; the problem is that
they are sensitive to noise in the objects [7, 8]. The BP neural network is the most
popular neural network; its main merits are high precision and insensitivity to
noise, but its problem is that redundant data can easily cause over-training of the
network, and, in addition, the influence of the network scale and the amount of
training samples on the training speed and training time is a headache [9].
Considering the merits and demerits of rough set theory and BP neural networks,
this paper proposes a new data mining algorithm that combines rough set theory and
BP neural networks [10]. The algorithm overcomes the sensitivity of rough sets to
noisy data; meanwhile, it reduces the training time of the BP neural network and
ensures convergence, which improves efficiency considerably.
    γ_R(B) = card(POS_R(B)) / card(U),
    POS_R(B) = ∪_{X ∈ U/IND(B)} R_(X),
where card(·) denotes the cardinality of a set, R_(X) is the R-lower approximation of
X, and POS_R(B) is the positive region of the attribute set R in the classification
U/IND(B).
Definition 6 (the importance of an attribute). Different attributes have different
influences on the dependency relation between the condition and decision attributes.
The significance of attribute a in R for the classification U/IND(B) is:

    SGF(a, R, B) = γ_R(B) − γ_{R−{a}}(B).

The importance of attribute a is relative: it depends on the attribute sets B and R,
so the importance of an attribute may differ in different situations. If D is taken
as the decision attribute set, then the meaning of SGF(a, R, D) is that, when
attribute a is included in the attribute set R, the dependency degree between R and
D changes, and this change reflects the importance of attribute a.
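The two quantities just defined are easy to compute directly from a decision table. The sketch below is a minimal illustration, assuming the table is stored as a list of dictionaries mapping attribute names to values; the equivalence-class construction and all names are ours, not the authors'.

from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes of IND(attrs): group row indices by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, R, B):
    """gamma_R(B) = card(POS_R(B)) / card(U): an object is in the positive region
    when its whole R-equivalence class lies inside a single B-equivalence class."""
    b_classes = [set(block) for block in partition(rows, B)]
    pos = sum(len(block) for block in partition(rows, R)
              if any(set(block) <= b for b in b_classes))
    return pos / len(rows)

def significance(rows, a, R, B):
    """SGF(a, R, B) = gamma_R(B) - gamma_{R - {a}}(B), following the definition above."""
    return dependency(rows, R, B) - dependency(rows, [x for x in R if x != a], B)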
    k < Σ_{i=0}^{n} C(n1, i),    (1)

where k is the number of training samples, n is the number of input neurons, and n1
is the number of hidden-layer neurons; when i > n1, C(n1, i) = 0.
    n1 = √(n + m) + a,    (2)

where m is the number of output neurons, n is the number of input neurons, a is a
constant between 1 and 10, and n1 is the number of hidden-layer neurons.

    n1 = log2 n    (3)
neurons are adopted to discriminate sex. As a result, the number of output neurons
must be equal to or greater than log2 X, where X represents the number of required
classification patterns. In common situations, the number of output neurons is equal
to the number of classification patterns.
Fig. 1.
The difficulty of the algorithm lies in the termination condition of rough set
reduction; in other words, it is hard to decide how much training data should be
selected for the BP neural network. As stated in the third part, the amount of
training data has a big influence on the training time of the BP neural network. At
present, there is no definite way to decide the amount of training data, only a rough
evaluation method: the amount of training data should be about twice the number of
connection weights. For example, if a BP neural network has n input nodes, n1
hidden-layer nodes and m output nodes, then it needs 2×(n×n1 + n1×m) training
samples [11].
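The rule of thumb cited from [11] is easy to evaluate; the tiny helper below uses our own names, with the cost coefficient exposed as an optional factor (the application example later in the paper arrives at 4×2×(4×4 + 4×3) = 224 after also applying a cost coefficient).

def required_training_samples(n_inputs, n_hidden, n_outputs, cost=1):
    """Rule of thumb: about twice the number of connection weights,
    optionally scaled by a cost coefficient."""
    return cost * 2 * (n_inputs * n_hidden + n_hidden * n_outputs)

# 4-4-3 network: 2 * (4*4 + 4*3) = 56 samples by the bare rule;
# with a cost coefficient of 4 this becomes 224, as in the application example.
print(required_training_samples(4, 4, 3), required_training_samples(4, 4, 3, cost=4))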
The selection of the amount of training data is related to the precision of the
neural network. Usually, the error is used to reflect the learning capability. The
error is defined as:
    e = (1/(m·n)) Σ_{i=1}^{m} Σ_{j=1}^{n} (d_ij − y_ij)²,

where m denotes the number of samples in the training set and n is the number of
output units of the neural network.
When the amount of training data increases, the error becomes smaller; as a result,
adding more training data helps to reduce the error. But at the same time, more
training data also means a longer training time. Based on this, a cost function is
proposed to describe the relation between the amount of training data and the error.
The error function can be modified as:
    e = (1/(X·m·n)) Σ_{i=1}^{m} Σ_{j=1}^{n} (d_ij − y_ij)².

A variable X is added to the formula; when X = 1, it reverts to the original error
function.
The form of the cost function is:

    y = x / (1 − 1/x),

where x is a coefficient with x > 1 and y is the cost index. Table 1 illustrates the
relation between the cost function, its derivative, and the value of x.
From the table, when the value of x is 2.25, the first derivative of the cost
function is 0, which gives the minimum of the cost function. When the derivative of
the cost function is smaller than −1 or greater than 1, we can deem that the cost
changes too fast and the coefficient is inappropriate. As a result, the coefficient
should lie between 1.93 and 4, and the optimal selection is 2.25.
Research on a Novel Data Mining Method Based on the Rough Sets 645
Table 1.
The cost function can be regarded as the selection guideline, or the termination
rule, of rough set reduction. For samples with many attributes, the optimal cost
coefficient 2.25 is chosen as the selection guideline; for samples with few
attributes, a cost higher than 2.25 is chosen. Data mining mainly deals with
tremendous amounts of data, for which the cost 2.25 is the best choice; in order to
cover special situations, the case with little data is also taken into consideration.
The algorithm is written as follows (see also the flow chart in Fig. 2):
Step 1: Sample the data, specify the mining conditions, and decide the goal of mining.
Step 2: Delete redundant attributes following rough set theory.
Step 3: Perform attribute reduction under rough set theory.
Fig. 2.
Step 4: If the minimum attribute set has been obtained, choose the training data set
using cost 2.25. Otherwise, use the highest cost 4 and the reduced attributes to
calculate the number of training samples; if the calculated amount is smaller than
the reduced amount, turn to Step 3; otherwise, choose the training data according to
the definition and the cost function.
Step 5: Design the neural network according to the training data, and train it on
these training samples.
Step 6: Output the final results.
The flow chart is illustrated in Fig. 2 [12].
5 Applications
Here, a car data table from reference [2] is used to illustrate the algorithm
(Table 2). The decision attribute is Make-model; the others are condition attributes.
Using rough sets to remove the redundant attributes, Table 3 is obtained.
After removing the redundant attributes, two attributes are deleted. Performing data
reduction on Table 3, and taking the user's requirement that the attribute reduction
set must contain Displace and Weight, Table 4 is obtained.
Then the neural network is built and the training samples are selected. The neural
network has 4 input neurons, 3 output neurons, and 4 hidden-layer neurons; the
structure of the network is illustrated in Fig. 3.
Following the network structure and the cost coefficient, the number of training
samples is 4×2×(4×4 + 4×3) = 224. After training on these samples, the final results
are output as Table 5.
Table 2.
Fig. 3.
Table 3.
Obj | Make-model | cyl | Door | Compress | Power  | Trans  | Weight | Mileage
1   | USA        | 6   | 2    | Medium   | High   | Auto   | Auto   | Medium
2   | Germany    | 4   | 2    | Big      | Medium | Manual | Heavy  | High
3   | Japan      | 4   | 2    | Small    | Medium | Low    | Light  | High
…   | …          | …   | …    | …        | …      | …      | …      | …
Table 4.
Table 5.
6 Conclusions
In the mining process of a data warehouse with tremendous amounts of data and many
attributes, this algorithm possesses the advantages of both rough set theory and BP
neural networks. It can overcome the influence of noise on the data; at the same
time, it can delete redundant data, provide clearer training data, lessen the scale
of the network, and improve the efficiency of mining. The proposed cost function not
only describes the relation between the amount of training data and the mining
precision, but also provides a guideline for the transition from rough set reduction
to neural network training. Unfortunately, data mining is aimed at big data
warehouses, so the algorithm is not suitable for small-scale data mining.
Acknowledgement
This work is supported by the Social Science Foundation of Hunan Province, China
(No. 07YBB239).
References
1. Bazan, J., Son, N.H., Skowron, A., Szczuka, M.S.: A View on Rough Set Concept Ap-
proximations. In: RSFDGrC, Chongqing, China (2003)
2. Bazan, J.: Dynamic reducts and statistical inference. In: Sixth International Conference on
IPMU, pp. 1147–1152 (1996)
3. Kai, Z., Jue, W.: A Reduction Algorithm Meeting Users’ Requirements. Journal of Computer
Science and Technology 17(15), 578–593 (2002)
4. Hu, K., Lu, Y., Shi, C.: Feature Ranking in Rough Sets (2002)
5. Nguyen, H.S., Ślęzak, D.: Approximate Reducts and Association Rules. In: Zhong, N.,
Skowron, A., Ohsuga, S. (eds.) RSFDGrC 1999. LNCS (LNAI), vol. 1711, pp. 137–145.
Springer, Heidelberg (1999)
6. Nguyen, T.T., Skowron, A.: Rough Set Approach to Domain Knowledge Approximation.
In: RSFDGrC 2003, Chongqing, China (2003)
7. Pawlak, Z.: Rough Sets. Int. J. Comput. Inform. Sci. 11(5), 341–356 (1982)
8. Pawlak, Z.: Rough sets-Theoretical Aspects of reasoning about data. Kluwer Academic
Publishers, Dordrecht (1991)
9. Polkowski, L.: A Rough Set Paradigm for Unifying Rough Set Theory and Fuzzy Set
Theory. In: Wang, G., Liu, Q., Yao, Y., Skowron, A. (eds.) RSFDGrC 2003. LNCS
(LNAI), vol. 2639, pp. 70–77. Springer, Heidelberg (2003)
10. Polkowski, L., Tsumoto, S., Lin, T.Y.: Rough Set Methods and Applications: New Devel-
opments. In: Knowledge Discovery in Information Systems. Springer, Heidelberg (2000)
11. Shaffer, C.A.: A Practical Introduction to Data Structures and Algorithm Analysis. Prentice
Hall, Englewood Cliffs (1998)
12. Weijin, J.: Research on Extracting Medical Diagnosis Rules Based on Rough Sets Theory.
Computer Science 31(11), 97–101 (2004)
The Techno-Economic Analysis of Reducing NOx of
Industrial Boiler by Recycled Flue Gas
1 Introduction
Coal is the main fuel of industrial boilers in China, and the primary type is the
low-capacity grate-fired furnace. It is estimated that more than 500,000 boilers of
all kinds are in use; the quantity is large, and more than 60% of them are
chain-grate boilers of 0.5~75 t/h. The coal consumption is 350 Mt/a [1, 2], and every
1000 kg of coal produces about 7.4 kg of NOx, which amounts to roughly 2.6 million
tons of NOx per year. Although China does not yet have an emission standard for the
NOx produced by industrial boilers, the quantity is quite considerable; the number of
boilers is still increasing year by year, the capacity is expanding, and the
emissions of nitrogen oxides (NOx) are increasing. How to reduce the NOx emitted by
industrial boilers economically is therefore extremely important for reducing air
pollution.
Reducing the NOx emitted by industrial boilers is a complex technical problem, and
there are many methods based on different control mechanisms. The technology of
selective catalytic reduction (SCR) is the most effective at reducing NOx, but its
investment and operating
Table 1. The comparison of low-NOx control technologies

Craft plan                  | Degree of technical maturity | Level of craft difficulty | Site area        | Initial investment            | Operation cost | Reduction of NOx (%)
Fuel grading (staging)      | Mature                       | Complex                   | A little large   | Plant equipment (big)         | Ignored        | 30–40
Inletting the secondary air | Mature                       | Simple                    | None             | Air pipe expense (small)      | Ignored        | 20
Recycled flue gas           | Mature                       | Simple                    | Relatively small | Recycled pipe expense (small) | Ignored        | 30–40
(small)
cost are extremely high, a lot of small and medium-sized enterprise can not afford it,
therefore it is not realistic. From economic and technological angel, the technology of
recycled flue gas was selected to reduce the NOx emissions as much as possible. It
will provide the reality basis for government to establish the policy formulation on the
NOx emission.
From table 1, which shows that the comparison of low N0x control technology:
The technology of recycled flue gas is mature, the craft is simple, at the premise of
the NOx which drops largely, and it only adds the recycled pipeline, the cost of the
invest at the beginning could be ignored, so it is a feasible method for reducing NOx.
The feasibility of the technology was analyzed that uses recycled flue gas to reduce
NOx, which is emitted by industrial boiler from the economical and technological
points.
The simulation object in this article is a 35 t/h chain-grate boiler. It is a
single-drum, natural-circulation water-tube boiler, using layered coal feeding and a
traveling chain grate, fired with bituminous coal; the furnace dimensions are
7500 mm × 459 mm × 13500 mm. The primary air is supplied separately through six air
zones with different air quantities, and there are two air inlet zones around the
arches. The function of the secondary air is mainly to disturb the flue gas flow
above the bed, supply oxygen promptly, and burn the suspended combustible matter so
as to enhance the combustion efficiency; its position must therefore be designed so
that it does not cause the primary air or the air pressure to fluctuate and affect
the combustion. Part of the flue gas after dust removal is sent from the
induced-draft fan to the primary air duct through a recirculation fan, and is then
sent into the furnace after being mixed with the primary air; the amount of
recirculated flue gas can be adjusted by the dampers on both sides of the
recirculation duct.
Because the reactions occurring at the coal inlet and during combustion are mainly
concentrated above the chain grate, the grids in this region are denser than
elsewhere in order to increase the computational precision; the grid division must
not cause false diffusion in the numerical simulation. In order to reduce false
diffusion, the grids in the computational domain should follow the development of the
flow as far as possible, so that the streamlines enter the grid faces in the
perpendicular direction, which controls the occurrence of false diffusion. In this
article, the grid division of the combustion region follows this idea. As shown in
Fig. 1, the entire chamber is first decomposed into five parts. Because there are
very few reactions in the upper part, the accuracy requirement there is not high and
the grid size is 0.3 m. Lower down there are two air zones, so the requirement is
higher and the grid size is 0.1 m. The coal inlet, the primary air zone and the chain
grate are the main reaction regions, which need concentrated mesh refinement, and the
grid size there is 0.05 m. The secondary air ports themselves have a diameter of only
0.05 m, so, because of this particularity, their grids are 0.02 m to guarantee
continuity and uniformity.
The simulation of the combustion process mainly involves the mutual coupling of the
gas and solid phases; the combustion reaction is coupled with the turbulence,
momentum, energy and mass transfer computations to obtain the distributions of
temperature, flow and species in the boiler. The solid-phase particles use a
stochastic model and are added to the computation as a discrete phase; the particles
have their own independent equations, and the interactions between the two phases are
considered. The coal particles are assumed to keep a constant size during combustion,
while their mass decreases as they burn. The turbulence model is the k-ε model; the
combustion computation uses the simple PDF approach to compute the species fractions
in the reaction process. In this computation the fuel and the oxidant are assumed to
react at a fast rate, and the proportion of fuel and oxidant is the mixture fraction.
A double mixture fraction is used in this paper, and an empirical form is used in the
PDF calculation, assuming that the fuel and the oxidant are in full contact and
react. About 90% of the heat transfer comes from radiation; the radiation uses the
discrete ordinates model, taking the radiation exchange into account. The wall
surface temperature is 600 K, which is taken as the saturated water temperature plus
a correction; the corrected temperature accounts for the heating caused by slagging
in the general combustion situation. The boundary condition at the wall surface is
treated with the standard wall function. The air inlets and the coal particle inlet
are set separately, and the inlets are treated as velocity inlets. The outlet is set
as a free outflow boundary, and we assume that the mass entering and leaving the
chamber is constant. The inlet parameters are shown in Table 2; the coal is
bituminous, and its parameters are listed in Table 3.
Table 2. Inlet parameters for the two simulated cases

Combustion under normal operation of the boiler:   0.731   1.219   1.920   2.194   1.950   0.731   0.455
Recycled flue gas rate of 30%:                      0.731   1.700   2.720   3.094   2.650   0.731   0.455
The Fluent software is used for this computation. At the start of the calculation the isothermal flow field is solved first; after the momentum equations have converged, the particle field, combustion and radiative heat transfer are coupled in and the iteration is continued. When the residuals of the continuity and energy equations no longer decrease and all parameters remain constant from iteration to iteration, the computation is regarded as converged. Finally, the numerical simulation of the NOx emission is carried out as a post-processing step of the combustion solution: after convergence the amount of NOx is calculated, and at that stage the flow model, the turbulence model, the energy equation, the radiation model and the species PDF model no longer participate in the computation; the computational domain and the values of all other fields are frozen except the NOx species. That is, the NOx reactions are not coupled with the combustion process.
2.3.1 The Comparison of the Distribution of the Temperature Field with and
without Recycled Flue Gas
Figure 2(a) shows the distribution of the temperature field without recycled flue gas, and Figure 2(b) the distribution with recycled flue gas. By comparison we can observe that the average temperature of the boiler with recycled flue gas decreases to some extent: it drops by 30-80 ℃, and the central region decreases by about 90 ℃. This is because the quantity of flue gas entering the chamber increases, and with it the amount of gas that needs to be heated also increases when flue gas is recycled. Moreover, the concentration of O2 in the recycled flue gas drops obviously while the concentration of CO2 rises, which has some influence on the combustion. Therefore the average temperature of the boiler decreases on the whole, which is favourable for the reduction of NOx, and the temperature drop does not affect normal combustion or operation.
(a) The distribution of the temperature field without recycled flue gas
(b) The distribution of the temperature field with recycled flue gas
2.3.2 The Comparison of the Concentration of NOx with and without Recycled
Flue Gas
The change in NOx is illustrated in Fig. 3. The total concentration of NOx is about 394 mg/m3 (192 ppm) without recycled flue gas, while the concentration is reduced to
281 mg/m3 (137 ppm) with recycled flue gas; in certain regions the NOx concentration drops from 610 mg/m3 to 450 mg/m3. In terms of the total quantity of NOx, the emission is reduced considerably.
(c) The mass fraction of NOx at the outlet along the height of the boiler without recycled flue gas
(d) The mass fraction of NOx at the outlet along the height of the boiler with recycled flue gas
Fig. 3. Distribution of the mass fraction of NOx at the outlet along the height of the boiler
h_L = \lambda \cdot \frac{L}{D} \cdot \frac{\omega^2}{2g}    (1)
Table 4 shows that the pressure drop caused by the recirculation pipe ranges from about 0.71 mm to 8.09 mm water column. Compared with the selected blower, this pressure drop does not affect the use of the air blower; therefore this technology only adds the expense of the pipeline and labour, and the operating cost can be ignored.
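The pressure drop of Eq. (1) can be evaluated directly. The following minimal Python sketch is ours, not part of the paper: the flue gas density of 1.2 kg/m3 and the function name are assumptions, and the friction factor 0.018, velocities, diameters and lengths are taken from Table 4; under these assumptions the two example calls fall in the h_L range reported there for the 35 t/h boiler.

def pressure_drop_mm_water(lam, length_m, diameter_m, velocity_ms, rho_gas=1.2):
    # Darcy-Weisbach friction pressure drop of the recirculation pipe (Eq. (1)),
    # converted to mm of water column (1 mm water column is about 9.81 Pa).
    dp_pa = lam * (length_m / diameter_m) * rho_gas * velocity_ms ** 2 / 2.0
    return dp_pa / 9.81

# 35 t/h boiler, bounds of Table 4: L = 10-20 m, D = 1.3-1.45 m, V = 12-15 m/s
print(pressure_drop_mm_water(0.018, 10.0, 1.45, 12.0))   # about 1.1 mm water column
print(pressure_drop_mm_water(0.018, 20.0, 1.30, 15.0))   # about 3.8 mm water column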
In order to study optimized control countermeasures, cost information on the nitrogen-oxide control technologies is essential. Under ordinary circumstances the total expense includes the investment and the annual operating cost, although there is no fixed stipulation for how the cost is composed. Normally the investment covers reaction equipment, pipelines and so on, while the annual cost includes depreciation, maintenance, electric power and so on. Several kinds of low-NOx control technologies for industrial chain-grate boilers are compared in Table 3 in terms of initial investment, operating cost, NOx reduction and so on.
Quantity (t/h)   λ       Q (m3/s)   V (m/s)   D (m)        L (m)    hL (mm water column)   Air blower pressure (mm water column)
10               0.018   7.29       12-15     0.49-0.61    8-16     3.25-8.09              107
35               0.018   19.7       12-15     1.3-1.45     10-20    1.10-3.81              170
60               0.018   33.33      12-15     2.22-2.78    15-25    0.71-2.32              214
4 Conclusions
1. Compared with operation without recycled flue gas, the average temperature inside the boiler with recycled flue gas drops to some extent, and the NOx emission concentration decreases by about 30%-40%.
2. For medium and small industrial boilers, the use of recycled flue gas does not affect the use of the original air blower.
3. The flue gas recirculation technology is mature, the process is simple, the investment is small and the operating cost is low.
References
1. Commandré, J.-M., Stanmore, B.R., Salvador, S.: The high temperature reaction of carbon with nitric oxide. Combustion and Flame, 128 (2005)
2. Glarborg, P., Jensen, A.D., Johnsson, J.E.: Fuel nitrogen conversion in solid fuel fired sys-
tems. Progress in Energy and Combustion Science, 29 (2003)
3. Jones, J.M., Patterson, P.M., Pourkashanian, M., Williams, A., Arenillas, F.R., Pis, J.J.: Modeling NOx formation in coal particle combustion at high temperature: an investigation of the devolatilisation kinetic factors. Fuel, 88 (1999)
4. Xu, X.C., Chen, C.H., Qi, H.Y., et al.: Development of coal combustion pollution control
for SO2 and NOx in China. Fuel Processing Technology 62, 153–160 (2000)
5. Yin, C., Caillat, S., Harion, J.-L., Baudoin, B., Perez, E.: Investigation of the flow, combustion, heat-transfer and emissions from a 609 MW utility tangentially fired pulverized-coal boiler. Fuel, 81 (2002)
Urban Traffic Flow Forecasting Based on Adaptive
Hinging Hyperplanes
Abstract. In this paper, after a review of traffic forecasting methods and the development of piecewise linear functions, a new traffic flow forecasting model based on adaptive hinging hyperplanes is proposed. Adaptive hinging hyperplanes (AHH) is a kind of piecewise linear model that can decide the division of its domain and its parameters adaptively. Acceptable results (forecasting error smaller than 15%) were obtained in tests on real traffic data from Beijing. After comparison with the results of a prediction model based on MARS, the following conclusions can be drawn. First, the two methods have almost the same prediction precision. Second, AHH is a little more stable and costs less computing time. Thus, the AHH model may be more applicable in practical engineering.
1 Introduction
Traffic flow forecasting is a very important aspect of Advanced Traffic Information Systems (ATIS). The quality of the system's service depends to a great extent on the accuracy of the traffic prediction. Over the past 30 years, traffic flow forecasting has been a very active issue but without perfect solutions. There is a variety of forecasting methods [1,2,3,4,5], such as historical averages, time series models (ARIMA), neural networks and nonparametric models (the k-NN method), which all have their own shortcomings. The historical average method cannot respond to dynamic changes; the ARIMA model is very sensitive to disturbances, which cannot be avoided in a traffic environment; neural networks are prone to overtraining; and the k-NN model requires a complicated search for neighbors. The main application of the above methods is in highway traffic flow prediction, but the urban transportation system is more complex. To overcome the drawbacks above, Ye applied MARS to urban traffic flow forecasting [6] and obtained good results.
In another respect, during the past 30 years the research on piecewise linear functions has developed gradually and several representation models have appeared, such as the canonical piecewise linear representation [7,8,9,10], the lattice PWL function model [11] and the hinging hyperplanes model [12]. Among them, the HH model is a well-known and comparatively practical model, proposed by Breiman in
1993, with a good capacity for nonlinear function approximation. Due to its special structure, the least squares method can be used for identification; therefore the HH model is a very useful tool in black-box modeling. In fact, however, the HH model is equivalent to the canonical PWL model. To improve the model's representation ability, Wang introduced generalized hinging hyperplanes (GHH) in 2005, which are actually derived from the lattice PWL function, so the model's general representation ability is proved directly [13].
The adaptive hinging hyperplanes model [14], proposed by Xu in 2008, is based on MARS and GHH and has been proved to be a special case of GHH; it therefore shares the advantages of the two approaches. The model is adaptive and flexible, and its basis functions are linear in each subregion, so acceptable results in traffic flow prediction using AHH can be expected, since the model based on MARS has already obtained good results. In this paper we propose an urban short-term traffic flow forecasting model based on adaptive hinging hyperplanes and use real traffic data from Beijing to test its performance. Furthermore, a comparison between AHH and MARS is made to demonstrate AHH's effectiveness in forecasting.
The paper is organized as follows. Section 2 introduces the AHH model and its algorithm after a review of MARS and GHH. How the AHH model is used to carry out traffic flow prediction is described in detail in Section 3. We then compare the results of the AHH model and the MARS model in Section 4 and draw a brief conclusion in Section 5.
2.1.1 MARS
MARS (Multivariate Adaptive Regression Splines) was first introduced by Friedman in 1991 [15]. It is generally considered to be developed from recursive partitioning regression. The basic idea of the latter method is to find the most appropriate division of the domain through successive partitioning. At the beginning of the procedure there is only one parent node, and it can be divided into two child nodes, each of which denotes one subregion. The subregions do not intersect with each other, and the union of all subregions constitutes the whole domain. This step is repeated until a terminal node is reached that cannot be split any further; the region it denotes corresponds to one basis function. The partitioning is based on a greedy algorithm in which each step selects the split that gives the best fit to the data.
Recursive partitioning regression can be written as the following expansion in a set of basis functions:

\hat{f}(x) = \sum_{m=1}^{M} a_m B_m(x)    (1)

B_m(x) = I[x \in R_m]    (2)

in which I is an indicator function having the value one when x ∈ R_m and zero otherwise.
From the splitting procedure the disadvantage of recursive partitioning regression is apparent: the subregions are discontinuous, and the method cannot represent functions that have no interactions between their variables. To overcome these problems, MARS firstly uses the truncated power function [±(x − t)]_+^q to replace the step function, where q = 1 and [·]_+^q denotes the positive part of the expression. The basis functions are constructed as tensor products of truncated power functions. Secondly, MARS allows a parent basis function that has already been split to be involved in further splitting, so it can also approximate additive functions whose variables do not interact with each other. The MARS model can be written as follows:

\hat{f}(x) = a_1 + \sum_{m=2}^{M} a_m \prod_{k=1}^{K_m} [ s_{km} \cdot ( x_{v(k,m)} - t_{km} ) ]_+    (3)
2.1.2 GHH
Generalized hinging hyperplanes (GHH) was first introduced by Wang in 2005 as an extension of hinging hyperplanes. Although the HH model can approximate many nonlinear functions well, it has been proved that it lacks the ability to represent CPWL functions in high dimensions. The HH model can be written as follows:

\sum_{i} \sigma_i \max \{ l(x, \theta_1(i)), l(x, \theta_2(i)) \}    (4)

Wang modified the above form and rewrote it as follows:

\sum_{i} \sigma_i \max \{ l(x, \theta_1(i)), l(x, \theta_2(i)), \dots, l(x, \theta_{k_i+1}(i)) \}    (5)
The truncated power spline [s_{km} · (x_{v(k,m)} − t_{km})]_+ in the MARS basis function can be written as max{0, s_{km} · (x_{v(k,m)} − t_{km})} and can therefore be seen as a special HH model. To improve the stability when the dimension of the predictive variables is high, after
modifying the MARS basis function by replacing the operator "∏" by "min", the AHH model is obtained [14]. The AHH model can be written as follows:

\hat{f}(x) = a_1 + \sum_{m=2}^{M} a_m \min_{k \in \{1,2,\dots,K_m\}} \max\{ 0, s_{km} \cdot ( x_{v(k,m)} - t_{km} ) \}    (6)
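To make the difference between Eq. (3) and Eq. (6) concrete, the following minimal Python sketch (the function and variable names are ours) evaluates one MARS basis function (product of hinges) and the corresponding AHH basis function (minimum of the same hinges) for a single input vector x.

def hinge(x, v, t, s):
    # One truncated hinge max{0, s * (x_v - t)} used in Eqs. (3) and (6)
    return max(0.0, s * (x[v] - t))

def mars_basis(x, factors):
    # Eq. (3): product of the hinges; factors is a list of (v, t, s) triples
    prod = 1.0
    for v, t, s in factors:
        prod *= hinge(x, v, t, s)
    return prod

def ahh_basis(x, factors):
    # Eq. (6): minimum of the same hinges instead of their product
    return min(hinge(x, v, t, s) for v, t, s in factors)

x = [0.4, 0.7, 0.1]
factors = [(0, 0.2, 1.0), (1, 0.5, 1.0)]
print(mars_basis(x, factors), ahh_basis(x, factors))   # approximately 0.04 and 0.2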
The lack of fit is measured by the generalized cross-validation criterion:

\mathrm{LOF}(\hat{f}_M) = \mathrm{GCV}(M) = \frac{1}{N} \sum_{i=1}^{N} \frac{ [\, y_i - \hat{f}_M(x_i) \,]^2 }{ [\, 1 - \tilde{C}(M)/N \,]^2 }    (7)
Forward Procedure
The AHH algorithm also needs a backward step, just as the MARS algorithm does. In the backward procedure, the basis functions that do not contribute to the fitting precision are removed.
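The criterion in Eq. (7) drives both the forward and the backward procedure. A minimal sketch of its computation (assuming the effective number of parameters C̃(M) is already available; the names are ours):

import numpy as np

def gcv(y, y_hat, c_eff):
    # Eq. (7): generalized cross-validation; c_eff plays the role of C~(M)
    n = len(y)
    rss = np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)
    return rss / (1.0 - c_eff / n) ** 2

# During backward pruning, the basis whose removal yields the lowest GCV is deleted.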
In this paper, the real traffic flow and occupancy data collected from 50 detectors are provided by the Beijing Traffic Management Bureau; they cover 16 days, from Oct. 22, 2006 to Nov. 6, 2006. The sampling interval is 10 minutes, which means 144 sample points per day for each detector.
A small part of the network, shown in Figure 1, is used to test the prediction performance of the model. The flow and occupancy data of link 1 and its neighbors (links 2, 3, 4, 5 and 6) are used to forecast the traffic flow of link 1.
where flow_i^t, flow_i^{t-1}, ..., flow_i^{t-N} denote the link's own historic flow data before the prediction interval, in which t, t−1, ..., t−N denote different time intervals; flow_{i+1}^t, flow_{i+2}^t, ..., flow_{i+L}^t denote the neighbors' flow data in interval t, in which L decides how many neighbors should be considered; and σ_i · occ_i^t is the link's own historic occupancy data in interval t, where σ_i can be chosen as 1 or 0.
Some explanation is given here of why the formula above should contain these factors as predictive variables. First, because traffic flow can be regarded as a time series that changes gradually rather than abruptly, data up to N
time intervals before the prediction time can be considered relevant, not only the single interval immediately before it. Second, the link to be predicted obviously correlates with its neighbors, so the neighbors' data should also be taken into account. Third, the network situation obtained by predicting with both the flow and the occupancy data may be more accurate than with only one of them. However, what values N and L should take is an important issue, discussed later. If the values are too small, the trained model may be incomplete, while values that are too large will cause overfitting in the training process.
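As a minimal sketch (the array layout, indices and function name are our own assumptions, not the paper's notation), the predictive variable vector for one link at time t can be assembled from the detector histories as follows.

import numpy as np

def build_features(flow, occ, link, neighbors, t, N, L, use_occ=True):
    # Own flow at t, t-1, ..., t-N, the flow of the first L neighbors at t,
    # and optionally the link's own occupancy at t.
    # flow, occ: arrays of shape (num_links, num_intervals).
    own_history = [flow[link, t - k] for k in range(N + 1)]
    neighbor_flow = [flow[j, t] for j in neighbors[:L]]
    occupancy = [occ[link, t]] if use_occ else []
    return np.array(own_history + neighbor_flow + occupancy)

# Example with synthetic data: 6 links, 144 ten-minute intervals per day
flow = np.random.rand(6, 144) * 800
occ = np.random.rand(6, 144)
x = build_features(flow, occ, link=0, neighbors=[1, 2, 3, 4, 5], t=30, N=6, L=3)
print(x.shape)   # 7 own samples + 3 neighbor samples + 1 occupancy = 11 features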
\mathrm{MAPE}(\%) = \frac{1}{N} \sum_{i=1}^{N} \frac{ | \mathrm{predict}(i) - \mathrm{observed}(i) | }{ \mathrm{observed}(i) } \times 100\%    (12)
Another is the PT criterion, which denotes the percentage of sample points whose predictive error is smaller than a given threshold. The criterion can be written as the following formula:

\mathrm{PT}(\%) = \frac{1}{N} \sum_{i=1}^{N} I\!\left[ \frac{ | \mathrm{predict}(i) - \mathrm{observed}(i) | }{ \mathrm{observed}(i) } < \alpha \right] \times 100\%    (13)
N 0 1 2 3 4 5 6 7 8 9
MAPE(%) 16.5 14.4 13.9 13.9 14.1 13.3 12.9 13.4 13.5 13.6
PT(%) 69.9 78.9 79.4 80.0 77.7 76.0 77.4 77.9 77.8 77.6
From Table 1, a tendency of the MAPE to first decrease and then increase can be observed as N grows. When N is small, the historic information needed to make the prediction is insufficient. But when N is too large, the prediction is also worse than the result at N=6. For example, if N=9, traffic information from nearly 2 hours before the prediction time is taken into account.
Actually, data from 2 hours earlier has little relationship with the current situation. If these data are used as predictive variables, the trained model becomes accustomed to irrelevant situations and the prediction accuracy is reduced.
N 0 1 2 3 4 5 6 7 8 9
MAPE(%) 15.6 14.3 13.8 13.9 13.7 13.3 12.9 13.4 13.5 13.6
PT(%) 73.4 78.8 78.7 80.0 78.4 76.1 77.4 77.9 77.8 77.6
Table 3. Mean absolute percentage error results using own flow, occupancy and neighbors’
flow information in AHH
Link N 0 1 2 3 4 5 6 7 8 9
2 16.0 14.3 13.7 13.9 13.0 12.6 13.0 13.4 13.5 13.6
2,3 14.8 13.9 13.5 13.2 13.3 12.8 13.0 13.4 13.8 13.9
2,3,4 13.8 13.4 13.0 13.2 12.7 12.5 13.5 13.5 13.2 13.2
2,3,4,5 14.3 14.2 13.5 13.2 12.7 12.5 13.5 13.5 13.2 13.2
2,3,4,5,6 14.2 14.1 13.8 13.5 13.5 12.8 13.4 13.5 13.2 13.2
(Figure: predicted and observed traffic flow (0-800) plotted against the sample index (0-140) for one day)
As seen in Figure 3, the predictive performance of the AHH and MARS models using their own flow information is in general almost the same. More specifically, when N is small the precision of the AHH model is better, with smaller error; but when N becomes large the MARS model gradually gains some advantage.
Fig. 3. General Comparison between AHH and MARS (prediction error versus N)
Fig. 4. Time Cost Comparison between AHH and MARS (computing time versus N)
From Figure 3 we can also observe that there is more fluctuation in the performance of MARS. For instance, at N=7, 8 and 9 a slight increase in error would be expected because of the overfitting problem; instead, a sharp increase followed by a dramatic decrease in error is observed.
5 Conclusions
A variety of traffic forecasting methods have been proposed over the years, but every approach has its own limitations, so research on this issue is still very active. We combine the MARS algorithm with the idea of piecewise linear functions to introduce a new traffic prediction method.
In this paper, after a review of the MARS algorithm and the development of piecewise linear functions, a traffic forecasting model based on adaptive hinging hyperplanes was proposed. Acceptable results (error smaller than 15%) were obtained in tests on real traffic data from Beijing. After comparison with the results of the MARS model, the following conclusions can be drawn. First, the two methods have almost the same precision. Second, AHH is more stable and costs less computing time, so it is more applicable in practice.
However, some problems remain unsolved. The prediction results based on AHH, which is a nonparametric model, cannot always follow the real data when the traffic flow changes very fast. In further research, other methods may be adopted together with AHH in a combined model.
Acknowledgement. The work described in this paper is partially supported by the National Natural Science Foundation of China (NSFC) 50708054, 60674025 and
60534060, Hi-Tech Research and Development Program of China (863Project)
2007AA11Z222 and 2007AA04Z193, National Basic Research Program of China
(973Project) 2006CB705506, National Key Technology Research and Development
Program 2006BAJ18B02 and The Research Fund for the Doctoral Program of Higher
Education 200800030029.
References
1. Smith, B.L., Williams, B.M., Oswald, R.K.: Comparison of parametric and nonparametric
models for traffic flow forecasting. Transportation Research Part C 10, 303–321 (2002)
2. Okutani, I., Stephanedes, Y.J.: Dynamic prediction of traffic volume through Kalman fil-
tering theory. Transportation Research Part B 18B, 1–11 (1984)
3. Williams, B.M., Hoel, L.A.: Modeling and Forecasting Vehicular Traffic Flow as a Sea-
sonal ARIMA Process: Theoretical Basis and Empirical Results. ASCE Journal of Trans-
portation Engineering 129(6), 664–672 (2003)
4. Davis, G.A., Nihan, N.L.: Nonparametric Regression and Short-term Freeway Traffic
Forecasting. ASCE Journal of Transportation Engineering 117(2), 178–188 (1991)
5. Smith, B.L., Demetsky, M.J.: Traffic Flow Forecasting: Comparison of Modeling Ap-
proaches. ASCE Journal of Transportation Engineering 123(4), 261–266 (1997)
6. Ye, S.Q., et al.: Short-Term Traffic Flow Forecasting Based on MARS. In: The 4th Inter-
national Conference on Natural Computation, pp. 669–675. IEEE Press, Jinan (2008)
7. Chua, L.O., Kang, S.M.: Section-wise piecewise-linear functions: canonical representation,
properties, and applications. IEEE Transactions on Circuits and Systems 30(3), 125–140
(1977)
8. Kang, S.M., Chua, L.O.: Global representation of multidimensional piecewise-linear func-
tions with linear partitions. IEEE Transactions on Circuits and Systems 25(11), 938–940
(1978)
9. Kahlert, C., Chua, L.O.: A generalized canonical piecewise linear representation. IEEE
Transactions on Circuits and Systems 37(3), 373–382 (1990)
10. Lin, J.N., Unbehauen, R.: Canonical piecewise-linear networks. IEEE Transactions on
Neural Networks 6(1), 43–50 (1995)
11. Tarela, J.M., Alonso, E., Martinez, M.V.: A representation method for PWL functions ori-
ented to parallel processing. Mathematical & Computer Modelling 13(10), 75–83 (1990)
12. Breiman, L.: Hinging hyperplanes for regression, classification and function approxima-
tion. IEEE Transactions on Information Theory 39(3), 999–1013 (1993)
13. Wang, S.N., Sun, X.S.: Generalization of Hinging Hyperplanes. IEEE Transactions on Information Theory 51(12), 4425–4431 (2005)
14. Xu, J., Huang, X.L., Wang, S.N.: Adaptive Hinging Hyperplanes. In: The 17th World
Congress of International Federation of Automatic Control, World Congress, Seoul, Korea,
pp. 4036–4041 (2008)
15. Friedman, J.H.: Multivariate adaptive regression splines. The Annals of Statistics 19(1), 1–61 (1991)
16. Friedman, J.H., Silverman, B.W.: Flexible parsimonious smoothing and additive modeling.
Technometrics 31, 3–39 (1989)
Turbine Fault Diagnosis Based on
Fuzzy Theory and SVM
Fei Xia1, Hao Zhang1, Daogang Peng1, Hui Li1, and Yikang Su2
1
College of Electric Power and Automation Engineering,
Shanghai University of Electric Power,
200090 Shanghai, China
2
Nanchang Power Supply Corporation,
330006 Nanchang, JiangXi, China
{Fei.Xia,Hao.Zhang,Daogang.Peng,Hui.Li,Yikang.Su,
xiafei}@shiep.edu.cn
Abstract. A method based on fuzzy theory and the support vector machine (SVM) is proposed to address the lack of samples in turbine fault diagnosis. Typical fault symptoms are first normalized by their respective membership functions. Then some samples are used to train the SVM for fault diagnosis. With the trained SVM, the correct fault type can be recognized. In the application to condenser fault diagnosis, the approach successfully enhances the accuracy of fault diagnosis with small samples. Compared with the general method of BP neural networks, the method, which combines the advantages of fuzzy theory and SVM, makes the diagnosis results more credible.
1 Introduction
The turbine is important equipment in an electric power plant [1]. Its structure is complex and its operation environment is particular: it works under high temperature, high pressure and high rotational speed. Obviously, the steam turbine has a high fault rate and hazardous breakdowns, and these breakdowns cause heavy economic losses and social consequences [2]. Therefore, advanced intelligent technology is needed to monitor and analyze the device status parameters in order to judge whether the corresponding equipment is in a healthy state.
The condenser is the primary auxiliary equipment of the steam turbine, and the quality of its working condition affects the safety and economy of the generator's operation. Therefore, the monitoring and diagnosis of the condenser's running state are of general concern to the operation departments of power plants. Research on the condenser system and its fault diagnosis is of great significance for reducing the downtime of units and improving their availability.
In modern fault diagnosis applications, neural networks have become an important tool thanks to their self-learning, fault-tolerance and parallel computation capabilities. Various neural network methods have
been applied to fault diagnosis of the condenser [3-6]. Because of the nonlinear relationship between failures and symptoms, and because of its complexity, ambiguity and randomness, it is hard to construct a precise mathematical model of the condenser system. Therefore, the symptom values can be transformed by corresponding membership functions using fuzzy theory, and several fuzzy neural network methods have been applied to condenser fault diagnosis [7-8]. Although these methods combine the merits of the two principles, the neural network is still vulnerable to falling into local minima, and the network structure has to be determined by experience.
To solve these problems, an approach combining fuzzy theory and the support vector machine (SVM) is proposed to obtain fault diagnosis results for the condenser system. SVM is a new machine learning method developed on the basis of statistical learning theory [9]. Based on the principle of structural risk minimization, it can effectively address the learning problem and achieve better classification accuracy [10]. SVM has been used for face detection, speech recognition, medical diagnosis and so on. The study in this paper shows that SVM performs very well in the task of condenser fault diagnosis.
Y1: serious fault of the circulating pump; Y2: condenser full of water; Y3: condensate pump not working. Eight symptoms related to condenser faults are extracted as follows: X1: vacuum declined significantly and sharply; X2: vacuum declined slowly and slightly; X3: output pressure of the condensate pump increased; X4: output pressure of the condensate pump decreased; X5: temperature rise of the circulating water decreased; X6: terminal temperature difference of the condenser increased; X7: subcooling of the condensate increased; X8: pressure difference between the pumping port and the extractor inlet decreased.
In daily life many concepts are vague, and it is far from sufficient to describe them as absolutely belonging or absolutely not belonging to a set. Thus it is necessary to break the relation of absolute membership. Professor Zadeh of the University of California introduced fuzzy set theory in 1965. Fuzzy set theory extends the ordinary set concept: the value range of the characteristic function is expanded from the set {0,1} to the interval [0,1]. An object of the universe is no longer regarded as simply belonging or not belonging to a set; instead one states the degree to which it belongs to the set [4].
Assume that a mapping is assigned on the universe U:

A : U → [0,1]    (1)
u ↦ A(u)    (2)

Then A is a fuzzy set on U, and the set of all fuzzy sets on U is denoted by F(U), that is,

F(U) = { A | A : U → [0,1] }    (3)
A fuzzy set is defined on the universe, and its membership function can take many different forms. Determining the membership function correctly is the foundation for using fuzzy sets to describe fuzzy concepts appropriately. One basic step in applying fuzzy mathematics to a practical problem is to find one or several membership functions; once this problem is solved, the other questions are easily solved.
In order to discretize the data from the sensors, proper membership functions should be constructed. The membership function is the foundation of applying fuzzy sets to practical problems, and constructing it correctly is the key to using fuzzy sets properly. There is no mature and effective method for constructing membership functions; they are all determined by experience and then corrected by experiments or by feedback information from computer simulation. In turbine fault diagnosis, if the corresponding membership function cannot be determined, the following three methods can be adopted according to the specific conditions:
2) Trapezoidal fuzzy function

\mu_A(x) = \begin{cases} 0 & x < a \\ (x-a)/(b-a) & a \le x \le b \\ 1 & b \le x \le c \\ (d-x)/(d-c) & c < x \le d \\ 0 & x > d \end{cases}    (5)

3) Normal-shaped fuzzy function

\mu_A(x) = e^{ -\left( \frac{x-a}{b} \right)^2 }    (6)
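A minimal Python sketch of the two membership functions above (the parameter values in the example call are illustrative only, not taken from the paper):

import math

def trapezoid(x, a, b, c, d):
    # Eq. (5): trapezoidal membership function
    if x < a or x > d:
        return 0.0
    if a <= x <= b:
        return (x - a) / (b - a)
    if b <= x <= c:
        return 1.0
    return (d - x) / (d - c)

def normal_shaped(x, a, b):
    # Eq. (6): normal-shaped (Gaussian-like) membership function
    return math.exp(-((x - a) / b) ** 2)

print(trapezoid(2.5, 1.0, 2.0, 3.0, 4.0), normal_shaped(2.5, 2.0, 1.0))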
3.3 Definition
Judging whether a fault symptom exists directly from the values of the thermal parameters of the condenser is not accurate, because one fault symptom may correspond to different faults, and although the changing trend of the same symptom is the same for different faults, the degree of change differs. Therefore the thermal parameters should be fuzzified using the concepts of fuzzy mathematics; at the same time, the data from the sensors can be discretized. The changing trends appear natural and faithful when each input parameter is blurred using a membership function.
According to [11], the following three types of membership functions are applicable to condenser fault symptoms:
1) Smaller-sized membership function
(9)
In the formula, c ∈ U is an arbitrary point and k is a parameter larger than zero (k > 0).
According to the above formulas, combined with the practical situation, the membership functions of the operating parameters of a 300 MW unit condenser corresponding to each symptom can be obtained; they are shown in Table 1.
4.1 Fundamentals
The two classes of points can be separated by a straight line, and numerous such separating lines exist. The classification function is approximated by the following function:

f(x, a) = w \cdot x + b    (10)

f(x) = \mathrm{sgn}\left\{ \sum_{i=1}^{n} \alpha_i^* y_i (x_i \cdot x) + b^* \right\}    (12)
With the deepening study of support vector machines, many researchers have augmented or changed the objective function, the coefficients or the formulation, resulting in a number of SVM variants, each with advantages for certain applications or algorithms. The main algorithms are the following: the C-SVM series, the ν-SVM series, one-class SVM, RSVM (reduced SVM), WSVM (weighted SVM) and LS-SVM (least-squares SVM).
In the C-SVM algorithm the only adjustable parameter, C, has no intuitive interpretation, and it is very difficult in practice to choose an appropriate value; to remedy this defect, Schölkopf proposed the ν-SVM algorithm. ν-SVM replaces C with a new parameter ν, which controls the number of support vectors and of errors and is easier to choose than C. It has a very clear physical meaning: νl is an upper bound on the number of margin errors and a lower bound on the number of support vectors, where l is the number of samples. Although the standard ν-SVM algorithm is more complex than C-SVM, it is an effective way to solve classification problems with small samples. Chang et al. proposed a modified ν-SVM algorithm whose objective function is augmented with a variable a; to distinguish the two ν-SVM methods, the latter is called the Bν-SVM algorithm. For simplicity, the ν-SVM algorithm is adopted in the fault diagnosis task. A simple flow of the ν-SVM algorithm is displayed in Figure 2.
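For reference, a ν-SVM classifier of this kind can be trained with scikit-learn's NuSVC. The sketch below is ours: it only assumes that the fuzzified symptom vectors and fault labels are available as arrays (the three training rows and the test row are copied from Tables 2 and 3), and the variable names are our own.

import numpy as np
from sklearn.svm import NuSVC

# X: fuzzified symptom vectors (one row per sample, columns X1..X8)
# y: fault labels (1 = Y1, 2 = Y2, 3 = Y3)
X_train = np.array([[0.05, 0.65, 0.15, 0.18, 0.56, 0.43, 0.11, 0.37],   # Y1
                    [0.03, 0.53, 0.24, 0.42, 0.67, 0.58, 0.13, 0.41],   # Y2
                    [0.01, 0.75, 0.12, 0.15, 0.86, 0.69, 0.22, 0.73]])  # Y3
y_train = np.array([1, 2, 3])

clf = NuSVC(nu=0.5, kernel="rbf", gamma="scale")   # nu plays the role of the parameter v
clf.fit(X_train, y_train)
# Test sample taken from Table 3 (nearest to the first training sample, i.e. fault type Y1)
print(clf.predict([[0.05, 0.62, 0.10, 0.16, 0.52, 0.43, 0.09, 0.30]]))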
5 Simulation Experiments
The three classical condenser fault types mentioned above are used in the fault diagnosis simulation. The fault set is expressed as Y = {Y1, Y2, Y3}, where Y1 represents a serious fault of the circulating pump, Y2 represents the condenser full of water and Y3 represents the condensate pump not working.
For each fault type, 40 samples are collected; 20 of them are used to train the SVM and the other 20 are used to test the trained SVM. Some examples from the training set and the corresponding fault types are shown in Table 2.
1 2 3 4
X1 0.05 0.06 0.03 0.01
X2 0.65 0.60 0.53 0.75
X3 0.15 0.11 0.24 0.12
X4 0.18 0.17 0.42 0.15
X5 0.56 0.50 0.67 0.86
X6 0.43 0.40 0.58 0.69
X7 0.11 0.09 0.13 0.22
X8 0.37 0.28 0.41 0.73
Fault Type Y1 Y1 Y2 Y3
σ, the kernel parameter, determines the width of the Gaussian function. The value of σ controls the number of support vectors: when the number of support vectors is too large, it can be made appropriate by reducing the value of σ. Generally the value is chosen based on data analysis and experience. For simplicity, the value of σ in this work is determined by the following formula:
\sigma^2 = E\left( \| x_i - x_j \|^2 \right)    (14)
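Eq. (14) simply sets σ² to the average squared distance between training samples. A small sketch of this estimate (the names are ours, and the conversion to an RBF gamma value is our own addition):

import numpy as np

def sigma_squared(X):
    # Eq. (14): mean squared Euclidean distance over all pairs of training samples
    # (the i = j pairs are included, which only slightly lowers the average)
    X = np.asarray(X, float)
    diff = X[:, None, :] - X[None, :, :]
    return np.mean(np.sum(diff ** 2, axis=-1))

X = np.random.rand(40, 8)          # 40 fuzzified symptom vectors with 8 features
s2 = sigma_squared(X)
gamma = 1.0 / (2.0 * s2)           # value usable as the RBF kernel parameter
print(s2, gamma)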
With the trained SVM, 60 condenser fault samples are used for verification; some of the test results are shown in Table 3.
1 2 3 4
X1 0.05 0.03 0.02 0.01
X2 0.62 0.55 0.70 0.72
X3 0.10 0.26 0.11 0.13
X4 0.16 0.41 0.13 0.15
X5 0.52 0.66 0.79 0.84
X6 0.43 0.58 0.71 0.69
X7 0.09 0.14 0.18 0.21
X8 0.30 0.43 0.78 0.71
Fault Type Y1 Y2 Y3 Y3
Actual Situation Y1 Y2 Y3 Y3
The results of the simulation test show that the SVM is able to classify the fault samples correctly, especially for small samples, owing to its strong data-processing capability.
In order to compare with the SVM method proposed in this paper, the same eight fault characteristics and three condenser fault types are also used with a BP neural network. For comparability, the same membership functions and a three-layer BP neural network [12-13] are adopted in the tests; the eight fault characteristics are the inputs of the neural network and the three fault types are the outputs. Furthermore, the two methods were both tested on 30 simulated data samples. The results are shown in Table 4.
The diagnostic results show that SVM has a higher diagnostic accuracy than the BP neural network for small and complex fault samples; overall, the correct classification rate is improved by about 8%.
6 Conclusion
SVM is a learning algorithm based on the principle of structural risk minimization, which gives it a stronger theoretical basis and better generalization ability than neural network algorithms based on empirical risk minimization. Steam condenser fault diagnosis is taken as an example to verify the approach of SVM combined with fuzzy theory for intelligent steam turbine fault diagnosis. The experimental results show that SVM has higher diagnostic accuracy and a stronger classification ability than the BP neural network in complex fault diagnosis with small samples. In future research the approach will be applied to more complicated fault diagnosis problems to verify its applicability, and other intelligent diagnosis methods may be combined with SVM to further enhance the accuracy of fault diagnosis.
Acknowledgments. This work is supported by the Program of Shanghai Subject Chief Scientist (09XD1401900) and the Natural Science Foundation of Shanghai (No. 09ZR1413300).
References
1. Diao, Y., Passino, K.M.: Fault diagnosis for a turbine engine. Control Engineering Practice, 1151–1165 (December 2004)
2. Zeng, X., Li, K.K., Chan, W.L., Yin, X., Chen, D., Lin, G.: Discussion on application of
information fusion techniques in electric power system fault detection. Electric Power 36,
8–12 (2003)
3. Zhao, H., Li, W., Sheng, D., et al.: Study on Fault Diagnosis of Condenser Based on BP
Neural Network. Power System Engineering 20, 32–34 (2004)
4. Chen, Z., Xu, J.: Condenser fault diagnosis based on Elman networks. East China Electric
Power 35, 871–874 (2007)
5. Wang, J., Li, L., Tang, G.: Study on Fault Diagnosis of Condenser Based on RBF Neural
Network. Electric Power Science and Engineering 23, 27–31 (2007)
6. Ma, Y., Yin, Z., Ma, L.: Study on fault diagnosis of condenser based on SOM neural net-
work. Journal of North China Electric Power University 33, 5–8 (2006)
7. Wang, X., Wang, Q.: A Study on the Diagnosis of the Condenser Faults Based on the
Fuzzy Neural Network. Modern Electric Power 18, 12–17 (2001)
8. Wu, Z., Lin, Z.: FNN-based fault diagnosis system for condenser. Gas Turbine Technol-
ogy 21, 42–45 (2008)
9. Burges, C.J.: A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery 2(2) (1998)
10. Vapnik, V.N.: The Nature of Statistical Learning. Tsinghua University Press, Beijing
(2004)
11. Jia, X.: Fault Diagnosis in Turbine of Condenser. Harbin Engineering University, Harbin (2004)
12. FeiSi Center of Technology and Research, Neural Network Theory and MATLAB 7 Ap-
plication, pp. 44–51. Publishing House of Electric Industry, Beijing (2005)
13. Wu, Z., Lin, Z.: FNN-based fault diagnosis system for condenser. Gas Turbine Technol-
ogy 21, 42–45 (2008)
Architecture of Multiple Algorithm Integration
for Real-Time Image Understanding Application
Abstract. Robustness and real-time performance are usually the main challenges when designing an image understanding approach for practical applications. To achieve robustness, integrating multiple algorithms into a special hybrid approach has become a popular way, and there have been many successful hybrid approaches. However, this aggravates the difficulty of achieving real-time performance because of the heavy computational workload of multiple algorithms. To make it easier to design a hybrid approach under real-time constraints, theoretical research on multiple algorithm integration is necessary. This paper presents a common multiple algorithm integration model and an architecture for typical image understanding applications. To achieve robustness and real-time performance in a hybrid approach, strategies for increasing robustness and speeding up are analyzed. Finally a robust hybrid approach for rear vehicle and motorcycle detection and tracking is introduced as an example.
1 Introduction
When developing an image understanding system for practical applications, the following challenges are usually faced: robustness and real-time performance. For example, an obstacle detection approach for a driver assistance system needs to provide robust perception in a wide variety of outdoor environments, and this procedure needs to be finished as quickly as possible to leave time for the driver's reaction [1]. A successful image understanding system should meet the two requirements at the same time, but it is very difficult to design such an approach because the two requirements strongly conflict.
In image understanding it is often the case that many algorithms exist to acquire the same specific information, each possessing different performance and computational workload characteristics. Since no single algorithm will be robust enough to deal with a wide variety of environmental conditions, integrating multiple algorithms into a special hybrid approach has become a popular way to increase robustness. But image understanding algorithms usually have a high computational workload, and this integration aggravates the difficulty of achieving real-time performance. It therefore becomes very important to carry out common research on how to design a hybrid image understanding approach that achieves robustness and real-time performance at the same time.
Up to now there has been much research on multiple algorithm integration in image understanding. Most of it proposes hybrid approaches for special problems, usually in the form of hybrids of multiple kinds of technology or the integration of multiple visual cues. For example, P. Zhang et al. [2] propose a hybrid classifier for handwritten numeral recognition, which uses a neural network as a coarse classifier and a decision tree as a fine one. K. Toyama et al. [3] propose a hybrid visual tracking approach, which includes multiple trackers such as a polygon-edge tracker, a rectangle tracker and so on. B. Leibe et al. [4] propose a segmentation approach for object recognition that integrates multiple visual cues and combines multiple interest region detectors. Multiple algorithm integration has been proved to be an effective way to increase robustness, and some theoretical research has been performed; for example, M. Spengler et al. [5] propose a common framework for visual tracking that can support the integration of multiple cues. These studies help to design robust approaches for special purposes such as tracking, but deeper research is still needed on how to design a robust image understanding hybrid approach for common purposes under real-time constraints.
This paper is organized as follows. First a common multiple algorithm integration model is proposed, which can support common purposes such as segmentation, classification, tracking and so on. Then some strategies for designing a robust and real-time hybrid approach based on the model are introduced, and design-time and run-time strategies for increasing robustness and speeding up are analyzed. Next, an architecture for typical applications is introduced, which helps to design a total solution for an image understanding task. Finally a hybrid approach for rear vehicle detection and tracking is introduced as an example.
The Algorithm Pool includes a set of different algorithms which can provide the same output information; each algorithm carries a description of its performance and computational workload characteristics. In most cases the quality of the output information serves as the usual description of an algorithm's performance characteristic, and the computing cost as the usual description of its workload characteristic. These descriptions may be expressed in different forms; for example, the average detection rate of a segmentation algorithm under some illumination condition may express the quality of its output information, and the average execution time may express its computing cost. This information can be acquired from statistics.
The Evaluator is in charge of providing the evaluation values of the performance and computational workload characteristics for the current input data. For example, for the segmentation algorithms above, the evaluation value could be given according to the current illumination condition, which could be calculated from the current input data.
The Fuser is in charge of fusing the output information of multiple algorithms to give the final output. Most technologies for multi-sensor data fusion and multi-cue integration can be applied to this procedure.
The Scheduler is in charge of scheduling the algorithms according to the currently available computing power, the executing strategy and the evaluation values of the performance and workload characteristics of each algorithm. Usually the computing power indicates the available execution time of the hybrid approach and the executing strategy indicates the priority between performance and workload requirements; a special executing strategy is implemented with a special scheduling method (see the sketch below).
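The paper gives no code for the HYBRID model; the following minimal Python sketch (all class, field and method names are our own assumptions) only illustrates how an Algorithm Pool, Evaluator, Scheduler and Fuser could fit together under a simple budget-based scheduling strategy.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PoolAlgorithm:
    name: str
    run: Callable                 # takes the input frame, returns its output information
    avg_quality: float            # statistical description of performance (e.g. detection rate)
    avg_cost_ms: float            # statistical description of computing workload

class HybridApproach:
    def __init__(self, pool: List[PoolAlgorithm], evaluate, fuse):
        self.pool, self.evaluate, self.fuse = pool, evaluate, fuse

    def process(self, frame, time_budget_ms: float):
        # Scheduler: rank algorithms by evaluated quality for the current frame,
        # then run them greedily until the computing budget is exhausted.
        ranked = sorted(self.pool, key=lambda a: self.evaluate(a, frame), reverse=True)
        outputs, spent = [], 0.0
        for alg in ranked:
            if spent + alg.avg_cost_ms > time_budget_ms:
                continue
            outputs.append(alg.run(frame))
            spent += alg.avg_cost_ms
        # Fuser: combine the outputs of the executed algorithms into the final result
        return self.fuse(outputs)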
The main advantage of the HYBRID model is that it can support common image understanding tasks. On the one hand, the algorithms of the Algorithm Pool may be detectors, classifiers or trackers, so the model can support different purposes such as detection, classification, tracking and so on. On the other hand, when visual cue observation procedures are taken as the algorithms of the Algorithm Pool, the HYBRID procedure is equivalent to multiple visual cue integration; and when information acquisition procedures from sensors are taken as the algorithms of the Algorithm Pool, the HYBRID procedure is
equivalent to multi-sensor data fusion. This shows that the model can support multi-cue integration and even multi-sensor data fusion. The analysis indicates that the HYBRID model can support the main current image understanding purposes.
One important aim of this research is to find strategies that help to design a robust and real-time approach based on the HYBRID model. For such an approach, the strategies for increasing robustness can be analyzed in three parts. The first part is the Algorithm Pool: for a single algorithm, an adaptive algorithm such as an adaptive Sobel edge detector helps robustness in complex environments. The second part is the Scheduler: the selection of an appropriate algorithm sequence helps to increase robustness. The third part is the Fuser: the fusion of multiple visual cues and multiple sensor data provides great help for robustness.
Since the information fusion procedure usually needs less computing power than an image understanding algorithm, the strategies for speeding up the hybrid approach are analyzed only for the Algorithm Pool and the Scheduler. For a single algorithm in the Algorithm Pool, speeding up can be implemented at two levels. The first level is the software implementation of the single algorithm; highly efficient implementation on special hardware, such as high-performance programs based on data-parallel technology, is an important way to achieve real-time performance. The second level is based on reducing the computational complexity of the single algorithm; multi-resolution technology is a popular way to reduce complexity. For the integration of multiple algorithms in the Scheduler, determining an appropriate algorithm sequence helps to achieve real-time performance.
The above strategies can be implemented in two phases: the design phase and the execution phase. Strategy implementations in the design phase mainly focus on choosing efficient algorithms for the Algorithm Pool, while strategy implementations in the execution phase mainly focus on scheduling the algorithms and fusing their output information. This paper mainly analyzes the algorithm scheduling method. In actual applications the scheduling strategies vary, but they can still be roughly summarized into the following two patterns.
The HYBRID model and the above strategies help to design one approach for a given purpose, but they are not enough for designing a total solution for a special image understanding application. A framework is therefore necessary, and many studies provide their own frameworks [9][10]. Based on that work, this paper customizes a hierarchical framework for common image understanding tasks and analyzes how to design a hybrid approach based on the framework. As shown in Fig. 4, the framework includes six kinds of information (sensor data, data, feature, object hypotheses, object observation, object tracking) and five typical processing procedures (preprocessing, feature extraction, hypothesis generation (HG), hypothesis verification (HV), tracking), each of which transforms one or more kinds of information into another. An ellipse represents a kind of information, and a dashed arrow indicates that there is a procedure that can implement the transformation along the direction of the arrow. A circle is a special case of some kind of information, and a solid arrow indicates that there is an algorithm that can implement the transformation along the direction of the arrow. If multiple solid arrows point towards one circle, it means that a hybrid approach based on the HYBRID model can be designed for acquiring that information. Then, once the expected information is specified, if a data chain from the sensor data to the expected information can be constructed, a solution of the image understanding task can be determined, and the total solution can be made more robust and faster using the above strategies.
4 Conclusions
To increase the robustness of image understanding approaches, integrating multiple algorithms into special hybrid approaches has become popular. But this aggravates the difficulty of achieving real-time performance because of the heavy computational workload of multiple image understanding algorithms, so theoretical research on how to design a robust and real-time hybrid approach is necessary.
This paper presents a common multiple algorithm integration model and analyzes the strategies and the scheduling method for increasing robustness and speeding up. An architecture for typical applications is then introduced, which helps to design a total image understanding approach for a special application. Finally, the prototype for rear vehicle and motorcycle detection and tracking validates the efficiency of the framework and the HYBRID model for achieving robustness and real-time performance.
Research on multiple algorithm integration can help people apply image understanding in real applications more easily, and deeper research is necessary. Our next step is to validate the approach using a multi-sensor system.
References
1. Sun, Z., Bebis, G., Miller, R.: On-road vehicle detection using optical sensors a review. In:
IEEE International Conference on Intelligent Transportation Systems, pp. 585–590. IEEE
Press, New York (2004)
2. Heidemann, G., Kummert, F., Ritter, H., Sagerer, G.: A Hybrid Object Recognition Archi-
tecture. In: Vorbrüggen, J.C., von Seelen, W., Sendhoff, B. (eds.) ICANN 1996. LNCS,
vol. 1112, pp. 305–310. Springer, Heidelberg (1996)
3. Tay, Y.H., Khalid, M., Yusof, R., Viard-Gaudin, C.: Offline Cursive Handwriting Recognition System based on Hybrid Markov Model and Neural Networks. In: IEEE International Symposium on Computational Intelligence in Robotics and Automation, vol. 3, pp. 1190–1195. IEEE Press, New York (2003)
4. Leibe, B., Mikolajczyk, K., Schiele, B.: Segmentation based multi-cue integration for object
detection. In: 17th British Machine Vision Conference, vol. 3, p. 1169. BMVA Press (2006)
5. Spengler, M., Schiele, B.: Towards Robust Multi-Cue Integration for Visual Tracking. International Journal of Machine Vision and Applications 14, 50–58 (2003)
6. Cao, X., Balakrishnan, R.: Evaluation of an On-line Adaptive Gesture Interface with
Command Prediction. In: Graphics Interface Conference, pp. 187–194. Canadian Hu-
man-Computer Communications Society (2005)
7. Teng, L., Jin, L.W.: Hybrid Recognition for One Stroke Style Cursive Handwriting Char-
acters. In: 8th International Conference on Document Analysis and Recognition, pp.
232–236. IEEE Computer Press, Los Alamitos (2005)
8. Koerich, A.L., Leydier, Y., Sabourin, R., Suen, C.Y.: A Hybrid Large Vocabulary Hand-
written Word Recognition System using Neural Networks with Hidden Markov Patterns. In:
8th International Workshop on Frontiers in Handwriting Recognition, p. 99. IEEE Press,
New York (2002)
9. Hall, D.L.: Mathematical Techniques in Multisensor Data Fusion. Artech House, Boston
(2002)
10. Scheunert, U., Lindner, P., Richter, E., Tatschke, T., Schestauber, D., Fuchs, E.: Early and
Multi Level Fusion for Reliable Automotive Safety Systems. In: IEEE Intelligent Vehicles
Symposium, pp. 196–201. IEEE Press, New York (2007)
11. Duan, B., Liu, W., Fu, P., Yang, C., Wen, X., Yuan, H.: Real-Time On-Road Vehicle and
Motorcycle Detection Using a Single Camera. In: IEEE International Conference on In-
dustrial Technology, pp. 1–6. IEEE Press, New York (2009)
Formalizing the Modeling Process of Physical Systems
in MBD
Abstract. Many researchers have proposed theories to capture the essence of abstraction. The G-KRA model (General KRA model), based on the KRA model, offers a framework R to represent the world W in which a set of generic abstraction operators allows abstraction to be automated, and it can represent the world at different abstraction granularities. This paper shows how to model a physical system in model-based diagnosis within the G-KRA model framework using various kinds of knowledge. It investigates, with the generic theory of abstraction, how to automatically generate different knowledge models of the same system. The present work formalizes the process of constructing an abstract model of the considered system (e.g., using functional knowledge) based on the fundamental model and an abstract objects database, and expects that formalizing the modeling process of physical systems in MBD within the G-KRA framework will open the way to exploring richer and better-founded kinds of abstraction to apply to the MBD task.
1 Introduction
Abstraction is an essential activity in human perception and reasoning. In the Artificial Intelligence community, abstraction has been investigated mostly in problem solving [1], planning [2], diagnosis [3] and problem reformulation [4]. In general, the proposed approaches differ either in the kinds of abstraction they use or in how they formally represent them, and they all fail to characterize the practical aspects of the abstraction process. Saitta and Zucker [5] proposed a model of representation change that includes both syntactic reformulation and abstraction. The model, called KRA (Knowledge Reformulation and Abstraction), is designed to help both the conceptualization phase of a problem and the automatic application of abstraction operators. Compared with the KRA model, the G-KRA model (General KRA model) [6] is more general and flexible in representing the world: it can represent the world at different abstraction granularities.
*
Corresponding author: ouyangdantong@163.com
The modeling process within the G-KRA model can be divided into two correlated phases: fundamental modeling and abstract modeling. The former uses the fundamental knowledge, i.e. structural and behavioral knowledge, to derive the fundamental model of the system based on some kind of ontology and representational assumption; this model is relatively simple and intuitive. The abstract modeling process then automatically constructs one or more abstract models of the same system utilizing other knowledge, such as functional or teleological knowledge.
In this section we briefly introduce the definitions of the G-KRA abstraction model; see [6] for details.
Definition 1. A primary perception P is a 5-tuple, P = (OBJ, ATT, FUNC, REL, OBS), where OBJ contains the types of objects considered in W, ATT denotes the types of attributes of the objects, FUNC specifies a set of functions, REL is a set of relations among object types and OBS is the set of observations made on W.
Definition 2. Let P be a primary perception, A be an agent (who perceives W) and Oa be a database whose objects, with some abstract types, are predefined by A. An abstract perception of A is defined as P* = δa(P, Oa), where δa denotes an abstract perception mapping.
Definition 3. Given a primary perception P, an abstract objects database Oa and an abstract perception mapping δa, a general representation framework R* is a 4-tuple (P*, D*, L*, T*), where P* = δa(P, Oa) is an abstract perception, D* is a database, L* represents a language and T* specifies a theory.
The primary perception can be taken as the fundamental model of W. First we must define the ontology of the model and identify what to represent of the real system in the model according to the knowledge involved above. There exist representational links and ontological links between the different fundamental models of the same world [11]. We can then choose appropriate kinds of structural and behavioral knowledge to build the primary perception P within the framework R and, moreover, construct the fundamental model of W as shown in Figure 1, where L designates the union of the representational links L1 and the ontological links L2.
Fig. 1. Two primary perceptions P1(W) and P2(W) of the same world W, related by the links L
In this section we take the diagnosis task as an example to show how to build up the
fundamental model of a particular physical system. We will consider the hydraulic
system reported in Figure 2, which is the same as the one used in [12].
We perceive four types of components, i.e., pipes, valves, three-way nodes, and
pumps. Moreover, we treat the ports that connect components with each other (internal
ports), or connect components to the environment (external ports), as entities that the
model must designate.
We assume the states of the valves are given: V1 is open while V2 is closed, without
considering the commands set on the valves. The volumetric pump PM1, which we
assume is active, delivers a constant flow equal to Fk. We choose a qualitative
behavioral representation of the hydraulic system used in the example instead of its
quantitative version.
The possible behaviors of pipes, valves, pumps and three-way nodes are reported in [12].
For example, pipes have two behavioral modes: OK (Fin = Fout) and LK (Fin > Fout).
Now we show how to construct the fundamental model of the system in Figure 2.
We use the representation in [13] with a few different details.
- OBJ = COMP ∪ {PORT}, COMP = {PUMP, PIPE, VALVE, THREE-WAY}
- ATT = {ObjType: OBJ → {pipe, pump, valve, three-way, port}, Direction: PORT → {in, out}, THREE-WAY → {2wayOut, 2wayIn}, Observable: PORT → {yes, no}, State: VALVE → {open, closed}}
- FUNC = {Bpump: PUMP → {ok, uf, of, lk}, Bpipe: PIPE → {ok, lk}, Bvalve: VALVE → {ok, so, sc}, Bthree-way: THREE-WAY → {ok}}
- REL' = {port-of ⊆ PORT × COMP, connected ⊆ PORT × PORT}
- OBS' = {(PM1, P1, …, P6, V1, V2, TW1, TW2, t1, …, t13, t2', …, t12), (ObjType(PM1) = pump, ObjType(P1) = pipe, …, ObjType(P6) = pipe, ObjType(V1) = valve, …, ObjType(TW1) = three-way, …, ObjType(t1) = port, …), (Direction(TW1) = 2wayOut, …, Direction(t1) = in, …), (Observable(t1) = yes, …), (State(V1) = open, State(V2) = closed), (port-of(t1, PM1), …), (connected(t2, t2'), …)}
For the sake of space, the contents of the structure/database D, the logical language
L and the theory T, which have been described in [13], will not be provided here.
Two or more objects perceived in the fundamental modeling process may have the same
nature from some perspective, so that we can treat them as a single, more abstract object.
This matching is an automatic process; before it can run, the abstract objects database
must be constructed manually, guided by some kind of subject knowledge, and the
abstract mapping must be designed so that the matching can operate automatically and
then generate a more abstract model of the same world appropriate for particular
reasoning tasks.
In the following section we formalize the abstract modeling process within the G-KRA
model, based on the fundamental model of the same system built up before. We give the
formal definitions of the Abstract Objects Database and the Abstract Mapping, and take
functional knowledge as an example to automatically construct the model of functional
roles, which is based on an object-centered ontology.
We represent the abstract objects using the concept of the framework R = (P, D, L, T)
proposed by Saitta and Zucker. We define P as the abstract objects perception, denoted
by AOP, in which the observation OBS is not involved, since what we perceive is not a
particular system but a set of abstract objects. We also introduce some assumptive
object representation for characterizing the behavior of the abstract objects. The rest of
the framework R is automatically renamed to AOD, AOL, and AOT to form a new
framework AOR, whose meaning differs from that of R.
Definition 4. An abstract objects perception is a 4-tuple AOP = (A_OBJ, A_ATT,
A_FUNC, A_REL), where A_OBJ contains the types of abstract objects, predefined on
the basis of knowledge other than the fundamental knowledge, A_ATT denotes the
types of attributes of the abstract objects, A_FUNC specifies a set of functions, and
A_REL is a set of relations among object types. These sets can be expressed as follows:
- A_OBJ = {A_TYPEi | 1 ≤ i ≤ N}
- A_ATT = {A_Aj: A_TYPEj → Λj | 1 ≤ j ≤ M}
- A_FUNC = {A_fk: A_TYPEik × A_TYPEjk × … → Ck | 1 ≤ k ≤ S}
- A_REL = {A_rh ⊆ A_TYPEih × A_TYPEjh | 1 ≤ h ≤ R}
The physical system comprises many individual components, which are assigned
abstract roles in the functional role model. The functional role of a component is an
interpretation of the equations describing its behavior, aimed at characterizing how the
component contributes to the realization of the physical processes in which it takes
part. Here we introduce three kinds of abstract functional objects involved in [14]: a
conduit, which enables a generalized flow (effort, energy, information and so on) from
one point to another in the structure of a system; a generator, which causes a
generalized flow from one point to another; and a barrier, which prevents a generalized
flow from one point to another.
Let AOP = (A_OBJ, A_ATT, A_FUNC, A_REL) be the abstract objects perception
based on functional knowledge, specified as follows:
- A_OBJ = A_COMP ∪ A_PORT ∪ …, where A_COMP = CONDUIT ∪ GENERATOR ∪ BARRIER
- A_ATT = {A_ObjType: A_OBJ → {conduit, generator, barrier, port}, A_Direction: A_PORT → {in, out}, A_MeasureType: A_PORT → {flow, pressure}}
- A_FUNC = {A_Fc: CONDUIT → {funcc}, A_Fg: GENERATOR → {funcg}, A_Fb: BARRIER → {funcb}}
- A_REL = {A_portof ⊆ A_PORT × A_COMP, A_connected ⊆ A_PORT × A_PORT}
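To make the abstract objects perception concrete, the following Python sketch encodes the functional-knowledge AOP above as plain data records. The class names and the sample ports are illustrative assumptions introduced only for this sketch; they are not part of the original formalization.

from dataclasses import dataclass, field

# Abstract component types defined by the functional-knowledge AOP
A_COMP_TYPES = {"conduit", "generator", "barrier"}

@dataclass
class APort:
    name: str
    direction: str               # "in" or "out"  (A_Direction)
    measure_type: str = "flow"   # only the generalized flow type is used here

@dataclass
class AComp:
    name: str
    a_obj_type: str              # one of A_COMP_TYPES  (A_ObjType)
    func_mode: str               # funcc, funcg or funcb  (A_FUNC)
    ports: list = field(default_factory=list)   # realizes A_portof

# the instantiated abstract objects c (CONDUIT), g (GENERATOR), b (BARRIER)
c = AComp("c", "conduit", "funcc", [APort("p1", "in"), APort("p2", "out")])
g = AComp("g", "generator", "funcg", [APort("p3", "in"), APort("p4", "out")])
b = AComp("b", "barrier", "funcb", [APort("p5", "in"), APort("p6", "out")])
abstract_objects_database = [c, g, b]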
Going back to the representation framework AOR, we define three instantiated abstract
objects c (CONDUIT), g (GENERATOR) and b (BARRIER) and introduce several ports
(A_PORT) p1, p2, … to characterize the database AOD, which contains the tables
described as follows.
- TableAObj = (A_obj, A_objtype, A_direction, A_measuretype), which describes the
abstract components and their attributes.
- TableAFunc = (A_obj, A_func, A_funcmode), which describes the functional modes
of the abstract components.
- TableAPortOf = (A_port, A_comp), which describes which ports are attached to the
abstract components.
In the table TableAObj some of the entries can be set to N/A (not applicable), as not all
attributes are meaningful for all abstract objects. For the sake of simplicity, we will not
discuss all of the generalized flow types but assume the measure type of the ports is
flow. We list the contents of the three tables as follows.
The definitions of AOL and AOT are similar to L and T in the framework R, so that we
can, for example, describe the structure of a conduit using AOL:

conduit(c, p1, p2) ⇔ A_comp(c) ∧ A_port(p1) ∧ A_portof(p1, c) ∧ in(p1) ∧ flow(p1) ∧ A_port(p2) ∧ A_portof(p2, c) ∧ out(p2) ∧ flow(p2)
Table 1. TableAObj
Table 2. TableAFunc
Also the functional modes described with equations in Table 3 can be rewritten in
logical terms in the theory. For instance: conduit(c, p1, p2) ∧ funcc(c) → ΔFlowValue(p1, p2) = 0.
Table 3. TableAPortOf
------------------------------------------------------------------------------------
Procedure SMP COMP c1, A_COMP c2
in1=the set of all in ports of c1;
in2=the set of all in ports of c2;
out1=the set of all out ports of c1;
out2=the set of all out ports of c2;
If (|in1|<>|in2| or |out1|<>|out2|) return FAIL;
If the types of in1 and in2 can’t completely match, return FAIL;
If the types of out1 and out2 can’t completely match, return FAIL;
return SUCCEED;
------------------------------------------------------------------------------------
Fig. 3. Structure Matching Procedure
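A minimal Python rendering of the SMP procedure of Fig. 3, under the assumption that both concrete components and abstract objects expose their ports as in the sketch above, and that a port's "type" is its (direction, measure type) pair — an interpretation made only for this illustration.

from collections import Counter

def smp(c1, c2):
    """Structure Matching Procedure: compare the in/out port structure
    of a concrete component c1 with an abstract component c2."""
    def split(comp):
        ins = [p for p in comp.ports if p.direction == "in"]
        outs = [p for p in comp.ports if p.direction == "out"]
        return ins, outs

    in1, out1 = split(c1)
    in2, out2 = split(c2)
    # |in1| <> |in2| or |out1| <> |out2|  ->  FAIL
    if len(in1) != len(in2) or len(out1) != len(out2):
        return "FAIL"
    # the port types must completely match (compared as multisets)
    port_type = lambda p: (p.direction, p.measure_type)
    if Counter(map(port_type, in1)) != Counter(map(port_type, in2)):
        return "FAIL"
    if Counter(map(port_type, out1)) != Counter(map(port_type, out2)):
        return "FAIL"
    return "SUCCEED"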
-------------------------------------------------------------------------------------------------------
Procedure BMP COMP c1, A_COMP c2
IN={in1,in2,…,inm}: the input ports of c1(or c2) individually;
OUT={out1,out2,…,outn}: the output ports of c1(or c2) individually;
E={E1,E2,…,Ej}: the equations representing c1’s normal behaviors;
F={F1,F2,…,Fk}: the equations representing c2’s functional modes;
Put the elements of IN and OUT to the corresponding equation of E and F, and receive the
new sets E’ and F’;
If E’ completely matches F’, return SUCCEED;
return FAIL;
-----------------------------------------------------------------------------------------------------
Fig. 4. Behavior Matching Procedure
-------------------------------------------------------------------------------------------------------
Procedure CRP(ObjType type1, State S, A_ObjType type2)
//replace the components having type type1 and state S with the abstract object having abstract
//type type2 in the considered system
IF S is arbitrary {
  OBJ' = COMP' ∪ PORT ∪ …, where COMP' = COMP − type1 ∪ type2;
  ATT' = ATT − {ATTname: type2 → Λj} − {ObjType} ∪ {A_ObjType: COMP' → ObjType − {type1} ∪ {type2}};
  FUNC' = FUNC − {Btype1: type1 → ck} ∪ {A_Ftype2: type2 → functype2};
  REL' = REL, unchanged except that COMP in the relations of REL is replaced with A_COMP;
  Modify OBS to OBS' based on OBJ', ATT', FUNC' and REL';
  P' = {OBJ', ATT', FUNC', REL', OBS'};
  Modify the contents of D, L and T accordingly, based on P', AOD, AOL and AOT, to D', L' and T';
}
-------------------------------------------------------------------------------------------------------
Fig. 5. Components Replacement Procedure
… like valves) are functionally equivalent to the given abstract object. We can then
replace such components with the matched abstract-type object through the procedure
CRP described above.
In Procedure CRP we have only considered the situation in which the state of the
components can be ignored (as for pipes). When dealing with components whose state
needs to be considered, more work must be done in the procedure. On the one hand, we
are not allowed to delete the object type type1 in the more abstract perception, because
not all components with type type1 stay in state S; instead, only the new type type2 is
added. On the other hand, the replacement happens only on the components in state S,
so we should check the state of the considered component before we modify it.
The three procedures are executed repeatedly until no component matches any abstract
object of the abstract objects database; we can then generate a more abstract model,
based on functional knowledge, with fewer object types. We use the procedure GAMP
(Generate Abstract Model Procedure) to describe this process.
-------------------------------------------------------------------------------------------------------
Procedure GAMP
Add a flag ENABLE(initially ENABLE=YES) to each component of the considered
system;
//ENABLE=YES represents the component has not been checked or replaced, namely, it can
be dealt with. Whereas if ENABLE=NO, then it indicates either the component has been replaced
or it does not match any abstract object of the given abstract database.
While there exist components unsettled, do{
c=any component whose ENABLE value is YES;
type1=ObjType(c); S=State(c);
For each object oj of the given abstract database, do{
if (SMP(c, oj)=SUCCEED)
if(BMP(c, oj)=SUCCEED) {
type2=A_ObjType(oj);
CRP (type1, State S, type2);
Set ENABLE=NO to all the components of type type1 and state S;
}
Set ENABLE=NO to all the components of type type1 and state S;
}
}
-------------------------------------------------------------------------------------------------------
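The GAMP loop can be sketched as follows; smp, bmp and crp stand for the three procedures above, and the attribute names obj_type, state and enabled are hypothetical field names introduced only for this illustration.

def gamp(components, abstract_db, smp, bmp, crp):
    """Generate Abstract Model Procedure: repeatedly replace groups of
    components that match an abstract object in structure and behavior."""
    for comp in components:
        comp.enabled = True              # not yet checked or replaced

    while any(comp.enabled for comp in components):
        c = next(comp for comp in components if comp.enabled)
        type1, state = c.obj_type, c.state
        for obj in abstract_db:
            if smp(c, obj) == "SUCCEED" and bmp(c, obj) == "SUCCEED":
                type2 = obj.a_obj_type
                crp(type1, state, type2)  # rewrite P, D, L and T
                break
        # whether or not a match was found, all components with the same
        # type and state are settled in this round
        for comp in components:
            if comp.obj_type == type1 and comp.state == state:
                comp.enabled = False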
In the rest of this section we give an example based on the hydraulic system described
in Figure 2 to show how to automatically construct the abstract model from the
fundamental model using functional knowledge.
According to the procedure GAMP, we compare any unsettled component c with each
object of the abstract objects database to find an abstract object that completely matches
it in both structure and behavior, and then replace all components of the same type as c
in the considered system with that abstract object. We choose the components in the
following order: PM1, P5, TW1, P1, V1, P2, P3, V2, P4, TW2, P6.
Fig. 7. Replace all the active pumps of the system with generators
Fig. 9. Replace all open valves with conduits and closed valves with barriers
3 Conclusion
Many researchers have explored how to use relevant knowledge to construct the models
of physical systems in MBD, for instance structural and behavioral knowledge (e.g.,
[9]), functional and teleological knowledge (e.g., [14]), and the multimodeling approach
(e.g., [11]). In recent years, new approaches have been proposed in MBD to exploit the
degree of observability for defining useful abstractions (e.g., [12], [15]). While these
approaches have developed some interesting solutions to the problem of abstracting
models for MBD, they have failed to investigate the relations between the proposed
methods and general theories of abstraction. The present work represents a step towards
filling this gap. In particular, this paper introduces the extended KRA model, i.e., the
G-KRA model, which can represent the world more generally and flexibly. We have
shown how to automatically construct an abstract model based on the fundamental
model, which contributes to formalizing the process of transforming models built with
fundamental knowledge into models built with abstract knowledge, such as the
transformation from structural and behavioral models to the functional role models
described above. Some problems remain for future work, e.g., how to deal with the
situation in which a component plays multiple roles in different domains, so as to
automatically build up multiple abstract models, and how to formalize the realization of
the links between two fundamental models with different ontologies or representation
assumptions.
Acknowledgment
The authors are grateful for the support of the NSFC Major Research Program under
Grant Nos. 60496320 and 60496321 (Basic Theory and Core Techniques of
Non-Canonical Knowledge); NSFC under Grant Nos. 60773097 and 60873148; the
Program for New Century Excellent Talents in University; the Jilin Province Science
and Technology Development Plan under Grant Nos. 20060532 and 20080107; and the
European Commission under Grant No. TH/Asia Link/010 (111084).
References
1. Holte, R., Mkadmi, T., Zimmer, R., MacDonald, A.: Speeding up problem-solving by ab-
straction: A graph-oriented approach. J. Art. Intelligence 85, 321–361 (1996)
2. Knoblock, C., Tenenberg, J., Qiang, Y.: A spectrum of abstraction hierarchies for planning.
In: Proc. AAAI WS on AGAA, pp. 24–35 (1990)
3. Mozetic, I.: Hierarchical model-based diagnosis. J. Int. Journal of Man-Machine Stud-
ies 35(3), 329–362 (1991)
4. Subramanian, D.: Automation of abstractions and approximations: Some challenges. In:
Proc. AAAI WS on AGAA, pp. 76–77 (1990)
5. Saitta, L., Zucker, J.: Semantic abstraction for concept representation and learning. In: Proc.
SARA, pp. 103–120 (1998)
6. Shan-wu, S., Nan, W., Dan-tong, O.Y.: General KRA Abstraction Model. J. Journal of Jilin
University (Science Edition) 47(3), 537–542 (2009)
7. Weld, D., De Kleer, J.: Readings in Qualitative Reasoning about Physical Systems. Morgan
Kaufmann, San Mateo (1990)
8. Bobrow, D.G. (ed.): Special Volume on Qualitative Reasoning about Physical Systems. J.
Artificial Intell. 24 (1984)
9. Davis, R.: Diagnostic reasoning based on structure and behavior. J. Artificial Intelli-
gence 24, 347–410 (1984)
10. Sticklen, J., Bond, E.: Functional reasoning and functional modeling. IEEE Expert 6(2),
20–21 (1991)
11. Chittaro, L., Guida, G., Tasso, C., Toppano, E.: Functional and teleological knowledge in
the multimodeling approach for reasoning about physical system:a case study in diagnosis.
IEEE Trans. Syst. Man, Cybern. 23(6), 1718–1751 (1993)
12. Chittaro, L., Ranon, R.: Hierarchical model-based diagnosis based on structural abstraction.
Art. Intell. 155(1-2), 147–182 (2004)
13. Saitta, L., Torasso, P., Torta, G.: Formalizing the abstraction process in model-based di-
agnosis. Technical Report 34, Univ. of Torino, Italy (2006)
14. Chittaro, L., Ranon, R.: Diagnosis of multiple faults with flow-based functional models:the
functional diagnosis with efforts and flows approach. Reliability Engineering and System
Safety 64, 137–150 (1999)
15. Torta, G., Torasso, P.: A Symbolic Approach for Component Abstraction in Model-Based
Diagnosis. In: Proceedings of the Model-Based Diagnosis International Workshop (2008)
Study on Stochastic Programming Methods Based on
Synthesizing Effect
1 Introduction
Randomness is a widespread phenomenon in the real world and is unavoidable in
many practical fields. How to establish an effective and workable method to process
random information is a widespread concern in production management, artificial
intelligence, and the optimization of complex systems, and stochastic programming is
generally involved in these research areas. As a random variable is a family of data
that satisfies certain laws and does not have a clear order, current programming methods
cannot be directly applied to solving stochastic programming problems. At
present, there are three basic methods to solve the stochastic programming problem:
1) Expectation model, the basic idea is to use mathematical expectation to describe
random variables, then turn the stochastic programming into the general programming
problem. 2) Chance-constrained model, the basic idea is to convert stochastic constraints
and objective functions into ordinary constraints and objective functions through some
reliability principles. 3) Dependent-chance programming model, the basic idea is to
regard objective functions and constraints as events under random environment and
solve the stochastic programming problem by maximizing the chances of all the
events happening. These methods have achieved good results in applications: in [3], the
authors studied the measures program of an oil field using the expectation model; paper
[4] uses the chance-constrained model to size batteries for a distributed power system;
and paper [5] uses it to optimize the allocation of harmonic filters on a distribution
network.
However, these methods cannot solve stochastic programming problems under
complicated environments effectively, and the main deficiencies are: 1) with the
expectation model, it is hard to describe and represent random variables effectively
through expectations, and it is difficult to ensure the reliability of the model when the
randomness is large; 2) when the stochastic characteristics are complex (that is, it is
difficult to determine the distribution of the stochastic environment), the computational
complexities of the chance-constrained model and the dependent-chance programming
model are so large that an analytic form of a viable solution can hardly be obtained. In
response to these problems, many scholars have studied the stochastic programming
problem from different perspectives; for example, papers [6-9] solve the
chance-constrained model and the dependent-chance programming model by using
stochastic simulation techniques and genetic algorithms. However, all of these methods
have their own limitations. Thus, so far, there is no systematic and effective stochastic
programming method.
Based on the above analysis, this paper analyzes the basic characteristics of stochastic
programming problems in the light of the inadequacies of existing methods, and makes
the following contributions: a) we propose the concept of a synthesizing effect function
for processing the objective function and constraints, and give some commonly used
models of synthesizing effect functions; b) we establish a general solution model
(denoted BSE-SGM for short) based on the synthesizing effect function for stochastic
programming problems; c) we analyze the characteristics of our model by an example,
and the results indicate that our methods are effective.
In this paper, let (Ω, B, P) be the probability space, and for every random variable ξ on
(Ω, B, P), let E(ξ) and D(ξ) be the mathematical expectation and the variance of ξ,
respectively.
Consider the following stochastic programming problem:

\[
\begin{cases}
\max\; f(x,\xi), \\
\text{s.t.}\;\; g_j(x,\xi) \le 0, \quad j = 1,2,\ldots,m.
\end{cases}
\tag{1}
\]

where x = (x_1, x_2, …, x_n) is the decision vector, ξ = (ξ_1, ξ_2, …, ξ_n) is the given
random variable vector on the probability space (Ω, B, P), and f(x, ξ) and g_j(x, ξ),
j = 1, 2, …, m, are random variable functions.
As there is no simple order between random variables, and g_j(x, ξ) ≤ 0 mostly cannot
be completely satisfied, model (1) is only a conceptual model and cannot be solved
directly. Using mathematical expectations, it can be transformed into the following
model (2):
\[
\begin{cases}
\max\; E(f(x,\xi)), \\
\text{s.t.}\;\; E(g_j(x,\xi)) \le 0, \quad j = 1,2,\ldots,m.
\end{cases}
\tag{2}
\]
Generally, we call model (2) the expectation model [3].
When the variance of the random variable is large, the mathematical expectation cannot
describe the variable effectively, so we may not obtain the optimal solution of the
stochastic program by using model (2).
Since the constraints of a stochastic program often cannot be satisfied absolutely, we
can use reliabilities to handle the constraints and the objective function. Model (1) then
becomes the following model (3):

\[
\begin{cases}
\max\; \bar{f}(x), \\
\text{s.t.}\;\; P(f(x,\xi) \ge \bar{f}(x)) \ge \alpha, \\
\qquad\; P(g_j(x,\xi) \le 0) \ge \alpha_j, \quad j = 1,2,\ldots,m.
\end{cases}
\tag{3}
\]
Generally, we call model (3) the chance-constrained model. In this model, α_j, α ∈ [0, 1]
are the reliabilities with which the solutions satisfy the constraints and the objective,
and P(A) is the probability that event A happens. Compared to model (2), this model can
control the quality of the decision, but we do not know the range of values of α_j for
which model (3) is solvable.
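For a fixed decision x, the reliabilities P(g_j(x, ξ) ≤ 0) required by model (3) can be estimated by stochastic simulation. The sketch below is a generic Monte Carlo feasibility check; the sampler, the constraint functions and the toy data at the end are placeholders, not part of the paper's example.

import random

def chance_feasible(x, sample_xi, constraints, alphas, n_samples=10_000):
    """Estimate P(g_j(x, xi) <= 0) by Monte Carlo and compare each
    estimate with its required reliability alpha_j."""
    hits = [0] * len(constraints)
    for _ in range(n_samples):
        xi = sample_xi()                      # draw one realization of xi
        for j, g in enumerate(constraints):
            if g(x, xi) <= 0:
                hits[j] += 1
    probs = [h / n_samples for h in hits]
    return all(p >= a for p, a in zip(probs, alphas)), probs

# illustrative use with made-up data: xi ~ N(0, 1), g(x, xi) = x + xi - 3
ok, probs = chance_feasible(
    x=1.0,
    sample_xi=lambda: random.gauss(0.0, 1.0),
    constraints=[lambda x, xi: x + xi - 3.0],
    alphas=[0.9],
)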
As stochastic programming is an uncertain decision-making problem, the results of its
decision-making in general should not be required to make the relevant constraints hold
absolutely. So, compared with bounding the reliability of constraints and objectives by
values given in advance, it is more suitable for integrated decision-making to consider
jointly the degree of constraint satisfaction and the size of the objective function. To
establish a general solution model under this idea, we can synthesize the objective
function value and the constraint satisfaction through some strategy (we call this
synthesizing strategy the synthesizing effect function), and then discuss the
programming based on the synthesized value. Further, we give the following axiomatic
system of synthesizing effect functions for random effects and multiple attributes.
Remark 1. Since the above four principles must also be obeyed in multi-objective
decision making, we can similarly establish the axiomatic system of synthesizing effect
functions for multi-objective programming: in the above symbolic system we simply
regard u as (u_1, u_2, …, u_m) and replace the requirement that S(u, v) is monotone
non-decreasing in u by the requirement that S(u, v) is monotone non-decreasing in each
u_i; these changes have no essential effect on the results.
… form a synthesizing effect function on [a, b]. Here, ∧ is the min operation on real numbers.
② For any k ∈ (0, +∞), c ∈ [0, +∞), α_j ∈ (0, +∞), S(u, v) = k(u + c)∏_{j=1}^{n} v_j^{α_j} is a synthesizing effect function.
… model (2). Here, δ(t) satisfies δ(t) = 0 for each t < 0 and δ(t) = 1 for each t ≥ 0; and η(t) satisfies η(0) = −∞, η(1) = 1.
Remark 5. The above analysis indicates that model (4) includes the existing stochastic
programming models, and that it has better structural characteristics and stronger
interpretability; therefore model (4) provides a theoretical platform for solving
stochastic programming problems. For different problems, we can use different
(uniform) synthesizing effect functions to embody and describe different decision
consciousness.
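As an illustration of how a synthesizing effect function merges the objective value with constraint satisfaction, the following sketch implements the product-form function S(u, v) = k(u + c)∏_j v_j^{α_j} from item ② and averages it over Monte Carlo samples of ξ. The way the satisfaction degrees v_j are derived from g_j here is an assumption made only for this example, not the model's prescribed choice.

import math

def product_effect(u, v, k=1.0, c=0.0, alpha=None):
    """S(u, v) = k * (u + c) * prod_j v_j ** alpha_j  (product-form model)."""
    alpha = alpha or [1.0] * len(v)
    return k * (u + c) * math.prod(vj ** aj for vj, aj in zip(v, alpha))

def synthesized_value(x, f, constraints, sample_xi, n=5000):
    """Average synthesizing effect of decision x over random samples of xi."""
    total = 0.0
    for _ in range(n):
        xi = sample_xi()
        u = f(x, xi)
        # satisfaction degree of each constraint: 1 if satisfied, else decaying
        v = [1.0 if g(x, xi) <= 0 else math.exp(-g(x, xi)) for g in constraints]
        total += product_effect(u, v)
    return total / n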
5 Example Analysis
In this part, we further analyze the characteristics of the stochastic programming model
(4) by an example.
Example 1. Consider the following programming problem.
Here, ξ_1 is a random variable with the uniform distribution on the interval [2, 12]; ξ_2
is a random variable with the normal distribution N(5, 10); ξ_3 is a random variable
with the normal distribution N(8, 20); ξ_4 is a random variable with the exponential
distribution with parameter 0.5; and ξ_1, ξ_2, ξ_3, ξ_4 are independent of each other.
In order to facilitate the analysis of the characteristics of BSE-SGM, we solve problem
(6) only by using the expectation model (2) and the BSE-SGM, as follows.
I. By using model (2), problem (6) can be transformed into the general programming
problem (7).
Obviously, (7) and (8) are nonlinear programming problems and cannot easily be
solved analytically. A genetic algorithm is used here, with its parameters set as follows:
code mode: binary; mutation probability: 0.001; crossover probability: 1; population
size: 80; evolutionary generations: 100. The results of the problem under the different
methods are shown in Table 1 (here, S.E.V. denotes the Synthesizing Effect Value).
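A minimal binary-coded genetic algorithm with the parameter setting quoted above (mutation probability 0.001, crossover probability 1, population size 80, 100 generations) might look as follows. The bit-string length, the selection scheme and the toy fitness function are assumptions for illustration only, since problem (6) itself is not reproduced here.

import random

POP_SIZE, GENERATIONS = 80, 100
P_MUT, P_CROSS = 0.001, 1.0
N_BITS = 32                      # bits per individual (assumed)

def ga(fitness):
    pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        scored = sorted(pop, key=fitness, reverse=True)
        next_pop = scored[:2]                              # keep the two best (elitism)
        while len(next_pop) < POP_SIZE:
            p1, p2 = random.sample(scored[:POP_SIZE // 2], 2)   # truncation selection
            if random.random() < P_CROSS:                  # one-point crossover
                cut = random.randrange(1, N_BITS)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [bit ^ 1 if random.random() < P_MUT else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# example: maximize the number of ones in the bit string (toy fitness)
best = ga(fitness=lambda bits: sum(bits))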
The above analysis and computational results indicate that: 1) the variations of the
decision results for the same stochastic programming problem obtained by BSE-SGM
are smaller than those obtained by the expectation model, which shows that BSE-SGM
is much closer to the essence of the decision than the expectation model; 2) for different
synthesizing effect functions the decision results are different, and the difference can be
large. Therefore, BSE-SGM can effectively merge uncertain information into the
decision-making process.
Table 1. The results of solving the problem (6) under different methods
6 Conclusion
In this paper, for the solution of multi-attribute stochastic programming, by analyzing
the deficiencies of the existing methods, we propose the concept of the multi-attribute
synthesizing effect function, give an axiomatic system for it, and establish a general
solution model for stochastic programming problems; we further analyze the
characteristics of our model by an example. All the results indicate that the
multi-attribute synthesizing effect function is an effective tool for processing decision
preference: it can merge the processing of stochastic information into the quantitative
operation process, and it is theoretically systematic and operationally applicable.
Acknowledgements
This paper is supported by National Natural Science Foundation of China (70671034,
70871036, 70810107018) and the Natural Science Foundation of Hebei Province
(F2006000346) and the Ph. D. Foundation of Hebei Province (05547004D-2,
B2004509).
References
1. Charnes, A., Cooper, W.W.: Management Models and Industrial Applications of Linear
Programming. John Wiley & Sons Inc., New York (1961)
2. Liu, B.: Dependent-chance programming: A class of stochastic programming. Computers
& Mathematics with Applications 34(12), 89–104 (1997)
3. Song, J.K., Zhang, Z.X., Zhang, Y.: A Stochastic expected value model for measures pro-
gram of oil field. Journal of Shandong University of Technology (Sci & Tech) 20(3), 9–12
(2006)
4. Sun, Y.J., Kang, L.Y., Shi, W.X., et al.: Study on sizing of batteries for distributed power
system utilizing chance constrained programming. Journal of System Simulation 17(1),
41–44 (2005)
5. Zhao, Y., Deng, H.Y., Li, J.H., et al.: Chance-constrained programming based on optimal
allocation of harmonic filters on a distribution network. Proceedings of the CSEE 21(1),
12–17 (2001)
6. Iwamura, K., Liu, B.: A genetic algorithm for chance constrained programming. Journal of
Information & Optimization Sciences 17(2), 40–47 (1996)
7. Zhao, R., Iwamura, K., Liu, B.: Chance constrained integer programming and stochastic
simulation based genetic algorithms. Journal of Systems Science and Systems Engineer-
ing 7(1), 96–102 (1998)
8. Chen, J.M.: Solving order problems with genetic algorithms based on stochastic simula-
tion. J. Chongqing Technol. Business Univ (Nat. Sci. Ed.) 22(2), 179–181 (2005)
9. Liu, B.: Dependent-chance goal programming and its genetic algorithm based approach.
Mathematical and Computer Modelling 24(7), 43–52 (1996)
10. Holland, J.H.: Genetic algorithms and the optimal allocation of trials. SIAM J. of Com-
puting 2, 88–105 (1973)
Rate Control, Routing Algorithm and Scheduling for
Multicast with Network Coding in Ad Hoc Networks
Abstract. In this paper, we develop a distributed rate control and routing algorithm
for multicast sessions in ad hoc networks. We study the case with dynamic arrivals
and departures of the users. With random network coding, the algorithm can be
implemented in a distributed manner, working at the transport layer to adjust source
rates and at the network layer to carry out network coding. The scheduling element of
our algorithm is a dynamic scheduling policy. Numerical examples are provided to
complement our theoretical analysis. The model and solution algorithm can easily be
tuned to a specific networking technology.
1 Introduction
As a network layer problem, routing involves simply replicating and forwarding the
received packets by intermediate nodes in multi-hop networks. Network coding ex-
tends routing by allowing intermediate nodes to combine the information received from
multiple links in the subsequent transmissions and enables wired network connections
with rates that are higher than those achieved by routing only [1]. Subsequently, im-
portant progress has been made regarding the low-complexity construction of network
codes. Li et al. [2] showed that the maximum multicast capacity can be achieved by
performing linear network coding. Ho et al. [3], Jaggi et al. [4] and Sanders et al. [5]
showed that random linear network coding over a sufficiently large finite field can
(asymptotically) achieve the multicast capacity. Following these constructive theo-
retical results about network coding, Chou et al. [6] proposed a practical scheme for
performing network coding in real packet networks. Network coding has been extended
to wireless environments with distributed implementation ([6], [7], [8]).
In order to achieve high end-to-end throughput and efficient resource utilization, rate
control, routing and scheduling need to be jointly designed in ad hoc networks.
Cross-layer design is becoming increasingly important for improving the performance
of multihop wireless networks ([11], [12], [13], [14], [15]).
In this paper, we consider the problem of rate control and resource allocation
(through routing and scheduling) for multicast with network coding over a multi-hop
wireless ad hoc network.
* Corresponding author.
2 Related Work
There are several works on rate control of cross-layer design over wireless ad hoc
networks and works on rate control of multicast flows. In this section, we will only
briefly discuss several references that are directly relevant to this paper. To the best of
our knowledge, this paper is the first effort to study cross-layer design for network
coding based multicasting in ad hoc networks.
The work in [13], [15], [16], [17] provides a utility-based optimization framework for
studying rate control over ad hoc networks: the authors study joint rate control and
medium access control for ad hoc wireless networks, and formulate rate allocation as a
utility maximization problem with constraints that arise from contention for channel
access. This paper studies the case with dynamic arrivals and departures of the users,
and extends this work to include routing and to study cross-layer design for
network-coding-based multicast.
Among works that involve network coding, those most similar to ours are [18], [19],
[20]. What differentiates our work from others is the following: first, we extend this
model to ad hoc wireless networks; second, our rate control algorithm is a dual
subgradient algorithm whose dual variables admit a concrete and meaningful
interpretation as congestion prices; third, the session scheduling in our cross-layer
design is a dynamic scheduling over ad hoc networks.
The rest of the paper is structured as follows. The system model and problem
formulation are presented in Section 3. We present a fully distributed cross-layer rate
control algorithm in Section 4. In Section 5, we describe a scheduling policy over ad
hoc networks; the simulation results are presented in Section 6, and the conclusion is
given in Section 7.
… vector of global power assignments and let c = [c_l, l ∈ L] denote the vector of data
rates. We assume that c = u(P), i.e., the data rates are completely determined by the
global power assignment. The function u(·) is called the rate-power function of the
system. Note that the global power assignment P and the rate-power function u(·)
summarize the cross-layer control capability of the network at both the physical layer
and the MAC layer. Precisely, the global power assignment determines the
Signal-to-Interference Ratio (SIR) at each link. Let M denote the number of multicast
sessions. Each session has one source s^m ∈ N and a set D_m ⊂ N of destinations.
Network coding allows flows for different destinations of a multicast session to share
network capacity by being coded together: for a single multicast session m of rate x^m,
information must flow at rate x^m to each destination. Sessions of class m arrive
according to a Poisson process with rate λ_m, and each session brings with it a file for
transfer whose size is exponentially distributed with mean 1/μ_m. Thus, the traffic
intensity brought by sessions of class m is ρ_m = λ_m / μ_m. Consider a graph
G = (V, E) with capacity set c and a collection of multicast sessions
S_m = (s^m, d, x^m), m = 1, …, M, d ∈ D_m, as the end-to-end traffic demands.
Let n_m(t) (m = 1, 2, …, M) denote the number of multicast sessions present in the
system. In the rate assignment model that follows, the evolution of n_m(t) is governed
by a Markov process whose transition rates are:
n_m(t) → n_m(t) + 1, with rate λ_m, and
n_m(t) → n_m(t) − 1, with rate μ_m x^m(t) n_m(t), if n_m(t) > 0.
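The session-level dynamics above form a birth-death process; a small simulation sketch is given below, with illustrative values for λ_m, μ_m and a fixed source rate x^m that are not taken from the paper.

import random

def simulate_sessions(lam=0.5, mu=1.0, x=1.0, horizon=1000.0):
    """Simulate n_m(t): arrivals at rate lam, departures at rate mu * x * n_m(t)."""
    t, n, history = 0.0, 0, []
    while t < horizon:
        arrival_rate = lam
        departure_rate = mu * x * n
        total = arrival_rate + departure_rate
        t += random.expovariate(total)            # time until the next event
        if random.random() < arrival_rate / total:
            n += 1                                # a new session arrives
        else:
            n -= 1                                # a session finishes its file
        history.append((t, n))
    return history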
Let p_l^m(t) denote the congestion cost of session m at link l in the network at time t. As
in [17], we say … as H → ∞. This means that the fraction of time that the amount of
“unfinished work” in the system exceeds a certain level H can be made arbitrarily small
as H → ∞. In other words, the number of sessions at each source node and the queues at
each link must be finite. The capacity region of the network is defined to be the set of
user arrival rate vectors for which the network can be stabilized by some scheduling
policy. The capacity region of a constrained queuing system, such as a wireless
network, is well characterized in [27]. Let A = [A_l^m] denote the multicast matrix, i.e.,
A_l^m = 1 if l ∈ L(m) and A_l^m = 0 otherwise. For our model, the capacity region is
given by the set
\[
\Lambda = \left\{ \lambda : \left[ \sum_{m=1}^{M} \frac{A_l^m \lambda_m}{\mu_m c_l} \right]_{l \in L} \in Co(S) \right\}
\]
where Co(S) represents the convex hull of all link schedules S.
With coding, the actual physical flow on each link need only be the maximum of the
individual destinations' flows [1]. For the case of multiple sessions sharing a network,
achieving optimal throughput requires in some cases coding across sessions. However,
designing such codes is a complex and largely open problem. In this paper, we use an
intra-session coding approach to handle multiple sessions, similar to [26]. In this case,
the constraints on the set of feasible flow vectors can be expressed as
\[
\sum_{j:(ij) \in L} f_{md}(ij) - \sum_{j:(ji) \in L} f_{md}(ji) =
\begin{cases}
x^m & \text{if } i = s^m; \\
-x^m & \text{if } i = d; \\
0 & \text{otherwise.}
\end{cases}
\tag{1}
\]
\[
0 \le f_{md}(ij) \le g_{md}(ij), \quad \forall d \in D_m,\ \forall ij \in L \tag{2}
\]
\[
\sum_{m=1}^{M} g_{md}(ij) \le c(ij), \quad \forall d \in D_m,\ \forall ij \in L \tag{3}
\]
In practice, the network codes can be designed using the approach of distributed random
linear network coding [21]. If (1)-(2) hold, each sink receives with high probability a set
of packets with linearly independent coefficient vectors, allowing it to decode.
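The following sketch illustrates random linear coding over a small prime field: every coded packet carries its coefficient vector, and a sink holding packets with linearly independent coefficients decodes by Gaussian elimination. The field size (the prime 257) and the packet contents are assumptions made only for this example.

import random

P = 257                          # prime field size (assumed for illustration)

def encode(packets):
    """Produce one coded packet: random coefficients plus the coded payload."""
    coeffs = [random.randrange(P) for _ in packets]
    payload = [sum(c * pkt[i] for c, pkt in zip(coeffs, packets)) % P
               for i in range(len(packets[0]))]
    return coeffs, payload

def decode(coded, k):
    """Recover k source packets from k coded packets (with linearly
    independent coefficients) by Gaussian elimination mod P."""
    rows = [list(c) + list(p) for c, p in coded]    # augmented matrix [C | C*S]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] % P != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)         # modular inverse (P prime)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[k:] for row in rows]                # reduced matrix is [I | S]

source = [[10, 20, 30], [40, 50, 60]]               # two source packets
coded = [encode(source) for _ in range(2)]          # two random combinations
recovered = decode(coded, k=2)                      # equals source w.h.p.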
In this paper we consider the problem of rate control (congestion control) over a
multihop wireless ad hoc network. We take into account the interference among signals
transmitted simultaneously by multiple nodes, and the fact that a single node's
transmission can be received by multiple nodes, referred to as the wireless multicast
advantage. We model the contention relation between subflows as the interference set
of a link, i.e., the links in the interference set contend with each other. Let
S(l) = {m : l ∈ L(m)}, ∀l ∈ L, be the set of multicast sessions that use link l. Note that
ij ∈ L(m) if and only if m ∈ S(l). We denote by I_S(l) the interference set of link l = ij,
including the link l itself. This set indicates which groups of subflows interfere with the
subflows that go through link l. Because the links included in the interference set I_S(l)
share the same channel resource c(l) of link l, only one of the subflows going through a
link k (k ∈ I_S(l)) may transmit at any given time. The accurate interference set of a
link can be constructed based on the SIR model proposed in [22].
\[
\sum_{k \in I_S(l)} \sum_{m \in S(k)} y_k^m \le c_l, \quad \forall l \in L \tag{4}
\]
The physical flow rate y_l^m of each multicast session m through link l is
y_l^m = max_r { A_{lr}^m x_r^m }.
Note that network coding comes into action through constraint (4). The utility function
is concave, and it is easy to show that the network coding region constraint (a linear
constraint) is a convex set. Therefore, problem (5) is a convex optimization problem.
For a network with multiple multicast sessions, the maximum utility and its
corresponding optimal routing strategy can be computed efficiently and in a distributed
fashion.
At time t, given the congestion prices p_l^m(t), the source s^m adjusts its sending rate
according to the aggregate congestion price over the multicast tree T_r. This rate control
mechanism has the desired price structure and is an end-to-end congestion control
mechanism.
The solution to the cross-layer rate control problem is of the following form, similar
to [28].
\[
x^m(t) = x^m(kT) = \min\!\left( \frac{w_m}{\sum_{l=1}^{L} A_l^m\, p_l^m(kT)},\; V_m \right) \tag{6}
\]
for kT ≤ t < (k+1)T, where V_m is the maximum data rate for users of class m. The
congestion costs are updated as
\[
p_l^m((k+1)T) = \left[ p_l^m(kT) + \alpha_l \left( \sum_{m=1}^{M} A_l^m \int_{kT}^{(k+1)T} n_m(t)\, x^m(kT)\, dt \;-\; c_l(t)\, T \right) \right]^{+} \tag{7}
\]
where α_l is a positive scalar stepsize and [·]^+ denotes the projection onto [0, ∞).
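One round of the rate update (6) and price update (7) can be sketched in discrete time as follows; the network description (multicast matrix A, weights w_m, capacities c_l) is supplied by the caller, and replacing the integral in (7) by n_m(kT)·x^m(kT)·T is a simplification made only for this illustration.

def rate_update(w, A, p, V):
    """Source rates x^m(kT) = min(w_m / sum_l A[l][m] * p[l][m], V_m), cf. (6)."""
    M, L = len(w), len(A)
    x = []
    for m in range(M):
        price = sum(A[l][m] * p[l][m] for l in range(L))
        x.append(min(w[m] / price, V[m]) if price > 0 else V[m])
    return x

def price_update(p, A, n, x, c, T, alpha):
    """Congestion prices p_l^m((k+1)T), cf. (7), with the integral
    approximated by n_m(kT) * x^m(kT) * T."""
    M, L = len(x), len(A)
    new_p = [row[:] for row in p]
    for l in range(L):
        load = sum(A[l][m] * n[m] * x[m] * T for m in range(M))
        for m in range(M):
            new_p[l][m] = max(0.0, p[l][m] + alpha * (load - c[l] * T))
    return new_p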
The scheduling problem (8) is difficult for an ad hoc network. Our algorithm is similar
to a distributed variant of the sequential greedy algorithm presented in [24].
• Over link l, send an amount of coded packets for the session
m_l(t) = arg max_m p_l^m(t), at rate φ_l(t).
5 Numerical Examples
In this section, we provide numerical examples to complement the analysis in the
previous sections. We consider a simple ad hoc network with two multicast sessions,
shown in Fig. 1. The network is assumed to be undirected and each link has equal
capacities in both directions. Session one has source node s1 and destinations x1 and x2,
and session two has source node s2 and destinations d1 and d2, both with the same
utility U_m(x_m) = log(x_m). We have chosen a configuration in which … have 2 units
of capacity, links (s2, y), (y, x1), (y, d2) and (y, x2) have 3 units of capacity, and all
other links have 1 unit of capacity when active.
Fig. 2 and Fig. 3 show the evolution of the source rate and the congestion price of each
session with stepsize α_l = 0.01 in (7). We see that the source rates converge quickly to
a neighborhood of the corresponding stable values.
Fig. 2. Average source rates of session 1 and session 2 versus the number of iterations
Fig. 3. Average congestion prices of session 1 and session 2 versus the number of iterations
The simulation results also show that coding occurs over the multicast trees in Fig. 4
and Fig. 5: 2 units of traffic of session one are coded over link (s2, y) and 2 units of
traffic of session two are coded over link (y, x1).
Note that the routing depends not only on the network topology, which determines the
interference among links, but also on the link capacity configuration; this is why the
link (s2, x2) is not used in Fig. 3. The distributed dynamic scheduling always picks the
link with the locally heaviest weight; this feature makes the link (s2, y) almost always
active, because it frequently has a chance to be the locally heaviest link.
6 Conclusions
We have presented a model for the joint design of a distributed rate control algorithm,
routing and scheduling for multicast sessions with network coding in ad hoc networks.
With random network coding, the algorithm can be implemented in a distributed
manner, working at the transport layer to adjust source rates and at the network layer to
carry out network coding. We study the case with dynamic arrivals and departures of
the users. The scheduling element of our algorithm is a dynamic scheduling policy.
Numerical examples are provided to complement our theoretical analysis. The model
and solution algorithm can easily be tuned to a specific networking technology. We will
further study the practical implementation of our algorithm and extend the results to
networks with more general interference models. Solving this problem will further
facilitate the practical deployment of network coding in real networks.
Acknowledgement
This work is supported in part by the National Science Foundation of P.R. China under
Grant no.60773074 and by the National High Technology Research and Development
Program of P. R. China under Agreement no.2009AA01Z209.
References
[1] Ahlswede, R., Cai, N., Li, S.Y.R., Yeung, R.W.: Network information flow. IEEE Trans.
Inform. Theory 46, 1204–1216 (2000)
[2] Li, S.Y.R., Yeung, R.W., Cai, N.: Linear network coding. IEEE Trans. Inform. Theory 49,
371–381 (2003)
[3] Ho, T., Koetter, R., Karger, M.D.R., Effros, M.: The benefits of coding over routing in a
randomized setting. In: Proc. Int’l Symp. Information Theory, Yokohama, Japan. IEEE,
Los Alamitos (2003)
[4] Jaggi, S., Chou, P.A., Jain, K.: Low complexity optimal algebraic multicast codes. In: Proc.
Int’l Symp. Information Theory, Yokohama, Japan. IEEE, Los Alamitos (2003)
[5] Sanders, P., Egner, S., Tolhuizen, L.: Polynomial time algorithms for network information
flow. In: Symposium on Parallel Algorithms and Architectures (SPAA), San Diego, CA,
pp. 286–294. ACM, New York (2003)
[6] Lun, D.S., Ratnakar, N., Médard, M., Koetter, R., Karger, D.R.: Minimum-cost multicast
over coded packet networks. IEEE Trans. Inform. Theory (2006)
[7] Sagduyu, Y.E., Ephremides, A.: Joint scheduling and wireless network coding. In: Proc.
WINMEE, RAWNET and NETCOD 2005 Workshops (2005)
[8] Wu, Y., Chou, P.A., Kung, S.-Y.: Minimum-energy multicast in mobile ad hoc networks
using network coding. IEEE Trans. Commun., 1906–1918 (2005)
[9] Deb, S., Srikant, R.: Congestion control for fair resource allocation in networks with multi-
cast flows. IEEE Trans. on Networking, 274–285 (2004)
[10] Kelly, F.P., Maulloo, A., Tan, D.: Rate control in communication networks: Shadow
prices, proportional fairness and stability. Journal of the Operational Research Society,
37–252 (1998)
[11] Huang, X., Bensaou, B.: On Max-min Fairness and Scheduling in Wireless Ad-Hoc Net-
works: Analytical Framework and Implementation. In: Proceedings of IEEE/ACM Mo-
biHoc, Long Beach, CA, October 2001, pp. 221–231 (2001)
[12] Sarkar, S., Tassiulas, L.: End-to-end Bandwidth Guarantees Through Fair Local Spectrum
Share in Wireless Ad-hoc Networks. In: Proceedings of the IEEE Conference on Decision
and Control, Hawaii (2003)
[13] Yi, Y., Shakkottai, S.: Hop-by-hop Congestion Control over a Wireless Multi-hop Net-
work. In: Proceedings of IEEE INFOCOM, Hong Kong (2004)
[14] Qiu, Y., Marbach, P.: Bandwith Allocation in Ad-Hoc Networks: A Price-Based Ap-
proach. In: Proceedings of IEEE INFOCOM, San Francisco, CA (2003)
[15] Xue, Y., Li, B., Nahrstedt, K.: Price-based Resource Allocation in Wireless Ad hoc Net-
works. In: Jeffay, K., Stoica, I., Wehrle, K. (eds.) IWQoS 2003. LNCS, vol. 2707, pp.
79–96. Springer, Heidelberg (2003)
[16] Lin, X., Shroff, N.B.: Joint Rate Control and Scheduling in Multihop Wireless networks.
Technical Report, Purdue University (2004),
http://min.ecn.purdue.edu/_linx/papers.html
[17] Lin, X., Shroff, N.: The impact of imperfect scheduling on cross-layer rate control in mul-
tihop wireless networks. In: Proc. IEEE Infocom (2005)
[18] Wu, Y., Chiang, M., Kung, S.Y.: Distributed utility maximization for network coding
based multicasting: A critical cut approach. In: Proc. IEEE NetCod (2006)
[19] Wu, Y., Kung, S.Y.: Distributed utility maximization for network coding based multi-
casting: A shortest path approach. IEEE Journal on Selected Areas in Communications
(2006)
[20] Ho, T., Viswanathan, H.: Dynamic algorithms for multicast with intrasession network
coding. In: Proc. Allerton Conference on Communication, Control and Computing (2005)
[21] Chou, P.A., Wu, Y., Jain, K.: Practical network coding. In: Proc. Allerton Conference on
Communication. Control and Computing (2003)
[22] Gupta, P., Kumar, P.R.: The capacity of wireless network. IEEE Trans. on Information
Theory 46(2), 388–404 (2000)
[23] Shor, N.Z.: Minimization Methods for Non-Differentiable Functions. Springer, Heidel-
berg (1985)
[24] Preis, R.: Linear time 1/2-approximation algorithm for maximum weighted matching in
general graphs. In: Meinel, C., Tison, S. (eds.) STACS 1999. LNCS, vol. 1563, p. 259.
Springer, Heidelberg (1999)
[25] Jain, K., Padhye, J., Padmanabhan, V.N., Qiu, L.: Impact of interference on multi-hop
wireless network performance. In: Proc. ACM Mobicom (2003)
[26] Ho, T., Viswanathan, H.: Dynamic algorithms for multicast with intra-session network
coding. In: Proc. 43rd Annual Allerton Conference on Communication. Control and
Computing (2005)
[27] Tassiulas, L., Ephremides, A.: Stability properties of constrained queuing systems and
scheduling policies for maximum throughput in multihop radio networks. IEEE Trans. on
Automatic Control 37(12), 1936–1948 (1992)
[28] Lin, X., Shroff, N.B.: On the stability region of congestion control. In: Proceedings of the
42nd Annual Allerton Conference on Communication. Control and Computing (2004)
Design and Experimental Study on Spinning Solid
Rocket Motor
State Key Laboratory of Explosion Science and Technology, Beijing Institute of Technology,
Beijing, China
xh930@163.com, {jiangchuwh,wangskyshark}@bit.edu.cn
Abstract. A study on a spinning solid rocket motor (SRM) used as the power plant
of the twice-throwing structure of an aerial submunition is introduced. This kind of
SRM, with a tangential multi-nozzle structure, consists of a combustion chamber,
propellant charge, 4 tangential nozzles, an ignition device, etc. Grain design,
structure design and prediction of interior ballistic performance are described, and
the problems that mainly need to be considered in the design are analyzed
comprehensively. Finally, in order to investigate the working performance of the
SRM and to measure its pressure-time curve and rotational speed, a static test and a
dynamic test were conducted, and the calculated values and experimental data were
compared and analyzed. The results indicate that the designed motor operates
normally and that the stable interior ballistic performance meets the demands. The
experimental results provide guidance for the pre-research design of SRMs.
1 Introduction
Solid rocket motor (SRM) which has characteristics of small size, reliable operation,
simple structure, low cost and long-term preservation, is widely used in all kinds of
small-sized, short range military rocket and missile power plant. In recent years, a
series of studies on SRM have been made, and these studies are important for the
construction of structure design, internal flow field simulation and improvement of
interior ballistic performance [1], [2], [3], [4].
In this paper, the spinning SRM is used as the power plant of the twice-throwing
structure of an aerial submunition. Its structural characteristic is 4 tangential nozzles
distributed uniformly on the circumference of the motor case; the thrust generated by
each nozzle acts along the tangential direction of the rocket body, so the motor rotates.
Through the centrifugal force generated by the rotation, bullets are dispersed over a
large area. Meanwhile, rotation ensures the flight stability of the submunition.
Firstly, according to technical parameters the general scheme of SRM was deter-
mined, and three-dimensional overall structure design was achieved by using Inventor
software. Prediction of interior ballistic performance was carried out after the deter-
mination of overall structure design and grain design. Finally, static test and dynamic
test were conducted, and the test results which were compared and analyzed under
different conditions, providing reference for similar SRM design.
The combustion chamber is one of the most important components of the SRM,
providing the place for storage and combustion of the propellant charge. Furthermore,
the combustion chamber bears high temperature and high pressure under the action of
various loads. At the same time, holes are opened in the combustion chamber shell to
install the nozzles. In order to ensure that the combustion chamber still works reliably
under these severe working conditions, it should have sufficient strength and rigidity,
reliable connection and sealing, light structural weight, etc. Based on the above
considerations, the shell material is 35CrMnSi, and the connections of shell and nozzle,
and of shell and closure head, adopt screw connections.
By simplifying the working process of the SRM, the mass conservation equation and
the state equation of the combustion chamber were established. The simplifications are
as follows: (1) the combustion chamber gas is an ideal gas and the heat loss is constant
during the combustion process; (2) the propellant charge burns completely and the
burning temperature is unchanged; (3) the charge is ignited at the ignition pressure;
(4) the effects of erosive burning and of changes of the burning surface caused by
scouring are not considered, and the charge obeys the geometric burning law during
combustion.
Strictly speaking, the interior ballistic calculation involves the variation with time of the
flow parameters distributed along the length of the combustion chamber. One-
dimensional unsteady flow equations would then be needed to determine the functional
relation between time and flow parameters such as gas pressure and temperature.
Generally speaking, the pressure rising stage is an unsteady state and reflects the
motor's ignition process. The pressure rapidly increases close to the equilibrium state in
a very short time. The equation of this stage is as follows [5]:

\[
p_c = p_{eq} \left\{ 1 - \left[ 1 - \left( \frac{p_{ig}}{p_{eq}} \right)^{1-n} \right] e^{-\frac{(1-n)\,\varphi_2 A_t \Gamma \sqrt{R\,T_{eq}}}{V_c}\, t} \right\}^{\frac{1}{1-n}} \tag{1}
\]

where p_c is the combustion chamber pressure, Pa; p_eq is the equilibrium (average)
pressure, Pa; p_ig is the ignition pressure, Pa; e is the grain web thickness, mm; n is the
pressure index; φ_2 is a correction coefficient; Γ is the specific heat ratio function; t is
the working time, s; V_c is the initial chamber volume, m³; R is the molar gas constant,
J·mol⁻¹·K⁻¹; T_eq is the equilibrium temperature, K; and A_t is the nozzle throat
area, m².
Because the designed tubular grain with coated ends burns on the internal and external
surfaces simultaneously, the burning surface is neutral. The pressure balance stage can
be calculated as:

\[
p_c = p_{eq}^{\,(1-n)} - \frac{V_c\, a}{x\, \varphi_2\, \Gamma^2 A_t\, k\, C^{*}}\, \frac{dp_c}{de} \tag{2}
\]
\[
p_c = p_{eq} \left[ \frac{2 V_{cf}}{\,2 V_{cf} + (k-1) \sqrt{R\,T_{eq}}\; \varphi_2 A_t \Gamma\, t\,} \right]^{\frac{2k}{k-1}} \tag{3}
\]
\[
\Gamma = \sqrt{k} \left( \frac{2}{k+1} \right)^{\frac{k+1}{2(k-1)}} \tag{4}
\]
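Under the reconstructed form of Eq. (1), the pressure rise can be evaluated numerically as in the sketch below; all parameter values are placeholders chosen to produce a plausible curve and are not the motor's actual design data.

import math

def chamber_pressure(t, p_eq, p_ig, n, phi2, At, Gamma, R, Teq, Vc):
    """Pressure rising stage, Eq. (1): chamber pressure p_c at time t."""
    decay = math.exp(-(1.0 - n) * phi2 * At * Gamma * math.sqrt(R * Teq) / Vc * t)
    inner = 1.0 - (1.0 - (p_ig / p_eq) ** (1.0 - n)) * decay
    return p_eq * inner ** (1.0 / (1.0 - n))

# illustrative (non-design) parameters
params = dict(p_eq=6.5e6, p_ig=2.0e6, n=0.35, phi2=0.98,
              At=3.0e-4, Gamma=0.66, R=320.0, Teq=2800.0, Vc=2.0e-3)
curve = [(t, chamber_pressure(t, **params))
         for t in [i * 0.002 for i in range(26)]]   # 0 to 0.05 s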
[Figure: calculated pressure-time curve of the combustion chamber — Pressure (MPa) versus Time (s)]
4 Experimental Studies
In order to study the working performance of the spinning SRM and to measure its
pressure-time curve and rotational speed, a static test and a dynamic test were
conducted.
In the static test, the assembled spinning SRM is fixed with bolts on a test stand. Using
a pressure sensor, the pressure during the combustion process was converted into
electrical signals, and the combustion chamber pressure and the working time were
measured by a data acquisition system.
The test motor consists of a combustion chamber, propellant charge, 4 tangential
nozzles, charge baffle plates, an igniter cartridge, etc. The combustion chamber shell
material is 35CrMnSi, and the connections of shell and nozzle, and of shell and closure
head, adopt screw connections. The charge baffle plate is made of low-carbon steel. As
the motor's working time is short, no additional thermal protection measures are
needed. The motor uses electric ignition; the igniter charge bag is placed in front of the
grain, and the igniting charge is 2# small powder grain. The charge is a double-base
propellant grain; tubular grains with seven columns are used.
The grains, whose two ends are coated with 2 millimeter thick nitrocellulose lacquer,
burn on the internal and external surfaces simultaneously. The ignition charge masses of
the first and second tests are 15 grams and 20 grams, respectively.
A pressure sensor (model ZQ-Y1) was used, with the following main technical
specifications — range: 20 MPa; accuracy: 0.3% FS; power: 5-15 VDC; output:
1.3672 mV/V.
[Figure: measured pressure-time curve from the first static test — Pressure (MPa) versus Time (s)]
[Figure: measured pressure-time curve from the second static test — Pressure (MPa) versus Time (s)]
The results also indicate that the designed motor works well: the structure is reliable, the
threaded connections show no loosening, the combustion chamber and nozzles show no
deformation, and the technical requirements are met.
Table 1 presents the first and second experimental data, and Table 2 presents a
comparison of calculated values and experimental data. As can be seen in Table 2, the
designed average thrust Fn and average pressure Pn are lower than the experimental
data, and the designed burning time Tb is longer than the experimental data. The main
reason for this discrepancy is that the actual burning rate is higher than the theoretical
value, which shortens the actual burning time and raises the actual chamber pressure
above the predicted values.
In order to check the dynamic performance of the spinning SRM, a dynamic test was
conducted. As shown in Fig. 7, the motor was suspended with bearings installed on the
top. Using a high-speed camera, the dynamic working process of the motor was
recorded and the speed was measured. According to the high-speed photographic
records, the speed reaches 15000 r/min without load and meets the design requirements.
The results indicate that the spinning SRM works well, with no abnormal sounds.
However, high-speed rotation has a great influence on the working performance of the
spinning SRM. Compared to the static situation, the ablation of the nozzles is more
serious under rotating conditions. Meanwhile, the centrifugal force generated by the
rotation causes metal oxides to be deposited on the motor case, which may lead to
thermal damage of the head structure.
5 Conclusions
The design and experimental methods of a spinning SRM were preliminarily studied in
this paper. By comparing calculated values with experimental data, the following
results were obtained:
(1) The ignition charge mass has a significant effect on the pressure-time curves and
internal ballistic performance of the spinning SRM, and in order to obtain satisfactory
internal ballistic performance the ignition charge mass must be strictly controlled.
(2) By comparing Fig. 3 with Fig. 6, and from Table 2, we can see that the calculated
values accord with the experimental data, and that the calculation method presented in
the paper is simple and practicable.
(3) The experimental results indicate that the designed motor operates normally, and
that the stable interior ballistic performance meets the demands.
(4) High-speed rotation has a great influence on the working performance of the
spinning SRM.
References
1. Kamm, Y., Gany, A.: Solid Rocket Motor Optimization. In: 44th AIAA/ASME/SAE/ASEE Joint Propulsion Conference and Exhibit, AIAA-2008-4695, Hartford, CT (2008)
2. French, J., Flandro, G.: Linked Solid Rocket Motor Combustion Stability and Internal Ballistic Analysis. In: 41st AIAA/ASME/SAE/ASEE Joint Propulsion Conference and Exhibit, AIAA-2005-3998, Tucson, Arizona (2005)
3. Shimada, T., Hanzawa, M.: Stability Analysis of Solid Rocket Motor Combustion by Com-
putational Fluid Dynamics. AIAA Journal 46(4), 947–957 (2008)
4. Willcox, M.A., Brewster, Q., Tang, K.C.: Solid Rocket Motor Internal Ballistics Simulation
Using Three-Dimensional Grain Burnback. Journal of Propulsion and Power 23(3), 575–584
(2007)
5. Zhang, P., Zhang, W.S., Gui, Y.: Principle of Solid Rocket Motor. Beijing Institute of Sci-
ence and Technology Press, Beijing (1992) (in Chinese)
An Energy-Based Method for Computing Radial Stiffness
of Single Archimedes Spiral Plane Supporting Spring
Enlai Zheng1, Fang Jia1, Changhui Lu1, He Chen1, and Xin Ji2
1 School of Mechanical Engineering, Southeast University, Nanjing, 211189, China
2 Nanjing Jienuo Environment Technology Co., Ltd., Nanjing, 210014, China
xx_xx1111@163.com, fangjia1988@yahoo.com, magiclu2007@126.com,
sjzyzch2222@yahoo.com.cn, njjn@Njjinuo.com
1 Introduction
Precision instruments and systems for aeronautics and astronautics usually endure strong impact or vibration in hostile environments. In order to obtain longer life and higher reliability, it is essential to introduce effective measures for vibration reduction and isolation. Owing to factors such as constraints on assembly space, common isolation components often fail to fully meet the requirements of vibration isolation. Therefore, with its smaller volume, lighter weight, lower stiffness, greater deformation energy and better static and dynamic characteristics, the plane supporting spring has recently begun to attract attention for space applications.
The performance of a plane supporting spring is mainly determined by its radial stiffness, and since the requirements on radial stiffness vary from one vibration environment to another, the study of radial stiffness becomes an important issue. Currently, the general way of obtaining the radial stiffness of a plane supporting spring is to calculate it by finite element simulation or to measure it by experiment. In our previous studies, Jia Fang et al. [1] gave the relationship between turn number and radial stiffness by FEA;
Chen Nan et al. [2][3] carried out FEA and experimental research on the stiffness of the spring; and Gao Weili et al. [4] studied, with ANSYS, the stiffness and stress performance of circle involutes applied to a linear compressor. Other researchers, such as A. S. Gaunekar et al. [5], have also studied a kind of structural supporting spring by FEA. Nonetheless, finite element analysis is constrained by the need for hardware and software support, and the experimental method tends to suffer from long cycles, high cost and complicated procedures.
As far as we know, no energy-based approach to calculating the radial stiffness of the plane supporting spring had been reported before the present study, in which the X-direction radial stiffness formula of the single-spiral plane supporting spring is derived by an energy method and its validity is verified by finite element simulation.
No.  Parameter                                    Symbol
1    Thickness                                    h
2    Outer support outer diameter                 Doo
3    Outer support inner diameter                 Doi
4    Inner support outer diameter                 Dio
5    Inner support inner diameter                 Dii
6    Length of inner connecting line              Li
7    Length of outer connecting line              Lo
8    Width of flexible spiral spring line         b
9    Number of flexible spiral spring lines       n1
10   Turn number of flexible spiral spring line   n2
The flexible spiral spring line is composed of two Archimedes spirals with different starting points. The width of the flexible spiral spring line can be adjusted by regulating the starting points.
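A small geometric note (our notation, not the paper's): if the two bounding spirals share the pitch t but their starting angles differ by Δθ, say ρ1(θ) = Rio + tθ and ρ2(θ) = Rio + t(θ + Δθ), then along any radial direction the line width is constant, b = ρ2(θ) - ρ1(θ) = tΔθ, so shifting the starting points directly sets the width.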
The inner and outer connecting lines are designed to prevent stress concentration. The outer support is used for locating the spring, and the electronic instrument is fastened to the inner support. The key parameters for radial stiffness include the thickness, the outer support inner radius, the inner support outer radius, the width of the flexible spiral spring line, and the number of flexible spiral spring lines. The influence of each parameter on the radial stiffness is analyzed by the control variate method, i.e., one parameter is varied while the others are kept fixed.
dW = F_N(\theta)\,d(\Delta l)^{*} + M(\theta)\,d\theta^{*} + F_S(\theta)\,d\lambda^{*}    (3)
where F_N(\theta), M(\theta) and F_S(\theta) stand for the axial force, bending moment and shearing force under the external force F, respectively.
The total virtual work, obtained by integrating Eq. (3), is given by the following equation.
Taking the displacement under the external force F in the X direction as the virtual displacement, and applying a unit force in the X direction of the single-spiral plane supporting spring, Eq. (6) can be reformulated as Eq. (7):
\Delta = \int \bar{F}_N(\theta)\,d(\Delta l) + \int \bar{M}(\theta)\,d\varphi + \int \bar{F}_S(\theta)\,d\lambda    (7)
where \bar{F}_N(\theta), \bar{M}(\theta) and \bar{F}_S(\theta) stand for the axial force, bending moment and shearing force under the unit force, respectively, and d(\Delta l), d\varphi and d\lambda for the relative axial displacement, relative rotation and relative dislocation, of which
M(\theta) = F\sin\theta\,\rho , \qquad \bar{M}(\theta) = \sin\theta\,\rho , \qquad d\varphi = \frac{M(\theta)}{EI}\,\rho\,d\theta .
Since the spring line is an Archimedes spiral, we have
\rho = R_{io} + t\theta    (8)
As the plane supporting spring mainly undergoes bending deformation, the strain energy generated by axial tension/compression and by shear can be neglected compared with that generated by bending. Therefore,
\int \bar{F}_N(\theta)\,d(\Delta l) = 0 , \qquad \int \bar{F}_S(\theta)\,d\lambda = 0 .
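For completeness, the intermediate step connecting Eq. (7) with Eq. (9) can be reconstructed as follows (our sketch of the omitted algebra, using the definitions above):
\Delta = \int \bar{M}(\theta)\,d\varphi
       = \int_{\theta_{Li}}^{2\pi-\theta_{Lo}} \frac{M(\theta)\,\bar{M}(\theta)}{EI}\,\rho\,d\theta
       = \frac{F}{EI}\int_{\theta_{Li}}^{2\pi-\theta_{Lo}} \sin^{2}\theta\,(R_{io}+t\theta)^{3}\,d\theta ,
and writing \sin^{2}\theta = (1-\cos 2\theta)/2 and integrating by parts yields Eq. (9).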
\Delta = \frac{F}{2EI}\left[\frac{(R_{io}+t\theta)^{4}}{4t} - \frac{1}{2}\sin(2\theta)(R_{io}+t\theta)^{3} - \frac{3t}{4}\cos(2\theta)(R_{io}+t\theta)^{2} + \frac{3t^{2}}{4}\sin(2\theta)(R_{io}+t\theta) + \frac{3t^{3}}{8}\cos(2\theta)\right]_{\theta_{Li}}^{2\pi-\theta_{Lo}}    (9)
where
t = (R_{oi} - R_{io} - b)/(2\pi) , \qquad \theta_{Li} = L_i/t , \qquad \theta_{Lo} = b/t .
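As a numerical illustration, the bending integral behind Eq. (9) can be evaluated directly to obtain the radial stiffness K = F/Δ of a single spiral line. The sketch below is ours; the elastic modulus and all geometric values are assumed purely for illustration and are not the parameter set of the simulation reported later, and the cross-section is taken to bend in-plane with I = h*b^3/12.

import numpy as np
from scipy.integrate import quad

E = 2.06e5                        # Young's modulus of spring steel, N/mm^2 (assumed)
h, b = 1.0, 1.5                   # thickness and line width, mm (assumed)
R_io, R_oi, L_i = 8.0, 30.0, 3.0  # inner support outer radius, outer support inner radius,
                                  # inner connecting line length, mm (all assumed)

I = h * b**3 / 12.0               # second moment of area for in-plane bending (assumed orientation)
t = (R_oi - R_io - b) / (2.0 * np.pi)
theta_Li, theta_Lo = L_i / t, b / t

# Delta = (F/EI) * integral of sin(theta)^2 * (R_io + t*theta)^3, whose closed form is Eq. (9)
F = 1.0                           # unit radial force, N
delta, _ = quad(lambda th: np.sin(th)**2 * (R_io + t*th)**3, theta_Li, 2.0*np.pi - theta_Lo)
delta *= F / (E * I)

print(f"radial displacement = {delta:.4f} mm, radial stiffness K = {F/delta:.4f} N/mm")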
No.  Parameter  Symbol  Value (mm)
1    Thickness  h       1
The finite element model and the displacement contour are shown in Figs. 4 and 5.
It can be seen from Fig. 5 that, under the action of a unit force in the X direction, the displacement obtained from the finite element simulation is 1.4971 mm, while the theoretical displacement given by the radial stiffness formula is 1.61334 mm.
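Taking the applied unit force as 1 N (an assumption, since its magnitude is not restated here), these displacements correspond to radial stiffnesses of about 1/1.4971 ≈ 0.668 N/mm for the simulation and 1/1.61334 ≈ 0.620 N/mm for the theory, which match the values quoted below at one end of the curves in Fig. 7.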
Fig. 6. The radial stiffness of theoretical calculation and finite element simulation varying with
changing width of flexible spiral spring line
It can be seen from the graph that, under the action of a radial unit force, the stiffness goes up with increasing width of the flexible spiral spring line. The theoretical radial stiffness increases from 0.3125 N/mm to 0.49588 N/mm, and the radial stiffness from the finite element simulation from 0.3368 N/mm to 0.48652 N/mm. The curves are approximately cubic. The results of the theoretical calculation match those of the finite element simulation.
2) Changing the Inner Support Outer Radius from 5 mm to 11 mm in Steps of 1 mm
The radial stiffness curves of the theoretical calculation and the finite element simulation for varying inner support outer radius of the plane supporting spring are shown in Fig. 7.
As revealed in Fig. 7, the stiffness declines with increasing inner support outer radius under the action of the X-direction force. The theoretical radial stiffness decreases from 0.61985 N/mm to 0.3269 N/mm, and the radial stiffness from the finite element simulation from 0.6680 N/mm to 0.4178 N/mm. The deviation between the theoretical and simulated results widens as the inner support outer radius grows, because Eq. (10) ignores the strain energy of the inner support in the calculation.
3) Changing the Thickness from 0.2 mm to 0.8 mm in Steps of 0.1 mm
The thickness-radial stiffness curves of the theoretical calculation and the finite element simulation are shown in Fig. 8.
The graph indicates that the radial stiffness increases linearly with the thickness under the action of the X-direction force. The theoretical radial stiffness increases from 0.5 N/mm to 0.8671 N/mm, and the radial stiffness from the finite element simulation from 0.5343 N/mm to 0.9696 N/mm. The deviation between the theoretical calculation and the simulation widens for the same reason as mentioned above.
Fig. 7. The radial stiffness of theoretical calculation and finite element simulation varying with
changing inner support outer radius of plane supporting spring
Fig. 8. The radial stiffness of theoretical calculation and finite element simulation varying with
changing thickness of plane supporting spring
5 Conclusions
With the widening application of the plane supporting spring, a sound theoretical formula for calculating its radial stiffness becomes increasingly important in the design process. This study has proposed an energy-based method for computing the radial stiffness of the single-spiral plane supporting spring, and the radial stiffness formula has been established theoretically. The results generated by the formula are basically consistent with those from the finite element simulation. Comparing the two sets of results makes it possible to derive the general laws governing the radial stiffness of the single-spiral plane supporting spring as the spring's key parameters change. It may fairly be assumed that this study provides a solid theoretical basis for further research on multi-spiral and multi-turn plane supporting springs as well.
Acknowledgement
We would like to acknowledge the contributions of a number of staff of Jienuo Environment Technology. Special thanks go to them for their valuable work and to their company, Nanjing Jienuo Environment Technology Co., Ltd., which provided us with everything we needed for the research and experiments. We are also greatly indebted to Professor Li Lingzhen for her successful organization of, and great support for, the project.
References
[1] Jia, F., Zhang, D., Zhang, Z.: Speedy stiffness modeling and designing of plane supporting
spring. In: The 15th International Conference on Mechatronics and Machine Vision in
Practice (M2VIP 2008), Auckland, New Zealand, pp. 209–214 (2009)
[2] Chen, N., Chen, X., Wu, Y.N., Yang, C.G., Xu, L.: Spiral profile design and parameter
analysis of flexure spring. Cryogenics (46), 409–419 (2006)
[3] Nan, C., Xi, C., Yinong, W., Lie, X., Chunguang, Y.: Performance analysis of spiral flex-
ure bearing. Cryogenics and Superconductivity 33(4), 5–8 (2005)
[4] Weili, G., Pengda, Y., Guobang, C.: Influence of geometry parameters on the performance
of flexure bearing. Cryogenics (6), 8–11 (2007)
[5] Gaunekar, A.S., Goddenhenrich, T., Heiden, C.: Finite Element Analysis and Testing of Flexure Bearing Elements. Cryogenics 36(5), 359–364 (1996)
[6] Liu, H.-w.: Mechanics of Materials (II). Higher Education Press, Beijing (2004)
[7] Jiang, S., Jia, F., Zhang, D., Wang, X.: Parameterized Modeling Technology of Plane Sup-
porting Spring Based on APDL. Equipment for Electronic Products Manufacturing 37(12),
46–49 (2008)
Erratum to: Implementation of On/Off Controller for
Automation of Greenhouse Using LabVIEW
H. Deng et al. (Eds.): AICI 2009, LNAI 5855, pp. 251–259, 2009.
© Springer-Verlag Berlin Heidelberg 2009
DOI 10.1007/978-3-642-05253-8_80
The original online version for this chapter can be found at
http://dx.doi.org/10.1007/978-3-642-05253-8_28
Author Index