
Real Time Fault Monitoring of Industrial Processes

International Series on
MICROPROCESSOR-BASED AND
INTELLIGENT SYSTEMS ENGINEERING

VOLUME 12

Editor
Professor S. G. Tzafestas, National Technical University, Athens, Greece

Editorial Advisory Board


Professor C. S. Chen, University of Akron, Ohio, U.S.A.
Professor T. Fukuda, Nagoya University, Japan
Professor F. Harashima, University of Tokyo, Tokyo, Japan
Professor G. Schmidt, Technical University of Munich, Germany
Professor N. K. Sinha, McMaster University, Hamilton, Ontario, Canada
Professor D. Tabak, George Mason University, Fairfax, Virginia, U.S.A.
Professor K. Valavanis, University of Southern Louisiana, Lafayette, U.S.A.
Real Time Fault Monitoring
of Industrial Processes
by

A. D. POULIEZOS
Technical University of Crete,
Department of Production Engineering and Management,
Chania, Greece
and
G. S. STAVRAKAKIS
Technical University of Crete,
Electronic Engineering and Computer Science Department,
Chania, Greece

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.


Library of Congress Cataloging-in-Publication Data

Pouliezos, A. D., 1951-
Real time fault monitoring of industrial processes / by A.D.
Pouliezos and G.S. Stavrakakis.
p. cm. -- (International series on microprocessor-based and
intelligent systems engineering; v. 12)
Includes bibliographical references and indexes.
ISBN 978-90-481-4374-0  ISBN 978-94-015-8300-8 (eBook)
DOI 10.1007/978-94-015-8300-8
1. Fault location (Engineering)  2. Process control.  3. Quality
control.  I. Stavrakakis, G. S., 1958-  II. Series.
TA189.8.P88 1994
870.42--dc20  94-2137

ISBN 978-90-481-4374-0

Printed on acid-free paper

All Rights Reserved


© 1994 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 1994
Softcover reprint of the hardcover 1st edition 1994
No part of the material protected by this copyright notice may be reproduced or
utilized in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the copyright owner.
Table of contents
Preface ......................................................................................................................... xi
List of figures ............................................................................................................... xv
List of tables ............................................................................................................... xxi
Introduction ............................................................................................................. xxiii

CHAPTER 1
FAULT DETECTION AND DIAGNOSIS METHODS IN THE
ABSENCE OF PROCESS MODEL
1.1 Introduction .................................................................................................. 1
1.2 Statistical aids for fault occurrence decision making ....................................... 2
1.2.1 Tests on the statistical properties of process characteristic
quantities ............................................................................................ 2
1.2.1.1 Limit checking fault monitoring in electrical drives ................ 20
1.2.1.2 Steady-state and drift testing in a grinding-classification
circuit .................................................................................... 21
1.2.1.3 Conclusions ........................................................................... 24
1.2.2 Process Control Charts .................................................................... 26
1.2.2.1 An application example for Statistical Process Control
(SPC) ................................................................................... 40
1.2.2.2 Conclusions ........................................................................... 42
1.3 Fault diagnosis based on signal analysis instrumentation .............................. 43
1.3.1 Machine health monitoring methods ................................................. 43
1.3.2 Vibration and noise analysis application examples ............................. 64
1.3.3 Conclusions ....................................................................................... 77
References ...................................................................................................... 78
Appendix 1.A ................................................................................................... 82
Appendix 1.B ................................................................................................... 87

CHAPTER 2
ANALYTICAL REDUNDANCY METHODS
2.1 Introduction ................................................................................................ 93
2.2 Plant and failure models .............................................................................. 94
2.3 Design requirements ................................................................................... 97
2.4 Methods of solution ..................................................................................... 98
2.5 Stochastic modeling methods ..................................................................... 102
2.5.1 Simple tests ..................................................................................... 103
2.5.1.1 Tests of mean ...................................................................... 104
2.5.1.2 Tests of covariance .............................................................. 105


2.5.1.3 Tests of whiteness ................................................................ 110


2.5.1.4 Two-stage methods .............................................................. 111
2.5.2 The Multiple Model (MM) method .................................................. 113
2.5.3 The Generalized Likelihood Ratio (GLR) method ............................. 116
2.5.3.1 Additive changes .................................................................. 116
2.5.3.2 Non-additive changes ........................................................... 120
2.6 Deterministic methods ............................................................................... 122
2.6.1 Observer-based approaches ............................................................. 122
2.6.2 Parity space approach ..................................................................... 129
2.7 Robust detection methods .......................................................................... 136
2.7.1 Robust observer-based methods ....................................................... 136
2.7.2 Parity relations for robust residual generation ..................................... 149
2.8 Applications .............................................................................................. 153
2.8.1 Fault detection in a jet engine system ............................................... 153
2.8.2 Applications in transportation engineering ........................................ 156
2.8.3 Applications in aerospace engineering .............................................. 161
2.8.4 Applications in automotive engineering ............................................ 166
2.8.5 Applications in robotics ................................................................... 170
References ...................................................................................................... 172

CHAPTER 3
PARAMETER ESTIMATION METHODS FOR FAULT MONITORING
3.1 Introduction .............................................................................................. 179
3.2 Process modeling for fault detection ........................................................... 182
3.3 Parameter estimation for fault detection ..................................................... 186
3.3.1 Recursive least squares algorithms ................................................... 187
3.3.2 Forgetting factors ............................................................................ 191
3.3.3 Implementation issues ...................................................................... 196
3.3.3.1 Covariance instability ......................................................................... 196
3.3.3.2 Covariance singularity......................................................................... 200
3.3.3.3 Speed - Fast algorithms ..................................................................... 202
3.3.3.4 Data weights selection ........................................................................ 205
3.3.5 Robustness issues ............................................................................ 211
3.4 Decision rules ........................................................................................... 218
3.5 Practical examples .................................................................................... 224
3.5.1 Evaporator fault detection................................................................ 224
3.5.2 Gas turbine fault detection and diagnosis ......................................... 228
3.5.3 Fault detection for electromotor driven centrifugal pumps ................ 231
3.5.4 Fault detection in power substations ................................................. 237
3.5.5 Fault diagnosis in robotic systems .................................................... 242
3.6 Additional references ................................................................................. 246
Appendix 3.A.................................................................................................. 247
Appendix 3.B .................................................................................................. 249
References ...................................................................................................... 250

CHAPTER4
AUTOMATIC EXPERT PROCESS FAULT DIAGNOSIS AND
SUPERVISION
4.1 Introduction .............................................................................................. 256
4.2 Nature of automatic expert diagnostic and supervision systems .................. 257
4.2.1 Expert systems for automatic process fault diagnosis ....................... 257
4.2.1.1 The terminology of knowledge engineering ................................. 257
4.2.1.2 Techniques for knowledge acquisition ........................................... 261
4.2.1.3 Expert system approaches for automatic process fault
diagnosis ..................................................................................................... 271
4.2.1.4 High-speed implementations of rule-based diagnostic
systems ....................................................................................................... 277
4.2.1.5 Validating expert systems ................................................................. 283
4.2.2 Event-based architecture for real-time fault diagnosis ....................... 284
4.2.3 Curve analysis techniques for real-time fault diagnosis ..................... 287
4.2.4 Real-time fault detection using Petri nets .......................................... 291
4.2.5 Fuzzy logic theory in real-time process fault diagnosis ..................... 297
4.3 Application examples ................................................................................ 301
4.3.1 Automatic expert diagnostic systems for nuclear power plant
(NPP) safety ................................................................................... 301
4.3.1.1 Diagnostic expert systems for NPP safety .................................... 301
4.3.1.2 Fuzzy reasoning diagnosis for NPP safety ..................................... 305
4.3.2 Automatic expert fault diagnosis incorporated in a process
SCADA system ............................................................................... 311
4.3.3 Expert systems for quick fault diagnosis in the mechanical and
electrical systems domains ............................................................... 328
4.3.4 Automatic expert fault diagnosis for machine tools, robots and
CIM systems ................................................................................... 335
4.4 Conclusions .............................................................................................. 343
References ...................................................................................................... 346
Appendix 4.A A generic hybrid reasoning expert diagnosis model .................. 352
Appendix 4.B Basic definitions of place/transition Petri nets and their use
for on-line process failure diagnosis ......................................... 360
Appendix 4.C Analytical expression for exception using fuzzy logic and its
utilization for on-line exceptional events diagnosis ................... 364

CHAPTER 5
FAULT DIAGNOSIS USING ARTIFICIAL NEURAL NETWORKS
(ANNs)
5.1 Introduction .............................................................................................. 369
5.2 Introduction to neural networks ................................................................. 372
5.3 Characteristics of Artificial Neural Networks ............................................ 374
5.4 ANN topologies and learning strategies ..................................................... 378
5.4.1 Supervised learning ANNs ............................................................... 378

5.4.1.1 Multilayer, feedforward networks ......................................... 379


5.4.1.2 Recurrent high-order neural networks (RHONNs) ................ 383
5.4.2 Unsupervised learning ..................................................................... 385
5.4.2.1 Adaptive Resonance Architectures (ART) ............................ 385
5.4.2.2 Kohonen maps ..................................................................... 390
5.5 ANN-based fault diagnosis ........................................................................ 392
5.5.1 Choice of neural topology ................................................................ 392
5.5.2 Choice of output fault vector and classification procedure ................ 393
5.5.3 Training sample design .................................................................... 395
5.6 Application examples ................................................................................ 395
5.6.1 Applications in chemical engineering ............................................... 396
5.6.2 Applications in CIM ........................................................................ 401
5.6.3 Power systems diagnosis .................................................................. 404
5.6.4 Neural four-parameter controller ..................................................... 407
5.6.5 Application of neural networks in nuclear power plants
monitoring ....................................................................................... 410
5.7 The integration of neural networks in real-time expert systems ................... 419
5.7.1 The AI components ......................................................................... 421
References ...................................................................................................... 423

CHAPTER6
IN-TIME FAILURE PROGNOSIS AND FATIGUE LIFE PREDICTION
OF STRUCTURES
6.1 Introduction .............................................................................................. 430
6.2 Recent non-destructive testing (NDT) and evaluation methods with
applications ............................................................................................... 431
6.2.1 Introduction..................................................................................... 431
6.2.2 The main non-destructive testing methods ........................................ 435
6.2.2.1 Liquid penetrant inspection................................................... 435
6.2.2.2 Magnetic particle inspection ................................................. 436
6.2.2.3 Electrical test methods (eddy current testing (ECT )) ............ 438
6.2.2.4 Ultrasonic testing ................................................................. 440
6.2.2.4 Radiography ........................................................................ 449
6.2.2.5 Acoustic emission (AE) ........................................................ 451
6.2.2.6 Other non-destructive inspection techniques .......................... 452
6.2.3 Signal processing (SP) for NDT ...................................................... 456
6.2.4 Applications of SP in automated NDT ............................................ 459
6.2.5 Conclusions ..................................................................................... 461
6.3 Real-time structural damage assessment and fatigue life prediction
methods .................................................................................................... 463
6.3.1 Introduction ..................................................................................... 463
6.3.2 Phenomenological approach for fatigue failure prognosis ................. 464
6.3.3 Probabilistic fracture mechanics approach for FCG life
estimation ........................................................................................ 467
6.3.4 Stochastic process approach for FCG life prediction ........................ 478
6.3.5 Time series analysis approach for FCG prediction ............................ 482

6.3.6 Intelligent systems for in-time structural damage assessment ............ 488
6.4 Application examples ................................................................................ 506
6.4.1 Nuclear reactor safety assessment using the probabilistic fracture
mechanics method ...................................................... ..................... 506
6.4.2 Marine structures safety assessment using the probabilistic
fracture mechanics method .............................................................. 509
6.4.3 Structural damage assessment using a causal network ...................... 519
References ...................................................................................................... 523
Author index ............................................................................................................ 529
Subject index ............................................................................................................ 535
Preface

This book is basically concerned with approaches for improving safety in man-made
systems. We call these approaches, collectively, fault monitoring, since they are
concerned primarily with detecting faults occurring in the components of such systems,
be they sensors, actuators, controlled plants or entire structures. The common feature of
these approaches is the intention to detect an abrupt change in some characteristic
property of the considered object, by monitoring the behavior of the system. This
change may be a slow-evolving effect or a complete breakdown.
In this sense, fault monitoring touches upon, and occasionally overlaps with, other areas
of control engineering such as adaptive control, robust controller design, reliability and
safety engineering, ergonomics and man-machine interfacing, etc. In fact, a system
safety problem could be attacked from any of the above angles of view. In this book,
we don't touch upon these areas, unless there is a strong relationship between the fault
monitoring approaches discussed and the aforementioned fields.
When we set out to write this book, our aim was to include as much material as possible
in a most rigorous, unified and concise format. This would include state-of-the-art
methods as well as more classical techniques, still in use today. As we proceeded in
gathering material, however, it soon became apparent that these were contradicting
design criteria and a trade-off had to be made. We believe that the completeness vs.
compactness compromise that we made is optimal, in the sense that we have covered the
majority of available methodologies in such a way as to give the researching engineer
in academia or the professional engineer in industry a starting point for the solution
to his/her fault detection problem. Specifically, this book may be of value to workers in
the following fields:
• Automatic process control and supervision.
• Statistical process control.
• Applied statistics.
• Quality control.
• Computer-assisted predictive maintenance and plant monitoring.
• Structural reliability and safety.
The book is structured according to the main categories of fault monitoring methods, as
considered by the authors: classical techniques, model-based and parameter estimation
methods, knowledge- and rule-based methods, techniques based on artificial neural
networks, plus a special chapter on safety of structures, as a result of our involvement in
this related field. The various methods are complemented with specific applications from
industrial fields, thus justifying the title of the book. Wherever appropriate, additional
references are summarized, for the sake of completeness. Consequently, it can also be
used as a textbook in a postgraduate course on industrial process fault diagnosis.


We would like at this point, firstly, to cite our distinguished colleagues, who have before
us attempted a similar task, and have in this way guided us in the writing of this book:
Anderson T. and P.A. Lee (1981). Fault tolerance: Principles and practice. Prentice-
Hall International.
Basseville M. and A. Benveniste, Eds. (1986). Detection of abrupt changes in signals
and dynamical systems. Springer-Verlag.
Basseville M. and I. Nikiforov (1993). Detection of abrupt changes: Theory and
application. Prentice Hall, NJ.
Brunet J., Jaume D., Labarrère M., Rault A. and M. Verge (1990). Détection et
diagnostic de pannes: approche par modélisation. Hermès Press.
Himmelblau D.M. (1978). Fault detection and diagnosis in chemical and petrochemical
processes. Elsevier Press, Amsterdam.
Patton R.J., Frank P.M. and R.N. Clark, Eds. (1989). Fault diagnosis in dynamic
systems: theory and application. Prentice-Hall.
Pau L.F. (1981). Failure diagnosis and performance monitoring. Control and Systems
Theory Series of Monographs and Textbooks, Dekker, New York.
Telksnys L., Ed. (1987). Detection of changes in random processes. Optimization
Software Inc., Publications Division, New York.
Tzafestas S. (1989). Knowledge-based system diagnosis, supervision and control.
Plenum Press, London.
Viswanadham N., Sarma V.V.S. and M.G. Singh (1987). Reliability of computer and
control systems. Systems and Control Series, vol. 8, North-Holland, Amsterdam.
Secondly, we would like to cite some very important survey papers, that provided us
with useful insights:
Basseville M. (1988). Detecting changes in signals and systems - A survey. Automatica,
24, 309-326.
Frank P.M. (1990). Fault diagnosis in dynamic systems using analytical and knowledge-
based redundancy - A survey and some new results. Automatica, 26, 459-474.
Gertler J.J. (1988). Survey of model-based failure detection and isolation in complex
plants. IEEE Control Systems Magazine, 8, 3-11.
Isermann R. (1984). Process fault detection based on modeling and estimation methods:
A survey. Automatica, 20, 387-404.
Mironovskii L.A. (1980). Functional diagnosis of dynamic systems - A survey.
Automation and Remote Control, 41, 1122-1143.
Willsky A.S. (1976). A survey of design methods for failure detection in dynamic
systems. Automatica, 12, 601-611.
Thirdly, we would like to note some important international congresses, devoted to fault
monitoring, which show the great importance that this field has recently acquired:
1st European Workshop on Fault Diagnostics, Reliability and related Knowledge-based
approaches. Rhodes, Greece, August 31-September 3, 1986. Proceedings appeared in
Tzafestas S., M. Singh and G. Schmidt, Eds. System fault diagnostics and related
knowledge-based approaches, D. Reidel, Dordrecht, 1987.
1st IFAC Workshop on fault detection and safety in chemical plants, Kyoto, Japan,
September 28th-October 1st, 1986.
2nd European Workshop on Fault Diagnostics, Reliability and related Knowledge-based
approaches. UMIST, Manchester, England, April 6-8, 1987. Proceedings appeared in
M. Singh, K.S. Hindi, G. Schmidt and S.G. Tzafestas (Eds.). Fault Detection and
Reliability: Knowledge-based and other approaches, Pergamon Press, 1987.
IFAC-IMACS Symposium SAFEPROCESS '91, Baden-Baden, Germany, September
10-13, 1991.
International Conference on Fault Diagnosis TOOLDIAG '93, Toulouse, France, April 5-
7, 1993.
IFAC Symposium SAFEPROCESS '94, Espoo, Finland, June 13-15, 1994.

Next, we would like to express our sincerest thanks to all those who helped us in this
effort: our secretaries Stella Mountogiannaki, Irini Marentaki, Dora Mavrakaki and
Vicky Grigoraki, our postgraduate students George Tselentis, Michalis Hadjikiriakos and
Eleftheria Sergaki, and our wives Olga and Aithra who bore with us through the
writing of this book.
Lastly, we would like to deeply thank Professor S. Tzafestas, not only because, as the
Editor of this series, he showed trust in us, but also because he has been constantly
encouraging and helping us in our careers so far.
A.D. Pouliezos
G.S. Stavrakakis
December 1993,
Chania, Greece.
List of figures
Figure 1.1 Grinding-classification circuit. ..................................................................... 22
Figure 1.2 Test of steady state applied on Q6 .............................................................. 24
Figure 1.3 Drift test applied on Q9 .............................................................................. 24
Figure 1.4 Standard deviation test applied on Q9 .......................................................... 25
Figure 1.5 Shewhart control chart ................................................................................ 26
Figure 1.6 Flowchart for computer operated control chart ............................................. 27
Figure 1.7 Three variable polyplot ................................................................................ 39
Figure 1.8 Five variable polyplot .................................................................................. 39
Figure 1.9 Seventeen variable polyplot ......................................................................... 39
Figure 1.10 Six variable polyplot with Hotelling's T2 of production data, 2
observations per glyph ................................................................................. 41
Figure 1.11 Frequency analyzed results give earlier warning ........................................... 45
Figure 1.12 Vibration Criterion Chart (from VDI 2056) ................................................. 48
Figure 1.13 Benefits of frequency analysis for fault detection .......................................... 49
Figure 1.14 Typical machine "signature" ........................................................................ 50
Figure 1.15 Effect of misalignment in gearbox ................................................................ 51
Figure 1.16 Electric motor vibration signature ................................................................ 52
Figure 1.17 Mechanical levers ........................................................................................ 53
Figure 1.18 Proximity probe .......................................................................................... 53
Figure 1.19 Accelerometer ............................................................................................. 53
Figure 1.20 Extraction fan control surface ...................................................................... 56
Figure 1.21 System analysis measurements .................................................................... 60
Figure 1.22 Differences between H1 and H2 measurements ............................................. 64
Figure 1.23a Effect of tooth deflection ............................................................................. 65
Figure 1.23b Effect of wear ............................................................................................. 65
Figure 1.24 Gear toothmeshing harmonics ...................................................................... 66
Figure 1.25 The use of the cepstrum for fault detection and diagnosis of a gearbox ......... 67
Figure 1.26 Faults in rolling element bearings ................................................................ 68
Figure 1.27 Faults in ball and roller bearings .................................................................. 68
Figure 1.27a Block diagram representation of the on-line bearing monitoring
system ......................................................................................................... 69
Figure 1.28 Reciprocating machine fault detection .......................................................... 72
Figure 1.29 Basic steps used in the analysis for collecting spectra ................................... 72
Figure 1.30 Simplified logic tree and complementary interrogatory diagnosis .................. 73
Figure 1.31 Flow chart of the automated spectral pattern fault diagnosis method
for gas turbines ........................................................................................... 74
Figure 1.A.1 Operating characteristic curves for the sample mean test, Pf = 0.01 ............. 82
Figure 1.A.2 Operating characteristic curves for the sample mean test, Pf = 0.05 .............. 82
Figure 1.A.3 Power curves for the two-tailed χ2-test at the 5% level of
significance ................................................................................................. 83
Figure 1.B.1 Derivation of the power cepstrum ................................................................ 92
Figure 2.1 General architecture of FDI based on analytical redundancy ......................... 98


Figure 2.2 General structure of a residual generator ...................................................... 99


Figure 2.3 Backward SPRT failure detection system and trajectory of backward
LLR ............................................................................................................ 108
Figure 2.4 Dedicated observer scheme (after Frank, 1987) ............................................ 125
Figure 2.5 Simplified observer scheme (after Frank, 1987) ........................................... 126
Figure 2.6 Generalized observer scheme (after Frank, 1987) ......................................... 126
Figure 2.7 Local observer scheme of an ROS (after Frank, 1987) ................................. 127
Figure 2.8 General structure of an observer-based residual generation approach ............ 137
Figure 2.9 Jet engine .................................................................................................... 154
Figure 2.10 Norm of the output estimation error ............................................................. 155
Figure 2.11 Absolute value of the fault-free residual ....................................................... 155
Figure 2.12 Faulty output and residual in the case of a fault in T7 .................................. 156
Figure 2.13 Faulty output of the pressure measurement P6 and corresponding
residual ....................................................................................................... 156
Figure 2.14 Overall structure of the IFD-scheme ............................................................ 159
Figure 2.15 f(t), no-fault case ........................................................................................ 161
Figure 2.16 f(t), 5% d-sensor fault ................................................................................. 161
Figure 2.17 The ADIA block diagram ............................................................................ 162
Figure 2.18 Soft failure detection and isolation logic ....................................................... 164
Figure 2.19 Adaptive threshold logic .............................................................................. 165
Figure 2.20 Engine model .............................................................................................. 167
Figure 2.21 Residual generation strategy ........................................................................ 168
Figure 2.22 Experimental conditions for residual generation validation ........................... 169
Figure 2.23 No fault residuals; throttle (top) ................................................................... 169
Figure 2.24 10% throttle sensor fault (bottom) ............................................................... 169
Figure 2.25 Friction characteristics of the MANUTEC r3 robot ..................................... 172
Figure 2.26 External torque estimation ........................................................................... 172
Figure 3.1 Fault detection based on parameter estimation and theoretical
modeling ..................................................................................................... 181
Figure 3.2 Second order electrical network ................................................................... 184
Figure 3.3 Effect of different forgetting factors on the quality of estimate ...................... 206
Figure 3.4 A unified strategy for fault detection based on parameter estimation ............. 208
Figure 3.5 Simulation of self-tuning estimator with variable forgetting factor
00=0.0125 ................................................................................................... 209
Figure 3.6 Choice of v(t) .............................................................................................. 211
Figure 3.7 Evaporator configuration and notation ......................................................... 225
Figure 3.8 Estimate of UA for λ=0.95 .......................................................................... 226
Figure 3.9 Estimate of xF for λ=0.95 ............................................................................ 227
Figure 3.10 EKF estimate of UA and xF with confidence intervals .................................. 228
Figure 3.11 Non-faulty data sets and faulty data set in aircraft engines ........................... 230
Figure 3.12 Simulation results ........................................................................................ 231
Figure 3.13 Scheme of a speed controlled d.c. motor and centrifugal pump ..................... 232
Figure 3.14 Block diagram of the linearized d.c. motor-pump-pipe system ...................... 234
Figure 3.15 Step responses for a change of the speed setpoint ......................................... 236
Figure 3.16 Process coefficient estimates after start of the cold engine ............................ 236
Figure 3.17 Change of armature circuit resistance ........................................................... 236

Figure 3.18 Change of pump packing box friction by tightening and loosening of
the cap screws ............................................................................................. 236
Figure 3.19 Detailed one-line diagram of a typical high voltage substation ...................... 242
Figure 3.20 Four processor real-time computer implementation of DC-drive fault
detection algorithm ...................................................................................... 246
Figure 4.1 Relationship between terms in knowledge engineering .................................. 259
Figure 4.2 Event based diagnostic architecture and messages ........................................ 286
Figure 4.3 Curve analysis based diagnosis combining digital signal processing
and rule-based reasoning ............................................................................. 290
Figure 4.4 Diagnosis of sensors .................................................................................... 292
Figure 4.5 Diagnosis of sensors .................................................................................... 293
Figure 4.6 Different states in the Petri net based monitoring concept ............................. 293
Figure 4.7 Concept of the mechanism, which handles the rules in Petri net based
fault diagnosis ............................................................................................. 295
Figure 4.8 Representation of the fuzzy function ............................................................ 298
Figure 4.9 Determination of the maximum ordinate of intersection between A and
A* ............................................................................................................... 299
Figure 4.10 The expert system diagnostic process for NPP safety ................................... 303
Figure 4.11 Diagram of boiling water reactor cooling system .......................................... 305
Figure 4.12 Flow of failure diagnosis with implication and exception .............................. 309
Figure 4.13 Example of fuzzy fault diagnosis by CRT terminal ....................................... 310
Figure 4.14 The general appended KBAP configuration .................................................. 313
Figure 4.15 Metalevel control in a KBAP ....................................................................... 315
Figure 4.16 The internal organization of the metalevel control rule node ......................... 316
Figure 4.17 General object level rule examples for a low voltage bus .............................. 318
Figure 4.18 Partial decision tree diagram and corresponding PRL rules for a
motor pump fault diagnosis ......................................................................... 323
Figure 4.19 Circuit to detect transient faults in a microcomputer system ......................... 327
Figure 4.20 A production system workstation monitoring system .................................... 336
Figure 4.21 CIM system layout ...................................................................................... 339
Figure 4.22 CIM system example diagnosis .................................................................... 341
Figure 4.23 Updated probabilities in deep KB for CIM system diagnosis ........................ 343
Figure 4.24 D-S (deep-shallow) type of expert hybrid reasoning ..................................... 352
Figure 4.25 Functional hierarchy as deep knowledge base ............................................... 353
Figure 4.26 Rule hierarchy as shallow knowledge base ................................................... 354
Figure4.27 Schematic diagram for diagnostic strategy ................................................... 355
Figure 4.28 Sets of relation between failure and symptom ............................................... 365
Figure 4.29 Linguistic truth value of failure derived from exception ................................ 368
Figure 5.1 Feedforward and CAM/AM Neural Network structure ................................. 373
Figure 5.2 Features of artificial neurons ....................................................................... 375
Figure 5.3 Neuron activation characteristics ................................................................. 376
Figure 5.4 Neuron output function characteristics ......................................................... 377
Figure 5.5 Structure of multiple-layer feedforward neural network ................................ 380
Figure 5.6 Structure of ART network ........................................................................... 386
Figure 5.7 Expanded view of ART networks ................................................................ 387
Figure 5.8 Topological map configurations ................................................................... 390

Figure 5.9 A topological neighborhood Ne of unit ue showing shrinking training
iteration ni ................................................................................................... 391
Figure 5.10 Three continuous stirred tank reactors in series ............................................ 396
Figure 5.11 Trained network (numbers in circles represent biases of nodes) .................... 398
Figure 5.12 Experimental results .................................................................................... 399
Figure 5.13 Generalization capacity vs. training set size ................................................. 400
Figure 5.14 Network training inputs ............................................................................... 403
Figure 5.15 Intermediate node positions .......................................................................... 403
Figure 5.16 Final node positions ..................................................................................... 404
Figure 5.17 Example power system ................................................................................ 407
Figure 5.18 Four-point controller ................................................................................... 408
Figure 5.19 General controller and plant ......................................................................... 408
Figure 5.20 Neural network CDC controller for actuator failures .................................... 409
Figure 5.21 Real plant during test .................................................................................. 410
Figure 5.22 Global development cycle for integrating neural nets in Expert
Systems ...................................................................................................... 420
Figure 6.1 Origins of some defects found in materials and components .......................... 432
Figure 6.2 Magnetic flaw detection ............................................................................... 437
Figure 6.3 (a) Vector point. (b) Impedance plane display on oscilloscope,
showing differing conductivities. (c) Impedance plane display,
showing defect indications .......................................................................... 440
Figure 6.4 Normal probe transmission technique ........................................................... 442
Figure 6.5 Angle probe transmission method ................................................................ 443
Figure 6.6 Reflective technique with angle probe .......................................................... 443
Figure 6.7 Crack detection using a surface wave probe ................................................. 444
Figure 6.8 "A" scan display (a) reflections obtained from defect and backwall;
(b) representation of "A" scan screen display ............................................... 444
Figure 6.9 "B" scan display .......................................................................................... 445
Figure 6.10 Effect of defect size on screen display .......................................................... 446
Figure 6.11 (a) Micro-porosity, (b) Elliptical defect, (c) Angled defect ........................... 446
Figure 6.12 Method of scanning a large surface .............................................................. 447
Figure 6.13 Indication of lamination in thick plate: (a) good plate; (b) laminated
plate ............................................................................................................ 447
Figure 6.14 Indication of lamination in thin plate: (a) good plate; (b) laminated
plate ............................................................................................................ 448
Figure 6.15 Detection of radial defects in: ...................................................................... 448
Figure 6.16 Probe and wave path geometry as used to measure the size of a crack
in a welded joint .......................................................................................... 454
Figure 6.17 Summary of FCG rate data for the Virkler et al. (1979) case
calculated by the ASTM E647-83 standard method ..................................... 472
Figure 6.18 The expert structural damage assessment inference process .......................... 490
Figure 6.19 Examples of rules for the damage degree of reinforced concrete bridge
decks ........................................................................................................... 491
Figure 6.21 TFM graphical solution ............................................................................... 497
Figure 6.22 ITFM graphical solution .............................................................................. 498
Figure 6.23 MPD graphical solution ............................................................................... 499

Figure 6.24 Types of filtering processes ......................................................................... 502


Figure 6.25 A typical causal network ............................................................................. 504
Figure 6.26 Probability of rupture per year of a PWR pressure vessel after 40
years of operation ........................................................................................ 510
Figure 6.27a Exceedances spectrum divided for construction of histogram ........................ 513
Figure 6.27b Stress-range histogram corresponding to the exceedances spectrum
shown in fig. 6.27a ...................................................................................... 513
Figure 6.28 Example of power spectral density function (double peaked spectra) ............ 514
Figure 6.29 Network for post-earthquake damage assessment of a reinforced
concrete building ......................................................................................... 522
List of tables
Table 1.1 Performance effects of staged faults on a 4-phase switched
reluctance motor .......................................................................................... 21
Table 1.A.1 Values of k such that Pr(y<k-1)<α/2 where y has the binomial
distribution with p=0.5 ................................................................................ 84
Table 1.A.2 Values of 1-Pd for the sign test; Pf = 0.05 ................................................... 85
Table 1.A.3 Values of 1-Pd for the sign test; Pf = 0.01 ................................................... 85
Table 1.A.4 Critical values of the rank correlation coefficient ......................................... 86
Table 2.1 Residual structure ....................................................................................... 170
Table 3.1 Operations count for window size nw (scalar output case) ............................ 196
Table 3.2 Number of arithmetic operations used for updating p(t) once. The
number ofparameters is n ........................................................................... 202
Table 3.3 Cases in aircraft engine fault detection simulation ........................................ 231
Table 3.4 Interlocking scheme of substation of fig. 3.19 .............................................. 239
Table 3.5 Substation configuration after restoration of supply ..................................... 241
Table 3.6 True and estimated values for test run .......................................................... 246
Table 4.1 Definition of failure and symptom vector ..................................................... 306
Table 4.2 Matrices of fuzzy relation of failures with symptoms ................................... 307
Table 4.3 Matrix of alternative fuzzy relation Eij and vector of exceptional
proposition ~ ............................................................................................. 307
Table 4.4 Calculation results for all nodes ................................................................... 342
Table 4.5 Finding the parent node ............................................................................... 342
Table 5.1 List of selected faults ................................................................................... 397
Table 5.2 Sensor measurement patterns of six selected faults ....................................... 397
Table 5.3 An example 2-D input space ........................................................................ 402
Table 5.4 Input patterns .............................................................................................. 406
Table 5.5 Response times of typical real-time problems ............................................... 422
Table 6.1 Basic principles and major features of the main non-destructive
testing (NDT) systems ................................................................................. 433
Table 6.2 Example of inspection data .......................................................................... 490
Table 6.3 Values of the parameters F1 and F2 on the JONSWAP Spectrum ................. 517
Table 6.4 Numerical example for network of fig. 6.25 ................................................. 521

Introduction

The writing of this book has been motivated by the fact that a very large amount of
knowledge regarding fault monitoring has been accumulated. This accumulation is the
result of two factors: firstly, man has always been interested in preventing catastrophes,
whether a consequence of his works or of natural causes, and secondly, as technology
advances, the risk of catastrophic events occurring unfortunately
increases.
The latter fact is the result of bigger and more complicated plants, which make it
impossible for human operators to manage or control them unaided. Hence the need for automatic
or operator-aiding fault monitoring systems. Fortunately, results from research into
man-made system safety are equally applicable to protection from undesirable natural
phenomena, such as earthquake prediction or meteorological forecasts. Moreover, the
same results find applications in many diverse scientific disciplines, such as
bioengineering (e.g. arrhythmia detection), speech processing, traffic control (incident
detection) and in any other area where dynamic phenomena with possibly time-varying
parameters occur. In this sense, the meaning of the term failure can be extended to
mean change. Thus the following definition can be made:
A change is any discrepancy between an assumed value of a monitored parameter of an
object and its measured, estimated or predicted value. This change may be the result of a
natural operation, assuming many operational modes, or the result of a malfunction.
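
To make this definition concrete, the following minimal sketch (in Python, with hypothetical names and numbers) flags a change whenever the discrepancy, the residual, between the assumed and the measured value exceeds a tolerance; whether the change reflects a new operating mode or a malfunction is left to a later diagnosis stage.

```python
# Minimal sketch of the "change" definition above: a change is flagged when
# the residual between an assumed and a measured value of a monitored
# parameter exceeds a tolerance. All names and values are hypothetical.

def change_detected(assumed: float, measured: float, tolerance: float) -> bool:
    """Return True when the monitored parameter deviates from its assumed value."""
    residual = measured - assumed
    return abs(residual) > tolerance

# Example: a sensor assumed to read 80.0; a reading of 83.5 against a
# tolerance of 2.0 is declared a change (fault, or new operating mode).
print(change_detected(assumed=80.0, measured=83.5, tolerance=2.0))  # True
```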
Since this book is about fault monitoring in industrial, thus man-made, systems, let us
concentrate henceforth on this area. Malfunctions can occur in sensors, actuators,
controller hardware or software, the process itself and in structures (vessels, pipes,
beams etc.). The terms fault and failure are used interchangeably, but a subtle difference
does exist: a fault occurs when the item in question operates incorrectly in a permanent
or intermittent manner; failure on the other hand denotes a complete operational
breakdown. To avoid confusion these two terms will be used with the same meaning
throughout this book. Related to this terminology is the notion of fault tolerance,
signifying the ability of a system to withstand malfunctions whilst still maintaining
tolerable performance. It is obvious, however, that fault tolerance includes fault
monitoring and diagnosis and the ability of the system to reorganize or restructure itself,
following fault identification.
A fault monitoring system should perform the following tasks:
• Fault detection and isolation (FDI).
• Diagnosis of effect, cause and severity of faults in the components of a system.
• Reconfiguration or restructuring of appropriate control laws, to effect tolerable
operation of the system if possible. If not, issue of shutdown or other emergency
advice (e.g. abandon aircraft).


Additionally, the performance of the above tasks, should meet certain requirements.
Stated informally, these are:
• As many as possible true faults should be detected, while as few as possible false
alarms should be triggered.
• The delay time between a fault occurrence and a fault declaration should be small.
• The accuracy of the estimated fault parameters (location, size, occurrence time etc.)
must be high.
• The employed method must be insensitive (robust) to model inaccuracies (if a
mathematical model is used) such as simplification errors resulting from linearization
or unmodeled, usually non-linear components, e.g. friction, and external phenomena
such as noise, load variation etc.
It is obvious, even to the uninitiated, that simultaneous satisfaction of the above
requirements leads to contradiction, which is usually resolved by trade-off methods.
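
By way of illustration only, the sketch below applies Page's one-sided CUSUM test, a classical change detection scheme; all parameter values are hypothetical. Raising the threshold h suppresses false alarms at the cost of a longer detection delay, which is precisely the trade-off just described.

```python
# Hedged sketch of Page's one-sided CUSUM test. The drift k and threshold h
# are tuning parameters: a larger h lowers the false alarm rate but lengthens
# the detection delay. Names and values here are hypothetical.

def cusum_alarm_index(samples, mean0, k, h):
    """Return the index of the first alarm, or None if none is raised."""
    g = 0.0  # cumulative sum of positive deviations from the nominal mean
    for i, x in enumerate(samples):
        g = max(0.0, g + (x - mean0) - k)
        if g > h:
            return i
    return None

# A unit mean shift occurs at sample 50; the larger threshold detects later.
data = [0.0] * 50 + [1.0] * 50
print(cusum_alarm_index(data, mean0=0.0, k=0.5, h=2.0))   # alarm at index 54
print(cusum_alarm_index(data, mean0=0.0, k=0.5, h=10.0))  # alarm at index 70
```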
Incorporating a fault monitoring system into an industrial process results in improved
reliability, maintainability and survivability. These terms are defined as:
• Reliability deals with the ability to complete a task satisfactorily and with the
period of time over which that ability is retained.
• Maintainability concerns the need for repair and the ease with which repairs can be
made, with no premium placed on performance.
• Survivability relates to the likelihood of conducting an operation safely (without
danger to human operators or the system) whether or not the task is completed.
Furthermore increased system autonomy is achieved.
The main types of failures and errors which endanger the safe operation of a technical
process are:
• System component failures caused by physical or chemical faults.
• Energy supply failures, caused for example by power supply faults.
• Environmental disturbances and external interference.
• Human operator errors.
• Maintenance errors and failures caused by wrong repair actions.
• Control system failures.
Since "a chain is not stronger than its weakest link", the task of achieving a high system
availability must be performed with a total system availability attitude. This concept,
however is not easy to be realised in practice. The key point in this practical
implementation is the concept of lije-cycle maintenance. This is defined as those actions
that are appropriate for maintaining a facility in a proper condition, so that its required
function·s are performed throughout its life-cycle.
Two major problems have to be considered in life-cycle maintenance. One is how to
cope with unexpected deteriorations and failures. The other is how the maintenance
activities are properly adjusted to the various changes inside and outside the facility.

To realize the concept of life-cycle maintenance, in which the optimum maintenance
strategies for each component of the facility are selected based on the prediction of
deteriorations, an enormous amount of information and of information processing is
required. Utilization of computers is, therefore, the only way to integrate and to process
information relevant to the maintenance activities. A general framework for a computer
assisted predictive maintenance system (CAPMS) is shown in fig. 1.1.

[Figure: block diagram of the CAPMS, showing the strategy planning and maintenance
management subsystems linked to a common data base and a facility data base, with
prediction of deteriorations and detection of unexpected failures indicated.]

Figure 1.1 Architecture of a computer assisted predictive maintenance system (CAPMS)
(from Proceedings, IFAC/IMACS Symposium SAFEPROCESS '91, Baden-Baden, Germany,
10-13 September, 1991)
The system consists of two subsystems and two groups of data bases. The strategy
planning subsystem provides the function of selecting the optimum maintenance strategy
for each component of the facility. It first predicts deteriorations in each of the
components in terms of the mode and the progressive pattern. Deterioration prediction,
available maintenance technology and the effect of deterioration are the primary factors
for maintenance strategy planning. The effects of deterioration are evaluated in terms of
the safety and economic effects of the functional degradations or failures induced by it.
The evaluation of maintenance technologies is carried out in terms of the availability of the
monitoring, diagnostic and repair techniques. For the prediction of deteriorations, the
subsystem needs to refer to specific data about the facility in question. This data is
contained in a facility data base which consists of a facility model, an environment model
and an operation and maintenance record.
The maintenance management subsystem manages and controls the actual maintenance
actions based on the strategy selected in the strategy planning subsystem. From the
results of maintenance actions, the deteriorations and failures detected in the facility are
analyzed. If the deteriorations and failures correspond to those predicted in the strategy
planning subsystem, the maintenance management subsystem keeps the same strategy
and makes a plan for the next maintenance cycle. On the other hand, occurrences of
unpredicted deteriorations or failures indicate improper predictions in the strategic
planning subsystem. In this case, the information is fed back to the strategy planning
subsystem, and the prediction of deteriorations is carried out over again for revising the
maintenance strategy plan.
To sum up, there are two feedback loops in CAPMS. One is the routine feedback loop
to provide the information gathered during maintenance actions to the next maintenance
plan. The second is the strategic feedback loop which becomes active when the actual
data is recognized as inconsistent with the assumed scenario of the maintenance
strategy.
To realize a CAPMS, the following major items have to be studied:
• Prediction of deterioration. As mentioned already, the prediction of deterioration is
an essential function of a CAPMS.
• Deterioration effect evaluation. The effects that the deteriorations propagate in the
facility cause functional degradations and failures. To evaluate these effects with a
computer, one needs functional models of the facility. A significant amount of work
on this subject has been done in conjunction with diagnostic expert systems.
• Monitoring and diagnosis. Although a number of technologies have been developed
for monitoring and diagnosis, one still needs techniques for detecting the progress of
various deteriorations at their early stages.
• Selection of maintenance strategies. The selection of the optimum maintenance
strategy is the key function of a CAPMS.
• Common data base and facility data base. CAPMS requires sophisticated data
bases. It is necessary to make a study on the structure of the common data base to be
effective in deterioration prediction. With regard to the facility data base, the product
model should be used as the foundation.
The subject of the present book is the detailed exposition of the first three items.
It is well known that the use of traditional techniques to build a desired control system
requires a huge effort. Thus, there is also a significant need for better ways to create such a
system. If the basic tasks of process automation, the feed-forward and feedback control,
are dedicated to a first automation level, the various tasks with supervisory functions can
be considered as forming a second level. These supervisory functions serve to indicate
undesired or unpermitted process states and to take appropriate actions in order to avoid
damage to the process or accidents involving human beings.
It is assumed that faults affect the technical process and its control. As mentioned earlier,
a fault is to be understood as a nonpermitted deviation of a characteristic property of the
process itself, the actuators, the sensors and controllers. If these deviations influence the
measurable variables of the process, they may be detected by an appropriate signal
evaluation. The corresponding fault detection and isolation (FDI) functions, called
monitoring, consist of checking the measurable variables with regard to a certain
tolerance of the normal values (limit or trend checking) and trigger alarms if the tolerances
are exceeded. Based on these alarms the operator takes appropriate actions. In cases
where the limit value violation signifies a dangerous process state, an appropriate action
can be initiated automatically. This is called automatic protection. Both supervisory
functions may be applied directly to the measured signals or to the results of a following
signal analysis, as in the case of frequency spectra of vibrations for rotating machines.
These classical ways of limit value checking of some important measurable variables are
appropriate for the overall supervision of the processes. However, developing internal
process faults are only detected at a rather late stage and the available information does
not allow an in-depth fault diagnosis. This is one of the reasons that process operators
are still required for the supervision of important processes. These human operators use
their own sensors, data records, own reasoning and long term experience to obtain the
required information on process changes and its diagnosis.
If the supervision is going to be improved and automated, a natural first step consists of
adding more sensors and a second step to transfer the operators' knowledge into
computers. Here it is usually desirable to add such sensors which directly indicate faults.
Because the number of sensors, transmitters and cables increases, the cost goes up and
the overall reliability is not necessarily improved. Furthermore many faults cannot be
detected directly by available sensor technology.
In practice, the most frequently used diagnostic approach is the limit checking of
individual plant variables. While very simple, this approach has serious drawbacks,
namely:
• Since the plant variables may vary widely due to input variations, the test thresholds
have to be set quite conservatively;
• Since a single component fault may cause many plant variables to exceed their limits,
fault isolation is very difficult (multiple symptoms of a single fault appear as multiple
"faults").
Consistency checks for groups of plant variables eliminate the above problems; the price
to be paid is the need for an accurate mathematical model. Model-based FDI consists of
two stages: residual generation and decision making based on these residuals.
In the first stage, outputs and inputs of the system are processed by an appropriate
algorithm (a processor) to generate residual signals which are nominally near zero and
which deviate from zero in characteristic ways when particular faults occur. The
techniques used to generate residuals differ markedly from method to method.
In the second (decision making) stage, the residuals are examined for the likelihood of
faults. Decision functions or statistics are calculated using the residuals, and a decision
rule is then applied to determine if any fault has occurred. A decision process may consist
of a simple threshold test on the instantaneous values or moving averages of the
residuals, or it may be based directly on methods of statistical decision theory, e.g.
sequential probability ratio testing.
For a static system, the residual generator is also static; it is simply a rearranged form of
an input-output model, e.g. a set of geometric relationships or of material balance
equations. For a dynamic system, the residual generator is dynamic as well. It may be
constructed by a number of different techniques. These include parity equations or
consistency relations, obtained by the direct conversion of the input-output or state-
space model of the system, diagnostic observers and Kalman filters.
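As a minimal sketch of the two stages for a static system (written here in Python; the flow variables, noise level and threshold are hypothetical and only serve to illustrate the idea):

    def balance_residual(f_in, f_out, accumulation):
        # Static residual from a material balance: (in - out - accumulation) is near zero under no fault
        return f_in - f_out - accumulation

    def decide(residual, sigma, z=3.0):
        # Decision stage: declare a fault when |r| exceeds z standard deviations of the residual noise
        return "fault" if abs(residual) > z * sigma else "ok"

    r = balance_residual(f_in=10.2, f_out=9.7, accumulation=0.1)  # hypothetical readings
    print(r, decide(r, sigma=0.1))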
While a single residual is sufficient to detect a fault, a set of residuals is required for fault
isolation. To facilitate isolation, residual sets are usually enhanced, in one of the
following ways:
• In response to a single fault, only a fault-specific subset of the residuals becomes
nonzero (structured residuals).
• In response to a single fault, the residual vector is confined to a fault-specific
direction (fixed direction residuals).
Also, to simplify statistical testing in a noisy system, it is useful if the residuals are
"white", that is, uncorrelated in time. Residuals need to be insensitive to some
disturbance variables. This may be addressed as an explicit disturbance decoupling
problem or handled as a special case of structured residuals.
A fundamental issue in the generation of residuals is their robustness (insensitivity)
relative to unavoidable modeling errors. Robustness concerns have plagued the
implementation of detection filters (as well as other failure detection methods) since their
introduction. False alarms and incorrect identification of faults due to noise,
disturbances, plant parameter uncertainties and unmodelled system dynamics have led to
the design of robust detection filters and to determining appropriate thresholds for a
given detection filter. Various techniques have been proposed to make the failure
detection process more robust. Design methods are proposed with the goal of making
the detection filter much more sensitive to one fault than to others. These methods
have been shown to be specific cases of the unknown input observer approach. In this
approach, noise, disturbances, parameter uncertainties and unmodelled dynamics are
modeled as "fault events" of the system, along with the fault events arising from actual
system failures. An observer is then designed to be sensitive to a fault event of interest,
while insensitive to as many other real and pseudo-fault events as possible.
Increasing automation of processes means increasing functional dependence on
automation systems. The reliability of automation systems is thus becoming more and
more important when the overall reliability of the processes is considered. In modern
digital automation systems the most unreliable part is the field instrumentation; the
reliability of electronics and the man-machine interface devices being very high in normal
conditions. The relatively high fault frequency in the field instrumentation is due to the
fact that the instruments make contacts with the process often in hard conditions and
many of them, especially actuators, are exposed to wear because of their mechanically
functioning parts. It is therefore important to look for methods that help detect faults
in this area at as early a stage as possible.
In recent years research efforts have shown that process changes due to faults can be
detected in an early stage by using process models and common sensors. Then
nonmeasurable quantities like process state variables and parameters may also be used.
With this improved knowledge a process supervision with fault diagnosis (also called
condition monitoring) becomes possible. Signal processing now provides features, like
directly measurable quantities or nonmeasurable quantities in the form of state variables or
process parameter estimates. By comparison with normal values, changes are detected
resulting in symptoms. A knowledge-based fault diagnosis indicates the fault's cause and location.
The next step is a fault evaluation, that is, an assessment is made of how the fault will
affect the process. The faults are then divided into different hazard classes according to
an incident/sequence or a fault tree analysis. Then the following action can be decided. If
the fault is evaluated to be tolerable, the operation may continue and if it is conditionally
tolerable, a change of operation, a reconfiguration of process parts or just maintenance
has to be performed. However, if the faults are intolerable an immediate stop is required
and the fault must be eliminated, e.g. by repair. A looped signal flow exists from the
measured signals through the different actions back to the process. It is therefore
possible to refer to supervisory loops.
With such advanced supervisory functions it is possible to improve the further
automation of technical processes with the following features:
• Early detection and diagnosis of developing faults (also called incipient fault
diagnosis).
• Prevention of further fault expansion.
• Preventive maintenance.
• Maintenance on request.
• Telediagnosis by using modern communication nets.
Process control computers can be extended to include automated diagnosis through the
integrated use of Artificial Intelligence techniques. Diagnosis, a complex reasoning
activity, is first characterized and then decomposed into its constituent information-
processing tasks, each being described in terms of input, output, knowledge
representation and inferencing strategy. AI-based techniques have been applied
throughout the process plant control infrastructure: from the low-end "execution level"
to the high-end "supervision and planning level". The execution level includes the use of
techniques such as "fuzzy control" or "neural control" for closed loop control. Fuzzy
logic is used to express and manipulate ill-defined qualitative terms like "large", "small",
"very small", etc. in a well defined mathematical way to mimic the human operator's
manual control strategy. Qualitative rules are used to express how the control signal
should be chosen in different situations. "Neural control" refers to the use of neural
networks in developing process models which are then used to implement robust, model-
predictive controllers.
The high-end "supervisory level", on the other hand, seeks to extend the range of
conventional control algorithms through the use of knowledge based systems (KBSs) for
tuning controllers, performing fault diagnosis and on-line reconfiguration of control
systems and process operation.
For diagnosis, these knowledge-driven techniques involve the interpretation of sensor
readings and other process observations, detection of abnormal operating conditions,
generation and testing of malfunction hypotheses that can explain the observed
symptoms and finally resolution of any interactions between hypotheses. Fundamentally,
diagnosis is viewed as a decision-making activity that is not numeric in nature. While the
governing elements are symbolic, numeric computations still play an important role of
providing certain kinds of information for making decisions and drawing diagnostic
conclusions.
Neural nets are expected to improve today's automated supervising systems, because
complex classifiers can be designed with neural nets. Artificial neural networks, even in
their simplest form, are good pattern recognizers. Input vectors are introduced into the
network and via supervised or unsupervised learning the weights on the connections of
the network are adjusted to achieve certain goals: matching targets for supervised
learning or forming clusters with unsupervised learning. Subsequent input vectors of
similar types can be classified properly, but, of course, novel input patterns that the
network was not trained to recognize cannot be classified successfully, any more than a
clustering code will correctly classify a new cluster on which it was not trained.
As the pattern vectors get to be large and complex, conventional numerical algorithms
may not be able to properly handle the task of recognition promptly (with a computer of
reasonable cost). For example, in the analysis of faults in rotating equipment, several
sensors could be used to collect measurements of the vibrations in the x, y, and z axes.
Ultra high-speed data sampling would be applied to get the complex waveforms
involved, followed by Fourier analysis to extract the frequency components. Then
statistical reduction might be applied to isolate patterns of condition from which a
decision could be made as to whether the equipment was operating normally, or not. But
all of these calculations take time and computer power. An ANN, once trained, can reach
the decision state far more rapidly.
Another desirable feature of ANNs is that good models are not required to reach the
decision stage. In a typical operation in a chemical plant, the process model may be only
approximate and the critical measurements may all be correlated with each other and
include non-normally distributed noise. Thus, the assumptions underlying the usual
statistical analysis for faults are violated to some unknown extent. An ANN seems to be
able to internally map the functional relations that represent the process, filter out the
noise, and handle the correlations as well.
Petri nets are suitable for the description of discrete events or processes. An important
property of these nets is their capability to model and describe concurrent and
asynchronous processes. Petri nets are supported by a rich mathematical theory, enabling
one to simulate and analyze them. By this characteristic they can be used for processing
in computers and controllers as well as in tool automation.
Since the introduction of Petri nets in 1962, a range of net classes has been defined.
These classes are divided according to the different quantity of inherent information. In
Petri nets-based diagnosis, place/transition nets are used on the area level and
condition/event nets on the component level. These net classes are, on the one hand,
sufficient for modeling events and processes in machines or plants and, on the other
hand, they provide high performance during their processing in a computer or a PLC.
Ageing plant (nuclear, conventional power, chemical, offshore, etc.) life management,
life-cycle optimization and in-time safety assessment are the integral elements in safe
operation and maintenance practices. Life management relies on accurate condition
assessment and this can be achieved by integration of on-line monitoring and off-line
inspections. The essential role of the maintenance activities is to provide plant operators
with all the functions needed for safe and economical plant operation.
A systematic approach to life management has three steps:
• Data management and selection of critical and/or important areas.
• Life assessment in the critical/important areas or condition assessment.
• Control of life and life extension, if needed, including possible refurbishment or
upgrading.
During the first step, for example, the economical benefits of the process, the life-limiting
factors of components, fault history and safety factors affect the decision of criticality.
The second step includes conventional life assessment work. The third step is the actual
process of managing plant life by operation, maintenance, refurbishment, training and
cost control.
Tools are needed for organizing, analyzing and transferring knowledge between the
people involved. Tools must be closely related to the strategies of the company and they
must fit the tasks of the users. Therefore, the strategies and the operation models have to
be defined before the system definition.
The data to be used in the analyses has to be systematically gathered and saved.
Inspection results, maintenance history and fault history are all important when trying to
assess the life of a certain component. Data management is a necessity. Before doing
any life assessment or action plan, the critical components have to be defined. They are
the components that for some reason are suspected as critical and require further
analysis.
The criticality of a component can be defined with several methods. Theoretical
criticality consists of analysis of economical aspects, like the operational effects of a
damage in the component, the delivery time of a component, the statistical fault
frequency of the component and safety factors analysis. One way of determining
criticality is to find out the components with the lowest remaining life time based on
stress calculations and the international standards for life assessment. The effect of creep
can be calculated based on estimated (static) temperatures and pressures or actual
temperature and pressure history. The calculation gives as a result an estimated
remaining life in hours. However, the calculations are very conservative and therefore the
results cannot be taken as accurate facts. Instead, they offer quite a good picture of the
most critical parts of a critical installation, e.g. piping. The components with the lowest
remaining life time or the components whose usage factor exceeds a certain level are
gathered as critical. On the basis of calculations the first plans for more accurate methods
like non-destructive tests can be made. Operational history affects the life of
components. Also authorities' demands have to be taken into account when planning the
components to be checked. NDT test results and fault history give quite an accurate
estimate about the condition of a component. There are several kinds of standards,
material expert knowledge and company specific directions for determining the
criticality.
The planned life time of the utility controls all life management decisions. The decisions
made at the operational level have to follow the strategy of the company. A schedule of
reinspections and other maintenance operations is needed for planning, for instance, the
overhauls and budgeting. It is also needed for the authorities who must be confident of
the safe operation of the plant. The schedule gets tighter as the utility gets older.
Also, even if a plan for a longer period is needed, it changes over time according to the
incoming information.
There are several kinds of decisions concerning life management:
• Reinspection intervals and reinspection methods to be used have to be frequently
determined.
• It may be necessary to decide if a component should be repaired or changed
(investment costs have to be taken into account).
• If the strategy of the company is to extend the life of the plant, change of the
operation parameters of a critical component can be considered.
The following modules are examples of the domain knowledge that is needed for life
management:
• Components: the component classification (pipe, bend, weld, etc.) and the relations
between the components have to be described.
• Material: material properties, allowable maximum stresses and the development speed of a
certain kind of fault mechanism (creep, fatigue, etc.) in the material are all important
knowledge.
• Material tests: the inspection methods and knowledge of their usage, accuracy, cost,
etc. is needed for various reasons (for planning the reinspection methods and
intervals and interpreting the accuracy of the test results).
All the modules are represented as object classes and their instances. For example,
material standards have been stored in data bases and they can be easily looked at and
fetched to the problem solving space when needed.

A lot of work is needed in integrating different systems and in generating a systematic
approach for each plant. Life management is a complicated domain. The amount of
relations and effects between different input parameters is enormous and getting an
accurate, optimized schedule for the operation and maintenance activities is far from
simple.
In this book we have tried to analyse the aforementioned aspects by providing theoretical
results with appropriate industrial applications. We hope that in this way, the various
methods will be clarified and their merits and shortcomings understood.
CHAPTER 1

FAULT DETECTION AND DIAGNOSIS METHODS IN THE


ABSENCE OF PROCESS MODEL

1.1 Introduction

Malfunctions of industrial plant equipment and instrumentation increase the operating


costs of any plant. Even more serious are the consequences of a gross accident, because
of faulty plant operation. A fault may be defined as an abnormal change in the
characteristics of a system which gives rise to undesirable performance. The fault may be
an actual malfunction or perhaps only a change in an operating parameter related with
the structure of the system. Complete malfunction (failure) of equipment is usually
relatively easy to detect, but when failure has occurred, considerable damage may have
taken place. Therefore, it is desirable for an equipment monitoring system to be able to
identify faults of small extent or latent malfunctions, in order to predict a later significant
degradation or to locate failures which can very rapidly grow to catastrophic event.
In this chapter fault diagnosis techniques that do not necessitate any process model
knowledge are presented. In the first part of the chapter, a hypothesis testing formulation
of the fault detection process and some of the more recent statistical tools used in fault
decision making are presented. According to this formulation, the time record of one or
more process variables or parameters is observed, simple or more complicated statistics
of these variables are computed and robust statistical hypothesis tests are carried out to
detect faulty operation automatically.
Control charts, as the graphical means of monitoring the state of a dynamical system by
observing again the time record of one or more variables, are presented.
Control and monitoring of machine health by means of noise and vibration analysis
instrumentation, can also be of considerable help in the early diagnosis of potential
failures and are presented in the third part of this chapter. These methods are included in
this chapter because no process model is needed for their application, although significant
signal processing is required for their implementation. Illustrative examples from practical
fault diagnosis cases for all the above methods are presented.

1.2 Statistical aids for fault occurrence decision making

The diagnosis of the working state of an installation is a treatment lying between the
acquisition phase of the information and the control phase of the installation.
The treatment is necessary to have representative information about the process state in
order to act on the process operation. This representativeness implies that the information
must be sufficient (observability concepts), that it must be collected from judiciously
positioned sensors and that it has to be free from errors detrimental to the
interpretation of the phenomena represented by this information.
It is also a delicate treatment, because there are various types of information (logical,
analogical, deterministic, statistical, fuzzy, ...) and because they are relative to different
subsets (the process itself, the actuators, the regulation systems, the chains of meas-
urement acquisition, ...). This treatment comprises two aspects:
• The first concerns the detection and localization of failures that may affect the
process and also the instrumentation set-up, except the sensors. A "failure" may be an
alteration in the operation of a process element, such as sensor bias, actuator
locking or choking up, or a significant deviation of the process state variables from the
no-fault operation limits. Measurement noise is not considered a failure.
• The second aspect forms part of the data validation. It consists of detecting failures
in sensors and in correcting the doubtful measurement if the case arises. This aspect
will be dealt with in Chapter 2.
In this section some of the basic statistical tools used in failure decision making will be
described. The strategy proposed is based on exploitation of the information given by
sensors and detectors at local level. The results may be synthesized by a "knowledge
based system" proposing a diagnosis. Knowledge based diagnosis systems are examined
in detail in Chapter 4.

1.2.1 Tests on the statistical properties of process characteristic quantities

A. Univariate statistical aids


The statistical aids presented here can be applied to single output processes or to
multioutput processes with mutually statistically independent outputs.
A1. Limit checking


Plant measurements are compared to preset limits; exceeding a limit indicates a failure
situation. In many systems, there are two levels of limits: the first level serves for
prewarning only, while the second level triggers emergency action. The limits are usually
set such that a large enough distance to the appearance of damage is retained on the one
hand, while unnecessary false alarms are avoided on the other.
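A minimal sketch of such two-level limit checking in Python (all limit values are hypothetical):

    def check_limits(x, warn_lo, warn_hi, trip_lo, trip_hi):
        # "trip" corresponds to the emergency level, "warn" to the prewarning level
        if x < trip_lo or x > trip_hi:
            return "trip"
        if x < warn_lo or x > warn_hi:
            return "warn"
        return "ok"

    print(check_limits(108.0, warn_lo=90.0, warn_hi=105.0, trip_lo=85.0, trip_hi=110.0))  # prints "warn"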
A2. Constitution of observation windows.
A set of several successive measurements, called a window, is collected for every measured
process variable in order to make the following tests and treatments possible.
In a real time environment the window must be sliding. On this window, statistical
properties of the measured signal are studied. During normal operation, the tested
parameters develop in an uncertain way around a medium value, according to a partially
known law of distribution: the greater the number of samples, the greater the significance
of the obtained results.
The adjustment of the window's width is delicate and has to be realized by learning. A
narrow window permits the quick detection of failures while a wider window permits
the detection of failures of small amplitude, but its utilization is harmful as regards the
speed of detection. In order to obtain a satisfactory compromise between detection rapidity
and security, several windows of different width can be used together.
Moreover it is judicious, in order to guarantee the significance of the statistical tests, that
the minimum number of elements contained in the smallest window be at least equal to
30 (Kendall, 1982; Basseville, 1980).
Assume that one observes a piecewise constant signal disturbed by a white noise; in
other words, let y(k) be a sequence of independent gaussian variables with variance $\sigma^2$
and piecewise constant mean $\mu(k)$, where the jumps in the mean occur at unknown time
instants. The problem of the on-line detection of such jumps appears for example in the
automatic decoding of striped labels: the detection of the vertical stripes has to be done
during the horizontal scanning of the label. In an on-line framework, the basic problem to
be solved is the detection of a single jump, as quickly as possible, in order to allow the
detection of near successive jumps.
According to the previous paragraph, let $\gamma(k)$ be a white noise sequence with variance
$\sigma^2$ and let y(k) be the observations sequence such that:

$$y(k) = \mu(k) + \gamma(k) \qquad (1.1)$$

where,

$$\mu(k) = \begin{cases} \mu_0 & \text{if } k \le r-1 \quad (H_0) \\ \mu_1 & \text{if } k \ge r \quad (H_1) \end{cases} \qquad (1.2)$$
r being the fault occurrence time instant.
It is first assumed that the means $\mu_0$ and $\mu_1$ before and after the jump are known. Several
possible solutions in the (real) situation where $\mu_0$ is known (possibly via a recursive
parameter identification) and $\mu_1$ is unknown are indicated in Chapters 2 and 3.
General comments on hypothesis testing. To test any hypothesis on the basis of a
random sample of observations, the sample space $\Omega$ (i.e. all the possible sets of ob-
servations) is divided into two regions. If the observed point, say r, falls into one of these
regions, say $\omega$, the hypothesis is rejected in favour of an alternative hypothesis; if r falls
into the complementary region $\Omega-\omega$ the hypothesis is accepted. $\omega$ is known as the critical
region of the test and $\Omega-\omega$ is called the acceptance region.
When making statistical hypothesis tests, the possibility of erroneous inference exists.
This falls into two categories for the case where a null hypothesis (H₀) is tested against
an alternative hypothesis (H₁):
Type I: (H₀) is rejected when it is true.
Type II: (H₀) is accepted when it is false.
The probability of a type I error is equal to the size of the critical region used, termed the
significance level of the test and denoted by α. Thus,

$$P[r \in \omega \mid H_0] = \alpha$$

In the present context α will be defined as the probability, $P_f$, of a false alarm. Hence,

$$P[r \in \omega \mid H_0] = P_f \qquad (1.3)$$
The probability of a type II error is a function of the alternative hypothesis (H₁), termed
the operating characteristic (OC) of the test and denoted by β. Hence,

$$P[r \in \Omega-\omega \mid H_1] = \beta, \qquad P[r \in \omega \mid H_1] = 1-\beta$$

The complementary probability 1−β is called the power of the test and in the present
context it will be defined as the probability, $P_d$, of correct fault detection. Thus,

$$P[r \in \omega \mid H_1] = P_d$$
For a given $P_f$, solution of (1.3) will generally yield an infinity of subregions all obeying
(1.3). In this case $\omega$ is chosen so that $P_d$ is maximum. This is a fundamental principle in
statistical decision theory first expressed by J. Neyman and E. S. Pearson (Kendall, 1982).
A critical region whose power is no smaller than that of any other region of the same size
for testing a hypothesis H₀ against an alternative H₁ is called a best critical region
(BCR), and a test based on a BCR is called a most powerful (MP) test.
When testing a hypothesis (H₀) against a class of alternatives, i.e. a composite hypothesis
(for example, when testing for a zero mean against a non zero mean), a MP test could be
found for the different members of (H₁) (an infinity for the aforementioned example). If
there exists a BCR which is best for every member of (H₁) then this region is called
uniformly most powerful (UMP) and the test based on it a UMP test.
Fault diagnosis. Any kind of fault occurrence makes the standardized scalar residual
stochastic process $\gamma(k) = y(k) - \mu(k)$ depart from its zero mean, $\sigma^2$ variance and/or
whiteness properties. Therefore it is useful to perform the following four statistical tests:
a. Sample mean (parametric test).
The test statistic commonly used for testing,

$$(H_0): \gamma(k) = 0 \quad\text{against}\quad (H_1): \gamma(k) = \gamma_1(k) \neq 0; \; k = i,\dots,j$$

is the sample mean defined by:

$$\hat\gamma = \frac{1}{n}\sum_{k=i}^{j}\gamma(k) \qquad (1.4)$$

Under the null hypothesis, the sample mean is normally distributed with zero mean and
variance c/n, where c is the variance of the observations sequence (i.e. $\sigma^2$, or the
calculated sample variance if $\sigma^2$ is unknown) and n is the size of the sample.
The probabilities $P_f$ and $P_d$ are respectively given by:

$$P_f = P\left[\,|\hat\gamma| > \sqrt{c/n}\; z_{P_f/2}\right] \qquad (1.5)$$

and

$$P_d = 1 - \left\{\Phi\left[-\gamma_1(k)\sqrt{n/c} + z_{P_f/2}\right] - \Phi\left[-\gamma_1(k)\sqrt{n/c} - z_{P_f/2}\right]\right\} \qquad (1.6)$$

where $z_\alpha$ is defined by $P[z > z_\alpha] = \alpha$ and

$$\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-y^2/2}\,dy$$

$P_f$, $P_d$ and n are functionally related through the two equations defining $P_f$ and $P_d$; $P_d$ also
depends on the unknown $\gamma_1(k)$. Typical values for $P_f$ are 0.1 and 0.05, though this will of
course depend on the specific application requirements. Having fixed $P_f$, then $P_d$, n and
the critical region can be chosen using equations (1.5), (1.6).
The UCL and LCL values (defined in Section 1.2.2) are given by:

$$UCL = \sqrt{c/n}\;z_{P_f/2}, \qquad LCL = -\sqrt{c/n}\;z_{P_f/2}$$

The graph of $1-P_d$, called the operating characteristics (OC) curve, is shown in Appendix
1.A (figs. 1.A.1 and 1.A.2) for different values of the sample size n and
for $P_f$ = 0.01 and 0.05 respectively. As can be seen from the graphs, increasing the
sample size increases $P_d$, but at the expense of an increase in the detection delay time,
since by averaging a larger number of residuals the effect of a fault is smoothed out.
In the case of a sliding window the sample mean $\hat\gamma$ given by equation (1.4) can be
calculated iteratively, thus reducing the amount of computation in on-line operations.
Define the window sample mean using a new notation as follows:

$$\hat\gamma_{i,j} = \frac{1}{n}\sum_{k=i}^{j}\gamma(k)$$

Then, in the case of a sliding window,

$$\hat\gamma_{i+1,j+1} = \frac{1}{n}\sum_{k=i+1}^{j+1}\gamma(k) = \frac{1}{n}\left\{\sum_{k=i}^{j}\gamma(k) + \gamma(j+1) - \gamma(i)\right\} = \hat\gamma_{i,j} + \frac{\gamma(j+1)-\gamma(i)}{n}$$
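A sketch of the sample mean test with this recursive window update, assuming the null variance c is known (Python with NumPy/SciPy; function and variable names are ours):

    import numpy as np
    from scipy.stats import norm

    def mean_test(window, c, Pf=0.05):
        # Sample mean of the window compared with the control limits of equation (1.5)
        n = len(window)
        gbar = float(np.mean(window))
        ucl = np.sqrt(c / n) * norm.ppf(1.0 - Pf / 2.0)  # z_{Pf/2} is the upper Pf/2 point
        return gbar, not (-ucl < gbar < ucl)             # True means reject (H0)

    def slide_mean(gbar, g_new, g_old, n):
        # Recursive update of the window mean when the window slides by one sample
        return gbar + (g_new - g_old) / n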
If the residuals are correlated, the sample mean test may still be used but its control limits
have to be modified accordingly. Statistical tests for the mean in the presence of
correlated measurements do not appear to exist in the statistical literature. This means
that in such cases the robustness of the appropriate tests must be examined when the
assumption of independence is violated.
To calculate the effect on the sample mean control limits, consider the variance of the
residual sample mean, which is now calculated using the formula (Kendall, 1982):

$$\mathrm{var}[\hat\gamma] = \frac{1}{n^2}\left\{nc + 2(n-1)c\rho_1 + 2(n-2)c\rho_2 + \dots\right\}$$

where $\rho_i$ is the ith order correlation between the residuals and c is the variance of the
residuals under the null hypothesis. Then,
$$\mathrm{var}[\hat\gamma] = \frac{c}{n} + \frac{2c}{n^2}\sum_{k=1}^{n-1}(n-k)\rho_k \approx \frac{c}{n}\left\{1 + 2\sum_{k=1}^{\infty}\rho_k\right\}$$

If the process is autoregressive of order 1, then $\rho_k = \rho^k$. Hence, for such processes,

$$\sum_{k=1}^{\infty}\rho_k = \sum_{k=1}^{\infty}\rho^k = \frac{\rho}{1-\rho}$$

Consequently,
$$\mathrm{var}[\hat\gamma] \approx \frac{c}{n}\left\{1 + \frac{2\rho}{1-\rho}\right\} = \frac{c}{n}\cdot\frac{1+\rho}{1-\rho} \qquad (1.7)$$

This result implies that in the case of correlated measurements, the limits of the control
chart (see Section 1.2.2) for the sample mean have to be modified according to (1.7). If
the correlation is negative the limits have to be decreased, whereas if the correlation is
positive the limits have to be increased, since,

$$\frac{1+\rho}{1-\rho} \;\text{is}\; \begin{cases} <1 & \text{if } \rho<0 \\ >1 & \text{if } \rho>0 \end{cases}$$
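As a numerical illustration, for AR(1) residuals with ρ = 0.5 the factor (1+ρ)/(1−ρ) equals 3, so the variance of the sample mean triples and the control limits must be widened by √3 ≈ 1.73; for ρ = −0.5 the factor is 1/3 and the limits may be narrowed by the same amount.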
In the first stage of the fault monitoring process the correlation is not known; therefore, if
the occurred fault induces a large ρ, the mean test will give erroneous results.
b. Sign test (non-parametric test).
This is a non-parametric test used to test hypotheses on the value of the median of a
population. Since the residuals are normal under all hypotheses, the median is equal to the
mean and therefore this test can be applied to test for zero mean.
The sign test procedure is as follows: the number of positive residuals in a batch is
calculated and compared to two thresholds which depend on the sample size n and
significance level α. Thus if,
n₁ < (number of positive residuals) < n₂ : accept (H₀)
otherwise: (H₀) is rejected.
Table 1.A.1 of Appendix 1.A is a table of the percentage points of the symmetric
binomial distribution for different sample sizes and significance levels. It is shown in
Bennett and Franklin (1954) that it may be used for the sign test as follows:
i. Count the number of values above and below zero, say n⁺ and n⁻.
ii. Choose the smallest of the two values, say n⁺.
iii. Compare n⁺ with the table entry for the chosen n and α, say $n_\alpha$.
iv. If n⁺ < $n_\alpha$, reject (H₀); otherwise accept it.
The entries in Table 1.A.1 of Appendix 1.A may be modified to indicate percentage
points for the number of positive residuals. If for a sample size n the table entry is $n_\alpha$, it
follows that the number of positive residuals can vary from $n_\alpha$ to $n-n_\alpha$.
The thresholds n₁ and n₂ are then chosen as $n_1 = n_\alpha$ and $n_2 = n-n_\alpha$.
Hence n₁ and n₂ represent the LCL and UCL respectively for the sign test.
Dixon gave Tables 1.A.2 and 1.A.3 of Appendix 1.A, where values of $1-P_d$ for $P_f$
equal to 0.05 and 0.01 respectively are shown.
In the case of a sliding window of observations the number of positive residuals can also
be calculated iteratively. Let $n^+_{i,j}$ be the number of positive residuals in the residual
vector $\Gamma_{i,j} = [\gamma(i)\;\gamma(i+1)\;\dots\;\gamma(j)]^T$ and,

$$n_i = 1 \;\text{if}\; \gamma(i) > 0, \qquad n_i = 0 \;\text{if}\; \gamma(i) < 0$$

The best procedure for residual values that are equal to zero is to disregard them and
reduce the sample size by their number. This is also intuitively appealing since a zero
value contributes equally to both negative and positive values. Then,

$$n^+_{i+1,j+1} = n^+_{i,j} + n_{j+1} - n_i$$
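A sketch of the windowed sign test in Python/SciPy; here the thresholds are computed from the symmetric binomial distribution instead of being read from Table 1.A.1, which is an equivalent alternative (the discreteness of the distribution makes the actual significance level at most α):

    import numpy as np
    from scipy.stats import binom

    def sign_test(window, alpha=0.05):
        # Discard zero residuals, count the positives and compare with the binomial thresholds
        g = np.asarray(window, dtype=float)
        g = g[g != 0.0]
        n = len(g)
        n_pos = int(np.sum(g > 0))
        n1 = binom.ppf(alpha / 2.0, n, 0.5)        # lower threshold (LCL)
        n2 = binom.ppf(1.0 - alpha / 2.0, n, 0.5)  # upper threshold (UCL)
        return n_pos, not (n1 < n_pos < n2)        # True means reject (H0)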

The robustness of the sign test in the case of correlated residuals can be investigated
similarly to the case of the sample mean test. Let,

$$n_i = 1 \;\text{if}\; \gamma(i) > 0, \qquad n_i = -1 \;\text{if}\; \gamma(i) < 0$$

Then,

$$E[n_i] = 0, \qquad \mathrm{var}[n_i] = 1$$

The random variable $n_i$ may be associated with the positive and negative residuals.
Hence, if the $\gamma(i)$ are correlated,

$$\mathrm{cov}[n_i, n_{i+j}] = E[n_i n_{i+j}] = \frac{2}{\pi}\sin^{-1}\rho_j$$

where,

$$\rho_j = E[\gamma(i)\gamma(i+j)]$$
If $\rho_j = \rho^j$, then

$$E[n_i n_{i+j}] = \frac{2}{\pi}\sin^{-1}\rho^j \equiv \rho_j^{(s)}$$
The variance of the sign test statistic will be given by:

$$\mathrm{var}[n^+_{i,j}] = \left[\sigma_n^{(s)}\right]^2\left\{1 + 2\sum_{h=1}^{n-1}\rho_h^{(s)}\right\}$$

with $[\sigma_n^{(s)}]^2$ the variance of the statistic under independence. The sum is equal to,

$$\sum_{h=1}^{n-1}\frac{2}{\pi}\sin^{-1}\rho^h$$

Expanding the inverse sine in a Taylor series about zero,

$$\sin^{-1}\rho^h = \rho^h + O(\rho^{3h})$$

Hence, if orders higher than two are neglected,

$$\mathrm{var}[n^+_{i,j}] = \left[\sigma_n^{(s)}\right]^2\left\{1 + \frac{2\rho}{n(1-\rho)}\right\} = \left[\sigma_n^{(s)}\right]^2\frac{n+\rho(2-n)}{n(1-\rho)} \qquad (1.8)$$
The modifying factor in this case is appreciably less than the factor appearing in (1.7).
This suggests that the sign test will be more robust to departures from independence than
the corresponding sample mean test.
c. Testing for whiteness
Among the various tests proposed for testing independence, two tests for whiteness are
investigated here, one parametric and one non-parametric:
i. First order serial correlation, $r_1$.
ii. Rank correlation.
All of these tests are well documented in the statistical literature and a brief review of
each method is given here (Kendall, 1982; Anderson, 1958; Bennett and Franklin, 1954).
i) The first order serial correlation of a window of observations is defined by:

$$r_1 = \frac{n}{n-1}\cdot\frac{\sum_{m=i}^{j-1}(\gamma(m)-\hat\gamma_{i,j})(\gamma(m+1)-\hat\gamma_{i,j})}{\sum_{m=i}^{j}(\gamma(m)-\hat\gamma_{i,j})^2}$$

where $\hat\gamma_{i,j}$ is the sample mean. For small sample sizes (<20) more accurate forms may
be used (Kendall, 1982).
Under the null hypothesis of whiteness the random variable $r_1$ is distributed
asymptotically normally with mean $E(r_1) = -1/(n-1)$ and variance $\mathrm{var}(r_1) = (n-2)^2/(n-1)^3$.
Sampling experiments on serial correlation distributions suggest that the null case normal
theory remains approximately valid even for n = 10 or 20 (Kendall, 1982). Confidence
limits for hypothesis testing can be found using normal distribution theory. The
probabilities $P_f$ and $P_d$ are respectively given by:

$$P_f = P\left[\,|\tilde\rho_1| > z_{P_f/2}\right]$$

$$P_d = 1 - \left\{\Phi\left[-\tilde\rho_1 + z_{P_f/2}\right] - \Phi\left[-\tilde\rho_1 - z_{P_f/2}\right]\right\}$$

where $z_\alpha$ is defined by

$$P[z > z_\alpha] = \frac{1}{\sqrt{2\pi}}\int_{z_\alpha}^{\infty} e^{-0.5z^2}\,dz = \alpha$$

and $\tilde\rho_1$ is the standardized normal variable $(r_1 - E(r_1))/\mathrm{var}(r_1)^{1/2}$.
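A sketch of this test in Python/NumPy, using the null mean and variance of $r_1$ quoted above:

    import numpy as np
    from scipy.stats import norm

    def serial_corr_test(window, Pf=0.05):
        g = np.asarray(window, dtype=float)
        n = len(g)
        d = g - g.mean()
        r1 = (n / (n - 1)) * np.sum(d[:-1] * d[1:]) / np.sum(d * d)
        # Standardize with the null mean -1/(n-1) and variance (n-2)^2/(n-1)^3
        rho = (r1 + 1.0 / (n - 1)) / np.sqrt((n - 2) ** 2 / (n - 1) ** 3)
        return r1, abs(rho) > norm.ppf(1.0 - Pf / 2.0)  # True means reject whiteness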
ii) The rank correlation coefficient is the non-parametric equivalent of the
standard correlation coefficient of two sets of variables. The usual procedure for the
calculation of the rank correlation coefficient for a set of values $x_1,y_1;\,x_2,y_2;\,\dots;\,x_n,y_n$ is
to replace each $x_i$, $y_i$ by their rankings $x_i'$ and $y_i'$ among the x's and y's respectively, and
calculate:

$$r' = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}$$

where $d_i = x_i' - y_i'$.
If serial independence is to be tested, the set $\{x_i'\}$ is replaced by the set $\{i\}$, while the
set $\{y_i\}$ represents the population values (Kendall, 1982). Therefore, to test for
whiteness of the residual sequence, calculate,

$$r' = 1 - \frac{6\sum_{m=i}^{j}\{m-i+1-\gamma'(m)\}^2}{n(n^2-1)}$$

where $\gamma'(m)$ is the rank of $\gamma(m)$ among the $\gamma(i)$'s. The calculated value of $r'$ is then
compared to its LCL and UCL values which are found from Table 1.A.4 of Appendix
1.A, for different $P_f$.
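In practice the same statistic can be obtained by rank-correlating the residuals with the time index; a sketch using SciPy's Spearman coefficient (the tables of the Appendix then supply the exact small-sample limits):

    import numpy as np
    from scipy.stats import spearmanr

    def rank_whiteness(window):
        g = np.asarray(window, dtype=float)
        idx = np.arange(1, len(g) + 1)       # the set {i} replacing the x-ranks
        r_prime, p_value = spearmanr(idx, g)
        return r_prime, p_value              # compare r_prime with the Table 1.A.4 limits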
d. Testing for variance.
The residual population variance is calculated from the sample by the formula,

$$s^2 = \frac{1}{n}\sum_{m=i}^{j}(\gamma(m)-\hat\gamma)^2$$

Confidence limits for testing,

$$(H_0): c = \sigma^2 \quad\text{against}\quad (H_1): c \neq \sigma^2;\; c = \sigma_1^2$$

are found using the fact that the quantity,

$$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2} \qquad (1.9)$$

is distributed $\chi^2$ with (n−1) degrees of freedom. It then follows that the relation,

$$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < c < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$$

will have a probability of (1−α) of being correct (Bennett and Franklin, 1954).
Equivalently,

$$\frac{c\,\chi^2_{n-1,1-\alpha/2}}{n-1} < s^2 < \frac{c\,\chi^2_{n-1,\alpha/2}}{n-1}$$

represent the confidence limits on $s^2$ with a probability of error α. Hence, the UCL and
LCL are given by,

$$UCL = \frac{\chi^2_{n-1,\alpha/2}\;c}{n-1} \qquad (1.10)$$

$$LCL = \frac{\chi^2_{n-1,1-\alpha/2}\;c}{n-1} \qquad (1.11)$$
The power of the test is given by,

$$\pi(\sigma_1) = P\left[\chi^2_{n-1} < \frac{\sigma^2}{\sigma_1^2}\,\chi^2_{n-1,\alpha/2}\right] + P\left[\chi^2_{n-1} > \frac{\sigma^2}{\sigma_1^2}\,\chi^2_{n-1,1-\alpha/2}\right] \qquad (1.12)$$

where $\chi^2_{n-1}$ denotes a chi-squared variable with n−1 degrees of freedom.
Figure 1.A.3 of Appendix 1.A shows some power curves for $P_f = 0.05$ and for n = 3, 10, 30.
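A sketch of the variance test with the control limits (1.10) and (1.11), in Python/SciPy (here chi2.ppf(1−α/2, n−1) is the upper α/2 point denoted $\chi^2_{n-1,\alpha/2}$ above):

    import numpy as np
    from scipy.stats import chi2

    def variance_test(window, c, alpha=0.05):
        # c is the null hypothesis residual variance sigma^2
        g = np.asarray(window, dtype=float)
        n = len(g)
        s2 = np.sum((g - g.mean()) ** 2) / n
        ucl = chi2.ppf(1.0 - alpha / 2.0, n - 1) * c / (n - 1)  # equation (1.10)
        lcl = chi2.ppf(alpha / 2.0, n - 1) * c / (n - 1)        # equation (1.11)
        return s2, not (lcl < s2 < ucl)                         # True means reject (H0)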
For real time applications, the variance and the first order serial correlation, as indeed
correlations of higher order, can be calculated iteratively. The equations describing the
evolution of the correlations are developed by Pouliezos (1980). These are:

$$c_0^j = c_0^{j-1} + \frac{1}{n}\left\{\gamma^2(j)-\gamma^2(i-1)\right\} - a_j\left(a_j + 2\hat\gamma_{i-1,j-1}\right)$$
$$\hat\gamma_{i,j} = \hat\gamma_{i-1,j-1} + a_j$$
$$a_j = \frac{1}{n}\left\{\gamma(j)-\gamma(i-1)\right\}$$

and $c_m^j$ can be calculated from $c_m^{j-1}$ using a similar recursion, where,

$$k_1 = j-n+m+2$$
$$q_m^j = q_m^{j-1} - \gamma(j-n-m)$$
$$p_m^j = p_m^{j-1} - \gamma(j-n+1)$$
$$p_m^k = q_m^k = n\,\hat\gamma_{i,j}$$

and $c_m^j$ denotes the sample serial correlation of lag m calculated from the residual
sample $\Gamma_{i,j}$. Specifically, the variance $s^2 = c_0^j$ and the first order serial correlation
$r_1 = c_1^j$ can be calculated iteratively using the above formulae.
Both the tests of mean and variance assume that the residual sequence is white (Kendall,
1982; Mehra, 1971). Therefore, it is important to test the residual sequence for
whiteness first, especially using tests which are invariant with respect to the mean and
variance of the distribution, such as those presented in paragraph (c) above.
A3. Test of steady state
The aim of this test is to determine whether the examined variables are in a static or
dynamic state, so that appropriate treatments can be applied afterwards. When the
distribution law of the measurements is postulated to be normal, the test of the mean square
of the successive differences is utilized (Commissariat à l'Energie Atomique, 1978).
Suppose the measurements $x_i$ represent a gaussian sequence; then the variable r,
$$r = \frac{\sum_{i=1}^{n-1}(x_{i+1}-x_i)^2}{2\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

where $\bar{x}$ is the mean of the sequence and n the window's length, has a probable value
equal to 1. If n is bigger than 25, it can be assumed that,

$$u = (1-r)\sqrt{(n-1)(n+1)/2}$$

follows a zero-mean standardized gaussian law. The decision rule used is:
u > ℓ : dynamic state case
u ≤ ℓ : static state case
The confidence limits for hypothesis testing can be found in a similar way to that used for
the first order serial correlation test $r_1$ (see paragraph (c) above).
When the distribution law is not normal, the test can be substituted by the serial
correlation test.
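A sketch of the steady state test as reconstructed above (Python/NumPy; the exact standardization constant should be checked against the original CEA reference, and the threshold ℓ is a design choice, e.g. 1.96 for a 5% level):

    import numpy as np

    def steady_state_test(x, limit=1.96):
        x = np.asarray(x, dtype=float)
        n = len(x)
        # r has probable value 1 for an uncorrelated (static) gaussian sequence
        r = np.sum(np.diff(x) ** 2) / (2.0 * np.sum((x - x.mean()) ** 2))
        u = (1.0 - r) * np.sqrt((n - 1) * (n + 1) / 2.0)
        return u, ("dynamic" if u > limit else "static")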
A4. Drift test
This test is aimed at detecting slow variations. It is based on the exploitation of the
results obtained by the previous test, utilized with the smallest and the largest window.
The small window is considered first; if the steady state test utilizing the small window
detects a steady state for a duration equal to the large window and if, for the same
duration, the steady state test utilizing the large window detects a dynamic state, then a
drift is present.
A5. Robust univariate signal detector
In the previous paragraphs the case in which the signal was disturbed by a white noise
was considered. However, in many situations one may only have inexact knowledge of
the underlying univariate noise distribution. In these situations, it is desirable to employ a
robust detection scheme which offers some degree of protection against deviations from
the "assumed" noise distribution.
The discrete-time detection problem under consideration reduces to a hypothesis test of
the following form:

$$H_0: Y_i = N_i$$
$$H_1: Y_i = N_i + s$$

for i = 1, ..., n, where n denotes the number of samples, the positive signal s is assumed to
be known and the $N_i$ are i.i.d. random variables. Based on the realizations $\{y_i\}_{i=1}^n$ of the
random variables $\{Y_i\}_{i=1}^n$, the detector attempts to decide between hypotheses $H_0$ and $H_1$.
In practical situations one would not likely know the precise distribution of the noise.
Accordingly, one assumes that the noise distribution lies within an appropriately
defined neighbourhood of the nominal Laplace distribution and employs a robust test. The
nominal noise has a probability density function (pdf) given by,

$$p_0(z) = \frac{r}{2}\,e^{-r|z|}$$

where r is a positive parameter. The Laplace noise model exhibits the "heavy tail"
behavior which typically characterizes impulsive noise, and it appears in many
engineering investigations. Examples where the Laplace pdf is used as a noise model
include undersea, atmospheric and speech processing applications.
Let B(R) denote the σ-algebra of Borel sets on the real line R, and let M denote the
class of all probability measures on (R, B(R)). Let $P_0$ and $P_1$ denote the nominal probability
measures on (R, B(R)). The admissible noise families under $H_0$ and $H_1$ are given by $\mathcal{P}_0$
and $\mathcal{P}_1$, respectively:

$$\mathcal{P}_0 = \{Q \in M: Q((-\infty,z)) \ge (1-\varepsilon_0)P_0((-\infty,z)) - \delta_0 \;\text{for all}\; z \in R\}$$

and,

$$\mathcal{P}_1 = \{Q \in M: Q((z,\infty)) \ge (1-\varepsilon_1)P_1((z,\infty)) - \delta_1 \;\text{for all}\; z \in R\}$$

where the non negative numbers $\varepsilon_0$, $\varepsilon_1$, $\delta_0$ and $\delta_1$ are sufficiently small to insure the
disjointness of $\mathcal{P}_0$ and $\mathcal{P}_1$. The designer must consider the definitions of the
aforementioned admissible noise families when assigning values to these parameters, as
this will determine the breadth of $\mathcal{P}_0$ and/or $\mathcal{P}_1$. Smaller choices of the ε and δ parameters
lead to a correspondingly smaller class of admissible distributions. It should be noted that
several popularly used noise classes such as ε-contamination, total variation,
Kolmogorov distance, Levy distance, and Prohorov distance are subsets of $\mathcal{P}_0$ and/or $\mathcal{P}_1$
(see Thompson, 1991).
Loosely speaking, the saddlepoint approach is one which takes the "least favorable"
distribution from the class of admissible distributions and then specifies a detector to
maximize performance for this "least favorable" distribution. Without specifying all of the
details, the robust detector, which is the solution of a specific saddlepoint criterion, has
the canonical detector structure of a nonlinearity, followed by an accumulator, followed
by a threshold comparator. The difference between the Neyman-Pearson optimal
detector for Laplace noise and the robust detector for Laplace noise lies in the
nonlinearities. Both the Neyman-Pearson ($g_{NP}$) and robust ($g_k$) detector nonlinearities are
illustrated in the figure below.

[Figure omitted: The Neyman-Pearson and robust nonlinearities. The robust nonlinearity is obtained from the Neyman-Pearson nonlinearity by censoring at prescribed vertical heights.]

The method for determining the censoring height can be found in Thompson (1991) and,
because of space considerations, cannot be included here. Note that by putting k = rs, the
Neyman-Pearson nonlinearity is obtained from the robust nonlinearity. It follows that the
test statistic for the robust detector is given by,

$$T_k = \sum_{i=1}^{n} g_k(Y_i)$$

The distribution function of the test statistic $T_k$ is defined and given in closed form by
Thompson (1991).
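A sketch of the canonical robust detector structure for this Laplace problem (Python/NumPy); the censoring height k and the decision threshold t are design parameters obtained as in Thompson (1991) and are treated as given numbers here:

    import numpy as np

    def g_robust(y, r, s, k):
        # Log-likelihood ratio nonlinearity for Laplace noise, censored at +/- k;
        # setting k = r*s recovers the uncensored Neyman-Pearson nonlinearity
        g_np = r * (np.abs(y) - np.abs(y - s))
        return np.clip(g_np, -k, k)

    def robust_detect(y, r, s, k, t):
        # Nonlinearity, accumulator and threshold comparator
        Tk = float(np.sum(g_robust(np.asarray(y, dtype=float), r, s, k)))
        return Tk, Tk > t  # True means decide H1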

For relatively small sample sizes, the evaluation of this detector revealed that a significant
loss in detection probability can occur when an overly cautious approach toward
robustness is taken.
For larger sample sizes, the closed form test statistic distribution function proposed by
Thompson (1991) is computationally difficult to use. For this reason, simulation
techniques can be used for larger sample sizes. A modified Monte Carlo simulation
technique known as improved importance sampling has been suggested by Thompson
and has been shown to significantly reduce the number of simulation runs required to
estimate smaller false alarm probabilities.

B. Multivariate statistical aids


The statistical aids presented in this section can be applied to multioutput
processes with statistically dependent outputs. An attempt has been made
to give the limiting distribution under the null hypothesis (H₀) in order to be able to
calculate $P_f$ for these cases.
B1. Tests of mean
These tests check whether the observed innovation sequence is zero mean or not. Let
$X_j = (X_{1j} \dots X_{pj})^T$; j = 1, ..., n, denote a sample from a p-dimensional, absolutely
continuous residual population with a p-vector mean θ. One wishes to test (H₀): θ = 0
versus (H₁): θ ≠ 0. Here 0 is used without loss of generality, since (H₀): θ = θ₀ can be
tested by subtracting θ₀ from each observation vector and testing whether these
differences are located at 0.
The classical procedure for this problem is Hotelling's T² (Anderson, 1958; Randles,
1989), which assumes that the underlying population is p-variate normal, that is, an
$N_p(\theta,\Sigma)$ distribution with mean vector θ and variance-covariance matrix Σ (p×p). It rejects
(H₀) in favour of (H₁) if,

$$T^2 = n\bar{X}^T S^{-1}\bar{X} \ge (n-1)p\,F_\alpha(p,n-p)/(n-p)$$

where $\bar{X}$ is the sample mean vector,


$$S = \frac{1}{n-1}\sum_{j=1}^{n}(X_j-\bar{X})(X_j-\bar{X})^T$$

is the unbiased estimator of the variance-covariance matrix and $F_\alpha(n_1,n_2)$ is the upper
αth quantile of an F distribution with $n_1$ and $n_2$ degrees of freedom. This test is quite
effective and has many nice properties, including the intuitive property that it is invariant
under all nonsingular linear transformations of the data. That is, if $Y_j = DX_j$; j = 1, ..., n,
where D is any nonsingular p×p matrix, then $T^2(Y_1 \dots Y_n) = T^2(X_1 \dots X_n)$.

Thus, if the data points are rotated, or if they are reflected around a (p−1)-dimensional
hyperplane, or if the scales of measurement are altered, the value of T² stays the same.
This property is intuitively appealing, and it also ensures that the performance of T² is
the same for any variance-covariance matrix Σ.
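A sketch of the T² test in Python/SciPy (p ≥ 2 is assumed so that S is a matrix):

    import numpy as np
    from scipy.stats import f

    def hotelling_t2(X, alpha=0.05):
        # X is an n x p matrix of residual vectors; tests (H0): theta = 0
        n, p = X.shape
        xbar = X.mean(axis=0)
        S = np.cov(X, rowvar=False)               # unbiased covariance estimate
        T2 = n * xbar @ np.linalg.solve(S, xbar)
        crit = (n - 1) * p * f.ppf(1.0 - alpha, p, n - p) / (n - p)
        return T2, T2 > crit                      # True means reject (H0)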
Many nonparametric competitors to Hotelling's T² have been proposed which concentrate on sign statistics, that is, ones that
use the direction of the observations from 0 rather than the distances from 0. The most
popular such statistic is the component sign test, which uses a sign statistic for each
component of the vectors and combines them in a quadratic form. Let
$S^T = (S_1 \dots S_p)$, where,
$$S_i = \sum_{j=1}^{n}\mathrm{sgn}(X_{ij}), \qquad \mathrm{sgn}(t) = 1, (0, -1) \;\text{as}\; t > (=, <)\,0$$

The test rejects (H₀) for large values of,

$$S_n^* = S^T(nW)^{-1}S, \qquad \text{where}\; W_{ii'} = n^{-1}\sum_{j=1}^{n}\mathrm{sgn}(X_{ij})\,\mathrm{sgn}(X_{i'j})$$

for 1 ≤ i and i' ≤ p. The limiting distribution of $S_n^*$ under H₀ is $\chi_p^2$.


The statistic Sn· is not invariant under nonsingular linear transformations, and it does
not have a small-sample distribution-free property unless a null distribution is generated
Fault detection and diagnosis methods in the absence ofprocess model 17

by conditioning on the observed vectors, giving equal probability to each data vector
being the observed one or a point on the opposite side of o. The performance properties
of S: vary depending on X and the direction of shift from o. It has been demonstrated by
Randles, (1989), that it may not perform weIl when there are substantial correlations
among variates in the vectors. In an etfort to stabilize the performance, this test can be
performed on transformed data. This creates invariant (or asymptotically invariant)
procedures, that for any X, have significance levels and power comparable with those of
S: when X= I, (Randles 1989).
Distribution-free tests are also investigated by Randles (1989) for the one-sample
multivariate location problem. Counts, called interdirections, which measure the angular
distance between two observation vectors relative to the positions of the other
observations, are introduced there. These counts are invariant under nonsingular linear
transformations and have a small-sample distribution-free property over a broad class of
population models, called distributions with elliptical directions, which includes all
elliptically symmetric populations and many skewed populations. A sign test based on
interdirections is described, including, as special cases, the two-sided univariate sign test
and Blumen's bivariate sign test (Blumen, 1958). The statistic is shown to have a limiting
$\chi_p^2$ null distribution and, because it is based on interdirections, it is also seen to be
invariant and to have a small-sample distribution-free property. Pitman asymptotic
relative efficiencies and a Monte Carlo study have shown the test to perform well
compared with Hotelling's T², particularly when the underlying population is heavy-
tailed or skewed. In addition, it consistently outperforms the component sign test, which
is often recommended in the nonparametric literature (Anderson, 1958; MacNeill,
1974; Randles, 1989).
B2. Test of covariance
The unbiased covariance of the p-dimensional residual population sequence is estimated
as,

$$\hat{S}_0 = \frac{1}{n-1}\sum_{j=1}^{n}(X_j-\bar{X})(X_j-\bar{X})^T$$

Under the null hypothesis, $\hat{S}_0$ has a Wishart distribution (Kendall, 1982; Anderson, 1958).
The trace of $\hat{S}_0$ has a chi-square distribution with (n−1)p degrees of freedom. Thus $\hat{S}_0$
can be tested for its null hypothesis covariance, equal to an identity matrix for the case of
the standardized residual sequence.
Here also, the parametric tests of mean and covariance assume that the residual sequence
is white.
B3. Test for autocorrelation (whiteness)

Several tests exist for autocorrelation and randomness in multiple time series
(Chitturi, 1976; MacNeill, 1974; Liggett, 1977; Ali, 1989; Robert, 1985). The
exact distributions of these statistics are generally unknown and the asymptotic
distributions that are known do not provide adequate approximation to the exact ones in
small samples. The test statistics were modified by Ali (1989). Asymptotically, these
modified statistics are equivalent to their original counterparts; however, it is found that
the asymptotic distributions of these statistics provide adequate approximation to the
exact ones in relatively small samples and possibly when the time series are nonnormal.
The adequacy of the approximations is examined there by simulation experiments. The
original test statistics are based on sample lag cross-covariances, autocovariances, cross-
correlations, and autocorrelations standardized by their asymptotic means and
covariances. The modified statistics are obtained when the asymptotic means and
covariances in the standardization are replaced by the exact means and covariances. The
expressions for these moments are derived on the assumption that the time series is
Gaussian. These moments (both asymptotic and exact) involve nuisance parameters. In
constructing the test statistics, these nuisance parameters are replaced by their sample
counterparts, which are consistent estimates of the parameters.
Let $\{Y_t\}$ be an $m$-variate stationary stochastic process with $E\{Y_t\} = \mu$ and,

$$\mathrm{cov}\{Y_t, Y_{t+k}\} = E\{(Y_t - \mu)(Y_{t+k} - \mu)^T\} = \Gamma_k = (\gamma_k(i,j))$$

$$\rho_k = (\rho_k(i,j)); \quad \rho_k(i,j) = \gamma_k(i,j) / \sqrt{\gamma_0(i,i)\,\gamma_0(j,j)}; \quad k = 0, 1, 2, \ldots$$

Let $Y_t = [Y_{1t} \; \ldots \; Y_{mt}]^T$; $t = 1, 2, \ldots, n$, be the sampled time series, $\bar{Y}^T = [\bar{Y}_1 \; \ldots \; \bar{Y}_m]$,

where $\bar{Y}_i = \sum_{t=1}^{n} (Y_{it}/n)$; $i = 1, 2, \ldots, m$, and $e_{it} = Y_{it} - \bar{Y}_i$; $i = 1, 2, \ldots, m$; $t = 1, 2, \ldots, n$. Then
for $k = 0, 1, 2, \ldots, n-1$,

$$c_k(i,j) = \sum_{t=1}^{n-k} (e_{it}\, e_{j(t+k)}) / n$$

and,

$$r_k(i,j) = c_k(i,j) / \sqrt{c_0(i,i)\, c_0(j,j)}; \quad k = 1, 2, \ldots, (n-1); \quad i,j = 1, 2, \ldots, m.$$

A test for lag $k$; $k = 1, 2, \ldots$, autocorrelation and cross-correlation, $\rho_k(i,j)$, can be based
on the statistic $r_k(i,j)$. The null hypothesis of this test is that $Y_t$; $t = 1, 2, \ldots, n$, is a random
sample. The alternative is that $\rho_k(i,j) \neq 0$.

It may be desirable to test simultaneously for all autocorrelations and cross-correlations at
a specific lag $k$; $k = 1, 2, \ldots$ A suggested portmanteau statistic to test such a hypothesis is,

$$Q_k = n \sum_{i,j} r_k^2(i,j)$$
An analogous test based on the following statistic is suggested by Ali (1989),

$$QS_k = n\, (\mathrm{vec}\, C_k)^T (C_0^{-1} \otimes C_0^{-1}) (\mathrm{vec}\, C_k)$$

where $(\mathrm{vec}\, A)$ is the vector obtained by stacking the columns of the matrix $A$ and $\otimes$ is
the standard direct (Kronecker) product of matrices. The statistic $QS_k$ can be obtained from the
statistic proposed by Chitturi, (1976), by defining,

$$R_k = C_k C_0^{-1}$$

Under the null hypothesis of randomness, following Chitturi (1976), the $r_k(i,j)$ are
asymptotically normal and both $Q_k$ and $QS_k$ are distributed as chi-squared variables with $m^2$
degrees of freedom. Unfortunately, the distribution of these statistics, $r_k(i,j)$, $Q_k$ and
$QS_k$, is unknown in small samples.
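The following sketch computes $r_k(i,j)$, $Q_k$ and $QS_k$ at a given lag for an (n, m) multiple time series; names are illustrative, and the p-value uses the asymptotic $\chi^2_{m^2}$ approximation discussed above rather than Ali's small-sample corrections.

```python
import numpy as np
from scipy import stats

def lag_k_portmanteau(Y, k):
    """Q_k and QS_k portmanteau statistics at lag k for an (n, m) series Y."""
    n, m = Y.shape
    E = Y - Y.mean(axis=0)                  # residuals e_it = Y_it - Ybar_i
    C0 = (E.T @ E) / n                      # c_0(i, j)
    Ck = (E[:n - k].T @ E[k:]) / n          # c_k(i, j) = sum_t e_it e_j(t+k) / n
    d0 = np.diag(C0)
    Rk = Ck / np.sqrt(np.outer(d0, d0))     # r_k(i, j)
    Qk = n * np.sum(Rk ** 2)
    C0inv = np.linalg.inv(C0)
    v = Ck.flatten(order='F')               # vec(C_k), columns stacked
    QSk = n * v @ np.kron(C0inv, C0inv) @ v
    p_value = stats.chi2.sf(Qk, df=m * m)   # asymptotic chi-squared(m^2)
    return Qk, QSk, p_value
```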
There is a long history in the investigation of the distribution of the sample lag
autocorrelation, $r_k(i,i)$ (Kendall, (1982), Anderson, (1971)). Except for some partial
successes, there has been no practical solution to this problem. Only recently, Ali (1989),
has observed that for relatively small sample sizes, the null distribution of $r_k(i,i)$ can be
well approximated by a normal distribution matching the first two exact moments. Thus,
it is suggested that the null distribution of $r_k(i,j)$, $i \neq j$, be approximated by a normal
distribution matching the first two moments. Alternatively, the statistic $r_k(i,j)$ is
modified to,

$$\tilde{r}_k(i,j) = (r_k(i,j) - E\{r_k(i,j)\}) / \sqrt{\mathrm{var}\{r_k(i,j)\}}$$

and the distribution of this modified statistic is approximated by its asymptotic
distribution, which is normal. Both $E\{r_k(i,j)\}$ and $\mathrm{var}\{r_k(i,j)\}$ involve nuisance
parameters $\rho_0(i,j)$. These are replaced by their consistent estimates $r_0(i,j)$. Ali, (1989),
gave the expressions for the first two exact moments of $r_k(i,j)$, $i \neq j$, assuming $Y_t$ to be
normally distributed.
The statistic $QS_k$ can alternatively be written as,

$$QS_k = (\mathrm{vec}\, C_k - E_k)^T A_k^{-1} (\mathrm{vec}\, C_k - E_k)$$

where $E_k$ is the asymptotic mean vector, which is a vector of zeros under the null
hypothesis of randomness, and $A_k = (C_0 \otimes C_0)/n$ is the estimated asymptotic covariance
matrix for $(\mathrm{vec}\, C_k)$. Given the experience with approximating the distribution of $r_k(i,j)$ using
its exact moments (Ali, 1989), it is suggested that the statistic $QS_k$ be modified to,

$$QA_k = (\mathrm{vec}\, C_k - \hat{E}\{\mathrm{vec}\, C_k\})^T \hat{B}_k^{-1} (\mathrm{vec}\, C_k - \hat{E}\{\mathrm{vec}\, C_k\})$$

where $\hat{E}(\mathrm{vec}\, C_k)$ is the estimated exact first moment of $(\mathrm{vec}\, C_k)$, i.e. $E(\mathrm{vec}\, C_k)$, and $\hat{B}_k$ is
the estimated exact covariance matrix for $(\mathrm{vec}\, C_k)$, i.e. $B_k$. Both $E(\mathrm{vec}\, C_k)$ and $B_k$ involve
nuisance parameters $\rho_0(i,j)$, which are replaced by their consistent estimates $r_0(i,j)$ to
obtain $\hat{E}(\mathrm{vec}\, C_k)$ and $\hat{B}_k$. The exact mean vector and covariance matrix for $(\mathrm{vec}\, C_k)$
are derived by Ali (1989), assuming $Y_t$ to be normally distributed. $QA_k$ can be shown to
be asymptotically equivalent to $QS_k$. The null distribution of $QA_k$ will be approximated
by a chi-squared distribution with $m^2$ degrees of freedom.
The statistic $QA_k$ is a modification of $QS_k$ in that the asymptotic mean and covariance
matrix of $(\mathrm{vec}\, C_k)$ in $QS_k$ are replaced by their exact counterparts. Thus $QA_k$ is the mean-
and covariance-corrected $QS_k$. Similarly, one could modify $QS_k$ by correcting only the
mean of $(\mathrm{vec}\, C_k)$, or only the covariance matrix of $(\mathrm{vec}\, C_k)$. These modified statistics are,
respectively,

$$QAM_k = (\mathrm{vec}\, C_k - \hat{E}\{\mathrm{vec}\, C_k\})^T A_k^{-1} (\mathrm{vec}\, C_k - \hat{E}\{\mathrm{vec}\, C_k\})$$

and,

$$QAV_k = (\mathrm{vec}\, C_k)^T \hat{B}_k^{-1} (\mathrm{vec}\, C_k)$$

One may also apply a correction to $QAM_k$ to obtain another statistic,

$$QAMH_k = (n / (n - k))\, QAM_k$$

Each of the statistics $QAM_k$, $QAV_k$, and $QAMH_k$ is asymptotically distributed as a chi-
squared variable with $m^2$ degrees of freedom. It is expected that this asymptotic
distribution will provide an adequate approximation to the null distributions of these
statistics in small samples.
The statistic $QS_k$ may be modified to $QH_k = (n/(n-k))\, QS_k$ and to $QM_k = QS_k + m^2/n$.
The adequacy of approximating the null distributions of $QH_k$ and $QM_k$ by a chi-squared
distribution with $m^2$ degrees of freedom was also examined by Ali (1989).
In all, the statistics,

$$\tilde{r}_k(i,j), \; Q_k, \; QA_k, \; QAM_k, \; QAV_k, \; QAMH_k, \; QS_k, \; QH_k \text{ and } QM_k$$

may be considered. The null distribution of $\tilde{r}_k(i,j)$ is approximated by the standard
normal distribution, and the null distribution of each of the remaining eight statistics is
approximated by a chi-squared distribution with $m^2$ degrees of freedom. These eight
statistics are referred to as Q statistics.

1.2.1.1 Limit checking fault monitoring in electrical drives

Several devices have been developed for fault detection in industrial drives, performing
limit checking tests.

Limiting states for the healthy or faulty operation of a Switched Reluctance Motor
(SRM) drive are given in the table below:

Performance effects of staged faults on a 4-phase switched reluctance motor

Fault Condition                            Relative Output   Vibration Increase?   Overcurrent?
Normal operation                           1.0               —                     —
Phase disconnected and open circuited      3/4               No                    No
Phase disconnected and short circuited     3/4               No                    No
Phase with shorted pole                    7/8               Yes                   No
Phase with midpoint shorted to ground      >7/8              Yes                   Severe
Phase-to-phase short                       1.0               Yes                   Yes

One obvious fault detection device is a simple overcurrent detector operating from the
current sensor signal, setting a comparator having a threshold above the normal
operating range of the phase currents. This detector is easy to implement but is not fast
acting, since a fault indication is not set until the current is already very high. Since the
detector operates from the current sensor, it will also not detect all kinds of faults. The
detectors must be able to operate quickly enough to interrupt a fault in progress before
damage to the inverter power switches occurs.
Differential detectors are devices utilizing comparators to provide a logic signal
indication of a winding fault and generally they do not exhibit the previously mentioned
drawbacks. Current differential, flux differential and rate-of-rise detectors are the most
utilized limit checking devices. Stephens, (1991), describes an SRM drive with an
implemented limit checking fault detection and management system which should be an
ideal selection for reliability-premium drive systems in aerospace, industrial and
automotive applications.
Spee and Wallace, (1990), propose failure diagnostics and remedial operating strategies
for brushless dc-drives, based on correlations between the predictions of a simulation
program and test measurements. Comparison of normal operation with the performance
that occurs at the onset of faults has been shown to be capable of predicting post-fault
performance to a good level of confidence. The reader is referred to the latter paper for
details.

1.2.1.2 Steady-state and drift testing in a grinding-classification circuit

Description of the circuit. Fig. 1.1 gives an outline of a process of grinding-
classification currently used in the field of mineral industry for the preparation of the
minerals. The role of this circuit is to reduce a mineral to a size small enough to free the
useful mineral from its gangue (Cecchin, 1985).

Figure 1.1 Grinding-classification circuit.

The studied circuit is composed of:
• A ball-mill (4), fed with ore (stream 1) and with water (stream 2).
• A recovery sump for the ground products (8); this sump receives an addition of
water (stream 6).
• A hydrocyclone (10) fed with pulp by stream (9). This appliance ensures the
separation between big particles, which have to be recycled (stream 11), and the fine
particles (stream 12).
The following sensors are installed:
• stream 1: sensor of ore flowrate (W1).
• stream 2: sensor of water flowrate (Q2).
• stream 6: sensor of water flowrate (Q6).
• stream 8: sensor of level (h8) and density (d8).
• stream 9: sensor of pulp flowrate (Q9).
• stream 12: sensor of pulp flowrate (Q12) and density (d12).
The installed actuators are:
• stream 1: conveyor with adjustable speed.
• streams 2 and 6: pneumatic valves.
• stream 9: pump with adjustable speed.
The use of PID analog regulation modules has enabled the setting-up of the following
local regulations:
• regulation of ore flowrate (W1).
• regulation of water flowrates (Q2 and Q6).
• regulation of pulp level (h8).
Realized decomposition. The circuit is split up into 5 subsystems:
• subsystem 1 includes the whole of mill-sump-hydrocyclone and it uses all of the
measured variables apart from measurement h8.
• subsystems 2, 3 and 4 correspond to the local regulations of W1, Q2 and Q6.
These subsystems link the measurement to the set-point.
• subsystem 5 links level h8 to the inputs and outputs of the sump.
Implementation of the fault detection strategy.
Test of steady state. Fig. 1.2 shows the evolution of Q6, the evolution of the correspond-
ing test variable, U, and the result of the test of steady state (see paragraph A3). This test
is realized on a window of 50 measurements, that is to say 2.5 min. The discrimination
threshold of the test variable, U, is taken equal to 6.
Drift test. Fig. 1.3 shows the results obtained by the realization of a drift test on the vari-
able Q9 (see section A4). In fig. 1.3 one can see the evolution of the non-filtered vari-
able Q9, the mean of this variable computed over 50 measurements, and the test variable U,
calculated from a window of 50 measurements and from a window of 300 measure-
ments. The threshold of static-dynamic discrimination is taken equal to 3 for the first
window and equal to 2 for the second. For the narrow window, it is noticed that the
variable U stays below its discrimination threshold: the variable Q9 is therefore consid-
ered in static state for that window. For the window of large width, the variable U
crosses its threshold very quickly, so a slow drift is present.
Standard deviation test. This test is realized on all variables declared to be in a steady
state. The value of the standard deviation of the measurements is compared to thresholds
determined by learning. If the standard deviation exceeds the thresholds, there is a modi-
fication of the noise due to bad contacts or to a modification of the process noise (see
section A2).
Several "visible" failures were created on variable Q9 in order to prove the usefulness of
a standard deviation test. Fig. 1.4 shows the evolution of variable Q9. The functioning of
the installation is normal except at the following points:
• From 5.5 to 10.5 min: changing of the proportional coefficient P in the regulation of
level.
• From 15.75 to 21.75 min: suppression of level regulation, the pump works at a con-
stant speed.
• From 26 to 31.5 min: addition of an integral action I on the level regulation.
• From 35 min to the end: false intermittent contacts on the current output of the sensor
Q9.
Fig. 1.4 also shows:
• The value of the standard deviation calculated on a window of 50 measurements. The
arbitrarily negative values of the standard deviation correspond to the points in dy-
namic state.
• The value of the variable "result of the tests". This variable has 3 states:
It is zero when the variable Q9 is in static state, the process operation being normal.
It is negative when the variable Q9 is in dynamic state.
It is positive when the variable Q9 is in static state and the standard deviation
test has detected an anomaly.

Figure 1.2 Test of steady state applied on Q6    Figure 1.3 Drift test applied on Q9

1.2.1.3 Conclusions

The algorithms described in this section may be used to provide all basic functions of the
fault monitoring strategy.
The efficiency of the localization and the precision of the diagnosis depend on the struc-
ture of the installation and on the quality and quantity of information available about its
operation.

To improve a partial diagnosis, a "knowledge based system" realizing this synthesis is
needed; it may propose some actions to the user: quick verification of a part of the in-
stallation, use of extra signals at certain points of the circuit, measurement of variables
normally not measured.

Figure 1.4 Standard deviation test applied on Q9

An extension of the diagnosis strategy concerns the study of decision-making, which can
be considered from different aspects:
• Proceed to a thorough study of the behaviour of suspect elements.
• Correct defective measurements (in case of a sensor failure).
• Compensate control signals (in case of actuator failures).
• Reconfigure the control structure (in the case of loss of essential components) and proceed
with the substitution and repair of components.
• Put the installation in an "emergency state" (in case of an unsatisfactory reconfiguration).
• Shut down production.
The knowledge-based aspects of the fault diagnosis problem will be investigated in detail
in Chapter 4.

1.2.2 Process control charts

Control charts are used in monitoring the statistical state of a process whose measure-
ments are available sequentially in time. Some statistic w (sample mean or sample range
etc.) is computed from successive samples of size n and plotted on a graph containing
lower and upper limits corresponding to the critical region of the hypothesis on w under
test. If the statistic w is distributed normally with mean $m_w$ and standard deviation $s_w$, where $m_w$
and $s_w$ are calculated a priori, then typical limits are $m_w + 3s_w$ for the Upper Control
Limit (UCL) and $m_w - 3s_w$ for the Lower Control Limit (LCL). Such charts are usually
referred to as Shewhart charts (Fig. 1.5), and originated in 1931.

Figure 1.5 Shewhart control chart
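As a simple sketch of the computations behind such a chart, the fragment below derives the center line and 3-sigma limits for subgroup means from in-control reference data and flags out-of-limit points; the choice of the subgroup mean as the statistic w and all names are illustrative.

```python
import numpy as np

def shewhart_xbar_limits(reference, n):
    """Center line and 3-sigma limits for means of subgroups of size n,
    estimated from in-control reference observations."""
    m_w = reference.mean()
    s_w = reference.std(ddof=1) / np.sqrt(n)   # standard error of the mean
    return m_w, m_w + 3 * s_w, m_w - 3 * s_w   # center line, UCL, LCL

def out_of_control(xbars, ucl, lcl):
    """Boolean flags for subgroup means falling outside the control limits."""
    xbars = np.asarray(xbars)
    return (xbars > ucl) | (xbars < lcl)
```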

Control charts were used by Himmelblau, (1978), to monitor dynamical systems and
detect malfunctions in plant equipment. An extensive bibliography on
control charts is given by Vance, (1983).
Univariate Shewhart control chart techniques.
Univariate control charts can be used for the first stage of fault monitoring (as are the
cases of Section 1.2.1, paragraph A), as follows: given successive samples of residuals
$r_{i,j}, r_{i+1,j+1}, \ldots, r_{i+m,j+m}$, where $r_{j,k} = [r(j) \; r(j+1) \; \ldots \; r(k)]^T$, an appropri-
ate statistic is calculated and plotted on a corresponding control chart with the precom-
puted UCL and LCL. A decision that a fault has occurred is made when the statistic falls
outside its normal operation level for a specified subsequent number of times. This
procedure will decrease the probability of type I errors (see section 1.2.1, paragraph A).
A logical flowchart for a computer-operated control chart is shown in Fig. 1.6. The
calculations of the central line, UCL and LCL for the appropriate test statistic of

univariate Shewhart charts are summarized in Himmelblau (1978), where the reader is
referred for further details. The Shewhart chart essentially treats each sample separately.
However, practical experience has shown that by taking into account the information
from all collected samples, it is possible to determine in a better way whether the process
is in control or out of control.

Figure 1.6 Flowchart for computer-operated control chart.

Many efforts to use Shewhart charts in industry have been of limited success. This is not
because of a technical shortcoming of the method, but typically because of one or more
of the following reasons:
1. The formulas used to calculate the limits are incorrect.
2. The sampling and subgrouping plans used to supply and group data for the charts are
poorly chosen.
3. The improvement work demanded by the charts is so radical in the context of the
organization's culture, that the organization is unable to properly respond.
To one not familiar with industrial environments, the three preceding items may seem
uninteresting or even trivial, but they are the critical criteria for successful application.
Item 1 seems on the surface to be beneath one's attention. This problem is so widespread
however, that when consulting, one must regularly question clients on how they did their

calculations. The root cause in this case is a lack of understanding that "the essential
statistical power of control charts comes from using variation within the subgroups (or, in
the case of individuals charts, between neighboring measurements) to calculate the width
of the control limits". Those who have missed this point have recommended, for
example, that limits on a control chart for individuals be based on the sample standard
deviation of all of the data thrown together, instead of the average moving range.
Similarly, some statistical software allows the user to calculate control limits using the
sample standard deviation of the subgroup averages or of all of the individuals. Charts
with limits calculated in this fashion are nearly doomed to failure because special causes
occurring between subgroups will inflate the limits, making it harder to detect such
causes, which is the original purpose of the chart.
Item 2, the problem of poor sampling plans, is probably the least understood of the is-
sues. The ability of a control chart to signal trouble depends primarily on the sampling
plan used to supply the chart with data. Indeed, being identified as special or common is
not due to a property of a cause itself but rather to the way in which the control chart
works as a window or filter through which the process is observed. By carefully selecting
the sampling and subgrouping, one can control what sources of variation will show up as
special causes. This makes control charts very versatile tools, since one can select several
filters with which to view the same process.
Proper response to the charts, item 3, is the most critical issue in successful application.
This requires prompt action by those closest to the process. In a manufacturing environ-
ment, this is typically the operator.
Everyone should agree that Automatic Process Control (APC) is the most effective
means of maintaining a setpoint with minimal variation. If one deals only with statistical
theory and ignores practical issues, however, one can make APC seem like a panacea.
Since APC systems are continually making physical adjustments to a process, there can
be increased wear, especially in a typical industrial environment. This, together with the
maintenance requirements of the control equipment itself, can substantially increase
maintenance costs.
Experts estimate that at any point in time, from 25% to 35% of the world's advanced
automatic control systems are on manual. Lack of operator confidence is one critical rea-
son for this. Operators, like most people, have an inherent distrust of a "black box" that
makes decisions on a basis that they do not understand. This is not an inherent fault of
APC but rather of the way in which it may have been misapplied.
Box and Kramer, (1992), discuss rationales for process monitoring using some of the
control chart techniques of Statistical Process Control (SPC) for feedback adjustment.
Minimum cost feedback schemes are discussed for some practically interesting models.
The critical question is how to integrate APC and SPC for total system improvement.
Hoerl and Palm, (1992), provided a more basic discussion of this topic. APC should be
applied to the critical variables that have a direct control knob to minimize variation,
manage setpoint changes, and ensure safety. The time and money required should be in-
vested. SPC should be applied to this system to monitor its long-term effectiveness, de-
tect special causes, and monitor the on-line measurement system. Experimental design
could be used to tune the control system. SPC or algorithmic statistical process control
(Vander Wiel et al., 1992) can be used as a substitute when the cost of APC cannot be
justified.
Cumulative sum (CUSUM) control charts are very effective in detecting special causes.
The CUSUM chart is usually maintained by taking samples at fixed time intervals and
plotting on the chart a cumulative sum of differences between the sample means and the target value,
ordered in time. The process mean is considered to be on target as long as
the CUSUM statistic computed from the samples does not fall into the signal region of
the chart. A value of the CUSUM statistic in the signal region should be taken as an
indication that the process mean has changed and that the possible causes of the change
should be investigated.
The number of observations taken before an out-of-control signal is triggered is called
the run length. The performance of a CUSUM chart is commonly measured by the
average run length (ARL) corresponding to various possible choices of the mask
parameters.
Cumulative sum (CUSUM) charts are often used instead of standard Shewhart charts
when detection of small changes in a process parameter is important. For comparable
average run lengths (ARLs) when the process is on-target, CUSUM charts can be de-
signed to give shorter ARLs than Shewhart charts for detecting certain small changes in
process parameters. The superiority of the CUSUM chart over the Shewhart chart also
holds when the Shewhart chart is augmented with runs rules. Thus, it is only natural to
investigate whether the shorter ARLs for the univariate case can be extended to the
multivariate case (Hawkins, (1992), Pignatiello, (1990), Blazek, (1987)).
The exponentially weighted moving average control chart. A control chart technique
that may be of value to both manufacturing and continuous process quality control
engineers is the exponentially weighted moving average (EWMA) control chart (Hunter,
(1986), Lucas and Saccucci, (1990)). The EWMA has its origins in the early work of
econometricians, and although its use in quality control has been recognized, it remains a
largely neglected tool. The EWMA chart is easy to plot, easy to interpret, and its control
limits are easy to obtain. Further, the EWMA leads naturally to an empirical dynamic
control equation.
The exponentially weighted moving average (EWMA) is a statistic with the characteristic
that it gives less and less weight to data as they get older. A plotted point on
an EWMA chart can be given a long memory, thus providing a chart similar to the ordi-
nary CUSUM chart, or it can be given a short memory and provide a chart analogous to
a Shewhart chart.

The EWMA is very easily plotted and may be graphed simultaneously with the data ap-
pearing on a Shewhart chart. The EWMA is best plotted one time position ahead of the
most recent observation, since it may be viewed as the forecast for the next observation.
The immediate purpose is only to plot the statistic. The EWMA is equal to the present
predicted value plus $\lambda$ times the present observed error of prediction. Thus,

$$\hat{y}_{t+1} = \hat{y}_t + \lambda e_t = \hat{y}_t + \lambda (y_t - \hat{y}_t) = \lambda y_t + (1 - \lambda)\hat{y}_t$$

where $\hat{y}_{t+1}$ is the predicted value at time $t+1$ (the new EWMA), $y_t$ is the observed
value at time $t$, $\hat{y}_t$ is the predicted value at time $t$ (the old EWMA), $e_t = y_t - \hat{y}_t$ is the
observed error at time $t$ and $\lambda$ is a constant ($0 < \lambda < 1$) that determines the depth of
memory of the EWMA. As shown in Hunter, (1986), the EWMA can be written as,

$$\hat{y}_{t+1} = \sum_{i=0}^{t} w_i y_i$$

where the $w_i$ are weights defined by,

$$w_i = \lambda (1 - \lambda)^{t-i}$$

with sum $\sum_{i=0}^{t} w_i \approx 1$. The constant $\lambda$ determines the "memory" of the EWMA statistic.
That is, $\lambda$ determines the rate of decay of the weights and hence the amount of
information secured from the historical data. Note that as $\lambda \to 1$, $w_t \to 1$ and $\hat{y}_{t+1}$
practically equals the most recent observation $y_t$.
When the process is under control and $\lambda = 1$, points plotted on the classical Shewhart
chart and those on an EWMA chart are therefore almost equal in their ability to detect
signals of departures from assumptions. As $\lambda \to 0$, the most recent observation has small
weight, and previous observations near equal (though lower) weights. Thus, as $\lambda \to 0$, the
EWMA takes on the appearance of the CUSUM. The EWMA control chart for values of
$0 < \lambda < 1$ stands between the Shewhart and CUSUM control charts in its use of the histori-
cal data.
The choice of $\lambda$ can be left to the judgment of the quality control analyst or can be
estimated using an iterative least squares procedure. The analyst, by considering the data
as new data arriving sequentially, could, for different values of $\lambda$, compute the
corresponding sequential set of predicted values $\hat{y}$ based on the EWMA. The value of $\lambda$
which provides the smallest error sum of squares is preferred, based upon this very
limited evidence.
As shown in Hunter, (1986), the variance of the EWMA is,

$$\mathrm{Var(EWMA)} = [\lambda / (2 - \lambda)]\, \sigma^2$$

and thus,

$$\sigma_{\mathrm{EWMA}} = [\lambda / (2 - \lambda)]^{0.5}\, \sigma$$

An estimate of $\sigma^2$ can be obtained from the minimum error sum of squares $\sum e_t^2$
obtained while estimating $\lambda$. That is,

$$\hat{\sigma}^2 = \sum_{t=1}^{T} e_t^2 / (T - 1)$$

When the Shewhart and EWMA charts are constructed and plotted simultaneously,
$\sigma_{\mathrm{EWMA}}$ can be estimated from the information used in establishing the $3\sigma$ Shewhart
control limits. That is,

$$\hat{\sigma}_{\mathrm{EWMA}} = [\lambda / (2 - \lambda)]^{0.5}\, \hat{\sigma}_{\mathrm{Shewhart}}$$

The $3\sigma$ control limits for the EWMA are,

$$\bar{y} \pm 3\hat{\sigma}_{\mathrm{EWMA}}$$

The Shewhart and the EWMA control limits may be conveniently placed on the same
chart.
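A minimal sketch of an EWMA chart following the recursion and variance formula above is given below; the default λ, the choice of the target as the starting value, and all names are illustrative assumptions.

```python
import numpy as np

def ewma_chart(y, lam=0.2, target=None, sigma=None):
    """EWMA forecasts with 3-sigma control limits.

    y : 1-D array of observations; lam : memory constant, 0 < lam < 1.
    """
    y = np.asarray(y, dtype=float)
    target = y.mean() if target is None else target
    sigma = y.std(ddof=1) if sigma is None else sigma
    ewma = np.empty(len(y))
    yhat = target                          # start the EWMA at the target
    for t in range(len(y)):
        yhat = yhat + lam * (y[t] - yhat)  # new EWMA = old + lambda * error
        ewma[t] = yhat                     # forecast for time t + 1
    s_ewma = np.sqrt(lam / (2.0 - lam)) * sigma
    return ewma, target + 3 * s_ewma, target - 3 * s_ewma
```

A point plotting outside the returned limits would be taken as an EWMA signal, in the sense described above.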
In practice the classical Shewhart chart (only the last plotted point falling outside the $3\sigma$
limits providing a signal) is aided by the use of runs. For example, two out of three suc-
cessive points falling within the region between $\bar{y} \pm 2\sigma$ and $\bar{y} \pm 3\sigma$, or six points in sequence
above the chart's center line, are frequently taken to be signals that the process is no
longer under control.
Employing runs provides an informal use of the recent history and, in the hands of an ex-
perienced analyst, can make the Shewhart chart take on the aspects of the EWMA chart.
However, the EWMA chart provides a regular and formal use of the historical data.
Runs and all other data configurations are encompassed in the EWMA forecast. Further,
once the EWMA has been computed it contains all the information provided by the his-
torical record. There is no need to save or, as in the case of the modified CUSUM, to
give occasional zero weights to recorded observations.
The recognition that an EWMA control scheme can be represented as a Markov chain
allows its properties to be evaluated more easily and completely than has previously been
done. Lucas and Saccucci, (1990), evaluate the properties of an EWMA control scheme
used to monitor the mean of a normally distributed process that may experience shifts
away from the target value. A design procedure for EWMA control schemes is given.
Parameter values not commonly used in the literature are shown to be useful for detect-
ing small shifts in a process. In addition, several enhancements to EWMA control
schemes are considered. These include a fast initial response feature that makes the
EWMA control scheme more sensitive to start-up problems, a combined Shewhart-
EWMA that provides protection against both large and small shifts in a process, and a
robust EWMA that provides protection against occasional outliers in the data that might
otherwise cause an out-of-control signal. An extensive comparison reveals that EWMA

control schemes have average run length properties similar to those for cumulative sum
control schemes.
The EWMA can be used as a dynamic process control tool. When the process mean $\eta$ is
on the target (i.e., $\eta = \tau$) all three charting procedures, the Shewhart, CUSUM and
EWMA, are roughly equivalent in their ability to monitor departures from target.
However, the EWMA provides a forecast of where the process will be in the next in-
stance of time. It thus provides a mechanism for dynamic process control.
To control a process it is convenient to forecast where the process will be in the next in-
stance of time. Then, if the forecast shows a future deviation from target that is too
large, some electro-mechanical control system or process operator can take remedial ac-
tion to compel the forecast to equal the target. In modern manufacturing, and particularly
where an observation is recorded on every piece manufactured or assembled, a forecast
based on the unfolding historical record can be used to initiate a feedback control loop to
adjust the process.
Of course, if an operator is part of a feedback control loop he/she must know what
corrective action to perform, and care must be taken to avoid inflating the variability of
the process by making changes too often. But control engineers long ago learned how to
close the feedback loop linking forecast and adjustment to target. The same information
feedback loop exists in many situations which only the operator can control. The EWMA
chart not only provides the operator with a forecast, but also with control limits to in-
form when the forecast is statistically significantly distant from the target. Thus, when an
EWMA signal is obtained, appropriate corrective action based on the size of the forecast
can often be devised.
The EWMA can be modified to enhance its ability to forecast. In situations where the
process mean steadily trends away from target the EWMA can be improved by adding a
second term to the EWMA prediction equation. That is,

$$\text{modified EWMA:} \quad \hat{y}_{t+1} = \hat{y}_t + \lambda_1 e_t + \lambda_2 \sum e_t$$

where $\lambda_1$ and $\lambda_2$ are constants that weight the error at time $t$ and the sum of the errors
accumulated to time $t$. The coefficients $\lambda_1$ and $\lambda_2$ can be estimated from historical data
by an iterative least squares procedure similar to that mentioned earlier for the estimation
of the constant $\lambda$.
A third term can be added to the EWMA prediction equation to give the empirical con-
trol equation,

$$\hat{y}_{t+1} = \hat{y}_t + \lambda_1 e_t + \lambda_2 \sum e_t + \lambda_3 \nabla e_t$$

where the symbol $\nabla$ means the first difference of the errors $e_t$; that is, $\nabla e_t = e_t - e_{t-1}$.
Now observe that the forecast $\hat{y}_{t+1}$ equals the present predicted value (zero if the proc-
ess has been adjusted to the target) plus three quantities: one proportional to $e_t$, the sec-
ond a function of the sum of the $e_t$, and the third a function of the first difference of the
$e_t$. These terms are sometimes called the "proportional", "integral", and "differential"
terms in the process control engineer's basic proportional, integral, differential (PID)
control equation. The parameters $\lambda_1$, $\lambda_2$ and $\lambda_3$ weight the historical data to give the best
forecast.
The EWMA may thus be viewed as more than an alternative to either the Shewhart or
the CUSUM control charts. The EWMA may also be viewed as a dynamic control
mechanism to help keep a process mean on target whenever discrete data on manufac-
tured items are sequentially available.
Multivariate control charts.
The quality of the output of a production process is often measured by the joint level of
several correlated characteristics. For example, a chemical process may be a function of
temperature and pressure, both of which need to be monitored carefully; a particular
grade of lumber might depend on correlated characteristics such as stiffness and bending
strength. In a geochemical process in coal mining each observation consists of 14
correlated characteristics. In these types of situations, separate univariate control charts
for each characteristic are often utilized to detect changes in the inherent variability of
the process. When these characteristics are mutually correlated, however, the univariate
charts are not as sensitive as multivariate methods that capitalize on the correlation.
One common method of constructing multivariate control charts is based on Hotelling's
T2 statistic. Currently, when a process is in the start-up stage and only individual obser-
vations are available, approximate F and chi-square distributions are used to construct
the necessary multivariate control limits. These approximations are conservative in this
situation. Tracy et al. (1992), present an exact method based on the beta distribution, for
constructing multivariate control limits at the start-up stage. An example from the
chemical industry illustrates that this procedure is an improvement over the approximate
techniques.
Note that in the following, monitoring of the mean of a multivariate normal process is
required. The term "on-target" is used to indicate that the process is in-control with
respect to its mean. Likewise, the term "off-target" is used to indicate that the mean of
the multivariate normal process has shifted.
For successive samples, multivariate control chart techniques used for controlling the
mean of the multivariate normal process can be interpreted as repeated tests of signifi-
cance of the form,

$$H_0: \mu = \mu_0 \qquad H_1: \mu \neq \mu_0$$

where $\mu$ represents the multivariate normal process mean, whose true value is unknown,
and $\mu_0$ is the target value for the parameter vector. For simplicity, it will be assumed that
$\mu_0 = 0$, since the general case can be handled easily by translation.

Let $X_t = [X_{1t} \; \ldots \; X_{pt}]^T$ denote the p-vector of quality characteristic measurements
made on a part of a multivariate normal process, where $X_{j,t}$ is the observation on variate $j$
at time $t$. It is assumed that the successive $X_t$ are independent and identically distributed
multivariate normal random vectors with known and constant covariance matrix $\Sigma$. That
is, the $X_t$ are i.i.d. $N_p(\mu, \Sigma)$. Without loss of generality, let $\bar{X}_t$ denote the (sample) mean
vector at time $t$ and let $\Sigma$ denote its covariance matrix.
The Multivariate Shewhart $\chi^2$ Chart.
To test the previous hypothesis, the null hypothesis should be rejected at time $t$ if
$\chi_t^2 > \chi_{p;\alpha}^2$, where,

$$\chi_t^2 = (\bar{X}_t - \mu_0)^T \Sigma^{-1} (\bar{X}_t - \mu_0)$$

and $\chi_{p;\alpha}^2$ is the upper $100\alpha$ percentage point of the $\chi^2$ distribution with $p$ degrees of
freedom. The noncentrality parameter associated with $\chi^2$ is,

$$\lambda^2(\mu) = (\mu - \mu_0)^T \Sigma^{-1} (\mu - \mu_0)$$

Note that $\lambda(\mu)$, the square root of the noncentrality parameter, is often used to represent
a measure of the distance of $\mu$ from $\mu_0$. This measure of distance is also called the
Mahalanobis distance or the statistical distance. Note that the straight line or Euclidean
distance assumes an identity covariance matrix instead. Henceforth, the word "distance"
will be used to mean the square root of the noncentrality parameter defined above.
A $\chi^2$ control chart operates by plotting $\chi_t^2$ on a chart with an appropriate UCL. If a point
plots above the upper control limit, the process mean is deemed to be out of control and
the assignable causes of the variation are sought. The average run length (ARL) of this
control scheme can be calculated as $1/P$, where $P$ denotes the probability that $\chi_t^2$ exceeds
the UCL. The on-target value of $P$ is determined from the probability that $\chi_t^2$ exceeds the
UCL under the central $\chi_p^2$ distribution, while the off-target value of $P$ is the probability
that $\chi_t^2$ exceeds the UCL under the non-central $\chi_p^2$ distribution.
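The chart statistic and its UCL are straightforward to compute when Σ is known; the sketch below assumes subgroup mean vectors stored as rows, with illustrative names throughout.

```python
import numpy as np
from scipy import stats

def chi2_chart(Xbar, mu0, Sigma, alpha=0.005):
    """chi_t^2 = (Xbar_t - mu0)' Sigma^{-1} (Xbar_t - mu0) and UCL chi^2_{p;alpha}."""
    Sinv = np.linalg.inv(Sigma)
    D = np.asarray(Xbar) - np.asarray(mu0)
    chi2_t = np.einsum('ti,ij,tj->t', D, Sinv, D)   # quadratic form per subgroup
    ucl = stats.chi2.ppf(1.0 - alpha, df=D.shape[1])
    return chi2_t, ucl                              # signal whenever chi2_t > ucl
```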
The multiple univariate CUSUM scheme.
A p-dimensional multivariate normal process can be monitored by using $p$ two-sided uni-
variate CUSUM charts. The $j$th two-sided univariate CUSUM is operated by forming
the cumulative sums,

$$S_{j,t} = \max(0, \; S_{j,t-1} + \bar{X}_{jt} - k_j)$$
$$T_{j,t} = \min(0, \; T_{j,t-1} + \bar{X}_{jt} + k_j)$$

where $S_{j,0} \ge 0$, $T_{j,0} \le 0$, $k_j > 0$ and $\bar{X}_{jt}$ is the sample mean at time $t$ for variate $j$.
The $j$th two-sided chart signals that the corresponding process mean has shifted when
either $S_{j,t} > h_j$ or $T_{j,t} < -h_j$, for CUSUM control chart parameters $k_j$ and $h_j$. The
multiple univariate CUSUM scheme signals an off-target condition when any of the $p$
two-sided schemes produces an off-target signal. Therefore, the on-target average run
length of the multiple univariate scheme is less than the average run length of any one of
the univariate CUSUM charts.
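A sketch of the scheme follows: p two-sided CUSUMs run in parallel and the scheme signals as soon as any chart trips; the vectorized update and all names are illustrative.

```python
import numpy as np

def multiple_univariate_cusum(Xbar, k, h):
    """Run p two-sided CUSUMs on (T, p) subgroup means; k, h: length-p arrays."""
    Xbar = np.asarray(Xbar, dtype=float)
    S = np.zeros(Xbar.shape[1])            # upper-side sums S_{j,t}
    T = np.zeros(Xbar.shape[1])            # lower-side sums T_{j,t}
    signals = np.zeros(len(Xbar), dtype=bool)
    for t, x in enumerate(Xbar):
        S = np.maximum(0.0, S + x - k)
        T = np.minimum(0.0, T + x + k)
        signals[t] = bool(np.any(S > h) or np.any(T < -h))
    return signals
```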
Two new multivariate CUSUM charting procedures are proposed by Pignatiello, (1990).
These procedures make direct use of the covariance matrix and are based on quadratic
forms of the mean vector. Their ARL performance has been shown by Pignatiello, (1990),
to compare favourably with that of the classical CUSUM charts. Overall, these
charts appear to be a good control charting procedure for
detecting a variety of shifts in the mean of a multivariate normal process.
To introduce the first multivariate CUSUM scheme, the multivariate sum

$$C_t = \sum_{i=t-n_t+1}^{t} (\bar{X}_i - \mu_0)$$

is considered, where $n_t$ can be interpreted as the number of subgroups since the most
recent renewal (i.e. zero value) of the CUSUM, and is formally defined below. Since
$(1/n_t)C_t$ may be written as,

$$\frac{1}{n_t} C_t = \frac{1}{n_t} \sum_{i=t-n_t+1}^{t} \bar{X}_i - \mu_0$$

the vector $C_t / n_t$ represents the difference between the accumulated sample average and
the target value for the mean. Consequently, at time $t$, the multivariate process mean can
be estimated to be $(C_t / n_t) + \mu_0$. The norm of $C_t$,

$$\| C_t \| = \sqrt{C_t^T \Sigma^{-1} C_t}$$

is seen as a measure of the distance of our estimate of the mean of the process from the
target mean for the process. A multivariate control chart can then be constructed by
defining MC1 as,

$$MC1_t = \max\{0, \; \| C_t \| - k n_t\}$$

and,

$$n_t = \begin{cases} n_{t-1} + 1 & \text{if } MC1_{t-1} > 0 \\ 1 & \text{otherwise} \end{cases}$$
where the choice of the reference value $k > 0$ is discussed below. The MC1 chart operates
by plotting $MC1_t$ on a control chart with an upper control limit of UCL1. If $MC1_t$ ex-
ceeds UCL1 then the process is deemed to be off-target. Rather than basing a multivari-
ate CUSUM statistic on the square of the distance of the accumulated sample average
from $\mu_0$, one could consider the square of the distance of each sample mean from $\mu_0$ and
then accumulate those squared distances. Hence, as an alternative to MC1, one could
consider the square of the distance $D_t^2$ of the $t$th sample mean from the target value
$\mu_0$, where

$$D_t^2 = (\bar{X}_t - \mu_0)^T \Sigma^{-1} (\bar{X}_t - \mu_0)$$

has a central $\chi^2$ distribution with $p$ degrees of freedom when the process is on-target and
a non-central $\chi^2$ distribution when the process is off-target. A one-sided univariate
CUSUM can now be formed as,

$$MC2_t = \max\{0, \; MC2_{t-1} + D_t^2 - k\}$$

with $MC2_0 = 0$. The choice of the reference value $k$ is discussed below. To use this multi-
variate CUSUM, one would declare the process to be off-target if $MC2_t$ exceeds an up-
per control limit $h_2$.
Choice of reference value k. In the (upper) one-sided, univariate CUSUM procedure
the reference value $k$ is often taken to be the average of the expected values of the
process mean under $(H_0)$: $\mu = \mu_0$ and $(H_1)$: $\mu = \mu_1$, where $\mu_0$ represents the on-target state
and $\mu_1$ a specified, unacceptable off-target state. The choice of a value for $k$ follows from
the derivation of the CUSUM and from Wald's sequential probability ratio test. Although
the (non-)central $\chi^2$ distribution of the observations for the (off-) on-target state of
MC2 is not symmetric, this same approach can be used. That is, the $k$ used in MC2 is,

$$k = p + 0.5 \lambda^2(\mu_1)$$

Since the form of MC1 is different from the other CUSUM charts, the choice of $k$ cannot
be derived analogously. Instead, one can choose $k$ to be half of the distance of $\mu_1$ from $\mu_0$
(where again, $\mu_1$ is a specified off-target state). That is, one can take $k = 0.5 \lambda(\mu_1)$. Note
that for both multivariate CUSUM charts, the value for $k$ depends only on the magnitude
of the distance of $\mu_1$ from $\mu_0$. It may be possible to improve the performance of the MC1
chart at selected off-target conditions with alternate choices for $k$.
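The two schemes and the reference values above are sketched below for known Σ; λ²(μ₁) is the squared distance of the specified off-target mean from μ₀, and the renewal logic follows the definition of n_t given earlier. All names are illustrative.

```python
import numpy as np

def mc1_mc2(Xbar, mu0, Sigma, lam1_sq):
    """MC1 and MC2 multivariate CUSUM paths for (T, p) subgroup means."""
    Xbar, mu0 = np.asarray(Xbar, float), np.asarray(mu0, float)
    Sinv = np.linalg.inv(Sigma)
    p = len(mu0)
    k1 = 0.5 * np.sqrt(lam1_sq)           # MC1: half the distance of mu1 from mu0
    k2 = p + 0.5 * lam1_sq                # MC2: p + 0.5 * lambda^2(mu1)
    mc1, mc2 = np.zeros(len(Xbar)), np.zeros(len(Xbar))
    Ct, nt, prev1, prev2 = np.zeros(p), 0, 0.0, 0.0
    for t, x in enumerate(Xbar):
        if prev1 > 0:                     # keep accumulating since last renewal
            nt += 1
            Ct = Ct + (x - mu0)
        else:                             # renewal: restart the sum
            nt = 1
            Ct = x - mu0
        prev1 = max(0.0, np.sqrt(Ct @ Sinv @ Ct) - k1 * nt)
        d2 = (x - mu0) @ Sinv @ (x - mu0)
        prev2 = max(0.0, prev2 + d2 - k2)
        mc1[t], mc2[t] = prev1, prev2
    return mc1, mc2
```

Each path would be compared against its own upper control limit (UCL1 and h2, respectively) to decide off-target status.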
The procedures presented above for treating quality control data assume normality
and independence of the data. When either independence and/or normality
is not present, as is usually the case, application of the presented methodology
introduces large errors in the analysis of the data and renders conclusions based on it
dubious.
Once serial correlation is confirmed, identification techniques can be used to suggest a
specific kind of model that might be worth considering. The identification techniques
make use of the autocorrelation and partial autocorrelation functions. When the identifi-
cation is complete, the likelihood function can provide maximum likelihood estimates of
the parameters of the identified model (Ljung, 1987).
Suppose that a correlation test revealed the presence of data dependence and the iden-
tification techniques suggested as the best model an autoregressive process AR(p) of
order $p$. Vasilopoulos, (1978), has modified and extended the existing standard control
chart methodology by utilizing the time series analysis approach and by introducing de-
pendence via a second order autoregressive process (AR(2) model). Curves of the
modified auxiliary quality control factors are presented, showing the substantial effect of
dependence on the classical quality control factors.
Yashchin, (1993), discusses a situation in which one is interested in evaluating the run-
length characteristics of a cumulative sum control scheme when the underlying data
show the presence of serial correlation. In practical applications, situations of this type
are common in problems associated with monitoring such characteristics of data as
forecasting errors, measures of model adequacy, and variance components. The
discussed problem is also relevant in situations in which data transformations are used to
reduce the magnitude of serial correlation. The basic idea of analysis involves replacing
the sequence of serially correlated observations by a sequence of independent and
identically distributed observations for which the run-Iength characteristics of interest are
roughly the same. Applications of this method to several classes of processes arising in
the area of statistical process control are discussed in detail, and it is shown that it leads
to approximations that can be considered acceptable in many practical situations. The
reader is referred to Vasilopoulos, (1978), and Yashchin, (1993), for details.
Statistical process control (SPC) by displaying multivariate data.
Control charts are most valuable in practice when used as simple graphical
aids to let the process operator, who is untrained in statistical techniques, get a mental
picture of the process history and interpret whether or not the quality of the process op-
eration is at a satisfactory level.
Displaying multivariate data is probably the most popular topic of current statistical
graphics research (Blazek, 1987). Particularly in the area of process control, where mi-
cro- and mini-computers collect, analyze, and store thousands of observations on each
phase of a process each day, effective graphical display of these data is a necessity. In
one plant the data collected by process control computers can amount to millions of bits
of information. Even narrowing down the variables to Juran's "significant few" and using
computers to help process the data, graphics is the only means available to present this
plethora of data to the operators and supervisors for rapid interpretation of results.
This problem has been most keenly felt with the increasing use of statistical process con-
trol (SPC) methods. Although well versed in basic techniques and SPC charting, opera-
tors and supervisors feel overwhelmed when they realize that their complex, interde-
pendent processes produce ten to twenty significant variables to be monitored simultane-
ously instead of just two or three. Although summary measures, like Hotelling's $T^2$ pre-
sented in the previous section, can quantify the overall status of the system, these people
also need information on the individual contributors as well. This is especially true in
facilities where modernization has produced new measurements for which component
interdependencies are unknown.

A graphic which displays the individual and collective relationships sequentially, called
the polyplot, will be presented in the following. The polyplot is a glyph in the shape of a
polygon with rays emanating from the vertices. A glyph can represent a single observa-
tion or, as is often the case in SPC applications, a composite of several observations.
Each side is of equal length, and the number of sides of the polygon corresponds to the
number of variables being studied. Each vertex and corresponding ray is associated with
a specific variable. The rays of the polygon are oriented along invisible line segments
passing from the center of the polygon through the appropriate vertex. The length of a
ray varies with the value of each observation. The vertex corresponds to the mean of
each variable $x_i$, and the distance from the mean is transformed into standard error units.
Thus one can plot,

$$(\bar{x}_i - \mu_i) / (s / \sqrt{n})$$

which indicates the number of standard error units the $i$th data average is from its proc-
ess mean ($\bar{x}_i$ is the $i$th average of $n$ data points; $s$ is the estimate of the process standard
deviation and $n$ is the sample size). Values less than the mean are mapped by rays start-
ing at the vertex and going toward the center. Rays starting at the vertex and going away
from the center correspond to observed values greater than the mean. The distance be-
tween the vertex and the center is four standard error units. To ensure that the rays of
one glyph do not overwrite a nearby glyph nor extend past the center, rays greater than
four standard error units are truncated at that limit.
The use of different line styles or colors for the rays encodes the level of significance of
the deviation from the estimated mean. For example, the ray is dotted if the deviation is
within two standard error units, dashed between two and three standard error units, and
solid beyond three standard error units. If colors can be used, it is suggested to use blue
for rays less than two standard error units, green between two and three, and red to
signify rays beyond three standard error units.
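The ray lengths for one glyph are then just clipped standardized deviations, as in the sketch below; per-variable means and standard deviation estimates are assumed available, and all names are illustrative.

```python
import numpy as np

def polyplot_rays(xbar, mu, s, n, clip=4.0):
    """Standardized ray lengths (xbar_i - mu_i) / (s_i / sqrt(n)) for one glyph,
    truncated at +/- clip standard error units."""
    z = (np.asarray(xbar) - np.asarray(mu)) / (np.asarray(s) / np.sqrt(n))
    return np.clip(z, -clip, clip)
```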
Figs. 1.7, 1.8 and 1.9 demonstrate the appearance of the glyph for three, five and sev-
enteen variables, respectively. The glyph displays the relative number of variables dra-
matically as the polygon changes from a triangle to a pentagon to a virtual circle with
seventeen variables. Yet the figure still allows the user to identify individual variables
easily. The use of different line styles or colors is also effective in this setting. The pat-
tern of blue, green and red rays in addition to the varying ray-length lends itself to effec-
tive pattern recognition among related variables. For black and white presentation, it is
suggested that the use of varying line styles is a good substitute for color.
Because the initial application of the polyplot was as an SPC tool, the characteristics of
the glyph were designed to relate multivariate quality control information. The sequenc-
ing of the glyphs displays the production narrative over time, while the number of verti-
ces informs the reader of the number of variables being tracked. The length of each vari-
able's arm away from the polygon is significant information. That is, when a variable is
"in control" near its mean value, the arm length is very short.

Figure 1.7 Three variable polyplot    Figure 1.8 Five variable polyplot

Figure 1.9 Seventeen variable polyplot

The dotted line style of the ray, or alternately the blue color, is unobtrusive. However,
when the variable is "out of control", the arm lengthens and the line becomes solid or
changes color, thus calling attention to itself visually. The glyphs are arranged in a time
sequence in a left to right and top to bottom order, the natural reading sequence in the
western world. Also, the use of differing line styles or colors emphasizes the points that
SPC personnel are most interested in studying. The shift from dotted to dashed (or blue
to green) occurs when the variable is approaching an "out-of-control" condition; the
change to a solid (or red) line occurs when that condition is reached. The experience has
been that the line pattern or color in combination with the different arm lengths identifies
many relationships which can otherwise be overlooked. The use of patterned lines or
color therefore efficiently relates the scale without using "non-data" ink.

1.2.2.1 An application example for Statistical Process Control (SPC)

Polyplots are very useful in an SPC environment, especially when multivariate control
charting is appropriate. Univariate control charts can be misleading even when the corre-
lation between variables is moderate. Bivariate control charts limit the study to pairs of
variables and lack time sequence information (Himmelblau, 1978). In these SPC applica-
tions, the number of individual observations used to calculate the standard error and av-
erage for each glyph is under user control. The rays of each glyph correspond to uni-
variate X-bar chart entries. Each polyplot, then, represents the condition of the process
for a given unit of time. By reading consecutive glyphs in normal order, left to right and
top to bottom, the sequential history of a process is narrated. Finally, the bar to the left
of each glyph represents the relative value of Hotelling's $T^2$ for the data displayed by
each glyph. Charting Hotelling's $T^2$ provides an assessment of multivariate control in
time sequence, but usually one would like to combine the multivariate control
information with information about the univariate behavior. By including $T^2$ information
in a polyplot, one can compactly convey both the multivariate and univariate control
information (Tracy et al., 1992). The following production quality control example, taken
from Blazek, (1987), illustrates this idea.
In Fig. 1.10 the values of $T^2$ associated with vector subgroup averages are represented
by the height of a vertical bar plotted to the left of each glyph. The critical value for $T^2$ is
the hash mark above the $T^2$ bar. When the $T^2$ value becomes significant and crosses the
hash mark, the bar changes line style from dots to solid. Therefore, when $T^2$ exceeds a
control limit, one not only knows that the process is out of control, but also which com-
bination of univariate signals led to the event. Note that this implementation efficiently
displays multivariate and univariate control information simultaneously. Figure 1.10
displays six measurements from 50 coils produced at a particular plant. The
measurements are identified as Variable 1 through Variable 6. Each glyph represents the
composite statistic on two consecutive coils.
The most interesting variable is Variable 1. After producing four good coils (the first two
glyphs), Variable 1 went out of control when the control system lost its calibration. As a
result the next 13 coils were out of control in a positive direction. During this time
Variables 2 and 4 were also affected and went out of control at several points. The con-
trol system was tuned between coils 17 and 18 (and again after coil 20, as it was overcor-
rected the first time), and good coils were produced until coil 36 (glyph 18). Then,
Variable 1 went out of control in a negative direction until the system was repaired
again. During this time Variables 2, 4 and 5 also showed out-of-control values. Normal
processing then resumed through the end of these records.

Figure 1.10 Six variable polyplot with Hotelling's $T^2$ of production data,
2 observations per glyph. (Legend: bars show $T^2$ with its critical value marked;
rays 1-6 correspond to Variables 1-6.)

With this technique, both the ray and the Hotelling "thermometer" visually indicate when
the system went out of control and why. The polyplot also demonstrates the process
relationship between Variables 2, 4 and 5 and Variable 1. The nature of the process is
such that only certain types of Variable 1 problems also manifest themselves in the other
variables. Therefore, a conventional analysis of the data would probably not correlate the
two. With this graphical hint and some process knowledge, conjectures about the actual
relationship can be developed.

1.2.2.2 Conclusions

A fault detection scheme using control charts involves:


(i) sampIe collection,
(ü) definition of an appropriate fault signature, called statistic in the present context,
(iii) adecision rule that discriminates between normal operation and faulty operation.
From the above discussion and Fig. l.6 the warning limits and action limits are the
decision rules used for fault detection. As it is mentioned earlier, the fault signature could
be the sampIe mean, range, standard deviation etc. Thus the design of the fault detection
scheme involves:
a) SpecifYing the sampIe size and sampling frequency.
b) Calculating the appropriate statistic.
c) Specifying the warning and action limits.
Polyplots are glyphs which display multivariate and univariate data simultaneously over
time. Bach glyph, a regular polygon with as many vertices as the number of variables of
interest, represents one time intervaI. By locating the mean at the vertex and using rays
from the vertex to indicate magnitude and direction, the polyplot becomes a univariate
control chart for every variable.
An advantage of the polyplot is that it can be implemented in a variety of forms. For in-
stance, in an SPC application, consider a quality measurement y modeled as a constant
plus a linear combination of k components x1, x2, ..., xk. These may represent a set of key
process variables which have the greatest impact on y, and the main effects of these vari-
ables dominate the variation in y. A polyplot of the xs with the height of the glyph pro-
portional to the value of y could provide valuable real-time process troubleshooting in-
formation.
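For the multivariate side, the sketch below (again hypothetical: the in-control mean and covariance are estimated from reference data, and a chi-square approximation is used for the control limit) computes the Hotelling T² statistic that drives the "thermometer" described above:

import numpy as np
from scipy import stats

def hotelling_t2(X_ref, x_new):
    # T^2 = (x - mu)' S^(-1) (x - mu), mu and S from in-control reference data
    mu = X_ref.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X_ref, rowvar=False))
    d = x_new - mu
    return float(d @ S_inv @ d)

rng = np.random.default_rng(1)
X_ref = rng.normal(size=(200, 6))            # six monitored variables, in control
x_new = X_ref.mean(axis=0) + np.array([5.0, 0, 0, 0, 0, 0])   # Variable 1 shifted
t2 = hotelling_t2(X_ref, x_new)
crit = stats.chi2.ppf(0.997, df=6)           # approximate 99.7% control limit
print(f"T2 = {t2:.1f}, limit = {crit:.1f}, alarm = {t2 > crit}")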
The real power of SPC should be brought to bear to involve all employees in studying
their own processes to continuously improve them. This must involve not only manufac-
turing but marketing, sales, research and development, safety and environment, account-
ing, procurement, and so forth. As a general principle, one must consider automating
routine, highly repetitive tasks, freeing the operators to use their intelligence for higher
level problem-solving and prevention. The real future of computer technology in manu-
facturing and process monitoring is not in automating to replace workers but rather in
"informating", that is, in providing workers with unprecedented levels of knowledge
about the process, enabling them to make critical, collaborative decisions on the operat-
ing floor.

1.3 Fault diagnosis based on signal analysis instrumentation

Machinery condition monitoring is gaining increasing general acceptance throughout in-


dustry as a means of achieving reduced maintenance costs, increased machine reliability
and increased safety standards. The correct and efficient application of condition moni-
toring to rotating machinery should enable plant and maintenance engineers to predict
the major events affecting machine mechanical condition (i.e. bearing failures) and plan
corrective action in advance. This will reduce or avoid the amount of routine mainte-
nance work undertaken at fixed time intervals.
An effective condition monitoring programme will usually offer significant economic ad-
vantages over routine or breakdown maintenance by providing advanced warning of
maintenance requirements. These can then be incorporated into a scheduled work pro-
gramme to reduce down-time and lost production.
The major concern of this section is to present methods and instruments to analyse vi-
bration or acoustical signals generated by a machine to reveal its operating condition.
The vibrations or the noise produced by mechanism forces can be used to reveal faults in
the mechanism itself or some change in the vibration path. The fault revealed may be an
actual malfunction or perhaps only a change in an operating parameter of the machine. In
the latter case, the operating condition may be changed by a control system, but if a fail-
ure is detected, the machine probably needs to be shut down. Changes in the vibratory
path may signal the need for the replacement or repair of structural elements. The use of
vibration or noise to detect signal changes in mechanisms or structures is termed by
Lyon, (1987) and Baines, (1987), machinery diagnostics or machine health monitoring.

1.3.1 Machine health monitoring methods

The success of a condition monitoring (CM) system is as dependent on its planning and
design as on the sensors and signal analysis techniques used. Before the sensor fit and
techniques can be practically decided, various features should be considered by the op-
erator to define the requirements:
• Which items of machinery should be monitored ?
• What sort of faults should the system detect ?
• What level of diagnostic/prognostic information is required ?
Generally the answers indicate that the monitoring strategy should aim at reducing the
number and severity of failure incidents between overhauls (which have high consequen-
tial costs in terms of damage and loss of availability) and at increasing the prognostic ca-
pability so that maintenance can be planned effectively. The high plant availability and
low maintenance cost requirements demanded in the current economic climate necessi-
tate efficient, cost-effective monitoring systems. Their performance can be assessed by
the criteria:

• Diagnostic and prognostic capability.


• Maximum information from minimum sensor fit.
• Low false-alarm rate.
• Low missed-alarm rate.
In general the equipment to be monitored must be considered as a number of separate
machinery components, and it is essential to identify the items which experience most
faults and therefore cause most loss of revenue: these often vary between different instal-
lations. In many cases it is the auxiliary equipment which causes the largest reduction in
machine availability, and it therefore needs to be incorporated into the condition
monitoring system design.
Identification of the items to be included in the CM system and the level of monitoring to
be applied can be assessed by careful consideration of the plant operation and its fault
history; this process is worthwhile, but it will not be dealt with in great detail here.
Many current CM systems are derived from control and alarm instrumentation designed
for a supervisory rather than a monitoring capability. This sort of system uses a limited
number of sensors in key positions, generally on the main pieces of equipment, to moni-
tor simple parameters for alarm level crossings.
The conventional monitoring techniques which will be presented in the following, and in
particular traditional vibration analysis methods, are essentially energy methods and
therefore only detect faults that generate significant changes in energy. Many
faults only generate sufficient energy to trigger the alarm in the later stages of develop-
ment, so that they are not detected until significant and costly damage has occurred.
Efficient extraction of fault signatures from sensory data is a major concern in fault diag-
nosis. A general self-tuning method of fault signature extraction that enhances fault de-
tection, minimizes false alarms, improves diagnosability, and reduces fault signature vari-
ability will also be briefly presented.
Vibration analysis.
Machine operation involves the generation of forces and motions that produce vibra-
tions. These generating events are called sources. When vibration is measured from a
transducer mounted on the casing of a machine, what is actually measured is the original
force signal from the source of the signal, modified by the characteristics of the trans-
mission path from the source to the measurement point. Expressed in terms of frequency
this modification is a multiplication by the mobility of the transmission path.
A developing fault in a machine will show up as increasing vibration at a frequency as-
sociated with the fault. However, the fault might be well developed before it affects the
overall vibration level. A frequency analysis of the vibration, on the other hand, will give a
much earlier warning of the fault, since it is selective, and will allow the increasing vibra-
tion at the frequency associated with the fault to be identified (see fig. 1.11).

[Figure 1.11 Frequency analyzed results give earlier warning: vibration level vs. frequency, with the overall level indicated.]

The vibration from a rotating machine varies as parts wear out and are replaced.
However, this variation is over such a long period that the signal can usually be regarded
as being stationary. Truly stationary signals do not really exist in nature. Non-stationary
signals can exhibit a short-term or a long-term variation. For instance, vibration from a
reciprocating machine is stationary when regarded over several cycles, but over a single
cycle, which consists of several transients, it is non-stationary. Vibration from a machine
which is running up or down in speed, however, is non-stationary on a long term basis.
Stationary deterministic signals show the well-known line spectra. When the spectral
lines show a harmonic relationship, the signal is described as being periodic. An example
of a periodic signal is vibration from a rotating shaft. Where no harmonic relationship
exists, the signal is described as being quasi-periodic. An example of a quasi-periodic
signal is vibration from a turbojet engine, where the vibration signals from the two or
more shafts rotating at different frequencies produce different harmonic series bearing no
relationship to each other (Lyon, 1987; Randall).
The well known Fourier Transform (Lyon, 1987) gives the mathematical connection be-
tween time and frequency, and vice versa, and given a time signal allows calculation of
the spectrum. The Fast Fourier Transform (FFT), see Appendix 1.B, is merely an effi-
cient means of calculating the discrete form of the Fourier Transform (DFT).
A deterministic signal can be analyzed by stepping or sweeping a filter across the fre-
quency span of interest and measuring the power transmitted in each frequency band. A
random signal is a continuous signal whose properties can only be described using statis-
tical parameters. Examples of random signals are cavitation and turbulence. Random sig-
nals produce continuous spectra. Since random signals have continuous spectra, the
amount of power transmitted by the analyzing filter will depend on the filter bandwidth.
In all frequency analysis, there is a bandwidth-time limitation. When using a filter, it
shows up as the response time of the filter. A filter having a bandwidth of B (Hz) will take
approximately 1/B seconds to respond to a signal applied to its input. If the analyzing
filter is B (Hz) wide, one has to wait at least 1/B seconds for a measurement. After
filtration, the filter output must be detected. One can detect the peak level passed by the
filter, the average level, the mean square level, or the root mean square level. Mean
square or root mean square detection is used, since it relates to the energy or power
content of the signal independent of the phase relationships. Peak detection is relevant
when maximum excursions are important. Mean square and root mean square detection
require that the output of the analyzing filter be squared and averaged. The period over
which the square of the filter output is averaged is called the averaging time, $T_A$.
With random signals, averaging is used to reduce the standard deviation, $\sigma$, of the measured estimate. For a mean square measurement, then:
$$\sigma = \frac{1}{\sqrt{B\,T_A}}$$
where $B$ is the analyzing filter bandwidth and $T_A$ is the averaging time. For a root mean
square measurement:
$$\sigma = \frac{1}{2\sqrt{B\,T_A}}$$
The above assumes that $B\,T_A \geq 10$. When FFT analyzers are used, the $B\,T_A$ product will usually be equal
to $n_d$, the number of averages. However, this will depend on the overlap conditions set.
Overlap is where overlapping time records are analyzed and averaged; 0% overlap means
that only results from statistically independent records are averaged.
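Solving $\sigma = 1/\sqrt{B\,T_A}$ for the averaging time gives $T_A = 1/(B\sigma^2)$; the following small Python illustration tabulates the averaging time needed for a given relative standard deviation:

def averaging_time(bandwidth_hz, target_sigma):
    # T_A = 1 / (B * sigma^2), from sigma = 1 / sqrt(B * T_A)
    return 1.0 / (bandwidth_hz * target_sigma ** 2)

B = 10.0                            # analyzing filter bandwidth, Hz
for s in (0.10, 0.05, 0.01):        # desired relative standard deviations
    T_A = averaging_time(B, s)
    print(f"sigma = {s:.2f} -> T_A = {T_A:8.1f} s (BT product = {B * T_A:.0f})")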
As a consequence of the Central Limit Theorem, it can be assumed that any narrow band
filtered random signal follows a Gaussian distribution. Hence, from the properties of the
Gaussian distribution, there is a 68.3% chance of being within $\pm\sigma$, a 95.5% chance of
being within $\pm 2\sigma$, and a 99.7% chance of being within $\pm 3\sigma$ of the true mean value of the
signal.
Many FFT analyzers can also average signals in the time domain. Time domain averaging
can be used with repetitive signals, for instance, repeated transients or vibration from
rotating machines, to suppress extraneous noise. However, there must be a trigger signal
synchronous with the signal being averaged. The amount of noise suppression (for ran-
dom noise) which can be achieved with time domain averaging is equal to $1/\sqrt{n_d}$,
where $n_d$ is the number of time domain records averaged. Time domain averaging is also
called signal enhancement, or synchronous averaging.
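A minimal Python sketch (hypothetical signal) demonstrating the $1/\sqrt{n_d}$ noise suppression of synchronous averaging:

import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 1024, endpoint=False)
signal = np.sin(2 * np.pi * 50 * t)                        # trigger-synchronous part
records = signal + rng.normal(0, 1.0, size=(256, t.size))  # 256 noisy records

for n_d in (1, 16, 256):
    avg = records[:n_d].mean(axis=0)                 # time domain average
    residual = np.std(avg - signal)                  # remaining random noise
    print(f"n_d = {n_d:3d}: residual RMS = {residual:.3f}"
          f" (expected ~ {1 / np.sqrt(n_d):.3f})")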
In vibration measurements, external measurements of internal effects must be made.
However, the transmission path characteristics from the source of vibration to the meas-
uring point will vary from machine to machine, even if the machines are of the same de-
sign and construction. This is due to differences in castings, welds, tightness of bolts,
etc. Even in a single machine, the transmission path characteristics will vary with fre-
quency.

Small, insignificant components can be amplified by resonances, and large, significant
components damped by anti-resonances. Hence, it is essential to measure the spectra
over a large dynamic range, because the largest spectral components are not necessarily
the most significant. What can be measured on a machine is the change in vibration, and
this will be transferable from machine to machine. If the system is linear, a relative
change in the vibration at the source will give the same relative change in the vibration at
the measuring point. Hence it is the relative change which is important. When relative
changes are important, it is often convenient to express them in dB:
$$\text{Change in dB} = 20\log_{10}\frac{A_1}{A_2}$$
where $A_1$ is the present level and $A_2$ the previous level. The same relative change will
give the same change in dB, independent of the absolute levels measured. The absolute
levels themselves, however, will depend on the transmission path characteristics. This
can be extended to making all measurements in dB refer to a common reference.
Changes in vibration levels can then be conveniently plotted simply by subtracting the
previous vibration level in dB from the present level. The two most commonly used
methods of presenting data in the frequency domain are constant bandwidth on a linear
frequency scale, and constant relative bandwidth on a logarithmic frequency scale. The
two methods have their own different applications. The former gives equal resolution
along the frequency axis, making it easier to identify such things as families of harmonics
and sidebands. Its limitation is that it can only be used across a frequency range of about
1 and 1/2 decades. The latter can be used across a broader frequency range (3 or 4
decades is typical). Its drawback is that the resolution gets progressively worse at higher
frequencies.
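A small Python illustration of this dB bookkeeping (arbitrary example numbers):

import numpy as np

def level_change_db(present, previous):
    # same relative change -> same dB change, whatever the absolute levels
    return 20.0 * np.log10(np.asarray(present) / np.asarray(previous))

print(level_change_db(2.0, 1.0))    # factor 2   -> ~6.0 dB
print(level_change_db(0.5, 0.2))    # factor 2.5 -> ~8.0 dB (cf. VDI 2056 classes)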
In fault detection, it is necessary to use a broad frequency range in order that all machine
faults can be detected. Typically this requires a range from under half the slowest shaft
speed to more than three times the highest toothmeshing frequency. Also, the possibility
of easy speed compensation is desirable, since machine speeds will vary from measure-
ment to measurement. Both these requirements are fulfilled by constant relative band-
width analysis on a logarithmic frequency axis. Once again, the spectrum should be plot-
ted with logarithmic amplitude.
The most basic level of vibration measurement is to measure the overall vibration level
on a broadband basis in a range of, for example, 10-1000 Hz or 10-10000 Hz. Such
measurements are also relevant with displacement measurements from proximity probes,
where the frequency band of interest is usually from about 30% of the running speed up
to about the 4th harmonic. An increasing vibration level is an indicator of deteriorating
machine condition. Trend analysis involves plotting the vibration level as a function of
time, and using this to predict when the machine must be shut down for repair. Another
way of using the measurements is to compare them with published vibration criteria.

One example of a published vibration criterion chart is the General Machinery Criterion
Chart, which is for displacement or velocity measurements at the bearing cap. In this
chart there is always a factor of two involved in movement from one class to the next,
that is a constant interval of 6 dB, and logarithmic axes are employed (Mitchell, 1981).
Another example is VDI 2056, from Germany, shown in fig. 1.12. It is for measurements
at the bearings of the machine of interest in a frequency range of 10 Hz to 1000 Hz.

153 45
tI)
....... 149 28 Not pcrmissiblc
~ 145 18 Not pcnni..ible
Not pcrmisliblc

IQ
10

...-!
141 tI) 11.2 20 dB (x 10)
.......
4-1 137 Si 7.1 Jot tole""'l.
Q)
Si
.....
~ 133 4.5 Just tolerable
>t
.j..J
~ 129 2.8 Allowablc
'''; Just tolcrable:
't1 u
:> 125 0
....-t
U Aflowable

:>! 121 Cl! 1.12 Allowabl.


Good
.j..J
.r-! :> Good
U 117 U)
0.71
0 ~
....-t ~ Good ZArge macbiDes witIJ
113 0.45
Cl! rigid lUId lJ..vy (0""-
:> 109
Medium m"chmes
15 - 75kWorup to
dalioDs dose
lJJImnd JTeqUlmcy
00 0.28 300 kW OJI special
SmtdJ madJiJJes. up aceeds m.chiDe
1015kW fOUll,u1ioDs
~
speed
lOS 0.18
Gronp x: GrDnpM GronpG

1;1
Max.15 kW
15-75 kW
(300kW) >75kW

Figure 1.12 Vibration Criterion Chart (from VDI 2056).

This chart differentiates between vibration classes according to machine size. Note again
the logarithmic velocity axis and the constant width of the allowable and just tolerable
classes (a change in vibration level by a factor of 2.5, or 8 dB), independent of machine
size. This again emphasizes that it is the change in vibration level which is important, and
when plotting changes, it is most logical to use a logarithmic axis.
Although overall vibration measurement provides a good starting point for fault detection,
far more information can be obtained when a frequency analysis is employed, see fig.
1.13. Firstly, a frequency analysis will usually give far earlier indication of the develop-
ment of the fault, and secondly, the frequency information can be used to diagnose the
fault, allowing spare parts to be bought in, etc.
[Figure 1.13 Benefits of frequency analysis for fault detection: (1) early fault detection = early warning (a single component grows in the spectrum before the overall level responds); (2) trend analysis = determination of the date of breakdown (overall level and spectrum trends measured against a limit to predict the shut-down date).]

The objective of frequency analysis is to break down the vibration signal into its compo-
nents at various frequencies. It is used in machine health monitoring because a machine
running in good condition has a stable vibration spectrum. As parts wear and faults de-
velop, however, the vibration spectrum changes. Since each component in the vibration
spectrum can be related to a specific source inside the machine (e.g. unbalanced masses,
toothmeshing frequency, blade pass frequency resonances), this then allows diagnosis of
the fault.
The basis of fault diagnosis is that different faults in a machine will manifest themselves
at different frequencies in the vibration spectrum, as can be seen in fig. 1.14. The fre-
quency domain information can then be related to periodic events in gears, bearings, etc.
Note that fault diagnosis depends on having a knowledge of the machine in question, that
is the shaft frequencies, toothmeshing frequencies, number of teeth on gears, bearing ge-
ometries, etc.
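A minimal Python sketch of such spectrum-based detection (hypothetical signals and an arbitrary 10 dB alarm margin): the current spectrum, in dB, is compared against a baseline recorded in good condition.

import numpy as np
from scipy.signal import welch

fs = 10_000                                   # sampling rate, Hz (assumed)
rng = np.random.default_rng(3)
t = np.arange(fs) / fs
baseline = np.sin(2 * np.pi * 120 * t) + 0.05 * rng.normal(size=fs)
current = baseline + 0.4 * np.sin(2 * np.pi * 360 * t)   # a growing 3rd harmonic

f, P_ref = welch(baseline, fs, nperseg=4096)
_, P_now = welch(current, fs, nperseg=4096)
change_db = 10 * np.log10(P_now / P_ref)      # spectrum change relative to baseline
alarms = f[change_db > 10]                    # lines grown by more than 10 dB
print("suspicious frequencies (Hz):", np.round(alarms[:5], 1))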

[Figure 1.14 Typical machine "signature".]

Two of the most common faults associated with rotating shafts are unbalance and mis-
alignment. Unbalance produces a component at the rotational frequency of the shaft,
mainly in the radial direction. A misaligned coupling, however, will produce a component
at the rotational frequency, plus usually its lower harmonics, both in the axial and radial
directions. Misaligned bearings produce a similar symptom, except that the higher har-
monics also tend to be excited. A bent shaft is just another form of misalignment, and
will produce vibration at the rotation frequency and usually its lower harmonics. Finally,
a cracked shaft produces an increase in the vibration at the rotational frequency and the
second harmonic (Cue, 1990; Mitchell, 1981).
Fig. 1.15, taken from Randall, shows an example of the effect of misalignment in a
gearbox. Both the low speed (50 Hz) and high speed (85 Hz) shafts are originally mis-
aligned. After repair, the 50, 85 and 170 Hz components are considerably reduced. The
100 Hz component, however, remains more or less at the same level, which might appear
strange until it is realised that it is not only the second harmonic of the shaft speed, but
also the second harmonic of the mains frequency (2-pole synchronous motor). This is a
common electromagnetic source of vibration. Note that the higher noise level in the up-
per spectrum is because it was originally recorded as acceleration and integrated to ve-
locity on playback.
Magnetically induced vibration is an important source of vibration in electrical machines.
One source is the rotating magnetic field, which causes alternating forces in the stator.
Since there are symmetrical conditions for a north or south pole, this gives rise to vibra-
tion at twice the mains frequency, or the "pole passing frequency". Note that in electrical
machines the force is proportional to the current squared, that is, the vibration is highly
load dependent. In induction motors, the rotational frequency will usually be slightly less
than the synchronous frequency. For instance, fig. 1.16 shows the vibration spectra for
an induction motor. The lower of the two is a detailed analysis obtained by non-destruc-
tive zoom, and shows that the high 100 Hz component is electromagnetic in origin rather
than from misalignment.

[Figure 1.15 Effect of misalignment in gearbox: velocity spectra (0-500 Hz) before repair, with prominent 50 Hz, 85 Hz and 170 Hz components, and after repair, where these components are considerably reduced while the 100 Hz component remains at much the same level.]


Vibration transducers
Measurement and analysis of vibration requires first that a vibration transducer be used
to convert the mechanical vibration signal into an electrical form. Various types of vibra-
tion transducers exist. Proximity probes are used to sense displacement, velocity probes
to sense velocity, and accelerometers to sense acceleration.
Displacement, velocity and acceleration are interrelated parameters. Displacement can be
differentiated to produce velocity, and velocity can be differentiated to produce accel-
eration. Likewise, it is possible to integrate from acceleration to velocity and from veloc-
ity to displacement. In choosing which parameter to measure, it is usual to choose
whichever parameter gives the flattest spectrum, so as to maximize the use of the dy-
namic range of the measuring instrumentation. As a rule of thumb, this will usually be
velocity. Where the velocity spectrum is flat, the displacement spectrum will show a -6
dB/octave slope, and the acceleration spectrum a +6 dB/octave slope. A brief presenta-
tion of the most commonly used vibration measuring transducers is given in the follow-
ing.

[Figure 1.16 Electric motor vibration signature: baseband spectrum (0-800 Hz) with the zoom range marked, and the zoomed spectrum resolving the 100.0 Hz and 99.6 Hz components.]

Mechanical levers measure displacement, see fig. 1.17. They are inexpensive and self-gen-
erating, but limited to low frequency only, sensitive to orientation and prone to wear.
Eddy current (or proximity) probes measure displacement, see fig. 1.18. There are no
moving parts and contacts, resulting in no wear, but variations in the magnetic properties
of the shaft give erroneous signal components.
When a force is applied to a piezoelectric material in the direction of its polarisation, an
electric charge is developed between its surfaces, giving rise to a potential difference on
the output terminals. The charge (and voltage) is proportional to the force applied. The
same phenomenon will occur if the force is applied to the material in the shear mode. Both
modes are used in practical accelerometer design.

Figure 1.17 Mechanical levers. Figure 1.18 Proximity probe.

Accelerometers (compression type or shear type) measure acceleration, see fig. 1.19.
Usually they have no moving parts, so there is no wear, and they have a very large dynamic
range and a wide frequency range, making them suitable for most applications.
Noise analysis.
Up to now we have studied how vibration is transmitted through a machine to its outer
surfaces. In the following we consider how that vibration is converted into sound.
Sound radiation is inherently a complicated process (Lyon, 1987). It turns out, however,
that some fairly simple geometrical and dynamical parameters control sound radiation.
These parameters allow reasonably good estimates of sound radiation to be made. More
specifically, Lyon, (1987), has shown that the sound power radiated by a vibrating
machine structure is proportional to the space-time mean square vibration velocity.
Figure 1.19 Accelerometer.



An example of a sound source is the noise produced by an air jet when it impinges on a
rigid obstacle such as a fan blade. When the turbulent flow produces forces on an obsta-
cle, then, by Newton's law of reaction, the obstacle puts forces back on the fluid in the
form of fluctuating lift and drag, resulting in sound radiation. Large-scale motions asso-
ciated with structural vibration are usually much more efficient in radiating sound.
Impacting forces also produce a broad spectrum of vibration in the machine, and this
represents another source of sound radiation. Generally, the sound energy produced by
vibration will be greater, particularly for large machines that are resonant. For example,
although there is direct sound radiation due to the deceleration of the impacting elements
in a punch press, the major amount of sound usually comes from the impact-induced vi-
bration and its subsequent radiation.
The ability of multi-channel FFT analyzers (see Appendix 1.B) and other analyzers using
digital filters to quickly and accurately compute the cross spectrum between microphone
signals has been the basis for the very rapid growth in using acoustical intensity meas-
urements to determine the sound power radiated by machines. The usual measurement
procedure is to surround the machine with a fixed array of microphone probes or a trav-
erse setup that sweeps over an area surrounding the machine.
Identification and ranking of the noise sources is essential for both new and existing in-
stallations. Only the sources which are contributing to the excessive noise levels need to
be treated.
Frequently a trial-and-error approach is used. Dominant sources are identified from far-
field sound pressure measurements by comparing far-field noise spectra with near-field
spectra of probable sources. It is very difficult however to distinguish between spectra of
sources when many sources exist in the near-field. It is also difficult to know how much
to silence a source and to know whether all the important sources are identified. Often a
major source is treated, lowering the near-field noise but reducing the far-field levels only
marginally because other sources start to dominate. It is also important that suppliers
provide suitable noise data on their products.
The use of sound intensity techniques will provide better sound power information by
determining sound power levels of individual sources without subjecting bulky equip-
ment to the confines of anechoic or reverberant chambers. These sound power levels can
be calculated from intensity measurements taken in situ in the presence of many sources.
Using sound powers and correcting for directivity, distance and excess environmental
attenuation, a mathematical model can be generated to determine the effect of the major
sources on far-field sound pressure levels. This model allows the ranking of the sources
in order of importance and provides a means to predict the impact of a noise abatement
programme. The results will be used to predict sources at other similar plants and to
provide information to suppliers to enable them to improve machine package design
(Laws, (1987); Cue, (1990)).
Sound power can be reliably calculated from sound pressure levels in a controlled envi-
ronment, or in the free-field where sources do not interfere with one another. If ambient
noise levels are high and the sound field is reactive, however, only sound intensity meas-
urements will enable calculation of accurate sound power levels.
Sound intensity is the sound energy flux, a vector quantity describing the magnitude and
direction of the net flow of acoustic energy. The dimensions commonly used for sound
intensity are therefore W/m². By taking ten times the logarithm of the ratio of the sound
intensity to a reference value (10⁻¹² W/m²), the sound intensity level can be expressed in
decibels.
The integral of the sound intensity over a surface is the sound power passing through the
surface. The sound intensity and sound power levels can be expressed in terms of octave
bands, third-octave bands, or overall noise level over any frequency range.
There are different instrumentation packages on the market that can measure sound in-
tensity levels. The instrumentation usually consists of a pair of microphones in conjunc-
tion with either a dual channel Fast Fourier Transform (FFT) signal analyser or a real
time sound intensity analyser (see Randall). For continuous level noise sources such as
gas turbines, both types of analyser will give similar results. For the study of an unsteady
source such as a jack hammer, a real-time instrument should be used to capture the peak
levels.
Sound intensity techniques have a number of inherent limitations, such as bias errors re-
sulting from the finite pressure difference approximation for particle velocity, phase mis-
match errors due to phase differences in the microphones and analyser channels, and re-
activity errors resulting from phase mismatch of both the equipment and the measure-
ment surface. The bias errors limit accuracy at the higher frequencies, while the phase
mismatch errors limit the lower frequency capabilities. Reactivity errors can result at
any frequency, depending on the location and sound power levels of extraneous sources
and the distance between the microphones. The microphone spacing should be selected
correctly for the frequency range of interest, in order to minimize the amount of error.
These errors are fully discussed in the literature (Randall and Mitchell, 1981).
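As an illustration of the cross-spectral formulation behind such instruments, the sketch below uses the common two-microphone estimate $I(f) = -\mathrm{Im}\{G_{12}(f)\}/(2\pi f\,\rho\,\Delta r)$ (one standard formulation, not any particular analyser's algorithm) on hypothetical pressure signals:

import numpy as np
from scipy.signal import csd

rho = 1.21                # air density, kg/m^3
dr = 0.012                # microphone spacing, m (12 mm spacer)
fs = 25_000

rng = np.random.default_rng(4)
p1 = rng.normal(size=fs)            # hypothetical pressure at microphone 1
p2 = np.roll(p1, 2)                 # p1 delayed: a wave passing from mic 1 to mic 2

f, G12 = csd(p1, p2, fs, nperseg=2048)
with np.errstate(divide="ignore", invalid="ignore"):
    intensity = -np.imag(G12) / (2 * np.pi * f * rho * dr)   # W/m^2 per line
k = np.argmin(np.abs(f - 1000))
print("intensity sign near 1 kHz (positive = flow from mic 1 to mic 2):",
      np.sign(intensity[k]))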
In addition to sound power levels, an important factor in determining the effect of a
source on the far-field is the directivity of the source. There are two components of
directivity which can be described as directivity factors: the directivity which a source
would exhibit if it was operating in an anechoic chamber or in the air without any reflec-
tive surfaces ($Q_A$), and the directivity effect upon a source due to reflective surfaces,
which can be termed spatial directivity ($Q_S$). Such sources as exhaust ducts, air-intake
ducts and vents radiate sound non-uniformly even if there are no reflective surfaces. The
spatial directivity factor accounts for reflections from such items as the ground and walls.
The spatial directivity factors for spherical, hemispherical and quarter-spherical propaga-
tion are one, two and four respectively. The total directivity factor $Q_\theta$ is defined as the
product of $Q_A$ and $Q_S$, and this total directivity factor is translated into a directivity index
($DI_\theta$), expressed in decibels, by the following equation:
$$DI_\theta = 10\log Q_\theta = 10\log Q_A + 10\log Q_S$$

Sound intensity measurements for any particular source will use one of three types of
control surface: conformal (conforming to the shape of the object), hemisphere (a hemi-
spherical "cover" placed over the source), box (a box-shaped "cover" placed over the
source), or a combination of these. The control surfaces are determined using coordi-
nates relative to the object of interest. The physical size of the majority of the sources
examined in the case study of fig. 1.20 dictated the use of a box technique. This tech-
nique is best explained by describing sound intensity measurements over one of the
sources investigated - an inertial air filter extraction fan. A box shape was constructed
over the fan as shown in fig. 1.20 and the area of each of the five open sides was deter-
mined. The sixth side of the box was covered by the steel plate of the filter house. Since
the fan and duct did not radiate sound uniformly through each side of the box, the sound
intensity needed to be measured for each of the five sides. Before taking readings of the
average sound intensity for the box, the choice was made between the use of a grid or a
sweeping technique.
The grid technique involves constructing a real or imaginary grid of equal-area shapes
over a surface and taking sound intensity measurements at the centres of each of these
shapes. The grid size should be small enough so that the intensity does not vary greatly
throughout the shape. An identical grid is set up within the sound intensity computer
program (Ikeuchi, 1988) and measurements are then taken systematically.

[Figure 1.20 Extraction fan control surface: an imaginary box constructed over the fan, with five open measurement sides; the sixth side is closed by the steel plate of the filter house.]

The operator ensures that the probe is in the correct location in the centre of the shape
and perpendicular to the box surface. Once all measurement points are stored, the
computer program will display the sound intensity results in tabular or graphical form.
The sound power for each of the grid areas, as well as the total sound power of the
complete surface, is calculated by the computer.
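A minimal Python sketch (hypothetical intensity and area values) of the underlying computation: the sound power is the sum over the open sides of the averaged normal intensity times the side area, and the sound power level is referred to $10^{-12}$ W:

import math

# (average normal intensity in W/m^2, side area in m^2) for the five open sides
sides = [(2.1e-4, 1.8), (1.5e-4, 1.8), (3.0e-4, 1.2), (0.9e-4, 1.2), (2.4e-4, 2.2)]

W = sum(I * A for I, A in sides)          # sound power through the box, W
L_W = 10 * math.log10(W / 1e-12)          # sound power level re 1 pW, dB
print(f"W = {W * 1e3:.2f} mW, L_W = {L_W:.1f} dB")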

The sweeping technique is a space-averaging process taken across a complete side. The
probe is kept at right angles to the surface and swept uniformly across the surface while
the analyser averages the sound intensity.
The sweeping technique was used for each of the five sides of the imaginary box over the
extraction fan, using 250 averages over each of the sides or sectors. As described before,
the frequency range over which measurements are taken results in various values of bias,
phase mismatch and reactivity errors. These errors can be minimized by selection of the
proper combination of frequency range and space between the probe microphones (see
the case studies below).
Fault signature extraction.
A new general approach to the statistical development of diagnostic models is the use of
nonparametric pattern classification techniques, so as not to require knowledge of the
probabilistic structure of the system. Recently, Chin and Danai, (1991), introduced a
nonparametric pattern classification method with a fast learning algorithm based on
diagnostic error feedback that enables it to estimate its diagnostic model from a small
number of measurement-fault data.
This method utilizes a multi-valued influence matrix (MVIM) as its diagnostic model
and relies on a simple diagnostic strategy ideally suited to on-line diagnosis. The MVIM
method can also assess the diagnosability of the system and the variability of fault signatures,
which can be used as the basis for sensor selection and optimization.

[Block diagram: processed measurements enter the flagging unit, whose output feeds the multi-valued influence matrix (MVIM) diagnostic model.]
The figure above illustrates the various stages of fault signature extraction for improved diagnosis.
In the flagging unit, the processed measurements are first flagged by
thresholds and then filtered by a single-layer network. A sample batch of measurement-
fault vectors is used to tune the flagging unit through iterative learning using a
nonparametric pattern classification method. Once all the measurement vectors in the
sample batch are flagged, the MVIM is estimated to provide the indices for fault
signature variability and system diagnosability. These indices, along with the number of
false alarms and undetected faults, are then fed back to the unit's adaptation algorithm to
tune the unit's parameters in its next adaptation iteration. The parameters of the flagging
unit are tuned iteratively until its performance indices are extremized.
The effectiveness of this scheme is demonstrated by simulation in Chin and Danai,
(1991), to which the reader is referred for the detailed mathematical analysis and

implementation features of the method. The method is also suitable for automatie tool
breakage deteetion in maehining.
System Analysis.
The distinction between signal analysis and system analysis is often made depending on
what can be measured.
In practical analysis situations, there is either a measurable output with no measurable
input, or both a measurable input and output. In the first case, it is only possible to make
a signal analysis, while in the second, because of the presence of information about both
the input and the output, it is possible to make an analysis of both the signals and the system.
In signal analysis the input to the system is usually not measured. This can be due to any
of three reasons. The first is that the input might be inaccessible. A good example of this
is machine health monitoring, where external measurements must be used to monitor in-
ternal effects. The second reason is that it might be impossible to define an individual in-
put, as, for instance, in many environmental noise measurements. The third reason is that
the output might be the only item of interest, as, for instance, in noise dose or whole
body vibration measurements.
In system analysis, measurements are made of both the input and the output of the sys-
tem. It is best to measure the input and output simultaneously (while at the same time
taking account of any system delays), so as to maintain the phase relationships, although
some limited system analysis (measurement of the magnitude of the frequency response
function) is possible using sequential measurements. With system analysis one can obtain
the system properties, which can then be used to predict how the system will behave un-
der various excitations.
System analysis is mostly used as a design tool. However, it also has applications when
systems are installed. It can, for instance, be used to monitor structures such as oil pro-
duction platforms, machine foundations, etc., for faults. It can also be used for determi-
nation of signal sources and signal paths when, for instance, it is necessary to isolate part
of a system from vibration. The classical method of system analysis is to use swept sine
testing. The system is excited with a sine wave, and feedback from the output is used to
hold the input amplitude constant as the sine wave is swept up or down in frequency.
Hence, the amplitude of the sine wave at the output of the system gives the magnitude of
the frequency response and the phase difference between the input and output. The
advantages of swept sine testing are a high signal-to-noise ratio and the possibility of
studying non-linearities. The disadvantage is that it is slow. However, the speed limita-
tion has been largely removed by Time Delay Spectrometry (TDS), where very fast sine
sweeps are used to give results almost in real-time.
Dual-channel digital filtering is rarely used for system analysis although it is potentially a
very powerful tool, since true real-time measurements can be made. It is a very powerful
means of acoustic and vibration intensity measurements.

Dual-channel FFT analysis forms a very powerful and widely used means of system
analysis (Randall and Ikeuchi, 1988). Both the input and output of the system are
measured simultaneously (taking account of system delays). The basic measured data are
the autospectra at the input and output and the cross spectrum between the input and
output, from which many other functions can be calculated. The phase information is
maintained, and the effects of noise can be reduced.
Some advantages of dual-channel FFT analysis are its flexibility and the fact that it is easy to
use. Also, because the input signal to the system need not be controlled, naturally occur-
ring excitations can be used. Finally, since it is a digital form of analysis, the results can
be easily entered into a computer, for example, to carry out a modal analysis.
Dual-channel FFT analyzers are easy to use, but there are several pitfalls that it is neces-
sary to be aware of. Three of them are leakage, the assumption of a linear system,
and compensation for system delays.
Leakage is an effect which occurs because FFT analyzers (both single and dual-channel)
operate on a time-limited signal. The rectangular weighting introduced produces a
(sin x)/x filter characteristic, and power "leaks" from the main lobe to the sidelobes,
meaning that measured peaks can be too low and measured valleys too high. Leakage
can be combated by using higher resolution (zoom), introducing an artificial time
window, or, where the excitation can be controlled, choosing the right excitation (see
Randall).
Linearisation can also be considered an advantage. However, it is important to remember
that the dual-channel FFT analyzers impose linearity, even if the system being measured
is non-linear.
All physical systems exhibit a propagation delay. When a propagation delay becomes
significant, as can frequently happen in mechanical and acoustical (and electrical) sys-
tems, it becomes necessary to compensate for it when making a 2-channel FFT analysis,
otherwise bias errors will be introduced into the results.
For instance, suppose a system has a propagation delay of $\tau$ seconds, and the analyzer
processes data blocks $T$ seconds long. If the analyzer processes simultaneous data blocks
at the input and output, the measured frequency response $\hat{H}$ will be lower than the true
response $H$ by a factor $(1 - \tau/T)$. Likewise, the measured coherence $\gamma^2$ (see below for an
exact definition) will be low by a factor $(1 - \tau/T)^2$. System analysis measurements are
usually based on the Fourier Transforms, see fig. 1.21, of the input and output time sig-
nals a(t) and b(t). The input and output spectra produced ($S_A$ and $S_B$) are two sided, that
is, they exist for both positive and negative frequencies. However, since the time func-
tions are real, $S_A$ and $S_B$ will both be conjugate even (that is, symmetrical in amplitude
about $f = 0$, but opposite in phase), and it is usual to combine the positive and negative
frequency halves to form the single sided spectra $G_A$ and $G_B$, which are zero for negative
frequency.

The basic functions usually used for system analysis are the input and output autospectra
(formerly called the input and output power spectra) and the cross (power) spectrum, i.e.

• input autospectrum: $G_{AA}(f) = \overline{S_A^*(f)\,S_A(f)}$ for $f = 0$, and $2\,\overline{S_A^*(f)\,S_A(f)}$ for $f > 0$

• output autospectrum: $G_{BB}(f) = \overline{S_B^*(f)\,S_B(f)}$ for $f = 0$, and $2\,\overline{S_B^*(f)\,S_B(f)}$ for $f > 0$

• cross spectrum: $G_{AB}(f) = \overline{S_A^*(f)\,S_B(f)}$ for $f = 0$, and $2\,\overline{S_A^*(f)\,S_B(f)}$ for $f > 0$

where the overbar denotes averaging over records.
[Figure 1.21 System analysis measurements: a(t) → H(f) → b(t), with $S_A = F[a(t)]$, $S_B = F[b(t)]$, and the single sided spectrum $G_A(f) = 2S_A(f)$ for $f > 0$, $S_A(f)$ for $f = 0$ and $0$ for $f < 0$ (similarly for $G_B$).]

The input and output autospectra are the squared and averaged input and output spectra.
Note that they contain no phase information. The cross spectrum is the product of the
coherent amplitudes at the input and output and the phase difference between the input
and output. The cross spectrum is the most important function in system analysis, since it
contains the phase information, and since uncorrelated noise at the input and output will
be averaged out in the cross spectrum.
Given the three basic functions, many input/output relationships can be calculated by
taking various combinations of the three and by using Fourier Transforms. The most im-
portant are the system frequency response $H(f)$ and the system impulse response $h(\tau)$. The
impulse response is the well known time response of a system to a delta function, and it
can be calculated by taking the inverse Fourier Transform of the system frequency re-
sponse (see Appendix 1.B). Cross correlation shows whether the input and output
signals are correlated and at what time delays. It can be calculated by taking the inverse
Fourier Transform of the cross spectrum.
Three different methods can be used to measure a frequency response function. The first
method is based on $|H|^2$, the ratio of the output to the input autospectrum, as it would
be measured using a single-channel analyzer.
Two other methods can be used in a dual-channel analyzer. These are the traditional
method, $H_1$, which is the ratio of the cross spectrum to the input autospectrum, and a
newer method, $H_2$, which is the ratio of the output autospectrum to the inverse cross
spectrum, i.e.,
$$H(f) = \frac{B(f)}{A(f)}$$
$$|H(f)|^2 = \frac{G_{BB}(f)}{G_{AA}(f)}$$
$$H_1(f) = \frac{G_{AB}(f)}{G_{AA}(f)}$$
$$H_2(f) = \frac{G_{BB}(f)}{G_{BA}(f)}$$
The three methods will behave differently according to whether there is noise at the in-
put, noise at the output, or noise at both the input and the output. Noise at the output pro-
duces an error in $|H|^2$ and $H_2$. On the other hand, $H_1$ will be unaffected, since it is a
function of the input autospectrum $G_{AA}$, which is noise free, and the cross spectrum,
$G_{AB}$, where the noise can be averaged out. Hence $H_1$ will give the correct result. An ex-
ample where there will be noise at the output is where there are other, unknown inputs to
the system. The effects of these other inputs will show up as noise at the output, as the
figure below shows:

[Diagram: a(t) → h(τ), H(f) → v(t); output noise n(t) is added to v(t) to give the measured output b(t).]

Noise at output: In the case of noise at the output one obtains:
$$|\hat{H}|^2 = \frac{G_{BB}}{G_{AA}} = |H|^2\,[1 + G_{NN}/G_{VV}]$$
$$H_1 = \frac{G_{AB}}{G_{AA}} = H$$
$$H_2 = \frac{G_{BB}}{G_{BA}} = H\,[1 + G_{NN}/G_{VV}]$$

Noise at the input produces an error in $|H|^2$ and $H_1$. This time, $H_2$ will give the correct
result. An example where there will be noise at the input is where a specimen is being
excited with random noise on a shaker. At a resonance of the specimen, the shaker is
effectively trying to drive a mechanical short circuit, which drives the input signal down
towards the noise floor of the measuring instrumentation. Hence the input signal-to-noise
ratio will be low. The output signal-to-noise ratio will be high, however, because of the
resonance of the specimen. This situation is shown in the figure below:

h("t)
u(t) - - -.....- -...... ~----...b(t)
H(i)

m(t)--"':

Noise at input: In the case of noise at the input one obtains:
$$|\hat{H}|^2 = \frac{G_{BB}}{G_{AA}} = |H|^2\,\frac{1}{1 + G_{MM}/G_{UU}}$$
$$H_1 = \frac{G_{AB}}{G_{AA}} = H\,\frac{1}{1 + G_{MM}/G_{UU}}$$
$$H_2 = \frac{G_{BB}}{G_{BA}} = H$$
Use of $H_2$ for measurement at resonance peaks when using broad band random noise
excitation was first proposed by Mitchell, (1981).
The situation when noise is present both at the input and the output is shown in the next
figure:

[Diagram: u(t) plus input noise m(t) gives a(t) → h(τ), H(f) → v(t); output noise n(t) added to v(t) gives b(t).]

Noise at input and output: In the case of noise at both the input and the output, with
$E_i = G_{MM}/G_{UU}$ and $E_o = G_{NN}/G_{VV}$, one obtains:
$$|\hat{H}|^2 = \frac{G_{BB}}{G_{AA}} = |H|^2\,\frac{1 + E_o}{1 + E_i}$$
$$H_1 = \frac{G_{AB}}{G_{AA}} = H\,\frac{1}{1 + E_i}$$
$$H_2 = \frac{G_{BB}}{G_{BA}} = H\,[1 + E_o]$$
$$\gamma^2 = H_1 / H_2$$
$$|\hat{H}|^2 = |H_1|\cdot|H_2|$$
$$|H_1| \leq |H| \leq |H_2|$$

Note that the true value of the frequency response function will always lie between $H_1$
and $H_2$, and that while $H_1$ tends to give a low estimate, $H_2$ tends to give a high estimate.
The user can choose $H_1$ or $H_2$ (after measurement): $H_1$ is a lower bound while $H_2$ is an upper
bound. $H_2$ reduces bias errors at resonance peaks with random excitation. The coher-
ence function relates how much of the measured output signal is linearly related to the
measured input signal, i.e.
$$\gamma_{AB}^2 = \frac{|G_{AB}|^2}{G_{AA}\,G_{BB}}, \qquad 0 \leq \gamma_{AB}^2 \leq 1$$

A coherence of 1 indicates a perfect linear relationship, and 0 no relationship. The co-
herence is always bounded between 0 and 1. It can also be shown that the coherence
function is equal to $|H_1|^2$ divided by $|\hat{H}|^2$, indicating that $|\hat{H}|^2$ will always give a high
estimate of the frequency response function unless there is a coherence of 1 (Randall and
Mitchell, 1981). Likewise, the coherence function is equal to $H_1$ divided by $H_2$, again
indicating that unless there is a coherence of 1, $H_2$ will always be greater than $H_1$.
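A minimal Python sketch (hypothetical system: Welch-averaged spectra as the G estimates, a simple digital filter as the system, and noise added at the output) of the $H_1$ and $H_2$ estimators and the coherence function:

import numpy as np
from scipy.signal import csd, lfilter, welch

fs = 2048
rng = np.random.default_rng(5)
a = rng.normal(size=16 * fs)                      # broadband input a(t)
b = lfilter([0.5, 0.3], [1.0, -0.6], a)           # "system" output v(t)
b = b + 0.2 * rng.normal(size=b.size)             # noise at the output -> b(t)

f, Gaa = welch(a, fs, nperseg=1024)
_, Gbb = welch(b, fs, nperseg=1024)
_, Gab = csd(a, b, fs, nperseg=1024)

H1 = Gab / Gaa                                    # unaffected by output noise
H2 = Gbb / np.conj(Gab)                           # G_BB / G_BA; biased high here
coh = np.abs(Gab) ** 2 / (Gaa * Gbb)              # equals |H1| / |H2|
print("mean coherence:", float(np.mean(coh[1:])))
print("|H1| <= |H2| everywhere:",
      bool(np.all(np.abs(H1[1:]) <= np.abs(H2[1:]) + 1e-12)))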

Low coherence can be due to, amongst other things, noise, non-linearities, or leakage.
Note that where low coherence is due to noise, it is still often possible to make good
measurements, since the effects of the noise can be averaged out in the cross spectrum.
Low coherence due to leakage can be combated by increasing the resolution. Here, it is
also important to remember that although the coherence will be the same for $H_1$ and $H_2$,
$H_2$ will converge on a resonance peak faster than $H_1$. There is nothing which can be done
to combat low coherence due to non-linearities.
Figure 1.22 shows the differences obtained for a measurement of $H_1$ and $H_2$ on a cantile-
ver bar mounted on a shaker and excited with random noise. The resonance peaks in $H_1$
are about 10 dB lower than in $H_2$.

[Figure 1.22 Differences between H₁ and H₂ measurements: the main resonance peak reads about 44.8 dB in one trace and about 34.2 dB in the other, the H₁ peak being roughly 10 dB lower than the H₂ peak.]

Another important tool in signal analysis is the power cepstrum (see Appendix 1.B for
details). The cepstrum is a sort of "spectrum of a spectrum". The distinctive feature of the
cepstrum is the logarithmic conversion of the spectrum. The power cepstrum can be
applied to the detection of periodic structure in the spectrum (harmonics, sidebands,
echoes, reflections) and to the separation of source and transmission path effects. The
power cepstrum is a sensitive measure of the growth of a harmonic/sideband family (it can
be used for the separation of different families), and it is insensitive to measurement point,
phase combination, amplitude and frequency modulation, and loading. An illustration of the
use of the cepstrum for both detection and diagnosis of a gearbox fault is given in fig. 1.25
of the next section.
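A minimal Python sketch of the power cepstrum (the inverse FFT of the logarithm of the power spectrum); the 10.4 Hz sideband spacing is a hypothetical value chosen to mirror the gearbox example of fig. 1.25:

import numpy as np

fs = 8192
t = np.arange(4 * fs) / fs
# A 100 Hz carrier amplitude-modulated at 10.4 Hz produces sidebands
# spaced 10.4 Hz around the carrier.
x = (1 + 0.8 * np.cos(2 * np.pi * 10.4 * t)) * np.cos(2 * np.pi * 100 * t)

power = np.abs(np.fft.rfft(x * np.hanning(x.size))) ** 2
cepstrum = np.fft.irfft(np.log(power + 1e-12))     # power cepstrum
quefrency = np.arange(cepstrum.size) / fs

band = (quefrency > 0.05) & (quefrency < 0.15)     # search around 1/10.4 Hz
peak = quefrency[band][np.argmax(cepstrum[band])]
print(f"cepstrum peak near {peak * 1e3:.1f} ms (1/10.4 Hz = 96.2 ms)")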

1.3.2 Vibration and noise analysis application examples.

A. Gearbox failure diagnosis.


Even gears with perfect involute profiles exhibit vibration due to tooth deflection under
load, and in particular the sudden changes in this as the load is shared between different
numbers of teeth. The tooth deflection effect is at the toothmeshing frequency (and its
harmonics) and is very load dependent, see fig. 1.23.a (Cue, (1990); Randall).

Figure 1.23a Effect of tooth deflection. Figure 1.23b Effect of wear.

Components also occur at the toothmeshing harmonics due to mean deviation from the
ideal profile. These may be a result of initial machining errors, but will eventually be
dominated by the effects of uniform wear. Wear tends to be greater on either side of the
pitch circle, as illustrated, because of the greater sliding velocity there (with pure rolling
at the pitch circle), see fig. 1.23.b. The effects of such geometrical errors are much less
load sensitive.
Fig. 1.24 illustrates typical increases in toothmeshing harmonics due to uniform wear.
The effect of wear is often first seen in the second harmonic, but usually spreads to the
higher harmonics as the profile deteriorates. It is advisable to monitor at least 3 harmon-
ics, as the signal at the first harmonic must first exceed the effects of tooth deflection to
be noticeable. Measurements must be made at constant load, for comparisons to be
meaningful.
Fig. 1.25 is an illustration of the use of the cepstrum for both detection and diagnosis of
a gearbox fault. The sideband family can be clearly seen in the spectrum, but in the cepstrum
it can be detected by monitoring only one component, at 95.9 ms (detection). The measured
period (95.9 ms) and corresponding frequency (10.4 Hz) are determined so accurately as to
eliminate the second harmonic of the output shaft speed (5.4 Hz) as a possible source. The
source was traced to the rotational speed of the second gear, even though this was unloaded
because first gear was engaged (diagnosis).

[Figure 1.24 Gear toothmeshing harmonics: log-velocity spectrum marking the toothmeshing frequency (1) and its higher harmonics (2, 3), with the initial spectrum values and typical increases due to wear.]

B. Faults in rolling element bearings


Discrete faults in the elements of a ball or roller bearing give rise to a series of impacts at
a frequency determined by the location of the fault: outer race, inner race, etc. (Randall;
Li and Wu, (1989); Pengelly and Ast, (1988)).
The initial impulses are so short, in particular when the faults are still microscopic, that
their frequency content extends up to perhaps 300 kHz. The shocks excite structural and
other resonances, including the resonance of piezo-electric transducers used to detect
them, and produce a series of bursts, as illustrated, with a frequency content dominated
by these resonances. This bearing signal is masked by other background vibrations from
the machine, and the basic problem is to find a frequency range where the bearing signal
is dominant over the background vibration.
Note that the repetition frequency is better indicated by analyzing the envelope of the
bursts, rather than the raw signal.
Fig. 1.26 shows how a discrete fault causes a series of bursts with a repetition frequency
given by the bearing geometry and rotational speed. The frequency content of the bursts
is high (dominated by the resonances excited) and the component at the repetition fre-
quency is small. If the envelope of the bursts is formed, however, its frequency spectrum
is dominated by the repetition rate (and its harmonics).
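A minimal Python sketch of this envelope approach (all values hypothetical: an assumed 87 Hz repetition rate, a 3 kHz defect-excited resonance and a 2-4 kHz band-pass filter):

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 20_000
t = np.arange(2 * fs) / fs
f_rep, f_res = 87.0, 3000.0            # repetition rate and excited resonance, Hz
rng = np.random.default_rng(6)

x = 0.1 * rng.normal(size=t.size)      # background vibration (noise)
for k in range(int(2 * f_rep)):        # decaying resonance burst every 1/f_rep s
    t0 = k / f_rep
    m = t >= t0
    x[m] += np.exp(-800 * (t[m] - t0)) * np.sin(2 * np.pi * f_res * (t[m] - t0))

b, a = butter(4, [2000 / (fs / 2), 4000 / (fs / 2)], btype="band")
env = np.abs(hilbert(filtfilt(b, a, x)))           # envelope of the bursts
spec = np.abs(np.fft.rfft(env - env.mean()))
freqs = np.fft.rfftfreq(env.size, 1 / fs)
mask = freqs > 10
print(f"envelope-spectrum peak at "
      f"{freqs[mask][np.argmax(spec[mask])]:.1f} Hz (expected ~ {f_rep} Hz)")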
It is possible to calculate the repetition frequency of the bursts using simple classical me-
chanics, see fig. 1.27. However, note that the relationships assume pure rolling motion,
while in reality there is some rolling and some sliding motion. Hence the equations
should be regarded as approximate. Also, amplitude modulations can produce sidebands.

[Figure 1.25 The use of the cepstrum for fault detection and diagnosis of a gearbox: spectra and cepstra (1st gear engaged) for (a) gearbox 1 in bad condition, with cepstrum components at 28.1 ms (35.6 Hz) and 95.9 ms (10.4 Hz), and (b) gearbox 2 in good condition; frequency axes 0-500 Hz, quefrency (period) axes 0-0.3 s.]

On-line bearing fault monitoring implies automatic data processing without human inter-
vention. Vibrations, picked up by sensors, are transmitted to a monitoring system where
they are processed for information extraction. The on-line system comprises a data ac-
quisition stage, where analog bearing signals are converted into digital form, and a data
processing stage, where modular software algorithms execute the designed processing
under the guidance of a supervisor program.
The data acquisition stage comprises (1) an accelerometer, (2) a charge amplifier, (3) a
band pass filter, and (4) an analog to digital converter. The data processing stage consists
of three functional units: supervisor, defect detection/diagnosis unit, and data base.
The block diagram which illustrates the organization of the complete system is shown in
fig. 1.27.a. The supervisor is responsible for: (1) the proper logic sequence of system op-
eration, (2) the data flow control between the defect detection/diagnosis unit and the
global data base, and (3) global data base management.
A global data base is constructed to hold all the information to be relayed among the data
acquisition stage, the functional units of the data processing stage, and the system's human-
machine interface. It comprises 3 data files:
(1) a general purpose data file, (2) a raw bearing signal data file, and (3) a pattern vector data
file.
One important reason for having a global data base is that the external data files will pre-
serve important data just prior to any unforeseen shutdown of the monitored system or
of the bearing monitoring system itself.

Dlscrete laults In Inner and Inner race laults rotate


outer races glve rlse to a In and out 01 the loaded Uneven vibration levels, olten wtth shocks
_Ies 01 bursta at a rate zone glvlng amplitude
correspondlng to the contacts modulation Contact Angle ß
wlth the roUlng elernenta

n = number 01 balls or
rollers
I, = relative rev./s between
Inner and outer races

Impact Rates ((Hz) (Bssumlng pure rolllng motion)

For Outer Race Oelecl: I (Hz) = :1n I, (1 -


BO
PD cos ß)

For Inner Race Oelect: I (Hz) = 2"n I, (1 + PD


BO
cos ß)
Envelope signal Envelope signal

The enveIope signal contalns Inlormatlon


For a Ball Dalecl: I(Hz) = ~g I, [l-(~g COSßf]
on Impact rate and amplitude modulation

Figure 1.26 Faults in rolling element bear- Figure 1.27 Faults in ball and roller
ings. bearings
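As an illustration of how these relationships are used in practice, the following sketch (in Python; the function name and the example geometry are illustrative assumptions, not taken from the text) computes the approximate characteristic defect frequencies from the bearing geometry and rotational speed:

    from math import cos, radians

    def bearing_defect_frequencies(n, fr, BD, PD, beta_deg):
        # n: number of balls or rollers; fr: relative rev/s between races
        # BD: ball diameter; PD: pitch diameter; beta_deg: contact angle (deg)
        # Pure rolling motion is assumed, so the results are approximate.
        c = (BD / PD) * cos(radians(beta_deg))
        return {
            "outer_race": 0.5 * n * fr * (1.0 - c),
            "inner_race": 0.5 * n * fr * (1.0 + c),
            "ball":       (PD / (2.0 * BD)) * fr * (1.0 - c**2),
        }

    # Illustrative geometry and speed (not taken from the text):
    print(bearing_defect_frequencies(n=9, fr=29.5, BD=7.94, PD=39.04, beta_deg=0.0))

Peaks of the envelope spectrum falling at (or at harmonics of) one of these frequencies then point to the corresponding defect location.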

The interface is responsible for the human-monitoring system communication. The nec-
essary input information consists of (a) bearing geometry, (b) bearing rotational speed,
and (c) sampling rate. The output quantities through the interface consist of alarms and
diagnoses.
Among others, short-time energy function, short-time average zero crossing rate, and
median smoothing are employed by the proposed scheme. The definition of the short-
time energy function is,
E_n = Σ_{m=−∞}^{∞} x²(m) w(n − m)

where x(n) is the sampled signal and w(n) = 1 if 0 ≤ n ≤ N−1, and zero otherwise (N is the
width of the window).
In the context of discrete-time signals, a zero-crossing is said to occur if successive
samples have different algebraic signs. The rate at which zero-crossings occur is a simple
measure of the frequency content of a signal. Its definition is,

Z_n = Σ_{m=−∞}^{∞} |sgn[x(m)] − sgn[x(m−1)]| w(n − m)

where sgn[x(n)] = 1 if x(n) ≥ 0 and −1 if x(n) < 0, and w(n) = 1/(2N) if 0 ≤ n ≤ N−1; = 0
otherwise. However, the second equation makes the computation of Z_n appear more
complex than it really is. All that is required is to check samples in pairs to determine
where the zero-crossings occur, with the average being computed over N consecutive
samples.
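A minimal NumPy sketch of these two short-time features is given below; the rectangular window follows the definitions above, while the array-oriented formulation and the function names are assumptions of this sketch:

    import numpy as np

    def short_time_energy(x, N):
        # E_n: sum of x^2(m) over a rectangular window of width N
        # (mode="same" centres the window; a trailing window differs only by a shift)
        return np.convolve(x**2, np.ones(N), mode="same")

    def short_time_zcr(x, N):
        # Z_n: sign changes averaged over N consecutive samples, with w(n) = 1/(2N)
        s = np.where(x >= 0, 1.0, -1.0)               # sgn[x(n)] as defined above
        changes = np.abs(np.diff(s, prepend=s[0]))    # |sgn[x(m)] - sgn[x(m-1)]|
        return np.convolve(changes, np.ones(N) / (2.0 * N), mode="same")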

Figure 1.27.a Block diagram representation of the on-line bearing monitoring system. The supervisor coordinates the global data base (general purpose, raw bearing signal and feature vector data files), the human-machine interface (inputs: bearing geometry and rpm, threshold vector; outputs: alarm/diagnosis) and the defect detection/diagnosis unit of the data processing stage; the data acquisition stage (accelerometer, charge amplifier, bandpass filter, A/D converter) supplies the digitized bearing vibrations.

The definition of median smoothing is

M_n = M[x(m) w(n − m)], m = −∞, ..., ∞

where M[·] denotes the median operator and w(n) = 1 if 0 ≤ n ≤ N−1, and zero otherwise.
This property makes median smoothing useful for prior processing of the bearing signal,
because the short-time energy function and short-time average zero-crossing rate are
sensitive to discontinuities in the signal. The signal discontinuities can be removed by
subtracting the output of the median smoother from the signal.
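This preprocessing step can be sketched as follows, using scipy's median filter as one convenient realization of the running-median operator (the window length is an illustrative choice):

    import numpy as np
    from scipy.signal import medfilt

    def remove_discontinuities(x, N=31):
        # Subtract the running median (odd window width N) from the signal
        # before computing the short-time energy and zero-crossing features.
        x = np.asarray(x, dtype=float)
        return x - medfilt(x, kernel_size=N)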
The tactic for feature extraction is based upon the estimation of the rate and the strength
of impulse generation from bearing signals. The estimation is performed by highlighting
the occurrence of defect related vibration bursts in the signal and computing their rate of
occurrence and strength. If this estimated impulse generating rate happens to coincide
with any of the characteristic defect frequencies and the strength of vibration bursts is
significant, the existence of a localized defect is concluded.

A bearing signal dominated by bearing defect sensitive resonances reveals the occurrence
of damage-related impulses in the fluctuations of its amplitude. These amplitude fluc-
tuations are made more prominent by computing their short-time energy functions.
On the other hand, the fluctuations of those defect-excited resonances that have a much
higher frequency than the vibration generated by other machine elements will introduce
variations in the frequency content of bearing signals. Variations of this type may be easily
revealed by the computation of the average zero-crossing rate of bearing signals.
After all defect related vibration bursts have been made more prominent in both the
short-time energy function and short-time average zero-crossing rate, their rate of occur-
rence is estimated by computing the autocorrelation functions.
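One plausible realization of this estimate is sketched below: the autocorrelation of a (zero-mean) feature sequence is searched for its dominant non-zero-lag peak, whose lag gives the burst repetition rate; the simple peak-picking heuristic is an assumption of this sketch, not a prescription of the text:

    import numpy as np

    def impulse_rate(feature, fs):
        # Estimate the repetition rate (Hz) of bursts highlighted in a
        # feature sequence (e.g. short-time energy) sampled at fs.
        f = feature - np.mean(feature)
        acf = np.correlate(f, f, mode="full")[f.size - 1:]   # lags >= 0
        acf = acf / acf[0]
        first_neg = np.argmax(acf < 0)        # skip the zero-lag lobe
        if first_neg == 0:                    # guard: no sign change found
            first_neg = 1
        peak_lag = first_neg + np.argmax(acf[first_neg:])
        return fs / peak_lag

The estimated rate is then compared with the characteristic defect frequencies of fig. 1.27 to decide whether a localized defect is present.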
It is possible to incorporate a pattern recognition based monitoring scheme which em-
ploys short-time signal processing techniques to extract useful features from bearing vi-
bration signals. These features can be used by a pattern classifier to detect and diagnose
bearing defects (Li and Wu, 1989).
Short-time signal processing techniques use windowed segments of the bearing signals to
facilitate the estimation of the rate and the strength of impulsive vibrations, which may be
the result of a localized defect. If the estimated impulse generating rate is close to any
one of the characteristic defect frequencies and the strength of the impulse train is sig-
nificant, the designed pattern classifier will classify the bearing into the damaged cate-
gory. Due to the uniqueness of each characteristic defect frequency, the diagnosis
regarding the location of the defect is also provided through the proposed scheme.
System operation logic. The actual sampling rate of the system may be varied; it is
set by a programmable clock which is an integral part of the A/D converter. The
supervisor first reads in the necessary inputs from the human operator. Then the data
acquisition stage activates the A/D converter according to this information. Once the
converter is initiated, it supplies discrete data elements to the system at the selected rate
until a specified number of samples has been generated. The data are formulated into
records and written onto the peripheral disk memory (data file no. 2).
The data will be processed by the defect detection/diagnosis unit. The loop of data
measuring and processing will be continuously carried out until any kind of localized de-
fect is detected and diagnosed. In the case of such an event, the pattern vector, estimated
impulse generating rate, and classifier output will be displayed on the CRT to alert the
operator.
Details for the on-line implementation of the described technique can be found in Li and
Wu (1989).
C. Reciprocating machine and gas turbine fault detection
Vibration and noise signals from reciprocating machinery come, typically, from events
which occur at different phases of the machine cycle, see fig. 1.28. This amounts to a
signal which, though repeated every cycle, varies during one cycle. Continuous averaging
lumps all these signals together, and track is lost of this variation, which from everyday
experience is important in judging the condition of the machine. An FFT analyzer makes
it possible to pick out a short sample length of the signal which is associated with one
particular event and analyze it separately. This is done by triggering the analyzer from a
tacho pulse every cycle and using the variable time delay of the analyzer to choose the
phase of the signal to be analyzed (Randall).
The signal and the tacho pulse are recorded on an instrumentation tape recorder. They
are then played back into the FFT analyzer, which is set to trigger on the tacho pulse af-
ter a specified delay. A series of spectra for various trigger settings, representative of
various phases of the machine cycle, are recorded on the digital cassette recorder.
Comparison and data display is handled by a programmable calculator and, optionally, a
digital plotter (see Randall).
The basic steps used in the analysis for collecting spectra are presented in fig. 1.29,
where:
(a) Represents a typical impulsive signal from a four cylinder diesel engine.
(b) Represents the once per cycle tacho signal.
(c) Represents the positions of the Hanning time window after 2 specified trigger
delays, Trig 1 and Trig 2, set up in the FFT analyzer.
(d) Represents a series of spectra which are used to obtain a representative average
spectrum for the phase of the engine cycle corresponding to Trig 1.
(e) Represents a series of spectra obtained by repeating the process for a new trigger
delay setting, Trig 2.
These average spectra can then be stored on the digital cassette recorder. The signals in-
volved vary somewhat from cycle to cycle, and a considerable amount of averaging may
be necessary to obtain a reliable result.
A logic diagnosis for the whole engine can be incorporated in the computer memory, such
as a binary interrogation of the type shown in fig. 1.30, which produces additional prog-
nostic conclusions (see also chapter 4).
By comparing actual values with base values determined for the system theoretically and
values measured when the plant was commissioned, the relevant trends can be reported.
Additionally, values are theoretically determined for the plant when operating with
defects and programmed into the computer; thus, when measured conditions fall within
those for which the defect applies, the "status" is reported and the defect notified.
Automated fault identification for gas turbines, based on spectral features of measure-
ments of various dynamic quantities such as internal pressure, casing acceleration and
acoustic data, is presented and applied by Loukis et al. (1992). The examined faults were:
rotor fouling (fault of all the blades), individual rotor blade fouling (fault of 2 blades of
the stage 1 rotor), an individual rotor blade twisted, and stator blade restaggering. The difference
pattern (used as a fault index) derived from the measured signal of an instrument is defined
by the expression:
Figure 1.28 Reciprocating machine fault detection: the event to be analyzed is selected by a trigger delay after the tacho pulse.

Figure 1.29 Basic steps used in the analysis for collecting spectra.
p(f) = 20[log10(sp(f)) − log10(sph(f))]

where p(f) is the fault index, which is thus a set of values as a function of frequency, sp(f)
is the power spectrum of the signal of the measuring instrument from the monitored
(possibly faulty) engine, and sph(f) is the same spectrum from a healthy engine. This index
showed high sensitivity to all examined faults and produced distinguishable forms for
each fault.
Figure 1.30 Simplified logic tree and complementary interrogatory diagnosis. Measured quantities (vibration, performance, oil temperature) are checked in turn; negative answers lead to checks of the indicated components or of the oil system.

Inspection of the fault indices calculated for the different measuring instruments has
shown that the presence of one of the examined faults results in the appearance of differ-
entiations mainly at multiples of the shaft rotational frequency. It was therefore decided
to filter out values of the indices at frequencies other than the shaft harmonics, since the
most useful diagnostic information is contained at these harmonics.
This is done by a filter defined by the following equation:

h(f) = 1 if f is a rotational harmonic
h(f) = 0 if f is not a rotational harmonic

The pattern resulting from filtering the difference pattern with the above filter will be re-
ferred to as the reduced pattern pr(f).
Two discriminant functions were selected: one expressing quantitative similarity
(influenced by both the shape and amplitude of the compared patterns), the other ex-
pressing shape similarity (influenced only by the shape of the compared patterns).
The first discriminant function is the usual Euclidean distance between reduced patterns,
when they are viewed as points in an N-dimensional space.
The second discriminant function is the normalized cross-correlation coefficient. For
the two discriminants to be used for fault classification, reference patterns for each fault
must be available.
If such reference patterns are available, then the two reduced-pattern discriminants can
be produced for any measured signal from the monitored gas turbine and, depending on
their values, the fault corresponding to that signal (if any) can be decided.
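The complete chain, from difference pattern to the two discriminants, can be sketched as follows; the array inputs, the harmonic tolerance and the function names are illustrative assumptions of this sketch, and reference reduced patterns are assumed to be available for each fault:

    import numpy as np

    def reduced_pattern(sp, sp_healthy, freqs, f_shaft, tol=0.5):
        # p(f) = 20[log10(sp(f)) - log10(sph(f))], kept only at shaft harmonics
        # (spectra are assumed strictly positive before taking logarithms)
        p = 20.0 * (np.log10(sp) - np.log10(sp_healthy))
        r = freqs % f_shaft
        harmonic = (r < tol) | (f_shaft - r < tol)        # the filter h(f)
        return np.where(harmonic, p, 0.0)

    def discriminants(pr, pr_ref):
        d_euclid = np.linalg.norm(pr - pr_ref)            # quantitative similarity
        den = np.linalg.norm(pr) * np.linalg.norm(pr_ref)
        d_corr = float(np.dot(pr, pr_ref) / den) if den > 0 else 0.0
        return d_euclid, d_corr                           # shape similarity

Classification then assigns the measured pattern to the reference fault with the smallest Euclidean distance and/or the largest correlation coefficient.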
The flow chart of an automated gas turbine fault diagnosis scheme based on spectral
pattern analysis is as shown in fig. 1.31:

lteduced differeace pattern HealIhy power spectra


calculation data base

Fault reference patterns


Calculation of Diacriminants Data base

Figure 1.31. Flow chart of the automated spectral pattern fault diagnosis method for gas
turbines.

The technique presented briefly above has been developed on the basis of the data
available from experiments on an industrial gas turbine with specific implanted faults
(Loukis et al., 1992). From this point of view, part of the findings can be considered of
general validity, while others will be particular to the specific engine.
D. Induction machine broken bars detection.
The desire to improve the reliability of industrial drive systems has led to concerted re-
search and development activities in several countries to evaluate the causes and conse-
quences of various fault conditions. In particular, ongoing research work is being fo-
cused on rotor bar faults and on the development of diagnostic techniques.
Broken bars were shown to produce high localized airgap fields and to degrade mechani-
cal performance. The field perturbation associated with broken bars, which are deliber-
ately disconnected from the endrings by machining, produces low-frequency compo-
nents and harmonics in the search coil-induced voltages and gives rise to an oscillatory
torque that produces noise and mechanical vibration.
Different techniques for the detection of broken bars were tested and evaluated by
Elkasabgy et al. (1992).
A. Search Coil Induced Voltage Detection Techniques. This technique involves an
inspection of the time and frequency domain of voltages induced in internal (stator tooth
tip and yoke) search coils, or an inspection of the time and frequency domain of voltages
induced in an external search coil placed against the frame of the motor.
Consider first the voltage induced in an internal stator tooth tip coil for a machine with
no rotor faults. The dominant and fundamental frequency will be at the motor excitation
frequency of 60 Hz. Higher frequency components will appear due to the periodicity of
rotor bars. The nonsinusoidal stator emf distribution, i.e., space harmonics, may also in-
duce time-harmonic voltages.
Consider now the voltage in the same search coil for a machine with one or more adja-
cent broken bars. An anomalously high local airgap field rotates at rotor speed. This field
pulsates at slip frequency and can be considered to be the resultant of two fields, coun-
ter-rotating at s×(synchronous speed), which are rapidly attenuated away from the fault
location. The field associated with the broken bars will, therefore, modulate the coil-in-
duced voltage at a characteristic frequency f_fault, given by

f_fault = (2f/p)(1 − s) ± sf  Hz

where f is the excitation frequency, s is the slip, and p is the number of poles of the in-
duction motor.
Similar frequency components are anticipated in the yoke and external search coil volt-
ages. These voltages can be captured by a fast data acquisition system and printed out to
illustrate their time dependence. Their frequency spectra can also be analyzed.
Comparing the tooth-tip search and yoke search coil induced voltage frequency
spectra for a fault-free rotor with those for the broken-bar rotor, broken bars can be
adequately detected.
Perhaps surprisingly, the external search coil is just as effective as the internal coils in
detecting the broken bars. It therefore appears unnecessary to incorporate internal coils
to take advantage of this diagnostic technique; an external coil placed against the casing
of the machine being entirely adequate.
The use of an external coil placed against the frame of the machine is considered particu-
larly useful in an industrial environment because the motor need not be modified in any
way (by the installation of stator search coils) or be taken out of service temporarily. All
that is needed is a coil of 10-20 turns with a length equivalent to the active axial length of
the machine and of width equal to perhaps half a pole, a power-frequency oscilloscope or
low-frequency spectrum analyzer, and the experience of an observant operator.
B. Stator-current detection technique. Each individual rotor bar can be considered
to form a short-pitched single-turn single-phase winding. The airgap field produced by a
slip-frequency current flowing in a rotor bar will have a fundamental component rotating
at slip speed in the forward direction with respect to the rotor, and one of equal
amplitude that rotates at the same speed in the backward direction. With a symmetrical
rotor, the backward components will sum to zero. For a broken-bar rotor, however, the
resultant is nonzero. The field, which rotates at slip frequency back-ward with respect to
the rotor, will induce EMFs in the stator side that modulate the mains-frequency
component at twice slip frequency.
Under sinusoidal voltage excitation, this effect produces twice slip frequency (2.67 Hz at
1760 r/min) side bands in the spectrum ofthe phase current, which indieate the existence
of the fault.
Thus, the examination of the machine current spectrum provides an important method
for detecting rotor-bar faults.
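The diagnostic frequencies of techniques A and B (and the twice-slip-frequency torque component of technique C below) are simple functions of slip; the sketch below reproduces the numerical example quoted above for a 4-pole, 60 Hz machine at 1760 r/min (function and key names are illustrative):

    def broken_bar_frequencies(f, p, rpm):
        # f: excitation frequency (Hz); p: number of poles; rpm: shaft speed
        n_sync = 120.0 * f / p               # synchronous speed (r/min)
        s = (n_sync - rpm) / n_sync          # slip
        f_fault = ((2.0 * f / p) * (1.0 - s) + s * f,
                   (2.0 * f / p) * (1.0 - s) - s * f)        # search coil
        sidebands = ((1.0 - 2.0 * s) * f, (1.0 + 2.0 * s) * f)  # phase current
        return {"slip": s, "search_coil": f_fault,
                "current_sidebands": sidebands, "torque": 2.0 * s * f}

    print(broken_bar_frequencies(f=60.0, p=4, rpm=1760))
    # slip = 0.0222, so the sideband spacing 2sf = 2.67 Hz, as quoted above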
C. Torque-harmonics detection technique. In a balanced three-phase induction machine
with no rotor faults, the forward-rotating field interacts with the slip frequency induced
rotor currents to produce a steady output torque. For a machine with a rotor fault, a
backward rotating field is developed as discussed. This backward rotating field interacts
with the rotor currents, induced by the forward rotating field, to produce a torque
variation at twice-slip frequency, which is superimposed on the steady output torque.
Rotor faults therefore lead to low-frequency torque harmonics, which result in increased
noise and vibration. The torque oscillations can be measured by means of a shaft torque
transducer using a data acquisition system while the motor is running on-line at various
load conditions. The frequency of the torque oscillation increases as the machine is
loaded. The dominant frequency of oscillation corresponds to twice the slip frequency of
the operating condition.

1.3.3 Conclusions

Advances in instrumentation and computer information processing technologies offer
machinery users an added dimension to their protection systems: diagnostic information
for machinery behaviour analysis and predictive maintenance programs. Two decades
ago only rudimentary vibration data could be obtained using the electronic technology of
the time. Today, computerised vibration instrumentation is a powerful tool for obtaining
more specific vibration data for diagnostic purposes, including steady-state data, taken
on-line, and transient data, taken at start-up and shut-down. The data, properly used,
provide the machinery diagnostician with greater insight into the mechanical behaviour
of rotating machinery.
As more and more vibration information becomes available, the proper methods for in-
terpreting it are becoming increasingly important for accurately analysing machine behav-
iour and diagnosing malfunctions. Proper methodology ensures a thorough understand-
ing of the machine's mechanical behaviour and the potential malfunction mechanisms, as
well as their symptoms. This knowledge enables one to diagnose and correct machine
malfunctions accurately.
Without the ability to reduce and interpret vibration data properly, diagnosing a machine
malfunction can be difficult, time consuming and costly. A systematic approach to ma-
chine analysis makes it possible to use vibration data for pinpointing the malfunction, de-
spite a myriad of symptoms. Six basic principles create the foundation for accurate ma-
chine malfunction diagnosis methodology:
1. Know the basic mechanical characteristics of the machine.
2. Know the types of malfunction mechanism that the machine is likely to incur and
their corresponding symptoms.
3. Monitor the key parameters that will indicate a change in the condition of the
machine.
4. Reduce diagnostic data to interpretable formats.
5. Understand the historical and related events that may have caused the change in
machine condition.
6. Take prompt action, based on the diagnosis.
Any machinery analysis requires that the diagnostician first be familiar with the physical
parameters of the machine (size, weight, clearances). However, understanding the basic
mechanical factors goes beyond knowing these physical characteristics; it involves un-
derstanding the machine's fundamental rotor dynamic behaviour during steady-state and
transient conditions. To achieve this, vibration data needs to be acquired during accep-
tance testing, commissioning, imbalance response testing (synchronous perturbation),
and non-synchronous perturbation testing to document the rotor response during steady-
state (on-line) and transient (start-up and shut-down) conditions. This process provides
baseline data for insight into:
• The machine's natural frequencies.
• The amount of damping in the system.
• The machine's susceptibility to instability.
Analytical models of the machine's dynamic behaviour - particularly models based on a
modal approach - are of tremendous value in future diagnostic procedures. It is desirable
for critical machines or those that have had a history of vibration problems to incorporate
these analytical models into the machine's data base. Critical machines are categorised as
those which are indispensable for plant operation; failure of a critical machine results in
immediate loss of plant output.
The analytical and testing baseline information is necessary for comparing the machine's
previous and present condition when troubleshooting suspected malfunctions. It can also
be used after a problem has been corrected to verify the validity of the correction.

References

Anderson T. W. (1958). An Introduction to Multivariate Statistical Analysis. John
Wiley, New York.
Anderson T. W. (1971). The Statistical Analysis of Time Series. John Wiley, New York.
Ali M. M. (1989). Test for autocorrelation and randomness in multiple time series.
Journal of the American Statistical Association, 84, 406.
Baines N. (1987). Modern vibration analysis in condition monitoring. Noise and
Vibration Control Worldwide, p. 148.
Basseville M. and A. Benveniste (1980). Detection of Abrupt Changes in Signals and
Dynamical Systems. Springer-Verlag, Berlin.
Bennett C. A. and N.L. Franklin (1954). Statistical analysis in chemistry and the
chemical industry. John Wiley.
Blazek L., Novic D. and D. Scott (1987). Displaying multivariate data using polyplots.
Journal of Quality Technology, 19, 2, p. 69.
Blumen I. (1958). A new bivariate sign test. Journal of the American Statistical
Association, 53, p. 448.
Box G. and T. Kramer (1992). Statistical process monitoring and feedback adjustment
- A discussion. Technometrics, 34, 3, p. 251.
Cecchin T. (1985). Conception d'une usine pilote mineralurgique automatisee.
Elaboration et application d'une strategie de diagnostic d'etat de fonctionnement de pro-
cedes industriels en temps reel. These de doctorat de l'Institut National Polytechnique
de Lorraine.
Chin H. and K. Danai (1991). A method of fault signature extraction for improved diag-
nosis. ASME Transactions of Dynamic Systems, Measurement and Control, 113, p. 635.
Chitturi V. (1976). Distribution of multivariate white noise autocorrelations. Journal of
the American Statistical Association, 71, 353, p. 223.
Commissariat a l'Energie Atomique (1978). Statistique appliquee a l'exploitation des
mesures. Editions Masson, Paris.
Cue R.W. and D.E. Muir (1990). Engine performance monitoring and troubleshooting
techniques for the CF-18 aircraft. Proceedings, Gas Turbine and Aeroengine Congress,
Brussels, June 11-14, 1990.
Dixon W. J. Power functions of the sign test and power efficiency against normal al-
ternatives. Annals of Mathematical Statistics, 24, p. 467.
Elkasabgy N.M., Eastham A.R. and G.E. Dawson (1992). Detection of broken bars in
the cage rotor on an induction machine. IEEE Transactions on Industry Applications,
28, 1, p. 165.
Hawkins D.M. (1992). A fast accurate approximation for average run lengths of
CUSUM control charts. Journal of Quality Technology, 24, 1, p. 37.
Himmelblau D. M. (1978). Fault detection and diagnosis in chemical and petrochemical
processes. Elsevier, Amsterdam.
Hoerl R.W. and A.C. Palm (1992). Discussion: Integrating SPC and APC.
Technometrics, 34, 3, p. 268.
Hunter J.S. (1986). The Exponentially Weighted Moving Average. Journal of Quality
Technology, 18, 4, p. 203.
Ikeuchi T., Shirai M., Nakamachi K., Tanabe S., Ishino K. and T. Fujishima (1988).
Computer-assisted noise and vibration analysis system - CANVAS. Noise and Vibration
Control Worldwide, February, p. 58.
Johns W. D. and R.H. Porter (1988). Ranking of compressor station noise sources using
sound intensity techniques. Noise and Vibration Control Worldwide, February, p. 70.
Kendall M., Stuart A. and J.K. Ord (1982). The Advanced Theory of Statistics. Vols. 2,
3. Charles Griffin Ltd., London.
Laws W. C. and A. Muszynska (1987). Periodic and continuous vibration monitoring for
preventive/predictive maintenance of rotating machinery. ASME Journal of Engineering
for Gas Turbines and Power, 109, April, p. 159.
Li C.J. and S.M. Wu (1989). On-line detection of localized defects in bearings by
pattern recognition analysis. ASME Journal of Engineering for Industry, 111, November,
p. 331.
Liggett W., Jr. (1977). A test for serial correlation in multivariate data. The Annals of
Statistics, 5, 2, p. 408.
Loukis E., Mathioudakis K. and K. Papailiou (1992). A procedure for automated gas
turbine blade fault identification based on spectral pattern analysis. ASME Journal of
Engineering for Gas Turbines and Power, 114, April, p. 201.
Ljung L. and T. Söderström (1987). Theory and Practice of Recursive Identification.
The MIT Press, London.
Lucas J.M. and M.S. Saccucci (1990). Exponentially Weighted Moving Average
control schemes: properties and enhancements. Technometrics, 32, 1, p. 1.
Lyon R. (1987). Machinery Noise and Diagnostics. Butterworths, London.
MacNeill I. B. (1974). Tests for periodic components in multiple time series.
Biometrika, 61, 1, p. 57.
Mehra R.K. and J. Peschon (1971). An innovations approach to fault detection and
diagnosis in dynamic systems. Automatica, 7, p. 637.
Mitchell J. S. (1981). Machinery Analysis and Monitoring. PennWell, London.
Pnegelly B.W. and G.E. Ast (1988). A computer-based multipoint vibration system for
process plant rotating equipment. IEEE Transactions on Industry Applications, 24, 6,
p. 1062.
Pignatiello J. and C. Runger (1990). Comparisons of multivariate CUSUM charts.
Journal of Quality Technology, 22, 3, p. 173.
Pouliezos A. (1980). An iterative method for calculating sample serial correlation co-
efficients. IEEE Transactions on Automatic Control, AC-25, 4, p. 834.
Proceedings, 15th Symposium "Aircraft Integrated Monitoring Systems", Aachen,
Germany, September 12-14, 1989.
Randall R. D. Efficient Machine Monitoring. Bruel and Kjaer publication, code no. 18-
212.
Randles R. (1989). A distribution-free multivariate sign test based on interdirections.
Journal of the American Statistical Association, 84, 408.
Robert P., Cleroux R. and N. Ranger (1985). Some results on vector correlation.
Computational Statistics and Data Analysis, 3, p. 25.
Spee R. and A.K. Wallace (1990). Remedial strategies for brushless DC drive failures.
IEEE Transactions on Industry Applications, 26, 2, p. 259.
Stephens C.M. (1991). Fault detection and management system for fault-tolerant
switched reluctance motor drives. IEEE Transactions on Industry Applications, 27, 6,
p. 1098.
Thompson R.W. (1991). Importance sampling applied to a robust signal detector.
International Journal of Modeling and Simulation, 11, 4, p. 114.
Tracy N.D. et al. (1992). Multivariate control charts for individual observations. Journal
of Quality Technology, 24, 2, p. 88.
Vance L. C. (1983). A Bibliography of Statistical Quality Control Chart Techniques,
1970-1980. Journal of Quality Technology, 15, p. 59.
Vander Wiel S.A. et al. (1992). Algorithmic Statistical Process Control: Concepts and an
application. Technometrics, 34, 3, p. 286.
Vasilopoulos A. and A. Stamboulis (1978). Modification of control chart limits in the
presence of data correlation. Journal of Quality Technology, 10, 1, p. 20.
Walter T. J., Marchione M. M. and H.C. Shugars (1988). Diagnosing vibration
problems in vertically mounted pumps. ASME Journal of Vibration, Acoustics, Stress
and Reliability in Design, 110, p. 173.
Yashchin E. (1993). Performance of CUSUM control schemes for serially correlated
observations. Technometrics, 35, 1, p. 37.
Appendix 1.A

Figure 1.A.1 Operating characteristic curves for the sample mean test, Pf = 0.01

Figure 1.A.2 Operating characteristic curves for the sample mean test, Pf = 0.05
Figure 1.A.3 Power curves for the two-tailed t-test at the 5% level of significance.
Table 1.A.1
Values of k such that Pr(y ≤ k−1) < α/2, where y has the binomial distribution with p = 0.5.

Sample size n   Pf=0.10   Pf=0.05   Pf=0.02   Pf=0.01
 5                1         -         -         -
 6                1         1         -         -
 7                1         1         1         -
 8                1         1         1         1
 9                2         2         1         1
10                2         2         1         1
11                3         2         2         1
12                3         3         2         2
13                4         3         2         2
14                4         3         3         2
15                4         4         3         3
16                5         4         4         3
17                5         5         4         3
18                6         5         4         4
19                6         5         5         4
20                6         6         5         4
21                7         6         5         5
22                7         6         6         5
23                8         7         6         5
24                8         7         6         6
25                8         8         7         6
30               11        10         9         8
35               13        12        11        10
40               15        14        13        12
45               17        16        15        14
50               19        18        17        16
Table 1.A.2 Values of 1−Pd for the sign test

  n     r     1−Pd
  8     0    .00781
  9     0    .00391
 10     0    .00195
 11     0    .00098
 12     1    .00635
 13     1    .00342
 14     1    .00183
 15     2    .00739
 16     2    .00418
 17     2    .00235
 18     3    .00754
 19     3    .00443
 20     3    .00258
 25     5    .00408
 30     7    .00522
 35     9    .00599
 40    11    .00643
 45    13    .00661
 50    15    .00660
 60    19    .00622
 70    23    .00558
 80    28    .00968
 90    32    .00743
100    36    .00664

Table 1.A.3 Values of 1−Pd for the sign test

  n     r     1−Pd
 (5)   (0)   .0625
  6     0    .03125
  7     0    .01562
  8     0    .00781
  9     1    .03906
 10     1    .02148
(10)   (2)   .10938
 11     1    .01172
 12     2    .03857
 13     2    .02246
 14     2    .01294
 15     3    .03516
 16     3    .02127
 17     4    .04904
 18     4    .03088
 19     4    .01921
(20)   (4)   .01182
 20     5    .04139
(20)   (6)   .11532
 25     7    .04329
 30     9    .04277
 35    11    .04096
 40    13    .03848
 45    15    .03570
 50    17    .03284
 60    20    .02734
 70    26    .04139
 80    30    .03299
 90    35    .04460
100    39    .0352
Table 1.A.4 Critical values of the rank correlation coefficient

 n     α=0.1   α=0.05   α=0.02   α=0.01
 5     0.9     -        -        -
 6     0.829   0.886    0.943    -
 7     0.714   0.786    0.893    -
 8     0.643   0.738    0.833    0.881
 9     0.6     0.683    0.783    0.833
10     0.564   0.648    0.745    0.794
11     0.523   0.623    0.736    0.818
12     0.497   0.591    0.703    0.78
13     0.475   0.566    0.673    0.745
14     0.457   0.545    0.646    0.716
15     0.441   0.525    0.623    0.689
16     0.425   0.507    0.601    0.666
17     0.412   0.49     0.582    0.645
18     0.399   0.476    0.564    0.625
19     0.388   0.462    0.549    0.608
20     0.377   0.45     0.534    0.591
21     0.368   0.438    0.521    0.576
22     0.359   0.428    0.508    0.562
23     0.351   0.418    0.496    0.549
24     0.343   0.409    0.485    0.537
25     0.336   0.4      0.475    0.526
26     0.329   0.392    0.465    0.515
27     0.323   0.385    0.456    0.505
28     0.317   0.377    0.448    0.496
29     0.311   0.37     0.44     0.487
30     0.305   0.364    0.432    0.478
Appendix 1.B

The Discrete Fourier Transform


The discrete Fourier transform (DFT) is a basic operation used in many different signal
processing applications. It is used to transform an ordered sequence of data samples,
usually from the time domain into the frequency domain, so that spectral information
about the sequence can become known explicitly. The DFT is described briefly here.
The DFT is a complex function of frequency, that is, an ordered sequence of complex
numbers, each number consisting of a real part and an imaginary part. Usually the data
sequence being transformed is real, but it need not be, and in either case the DFT is, in
general, complex.
Suppose a real data sequence consisting of N real samples of a signal x(t), given by

samples of x(t) = [x_k] = [x_0, x_1, x_2, ..., x_{N−1}]

In this notation, k is usually a time index and ranges from 0 to N−1. When [x_k] is a real
data sequence, the computed DFT of [x_k] consists of (N/2)+1 complex samples (it is as-
sumed here that N is even), given by,

[X_m] = DFT[x_k] = [X_0, X_1, X_2, ..., X_{N/2}]

The relationship of [X_m] to [x_k] is discussed later; here it is simply noted that [X_m] is the
DFT of [x_k] and that the index m designates the frequency of each component X_m. Also,
the DFT is complex, so each X_m can be represented in polar form as

X_m = |X_m| e^{jθ_m}

In this notation, |X_m| is the amplitude of X_m, and a plot of |X_m| versus the frequency index
m is called the amplitude spectrum of [x_k]. Similarly, a plot of θ_m versus m is called the
phase spectrum of [x_k].
The relationship implemented by the forward transform between [x_k] and [X_m] can be
expressed as,

X_m = Σ_{k=0}^{N−1} x_k e^{−j(2πmk/N)},  m = 0, 1, ..., N/2

Again it is assumed here for convenience that N, the number of data samples, is even. In
this formula for X_m, the exponential function exp(−j2πmk/N) is a complex sinusoid and
is periodic. If one thinks of exp(−j2πmk/N) as a function of k, the time index, then its
period is seen to be N/m; that is, when k goes through a range of N/m, exp(−j2πmk/N)
goes through one cycle. One can see this even more clearly by separating the real and
imaginary parts of the above relation:
X_m = Σ_{k=0}^{N−1} x_k cos(2πmk/N) − j Σ_{k=0}^{N−1} x_k sin(2πmk/N);  m = 0, 1, ..., N/2

Thus, each part (real or imaginary) of each DFT component X_m is a correlation (summed
product) of the data sequence [x_k] with a cosine or sine sequence having a period of N/m
data samples.
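A literal transcription of this formula for a real, even-length sequence (keeping the components m = 0, ..., N/2) might read as follows; NumPy is used for the complex arithmetic, and the final check against the library FFT is only an illustration:

    import numpy as np

    def dft_real(x):
        # Direct DFT of a real sequence of even length N; returns X_0 .. X_{N/2}
        x = np.asarray(x, dtype=float)
        N = x.size
        m = np.arange(N // 2 + 1)
        k = np.arange(N)
        W = np.exp(-2j * np.pi * np.outer(m, k) / N)   # exp(-j 2*pi*m*k/N)
        return W @ x

    x = np.random.randn(16)
    assert np.allclose(dft_real(x), np.fft.rfft(x))    # agrees with the FFT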
The periodicity of the DFT just discussed is a very important property. One can see that
adding N to the index of X_m does not change the value of X_m, that is,

X_{m+N} = Σ_{k=0}^{N−1} x_k e^{−j[2π(m+N)k/N]} = Σ_{k=0}^{N−1} x_k e^{−j(2πmk/N)} e^{−j2πk} = X_m

Furthermore, this rule holds whether [x_k] is real or complex; X_0 through X_{N−1} is a com-
plete set of DFT components.
Secondly, when [x_k] is real, one can also show that X_m and X_{N−m} are complex conjugates.
One can take:

X_{N−m} = Σ_{k=0}^{N−1} x_k e^{−j[2π(N−m)k/N]} = Σ_{k=0}^{N−1} x_k e^{+j(2πmk/N)} e^{−j2πk} = X_m*

The star (*) denotes the complex conjugate, resulting from the change of sign in the ex-
ponential function. Note that if [x_k] is complex, the above relation does not apply.
Thus, when [x_k] is a sequence of real data, all values of X_m outside of the set X_0, X_1, ...,
X_{N/2} are redundant. When [x_k] is a sequence of complex data, all values outside of the set
X_0, X_1, ..., X_{N−1} are redundant.
The inverse transform.
The reverse (or inverse) DFT is used to obtain a data sequence [x_k] from its complex
spectrum [X_m]. The formula for the inverse DFT is

x_k = (1/N) Σ_{m=0}^{N−1} X_m e^{j(2πmk/N)},  k = 0, 1, ..., N−1

Thus the inverse DFT is the same as the forward DFT except for the sign of the expo-
nential and the scaling factor 1/N.
It can be seen that N values of X_m, not just (N/2)+1 values, are required for the inverse
transformation. Thus, given X_0 through X_{N/2}, one could generate X_{(N/2)+1} through X_{N−1},
using the formula for X_{N−m} before applying the inverse formula. Alternately, the inverse
formula can be modified as follows, assuming [x_k] is real and using the above formula for
X_{N−m}:

x_k = (1/N) [ X_0 + Σ_{m=1}^{N/2−1} X_m e^{j(2πmk/N)} + X_{N/2} e^{jπk} + Σ_{m=N/2+1}^{N−1} X*_{N−m} e^{j(2πmk/N)} ]

    = (1/N) [ X_0 + (−1)^k X_{N/2} + Σ_{m=1}^{N/2−1} X_m e^{j(2πmk/N)} + Σ_{m=1}^{N/2−1} X_m* e^{−j(2πmk/N)} ]

    = (1/N) { X_0 + (−1)^k X_{N/2} + 2 Re[ Σ_{m=1}^{N/2−1} X_m e^{j(2πmk/N)} ] }

    = (2/N) Σ_{m=0}^{N/2} ( R'_m cos(2πmk/N) − I_m sin(2πmk/N) )

where [R_m] and [I_m] are, respectively, the real and imaginary parts of [X_m], and [R'_m] is
the same as [R_m] but with R_0 and R_{N/2} reduced to half-strength.
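The modified inverse formula can be verified numerically; the sketch below reconstructs a real sequence from X_0, ..., X_{N/2} alone, with R_0 and R_{N/2} reduced to half-strength as stated (function name and test are illustrative):

    import numpy as np

    def inverse_from_half_spectrum(X_half):
        # Reconstruct a real sequence x_k of length N from X_0 .. X_{N/2} (N even)
        N = 2 * (X_half.size - 1)
        R = X_half.real.copy()
        I = X_half.imag
        R[0] /= 2.0
        R[-1] /= 2.0                                   # R'_m of the text
        arg = 2.0 * np.pi * np.outer(np.arange(N), np.arange(N // 2 + 1)) / N
        return (2.0 / N) * (np.cos(arg) @ R - np.sin(arg) @ I)

    x = np.random.randn(16)
    assert np.allclose(inverse_from_half_spectrum(np.fft.rfft(x)), x)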
The fast Fourier transform.
The fast Fourier transform (FFT) is not a new kind of transform different from the DFT.
Instead, it is simply an algorithm for computing the DFT, and its output is precisely the
same set of complex values expressed in X_m. The FFT algorithm eliminates most of the
repeated complex products in the DFT, however, so its execution time is much shorter.
Specifically, the ratio of computing times is approximately

FFT computing time / DFT computing time ≈ (1/(2N)) log₂ N

Using the FFT, one can also do the computation "in place", so that [X_m] replaces [x_k],
with only a limited amount of auxiliary storage needed for work space.
On the other hand, the FFT algorithm is more complicated than the DFT and becomes
lengthy when N, the number of data samples, is not a power of two. Thus, in many appli-
cations it is simpler and preferable to use a simple DFT algorithm instead of an FFT.
One final but important point about the inverse transform is that most computer routines,
in order to preserve symmetry, omit the factor 1/N. Thus, the sequence [x_k] is scaled to
N times its correct amplitude by most inverse DFT and FFT algorithms.
Power spectrum and cepstrum.
The power spectrum of an ergodic signal is derived from the Fourier transform as,

Y_T(ω) = (1/T) | ∫_{−T/2}^{T/2} y(t) e^{−jωt} dt |²

If y(t) is real, then there is also the constraint that Y(−ω) = Y*(ω), which requires that Y_r
be an even function of ω and Y_i be an odd function of ω:

Y(ω) = Y_r(ω) [even in ω] + jY_i(ω) [odd in ω]

Therefore, the magnitude

|Y(ω)| = [Y_r²(ω) + Y_i²(ω)]^{1/2}

is also an even function of ω, and the phase,

φ(ω) = tan⁻¹[Y_i(ω)/Y_r(ω)]

is an odd function of ω.
The above estimation of the power spectrum of y(t) is not a satisfactory estimate of Y(ω),
no matter how large T is, because the variance of Y_T(ω) does not tend to zero. However,
it can be used to estimate a smoothed version of Y(ω) since,

∫_{ω1}^{ω2} Y_T(ω) dω → ∫_{ω1}^{ω2} Y(ω) dω  as T → ∞
With this background, the relationships shown in fig. 1.B.1 are examined. In the upper
left corner is shown a linear system with an impulse response h(t) that is driven by a
source x(t) and produces a response or output y(t). The output y(t) is computed by con-
volving the input time record with the impulse response of the system to produce an out-
put waveform. The result of this process is that the input and the system responses are
inextricably woven and folded together, so that it is not possible to gain direct informa-
tion regarding the input or the path from an observation of y(t).
In the frequency domain, the transform of the output is the product of the Fourier trans-
form of the input X(ω) and the system function H(ω). The magnitude of the output is the
product of the magnitudes of the input and system functions, and the phase of the output
φ_y is the sum of the phase of the input φ_x and the phase of the system function φ_h.
The magnitude relationship can be used to generate power spectra according to |Y|² =
|X|²·|H|². Because of the product relationship of the power spectra, the correlation
output is again a convolution of the input correlation function and the autocorrelation of the
impulse response. This relationship is also sketched in fig. 1.B.1. Thus, correlation does
not solve the problem of mixing between the input and the system response.
This separation problem can sometimes be alleviated, however, by going another step
and forming the log of the transform so that, for example,

log Y = log|Y| + jφ_y
Clearly, the logs of the magnitudes of the input and transfer function add to produce the
log of the output magnitude. The phase of the output is a linear sum of the phases of the
input and of the transfer function. One now has a situation in which the transfer function
and the input add their properties to produce an output. This frequency domain process
may not result in an effective way to separate the source and transfer function; but if one
takes the inverse transform into the time domain, the cepstrum is generated, which may
provide a way in some cases to separate the individual effects of source and propagation
path on the output cepstrum, at least to a degree that is useful for diagnostic purposes.
Since the log magnitude is an even function of frequency and the phase is an odd func-
tion of frequency, their inverse transforms are real functions, so the complex cepstrum
C_y(t) is a real function of time. The Fourier transform of the log magnitude is called the
power or real cepstrum, and in situations where the phase is unknown or ignored it may
be a useful way to separate source and path effects. The inverse transform of the phase is
the phase cepstrum, and the sum of the magnitude and phase cepstra is the complete
complex cepstrum of the signal.
It is noted that if one is able to determine the input cepstrum C_x(t), then by a Fourier
transformation he could construct the log Fourier transform of x and by exponentiation
recreate the Fourier transform itself. Having obtained the Fourier transform of x, both in
magnitude and phase, one could use the inverse transform again and get back to x(t).
Thus, there is a unique and recoverable relationship between the complex cepstrum and the
variable from which it is derived. However, it is not possible to recover the initial wave-
form from the power or real cepstrum, because the inverse transform of the real cepstrum
only allows one to compute the magnitude of the Fourier transform or the power spectrum.
Inverse time transformation of the power spectrum reproduces the correlation function,
not the initial waveform, because the phase of the signal has been lost.
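A minimal sketch of the power (real) cepstrum described above is given below; the small constant added before taking the logarithm is a numerical safeguard assumed here, not part of the definition:

    import numpy as np

    def real_cepstrum(y):
        # Inverse transform of the log magnitude spectrum of y
        Y = np.fft.fft(y)
        log_mag = np.log(np.abs(Y) + 1e-12)   # floor avoids log(0)
        return np.fft.ifft(log_mag).real      # real, since log|Y| is even

In gearbox monitoring (fig. 1.25), peaks of this cepstrum at a quefrency of 1/f indicate a family of spectrum sidebands spaced f apart.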
Figure 1.B.1 Derivation of the power cepstrum. In the time domain the output is the convolution y(t) = x(t) * h(t); in the frequency domain Y(ω) = X(ω)H(ω), with |Y| = |X|·|H| and φ_y = φ_x + φ_h. The correlation satisfies R_y(τ) = R_x(τ) * R_h(τ) and the power spectrum |Y|² = |X|²·|H|². Taking logarithms, log Y = log|Y| + jφ_y = log|X| + log|H| + j(φ_x + φ_h), so that the complex cepstrum is C_y(τ) = C_x(τ) + C_h(τ), the power (real) cepstrum follows from log|Y| = log|X| + log|H|, and the phase cepstrum is C_φy(τ) = C_φx(τ) + C_φh(τ).

The cepstrum has an even part (power cepstrum) and an odd part (phase cepstrum), each of
which is additive in the time (quefrency) domain for source and path components.
CHAPTER 2

ANALYTICAL REDUNDANCY METHODS

2.1 Introduction

Quantitative model-based failure detection and isolation (FDI) methods rely on the com-
parison of a system's available measurements with a-priori information represented by
the system's mathematical model. The term quantitative is used here as contrary to the
term qualitative, denoting cruder system descriptions. There are two main trends of this
approach, namely analytical redundancy or residual-generation methods and parameter
estimation. This distinction is not universally adopted and some researchers consider
these two approaches as belonging to the same category. However, for reasons of
clearer presentation the distinction is adopted here, and parameter estimation methods
will be presented in Chapter 3.
The term analytical redundancy arises from the use of analytical relationships describing
the dynamical interconnection between various system components. In contrast, physical
or hardware redundancy relies on replication of hardware components (sensors, actua-
tors, computers), thus increasing the reliability of the overall system.
Chow and Willsky (1984) may be considered the inventors of this terminology, but it is
certain that similar ideas were tried before them. Since then, this approach has been ex-
tended and evolved into what is currently termed robust FDI. Though this term will be
elaborated later on, it generally means FDI schemes that are robust with respect to
modeling errors and unknown (unmeasured) disturbances. Two main streams of re-
search along this path have been followed on the two sides of the Atlantic. In the USA, par-
ity space methods have been used in many applications, while in Europe, observer-based
techniques were developed. However, it has been recently seen that both approaches
are formally equivalent and just use different mathematical tools to achieve the same goal
in robustness (Gertler, 1991).
The common denominator of all the approaches that will be presented in this chapter is
that the decision on whether a specific fault has occurred or not is made according to the
values of characteristic quantities called residuals. These are generated from the ob-
served input-output history of the system, and the way by which they are generated
signifies each different method.
While residuals are zero in ideal situations, in practice this is seldom the case. Their de-
viation from zero is the combined result of noise and faults. If the noise is negligible, re-
siduals can be analyzed directly. With any significant noise present, statistical analysis
(statistical testing) is necessary. In either case, a logical pattern is generated, showing
which residuals can be considered normal and which ones indicate a fault. Such a pattern
is called the signature of the failure.
As is common in other fields of science, there exist a variety of possible classifications of
methods. With respect to the different sectors of a system where faults can occur, one
may distinguish between instrument fault detection (IFD), actuator fault detection
(AFD) and component fault detection (CFD). While early attempts were not concen-
trated on any one of these categories separately, increased complexity of robustness re-
quirements has forced researchers to attack each of these in isolation to the others. It is
also true to say that most work has been directed towards IFD, since sensor information
is very crucial to the safe operation of any system. With respect to noise modeling, ana-
lytical redundancy methods may be classified into stochastic and deterministic. In the
former, explicit modeling of the noise is present, while in the latter noise is taken into ac-
count without any distribution assumptions. Early attempts tend to fall into the first
category, while current robust techniques employ the second method.
The main technological areas where the methods of this chapter have been applied are:
aerospace engineering, automotive engineering, machining applications, nuclear engi-
neering, chemical/petrochemical engineering, power plant and power transmission appli-
cations.
The structure of this chapter is as follows: the main concepts of analytical redundancy
methods are presented first. This includes modeling and performance criteria. Next,
methods of stochastic modeling are briefly considered, mainly for introductory purposes.
This is followed by current techniques, aiming at robust FDI, based on deterministic
models. Finally, specific examples from industrial applications are presented, which illus-
trate the various methods which have been considered.

2.2 Plant and failure models

Most model-based failure detection and isolation methods rely on linear discrete-time
state-space models. Since most diagnostic computations are performed on sampled
data, this represents a reasonable form. This implies that for non-linear plants, any non-
linearity is linearized around some operating point. Note, however, that some method-
ologies can be extended to explicitly non-linear models, especially if they can be de-
composed into static nonlinearities and linear dynamics (Gertler et al., 1991). Also,
continuous-time plants are represented by their discretized model. It must be empha-
sized, however, that the type of model used serves the proposed solution method, this be-
ing the cause of the many different representations one sees in the fault detection litera-
ture. Thus, frequency or z-domain representations have been lately considered, which
exploit the additional information carried by the spectrum of the process.
Plant parameters may be varying with time. "Normal" variations are usually small and
slow compared to the dynamics of the plant. Such variations will be neglected here for
the sake of simplicity. Abrupt and/or significant changes, on the other hand, may and
should be considered as multiplicative process faults. In addition, additive faults, e.g.
biases on the different parts of the system are taken into account.
The state-space model relates the state vector x(k) to the input vector u(k) and output
vector y(k) using known system matrices A, B, and C. The well-known state equations
describing the nominal (fault-free) system are:
x(k + 1) = Ax(k) + Bu(k) (2.1)
y(k) = Cx(k) (2.2)
The dimensions of the state, input and output vectors are n, r and m respectively. An
equivalent input-output model may be presented in shift-operator form, with matrices
G(z) and H(z) consisting of elements that are polynomials in the shift operator z and H
being a diagonal matrix:
H(z)y(k) = G(z)u(k) (2.3)
The matrices of the input-output model are related to those of the state model by,

G(z) = C[Adjoint(Iz − A)]B
H(z) = [Determinant(Iz − A)]I (2.4)

Note that (2.3) usually can be simplified by eliminating, row by row, the common factors
in the H(z) and G(z) matrices.
Introducing f_p(k) for additive process faults and w_p(k) for (additive) process noise, and P
and Q for their coefficient matrices, the state equation (2.1) becomes,
x(k + 1) = Ax(k) + Bu(k) + Pf_p(k) + Qw_p(k) (2.5)

while the input-output equation (2.3) becomes,

H(z)y(k) = G(z)u(k) + L(z)f_p(k) + M(z)w_p(k) (2.6)
Here the matrices L and M are obtained in accordance with (2.4), with B replaced by P
and Q, respectively. Note that the presence of the new terms in (2.5) may influence H(z)
and G(z), since L(z) and M(z) interfere with the simplification of the equations.
Introduce now f_u(k) and f_y(k) for the additive measurement faults (biases) on the input u(k)
and output y(k), and w_u(k) and w_y(k) for the respective measurement noise. With these,
the measured input ũ(k) and output ỹ(k) are

ũ(k) = u(k) + f_u(k) + w_u(k)
ỹ(k) = y(k) + f_y(k) + w_y(k) (2.7)

For controlled inputs, there is no sensory measurement; instead, u(k) is the control signal
and ũ(k) its implementation by the actuators, with f_u(k) representing any actuator mal-
function and w_u(k) the actuator noise. Alternatively, inputs may be classified into three
groups: measured inputs u_m, controlled inputs u_c and disturbance inputs u_d.
Finally, introduce ΔA(k), ΔB(k), and ΔC(k) for the discrepancies between the model
matrices Ā, B̄ and C̄ and the true system matrices A, B, and C:
Ā = A + ΔA(k)
B̄ = B + ΔB(k) (2.8)
C̄ = C + ΔC(k)
Such discrepancies may account for multiplicative plant faults. To obtain a complete de-
scription of the system with all the possible faults and noises taken into account, the true
variables u(k) and y(k) expressed from (2.7) and the true matrices A, B, C from (2.8) are
to be substituted into (2.5) and (2.2). The model becomes,
x(k + 1) = (A + ΔA(k))x(k) + (B + ΔB(k))(u(k) + f_u(k) + w_u(k)) + Pf_p(k) + Qw_p(k)
y(k) = (C + ΔC(k))x(k) + f_y(k) + w_y(k) (2.9)

or,
x(k + 1) = Ax(k) + Bu(k) + ΔA(k)x(k) + ΔB(k)u(k)
+ Bf_u(k) + Bw_u(k) + ΔB(k)f_u(k) + ΔB(k)w_u(k) + Pf_p(k) + Qw_p(k)

y(k) = Cx(k) + ΔC(k)x(k) + f_y(k) + w_y(k) (2.10)

Now if the various terms are lumped into similar groups, one obtains the system,
x(k + 1) = Ax(k) + Bu(k) + F1(k)f1(k) + D1d1(k)
y(k) = Cx(k) + F2(k)f2(k) + D2d2(k) (2.11)
where the meaning of the various terms is clear from comparison of (2.11) and (2.10).
The main point to note is that uncertainty (faults and disturbances) is generally split into
two categories:
• structured uncertainty, acting upon the system as additive faults and disturbances and
represented by unknown time-functions multiplied by known distribution matrices,
and
• unstructured uncertainty, which describes multiplicative (parametric) faults and
modeling errors and is represented by unknown matrices multiplying known
(observed) variables.
Further specialization of (2.11) is possible in specific situations, where it is clarified what
is considered a fault to be detected and what is considered a disturbance to be ignored.
2.3 Design requirements

Model-based fault diagnosis can be defined as the detection, isolation and characteriza-
tion of faults in system components from the comparison of available measurements.
These three desired functions are stated in order of increased difficulty. Detection is
performed by all traditional methods, while isolation and characterization (size and pos-
sible time of occurrence of fault) are achieved using more sophisticated algorithms. A
fault detection method should usually possess the following characteristics:
• Low detection delay time (t_d): this is usually minimised for a fixed false alarm rate.
• High rate of correct detections (P_d).
• Low rate of false alarms (P_f).
• Isolability: the ability to distinguish (isolate) faults; it depends on the statistical
tests employed and the structure of the system matrices.
• Sensitivity: characterizes the size of faults that can be isolated under certain condi-
tions. It depends on the size of the respective matrices and noise properties and is
closely related to the detection delay time. Some researchers use this term in a dif-
ferent context which is related to the robustness requirement, but we prefer this
definition since it defines a distinct aspect of an FDI algorithm.
• Robustness: the ability to isolate faults in the presence of modeling errors and/or
unknown disturbances. This is a most serious requirement, since such errors are
practically inevitable and therefore greatly affect the previous properties. Since
modeling errors appear as multiplicative faults, false alarms are triggered if the ro-
bustness issue is not taken into account in the design process. It has by now become
clear that the most essential requirement for a model-based FDI algorithm is robust-
ness to disturbances as well as to model-system mismatches. This is not at all a
straightforward problem and, as will be seen in the following sections, approximate
solutions are found in practice.
Performance measures for the above requirements vary, and will be cited with the spe-
cific methods.
2.4 Methods of solution

The procedure of evaluation of the redundancy given by the mathematical model of the
system, described by any of the models of Section 2.2, can be roughly divided into the
following two steps:
1. Residual generation.
2. Residual analysis: decision and isolation of the faults (time, location, sometimes also
type, size, and source).
The analytical redundancy approach requires that the residual generator performs some
kind of validation of the nominal relationships of the system, using the actual input u and
measured output y. The redundancy relations to be evaluated can simply be interpreted
as input-output relations of the dynamics. If a fault occurs, the redundancy relations are
no longer satisfied and a residual, r, occurs. The residual is then used to form appro-
priate decision functions. They are evaluated in the fault decision logic in order to moni-
tor both the time of occurrence and location of the fault.
A more detailed structural diagram of the overall FDI procedure is depicted in fig. 2.1.
Note that for the residual generation three kinds of models are required: nominal, actual
(observed) and that of the faulty system. In order to achieve a high performance of fault
detection with a low false alarm rate, the nominal model should be tracked and updated by
the observation model.

Figure 2.1 General architecture of FDI based on analytical redundancy: the inputs u and
outputs y feed the residual generation stage, whose residuals drive the decision making
stage (decision function generation and fault decision logic).


Analytical redundancy methods 99

Basically, there are three different ways of generating fault-accentuated signals using
analytical redundancy: parity checks, observer schemes and detection filters, all of them
using state estimation techniques. The resulting signals are used to form decision func-
tions as, for example, norms of likelihood functions. The basis for the decision on the
occurrence of a fault is the fault signature, i.e. a signal that is obtained from some kind
of faulty system model defining the effects associated with a fault.
Deterministic methods for FDI use deterministic state variable methods to generate re-
sidual quantities. The detection, isolation and further diagnosis of faults is achieved using
these residual quantities. Careful design of the residual can facilitate the use of tighter
bounds in the form of threshold levels for detection and isolation.
If the dynamical system with a number of possible faults can be described by the input-
output relation,

y(s) = G_u(s)u(s) + G_f(s)f(s)

where y(s), u(s), f(s) are the Laplace transformed output, input and fault vectors re-
spectively, then each component of the residual vector r(t) generated by means of a de-
terministic model should satisfy the condition:

r(t) = 0 if and only if f(t) = 0    (2.12)

where f(t) is considered to act upon the dynamics of the process in an additive manner.
From a practical point of view it is reasonable not to make further assumptions about the
fault vector f(t), except that it is an unknown time function.
The general structure for all deterministic residual generators, based upon the concept
above is shown in fig. 2.2. This structure is expressed mathematically in the frequency
domain as:

r(s) = [H_u(s)  H_y(s)] [u(s)^T  y(s)^T]^T = H_u(s)u(s) + H_y(s)y(s)    (2.13)

Figure 2.2 General structure of a residual generator (inputs u(s) and y(s), output r(s))

The transfer matrices H_u(s) and H_y(s) are realizable using stable linear systems. In order
to make the residual r(s) become zero for the fault-free case (i.e. to achieve the require-
ment of Eq. (2.12)), H_u(s) and H_y(s) must satisfy the null condition:

H_u(s) + H_y(s)G_u(s) = 0

Eq. (2.13) is a generalized representation of all residual generators. The design of the
residual generator reduces to the choice of the transfer function matrices H_u(s) and
H_y(s), which must satisfy the null condition. The various ways of generating residuals
correspond to different parameterizations of H_u(s) and H_y(s). One can obtain different
residual generators using different forms for H_u(s) and H_y(s), and using the design
freedom, the desired performance of the residual can be achieved.
A fault can be detected by comparing the residual with a decision or threshold function
D_F(r) according to the test,

D_F(r) ≤ T(t) for f(t) = 0
D_F(r) > T(t) for f(t) ≠ 0

If this test is positive (i.e. the threshold is exceeded by the decision function), a fault is
hypothesized as likely. There may also be a likelihood that the threshold will be
exceeded even if there is no fault. This would lead to a false alarm in detection as a
consequence of modeling errors (in the determination of the residual or in the
determination of the decision function) or unknown (i.e. unexpected) disturbances
affecting the residual. The simplest method of deciding whether or not there is a fault is
to use a fixed threshold applied to the residual signal r(t).
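As a minimal illustration of this fixed-threshold rule, the following Python sketch flags the time instants at which the residual magnitude exceeds the threshold; the residual sequence, fault size and threshold value are illustrative assumptions, not taken from the text:

    import numpy as np

    def fault_alarm(residuals, threshold):
        # Return the time indices where |r(t)| exceeds the fixed threshold.
        r = np.asarray(residuals, dtype=float)
        return np.flatnonzero(np.abs(r) > threshold)

    rng = np.random.default_rng(0)
    r = rng.normal(0.0, 1.0, 200)   # fault-free residuals: zero mean, unit variance
    r[120:] += 4.0                  # an additive fault injected at t = 120 (illustrative)
    print(fault_alarm(r, threshold=3.0))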
There is a rich variety of methods available for quantitative model-based residual gen-
eration, including:
Observer approaches. The underlying idea is the estimation of system outputs from the
measurements (or a subset of measurements) by using either full order or reduced order
state observers. A suitable weighting of the output estimation error is then defined as a
residual, according to the general structure given in Eq. (2.13) and fig. 2.2. Methods for
the selection of the observer gains include:
(a) The Unknown Input Observer (UIO) method (Watanabe and Himmelblau, (1982),
Massoumnia, (1986), Wünnenberg and Frank, (1987)).
(b) Eigenstructure assignment to give disturbance decoupling (Patton and Kangethe,
(1989), Patton and Chen, (1991)).
The state observer approach has become a popular approach due to the flexibility of de-
sign, the relative ease in achieving robustness in fault detection and fault isolation, the
algorithmic and software simplicity, and speed of response in detecting and isolating
faults. However, it does not provide fault size information.
The parity relations approach is based either on a technique of direct redundancy, mak-
ing use of the static algebraic relations between sensor and actuator signals, or alterna-
tively upon temporal redundancy, when dynamic relations between inputs and outputs
are used (differential or difference equations). The term "parity" was first used in com-
puter systems to enable "parity checks" to be performed for error checking. In the FDI
field, it has similar meaning in the context of providing an indicator for the presence of a
fault (or error) in system components. The key idea is to check the parity (consistency)
of the mathematical equations of the system (analytical redundancy relations) by using
the actual measurements. A fault is declared to have occurred once preassigned error
bounds are surpassed. This method does not provide fault size information either.
In the early developments, parity space methods were applied to parallel redundancy
schemes (Potter and Suman, (1977), Desai and Ray, (1981), Chow and Willsky (1984)).
For such system configurations, the number of measurements is greater than the number
of variables to be sensed and the residuals can be obtained directly from the redundant
measurements. Inconsistency in the measurement data is then a metric that can be used
initially for detecting faults and, subsequently for fault diagnosis. Mironovskii (1980),
has independently derived similar relations in the Soviet Union.
Stochastic modeling methods for fault diagnosis are based on statistical testing of the
innovations (i.e. the residuals) of Kalman filters or other filters and can be used for both
fault detection and isolation, by means of hypothesis testing.
Whilst using a similar structure to the observer, approaches based on the Kalman filter
comprise a residual generation mechanism derived by means of a stochastic model of the
dynamical system. In normal operation the Kalman filter residual (or innovation) vector
(the difference between the measurements and their Kalman filter estimates), is a zero-
mean white noise process with known covariance matrix. Mehra and Peschon, (1971)
proposed the use of different statistical tests on the innovation to detect a fault of the
system. The idea which is common to all these approaches is to test, amongst all possible
hypotheses, that the system has a fault or is fault-free. As each fault type has its own
signature, a set of hypotheses can be used and checked for the likelihood that a particu-
lar fault has occurred.
Main methods include:
(i) Chi-squared testing (Mehra and Peschon (1971), Willsky et al., (1974a, 1975);
Watanabe et al., (1979, 1981)). This is just an alarm procedure, i.e. it does not
provide fault location or size information.
(ii) Sequential Probability Ratio Testing (SPRT): The purpose of this test is to check
the zero-mean property of the innovations. The decision is based on the value of
the likelihood ratio of the p.d.f. of the innovations under the null and alternative
hypotheses. Usually the decision space is divided into three regions: fault, no fault
and repeat. In this sense the test is sequential. Chien and Adams, (1976), Deckert
et al, (1977), Yoshimura et al, (1979), Bonivento and Tonielli, (1984) and Uosaki,
(1985) are the main contributors to this approach. This is also an alarm procedure.

(iii) Generalized Likelihood Ratio (GLR) testing. The philosophy of this approach is as
follows: a Kalman-Bucy filter is implemented on the assumption of no abrupt syst-
em changes while a secondary system monitors the measurement residuals of the
filter to determine if a change has occurred and adjusts the filter accordingly.
Decision of fault occurrence is based on the value of the generalised likelihood ratio
of the no-fault and fault hypotheses (Willsky and Jones, (1976), Ono et al, (1984),
Kumamaru (1984), Pouliezos and Stavrakakis (1987) and Tanaka and Müller
(1990)). This method performs all required tasks, i.e. fault detection, isolation and
estimation; thus it is possible to perform automatic system reorganization in the
case of soft failures. These qualities are however offset by its low robustness.
(iv) Multiple Model Adaptive Filters (MMAFs). In this early development a bank of
linear filters based on different hypotheses concerning the underlying system behaviour is
constructed. The innovations of the various filters are monitored and the conditional
probability that each system model is correct is computed. The system with the highest
probability is declared to be the correct one (Lainiotis (1971), Athans and Willner
(1973), Willsky et al. (1974b)). The same comments as for the GLR method apply here.

2.5 Stochastic modeling methods

This section considers the problem of detecting changes in linear, possibly time-
varying, stochastic dynamical systems described by,
x(k+1) = A(k)x(k) + B(k)u(k) + w(k)    (2.14)
y(k) = C(k)x(k) + D(k)u(k) + v(k)    (2.15)

where w and v are zero-mean, independent, white Gaussian sequences with covariances
defined by,

E{w(k)w^T(j)} = Qδ_kj
E{v(k)v^T(j)} = Rδ_kj    (2.16)

where δ_kj is the Kronecker delta. Eqs. (2.14)-(2.16) describe the "normal operation" or
"no failure" model of the system of interest. If no failures occur, the optimal state estima-
tor is given by the discrete Kalman filter equations,

x̂(k+1|k) = A(k)x̂(k|k) + B(k)u(k)    (2.17)

x̂(k|k) = x̂(k|k-1) + K(k)r(k)    (2.18)

r(k) = y(k) - C(k)x̂(k|k-1) - D(k)u(k)    (2.19)

where r(k) is the zero-mean, Gaussian innovation process, and the gain K(k) is calcu-
lated from the equations,

P(k+1|k) = A(k)P(k|k)A^T(k) + Q    (2.20)

V(k) = C(k)P(k|k-1)C^T(k) + R    (2.21)

K(k) = P(k|k-1)C^T(k)V^{-1}(k)    (2.22)

P(k|k) = P(k|k-1) - K(k)C(k)P(k|k-1)    (2.23)

Here P(i|j) is the estimation error covariance of the estimate x̂(i|j), and V(k) is the
covariance of r(k). Eqs. (2.17)-(2.23) are referred to as the "normal mode filter" in the
sequel.
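A minimal sketch of one recursion of the normal mode filter (2.17)-(2.23) in Python follows; the matrix names mirror the equations, and all numerical values would be supplied by the application:

    import numpy as np

    def normal_mode_filter_step(x_est, P, y, u, A, B, C, D, Q, R):
        # Time update, eqs. (2.17) and (2.20)
        x_pred = A @ x_est + B @ u
        P_pred = A @ P @ A.T + Q
        # Innovation and its covariance, eqs. (2.19) and (2.21)
        r = y - C @ x_pred - D @ u
        V = C @ P_pred @ C.T + R
        # Gain and measurement update, eqs. (2.22), (2.18) and (2.23)
        K = P_pred @ C.T @ np.linalg.inv(V)
        x_new = x_pred + K @ r
        P_new = P_pred - K @ C @ P_pred
        return x_new, P_new, r, V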
In addition to the above estimator, one may also have a closed-loop control law, such as
the linear law

u(k) = G(k)x̂(k|k)    (2.24)

Since the statistics of the innovation sequence are completely known under normal con-
ditions, a number of tests can be devised to check whether the observed statistics are as
expected. The relevant properties are: zero mean, independence and known covariance.
These tests can be used as fault alarms, in situations where this is desirable, or as first-
level fault detectors in more sophisticated algorithms. Both approaches will be examined
in the following.

2.5.1 Simple tests

Mehra and Peschon (1971) were among the first to propose several statistical tests for
detection of changes in the system (2.14)-(2.15). Since then various other tests have
been proposed and tested in various real applications. It is obvious that the relevant the-
ory is very extended and cannot be described in detail here. The interested reader is re-
ferred to the excellent book ofBasseville and Nikiforov (1993), in wbich the statistical
change detection theory is presented in detail.
Some of the results presented in tbis section are specialised versions of general algo-
rithms presented in Chapter 1. However, for reasons of clarity, they will be reintroduced
here for the KaIman filter's innovation sequence.
For hypothesis testing purposes, it is more convenient to consider the standardized inno-
vation sequence defined by,
η(k) = (C(k)P(k|k-1)C^T(k) + R(k))^{-1/2} r(k)    (2.25)

where (·)^{-1/2} denotes the square root of the inverse of a matrix. Then,

η(k) ~ N(0, I)    (2.26)
It is also usual in fault detection applications to use moving windows of data. This re-
sults in lower detection delays, since the estimators do not have infinite memory. For
this purpose, define,

η^T(k, n_w) = [η^T(k-n_w+1)  η^T(k-n_w+2)  η^T(k-n_w+3)  ...  η^T(k)]    (2.27)
to denote a collection of n_w residuals ending at time k.

2.5.1.1 Tests of mean

These tests check whether the observed innovation sequence is zero mean or not. The
mean of the innovation sequence is estimated as,
η̂_N = (1/N) Σ_{i=1}^{N} η(i)    (2.28)

where the subscript is used to signify the dependence on the sample size N, and η̄ de-
notes the true mean. Under the null (no-failure) hypothesis, η̂_N has a Gaussian distribu-
tion with zero mean and covariance,

E{η̂_N η̂_N^T} = (1/N) I    (2.29)

Therefore at the 5 per cent significance level, the null hypothesis is rejected whenever,

|η̂_N| > 1.96/√N    (2.30)

The above test suffers from the fact that the covariance of η(k) is assumed known. A
better test is the T²-test, which uses the T²-statistic,

T² = N η̂_N^T Ĉ_N^{-1} η̂_N    (2.31)

where Ĉ_N is the sample covariance calculated by (2.35) (follows). This test is uniformly
most powerful among all the tests for zero mean which are invariant with respect to
scaling (or covariance), see Anderson (1958).
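A sketch of the two zero-mean tests in Python follows; the standardized innovations are assumed to be collected in an N x m array, and the 5 per cent Gaussian threshold 1.96 is the one quoted above:

    import numpy as np

    def zero_mean_tests(eta):
        N, m = eta.shape
        eta_hat = eta.mean(axis=0)                             # eq. (2.28)
        gauss_reject = np.abs(eta_hat) > 1.96 / np.sqrt(N)     # eq. (2.30), per component
        C_hat = np.atleast_2d(np.cov(eta, rowvar=False, ddof=1))  # eq. (2.35)
        T2 = N * eta_hat @ np.linalg.inv(C_hat) @ eta_hat      # eq. (2.31)
        return gauss_reject, T2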
If a sliding window is used, recursive calculation of the window mean is also possible,
using,
ζ(k) = η(k - n_w) - η(k)    (2.32)

η̄_{n_w}(k) = η̄_{n_w}(k-1) - ζ(k)/n_w    (2.33)
where k denotes the current time and n_w the window length.
The above tests assume residual independence, a condition that may be violated if certain
(non-additive) faults have occurred. In this situation, non-parametric tests offer greater
robustness. Such a test is the multivariate component sign test (Bickel, 1965). This
test uses a sign statistic for each component of the vector and combines them in a quad-
ratic form. Define,

S_{n_w}(k) = [S_{1,n_w}(k)  S_{2,n_w}(k)  ...  S_{m,n_w}(k)]^T

where,

S_{i,n_w}(k) = Σ_{t=k-n_w+1}^{k} sgn η_i(t)    (2.34)
and the sgn function is defined as,
sgn(z) = 1 for z > 0,  0 for z = 0,  -1 for z < 0
Now form,

S*_{n_w}(k) = S^T_{n_w}(k) W^{-1}_{n_w}(k) S_{n_w}(k)

where,

[W_{n_w}(k)]_{iℓ} = Σ_{t=k-n_w+1}^{k} sgn η_i(t) sgn η_ℓ(t)  for 1 ≤ i ≤ m and 1 ≤ ℓ ≤ m

The test rejects H_0 (detects a change) for values of S*_{n_w}(k) greater than χ²_m.
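The sign statistic above translates directly into code; a sketch follows, where the window of innovations is assumed to be an n_w x m array and the χ²_m rejection threshold would be taken from tables:

    import numpy as np

    def component_sign_statistic(eta_window):
        s = np.sign(eta_window)           # sgn of each innovation component
        S = s.sum(axis=0)                 # eq. (2.34), one entry per component
        W = s.T @ s                       # [W]_il = sum_t sgn eta_i(t) sgn eta_l(t)
        return S @ np.linalg.solve(W, S)  # S*; reject H0 if above the chi-square(m) quantile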

2.5.1.2 Tests of covariance

The covariance of the innovation sequence is estimated by,

Ĉ_N = (1/(N-1)) Σ_{i=1}^{N} (η(i) - η̂_N)(η(i) - η̂_N)^T    (2.35)

Division by (N-1) instead of N produces unbiased estimates for small samples. Under the
null hypothesis of no change, Ĉ_N has a Wishart distribution (Anderson, 1958). The
trace of Ĉ_N has a χ² distribution with (N-1)m degrees of freedom. Thus Ĉ_N can be
tested against its null hypothesis covariance, equal to an identity matrix.

If moving windows are used, a relevant recursive expression for scalar signals is
(Pouliezos, 1980a),

ĉ_{n_w}(k) = ĉ_{n_w}(k-1) + (1/(n_w-1)) [2ζ(k)η̄_{n_w}(k-1) - ζ²(k)/n_w - η²(k-n_w) + η²(k)]

where ζ(k) and η̄_{n_w}(k) are defined by (2.32), (2.33).

• The sequential probability ratio test


A more sophisticated approach for detecting changes in the covariance of the innovation
is to use Wald's Sequential Probability Ratio Test, SPRT (Wald, 1947).
In this context the two hypotheses to be decided upon are (for scalar outputs):
Normal mode (H_0): E{η_n} = 0, Var{η_n} = 1
Failure mode (H_1): E{η_n} = 0, Var{η_n} = Δ² > 1.
In order to carry out the test for these two hypotheses by the conventional SPRT formula-
tion, the logarithm of the joint likelihood ratio function (LLR) is computed,

λ_n = log [p(η_1, η_2, ..., η_n | H_1) / p(η_1, η_2, ..., η_n | H_0)]    (2.36)
and compared with two threshold values A* and B* (A* > 0 > B*) derived from the specified
error probabilities for false alarm (α) and missed alarm (β), i.e.,

A* = log((1-β)/α),    B* = log(β/(1-α))    (2.37)
If the LLR exceeds the boundary A* or falls below the boundary B*, the observation is
terminated with acceptance of the hypothesis H_1 (failure mode) or the hypothesis H_0
(normal mode), respectively. Otherwise, the observation is continued and the decision is
deferred. It is noted that the LLR can be computed recursively by,

λ_n = λ_{n-1} + log [p(η_n|H_1)/p(η_n|H_0)] = λ_{n-1} + ((Δ²-1)/(2Δ²)) η_n² - log Δ    (2.38)

and the test can be performed recursively as new observations come in.
A failure detection system based on the conventional Wald's SPRT formulation given
above minimizes, on the average, the time to reach a decision for specified error prob-
abilities if the system is either in the failure mode or in the normal mode from the beginning
of the test. However, the failure process considered here is characterized by the system
being initially operated in the normal mode, with a transition to the failure mode occurring
at a random instant during the observations. The LLR (2.36) will show, on the average, a
negative drift when the system is in the normal mode and a positive drift when it is in the
failure mode, and thus the detection system suffers an extra time delay in compensating
for a negative quantity accumulated under the normal mode before the transition to the fail-
ure mode (fig. 2.3). To improve on the performance of the conventional SPRT, Uosaki
and Kawagoe, (1988) proposed a modified version called the backward SPRT.
Taking into account the change time θ, the two hypotheses can be restated as:
H_0 (normal mode): E{η_{n-i+1}} = 0, Var{η_{n-i+1}} = 1
H_1 (failure mode): E{η_{n-i+1}} = 0, Var{η_{n-i+1}} = Δ² > 1;  i = 1, 2, ..., n-θ+1    (2.39)
Define a backward LLR (the logarithm of the likelihood ratio function computed in reverse from
the current observation to the past ones) by,

λ^B_{n,k} = log [p(η_n, η_{n-1}, ..., η_{n-k+1} | H_1) / p(η_n, η_{n-1}, ..., η_{n-k+1} | H_0)]
         = ((Δ²-1)/(2Δ²)) Σ_{i=n-k+1}^{n} (η_i² - (2Δ²/(Δ²-1)) log Δ);  k = 1, 2, ..., n
When the backward LLR is applied to test the hypotheses (2.39) as in the SPRT (called the
backward SPRT), no extra time delay will be introduced, since in the backward SPRT
the variance of the normalized innovation is not unity from the beginning for hypothesis
H_1 corresponding to the failure mode. Furthermore, a negative quantity accumulated un-
der the initial normal mode in the conventional SPRT scarcely appears (fig. 2.3).

The backward LLR can be expressed in terms of the conventional LLR as,

λ^B_{n,k} = ((Δ²-1)/(2Δ²)) Σ_{i=n-k+1}^{n} (η_i² - (2Δ²/(Δ²-1)) log Δ)
         = ((Δ²-1)/(2Δ²)) Σ_{i=1}^{n} (η_i² - (2Δ²/(Δ²-1)) log Δ) - ((Δ²-1)/(2Δ²)) Σ_{i=1}^{n-k} (η_i² - (2Δ²/(Δ²-1)) log Δ)
         = λ_n - λ_{n-k};  k = 1, 2, ..., n    (2.40)


Since we have interest only in detecting the failure mode, the decision rule is defined as
follows:

If the backward LLR statistic λ^B_{n,k} > K for some k = 1, 2, ..., n, where K is a
suitable constant, observation is terminated with acceptance of the hypothe-
sis that the system is in the failure mode. Otherwise, observation is continued, as the
system is not likely to be in the failure mode.

Figure 2.3 Backward SPRT failure detection system and trajectory of backward LLR
The alternative expression of the backward LLR given by (2.40) leads to the decision
rule for acceptance of the hypothesis that the system is in the failure mode as,

λ_n - λ_{n-k} = λ^B_{n,k} > K for some k = 1, 2, ..., n

or,

λ_n - min_{1≤k≤n} λ_k > K    (2.41)

It is easy to show that this decision rule is equivalent to the following decision rule:
If,

S_n = max[0, ((Δ²-1)/(2Δ²)) η_n² - log Δ + S_{n-1}] > K    (2.42)

with S_0 = 0, then decide that the system is in the failure mode.
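The recursion (2.42) is easily implemented; a sketch in Python follows, with illustrative design values for Δ and K:

    import numpy as np

    def backward_sprt(eta, Delta=2.0, K=5.0):
        # Return the first sample index at which S_n > K, or None.
        c = (Delta**2 - 1.0) / (2.0 * Delta**2)
        S = 0.0                                          # S_0 = 0
        for n, e in enumerate(eta):
            S = max(0.0, c * e**2 - np.log(Delta) + S)   # eq. (2.42)
            if S > K:
                return n                                 # failure mode accepted
        return None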


Mean detection time of backward SPRT. To determine a suitable reference value Δ
and decision boundary K, the mean detection time (MDT) of the backward SPRT should
be considered. Here the MDT is defined as the mean number of observations until S_n ex-
ceeds the decision boundary K when the innovation variance changes to Δ_c². To obtain
the relation of the MDT to Δ and K, first divide (-∞, ∞) into N subintervals, (-∞, 0], (0, W],
..., ((N-3)W, K], and (K, ∞), with W = K/(N-2), and define the states of a stationary finite

Markov chain as E_1 = (-∞, 0], E_i = ((i-2)W, (i-1)W]; i = 2, ..., N-1, and E_N = (K, ∞). It
should be noted that the states E_1 and E_N are reflecting and absorbing, respectively. Let
P be the transition probability matrix of this Markov chain with components p_ij = Pr{S_{n-1}
∈ E_i and S_n ∈ E_j}; i, j = 1, ..., N, which are given by,
p_11 = Pr{z - h ≤ 0} = F(Dh)
p_1i = Pr{(i-2)W ≤ z - h < (i-1)W} = F(D((i-1)W + h)) - F(D((i-2)W + h));  i = 2, ..., N-1
p_1N = Pr{z - h > K} = 1 - F(D(K + h))
p_i1 = Pr{z - h ≤ (-i + 3/2)W} = F(D((-i + 3/2)W + h));  i = 2, ..., N-1
p_ij = Pr{(j-i-1/2)W ≤ z - h < (j-i+1/2)W} = F(D((j-i+1/2)W + h)) - F(D((j-i-1/2)W + h));  i, j = 2, ..., N-1
p_iN = Pr{z - h > (N-i-1/2)W} = 1 - F(D((N-i-1/2)W + h));  i = 2, ..., N-1
p_Ni = 0;  i = 1, 2, ..., N-1
p_NN = 1    (2.43)
where,

z = ((Δ²-1)/(2Δ²)) η_n²,    h = log Δ,    D = 2Δ²/((Δ²-1)Δ_c²)    (2.44)

and F(x) is the cumulative distribution function of the χ² distribution with one degree of
freedom, i.e.,

F(x) = ∫_0^x (1/√(2πv)) exp(-v/2) dv = 2Φ(√x) - 1

(Φ(x) is the c.d.f. of the standard normal distribution).
(tP(x) is the c.d.f of the standard normal distribution). Then, the transition probability
matrix P can be expressed by,

P = [ p_11      p_12      ...  p_1,N-1     p_1N
      p_21      p_22      ...  p_2,N-1     p_2N
      ...
      p_N-1,1   p_N-1,2   ...  p_N-1,N-1   p_N-1,N
      0         0         ...  0           1       ]

  = [ R  b
      0  1 ]    (2.45)

and the mean absorption time vector, starting from state E_i, is given by,

μ = (I - R)^{-1} 1,  where 1 = [1 1 ... 1]^T

Since the system starts from the normal mode, the first component of μ gives the MDT
by an N-discrete-state Markov chain approximation. The MDT can be obtained by sim-
ple extrapolation, as the number of states of the discrete state approximation, N, goes to
infinity. It is assumed that the MDT for a large number of states N is expressed by,

μ(N) = μ(∞) + A/N

and μ(∞) is determined by a least squares method from μ(N) for several values of N. To
determine the reference value Δ and decision boundary K, it is possible, for example, to
set the reference value to the greatest tolerable innovation variance change, and
then find the decision boundary K from the value of the MDT of the normal mode
(Δ_c = 1), which corresponds to the inverse of the probability of false alarm.

2.5.1.3 Tests of whiteness

The most important property of the innovation sequence is whiteness or independence at
different time instants. Usual tests of mean and covariance assume that the innovation
sequence is white. Therefore, it is important to test the innovation sequence for white-
ness first, using tests which are invariant with respect to the mean and covariance of the
distribution.
Most of the tests of independence are based on the autocorrelation matrix C(τ) of a sta-
tionary process for lag τ = 1, 2, ..., defined by,

C(τ) = E{(η(i) - η̄)(η(i-τ) - η̄)^T}    (2.46)

C(τ) is usually estimated as,


Ĉ_N(τ) = (1/N) Σ_{i=τ+1}^{N} (η(i) - η̂_N)(η(i-τ) - η̂_N)^T    (2.47)

Now, Ĉ_N(τ) is an asymptotically unbiased and consistent estimate of C(τ) (Jenkins and
Watts, 1968). Under the null hypothesis the Ĉ_N(τ), τ = 1, 2, ... are asymptotically independ-
ent and normal with zero mean and covariance of 1/N. Thus they can be regarded as
samples from the same normal distribution and must lie in the band ±1.96/√N more than
95 per cent of the time under the null hypothesis.
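A sketch of this whiteness check for a scalar innovation sequence follows; the band ±1.96/√N is the one quoted above, and the maximum lag is an illustrative choice:

    import numpy as np

    def whiteness_check(eta, max_lag=20):
        eta = np.asarray(eta, dtype=float)
        N = len(eta)
        e = eta - eta.mean()
        c0 = e @ e / N
        # sample autocorrelations for lags 1..max_lag, cf. eqs. (2.46)-(2.47)
        rho = np.array([e[tau:] @ e[:-tau] / (N * c0)
                        for tau in range(1, max_lag + 1)])
        band = 1.96 / np.sqrt(N)
        return rho, np.abs(rho) > band   # flags the lags violating the band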
Another statistic that can be used for testing independence between the components of
the innovation vector is the sample correlation coefficient defined as,

ρ^{αβ} = Σ_{i=1}^{N} (η^α(i) - η̄^α)(η^β(i) - η̄^β) / [Σ_{i=1}^{N} (η^α(i) - η̄^α)² Σ_{i=1}^{N} (η^β(i) - η̄^β)²]^{1/2}    (2.48)

where the superscripts α and β indicate the components of the vector η. Anderson,
(1958) shows that under the null hypothesis the distribution of ρ^{αβ} is,

p(ρ) = (1/√π) [Γ((N-1)/2)/Γ((N-2)/2)] (1 - ρ²)^{(N-4)/2},  |ρ| ≤ 1

where Γ[·] denotes the gamma function. This statistic is invariant with respect to the
mean and the covariance of η(k). It is particularly useful in the present case since the true
mean and covariance of η(k) are unknown. This statistic can also be used for testing
whiteness by defining ρ^{αβ} for different lags.

2.5.1.4 Two-stage methods

The previously presented simple tests provide fault alarms and are useful when more so-
phisticated algorithms cannot be used, because of speed or other limitations. An alterna-
tive procedure in such situations would be to implement a two-stage method (Pouliezos
and Stavrakakis, 1991). In this approach partial fault isolation is achieved by performing
a combination of simple tests, and then using a fault table to limit the possible faults. Such
a fault table is the following:

                      Zero mean    Whiteness    Stationarity
    Plant bias        No           Yes          Yes
    Change in A       No           No
    Change in Q       Yes          No           No (Yes*)
    Measurement bias  No           Yes          Yes
    Change in C       No           No
    Change in R       Yes          No           No (Yes*)

* After fault effects have settled.
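The first stage of the method amounts to a table lookup on the test outcomes; a minimal sketch in Python follows (the keys encode whether the zero-mean and whiteness tests passed, following the table above; the stationarity column would refine the candidates further):

    FAULT_TABLE = {
        (False, True):  ["plant bias", "measurement bias"],
        (False, False): ["change in A", "change in C"],
        (True,  False): ["change in Q", "change in R"],
    }

    def candidate_faults(zero_mean_ok, whiteness_ok):
        # (True, True): no simple test fired, so no fault is indicated
        return FAULT_TABLE.get((zero_mean_ok, whiteness_ok), [])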
If, for example, correlated residuals of zero mean are detected, then an increase in
the covariance matrix of the noise sequences has taken place. Following this, estimation
of the additional noise is possible by appropriate algorithms. In the case of additional
sensor noise, Pouliezos et al. (1993), estimate the additional noise covariance matrix S
for the system described by the time-invariant equivalent of (2.14)-(2.23), using the fol-
lowing expressions:

S = Ĉ_∞ - V - CAΣ(CA)^T
Σ = (I - KC)AΣ[(I - KC)A]^T + K(Ĉ_∞ - V)K^T - KCAΣ(KCA)^T

where V and K denote the steady-state values of the residual covariance and filter gain
respectively, as given by (2.21), (2.22), and Ĉ_∞ denotes an estimate of the residual covari-
ance obtained from a large number of samples.
This is an algebraic equation in the elements of Σ, which can be solved directly for its
distinct n(n+1)/2 elements. Let,

σ = [σ_11 σ_12 ... σ_1n σ_21 ... σ_2n ... σ_nn]^T

be the vector of the unknown elements of Σ. Then, using brute force algebra,

σ = (I - T)^{-1} f

where

T = [ t^11_11  t^11_12  ...  t^11_nn
      t^12_11  t^12_12  ...  t^12_nn
      ...
      t^nn_11  t^nn_12  ...  t^nn_nn ]

f = [f_11 f_12 ... f_1n f_21 ... f_2n ... f_nn]^T

t^{ij}_{xy} = A_{ix}A_{jy} - A_{ix} Σ_{k=1}^{n} M_{jk}A_{ky} - A_{jy} Σ_{k=1}^{n} M_{ik}A_{kx}

L = K(Ĉ_∞ - V)K^T
M = KC

(suffixes denote the respective matrix elements). In particular, if the system is scalar, the so-
lution is,

s = (ĉ_∞ - v) (1 - (cak)² / ([(1-kc)a]² - 1))^{-1}
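For the scalar case this is a one-line computation; a sketch follows, where all arguments are the steady-state scalar quantities defined above:

    def extra_sensor_noise_scalar(c_inf, v, a, c, k):
        # s = (c_inf - v) * (1 - (cak)^2 / ([(1-kc)a]^2 - 1))^(-1)
        denom = 1.0 - (c * a * k) ** 2 / (((1.0 - k * c) * a) ** 2 - 1.0)
        return (c_inf - v) / denom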

2.5.2 The Multiple Model (MM) method

The MM method was originally developed for problems of system identification and
adaptive control (Athans and Willner (1973), Lainiotis (1971)).
The basic MM method deals with the following problem: the inputs u(k); k = 0, 1, 2, ...,
and outputs y(k); k = 1, 2, ..., of a system which is assumed to obey one of a given finite
set of linear, possibly time-varying, stochastic models, indexed by i = 1, ..., N, are ob-
served:

x_i(k+1) = A_i(k)x_i(k) + B_i(k)u(k) + w_i(k) + g_i(k)
y(k) = C_i(k)x_i(k) + v_i(k) + b_i(k)

where w_i(k) and v_i(k) are independent, zero-mean Gaussian white noise processes, with,

E{w_i(k)w_i^T(j)} = Q_i(k)δ_jk
E{v_i(k)v_i^T(j)} = R_i(k)δ_jk

As usual, the initial state x_i(0) is assumed to be Gaussian, independent of w_i and v_i, with
mean x̂_i(0|0) and covariance P_i(0|0). The matrices A_i(k), B_i(k), C_i(k), Q_i(k), and R_i(k)
are assumed to be known. Also, b_i(k) and g_i(k) are given deterministic functions of time
(corresponding to biases, linearizations about different operating points, etc.). In addi-
tion, the state vectors x_i(k) may be of different dimensions for different values of i
(corresponding to assuming that the different hypothesized models represent different
orders for the dynamics of the real system). Note that this is a discrete-time formulation
of the MM method. Continuous-time versions can be found in the literature (Greene,
1978), and they differ from their discrete-time counterparts only in a technical and not in
a conceptual or structural manner.

Assuming that one of these N models is correct, we now have a standard multiple hy-
pothesis testing problem. That is, let H_i denote the hypothesis that the real system corre-
sponds to the ith model, and let p_i(0) denote the a-priori probability that H_i is true.
Similarly, let p_i(k) denote the probability that H_i is true based on measurements through
the kth measurement, i.e. given I_k = {u(0), ..., u(k-1), y(1), ..., y(k)}. Then Bayes' rule
yields the following recursive formula for the p_i(k):

p_i(k+1) = p(y(k+1)|H_i, I_k, u(k)) p_i(k) / Σ_{j=1}^{N} p(y(k+1)|H_j, I_k, u(k)) p_j(k)    (2.49)

Thus, the quantities that must be produced at each time are the conditional probability
densities p(y(k+1)|H_i, I_k, u(k)); i = 1, ..., N. However, conditioned on H_i, this prob-
ability density is precisely the one-step prediction density produced by a Kalman filter
based on the ith model.
That is, let x̂_i(k+1|k) be the one-step predicted estimate of x_i(k+1) based on I_k and
u(k), assuming that H_i is true. Also let x̂_i(k+1|k+1) denote the filtered estimate of
x_i(k+1) based on I_{k+1} = {I_k, u(k), y(k+1)} and the ith model. Then these quantities are
computed sequentially from the following equations:

x̂_i(k+1|k) = A_i(k)x̂_i(k|k) + B_i(k)u(k) + g_i(k)    (2.50)

x̂_i(k+1|k+1) = x̂_i(k+1|k) + K_i(k+1)r_i(k+1)    (2.51)

where r_i(k+1) is the usual innovations process,

r_i(k+1) = y(k+1) - C_i(k)x̂_i(k+1|k)
and K_i(k+1) is calculated off-line from the following set of equations:

P_i(k+1|k) = A_i(k)P_i(k|k)A_i^T(k) + Q_i(k)    (2.52)

V_i(k+1) = C_i(k)P_i(k+1|k)C_i^T(k) + R_i(k)    (2.53)

K_i(k+1) = P_i(k+1|k)C_i^T(k)V_i^{-1}(k+1)    (2.54)

P_i(k+1|k+1) = P_i(k+1|k) - K_i(k+1)C_i(k)P_i(k+1|k)    (2.55)

Here P_i(k+1|k) denotes the estimation error covariance in the estimate x̂_i(k+1|k)
(assuming H_i to be true), and P_i(k+1|k+1) is the covariance of the error
x_i(k+1) - x̂_i(k+1|k+1), again based on H_i. Also, under hypothesis H_i, r_i(k+1) is
zero mean with covariance V_i(k+1), and it is normally distributed (since we have as-
sumed that all noises are Gaussian). Furthermore, conditioned on H_i, I_k, and u(k), y(k+1)
is Gaussian, has mean C_i(k)x̂_i(k+1|k) and covariance V_i(k+1). Thus, it is deduced
that,

p(y(k+1)|H_i, I_k, u(k)) = (2π)^{-m/2} [det V_i(k+1)]^{-1/2} exp{-(1/2) r_i^T(k+1) V_i^{-1}(k+1) r_i(k+1)}    (2.56)

(recall m is the dimension of y).
Equations (2.49)-(2.51) and (2.56) define the MM algorithm. The inputs to the procedure
are the y(k) and u(k), and the outputs are the p_i(k). The implementation of the algorithm
can be viewed as consisting of a bank of N Kalman filters, one based on each of the N
possible models. The outputs of these Kalman filters are the innovations sequences
r_i(k+1), which effectively measure how well each of the filters can track and predict the
behavior of the observed data. Specifically, if the ith model is correct, then the one-step
prediction error r_i(k) should be a white sequence, resulting only from the intrinsic uncer-
tainty in the ith model. However, if the ith model is not correct, then r_i(k) will not be
white and will include errors due to the fact that the prediction is based on an erroneous
model. Thus the probability calculations (2.49), (2.56) basically provide a quantitative
way in which to assess which model is most likely to be correct by comparing the per-
formance of predictors based on these models.
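A sketch of the probability update (2.49), with the Gaussian densities evaluated as in (2.56), is given below; the per-model innovations r_i(k+1) and covariances V_i(k+1) are assumed to come from a bank of Kalman filters such as the one sketched earlier:

    import numpy as np

    def mm_probability_update(p, innovations, covariances):
        # p: current model probabilities p_i(k); returns p_i(k+1).
        lik = np.empty(len(p))
        for i, (r, V) in enumerate(zip(innovations, covariances)):
            m = len(r)
            quad = r @ np.linalg.solve(V, r)
            lik[i] = np.exp(-0.5 * quad) / np.sqrt(
                (2.0 * np.pi) ** m * np.linalg.det(V))   # eq. (2.56)
        p_new = lik * p                                  # numerator of eq. (2.49)
        return p_new / p_new.sum()                       # normalization in eq. (2.49)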
Several questions arise in understanding how the MM algorithm should be used. Clearly
a very important question concerns the use of MM in problems in which the real system
is nonlinear and/or the noises are non-Gaussian. The answer to this problem is applica-
tion-dependent. The Gaussian assumption is basically used in one place, i.e. in the
evaluation of p(y(k+1)|H_i, I_k, u(k)) in (2.56). It may turn out that using this formula,
even when r_i(k+1) is non-Gaussian, causes essentially no performance degradation.
As far as the nonlinearity of the real system is concerned, an obvious approach is to lin-
earize the system about a number of operating points for each possible model and use
these linearized models to design extended Kalman filters which would be used in place
of Kalman filters in the MM algorithm. Again the utility of this approach depends very
much on the particular application. Essentially the issue is whether the tracking error
from the extended Kalman filter corresponding to the linearized model closest to the
true, nonlinear system, is markedly smaller than the errors from filters based on less cor-
rect models. This is basically a signal-to-noise ratio problem, similar to that seen in the
idealized MM algorithm in which everything is linear. In that case the noise is measured
by the V_i(k+1). The larger these are, the harder it will be to distinguish the models. In the
nonlinear case, the inaccuracies of the extended Kalman filters effectively increase the
V_i(k+1), thus reducing their tracking capabilities and making it more difficult to distin-
guish among them. Therefore, the performance of MM in this case will depend upon how
far apart the different models are, as compared to how well each of the trackers tracks.

The further apart the models are, the more signal we have; the poorer the tracking per-
formance is, the more difficult it is to distinguish among the hypotheses.
Even if the true system is linear, there is clearly the question of the utility of MM given
the inevitability of discrepancies between the actual system and any of the N hypothe-
sized models. Again this is a question of signal-to-noise ratio, but in the linear case a
number of results and approaches have been developed for dealing with this problem.
For example, Baram (1976), has developed a precise mathematical procedure for calcu-
lating the distance between different linear models, and he has shown that the MM proce-
dure will converge to the model closest to the real model (i.e. p_i(k) → 1 for the model
nearest the true system). This can be viewed as a technique for testing the robustness of
MM or as a tool that enables us to decide what models to choose. That is, if the real
system is in some set of models that may be infinite or may in fact represent a continuum
of models (corresponding to the precise values of certain parameters), then Baram's re-
sults can be used to decide upon a finite set of these models that span the original set and
that are far enough apart so that MM can distinguish among them. Willsky (1986) fur-
ther elaborates on implementation issues of the MM algorithm.

2.5.3 The Generalized Likelihood Ratio (GLR) method

The starting point for the GLR method is a model describing normal operation of the ob-
served signals or of the system which generated them. Abrupt changes are then modeled
as additive or multiplicative disturbances to this model that begin at unknown times.
Additive disturbances typify biases, while multiplicative disturbances model parameter
changes. As just discussed for MM, the case of a single such change will be considered,
the assumption being that abrupt changes are sufficiently separated to allow for individ-
ual detection and compensation. The solution to the problem just described and applica-
tions of the method can be found amongst others in Willsky and Jones (1976), Pouliezos
and Stavrakakis (1987), Tanaka and Müller (1990).

2.5.3.1 Additive changes

The system under consideration is modeled as,


x(k+1) = A(k)x(k) + B(k)u(k) + w(k) + f_i(k,θ)ν    (2.57a)
y(k) = C(k)x(k) + v(k) + g_i(k,θ)ν    (2.57b)

where the normal model consists of these equations without the f_i and g_i terms. These
terms represent the presence of the ith type of abrupt change, i = 1, ..., N. Here θ is the
unknown time at which the failure occurs (so f_i(k,θ) = g_i(k,θ) = 0 for k < θ), and f_i and g_i
are the specified dynamic profiles of the ith change type. For example, if f_i = 0 and g_i = a
= a vector whose components are all zero except for the jth one, which equals 1 for k ≥ θ,
then this corresponds to the onset of a bias in the jth component of y. Finally, the scalar ν
denotes the magnitude of the failure (e.g. the size of a bias), which we can model as
known (as in MM and as in what is called simplified GLR (SGLR)) or unknown.
Assume that a Kalman filter based on normal operation is designed, i.e. by neglecting f_i
and g_i. This filter is given by,

x̂(k+1|k) = A(k)x̂(k|k) + B(k)u(k)    (2.58)

x̂(k+1|k+1) = x̂(k+1|k) + K(k+1)r(k+1)    (2.59)

r(k+1) = y(k+1) - C(k)x̂(k+1|k)    (2.60)

where K, P, and V are calculated as in (2.52)-(2.55). Suppose now that a type i change
of size ν occurs at time θ. Then, because of the linearity of (2.57)-(2.60),

x(k) = x_N(k) + α_i(k,θ)ν    (2.61)

x̂(k|k) = x̂_N(k|k) + β_i(k,θ)ν    (2.62)

x̂(k+1|k) = x̂_N(k+1|k) + A(k)β_i(k,θ)ν    (2.63)

r(k) = r_N(k) + ρ_i(k,θ)ν    (2.64)


where x N' XN and rN are the responses if no abrupt change occurs, and the other
terms are the response due solely to the abrupt change. Straightforward calculations
yield recursive equations for these quantities:
aj(k + 1,B) = A(k)aj(k,B) + fj(k,B)
(2.65)
aj(B,B) =0
pj(k + 1,0) = [1 - K(k + I)C(k + 1)]p, (k + 1,0) + K(k + 1)[C(k + l)aj (k + 1,0) + gj(k + 1,0)]
Pj(B -1,B) = 0 (2.66)
appropriate to determine its identity i and estimate its time of occurrence Band size v, if
the latter is modeled as being unknown. The solution to this problem involves matched
filtering operations. First, define the precomputable quantities,
a(k,θ,i) = Σ_{j=θ}^{k} ρ_i^T(j,θ) V^{-1}(j) ρ_i(j,θ)    (2.67)

This has the interpretation as the amount of information present in y(θ), ..., y(k) about a
type i change occurring at time θ.
The on-line GLR calculations consist of the calculation of,
d(k,θ,i) = Σ_{j=θ}^{k} ρ_i^T(j,θ) V^{-1}(j) r(j)    (2.68)

which are essentially correlations of the observed residuals with the abrupt change signa-
tures ρ_i(j,θ) for different hypothesized types i, and times θ. If ν is known (the SGLR
case), then the likelihood of a type i change having occurred at time θ given data y(1), ...,
y(k) is,

ℓ_s(k,θ,i) = 2νd(k,θ,i) - ν²a(k,θ,i)    (2.69)

If ν is unknown, the GLR for this change is,

ℓ(k,θ,i) = d²(k,θ,i)/a(k,θ,i)    (2.70)

and the MLE of ν, assuming a change of type i at time θ, is,

ν̂(k,θ,i) = d(k,θ,i)/a(k,θ,i)    (2.71)
Thus the GLR algorithm consists of the single Kalman filter (2.58)-(2.60), the matched
filter operations of (2.68), and the likelihood calculation of (2.69). The outputs of the
method are these likelihoods and the estimates of eq. (2.71) if ν is modeled as unknown.
The basic idea behind GLR is that different types of abrupt changes produce different
kinds of effects on the filter innovations - i.e. different signatures - and GLR calculates
the likelihood of each possible event by correlating the innovations with the correspond-
ing signature.
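A sketch of the on-line part for one hypothesized pair (i, θ) follows: given the precomputed signatures ρ_i(j,θ), the innovation covariances V(j) and the observed residuals r(j) for j = θ, ..., k (all assumed available as sequences of arrays), it returns the GLR statistic (2.70) and the size estimate (2.71):

    import numpy as np

    def glr_statistic(signatures, covariances, residuals):
        # signatures, covariances, residuals: sequences indexed by j = theta..k
        a = 0.0   # information, eq. (2.67)
        d = 0.0   # matched-filter correlation, eq. (2.68)
        for rho, V, r in zip(signatures, covariances, residuals):
            Vinv_rho = np.linalg.solve(V, rho)
            a += rho @ Vinv_rho
            d += r @ Vinv_rho
        return d**2 / a, d / a   # eqs. (2.70) and (2.71)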
As with the MM method, a number of issues can be raised about GLR. Some of these,
such as the effect of nonlinearities and robustness to model errors, are very similar to the
MM case. Essentially it still can be viewed as a signal-to-noise ratio problem: in the non-
linear case the additive decomposition of (2.64) is not precisely valid, but it may be ap-
proximately correct. Also, different failure modes can be distinguished even in the pres-
ence of modeling errors if their signatures are different enough. Again these issues de-
pend very much on the particular application. The reader is referred to Basseville and
Benveniste (1986), for discussions of several applications of GLR to which these issues
had to be addressed.
In the case of the GLR algorithm it is simpler to discuss performance measures, as one
can use standard detection-theoretic ideas. Specifically, a direct measure of the de-
tectability of a particular type of change is the information a(k,θ,i) defined in (2.67). This
quantity can be viewed as the correlation of ρ_i(·,θ) with itself at zero lag. Similarly, the
relative distinguishability of a type i change at two times θ_1 and θ_2 can be determined as
the correlation of the corresponding signatures,

Σ_{j=max(θ_1,θ_2)}^{k} ρ_i^T(j,θ_1) V^{-1}(j) ρ_i(j,θ_2)    (2.72)

and the relative distinguishability of type i and m changes at times θ_1 and θ_2 similarly:

a(k,θ_1,θ_2,i,m) = Σ_{j=max(θ_1,θ_2)}^{k} ρ_i^T(j,θ_1) V^{-1}(j) ρ_m(j,θ_2)
These quantities provide extremely useful information. For example, in some applica-
tions, the estimation of the time θ at which the change occurs is critical, and the above
equations provide information about how well one can resolve the onset time. In failure
detection applications these quantities directly provide information about how system
redundancy is used to detect and distinguish failures and can be used in deciding whether
additional redundancy (e.g. more sensors) is needed. Also, they directly give the statis-
tics of the likelihood measure (2.69). For the SGLR case of (2.69), ℓ_s is Gaussian, and
its mean under no failure is -ν²a(k,θ,i), while if a type m failure occurs at time φ, its
mean is,

E{ℓ_s(k,θ,i)|(m,φ)} = ν²[2a(k,θ,φ,i,m) - a(k,θ,i)]

In the case of full GLR, under no failure ℓ(k,θ,i) is a χ² random variable with 1 degree
of freedom, while if a failure (m,φ) of size ν occurs, ℓ(k,θ,i) is a non-central χ² with
mean,

E{ℓ(k,θ,i)|(m,φ)} = 1 + ν²a²(k,θ,φ,i,m)/a(k,θ,i)

These quantities can be very useful in evaluating the performance of GLR detection al-
gorithms and for determining decision rules based on the GLR outputs. If one were to
follow the precise GLR philosophy (Van Trees, 1968), the decision rule one would use is
to choose at each time k the largest of the ℓ(k,θ,i) over all possible change types i and
onset times θ. This largest value would then be compared to a threshold for change de-
tection, and if the threshold is exceeded the corresponding maximizing values of θ and i
are taken as the estimates of change type and time. For greater confidence, persistence
tests (i.e. ℓ must exceed the threshold over some time period) are often used to cut down
on false alarms due to spurious and unmodeled events. Basseville (1981), further dis-
cusses the threshold selection procedure.
A final issue to be mentioned is the pruning of the tree of possibilities. As in the MM
case, in principle a growing number of calculations have to be performed, as d(k,θ,i)
must be calculated for every possible fault case and all possible change times up to the
present, i.e. θ = 1, ..., k. As discussed previously, an appropriate procedure is to look only
over a sliding window of possible times:

k - M_1 ≤ θ ≤ k - M_2

where M_1 and M_2 are chosen based on the a's (eq. 2.69), i.e. on detectability and distin-
guishability considerations. Basically, after M_2 time steps from the onset of change,
enough information is collected, so that a detection may be made with a reasonable
amount of accuracy. Further, after M_1 time steps, a sufficient amount of information will
be collected so that detection performance is as good as it can be (i.e. there is no point in
waiting any longer). Clearly, M_1, M_2 must be large to allow for maximum information
collection, but also small enough for fast response and for computational simplicity.
This is a typical tradeoff that arises in all change detection problems.

2.5.3.2 Non-additive changes

GLR has been successfully applied to a wide variety of applications, such as geophysical
signal analysis (Basseville and Benveniste, 1983), detecting arrhythmias in electrocardio-
grams (Gustaffson et al., 1978a and 1978b), freeway incident detection (Willsky et al.,
1980), and manoeuver detection (Dowdle et al., 1983). Recall that the model used in
(2.57) for such changes is an additive model. Thus it appears on the surface that the
types of abrupt changes that can be detected by GLR are a special subset of those that
can be detected by MM, since in that method parametric changes are allowed (in A, B, C,
Q, R) as well as additive ones. However, a GLR system based on the detection of addi-
tive effects can often also detect parameter failures. For example, a gain change in a sen-
sor does look like a sensor bias, albeit one that is modulated by the value of the variable
being sensed. That is, any detectable change will exhibit a systematic deviation between
what is observed and what is predicted to be observed. Obviously, the ability of GLR to
detect a parametric change when it is looking for additive ones is again a question of ro-
bustness. If the effect of the parametric change is "close enough" to that of the additive
one, the system will work. For example, Deckert et al. (1977), describe an additive-fail-
ure-based design that does extremely well in detecting gain changes in sensors. Note of
course that in this mode GLR is essentially only indicating an alarm, but in detection
problems where the primary interest is in simply identifying which of several types of
changes has occurred, this is acceptable.
Direct application of the GLR methodology to parametric changes has also been re-
ported by some researchers. In this context, the model is described in any of the follow-
ing modes:

(a)  x(k+1) = [A(k) + ΔA σ_{k+1,θ}]x(k) + B(k)u(k) + w(k)
     y(k) = C(k)x(k) + v(k)

(b)  x(k+1) = A(k)x(k) + B(k)u(k) + w(k)
     y(k) = [C(k) + ΔC σ_{k+1,θ}]x(k) + v(k)

(c)  x(k+1) = A(k)x(k) + B(k)u(k) + w(k) + ζ_x(k)σ_{k+1,θ}
     y(k) = C(k)x(k) + v(k)

(d)  x(k+1) = A(k)x(k) + B(k)u(k) + w(k)
     y(k) = C(k)x(k) + v(k) + ζ_y(k)σ_{k+1,θ}

where σ_{k,θ} denotes the unit step at the change time θ, and ζ_x(k), ζ_y(k) are additional
noises of covariances S_x, S_y, independent of the plant and sensor noise sequences.
Model (a) is used in cases where a transition matrix change is to be detected, model (b)
is used when a measurement matrix change is to be detected, while models (c) and (d)
are used whenever additional noise in the state or measurements must be monitored. For
these models, appropriate equations for GLR-based fault detection are (Pouliezos and
Stavrakakis, 1987):
For model (a):

r(k) = r_N(k) + Σ_{i=θ}^{k} G_a(k,i,ΔA) ΔA x_N(i-1)

where the matrices G_a(·) are calculated from,

G_a(k,θ,ΔA) = C(k)[ Σ_{q=1}^{k-θ} C_q^{k-θ} Φ(θ+q,θ)(ΔA)^q - A(k-1)F_a(k-1,θ,ΔA) ]
F_a(k,θ,ΔA) = K(k)G_a(k,θ,ΔA) + A(k-1)F_a(k-1,θ,ΔA);  k ≥ θ
G_a(k,θ,ΔA) = F_a(k,θ,ΔA) = 0;  k < θ

C_q^{k-θ} is the binomial coefficient, Φ(i,j) is the state transition matrix and the subscript N
denotes normal operation quantities defined as in (2.61)-(2.64). As seen, these equa-
tions have a form similar to Eqs. (2.64), (2.65)-(2.68), but the nice linear structure is lost,
since there is modulation of the residuals by the unknown ΔA.
Similarly for model (b):

r(k) = r_N(k) + Σ_{i=θ}^{k} G_b(k,i) ΔC x_N(i)

where the matrices G_b(·) are calculated from,

G_b(k,θ) = -C(k)A(k-1)F_b(k-1,θ);  k > θ
F_b(k,θ) = K(k)G_b(k,θ) + A(k-1)F_b(k-1,θ);  k ≥ θ
G_b(θ,θ) = I;
G_b(k,θ) = F_b(k,θ) = 0;  k < θ

while for model (c),

r(k) = r_N(k) + Σ_{i=θ}^{k} G_c(k,i) ζ_x(i)

where,

G_c(k,θ) = -C(k)[Φ(k,θ) - A(k-1)F_c(k-1,θ)]
F_c(k,θ) = K(k)G_c(k,θ) + A(k-1)F_c(k-1,θ);  k ≥ θ
G_c(k,θ) = F_c(k,θ) = 0;  k < θ

Finally, for model (d),

r(k) = r_N(k) + Σ_{i=θ}^{k} G_d(k,i) ζ_y(i)

where,

G_d(k,θ) = -C(k)A(k-1)F_d(k-1,θ);  k > θ
F_d(k,θ) = K(k)G_d(k,θ) + A(k-1)F_d(k-1,θ);  k ≥ θ
G_d(θ,θ) = I;
G_d(k,θ) = F_d(k,θ) = 0;  k < θ
These equations can be used to derive GLR-based detection algorithms. However, their
implementation is not at all straightforward, because of their complexity. Liu (1977),
considered the case of additional plant noise and proposed some approximate solutions
to this problem. Pouliezos et al. (1993), considered the problem of additional sensor
noise and solved it for the time-invariant case. Their approach utilised the steady-state
effect additional sensor noise has on the filter innovations. Tanaka and Müller (1990),
have used a pattern recognition approach to detect parametric changes in the state and
measurement matrices, using a GLR statistic. Their method is robust and relies on the
recognition of the pattern of the curve of the maximum GLR calculated by the conven-
tional step-hypothesised GLR.

2.6 Deterministic methods

2.6.1 Observer-based approaches

The basic idea in the observer-based approach is to reconstruct the states (and the out-
puts) of the system with the aid of observers. Observers are dynamic systems that are
aimed at reconstructing the state x of a state-space model on the basis of the measured
inputs u and outputs y. The state estimation error is then used as the residual for the
detection of the faults.
The particular type of observer used depends on the particular application's needs, i.e.
what kind of failures need to be detected (component, sensor, actuator), what criteria
must be met (robustness, isolability etc.) and on the system structure (observability, con-
trollability). To introduce the concept, consider the linear, time-invariant, controllable
and observable system described by equations (2.1), (2.2) in its continuous-time version:

ẋ(t) = Ax(t) + Bu(t)
y(t) = Cx(t)    (2.73)
It can easily be seen that this mathematical model, with the assumptions made, contains
all the existing relations among the measured variables y_i (i.e., the "redundancy rela-
tions") in the form of differential equations. To reconstruct the states or measured vari-
ables from other measured variables and inputs one can use the linear full-order estima-
tor,

x̂̇(t) = Ax̂(t) + Bu(t) + L(y(t) - Cx̂(t))    (2.74a)
ŷ(t) = Cx̂(t)    (2.74b)

where x̂(t) and ŷ(t) denote the estimated state and output vector and L the gain ma-
trix, by the choice of which the desired dynamics of the estimator can be achieved.
Let us consider a simple, continuous-time version of equations (2.11):

ẋ(t) = Ax(t) + Bu(t) + f(t) + w(t)    (2.75)

y(t) = Cx(t) + Δc(t) + v(t)    (2.76)

where f(t) denotes faults in the process, Δc(t) sensor faults, and w(t), v(t) process and
sensor noise respectively.
Subtracting (2.74a) from (2.75) and defining the state estimation error,

ε(t) = x(t) - x̂(t)

the equations for the output estimation error,

e(t) = y(t) - ŷ(t)

become,

ε̇(t) = (A - LC)ε(t) + w(t) + f(t)    (2.77)

e(t) = Cε(t) + Δc(t) + v(t)    (2.78)

It can be seen that if all the disturbances f(t), Δc(t), w(t), v(t) are zero and the matrix A -
LC has left-half-plane eigenvalues, the estimation error e(t) goes to zero after any initial
condition tracking error has died out. If, however, at least one of these disturbances occurs,
e(t) is driven by it. Thus its effect on e(t) can be evaluated to discover the disturbance.
Therefore e(t) can be used as a residual for FDI.
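A sketch of this residual generator, discretized with a simple Euler step, is given below; the system matrices, the gain L, the input/output records and the step size are illustrative assumptions:

    import numpy as np

    def observer_residuals(A, B, C, L, u_seq, y_seq, dt=0.01):
        # Integrate eq. (2.74a) and collect the output estimation errors e(t).
        x_hat = np.zeros(A.shape[0])
        errors = []
        for u, y in zip(u_seq, y_seq):
            e = y - C @ x_hat                # output estimation error, cf. eq. (2.78)
            errors.append(e)
            x_hat = x_hat + dt * (A @ x_hat + B @ u + L @ e)   # eq. (2.74a)
        return np.array(errors)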
Since specific designs depend on design requirements, let us now consider three distinct
cases:
• instrument fault detection (IFD)
• component fault detection (CFD)
• actuator fault detection (AFD)

A. Instrument fault detection
The goal of IFD is to detect and locate faults of the sensors of the process with sufficient
robustness to system parameter changes and noise. There are two extremes of applica-
tion, depending on the purpose of failure detection and the type of failures. One extreme
is the detection of hard failures as, for example, in safety-relevant systems. Typically,
the failures are then large and the admissible detection time is small (e.g. seconds or
fractions of seconds). The other extreme is the detection of soft failures as in the case of
signal validation, such as repeated (periodic) checking of instrumentation in a nuclear
reactor. In this case the failures to be discovered are typically small (a few percent) but a
large detection time is available (days or weeks). Hence, in the first case, one can get
along with the deterministic approach using observers, since noise (and parameter uncer-
tainties) do not play a large role. In the second case noise cannot be neglected and hence
Kalman filters and stochastic decision procedures including correlation techniques are
indicated. In a concrete situation one has to decide for one (or, in poorly conditioned
situations, neither) of the two approaches.
Over the years a number of different estimator schemes for deterministic IFD have been
proposed in the literature. They differ in the number of estimators and the number of
measured variables.
In the Dedicated Observer Scheme (DOS), (Frank, 1987), it is assumed that the full
state or a subset of it can be observed from each measured variable of the process (fig.
2.4). Each sensor (instrument) output, y_i, is used to drive a dedicated observer of full or
reduced order to observe as many of the remaining measured variables as possible. If the
process is completely observable, an m-fold redundancy of the measurement vectors is
achieved. In the non-fault case,

ŷ_ik = ŷ_ij = y_i;  i = 1, 2, ..., n;  k, j = 1, 2, ..., m    (2.79)

If a fault occurs in the q-th instrument then,

ŷ_iq ≠ ŷ_ik = y_i    (2.80)

holds for i = 1, 2, ..., n and k = 1, 2, ..., m, k ≠ q, i.e., the observer that is driven by the
faulty instrument provides a wrong estimate of all measured variables, whereas the esti-
mates of all the other observers match the corresponding y_i except that of the erroneous
instrument. The output estimation error,

e_ik = y_i - ŷ_ik    (2.81)

represents now the residual vector which, in principle, allows a unique FDI even in the
case of several faulty instruments at the same time.

Figure 2.4 Dedicated observer scheme (after Frank, 1987)


The simplest version of an observer scheme for IFD is the so-called Simplified Observer
Scheme (SOS), which includes only a single (full or reduced order) observer driven by
only one of the measured variables, see fig. 2.5 (Clark, 1978). As the driving variable one
selects the most appropriate sensor output that provides full state (and output) estima-
tion. Though only single redundancy is used, this scheme allows a unique isolation of a
single faulty sensor. If one of the sensors, q, that does not drive the observer fails, the
reconstructed output vector is correct and y_i - ŷ_i = 0 holds for all i except i = q.

If the sensor s that drives the observer fails, all estimates are erroneous, so that
y_i - ŷ_i ≠ 0 for all i except i = s. An advantage of this scheme is that it applies to proc-
esses with very limited observability and that it is easy to implement. However, the price
to pay is reduced redundancy and a loss of detection reliability.
At the other end, if the DOS is generalized in that each observer is driven with more than
one measured variable, say with overlapping subsets of y, the Generalized Observer
Scheme (GOS) is arrived at, see fig. 2.6 (Frank, 1987). It is evident that the decision
logic is of the same type as in the case of DOS or SOS. However, this scheme provides
more degrees of freedom for the observer design and can be used for the increase of ro-
bustness to parameter variations.

Figure 2.5 Simplified observer scheme (after Frank, 1987)

Figure 2.6 Generalized observer scheme (after Frank, 1987)

B. Component fault detection (CFD)


The goal of CFD is to detect and locate faults of components in dynamic processes with
sufficient robustness to sensor faults, system parameter changes and noise. While the IFD
schemes described in the previous section are also suitable for the alarm task of CFD,
they are generally not able to solve the problem of locating faulty components. In this
section, observer schemes that allow the detection and location of component faults are dis-
cussed.
The most natural approach to the component fault detection and isolation problem is to
decompose the system and apply a hierarchical scheme of local observers. One of the
advantages of a local observer scheme is that even if the order of the overall system is

rather high, the order of each subsystem, and consequently the order of the corresponding
observers, may be rather low. Furthermore, only local observability is required. The
problem with this approach lies, however, in the interaction between the subsystems. If
the interactions are low, a malfunction in any of the components affects only the estimate of the cor-
responding local observer. It is thus possible to identify the faulty component uniquely.
However, if the interactions are large and not measurable, a fault in one component
propagates to observers of other components. The observer scheme would then fail to
identify the faulty component. What it could achieve is to identify a fault in the complex
of components that are largely interconnected. This is the basis for the Hierarchical
Observer Scheme (HOS) introduced by Janssen and Frank (1984).
In the HOS, shown in fig. 2.7, the overall system is divided into two levels of compo-
nents: an upper level including all components with measurable couplings and a lower
level of components with unknown couplings. For each of the resulting configurations in
both levels a· scheme of local observers is designed, which includes the reconstruction of
the measured outputs of the components. The ob server scheme used for the upper level
is called ASCOS (Available-State Coup/ed Observer Scheme) and the one for the lower
level ESCOS (Estimated-8tate Coupled Observer Scheme). The difference between
ASCOS and ESCOS is the way of performing the couplings in the observer part of fig.
2.7.

Figure 2.7 Local observer scheme of an HOS (after Frank, 1987)
For a brief development of the main ideas of ASCOS and ESCOS, consider a system represented by equations (2.1), (2.2) in component form:

ẋi(t) = Aii xi(t) + Σj≠i Aij xj(t) + Bi ui(t)   (2.82)

yi(t) = Ci xi(t)   (2.83)

where the sum runs over j = 1, ..., N, j ≠ i.
Here xi, yi, ui are the states, outputs and inputs, respectively, of the ith component and Aij, Bi, Ci are matrices of appropriate dimensions. Note that Aij characterizes the coupling gains with other components.

Assuming that the coupling terms Σj≠i Aij xj in (2.82) are measurable, the state equations of the corresponding ith observer are given by,

x̂̇i(t) = (Aii − Li Ci) x̂i(t) + Σj≠i Aij xj(t) + Bi ui(t) + Li yi(t)   (2.84)

ŷi(t) = Ci x̂i(t)   (2.85)

where xj and ui are measurable and N is the total number of components of the upper level. In the nominal case the state estimation error εi(t) = xi(t) − x̂i(t) obeys,

ε̇i(t) = (Aii − Li Ci) εi(t) + Δfi(t) + vi(t)   (2.86)

where Δfi and vi denote parameter variations and component faults of the ith component. The corresponding output estimation error ei(t) = yi(t) − ŷi(t) of the ith component then becomes,

ei(t) = Ci εi(t) + θi(t) + μi(t)   (2.87)

where θi and μi denote sensor faults and measurement noise, respectively. In the ideal case, all coupling terms are measurable, all state and output estimation errors are perfectly decoupled from each other, and Δfi, vi, θi, μi only affect the error equations of the ith component. This implies that a system fault in the kth component only affects ek(t), leaving the remaining errors ei(t), i ≠ k, unchanged. This allows a unique failure isolation.

To derive ESCOS, observe that in the second level the coupling terms Σj≠i Aij xj(t) are not measurable. The idea is to replace them in the observer scheme by their estimates and estimation errors, which are both available from other observers. Then the state equation of the corresponding observer of the ith component becomes,

x̂̇i(t) = (Aii − Li Ci) x̂i(t) + Σj≠i [Aij x̂j(t) + Lj ej(t)] + Bi ui(t) + Li yi(t)

This leads to the following equation for the state estimation error:

ε̇i(t) = (Aii − Li Ci) εi(t) + Σj≠i (Aij − Lj Cj) εj(t)   (2.88)

From (2.88), the output estimation error ei(t) can again be determined in the same way as above.

2.6.2 Parity space approach

The parity space approach, originated by Deckert et al. (1977), yields a systematic exploitation of the analytical redundancy provided by the mathematical model of the system.

The basic idea used in this work is to identify the analytical redundancy relations of the system that were known well and those that contained substantial uncertainties. An FDI system (i.e., its residual generation process) is then designed based primarily on the well-known relationships (and only secondarily on the less well-known relations) of the system behavior. Chow (1980) and Chow and Willsky (1984) extracted and extended the practical idea underlying this application and developed a general approach to the design of FDI algorithms. Lou et al. (1986) and Massoumnia et al. (1988) further developed these ideas.

In addition to its use in specifying the residual generation procedure, this approach is also useful as it can provide a quantitative measure of the attainable level of robustness in the early stages of a design. This allows the designer to assess overall performance.

The basis for residual generation in analytical redundancy essentially takes two forms:
1. Direct redundancy: the relationship amongst instantaneous outputs of sensors; and,
2. Temporal redundancy: the relationship amongst the histories of sensor outputs and actuator inputs.
Based on these relationships, outputs of (dissimilar) sensors (at different times) can be compared. The residuals resulting from these comparisons are then measures of the discrepancy between the behaviour of observed sensor outputs and the behaviour that should result under normal conditions.
In order to develop a clear picture of redundancy, consider the system described by equations (2.1), (2.2) in the following form:

x(k+1) = Ax(k) + Σ(j=1 to r) bj uj(k)   (2.89)

yj(k) = cj x(k);  j = 1, ..., m   (2.90)

where cj is an n-row vector. Direct redundancy exists among sensors whose outputs are algebraically related, i.e., the sensor outputs are related in such a way that the variable one sensor measures can be determined by the instantaneous outputs of the other sensors. For the system (2.90), this corresponds to the situation where a number of the cj's are linearly dependent. In this case, the value of one of the observations can be written as a linear combination of the other outputs. For example, one might have,
y1(k) = Σ(i=2 to m) ai yi(k)   (2.91)

where the ai's are constants. This indicates that under normal conditions the ideal output of sensor 1 can be calculated from those of the remaining sensors. In the absence of a failure in the sensors, the residual y1(k) − Σ(i=2 to m) ai yi(k) should be zero. A deviation from this behavior provides the indication that one of the sensors has failed. Define,
Cj(k) = [ cj
          cjA
          ⋮
          cjA^k ];  k = 0, 1, ...;  j = 1, ..., m

The well-known Cayley-Hamilton theorem implies that there is an nj, 1 ≤ nj ≤ n, such that,

rank Cj(k) = k+1 for k < nj,  rank Cj(k) = nj for k ≥ nj   (2.92)

The null space of the matrix Cj(nj−1) is known as the unobservable subspace of the jth sensor. The rows of Cj(nj−1) span a subspace of R^n that is the orthogonal complement of the unobservable subspace. Such a subspace will be referred to as the observable subspace of the jth sensor, and it has dimension nj.

Let ω be a row vector of dimension N = Σ(j=1 to m) (nj+1) such that ω = [ω1 ... ωm], where ωj, j = 1, ..., m, is an (nj+1)-dimensional row vector. Consider a nonzero ω satisfying,

ω [ C1(n1)
    ⋮
    Cm(nm) ] x(k) = 0   (2.93)

Assuming that the system (2.89)-(2.90) is observable, there are only N−n linearly independent ω's satisfying (2.93). Let Ω be an (N−n)×N matrix with a set of such independent ω's as its rows (Ω is not unique). Assuming all the inputs are zero for the moment, this yields,

p(k) = Ω [Y1(k,n1)^T ... Ym(k,nm)^T]^T   (2.94)

where,

Yj(k,nj) = [yj(k) yj(k+1) ... yj(k+nj)]^T;  j = 1, ..., m

The (N−n)-vector p(k) is called the parity vector. In the absence of noise and failures, p(k) = 0. In the noisy, no-fail case, p(k) is a zero-mean random vector. Under noise and failures, p(k) will become biased. Moreover, different failures will produce different (biases in the) p(k)'s. Thus, the parity vector may be used as the signature-carrying residual for FDI.
The matrix Ω may be generated by making direct use of (2.93). Let,

T = [ C1(n1)
      ⋮
      Cm(nm) ]

From (2.93), it is seen that the rows of Ω span the orthogonal complement of the range space of T. This suggests that Ω can be generated by subtracting the orthogonal projection onto T from the identity operator. That is, Ω can be chosen to consist of the (N−n) independent rows of I − T(T^T T)^{−1} T^T.
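A small numerical sketch of this construction follows, using the second-order system that appears later in Example 2.3 (the parameter values are illustrative assumptions) and extracting the left null space of T via the SVD rather than forming I − T(T^T T)^{−1}T^T explicitly:

import numpy as np

# Parity projection for a 2-state, 2-sensor system (illustrative values).
a11, a12, a22 = 0.9, 0.3, 0.7
A = np.array([[a11, a12], [0.0, a22]])
c1 = np.array([[1.0, 0.0]])
c2 = np.array([[0.0, 1.0]])

# T stacks C1(n1) and C2(n2) with n1 = 2, n2 = 1, so N = 5 and N - n = 3
T = np.vstack([c1, c1 @ A, c1 @ A @ A, c2, c2 @ A])

# rows of Omega span the orthogonal complement of range(T)
U, s, Vt = np.linalg.svd(T)
Omega = U[:, 2:].T                 # 3 x 5 parity projection
print(np.allclose(Omega @ T, 0))   # True

# with zero inputs, p(k) = Omega*[y1(k) y1(k+1) y1(k+2) y2(k) y2(k+1)]^T
# vanishes along any fault-free trajectory
x = np.array([0.4, -1.2])
xs = [x, A @ x, A @ A @ x]
Y = np.array([xs[0][0], xs[1][0], xs[2][0], xs[0][1], xs[1][1]])
print(np.allclose(Omega @ Y, 0))   # True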
When the actuator inputs are not zero, (2.94) must be modified to take this effect into account. In this case,
p(k) = Ω { [Y1(k,n1)^T ... Ym(k,nm)^T]^T − [B1(n1)^T ... Bm(nm)^T]^T U(k,no) }   (2.95)

where,

Bj(nj) = [ 0             0             ...  0
           cjB           0             ...  0
           cjAB          cjB           ...  0
           ⋮
           cjA^(nj−1)B   cjA^(nj−2)B   ...  0 ]

B = [b1 ... br]
u(k) = [u1(k) ... ur(k)]^T
U(k,no) = [u^T(k) ... u^T(k+no)]^T

and Bj(nj) is an (nj+1)×(no+1)r matrix.

The quantity p(k) is the generalized parity vector, which is nonzero (or of nonzero mean if noise is present) only if a failure is present. The (N−n)-dimensional space of all such vectors is called the generalized parity space. Under the no-fail situation (p(k) = 0), (2.95) characterizes all the analytical redundancies for the system (2.89)-(2.90) because it specifies all the possible relationships amongst the actuator inputs and sensor outputs. Any linear combination of the rows of (2.95),

Σ(i=1 to m) ωi [Yi(k,ni) − Bi(ni) U(k,no)] = 0   (2.96)

is called a parity equation or a parity relation; any linear combination of the RHS of (2.95) is called a parity function.
To provide insight into the nature ofparity relations consider two simple examples.
Example 2.1: A Single sensor. It is always possible to find a nonzero CiJj such that
Analytical redundancy methods 133

(2.97)

or,

Y /k) =- (m~ ) -I[nJ


Lm~ _tYj(k -
nj ]
t) - La~ ._tu(k - t) (2.98)
J t=1 J t=1 J

where,

[at ... a~J 0 ... o]=mjBj(n)

a(; t =0, ... , nj -I is ar-dimensional row vector and m(; t = 0, ... , nj -I is the
(/+ 1)st component of m j. Equation (2.98) represents a reduced-order ARMA model for
the jth sensor alone. That is to say, the output of sensor j can be predicted from its past
outputs and past actuator inputs according to (2.98). Based on the ARMA model, sev-
eral methods of residual generation are possible.
Example 2.2: Temporal redundancy between two sensors. A temporal redundancy exists between sensor i and sensor j if there are,

ωi = [ω0^i ... ωni−1^i  0],  ωj = [ω0^j ... ωnj−1^j  0]

satisfying the redundancy relation,

[ωi  ωj] { [Yi(k,ni)^T  Yj(k,nj)^T]^T − [Bi(ni)^T  Bj(nj)^T]^T U(k,no) } = 0   (2.99)

Clearly, (2.99) holds if and only if,

[ω0^i ... ωni−1^i] Ci(ni−1) = −[ω0^j ... ωnj−1^j] Cj(nj−1)

and this implies that a redundancy relation exists between two sensors if their observable subspaces overlap. Furthermore, when the overlap subspace is of dimension ñ, there are ñ linearly independent vectors of the form [ωi ωj] that will satisfy (2.99).

Assuming the leading coefficient ωnj−1^j of ωj is nonzero, (2.99) can be rewritten as an ARMA representation for sensor j as in (2.98),

yj(k) = −(ωnj−1^j)^{−1} [ Σ(t=1 to nj−1) ωnj−1−t^j yj(k−t) + Σ(t=1 to ni−1) ωni−1−t^i yi(k−t) − Σt (ani−1−t^i + anj−1−t^j) u(k−t) ]


Example 2.3. To illustrate the residual generation procedure, consider a simple second-order system with the following parameters:

A = [ a11  a12
       0   a22 ],  b = [ 0
                         1 ]

c1 = [1  0]
c2 = [0  1]   (2.100)

In this case n1 = 2, n2 = 1, and N − n = 3. Therefore, there are only three linearly independent parity equations, which may be written as,

y1(k) − (a11 + a22) y1(k−1) + a11 a22 y1(k−2) − a12 u(k−2) = 0
y1(k) − a11 y1(k−1) − a12 y2(k−1) = 0
y2(k) − a22 y2(k−1) − u(k−1) = 0   (2.101)

Note that these represent temporal redundancies. Just as with direct redundancy relations, the parity function itself can be taken as a residual. For our specific example, this would be,

r(k) = y1(k) − a11 y1(k−1) − a12 y2(k−1)   (2.102)

Such a residual is a moving average process, i.e., it is a function of a sliding window of the most recent sensor output and (possibly) actuator input values. It is useful to note the effect of noise and failures on the residual. Specifically, if the sensor outputs are corrupted by white noise, the parity function values will be correlated over the length of the window. In this example, r(k) is correlated with r(k−1) and r(k+1) but not with any of its values removed by more than one time step.

The effect of a failure on a parity function depends, of course, on the nature of the failure. To illustrate what typically occurs, consider the case in which one sensor develops a bias. Since the parity function is a moving average process it also develops a bias, taking at most p steps to reach the steady-state value. In this example, if y2(k) develops a bias of size β at time θ, r(k) will have a bias of size −a12 β from time θ+1 onwards.
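The bias claim is easy to check numerically. The following minimal sketch (parameter values, input, bias size and fault time are illustrative assumptions) simulates the system and evaluates the residual (2.102) around the fault time:

import numpy as np

# Residual (2.102) under a sensor-2 bias of size beta at time theta.
a11, a12, a22 = 0.9, 0.3, 0.7          # illustrative parameters
A = np.array([[a11, a12], [0.0, a22]])
b = np.array([0.0, 1.0])
beta, theta = 0.5, 10

x = np.zeros(2)
y1, y2 = [], []
for k in range(20):
    y1.append(x[0])
    y2.append(x[1] + (beta if k >= theta else 0.0))
    x = A @ x + b * 1.0                # constant known input u(k) = 1

for k in (theta, theta + 1, theta + 5):
    r = y1[k] - a11 * y1[k - 1] - a12 * y2[k - 1]
    print(k, round(r, 6))              # 0 at theta; -a12*beta = -0.15 afterwards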
How can the residuals generated using parity functions be used for failure detection? Firstly, sensor FDI using direct redundancy will be examined.

Consider a set of m sensors with output vector y(k) = [y1(k) ... ym(k)]^T and a parity vector,

p(k) = D y(k)   (2.103)

where D is a matrix with m columns and a number of rows (the specification of which will be discussed later). D is not unique, and for any choice of D such that (2.103) is a parity vector, p(k) will be zero in the absence of a failure (and no noise). However, the nature of the failure signatures contained in the parity vector depends heavily on the choice of D. Clearly D should be chosen so that failure signatures are easily recognizable.

A possible approach which uses information about how failures affect the residuals has been examined by Potter and Suman (1977), and Daley et al. (1979). This method exploits the following phenomenon. A faulty sensor, say the jth one, contains an error signal ν(k) in its output,

yj(k) = cj x(k) + ν(k)   (2.104)

The effect of this failure on the parity vector defined by (2.103) is,

p(k) = Dj ν(k)

where Dj is the jth column of D. That is, no matter what ν(k) is, the effect of a sensor j failure on the residual always lies in the direction Dj. Thus, a sensor j failure can be identified by recognizing a residual bias in the Dj direction. Dj is referred to as the failure direction in parity space (FDPS) corresponding to sensor j.

It is now clear that D should be chosen to have distinct columns, so that a sensor failure can be inferred from the presence of a residual bias in its corresponding FDPS. In principle, a D with as few as two rows but m distinct columns is sufficient for detecting and identifying a failure among the m sensors. In practice, however, increasing the row dimension of D can help to separate the various FDPS's and increase the distinguishability of the different failures under noise conditions.
Now, consider the extension of this detection method to temporal redundancy relations. In this case, it is generally not possible to find a D that confines the effect of each component failure to a fixed direction in parity space. To see this, consider the parity relations (2.101). The parity vector can be written as,

p(k) = [ 1  −(a11+a22)  a11a22  0  0
         1  −a11        0       0  −a12
         0   0          0       1  −a22 ] [y1(k) y1(k−1) y1(k−2) y2(k) y2(k−1)]^T
     + [ 0  −a12
         0   0
        −1   0 ] [u(k−1) u(k−2)]^T

When sensor 2 fails [with output model (2.104)], the residual vector develops a bias of the form,

Δp(k) = [0,  −a12 ν(k−1),  ν(k) − a22 ν(k−1)]^T   (2.105)

Unless ν(k) is a constant, the effect (signature) of a sensor 2 failure is only confined to a two-dimensional subspace of the parity space. In fact, when temporal redundancy is used in the parity function method for residual generation, failure signatures are generally constrained to multidimensional subspaces in the parity space. These subspaces may in general overlap with one another, or some may be contained in others. If no such subspace is contained in another, identification of the failure is still possible by determining which subspace the residual bias lies in.

2.7 Robust detection methods

The theory developed in the preceding sections works well as long as the adopted system models represent adequately the monitored physical system and no noise or unexpected disturbances are present. These requirements are rather stringent however, and are met in few real systems. As a consequence, following initial efforts in devising FDI algorithms for idealised situations, current research focuses on robust algorithms. There are a number of different, but somewhat overlapping, approaches to this problem, and in the following sections they will be briefly described.

2.7.1 Robust observer-based methods

Eigenstructure assignment methods.


Eigenstructure assignment aims at decoupling the observer estimation from the structured type of uncertainty in the following way: assume that all uncertainties of a system can be summarised as unknown inputs (disturbances) with known distribution matrix acting on the system model on which the observer is designed. Patton and Willcox (1986) and Patton et al. (1987) first demonstrated the eigenstructure assignment approach to robust observer-based failure detection. They have shown in continuing work an approach to solving this problem using the assignment of suitable eigenvectors and eigenvalues (eigenstructure assignment) as a way of providing robustness through disturbance decoupling. Furthermore, Patton et al. (1992) have studied ways by which the disturbance distribution matrix is obtained in an optimal way. The main results of their work follow.

To approach the problem from the most general and practical point of view, consider the discrete-time representation of Equation (2.11) with structured uncertainty only:

x(k+1) = Ax(k) + Bu(k) + D1 d1(k)
y(k) = Cx(k) + Du(k) + D2 d2(k)

Now if the state structured uncertainty is decomposed into the unknown input disturbance and state faults, the system becomes,

x(k+1) = Ax(k) + Bu(k) + Ed(k) + Q fa(k)   (2.106a)

y(k) = Cx(k) + Du(k) + fs(k)   (2.106b)

Recall that x(k) is the n×1 state vector, A the open-loop system dynamics matrix, u(k) the r×1 known input vector with the corresponding input distribution matrix B. The term Ed(k) characterizes a q×1 unknown input (disturbance) vector d(k) with known distribution matrix E acting directly onto the system dynamics, used to represent the structured uncertainties, i.e. although the values of the uncertainty are unknown, its distribution matrix (structure) is known. C is the measurement matrix of the system and y(k) the m×1 output vector that is assumed to be available for further treatment. The sensor faults that corrupt the measurements are described by the vector fs(k). The term Q fa(k) represents the faults acting on the system dynamics, such as actuator or component faults.

The purpose of the robust residual generator is to use an observer which generates estimates of the system states and measurements, and provides residual signals which are independent of uncertainties (fig. 2.8).


Figure 2.8 General structure of an observer-based residual generation approach

The observer dynamics are described by the following:

x̂(k+1) = (A − LC) x̂(k) + (B − LD) u(k) + L y(k)   (2.107)
ŷ(k) = C x̂(k) + D u(k)   (2.108)

(compare with (2.74)). The state estimation error dynamics are then given by:

e(k+1) = Ac e(k) + Ed(k) + Q fa(k) − L fs(k)   (2.109)

where Ac = A − LC. A p-dimensional residual vector is generated from the difference between the actual and estimated measurements, having the form:

ε(k) = W(y(k) − ŷ(k)) = W ey(k)   (2.110)

where W is a p×m weighting matrix. Substituting (2.107), (2.108) and (2.109) into (2.110) yields,

ε(k) = WC e(k) + W fs(k) = H e(k) + W fs(k)   (2.111)

where,  H = WC   (2.112)

From equations (2.109) and (2.111), the complete response of the residual vector in shift operator form is,

ε(z) = [W − WC(zI − Ac)^{−1} L] fs(z) + WC(zI − Ac)^{−1} Q fa(z) + WC(zI − Ac)^{−1} E d(z)   (2.113)
One can see that the residual is not zero even if no faults occur in the system. Indeed, it can be difficult to distinguish the effects of faults from the effects of disturbances acting on the system. The effects of disturbances obscure the performance of fault detection and act as a source of false alarms. Therefore, in order to minimize the false alarm rate, one should design the residual generator such that the residual itself becomes decoupled with respect to disturbances. This is essentially the principle of a robust residual generator.

In order that the residual ε(k) be independent of uncertainties, it is necessary to null the entries in the transfer function matrix between residuals and disturbances, i.e.

Grd(z) = WC[zI − (A − LC)]^{−1} E = 0   (2.114)

This is a special case of the output-zeroing problem which is well known in multivariable control theory (Karkanias and Kouvaritakis, 1979). Once E is known, the remaining problem is to choose the matrices L and W to satisfy this equation. This can be achieved with the aid of the invariant subspace theory (Antsaklis, 1980). Expanding the inverted matrix one obtains:

H[zI − Ac]^{−1} E = H{a1(z) In + a2(z) Ac + ... + an(z) Ac^{n−1}} E   (2.115)

= [a1(z) Ip  a2(z) Ip  ...  an(z) Ip] [ H
                                        H Ac
                                        ⋮
                                        H Ac^{n−1} ] E   (2.116)

Hence, (2.114) can be satisfied in the following two cases:

1. If the (H, Ac)-invariant subspace (observable subspace) lies in the left zero space of E, or,
2. If the (Ac, E)-invariant subspace (controllable subspace) lies in the right zero space of H.

These two goals can be achieved by the assignment of either left or right eigenvectors of the observer, as proposed by Patton and Chen (1991). To briefly expose these ideas, express Grd(z) in dyadic form as:

Grd(z) = R1/(z − p1) + ... + Rn/(z − pn)   (2.117)

where Ri = H vi li^T E and vi and li^T are, respectively, the right and left eigenvectors associated with an eigenvalue pi of Ac. It is well known that a given left eigenvector li^T (corresponding to eigenvalue pi) of Ac is always orthogonal to the right eigenvectors vj corresponding to the remaining (n−1) eigenvalues pj of Ac, where pi ≠ pj. Now, for (2.117) to be satisfied, all eigenvectors must be appropriately scaled so that VL^T = L^T V = In, where:

V = [v1 v2 ... vn]
L = [l1 l2 ... ln]

Thus, disturbance decoupling is possible if and only if Ri = H vi li^T E = 0 for all i = 1 to n. This implies that,

R1 + ... + Rn = H v1 l1^T E + ... + H vn ln^T E = H V L^T E = HE = WCE = 0   (2.118)

Hence, WCE = HE = 0 is a necessary condition for achieving a disturbance decoupling design. Furthermore, it may be proved (Patton and Chen, 1991) that,

Theorem 2.1 If WCE = 0 and all rows of H = WC are left eigenvectors of Ac corresponding to any eigenvalues, equation (2.114) is satisfied.
The algorithm then becomes:

1. Compute the weighting matrix W to satisfy equation (2.118); the necessary and sufficient condition for this is rank(CE) < m.
2. Assign the left eigenvectors of the observer as the rows of H (corresponding to suitable eigenvalues).

Step (2) of the algorithm can be done by a transformation of the dual control problem. Obtaining the right eigenvectors of the dual control problem is equivalent to computing the left eigenvectors of the observer. The assignment of right eigenvectors for a controller is well developed (Moore, 1976). The assignability condition is that for each pi, the corresponding left eigenvector li^T of Ac must belong to the row subspace spanned by the rows of C(pi I − A)^{−1}.

If the left eigenvector assignability condition is not satisfied, a similar approach can be followed, that is, to assign the right eigenvectors of the observer as columns of matrix E. In this case the corresponding conditions are:

Theorem 2.2 If WCE = 0 and all columns of E are right eigenvectors of Ac corresponding to any eigenvalues, equation (2.114) is satisfied.

The assignment of right eigenvectors of the observer (left eigenvectors of the dual controller) is a relatively new problem. Patton and Chen (1991) derived the following conditions:

Theorem 2.3 For a vector ri to be a right eigenvector of A − LC corresponding to the eigenvalue pi, either:
• ri is a right eigenvector of A corresponding to pi and C ri = 0, or
• ri is not a right eigenvector of A corresponding to pi and C ri ≠ 0.

If a number of right eigenvectors must be assigned, the gain matrix L must satisfy a set of equations like,

L C ri = (A − pi I) ri

If all columns ei of E must be assigned as right eigenvectors of Ac = A − LC corresponding to eigenvalues pi, the following equations must be satisfied:

L C ei = (A − pi I) ei;  i = 1, ..., q   (2.119)

i.e.  LCE = Ap   (2.120)

where,

Ap = [(A − p1 I) e1  ...  (A − pq I) eq]

Therefore the right eigenvector assignment problem is to solve (2.120) and at the same time ensure that the observer is stable.

Theorem 2.4 The necessary and sufficient condition for a solution of equation (2.120) to exist is:

rank(Ap) = rank(CE)

Subject to this condition, the general form of the solution to (2.120) is:

L = Ap (CE)* + K1 [Im − CE(CE)*]   (2.121)

where K1 is an n×m arbitrary design matrix and (CE)* is the pseudo-inverse of CE. When rank(CE) = q, (CE)* is given by:

(CE)* = [(CE)^T CE]^{−1} (CE)^T

The dynamic matrix of the observer is thus:

A − LC = A − Ap(CE)*C − K1[Im − CE(CE)*]C = A1 − K1 C1

where A1 = A − Ap(CE)*C;  C1 = [Im − CE(CE)*]C

The necessary and sufficient condition for the observer dynamic matrix A − LC to be stable is that {C1, A1} is the dual of a stabilizable pair. When this condition is satisfied, the assignment problem of the right eigenvectors is to choose the matrix K1 such that the observer is stable. This problem can be handled by using the traditional pole assignment methods. As p1, ..., pq have been assigned as the eigenvalues of A − LC = A1 − K1C1, only the remaining (at most n−q) eigenvalues of A1 − K1C1 can be moved by changing the design matrix K1.
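The following minimal sketch illustrates the right-eigenvector assignment route on an assumed third-order example (the matrices A, C, E, the eigenvalue p1 and the choice K1 = 0 are all illustrative; W is picked by hand to satisfy WCE = 0). It verifies both the eigenvector assignment and the resulting disturbance decoupling of Theorem 2.2:

import numpy as np

# Assumed example: n = 3, m = 2, one disturbance direction (q = 1).
A = np.array([[0.2, 1.0, 0.0],
              [0.0, 0.1, 1.0],
              [0.3, 0.0, 0.4]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
E = np.array([[1.0], [0.0], [0.0]])     # rank(CE) = 1 < m

p1 = 0.5                                # eigenvalue assigned to e1 = E
Ap = (A - p1 * np.eye(3)) @ E           # right-hand side of (2.120)
CE = C @ E
CEs = np.linalg.pinv(CE)                # pseudo-inverse (CE)*
K1 = np.zeros((3, 2))                   # free design matrix; 0 suffices here
L = Ap @ CEs + K1 @ (np.eye(2) - CE @ CEs)   # gain (2.121)
Ac = A - L @ C

print(np.allclose(Ac @ E, p1 * E))      # True: E is a right eigenvector of Ac
print(np.round(np.linalg.eigvals(Ac), 3))    # {0.5, 0.4, 0.1}: stable observer

W = np.array([[0.0, 1.0]])              # satisfies W C E = 0
H = W @ C
z = 2.0                                 # any test point of Grd(z)
Grd = H @ np.linalg.inv(z * np.eye(3) - Ac) @ E
print(np.allclose(Grd, 0))              # True: residual decoupled from d(k)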

Unknown Input Observer (UIO) method


An alternative procedure for designing robust fault detection schemes is the Unknown Input Observer (UIO). In the sequel, following Wünnenberg and Frank (1987), a brief description of how to obtain a robust observer is given.
Assume a simple, discrete-time system model,

x(k+1) = Ao x(k) + Bo u(k) + Ed(k)   (2.122)
y(k) = Cx(k)   (2.123)

where the unknown input distribution matrix E is of the form

E = [ΔA ⋮ ΔB ⋮ G]   (2.124)

and, as previously, ΔA represents the difference between the nominal system matrix Ao and the actual system matrix A, and similarly for ΔB; G represents a matrix that distributes external disturbances.
The problem is how to design an observer such that the estimation error is decoupled from any unknown input signal d(k). Starting from the system description (2.122) and (2.123), a regular transformation of the state vector is performed:

x(k) = C^R y(k) + M x*(k)   (2.125)

where,

C C^R = I,  CM = 0   (2.126)

This means that the state vector x(k) is separated into the measurable part y(k) and the unmeasurable part x*(k), which has to be estimated by the observer. From Eqs. (2.122), (2.123) and (2.125) it is obtained,

M x*(k+1) − Ao M x*(k) = Bo u(k) + Ed(k) − C^R y(k+1) + Ao C^R y(k)   (2.127)

Multiplying (2.127) from the left with a regular matrix whose upper block N is chosen such that,

NE = 0   (2.128)

yields,
N M x*(k+1) − N Ao M x*(k) = N Bo u(k) − N C^R y(k+1) + N Ao C^R y(k)   (2.129)

On the left hand side of (2.129) there is an expression with the unknown x*(k). All elements on the right hand side are known or measurable. By substituting,

u*(k) = N Bo u(k) − N C^R y(k+1) + N Ao C^R y(k)   (2.130)

(2.129) reduces to,

N M x*(k+1) − N Ao M x*(k) = u*(k)   (2.131)

which is a system of difference equations that has to be solved. Using the shift operator z, (2.131) can be rewritten as,

(zNM − N Ao M) x*(z) = u*(z)   (2.132)

H(z) x*(z) = u*(z)   (2.133)

Now, by proper choice of N and M, the so-called matrix pencil H(z) can be transformed into the block diagonal Kronecker canonical form (Gantmacher, 1974):

H(z) = diag{0(μ0,ε0); Lε1(z), ..., Lεs(z); zIβ − J^u; zIρ − J^s; zJ0 − Iτ; L^T μ1(z), ..., L^T μp(z)}   (2.134)
The εi; i = 0, ..., s, are the column indices and the μi; i = 0, ..., p, are the row indices. The expression 0(μ0,ε0) corresponds to zero rows or columns. The matrix Lεi is of dimension εi×(εi+1) and has the form,

Lεi(z) = [ z  −1   0  ...   0
           0   z  −1  ...   0
           ⋮
           0   0  ...   z  −1 ]   (2.135)

and the corresponding matrix for the row indices is,

L^T μi(z) = [ z    0  ...   0
             −1    z  ...   0
              0   −1  ...   0
              ⋮
              0    0  ...  −1 ]   (2.136)

The matrix J^u is a β-dimensional matrix that has only unstable eigenvalues. J^s represents a ρ-dimensional Jordan matrix with stable eigenvalues only. J0 is a Jordan matrix with all eigenvalues identical to zero. Consider now the part x0*(k) of the state vector that corresponds to the J0 block; this is determined by the difference equations,

−x02*(k+1) + x01*(k) = u01*(k)
−x03*(k+1) + x02*(k) = u02*(k)
⋮
−x0τ*(k+1) + x0,τ−1*(k) = u0,τ−1*(k)
x0τ*(k) = u0τ*(k)   (2.137)

which are directly derived with the aid of Eqs. (2.133) and (2.134). It is easily seen that all components x0i* are completely determined by the known signals u0j*; x0i* is then calculated with a maximum delay of τ time shifts, where τ is the dimension of the J0 matrix. Next, define,

ε = ε0 + Σ(i=1 to s) εi + s   (2.138)

μ = Σ(i=1 to p) μi   (2.139)

and partition the appropriately chosen matrix M from Eq. (2.125) into the matrices

M = [Mε  Mβ  Mρ  M∞  Mμ]   (2.140)

where Mε contains the first ε columns, Mβ the following β columns, Mρ the next ρ, M∞ the next τ and Mμ the last μ columns of M.

Therefore, a linear combination of the state variables,

z(k) = Tx(k)   (2.141)

can be reconstructed,

i) without delay but with free choice of the eigenvalues of the estimation error dynamics matrix if,

(2.142)

ii) without delay and without free choice of all eigenvalues of the estimation error dynamics matrix if,

(2.143)

iii) with a delay of a finite number of samples and with free choice of the dynamics of the estimation error dynamics matrix if,

(2.144)

The above constitute conditions for the existence, structure and eigenvalues of the resulting observers, as well as the basis for all possible matrices R. Numerically stable algorithms for the computation of an upper triangular form that contains all the information of the Kronecker canonical form are available in Konik and Engell (1986).
Next, assume that the observer used is expressed by,

z(k+1) = Rz(k) + Sy(k) + Ju(k)

with the residual,

r(k) = Λ1 z(k) + Λ2 y(k)

This observer must fulfill the following robustness requirements:

i) lim(k→∞) r(k) = 0 for all u and d and for all initial conditions x0 and z0;
ii) a matrix T must exist such that Tx0 = z0 implies Txk = zk, for all k.

These conditions lead to the well-known observer equations:

T Ao − RT = SC
TE = 0
J = T Bo   (2.145)
Λ1 T + Λ2 C = 0
Now, if (2.122), (2.123) are enriched to include failure modes, the system is given by,

x(k+1) = Ao x(k) + Bo u(k) + Ed(k) + Kε(k)
y(k) = Cx(k) + Fd(k) + Gε(k)   (2.146)

For this model, the set of equations that must be fulfilled in order to achieve disturbance and fault decoupling are Eqs. (2.145) and,

SF = 0
SG ≠ 0
TK ≠ 0   (2.147)
Λ2 F = 0
Λ2 G = 0

These equations can be solved with the Kronecker canonical form outlined previously. Conditions for the existence of solutions can be found in Patton et al. (1989b).

Modeling of uncertainty and approximate decoupling.


A prerequisite for solving Eqs. (2.114), (2.145) and (2.147) is the a priori knowledge of the distribution matrix E. Furthermore, certain rank conditions, for example the condition rank(E) < n, must hold for exact decoupling. In the sequel, a procedure for modeling this distribution matrix will be presented, which is due to Patton et al. (1992). Approximate decoupling methods, for the cases where the physical conditions for the existence of a robust observer are not met, will also be discussed.

The disturbance term Ed(k) in Eq. (2.106a) or (2.146) can be used to describe: the interconnection terms in large scale systems, linearization errors, extra non-linear or time-varying terms and model reduction errors. For example, consider the (continuous time) dynamic system:

ẋ(t) = (A + ΔA)x(t) + (B + ΔB)u(t) + Ac xh(t) + G p(t) + Q ε(x,u,t)   (2.148)

where p(t) is a noise or external disturbance vector, ΔA, ΔB represent the parameter variations and plant uncertainty, whilst Ac, xh represent the dynamic errors when using a reduced-order model to approximate the full-order system, or represent the interaction term in a large scale system. In this case, the disturbance distribution matrix E can be directly computed as:

E = [ΔA ⋮ ΔB ⋮ Ac ⋮ G ⋮ Q] ∈ R^(n×q)   (2.149)

Now, consider the situation where the system matrices are functions of a parameter vector a ∈ R^g:

ẋ(t) = A(a)x(t) + B(a)u(t)   (2.150)

If the parameter is perturbed around a nominal condition a = a0, (2.150) can be expanded as:

ẋ(t) = A(a0)x(t) + B(a0)u(t) + Σ(i=1 to g) {(∂A/∂ai) δai x + (∂B/∂ai) δai u}   (2.151)

In this case, the distribution matrix and unknown input vector are:

E = [∂A/∂a1 ⋮ ∂B/∂a1 ⋮ ... ⋮ ∂A/∂ag ⋮ ∂B/∂ag]   (2.152)

d(t) = [δa1 x^T ⋮ δa1 u^T ⋮ ... ⋮ δag x^T ⋮ δag u^T]^T   (2.153)

Now, if rank(E) = n, (2.118) has no solutions and exact decoupling is impossible. Therefore, some kind of approximate decoupling must be applied. The procedure will be to compute a matrix E* that is as close as possible to E but has rank(E*) ≤ n−1, i.e. to find the solution to the following optimization problem:

min ‖E − E*‖F²  subject to rank(E*) ≤ n−1   (2.154)

Here ‖·‖F denotes the Frobenius norm, defined as the root of the sum of squares of the entries of the associated matrix. This optimization problem can be solved by the Singular Value Decomposition (SVD) of E:

E = S [diag{σ1, ..., σn}, 0] T   (2.155)

where S and T are orthogonal matrices and σ1 ≤ σ2 ≤ ... ≤ σn are the singular values of E. As shown in Lou et al. (1986), the matrix E* that minimizes (2.154) is given by:

E* = S [diag{0, ..., 0, σn−n1+1, ..., σn}, 0] T   (2.156)

where n1 is the rank of E*, which is less than n.
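A minimal sketch of this rank-reduction step follows (the matrix E below is a random placeholder; note that numpy returns the singular values in descending rather than ascending order):

import numpy as np

# Optimal rank-(n-1) approximation of E in the Frobenius norm via the SVD.
rng = np.random.default_rng(0)
n, q = 5, 9
E = rng.standard_normal((n, q))          # illustrative full-rank matrix

U, s, Vt = np.linalg.svd(E)              # here s is sorted in DESCENDING order
s_star = s.copy()
s_star[n - 1:] = 0.0                     # zero the smallest singular value
E_star = (U * s_star) @ Vt[:n, :]

print(np.linalg.matrix_rank(E_star))     # n - 1 = 4
# the approximation error equals the discarded singular value
print(np.isclose(np.linalg.norm(E - E_star), s[-1]))   # True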


The operating point of the system varies according to different plant conditions, and different operating points correspond to different unknown input matrices Ei (i = 1, 2, ..., M). It is attractive to be able to design a single FDI scheme for a whole range (or a set) of operating points. The success of the single FDI design depends on its robustness properties. In order to make the disturbance decoupling hold for all operating points, one must make:

H Ei = 0, for i = 1, 2, ..., M   (2.157)

or,

H [E1 E2 ... EM] = HP = 0   (2.158)

If rank(P) ≤ n−1, Eq. (2.158) has solutions and exact decoupling at all operating points is achievable. Otherwise, approximate decoupling must be used. This is equivalent to the solution of Eq. (2.154) and can be solved by defining an optimization problem:

min ‖P − P*‖F²  subject to rank(P*) ≤ n−1   (2.159)

Now, consider the case when the full-order system model is not available. A possible approach would be to obtain the nominal model {Ao, Bo, Co, Do} via identification, with the estimation error {ΔA, ΔB, ΔC, ΔD}. Normally, ΔA and ΔB are unknown but bounded:

A1 ≤ ΔA ≤ A2   (2.160)

B1 ≤ ΔB ≤ B2   (2.161)

where A1, A2, B1 and B2 are known and ΔA ≤ A2 denotes that each element of ΔA is not larger than the corresponding element of A2. This typifies an unstructured but bounded uncertainty. Consider ΔA and ΔB in a finite set of possibilities, say {ΔAi, ΔBi}; i = 1, 2, ..., M, within the intervals A1 ≤ ΔA ≤ A2 and B1 ≤ ΔB ≤ B2. This might involve choosing representative points, reflecting a desired weighting on the likelihood or importance of particular sets of parameters. In this situation, a set of unknown input distribution matrices is obtained:

Ei = [ΔAi ⋮ ΔBi];  i = 1, 2, ..., M   (2.162)

In order to make the disturbance decoupling valid for a wide range of model parameter variations, an optimal matrix E* should be made as near to all Ei; i = 1, 2, ..., M as possible. The optimization problem is thus defined as:

min ‖E* − [E1 ... EM]‖F²  subject to rank(E*) ≤ n−1   (2.163)

In most cases, not enough knowledge about the state space model of the system is available. What is usually at hand is the linearized low order model with matrices (A, B, C, D). In order to account for modeling errors, the system is assumed to be of the form:

ẋ(t) = Ax(t) + Bu(t) + d1(t)   (2.164)

y(t) = Cx(t) + Du(t)   (2.165)

where d1(t) represents the modeling errors. If d1(t) can be obtained, it may be decomposed into Ed(t), with E a structured matrix, so as to apply the disturbance decoupling concept. Firstly, assume that d1(t) is slowly time-varying, so that the system model can be rewritten in augmented form as:

[ẋ(t); ḋ1(t)] = [A  I; 0  0] [x(t); d1(t)] + [B; 0] u(t)   (2.166)

y(t) = [C  0] [x(t); d1(t)] + Du(t)   (2.167)

Using the true system input and output data, an observer based on Eqs. (2.166) and (2.167) can be used to estimate d1(t). Then it is possible to obtain some information about the distribution matrix E. Further details of this method can be found in Patton and Chen (1992). Patton et al. (1992) further explored this idea and proposed the deconvolution method to estimate the vector d1(t).
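A minimal sketch of this augmentation follows (the second-order plant below is an illustrative placeholder; under the ḋ1 ≈ 0 assumption, the observability of the augmented pair decides whether d1(t) can be estimated):

import numpy as np

# Augmented model (2.166)-(2.167) under the slowly-varying d1 assumption.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.eye(2)                            # both states measured here
n = A.shape[0]

Aa = np.block([[A, np.eye(n)],
               [np.zeros((n, n)), np.zeros((n, n))]])
Ba = np.vstack([B, np.zeros((n, 1))])
Ca = np.hstack([C, np.zeros((n, n))])

# observability of (Ca, Aa) decides whether x and d1 are jointly estimable
O = np.vstack([Ca @ np.linalg.matrix_power(Aa, i) for i in range(2 * n)])
print(np.linalg.matrix_rank(O) == 2 * n)   # True: d1(t) can be observed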
As far as the UIO method is concerned, appropriate optimal approximate solutions have been proposed by Wünnenberg and Frank (1987). In this approach, the "best" residual is found by solving the following minimisation problem:

ρ = min over w of  ‖w^T V0 H2‖ / ‖w^T V0 H3‖   (2.168)

where,

H2 = [ F           0           0           ...  0
       CE          F           0           ...  0
       CAE         CE          F           ...  0
       ⋮
       CA^(s−1)E   CA^(s−2)E   CA^(s−3)E   ...  F ]

H3 = [ G           0           0           ...  0
       CK          G           0           ...  0
       CAK         CK          G           ...  0
       ⋮
       CA^(s−1)K   CA^(s−2)K   CA^(s−3)K   ...  G ]

and V0 is used to ensure that,

V0 [ C
     CA
     ⋮
     CA^s ] = 0

is satisfied. Differentiating (2.168) leads to the relation,

w^T (V0 H2 H2^T V0^T − ρ V0 H3 H3^T V0^T) = 0

This is a generalised eigenvalue-eigenvector problem. The minimal eigenvalue is the optimal performance index, while the corresponding eigenvector yields the optimal residual generator v. With this, the optimal residual sequence is calculated by,

r(k) = v^T { [y^T(k−s)  y^T(k−s+1)  ...  y^T(k)]^T − H1 [u^T(k−s)  u^T(k−s+1)  ...  u^T(k)]^T }

where,

H1 = [ 0      0     0   ...  0
       CB     0     0   ...  0
       CAB    CB    0   ...  0
       ⋮ ]

These conditions have been obtained in the time domain. Ding and Frank (1989) have produced corresponding results in the frequency domain.
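The generalised eigenproblem itself is straightforward to solve numerically. In the sketch below, M2 and M3 stand for V0H2 and V0H3 and are filled with random placeholders; scipy's symmetric-definite solver returns the eigenvalues in ascending order, so the first pair gives the optimal index and residual generator:

import numpy as np
from scipy.linalg import eigh

# Generalised eigenproblem (V0 H2 H2^T V0^T) w = rho (V0 H3 H3^T V0^T) w.
rng = np.random.default_rng(1)
M2 = rng.standard_normal((4, 6))         # placeholder for V0*H2
M3 = rng.standard_normal((4, 6))         # placeholder for V0*H3

evals, evecs = eigh(M2 @ M2.T, M3 @ M3.T)
rho, w = evals[0], evecs[:, 0]           # smallest eigenvalue and eigenvector

# the minimal eigenvalue equals the squared optimal norm ratio
ratio = np.linalg.norm(w @ M2) / np.linalg.norm(w @ M3)
print(np.isclose(ratio**2, rho))         # True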

2.7.2 Parity relations for robust residual generation

In this section an approach for obtaining robust parity relations in the face of noise and parameter uncertainty will be given. The exposition follows Chow and Willsky (1984). The starting point is a model that has the same form as (2.89) but includes noise disturbances and parameter uncertainty:

x(k+1) = A(γ)x(k) + Σ(j=1 to r) bj(γ) uj(k) + w(k)   (2.169)

yj(k) = cj(γ)x(k) + vj(k);  j = 1, ..., m   (2.170)

where γ is the vector of uncertain parameters taking values in a specified subset Γ of R^M. This form allows the modeling of elements in the system matrices as uncertain quantities that may be functions of a common quantity. The vectors w and v are independent, zero-mean, white Gaussian noise vectors with constant covariance matrices Q (≥0) and R (>0), respectively.

A parity function is essentially a weighted combination of a (time) window of sensor outputs and actuator inputs. The structure of a parity function defines which input and output elements are included in this window, and the coefficients are the (nonzero) weights corresponding to these elements. A scalar parity function p(k) can be written as,

p(k) = αY(k) + βU(k)   (2.171)

where Y(k) and U(k) denote the vectors containing the output and input elements in the parity function, respectively. Together, Y(k) and U(k) specify the parity structure, and the row vectors α and β contain the parity coefficients. Consider, for example, the first parity function of (2.101). Its corresponding Y(k), U(k), α and β are:

Y(k) = [y1(k−2), y1(k−1), y1(k)]^T
U(k) = u(k−2)
α = [a11a22, −(a11+a22), 1]
β = −a12
Under model (2.169)-(2.170), Y(k) has the form,

Y(k) = C(γ)x(k−p) + Φ(γ)w̄(k) + B̄(γ)U(k) + v̄(k)   (2.172)

where p is the order of the parity function, and

w̄(k) = [w^T(k−p) ... w^T(k−1)]^T

The components of v̄(k) and U(k), and the rows of C(γ), Φ(γ) and B̄(γ), are determined from (2.170) and the structure of Y(k). If, specifically, the ith component of Y(k) is ys(k−σ), then the ith component of v̄(k) is,

v̄i(k) = vs(k−σ)

The vectors w̄ and v̄ are independent zero-mean Gaussian random sequences with constant covariances Q̄ and R̄, respectively. The matrix Q̄ is block diagonal with Q on the diagonal; R̄ij = Rst δστ, where the ith element of Y(k) is ys(k−σ), while the jth element is yt(k−τ). The ith row of C(γ), i.e., C(i,γ), is,

C(i,γ) = cs A^(p−σ)

The ith row Φ(i,γ) of Φ(γ) (which has pn columns) is,

Φ(i,γ) = [cs A^(p−σ−1), cs A^(p−σ−2), ..., cs, 0, ..., 0]

Note that x(k−p) is a random vector that is uncorrelated with w̄ and v̄, and

E{x(k−p)} = x0(k−p)
cov{x(k−p)} = Σ(γ)

where Σ(γ) is the (steady-state) covariance of x(k−p), and it is dependent on γ through A(γ) and B(γ).

The matrix B̄ and the vector U(k) are determined as follows: first, collect into a matrix B̃ all the rows B̃(i,γ) corresponding to the C(i,γ) (Wald, 1947). Then, collect all the nonzero columns of B̃ into B̄ and the corresponding components of u in the window into U(k).
It is clear from this exposition that when parameter uncertainties are included, it is not possible in general to find any parity functions in this narrow sense. In particular, with reference to the function p(k) defined by (2.171) and (2.172), this condition would require that αC(γ) = 0 for all γ∈Γ. Consequently, the notion of a useful parity relation must be modified. Intuitively, any given parity structure will be useful for failure detection if a set of parity coefficients can be found that will make the resulting function p(k) in (2.171) close to zero for all values of γ∈Γ when no failure has occurred. When considering the use of such a function for the detection of a particular failure, one would also want to guarantee that p(k) deviates significantly from zero for all γ∈Γ when this failure has occurred. Such a parity structure-coefficient combination approximates the true parity function.

The problem then becomes one of finding coefficients α and β for the parity function,

p(k) = α[C(γ)x(k−p) + Φ(γ)w̄(k) + B̄(γ)U(k) + v̄(k)] + βU(k)

Note the dependence of p(k) on α, β, γ, x(k−p), and U(k). As p(k) is a random variable, a convenient measure of the magnitude (squared) of p(k) is its variance, E{p²(k)}, where the expectation is taken with respect to the joint probability density of x(k−p), w̄(k) and v̄(k), with the mean x0(k−p) and the value of U(k) assumed known. This can be thought of as specifying a particular operating condition for the system. Note also that the statistics of x(k−p) depend on γ. Define,

ε(α,β) = max over γ∈Γ of E{p²(k)}   (2.173)

The quantity ε(α,β) represents the worst case effect of noise and model uncertainty on the parity function p(k) and is called the parity error for p(k) with the coefficients α and β. A conservative choice of the parity coefficients is obtained by solving,

min over α,β of ε(α,β)
Since this has a trivial solution (α = 0, β = 0), this optimization problem has to be modified in order to give a meaningful solution. Recall that a parity equation primarily relates the sensor outputs, i.e., a parity equation always includes output terms but not necessarily input terms. Therefore, α must be nonzero. Without loss of generality, α can be restricted to have unit magnitude. The actuator input terms in a parity relation may be regarded as serving to make the parity function zero, so that β is nominally free. In fact, β has only a single degree of freedom. Any β can be written as β = λU^T(k) + z^T, where z is a (column) vector orthogonal to U(k). The component z^T in β will not produce any effect on p(k). This implies that for each U(k), only β of the form β = λU^T(k) has to be considered, leading to the following problem:

min over α,λ of max over γ∈Γ of E{p²(k)}   (2.174)
subject to αα^T = 1

where,

E{p²(k)} = [α  λ] S [α  λ]^T

and S is the symmetric positive definite matrix,

S = [ S11  S12
      S21  S22 ]

S11 = C(γ)[x0(k−p) x0^T(k−p) + Σ(γ)]C^T(γ) + Φ(γ)Q̄Φ^T(γ) + R̄ + B̄(γ)U(k)U^T(k)B̄^T(γ) + C(γ)x0(k−p)U^T(k)B̄^T(γ) + B̄(γ)U(k)x0^T(k−p)C^T(γ)

S12 = S21^T = [B̄(γ)U(k) + C(γ)x0(k−p)] U^T(k)U(k)

S22 = [U^T(k)U(k)]²

Let α* and λ* denote the values of α and λ that solve (2.174), with β* = λ*U^T(k). Then ε* is the parity error corresponding to the parity function p*(k) = α*Y(k) + β*U(k). The quantity ε* measures the usefulness of p*(k) as a parity function around the operating point specified by x0(k−p) and U(k).

Although the objective function of (2.174) is quadratic in α and λ, (2.174) is generally very difficult to solve, because S may depend on γ arbitrarily.
With the coefficients and the associated parity errors determined for the candidate parity structures, the parity functions for residual generation using the parity function method can be chosen. As the squared magnitude of the coefficients [α, β] scales the parity error, the parity errors of different parity functions can be compared if they are normalized. The normalized parity error ε̄*, the normalized parity coefficients, and the normalized parity function p̄*(k) are defined as follows:

ε̄* = ε*/σ²
ᾱ* = α*/σ
β̄* = β*/σ
p̄*(k) = ᾱ*Y(k) + β̄*U(k)

where,

σ² = [α*, β*][α*, β*]^T = 1 + β*β*^T

The parity functions with the smallest normalized parity errors are preferred, as they are closer to being true parity functions under noise and model uncertainty, i.e., they are least sensitive to these adverse effects.

An additional consideration required for choosing parity functions for residual generation is that the chosen parity functions should provide the largest failure signatures in the residuals relative to the inherent parity errors resulting from noise and parameter uncertainty. A useful index for comparing parity functions for this purpose is the signature-to-parity error ratio π, which is the ratio between the magnitudes of the failure signature and the parity error; using g to denote the effect of a failure on the parity function, π is then the ratio of the magnitude of g to the normalized parity error. For the detection and identification of a particular failure, the parity function that produces the largest π should be used for residual generation.
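When Γ is approximated by a finite grid, the worst-case parity error (2.173) can be estimated by brute force. The sketch below does this by simulation for the residual (2.102) of Example 2.3, with a11 uncertain (all numbers, including the grid, noise level and input, are illustrative assumptions):

import numpy as np

# Monte-Carlo estimate of eps = max over Gamma of E{r^2} for residual (2.102).
rng = np.random.default_rng(2)
a11o, a12, a22, sig = 0.9, 0.3, 0.7, 0.01   # nominal parameters, noise std
Gamma = np.linspace(0.85, 0.95, 11)          # grid over the uncertain a11

def mean_sq_residual(a11, n=5000):
    A = np.array([[a11, a12], [0.0, a22]])
    x = np.zeros(2); yp = None; acc = 0.0
    for k in range(n):
        y = x + sig * rng.standard_normal(2)   # noisy measurements of x1, x2
        if yp is not None:
            r = y[0] - a11o * yp[0] - a12 * yp[1]
            acc += r * r
        yp = y
        x = A @ x + np.array([0.0, 1.0])       # constant input u(k) = 1
    return acc / (n - 1)

errs = [mean_sq_residual(g) for g in Gamma]
print(max(errs))    # worst-case parity error, attained at the edges of Gamma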

2.8 Applications

2.8.1 Fault detection in a jet engine system

The correct operation of a gas turbine is very critical for an aircraft and, if faults occur, the consequences can be extremely serious. There is therefore a great need for simple and yet highly reliable methods for detecting and isolating faults in the jet engine. Patton et al. (1992) presented an example for the detection of jet engine sensor faults using the procedure described in Section 2.7.1. The jet engine model used, illustrated in fig. 2.9, has the measurement variables NL, NH, T7, P6, T29 (the N variables denote compressor shaft speeds, the P variables denote pressures, whilst T represents temperature). The control inputs are the main engine fuel flow rate and the exhaust nozzle area.

Figure 2.9 Jet engine

A thermodynamic simulation model of a jet engine is utilised as a test rig to assess the robustness of the FDI scheme. This model has 17 state variables; these include pressures, air and gas mass flow rates, shaft speeds, absolute temperatures and static pressure. The linearized 17th order model is used here to simulate the jet engine system. The nominal operating point is set at 70% of the demanded high spool speed (NH). For practical reasons and convenience of design, a 5th order model is used to approximate the 17th order model. The model reduction and other errors are represented by the disturbance term Ed(t) of Eq. (2.106a). The 5th order model matrices are:

A = [  −78    294   −22    21   −29
         7    −28     2    −2     3
     −1325   5326  −526   221  −477
      1081  −4445   377  −463   403
      2152  −8639   781  −575   782 ]

B = [ −0.0072   0.0030
       0.0035   0.0003
       1.2185  −0.0329
       1.3225   0.0201
      −0.0823   0.0244 ],  C = I5×5,  D = 05×2
As shown in Section 2.7.1, a necessary step in the robust residual generation design procedure is to find a matrix H satisfying Eq. (2.118) (i.e. HE = 0). The matrix E models the structured uncertainty arising from the application of the 5th order observer to the 17th order plant, and is given by:

E = [E1 ⋮ E2 ⋮ E3 ⋮ E4]  (×10³)

where numerical values for the Ei's are defined in Patton et al. (1992). From these values, rank(E) = 5 = n, and hence Eq. (2.118) has no solution. The singular values of E are {1.5, 5, 60, 198, 11268}, and the matrices S and T are omitted for brevity. The optimal low rank approximation of the distribution matrix E is,

E* = S [diag(0, 5, 60, 198, 11268)  05×14] T

Based on this matrix, an observer-based robust residual generator can be designed. The observer design is simplified by choosing all eigenvalues at −100; in this case the gain matrix is K = −(100 I5×5 + A), as C is an identity matrix.
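A quick numerical check of this gain choice follows (the sign convention assumed here is an observer correction term K(ŷ − y), so that the error matrix is A + KC; this convention is an assumption made to match the stated gain):

import numpy as np

# With C = I and error dynamics governed by A + KC, the quoted gain places
# all five observer eigenvalues at -100.
A = np.array([[  -78.,   294.,  -22.,   21.,  -29.],
              [    7.,   -28.,    2.,   -2.,    3.],
              [-1325.,  5326., -526.,  221., -477.],
              [ 1081., -4445.,  377., -463.,  403.],
              [ 2152., -8639.,  781., -575.,  782.]])
K = -(100 * np.eye(5) + A)
print(np.linalg.eigvals(A + K))   # all equal to -100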
In fig. 2.10, the output estimation error norm is shown. This is very large, and cannot be used to detect the fault reliably. This represents the non-robust design situation. Fig. 2.11 shows the fault-free residual. Compared with the output estimation error, the residual is very small, i.e., disturbance decoupling is achieved. This robust design can be used to detect incipient faults. In order to evaluate the power of the robust FDI design, a small fault is added to the exhaust gas temperature (T7); this simulates the effect of an incipient fault, the effect of which is too small to be noticed in the measurements. Fig. 2.12 shows the faulty output of the temperature measurement (T7) and the corresponding residual. The fault is very small compared with the output, and consequently is not detectable in the measurement. It can be seen that the residual shows a very significant increase when a fault has occurred in the system. A threshold can easily be placed on the residual signal to declare the occurrence of faults. A fault signal is now added to the pressure measurement signal for P6. The result is shown in fig. 2.13, which also demonstrates the efficiency of the robust residual in the role of robust FDI.

Figure 2.10 Norm of the output estimation error
Figure 2.11 Absolute value of the fault-free residual

Figure 2.12 Faulty output and residual in the case of a fault in T7
Figure 2.13 Faulty output of the pressure measurement P6 and corresponding residual

2.8.2 Applications in Transportation Engineering

Among other things, the term factory of the future encompasses driverless transportation systems within the framework of computer-aided logistics. The transportation vehicles are mostly inductively guided along a defined path. The principle of electronic track-guidance is applied in airports, in container terminals at railway stations or harbours, within the service tunnel transportation system of the euro-tunnel and within modern public short-haul traffic systems. An example of this class of track-bounded transportation systems with automatic track-guidance is the standard city bus O-305 of Mercedes-Benz.

The supervision of measuring instruments in such vehicles is of utmost importance, since they are self-driven. Thus, automatic sensor failure detection techniques have a potential field of application in this area. In a recent paper, van Schrick (1993) presents such an example. The main points of this work follow.

The bus follows a nominal track marked by the electro-magnetic field of a cable running just beneath the road surface. The alternating current flowing through the cable generates an electromagnetic field that induces a voltage in a measuring instrument located concentrically in front of the bus. The induced voltage is a measure of the deviation of track, d(t), used as the only controller input. The digital controller calculates a steering signal that, with the aid of an active steering system, acts directly on the front wheels to minimize the distance between the bus and the nominal track.

Additionally, a second measuring instrument was introduced to enhance the riding comfort of the bus by disturbance rejection control (van Schrick, 1991). This instrument gives information on the directly measured steering angle, β(t), of the front wheels. Both measuring instruments, the one for the deviation of track and the one for the steering angle, have to be supervised. This is due to the very high safety requirements on such transportation systems.
A linearised model of fifth order in sensor coordinates for the lateral motion of the city bus is given as follows:

ẋ(t) = A(p)x(t) + b u(t) + E(p)d(t)
y(t) = c^T x(t)

where the states x1(t) to x5(t) are the displacement d(t) between sensor and nominal track, its velocity ḋ(t), the yaw angle rate, the side slip angle a(t) and the steering angle β(t). The control input u(t) is the steering angle rate β̇(t), and the controlled variable y(t) = x1(t) is the measured deviation of track d(t). Additionally, the disturbance vector d(t), consisting of bending κ(t), side wind moment M(t) and side wind force F(t), acts on the system. For the investigations described in the following, only the bending κ is regarded. If necessary, the effects of the disturbances M(t) and F(t) can be treated in the same manner. The parameter vector p = [m v] contains the relative mass m, corresponding to the friction coefficient μ and ranging from 9950 kg to 27000 kg, and the velocity v, ranging from 0.6 m/s to 14 m/s. The input vector b and the output vector c^T are b = e5 and c^T = e1^T. The system matrix A(p) and the disturbance input vector gκ(p) of the disturbance input matrix E(p) are:

A(p) = [ 0  1   0    0    0
         0  0  a23  a24  a25
         0  0  a33  a34  a35
         0  0  a43  a44  a45
         0  0   0    0    0 ],   gκ(p) = [0  g21  0  0  0]^T

where the elements aij and g21 depend on the parameter vector p (cf. Darenberg (1987) for their evaluation).
The 5th order dynamic controller used for track control can be written as,

ẋR(t) = AR xR(t) + bR y(t)
uR(t) = cR^T xR(t) + dR y(t)

where AR, bR, cR^T and dR consist of constant elements calculated by simulation, optimisation and riding tests along a test-track. This design result guarantees a steering angle |β| ≤ 45° and derivative |β̇| ≤ 23°/s, with an accuracy of track of 2 cm and a maximum deviation in curves of 15 cm.
In the work of van Schrick (1991), an additional term was added to the above control law, resulting in,

u(t) = uR(t) + uV(t) = uR(t) − kV^T(p) v(t)

where kV^T(p) is the disturbance rejection gain vector. Investigations have shown that the gain vector can approximately be chosen as kV^T = constant. The vector v(t) reflects the reconstructed unknown input κ(t) and its derivative κ̇(t), supplied by a robust disturbance observer. For disturbance rejection control, the conditions for control and compensation are fulfilled, but the condition for reconstructing the unknown input is fulfilled only if a second measurement is introduced. This is the reason for using an additional sensor measuring the steering angle β.

The development of a sensor supervision scheme for the track-guided city bus with disturbance rejection control led to an extended system structure. Fig. 2.14 illustrates the overall structure comprising the bus, the dynamic controller, the disturbance observer and the two IFD parts.
The design task for a sensor supervision of the track-guided city bus is to determine a scalar decision function f(t) that is robust against variations of the parameter vector p and the unknown disturbance κ(t), but sensitive to instrument faults.

For this task, it is not necessary to separately investigate the residuals ed(t) and eβ(t), defined as the output estimation errors of the measurements d(t) and β(t). Therefore, the decision function,

f(t) = |w^T e(t)|

is the absolute value of a weighted linear combination of the residuals, where,

w^T = [wd  wβ]

is a weighting vector and,

e(t) = [ed(t)  eβ(t)]^T

the residual vector. In spite of parameter variations Δp and the unknown input κ(t), the fact that the decision function should be minimal in the fault-free case and maximal in the case of sensor faults simplifies the observer design. It is not required to minimise the estimation errors but to minimise the decision function. For this reason, the optimization of the weighting vector w^T is included in the design procedure.

[Figure: control system (city bus O-305) with the 5th order dynamic track controller, the disturbance compensator (7th order observer), the residual generator (5th order observer) and the residual evaluator (threshold logic) for detection]
Figure 2.14 Overall structure of the IFD-scheme

The designed estimator is a fifth order observer,

dx̂(t)/dt = A_o x̂(t) + b u(t) + L(y(t) − C x̂(t))

that is based on the measurements d(t) and β(t). The matrix A_o denotes that for the robust observer design a nominal bus model has to be taken into account, i.e. a design operating point p_e has to be chosen. The constant observer gain L and weighting w^T have then to be determined for p ∈ P shown in fig. 2.15, where the p_i describe the admissible corner operating points of the track-guided city bus.
Unfortunately, a perfect decoupling of the decision function f(t) from the unknown inputs (parameter variations included) is not possible because some rank conditions are not fulfilled. Consequently, the proposed design methods for unknown input observers as well as the parity space approach or the eigenstructure approach cannot be applied, and an approximative solution has to be found. The adopted approach uses the multi-objective parameter optimization procedure proposed by Kasper et al. (1990).

For a design operating point,

p_e = [13750 kg  4.5 m/s]

the design results in a decision function that is robust against variations of p and the effect of the unknown bending K(t). The optimized values of w^T are w_d = −0.1329 and w_β = 0.7661, which show that the β-residual is weighted higher than the d-residual.
For one channel, the design results for the residual generation and the resulting residual evaluation with a fixed threshold are illustrated by means of simulation. The proposed IFD-procedure is operated as a closed-loop system that is under the influence of the extended control law. In fig. 2.15 (top), the behaviour of f(t) during the fault-free case without disturbance rejection control is shown for the design operating point p_e as well as for p_1 = [9950 kg 0.6 m/s] and p_2 = [27000 kg 14 m/s]. The behaviour of f(t) with disturbance rejection control is shown in fig. 2.15 (bottom). The operating point p_2 results in a decision function with the largest magnitude and the strongest vibration for all operating points p. Depending on the operating point, f(t) reflects the dynamic behaviour of the controlled bus. Obviously, the disturbance rejection control leads to vibrations with higher frequencies in general and higher magnitudes at the beginning. Moreover, for operating points with high mass and velocity far away from the design operating point p_e the decision function is very sensitive to this departure.
To illustrate the different reactions of f(t) to faults in the d-sensor, fig. 2.16 shows the course of the decision function without (top) and with (bottom) the influence of the disturbance compensation. For the design operating point p_e and at the instant of fault t_f = 5 s, a 5% decrease of the d-sensor magnitude appears. Fig. 2.16 (top) shows that the controller nearly conceals the fault (curve labeled f_wR), while a controller not influenced by the fault gives an unmistakable course of the function (curve labeled f_oR). The bottom part reflects the same situation but now with disturbance compensation, which results in a higher frequency of f(t) and a complete concealing of the fault.
For all parameter and riding situations as well as different types of faults, further investigations have shown that this procedure gives a decision function sensitive enough to allow for the detection of incipient faults in the sensors of the track-guided city bus.
As a final comment, note that a fixed threshold was used. The threshold value depends on the input signal, operating point, riding situation and effects of unknown inputs, as well as the quality of the sensors. A worst-case estimation of the threshold resulted in T = 5×10⁻⁴.

[Figures: time histories of the decision function f(t) without DRC (top) and with DRC (bottom) for the operating points p_e, p_1 and p_2.]

Figure 2.15 f(t), no-fault case. Figure 2.16 f(t), 5% d-sensor fault.

2.8.3 Applications in aerospace engineering

Hydromechanical implementations of turbine engine control systems have matured into highly reliable units. At the same time, engine complexity has increased in order to meet ever-increasing engine performance requirements and, consequently, the engine control has become increasingly complex. Because of this complexity trend and the revolution in digital electronics, the control has evolved from a hydromechanical to a full authority digital electronic control (FADEC) implementation. These FADEC type controls must demonstrate the same or improved levels of reliability as their hydromechanical predecessors.
DeLaat and Merrill (1990) describe such an implementation, termed the Advanced Detection, Isolation and Accommodation (ADIA) concept, whose objective is to improve the overall reliability of digital electronic control systems for turbine engines.
The ADIA algorithm detects, isolates, and accommodates sensor failures in an F100 turbofan engine control system. The algorithm incorporates advanced filtering and detection logic and is general enough to be applied to different engines or other types of control systems. The algorithm detects two classes of sensor failures, hard and soft. Hard failures are defined as out-of-range or large bias errors that occur instantaneously in the sensed values. Soft failures are defined as small bias errors or drift errors that increase relatively slowly with time. The ADIA algorithm (fig. 2.17) consists of four elements: (1) hard sensor failure detection and isolation logic; (2) soft sensor failure detection and isolation logic; (3) an accommodation filter; and (4) the interface switch matrix.

[Figure: block diagram of the F100 engine system (actuators, engine, sensors for y_m, e_m, u_m) connected to the MVC algorithm (transition control, proportional control, integral control, engine protection) and the ADIA algorithm (soft detection/isolation logic, accommodation filter, interface switch matrix, hard detection/isolation logic); solid lines denote the signal path, dashed lines the reconfiguration information.]

Figure 2.17 The ADIA block diagram.

In the normal or unfailed mode of operation, the accommodation filter uses the full set of engine measurements to generate a set of optimal estimates, ŷ(t), of the measurements. These estimates are used by the control law. When a sensor failure occurs, the detection logic determines that a failure has occurred. The isolation logic then determines which sensor is faulty. This structural information is passed to the accommodation filter. The accommodation filter then removes the faulty measurement from further consideration. The accommodation filter, however, continues to generate the full set of optimal estimates for the control. Thus the control mode does not have to be restructured for any sensor failure. The ADIA algorithm inputs, as shown in fig. 2.17, are the sensed engine output variables y_m(t), the sensed engine environmental variables e_m(t), and the sensed engine input variables u_m(t). The outputs of the algorithm, the estimates ŷ(t) of the measured engine outputs y_m(t), are used as input to the proportional part of the control.
During normal mode operation, engine measurements are used in the integral control to ensure accurate steady-state operation. When a sensor failure is accommodated, the
measurement in the integral control is replaced with the corresponding accommodation filter estimate by reconfiguring the interface switch matrix. Estimates are always used in the proportional control to avoid propagating a transient when switching from measurement to estimate.
The accommodation filter incorporates an engine model along with a Kalman gain update to generate estimates of the engine states x and the engine outputs y as follows:

dx̂(t)/dt = A(x̂(t) − x_b) + G(u_m(t) − u_b) + K r(t)
ŷ(t) = C(x̂(t) − x_b) + D(u_m(t) − u_b) + y_b
r(t) = y_m(t) − ŷ(t)

where the subscript b represents the base point (steady-state engine operating point), x is the 4×1 model state vector, u_m the 4×1 sensed control vector and y_m the 5×1 sensed output vector. The matrix K is the Kalman gain matrix and r is the residual vector. The A, G, C, and D matrices are appropriately dimensioned system matrices.
The basepoints are scheduled as nonlinear functions of the engine operating condition indicated by e_m. The individual system matrix elements, along with those of K, are corrected by the engine inlet conditions e_m and scheduled as nonlinear functions of y_b (Beattie et al., 1981). Reconfiguration of the accommodation filter after the detection and isolation of a sensor failure is accomplished by forcing the appropriate residual element to zero. For example, if a compressor speed sensor failure (N2) has been isolated, the effect of reconfiguration is to force r_2 = 0. This is equivalent to setting sensed N2 equal to the estimate of N2 generated by the filter. The residuals generated by the accommodation filter are used in the hard failure detection logic.
The hard sensor failure detection and isolation logic is straightforward. To accomplish
hard failure detection and isolation the absolute value of each component of the residual
vector is compared to its own threshold. If the residual absolute value is greater than the
threshold, then a failure is detected and isolated for the sensor corresponding to the re-
sidual element. Threshold sizes are initially determined from the standard deviation of the
noise on the sensors. These standard deviation magnitudes are then increased to account
for modeling errors in the accommodation filter. The hard detection threshold values are
twice the magnitude of these adjusted standard deviations.
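A minimal Python sketch of this test (the variable names are ours; the adjusted standard deviations are assumed available) is:

import numpy as np

def hard_failure_isolation(r, sigma_adj):
    """Return the indices of sensors whose residual magnitude exceeds its
    threshold, taken as twice the adjusted noise standard deviation."""
    thresholds = 2.0 * np.asarray(sigma_adj)
    return np.flatnonzero(np.abs(np.asarray(r)) > thresholds)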
The soft failure detection and isolation logic consists of a multiple-hypothesis-based test (fig. 2.18). Each hypothesis is implemented using a Kalman filter. A total of six hypothesis filters are used, one for normal mode operation (H_0) and five for the failure modes (one for each engine output sensor, H_1 to H_5). The structure of each hypothesis filter is identical to the accommodation filter. However, each hypothesis filter uses a different set of measurements. For example, the first hypothesis filter (H_1) uses all of the sensed engine outputs except the first, N1. Thus, each hypothesis filter generates a unique residual vector r_i. From this residual each hypothesis filter generates a statistic or likelihood based upon a weighted sum of squared residuals (WSSR). Assuming Gaussian sensor noise, each sample of r_i has a certain likelihood or probability L_i,

L_i = p_i(r_i) = k exp(−WSSR_i)
WSSR_i = r_i^T Σ⁻¹ r_i
Σ = diag(σ_i²)

where k is a constant and the σ_i are the adjusted standard deviations. These standard deviation values scale the residuals to dimensionless quantities that can be summed to form a WSSR. The WSSR statistic is smoothed to remove gross noise effects by a first order lag with a time constant of 0.1 s. The log of the ratio of each hypothesis likelihood to the normal mode likelihood is calculated. If the maximum log-likelihood ratio exceeds the soft failure detection and isolation threshold, then a failure is detected and isolated and accommodation occurs. If a sensor failure has occurred in N1, for example, all of the hypothesis filters except H_1 will be corrupted by the faulty information. Thus each of the corresponding likelihood ratios will be small except for LR_1. Thus, LR_1 will be the maximum and it will be compared to the threshold to detect the failure.
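The following Python fragment sketches the WSSR statistic and the maximum log-likelihood-ratio decision (the first order lag smoothing is assumed already applied to the WSSR values; all names are ours):

import numpy as np

def wssr(r, sigma):
    """Weighted sum of squared residuals, WSSR = r^T diag(sigma^2)^-1 r."""
    r = np.asarray(r, dtype=float)
    return float(r @ (r / np.asarray(sigma, dtype=float) ** 2))

def soft_failure_isolation(wssr_smoothed, threshold):
    """wssr_smoothed = [WSSR_0, WSSR_1, ..., WSSR_5]; since the constant k
    cancels, log(L_i/L_0) = WSSR_0 - WSSR_i. Returns the isolated sensor
    index (1..5) or None if no failure is declared."""
    lr = wssr_smoothed[0] - np.asarray(wssr_smoothed[1:], dtype=float)
    i = int(np.argmax(lr))
    return i + 1 if lr[i] > threshold else None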

[Figure: bank of log-likelihood ratio computations LR_1 to LR_5 feeding a maximum selector; the maximum LR_i is compared with the threshold, yielding "no failure isolated" or "failure isolated".]

Figure 2.18 Soft failure detection and isolation logic.

Initially, the soft failure detection/isolation threshold was determined by standard statistical analysis of the residuals to set the confidence level of false alarms and missed detections. The threshold was then modified to account for modeling error. It was soon apparent from initial evaluation studies that transient modeling error was dominant in determining the fixed threshold level. It was also clear that this threshold was too large for desirable steady-state operation. Thus, an adaptive threshold was incorporated, to make the algorithm more robust to transient modeling error while maintaining steady-state performance. The adaptive threshold d_i was heuristically determined and consists of two parts. One part, d_iss, is the steady-state detection/isolation threshold, which accounts for steady-state, or low frequency, modeling error. The second part, d_EXP, accounts for the transient, or high frequency, modeling error. The adaptive threshold is triggered by an internal control system variable m_tran which is indicative of transient operation:

d_i = d_iss (d_EXP + 1)
τ ḋ_EXP + d_EXP = m_tran

The values of d_iss, τ, and M_tran were found by experimentation to minimize false alarms during transients. When the engine experiences a transient, m_tran is set to M_tran = 4.5; otherwise it is 0. The threshold time constant is τ = 2 s. The adaptive threshold expansion logic enabled d_iss to be reduced to 40% of its original value, which results in an 80% reduction in the detection/isolation threshold d_i. The adaptive threshold logic is illustrated in fig. 2.19 for a power lever angle (PLA) pulse transient.
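A minimal Python sketch of the adaptive threshold (the first order lag is discretized by forward Euler; the sample time dt is an assumed value, the remaining parameters are those quoted above):

def make_adaptive_threshold(d_iss, tau=2.0, m_tran_high=4.5, dt=0.02):
    """Return an update function implementing d_i = d_iss*(d_EXP + 1) with
    tau*d_EXP' + d_EXP = m_tran, driven by the transient indicator."""
    d_exp = 0.0
    def update(in_transient):
        nonlocal d_exp
        m_tran = m_tran_high if in_transient else 0.0
        d_exp += dt / tau * (m_tran - d_exp)   # first order lag
        return d_iss * (d_exp + 1.0)
    return update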

[Figure: time histories over 0-25 s of the power lever angle (PLA), the transient indicator (MTRAN) and the resulting adaptive threshold.]

Figure 2.19 Adaptive threshold logic.



For failure accommodation, two separate steps were taken. First, all seven of the filters
(the accommodation filter and the six hypothesis filters) were reconfigured (the appro-
priate residual in each filter is forced to zero) to account for the detected failure mode.
Second, if a soft failure was detected, the states and estimates of all seven filters were
updated to the values of the hypothesis filter which corresponds to the failed sensor.
Stability of each of the filters after reconfiguration was verified during algorithm evalu-
ation.
The algorithm was implemented in real-time to demonstrate its capabilities with a full
scale engine using hardware and software typical of that to be used in next generation
turbofan engine controls. The Multivariable Control (MVC)-ADIA implementation has
several distinct hardware and software features. Three CPUs were used, operating in
parallel. The schedules used to generate the engine model basepoints and table look-up
routines within the ADIA algorithm were coded in assembly language.
The real-time microcomputer implementation of the combined MVC-ADIA algorithm performed extremely well. Sensor failure detection and accommodation were demonstrated at eleven different operating points, which included subsonic and supersonic conditions and medium and high power operation of the engine.

2.8.4 Applications in automotive engineering

Interest in on-line diagnosis of internal combustion engines has recently increased due to
new environmental regulations in the United States and in some European countries (e.g.
the EFTA partners). These regulations will, for example, require the following compo-
nents/faults to be diagnosed:
• individual fuel injectors
• O2 sensor(s)
• mass air flow sensor
• manifold pressure sensor
• throttle position sensor
• NP controller
• inlet air temperature sensor
• misfire detection
• vacuum leak in intake manifold
• loss of power in individual cylinder
In order for a diagnostic strategy to be useful for such applications, it should permit real
time implementation in order to allow for continuous monitoring of system performance
on-board the vehicle. Within this context, many of the currently employed service diag-
nostic strategies, for example those based on expert system methods, are not suitable.

Rizzoni et al. (1993) discuss simulation and experimental results of a study aimed at diagnosing faults associated with an automotive engine exhaust emissions control system, using the parity vector approach.
The engine model used was based on the works of Dobner (1982), Coats and Fruechte (1982), Moskwa and Hedrick (1987), Cho and Hedrick (1989) and Grizzle et al. (1990). The structure of the model is shown in fig. 2.20. The essential elements of the model are: intake manifold, rotating dynamics, volumetric efficiency, oxygen (lambda) sensor model, combustion and exhaust dynamics, and fuelling controller and metering system.

Figure 2.20 Engine model.

Details of the various subsystems can be found in Rizzoni et al. (1993). The whole system depicted in fig. 2.20, after a suitable identification procedure, is put in the form:

x(k+1) = A x(k) + B u(k) + E d(k)
y(k) = C x(k) + D u(k) + Δy(k)

where d(k) is a vector of plant or input faults, or unmeasured disturbance inputs, and Δy(k) is a sensor fault vector. In this case, residuals can simply be defined as follows:

r(k) ≡ y(k) − G(z) u(k) = H(z) d(k) − G(z) Δu(k) + Δy(k)

This equation is referred to as the primary parity equation. The residual vector can then be designed to have some desired properties by multiplying the primary parity equation by a suitable polynomial matrix W(z).
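A minimal Python sketch of the primary parity equation (realizing r(k) = y(k) − G(z)u(k) by running the fault-free state-space model in parallel with the plant; this illustrates the principle and is not the authors' implementation):

import numpy as np

def parity_residuals(A, B, C, D, u, y, x0=None):
    """Primary parity-equation residuals for sequences u[k], y[k]."""
    A, B, C, D = (np.asarray(M, dtype=float) for M in (A, B, C, D))
    u, y = np.asarray(u, dtype=float), np.asarray(y, dtype=float)
    x = np.zeros(A.shape[0]) if x0 is None else np.asarray(x0, dtype=float)
    r = np.empty_like(y)
    for k in range(len(u)):
        r[k] = y[k] - (C @ x + D @ u[k])   # residual r(k)
        x = A @ x + B @ u[k]               # propagate the fault-free model
    return r   # a decoupling filter W(z) may then be applied to r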

The model of fig. 2.20 was used to generate a set of residual generators, shown in fig. 2.21. Apart from the intake manifold model, which is modeled by a nonlinear static relation, each of the residual generators was obtained by employing system identification methods to identify the relevant dynamics. Thus, the range of operation of the residual generation strategy for these four subsystems is limited to the neighborhood of an operating point.

[Figure: bank of residual generators comprising the intake manifold nonlinear model, the fueling model and the throttle-to-fuel dynamics.]

Figure 2.21 Residual generation strategy.

Another limitation of this design is that, because of lack of suitable experimental arrangements, the fueling dynamics of the engine have been simulated off-line; the engine simulator reflects each of the subsections except for the dependence of the fueling time, τ_f, on the estimated air mass. Thus, the fueling model may be thought of as a perturbation model.
Note that perturbations in the load torque, ΔT_L, appear as an input (or output) in two of the residual generators. This quantity is shown in parentheses to indicate that this disturbance input is not measured; part of the residual generation strategy consists of designing a decoupling matrix W(z) that can remove the effect of the load torque from the residual vector r(k). Details of this strategy can be found in Gertler (1991).

The residual generation strategy was tested using a mix of experiment and simulation. As an example one experiment is discussed, which included a throttle transient corresponding to a change in throttle position of 7.5° causing a change in engine speed from 3,000 to 2,700 rev/min, as shown in fig. 2.22. The sensor outputs were sampled in the crank angle domain, at a rate corresponding to one sample every 180° of crankshaft rotation. Various faults were injected at the 1800th sample. The faults consisted of 10% calibration errors in each of the sensors (throttle position, speed, manifold pressure), and in one of the fuel injectors. Figs 2.23 and 2.24 depict the residual vector created by the residual generators of fig. 2.21 for the no-fault case, and for a typical fault condition of a 10% change in throttle calibration. As was observed, the transient caused some difficulties in discriminating among faults. However, once the effects of the transient had subsided, the residuals behaved according to the structure illustrated in Table 2.1.

[Figures: throttle transient and engine speed vs. crankshaft angle, and the corresponding residual time histories.]

Figure 2.22 Experimental conditions for residual generation validation. Figure 2.23 No-fault residuals (top). Figure 2.24 10% throttle sensor fault (bottom).

Table 2.1 Residual structure

residual   no fault   throttle sensor   speed sensor   pressure sensor   fuel injector
r_1            0             1                1               1                0
r_2            0             0                1               1                1
r_3            0             0                1               0                1
r_4            0             0                0               1                1
r_5            0             1                1               1                0
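As an illustration, fault isolation from Table 2.1 amounts to matching the binarized residual vector against the column signatures; a Python sketch (the binarization thresholds are assumed given) could read:

SIGNATURES = {                        # columns of Table 2.1
    "no fault":        (0, 0, 0, 0, 0),
    "throttle sensor": (1, 0, 0, 0, 1),
    "speed sensor":    (1, 1, 1, 0, 1),
    "pressure sensor": (1, 1, 0, 1, 1),
    "fuel injector":   (0, 1, 1, 1, 0),
}

def isolate_fault(r, thresholds):
    """Binarize the five residuals and look the pattern up in the table."""
    pattern = tuple(int(abs(ri) > ti) for ri, ti in zip(r, thresholds))
    for fault, signature in SIGNATURES.items():
        if pattern == signature:
            return fault
    return "undecided"   # pattern not contained in the table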

2.8.5 Applications in robotics

In flexible automation, unpredictable behavior can result from abnormal deviations. Therefore, a high demand for supervision, fault detection, fault isolation and fault recovery exists. The approach of installing as many sensors as possible to monitor automation and access all possible failure modes has the drawback of increasing costs and making automation even more complex. For this reason, software redundancy methods are applicable in these situations. Such a method is proposed by Schneider and Frank (1992), who implemented an observer-based method for supervision and fault detection of robotized production units, where no additional sensors are required. Only internal robot states (motor current, joint displacement and joint velocity) are used, which are already available for controlling the robot. Faults are considered to be external torques. The method was implemented for a MANUTEC r3 industrial robot, where a major improvement for residual evaluation was realized by employing an adaptive threshold.
A dynamic, nonlinear model of a robot can be derived by the Euler-Lagrange equations and presented in a closed form:

J(q(t)) q̈(t) = x_d(q(t), q̇(t)) + x_g(q(t)) + f(u(t))

Here, the vector q(t) represents generalized position coordinates, J is an inertia mass matrix, x_d is a vector of Coriolis and centrifugal terms and x_g is a vector of gravitational terms. Driving torques are described by f(u(t)).
With a special choice of the system setup the robot equation can be written in state space form with the output equation,

y(t) = [I  0] x(t),   with   x(t) = [q(t)^T  q̇(t)^T]^T

This is of the form of Eqs. (2.146) with A_o = 0, K = 0, F = 0, G = 0. In order to detect a fault f_i(t) in the i-th axis of an n-degree-of-freedom robot, applying conditions (2.147) yields the transformation matrix T:

T = [0_{n×1} ⋮ I_{n×1}]
l* = [0_{n×1} … 1 … 0_{n×1}]   (unit entry in the i-th column)

and

G = [0_{n×1} ⋮ −a_i I_{n×1}]
L_1 = [0_{n×1} ⋮ −a_i I_{n×1}]
L_2 = [0_{n×1} ⋮ a_i I_{n×1}]

where a_i is a positive number determining the observer poles.
A matrix multiplication, f̂_i(t) = J(q(t)) f_d(t), decouples the torque on each generalized axis, representing unmodeled friction torques and any other additional torque disturbances.
Extensive investigations on this robot showed that friction cannot be modeled in a simple form, e.g. Coulomb and viscous friction. Instead, a displacement dependent friction characteristic had to be introduced (see e.g. Schneider and Frank, 1992; Armstrong, 1988). Typical friction characteristics of the r3 robot are displayed in fig. 2.25, where friction is obtained by the observer-based approach while constant velocity movements are performed. The residual, representing a friction torque, is not plotted as a time signal but as a correlation with the corresponding positional displacement, such that different measurements can easily be compared.
In experiments, external torques to the robot were calculated with the observer-based method and also measured with a force/torque sensor. An external torque was applied to the robot's tool center point while the robot performed a normal operation. As seen from fig. 2.26, the calculated residual matched the measured one within a ±3 Nm range. Note that a dead-band of about 30 Nm was needed when a fixed threshold was used. Moreover, only a small torque was applied in this experiment, degrading the results since the measurements were corrupted by noise.

[Figures: friction torque vs. positional displacement in degrees (left) and external torque in Nm vs. time in s (right).]

Figure 2.25 Friction characteristics of the MANUTEC r3 robot. Figure 2.26 External torque estimation. Solid line: measured; dotted line: calculated.

References

Anderson T.W. (1958). An introduction to multivariate statistical analysis. John Wiley, N.Y.
Antsaklis P.J. (1980). Maximal order reduction and supremal (A,B)-invariant and controllability subspaces. IEEE Transactions on Automatic Control, AC-25, 44-49.
Armstrong B. (1988). Friction: Experimental determination, modeling and compensation. Proceedings, IEEE International Conference on Robotics and Automation, Philadelphia.
Athans M. and D. Willner (1973). A practical scheme for adaptive aircraft control systems. Symposium on Parameter Estimation Techniques and Applications in Aircraft Flight Testing, NASA Flight Research Center, Edwards AFB, April 24-25.
Basseville M. (1981). Edge detection using sequential methods for change in level, Part I: A sequential edge detection algorithm. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-29, 1, 32-50.
Basseville M. and A. Benveniste (1983). Design and comparative study of some sequential jump detection algorithms for digital signals. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-31, 32-50.
Basseville M. and A. Benveniste, Eds. (1986). Detection of abrupt changes in signals and dynamical systems. Springer-Verlag.
Basseville M. and I. Nikiforov (1993). Detection of abrupt changes. Theory and application. Prentice Hall, Englewood Cliffs, NJ.
Baram Y. (1976). Information, consistent estimation and dynamic system identification. Ph.D. Thesis, M.I.T.
Beard R.V. (1971). Failure accommodation in linear systems through reorganisation. Rep. MVT-71-1, Man Vehicle Lab., M.I.T.
Beattie E.C. et al. (1981). Sensor failure detection systems for the F100 turbofan engine. NASA CR-165515.
Bickel P.J. (1965). On some asymptotic competitors to Hotelling's T². The Annals of Mathematical Statistics, 36, 160-173.
Bonivento C. and A. Tonielli (1984). A detection-estimation multifilter approach with nuclear application. Proceedings, 9th IFAC World Congress, Budapest, Hungary, 1771-1776.
Cho D. and J.K. Hedrick (1989). Automotive power train modeling for control. ASME Journal of Dynamic Systems, Measurement, and Control, 111, 568-576.
Chow E.Y. (1980). A failure detection system design methodology. Ph.D. Thesis, M.I.T.
Chow E.Y. and A.S. Willsky (1984). Analytical redundancy and the design of robust failure detection systems. IEEE Transactions on Automatic Control, AC-29, 603-614.
Chien T.T. and M.B. Adams (1976). A sequential failure detection technique and its application. IEEE Transactions on Automatic Control, AC-21, 750-757.
Clark R.N. (1978). A simplified instrument failure detection scheme. IEEE Transactions on Aerospace and Electronic Systems, AES-14, 4.
Coats F.E. and R.D. Fruechte (1982). Dynamic engine models for control development. Part II: Application to idle speed control. General Motors Research Laboratories Publication GMR-3789, January.
Daley K.C., Gai E. and J.V. Harrison (1979). Generalised likelihood test for FDI in redundant sensor configurations. Journal of Guidance and Control, 2, 9-17.
Darenberg W. (1987). Automatische Spurführung von Kraftfahrzeugen - Ein Problem der robusten Regelung. Automobil Industrie, 2, 155-159.
Deckert J.C., Desai M.N., Deyst J.J. and A.S. Willsky (1977). F8 DFBW sensor failure identification using analytical redundancy. IEEE Transactions on Automatic Control, AC-22, 5, 795-803.
DeLaat J.C. and W.C. Merrill (1990). A real time microcomputer implementation of sensor failure detection for turbofan engines. IEEE Control Systems Magazine, 10, 4, 29-36.
Desai M. and A. Ray (1981). A fault detection and isolation methodology. Proceedings, 20th Conference on Decision and Control, 1363-1369.
Ding X. and P.M. Frank (1989). Komponentenfehler-Detektion mittels auf Empfindlichkeitsanalyse basierender robuster Detektionsfilter. Automatisierungstechnik, Oldenburg, Munich.
Dobner D.J. (1982). Dynamic engine models for control development. Part I: Nonlinear and linear model formulation. General Motors Research Laboratories Publication GMR-3783, January.
Dowdle J.R., Willsky A.S. and S.W. Gully (1983). Nonlinear generalised likelihood ratio algorithms for manoeuver detection and estimation. Proceedings, 1982 American Control Conference, Arlington, Virginia, June 1983.
Frank P.M. (1987). Fault diagnosis in dynamic systems via state estimation - A survey. In S. Tzafestas, M. Singh, G. Schmidt (Eds.), System fault diagnostics, reliability and related knowledge-based approaches, D. Reidel, Dordrecht, 35-89.
Frank P.M. (1990). Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy - A survey and some new results. Automatica, 26, 3, 459-474.
Gantmacher R. (1974). The theory of matrices, Volume II. Chelsea Publishing Company, N.Y.
Gertler J.J. (1991). Analytical redundancy methods in failure detection and isolation in complex plants. Proceedings, IFAC/IMACS Symposium SAFEPROCESS '91, Baden-Baden, September 10-13, 9-21.
Gertler J.J., Costin J., Kowalczuk Z., Fang X., Hira R. and Q. Luo (1991). Model-based on-board fault detection and diagnosis for automotive engines. Proceedings, IFAC/IMACS Symposium SAFEPROCESS '91, Baden-Baden, 503-508.
Green C.S. (1978). An analysis of the multiple model adaptive control algorithm. Ph.D. Thesis, Report no. ESL-TH-843, Electronic Systems Laboratory, M.I.T.
Grizzle J.W., Cook J.A. and K.L. Dobbins (1990). Individual cylinder air to fuel ratio control with a single EGO sensor. Proceedings, 1990 American Control Conference, San Diego, CA, May 1990, 2881-2886.
Gustafsson D.E., Willsky A.S., Wang J.-Y., Lancaster M.C. and J.H. Triebwasser (1978a). ECG/VCG rhythm diagnosis using statistical signal analysis, I: Identification of persistent rhythms. IEEE Transactions on Biomedical Engineering, BME-25, 4, 344-353.
Gustafsson D.E., Willsky A.S., Wang J.-Y., Lancaster M.C. and J.H. Triebwasser (1978b). ECG/VCG rhythm diagnosis using statistical signal analysis, II: Identification of transient rhythms. IEEE Transactions on Biomedical Engineering, BME-25, 4, 353-361.
Janssen K. and P.M. Frank (1984). Component failure detection via state estimation. Proceedings, 9th IFAC World Congress, Budapest, Hungary.
Jenkins G.M. and D.G. Watts (1968). Spectral analysis and its applications. Holden Day, San Francisco.
Jones H.L. (1973). Failure detection in linear systems. Ph.D. dissertation, Dept. of Aeronautics and Astronautics, M.I.T.
Karcanias N. and B. Kouvaritakis (1979). The output zeroing problem and its relationship to the invariant zero structure. International Journal of Control, 30, 395-415.
Kasper R., Lückel J., Jaker K.P. and J. Schroer (1990). CACE tool for multi-input, multi-output systems using a new vector optimisation method. International Journal of Control, 51, 963-993.
Konik D. and S. Engell (1986). Sequential design of decentralised controllers using decoupling techniques. Part 2: Numerical aspects and application. Proceedings, 4th IFAC/IFORS Symposium on Large Scale Systems: Theory and Applications, Zurich, Switzerland.
Kumamaru K. (1984). Statistical failure diagnosis for dynamical systems. Systems and Control, 28, 77-86.
Lainiotis D.G. (1971). Joint detection, estimation and system identification. Information and Control, 19, 75-92.
Lou X.C., Willsky A.S. and G.C. Verghese (1986). Optimal robust redundancy relations for failure detection in uncertain systems. Automatica, 22, 333-344.
Massoumnia M.-A. (1986). A geometric approach to failure detection and identification in linear systems. Ph.D. Thesis, Dept. of Aeronautics and Astronautics, M.I.T.
Massoumnia M.-A. and W.E. Vander Velde (1988). Generating parity relations for detecting and identifying control system component failures. AIAA Journal of Guidance, Control and Dynamics, 11, 60-65.
Mehra R.K. and J. Peschon (1971). An innovations approach to fault detection and diagnosis in dynamical systems. Automatica, 7, 637-640.
Mironovskii L.A. (1980). Functional diagnosis of dynamic systems. Automation and Remote Control, 41, 1122-1143.
Moore B.C. (1976). On flexibility offered by state feedback in multivariable systems beyond closed loop eigenvalue assignment. IEEE Transactions on Automatic Control, AC-21, 689-692.
Moskwa J.J. and J.K. Hedrick (1987). Automotive engine modeling for real time control application. Proceedings, 1987 American Control Conference, Vol. 1, 341-346.
Ono T., Kumamaru T. and K. Kumamaru (1984). Fault diagnosis of sensors using a gradient method. Transactions, Society of Instrument and Control Engineers, 20, 22-27.
Patton R.J. and J. Chen (1991). Robust fault detection using eigenstructure assignment: A tutorial consideration and some new results. Proceedings, 30th IEEE Conference on Decision and Control, Brighton, U.K., December 11-13, 2242-2247.
Patton R.J. and J. Chen (1992). Robust fault detection of jet engine sensor systems by using eigenstructure assignment. Proceedings, AIAA Guidance, Navigation and Control Conference, New Orleans, August.
Patton R.J., Chen J. and H.Y. Zhang (1992). Modelling methods for improving robustness in fault diagnosis of jet engine system. Proceedings, IEEE 31st Conference on Decision and Control, Tucson, Arizona, December 1992, 2330-2335.
Patton R.J., Frank P.M. and R.N. Clark, Eds. (1989). Fault diagnosis in dynamic systems: Theory and applications. Prentice-Hall, Englewood Cliffs, NJ.
Patton R.J. and S.M. Kangethe (1989). Robust fault diagnosis using eigenstructure assignment of observers. In Patton R.J., Frank P.M. and R.N. Clark (Eds.), Fault diagnosis in dynamic systems: Theory and applications, Prentice Hall.
Patton R.J. and S.M. Willcox (1986). Fault diagnosis in dynamic systems using a robust output zeroing design method. Proceedings, First European Workshop on Failure Diagnosis, Rhodes, Greece, August 31-September 3.
Patton R.J., Willcox S. and S.J. Winter (1987). A parameter insensitive technique for aircraft sensor fault analysis. AIAA Journal of Guidance, Control and Dynamics, 10, 359-367.
Patton R.J., Zhang H.Y. and J. Chen (1992b). Modeling of uncertainties for robust fault diagnosis. Proceedings, IEEE 31st Conference on Decision and Control, Tucson, Arizona, December 16-18.
Potter J.E. and M.C. Suman (1977). Thresholdless redundancy management with arrays of skewed instruments. Integrity in electronic flight control systems, AGARDOGRAPH-224, 15-11 to 15-25.
Pouliezos A. (1980a). Fault monitoring schemes for linear stochastic systems. Ph.D. Thesis, Brunel University, England.
Pouliezos A. (1980b). An iterative method for calculating the sample serial correlation coefficient. IEEE Transactions on Automatic Control, AC-25, 834-836.
Pouliezos A. and G. Stavrakakis (1987). Linear estimation in the presence of sudden system changes: An expert system. In P. Borne, S. Tzafestas (Eds.), Applied modeling and simulation of technological systems, North-Holland, 41-48.
Pouliezos A. and G. Stavrakakis (1991). A two-stage real-time fault monitoring system. Proceedings, European Robotics and Intelligent Systems Conference EURISCON '91, Corfu, Greece, June 23-28.
Pouliezos A., Stavrakakis G. and G. Tselentis (1993). Detection of multivariable system noise degradation. Proceedings, IEEE Mediterranean Symposium on New Directions in Control Theory and Applications, Hania, Greece, June 21-23.
Rizzoni G., Azzoni P.M. and G. Minelli (1993). On-board diagnosis of emission control malfunctions in electronically controlled spark ignition engines. Proceedings, International Conference on Fault Diagnosis TOOLDIAG '93, Toulouse, France, April 5-7, 238-248.
van Schrick D. (1991). Robust observers for track control and instrument fault detection of a city bus. In Isermann R. (Ed.), Preprints, IFAC/IMACS Symposium on Fault Detection, Supervision and Safety for Technical Processes - SAFEPROCESS '91, Baden-Baden, Germany, September 10-13, Vol. I, 227-231.
Schneider H. and P.M. Frank (1992). Observer-based fault detection of robots enhanced by characteristic curves. Proceedings, IMACS/SICE International Symposium on Robotics, Mechatronics and Manufacturing Systems, Kobe, Japan.
van Schrick D. (1993). Instrument fault detection scheme for inductive track-guided transportation vehicles. Proceedings, International Conference on Fault Diagnosis TOOLDIAG '93, Toulouse, France, April 5-7, 261-266.
Tanaka S. and P.C. Müller (1990). Fault detection in linear discrete dynamic systems by a pattern recognition of a generalised likelihood ratio. Journal of Dynamic Systems, Measurement and Control, Transactions of the ASME, 112, 276-282.
Uosaki K. (1985). Failure detection using backward SPRT. Proceedings, IFAC Symposium on Identification and System Parameter Estimation, York, U.K., 1619-1624.
Uosaki K. and M. Kawagoe (1988). Backward SPRT failure detection system for detection of innovation variance change. Proceedings, IFAC Identification and System Parameter Estimation, Beijing, PRC.
Van Trees H.L. (1968). Detection, estimation and modulation theory. John Wiley and Sons.
Wald A. (1947). Sequential analysis. Wiley, New York.
Watanabe K. and D.M. Himmelblau (1982). Instrument fault detection in systems with uncertainties. International Journal of Systems Science, 13, 137-158.
Watanabe K., Yoshimura T. and T. Soeda (1979). A diagnosis design for a parametric failure. Transactions, Society of Instrument and Control Engineers, 15, 901-906.
Watanabe K., Yoshimura T. and T. Soeda (1981). A discrete-time adaptive filter for stochastic distributed parameter systems. Transactions, ASME Journal of Dynamic Systems, Measurement and Control, 103, 266-278.
Willsky A.S. (1986). Detection of abrupt changes in dynamic systems. In M. Basseville and A. Benveniste (Eds.), Detection of abrupt changes in signals and dynamical systems, Springer-Verlag, 27-49.
Willsky A.S., Chow E.Y., Gershwin S.B., Greene C.S., Houpt P.K. and A.L. Kurkjian (1980). Dynamic model-based techniques for the detection of incidents on freeways. IEEE Transactions on Automatic Control, AC-25.
Willsky A.S., Deyst J.J. and B.S. Crawford (1974a). Adaptive filtering and self-test methods for failure detection and compensation. Proceedings of the Joint Automatic Control Conference, Austin, Texas, 637-645.
Willsky A.S., Deyst J.J. and B.S. Crawford (1974b). Two self-test methods applied to an inertial system problem. Journal of Spacecraft and Rockets, 12, 434-437.
Willsky A.S. and H.L. Jones (1976). A generalised likelihood ratio approach to the detection and estimation of jumps in linear systems. IEEE Transactions on Automatic Control, AC-21, 108-112.
Wünnenberg J. and P.M. Frank (1987). Sensor fault detection via robust observers. In S. Tzafestas, M. Singh, G. Schmidt (Eds.), System fault diagnostics, reliability and related knowledge-based approaches, D. Reidel, 147-160.
Yoshimura T., Watanabe K., Konishi K. and T. Soeda (1979). Sequential failure detection approach and the identification of failure parameters. International Journal of Systems Science, 10, 827-836.
CHAPTER 3

PARAMETER ESTIMATION METHODS FOR FAULT MONITORING

3.1 Introduction

Fault detection via parameter estimation relies on the principle that possible faults in the monitored process can be associated with specific parameters and states of a mathematical model of the process, given in general by an input-output relation,

y(t) = f(u, e, θ, x)   (3.1)

where y(t) represents the vector output of the process, u(t) the vector input, x(t) the partially measurable state variables, θ the nonmeasurable process parameters likely to change and e(t) unmodeled or noise terms affecting the process. It is obvious therefore, that it is necessary to have an accurate theoretical dynamic model of the process in order to apply parameter estimation methods. This is usually derived from the basic balance equations for mass, energy, and momentum, the physico-chemical state equations and the phenomenological laws for any irreversible phenomena. The models will then appear in the continuous or discrete time domain, in the form of ordinary or partial differential or difference equations. Their parameters θ_i are expressed in dependence on process coefficients p_j, like storage or resistance quantities, whose changes indicate a process fault. Hence, the parameters θ_i of continuous time models have to be estimated. In this case there is a minimum number of independently measurable quantities which permit the estimation of various states and parameters. As an example consider a simple dynamic process model with lumped parameters, linearized about an operating point, which may be described by the differential equation,

y(t) + a_1 y^(1)(t) + … + a_n y^(n)(t) = b_0 u(t) + b_1 u^(1)(t) + … + b_m u^(m)(t)   (3.2)
The process model parameters,

θ^T = [a_1 … a_n  b_0 … b_m]   (3.3)
are defined as relationships of several physical process coefficients, e.g. length, mass, speed, drag coefficient, viscosity, resistances, capacities. Faults which become noticeable in these physical process constants are therefore also expressed in the process model parameters. If the physical process coefficients, indicative of process faults, are not directly measurable, an attempt can be made to detect their changes via the changes in the process model parameters θ. The following procedure is therefore applicable in general:
(1) Establishment of the mathematical model of the normal process,

y(t) = f(u(t), θ)   (3.4)

mainly from theoretical considerations. At this stage allowable tolerances for process coefficient values are also defined.
(2) Determination of the relationship between the model parameters θ_i and the physical process coefficients p_j,

θ = f(p)   (3.5)

(3) Estimation of the model parameters θ_i from measurements of y(t), u(t), by a suitable estimation procedure,

θ̂(t) = g(y(1), …, y(t), u(1), …, u(t))   (3.5a)

(4) Calculation of process coefficients, via the inverse relationship,

p̂(t) = f⁻¹(θ̂(t))   (3.6)

(5) Decision on whether a fault has occurred, based either on the changes Δp_j calculated in step 4 or on the changes Δθ_i and tolerance limits from step 3. If decisions are made based on the Δθ_i, the affected p_j's can be easily determined from (3.5). This may be achieved with the aid of a fault catalogue in which the relationship between process faults and changes in the coefficients Δp_j has been established. Decisions can be made either by simply checking against predetermined threshold levels, or by using more sophisticated methods from the fields of statistical decision theory. A fault decision should include the fault location, fault size and time of occurrence. System reorganization should follow a positive fault decision. Such an action is essential, since usually controller design depends on the correctness of parameters. Robust schemes also benefit from this procedure.

The basis of this class of methods is the combination of theoretical modeling and parameter estimation of continuous time models. A block diagram is given in Figure 3.1. Since, however, a requirement of this procedure is the existence of the inverse relationship (3.6), it may be restricted to well-defined processes.

[Figure: block diagram combining theoretical modeling θ = f(p), parameter estimation, calculation of process coefficients p̂ = f⁻¹(θ̂), and evaluation of the deviations Δp, Δθ leading to fault decisions.]

Figure 3.1 Fault detection based on parameter estimation and theoretical modeling.

Having in mind these limiting requirements, it must be emphasized that parameter-estimation based fault detection performs all the tasks that are expected from a well-designed fault monitoring system:
• fault detection
• fault isolation
• fault estimation and
• enables system reorganization following a fault.
Isermann (1987) points to some other advantages as well:
• Process faults can be detected earlier and localised more precisely since process parameters are in many cases closer to the process faults in terms of signal flow.
• Fault selectivity is improved since the monitored process parameter is changed directly by a fault and is not coupled to other parameters.
• Faults in closed loops can also be detected since the procedure considers the selection between inputs and outputs.
• Multiple fault detection is possible, since the procedure is checking for all values simultaneously.

The implementation of the full procedure requires, however, more effort in modeling the process, more sophisticated and fault-sensitive identification methods and fast processing hardware suitable for on-line operation.
In the following sections different approaches for each of the three stages of the fault monitoring process (modeling, estimation, decision) will be presented in detail.

3.2 Process modeling for fault detection

The starting point in the design of a fault monitoring system is the development of the process model in the form of a set of equations in the continuous or discrete time domain. Usually these process models are nonlinear and should not be linearised, since a process model for failure diagnosis should be valid over a large range of operating conditions. This process description has to be translated into equations which allow the estimation of θ. Since usually least-squares or related methods are used for the estimation phase, these equations must be linear-in-the-parameters, i.e. of the following general form,

f(y_1, …, y_n) = Σ_{i=1}^{r} θ_i f_i(y_1, …, y_n)

where the y_j are the measured quantities. The functions f_i may be nonlinear; time-derivatives or integrals are also allowed, since the derivatives can be obtained from the original signals by state variable filtering (SVF, Young, 1981) or standard difference techniques (Pouliezos and Stavrakakis, 1989). More valuable than the estimated parameters θ are the physical process coefficients p, since they are directly related to the monitored process. Thus deviations of the process coefficients from their "normal" values can be attributed to a significant change in the process itself and allow a precise fault diagnosis (localisation of fault) if there is a unique correspondence between θ and p. A simple example will illustrate the importance of choosing the correct process model (Isermann, 1987).
Example 3.1 First order electrical circuit.
[Figure: first order RC circuit: input voltage v_1 across the series connection of resistor R and capacitor C; output voltage v_2 taken across C.]

First approach: Defining voltage v_1 as input, voltage v_2 as output, and a_1 = RC, b_0 = 1, the dynamic process model is:

v̇_2(t) = [−v_2(t)  v_1(t)] [1/a_1  b_0/a_1]^T

This model has effectively only one parameter, a_1, which contains two process coefficients, R and C. Hence, one of them must be assumed as non-likely to fail. However, if b_0 ≠ 1 a fault of the capacitor (leakage current) is indicated.
Second approach: Defining voltage v_1 as input, current i as output, and a_1 = RC, b_1 = C, the dynamic process model is:

di(t)/dt = [−i(t)  v̇_1(t)] [1/a_1  b_1/a_1]^T

Hence, θ_1 = 1/RC, θ_2 = 1/R and R = 1/θ_2, C = θ_2/θ_1, and the two process coefficients can be determined uniquely.
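In code, step (4) of the general procedure for the second approach reduces to two lines; a small Python sketch (the 5% tolerance is an assumed design value, not part of the example) is:

def rc_from_theta(theta1, theta2):
    """Recover the process coefficients of Example 3.1, second approach."""
    R = 1.0 / theta2
    C = theta2 / theta1
    return R, C

def coefficient_fault(value, nominal, tol=0.05):
    """Flag a relative deviation of more than tol from the nominal value."""
    return abs(value - nominal) > tol * abs(nominal)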
This simple low order example shows that it depends on the selected input and output variables whether a change (fault) in one of the process coefficients can be detected. This capability is closely related to identifiability conditions for the process coefficients. If not all of the process coefficients can be determined uniquely, then some of them must be assumed as not likely to fail.
Nold and Isermann (1986) give the following algebraic identifiability condition relating θ and p:
Theorem 3.1 A system of equations,

θ_i = Σ_ρ c_ρi Π_{ν=1}^{m} p_ν^{k_ρν},   i = 1, …, r

provides unique solutions for the process coefficients p_1, …, p_m in R^m, with the exclusion of an open neighbourhood of 0, if the determinant,

det [ u_11 … u_1,n−m−1   1   k_11 … k_1m
       ⋮                 ⋮    ⋮
      u_n1 … u_n,n−m−1   1   k_n1 … k_nm ] ≠ 0

where the u_j = [u_1j, …, u_nj]^T are the base vectors of the equation system written in the form θ = {c_ρi} z. The number of base vectors has to be n−m−1 ≥ 0. This result is illustrated by considering
a second order electrical network as shown in Figure 3.2.

Figure 3.2 Second order electrical network.

The variables which can be measured are only u and i. Here the circuit diagram is already a sufficient process description, so that in this case no block diagram is needed. One can directly write down the equations in matrix form, relating the measured variables y = [u  i]^T to the internal variables x = [u_C1  u_R2  i_2  u_C2  i_1]^T:

[ 1  −R_1 ]       [   1     0    0     0    0 ]
[ 0    0  ]       [   1    −1    0    −1    0 ]
[ 0    0  ]  y  = [   0     1  −R_2    0    0 ]  x
[ 0    0  ]       [   0     0    1  −C_2s   0 ]
[ 0    0  ]       [ −C_1s   0    0     0    1 ]
[ 0    1  ]       [   0     0    1     0    1 ]

The H-matrix is now rowwise transformed into upper triangular form. After transforming the first five rows, the last row is stepwise transformed:

[−1  R_1] y = [0  0  −R_2  −1  0] x

[−1  R_1 + R_2] y = [0  0  0  −1  R_2] x

[−1 − C_1s(R_2 + 1/(C_2s))    R_1 + R_2 + 1/(C_2s) + R_1C_1s(R_2 + 1/(C_2s))] y = 0

The last equation represents the estimation equation, which can now be written in the form,

θ_1 su + θ_2 s²u = i + θ_3 si + θ_4 s²i
Therefore one can derive the equation system θ = {c_ρi} z, in which the components of z are monomials of the form K R_1^{k_1} R_2^{k_2} C_1^{k_3} C_2^{k_4} built from the process coefficients. The base vectors are obtained by evaluating the homogeneous solution,

z_1 + z_2 = 0
z_4 + z_5 + z_6 = 0
z_3 = z_7 = z_8 = 0

The solutions for this set of equations can be expressed in the form,

z = λ_1 u_1 + λ_2 u_2 + λ_3 u_3;   λ_1, λ_2, λ_3 ∈ R

A set of base vectors u_i for this equation system is,

u_1^T = [1 −1 0 0 0 0 0 0]
u_2^T = [0 0 0 1 0 −1 0 0]
u_3^T = [0 0 0 0 1 −1 0 0]

Evaluating the determinant of Theorem 3.1, formed from these base vectors, the column of ones and the exponent matrix {k_ρν}, gives the value 1. The determinant is thus unequal to zero, the number of base vectors is 8−4−1 = 3, and r = 5 < n = 8; hence the system is structurally identifiable.
However, the theorem does not give the solutions for the process coefficients; it just says whether there is a unique solution or not. In this example, the process coefficients can be easily determined from the estimated parameters. Using,

θ_1 du(t)/dt + θ_2 d²u(t)/dt² = i(t) + θ_3 di(t)/dt + θ_4 d²i(t)/dt²

with,

θ_1 = C_1 + C_2
θ_2 = R_2 C_1 C_2
θ_3 = R_1(C_1 + C_2) + R_2 C_2
θ_4 = R_1 R_2 C_1 C_2

one can immediately see that,

R_1 = θ_4/θ_2
C_1 = θ_2² / (θ_3θ_2 − θ_4θ_1)
C_2 = θ_1 − θ_2² / (θ_3θ_2 − θ_4θ_1)
R_2 = (θ_3 − θ_4θ_1/θ_2) / (θ_1 − θ_2² / (θ_3θ_2 − θ_4θ_1))
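The following Python sketch implements this inversion, p̂ = f⁻¹(θ̂), and checks it on an arbitrary set of nominal coefficients (the numerical values are ours, chosen only for the consistency check):

import numpy as np

def network_coefficients(theta):
    """Closed-form inversion p = f^-1(theta) for the second order network."""
    t1, t2, t3, t4 = theta
    denom = t3 * t2 - t4 * t1          # equals R2^2 C1 C2^2, nonzero here
    R1 = t4 / t2
    C1 = t2 ** 2 / denom
    C2 = t1 - C1
    R2 = (t3 - t4 * t1 / t2) / C2
    return R1, R2, C1, C2

# Consistency check with nominal values R1=1, R2=2, C1=3e-3, C2=4e-3:
R1, R2, C1, C2 = 1.0, 2.0, 3e-3, 4e-3
theta = np.array([C1 + C2, R2*C1*C2, R1*(C1 + C2) + R2*C2, R1*R2*C1*C2])
print(network_coefficients(theta))   # ~ (1.0, 2.0, 0.003, 0.004)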

3.3 Parameter estimation for fault detection

Parameter identification methods exist in a variety of approaches. However, not all of them are suitable if they are to be used for on-line fault detection schemes. Proper fault detection procedures should discount old data in order to be able to detect parameter changes soon after they occur, and should perform the necessary calculations fast enough for real-time operation. For the first requirement some method of down-weighting old data must be used, but not at the expense of accuracy. A standard way to do this is to use sliding windows of data values. The windows may be either rectangular, with equal weighting, or exponential, with some form of weights decreasing with distance from the present. Appropriate techniques will be discussed in the next sections. The second requirement is fulfilled by using recursive methods and perhaps employing some modifications for reducing calculations. Memory requirement is not a problem with today's shrinking components.
In the following sections, several estimation techniques that have been used for fault detection will be presented. At this point the close relationship between identification algorithms for slowly or fast-varying parameters and for fault detection purposes must be pointed out. However, a subtle difference exists which should not be overlooked: a fault detection mechanism assumes that a parameter value is known until a change occurs. This means that any control loops utilizing these parameters should use the known and not the estimated value of the parameters.

3.3.1 Recursive least squares algorithms

Most schemes for fault detection employing LS parameter estimation methods model the observed system (3.1) as,

y(t) = G(q⁻¹; θ) u(t) + H(q⁻¹; θ) e(t)   (3.7)

where y(t) is the p-dimensional output, u(t) the r-dimensional input, e(t) is a zero-mean white noise sequence with covariance Λ(θ) and G(q⁻¹; θ), H(q⁻¹; θ) are filters of appropriate dimensions. Here q⁻¹ denotes the backward shift operator, q⁻¹{u(t)} = u(t−1), etc. Finally, θ denotes the n-dimensional parameter vector.
Equation (3.7) describes a quite general linear model which, by proper choice of the matrices G, H, Λ and parameters θ_i, can be put into more familiar forms. An ARMAX (autoregressive moving average with exogenous signals) model is obtained if,

G(q⁻¹; θ) = B(q⁻¹)/A(q⁻¹),   H(q⁻¹; θ) = C(q⁻¹)/A(q⁻¹),   Λ(θ) = λ²

where,

A(q⁻¹) = 1 + a_1 q⁻¹ + … + a_na q^(−na)
B(q⁻¹) = b_1 q⁻¹ + … + b_nb q^(−nb)
C(q⁻¹) = 1 + c_1 q⁻¹ + … + c_nc q^(−nc)

The parameter vector is taken as,

θ = [a_1 … a_na  b_1 … b_nb  c_1 … c_nc  λ²]^T

The ARMAX model is then,

A(q⁻¹) y(t) = B(q⁻¹) u(t) + C(q⁻¹) e(t)   (3.8)

or in difference equation form,

y(t) + a_1 y(t−1) + … + a_na y(t−na) = b_1 u(t−1) + … + b_nb u(t−nb) + e(t) + c_1 e(t−1) + … + c_nc e(t−nc)

Here y(t) and u(t) are scalar signals.
If further nc = 0, an ARX (autoregressive with exogenous input) model is obtained, given by,

A(q⁻¹) y(t) = B(q⁻¹) u(t) + e(t)   (3.9)

θ = [a_1 … a_na  b_1 … b_nb]^T   (3.9a)

This structure can in some sense be viewed as a linear regression. To see this, rewrite the model (3.9) as:

y(t) = φ^T(t) θ + e(t)   (3.9b)

where,

φ(t) = [−y(t−1) … −y(t−na)  u(t−1) … u(t−nb)]^T   (3.9c)

Here, though, the regressors (the elements of φ(t)) are not deterministic functions.
Finally, note that a linear stochastic model in state space form,

x(t+1) = A(θ) x(t) + B(θ) u(t) + w(t)   (3.10)
y(t) = C(θ) x(t) + v(t)

where w(t) and v(t) are multivariate white noise sequences with zero means and covariances,

E{w(t) w^T(s)} = R_1(θ) δ_{t,s}
E{w(t) v^T(s)} = R_12(θ) δ_{t,s}
E{v(t) v^T(s)} = R_2(θ) δ_{t,s}

can be transformed into the general form (3.7) if the following definitions are made:

G(q⁻¹; θ) = C(θ)[qI − A(θ)]⁻¹ B(θ)
H(q⁻¹; θ) = I + C(θ)[qI − A(θ)]⁻¹ K(θ)
Λ(θ) = C(θ) P(θ) C^T(θ) + R_2(θ)

where K(θ) is the Kalman gain,
K(θ) = (A(θ) P(θ) C^T(θ) + R_12(θ))(C(θ) P(θ) C^T(θ) + R_2(θ))⁻¹

and P(θ) is the symmetric positive definite solution of the Riccati equation,

P(θ) = A(θ) P(θ) A^T(θ) + R_1(θ) − (A(θ) P(θ) C^T(θ) + R_12(θ))(C(θ) P(θ) C^T(θ) + R_2(θ))⁻¹(C(θ) P(θ) A^T(θ) + R_12^T(θ))

Spectral factorization theory can be used for the derivation of the above expressions (Söderström and Stoica, 1988).
Now let us consider the ARX model given by equations (3.9). This is the form most real-world applications use because of the existence of well behaved estimation methods. Then the one-shot least squares parameter estimate is given by:

θ̂(t) = [Σ_{s=1}^{t} φ(s) φ^T(s)]⁻¹ [Σ_{s=1}^{t} φ(s) y(s)]   (3.11)

The argument t has been used to stress the dependence of θ̂ on time. The expression (3.11) can be computed in a recursive fashion. Introduce the notation:

P(t) = [Σ_{s=1}^{t} φ(s) φ^T(s)]⁻¹   (3.12)

Since trivially,

P⁻¹(t) = P⁻¹(t−1) + φ(t) φ^T(t)   (3.13)

it follows that,

θ̂(t) = P(t)[(Σ_{s=1}^{t−1} φ(s) y(s)) + φ(t) y(t)]
     = P(t)(P⁻¹(t−1) θ̂(t−1) + φ(t) y(t))
     = θ̂(t−1) + P(t) φ(t)(y(t) − φ^T(t) θ̂(t−1))

Thus,

θ̂(t) = θ̂(t−1) + k(t) ε(t)   (3.14a)
k(t) = P(t) φ(t)   (3.14b)
ε(t) = y(t) − φ^T(t) θ̂(t−1)   (3.14c)

Here the term ε(t) should be interpreted as a prediction error. It is the difference
between the measured output y(t) and the one-step-ahead prediction

y(tlt -1;8(t -1») =ipT (t)8(t -I)


A

ofy(t) made at time t-I based on the model eorresponding to the estimate O(t -I). If
E.{t) is small, the estimate 8(t -I) is "good" and should not be modified very mueh. The
veetor k(t) in (3 .14b) should be interpreted as a weighting or gain faetor showing how
mueh the value of E.{t) will modify the different elements ofthe parameter veetor.
To eomplete the algorithm, (3.13) must be used to eompute P(t) wbieh is needed in
(3. 14b). However, the use of(3.13) needs a matrix inversion at eaeh time step. Tbis
would be a time-eonsuming proeedure. Using the matrix inversion lemma (Gantmaeher,
1977) however, (3.13) ean be rewritten in a more useful form. Then an updating
equation for P(t) is obtained, namely,

P(t) = P(t -I) = P(t -1)f{J(t)ipT (t)P(t) (3.15)


I +ipT (t)P(t -I)ip(t)
Note that in (3.15) there is now a scalar inversion instead of a matrix inversion. The algorithm consisting of (3.14)-(3.15) can be simplified further. From (3.14b),
k(t) = P(t-1)φ(t) - P(t-1)φ(t)φ^T(t)P(t-1)φ(t) / (1 + φ^T(t)P(t-1)φ(t))
     = P(t-1)φ(t) / (1 + φ^T(t)P(t-1)φ(t))    (3.16)
This form for k(t) is more convenient to use for implementation than (3.14b). The reason is that the right-hand side of (3.16) must anyway be computed in the updating of P(t) (see 3.15). Further improvements for numerical implementations will be discussed later in this chapter.
The derivation of the Recursive Least Squares (RLS) method is now complete. The RLS algorithm consists of (3.16), (3.15), (3.14c) and (3.14a), which are now given in algorithmic order as:
k(t) = P(t-1)φ(t) / (1 + φ^T(t)P(t-1)φ(t))    (3.16)
P(t) = P(t-1) - k(t)φ^T(t)P(t-1)    (3.15)
ε(t) = y(t) - φ^T(t)θ̂(t-1)    (3.14c)
θ̂(t) = θ̂(t-1) + k(t)ε(t)    (3.14a)

The algorithm also needs initial values θ̂(0) and P(0). These can be either provided from knowledge of system characteristics or calculated from an initial sample using (3.11) and starting the recursion later.
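For concreteness, the four recursions above translate directly into code. The following is a minimal sketch in Python with NumPy; the function names and the default initialisation value are illustrative choices, not taken from the original text:

```python
import numpy as np

def rls_init(n, delta=1e3):
    """Common initialisation: theta_hat(0) = 0, P(0) = delta*I with delta 'large'."""
    return np.zeros(n), delta * np.eye(n)

def rls_update(theta, P, phi, y):
    """One step of the basic RLS algorithm: eqs (3.16), (3.15), (3.14c), (3.14a)."""
    Pphi = P @ phi
    k = Pphi / (1.0 + phi @ Pphi)      # gain, eq (3.16)
    P = P - np.outer(k, Pphi)          # covariance update, eq (3.15)
    eps = y - phi @ theta              # prediction error, eq (3.14c)
    theta = theta + k * eps            # estimate update, eq (3.14a)
    return theta, P
```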


As is well known, the resulting estimate minimises,
V_N(θ) = Σ_{t=1}^N ε²(t)
Also, under mild conditions the LS estimate is consistent, i.e. θ̂ tends to θ as N tends to infinity, if,
E{φ(t)φ^T(t)} is nonsingular    (3.17)
E{φ(t)e(t)} = 0    (3.18)
Condition (3.17) is usually satisfied. A common cause of singularity is the condition of non-persistent excitation of order n_b of the input. Remedies for this irregularity will be discussed later. Condition (3.18) is usually not satisfied unless e(t) is white, an assumption usually made for most systems. However, it must be stressed that violation of (3.18) will render the whole fault detection scheme based on LS parameter estimation invalid. In such cases the designer should resort to methods circumventing this problem, for example instrumental variable methods (see Söderström and Stoica, 1988). Even small biases should not be tolerated, since these may trigger false alarms in sensitive detectors.
There are several approaches for modifying the recursive LS algorithm to make it
suitable as a real-time fault detection method:
• Use of a forgetting factor.
• Use of a Kalman filter as a parameter estimator.
• Use of sliding windows of data.

3.3.2 Forgetting factors

The approach in this case is to change the loss function to be minimized. Let the modified loss function be,
V_t(θ) = Σ_{s=1}^t λ^{t-s} ε²(s)    (3.19)
The loss function used earlier had λ = 1, but now the forgetting factor λ is a number somewhat less than 1 (for example 0.99 or 0.95). This means that with increasing t, the measurements obtained previously are discounted. The smaller the value of λ, the quicker the information in previous data will be forgotten. One can rederive the RLS method for the modified criterion (3.19). The calculations are straightforward. The recursive LS method with a forgetting factor is:
k(t) = P(t-1)φ(t) / (λ + φ^T(t)P(t-1)φ(t))
θ̂(t) = θ̂(t-1) + k(t)(y(t) - φ^T(t)θ̂(t-1))
P(t) = (1/λ)(P(t-1) - k(t)φ^T(t)P(t-1))    (3.20)

Equations (3.20) are often referred to as the Recursive Weighted Least Squares (RWLS) identification method or Decreasing Gain Least Squares (DGLS) method. Experience with this simple rule for setting λ shows that a decrease in the value of the forgetting factor leads to two effects:
(1) The parameter estimates converge to their true values quicker, thus decreasing the fault alarm delay time, t_d.
(2) But at the expense of increased sensitivity to noise. If λ is much less than 1 the estimates may even oscillate around their true values.
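As a concrete illustration of (3.20), the following Python sketch modifies the basic RLS update with a constant forgetting factor λ (a minimal sketch; names and the default λ are ours):

```python
import numpy as np

def rwls_update(theta, P, phi, y, lam=0.98):
    """One step of RLS with a constant forgetting factor lam, eqs (3.20)."""
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)              # gain
    theta = theta + k * (y - phi @ theta)      # estimate update
    P = (P - np.outer(k, Pphi)) / lam          # covariance update
    return theta, P
```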
There are various ways around this problem:
Time-varying forgetting factor: In this method the constant λ in (3.20) is replaced by λ(t). A typical choice is an exponential given by,
λ(t) = 1 - λ_0^t (1 - λ(0))
or recursively,
λ(t) = λ_0 λ(t-1) + (1 - λ_0)    (3.21)
Typical design values for λ_0 and λ(0) are 0.99 and 0.95 respectively.
Equations (3.20) with (3.21) in place of λ minimise the quadratic cost function:
V_t(θ) = Σ_{s=1}^t [Π_{j=s+1}^t λ(j)] ε²(s)

A more general class of variable weights is described by the relation,
w(τ) = μ(τ) Π_{i=τ}^t λ(i) ≥ 0    (3.22)
where μ(τ) is a multiplicative factor reflecting the quality of measurement y(τ) and λ(τ) is a time-varying forgetting factor given by (3.21). Minimising,
V_t(θ) = Σ_{s=1}^t w(s) ε²(s)    (3.23)
results in the following equations,
α(t) = 1/μ(t) + φ^T(t)P(t-1)φ(t)
k(t) = P(t-1)φ(t) α^-1(t)
P(t) = (1/λ(t))(P(t-1) - k(t)α(t)k^T(t))
θ̂(t) = θ̂(t-1) + k(t)(y(t) - φ^T(t)θ̂(t-1))    (3.24)

Constant Trace. In the case of abruptly changing systems, the tracking capability, and consequently the fast response to parameter changes, can be maintained by using the forgetting factor to keep the trace of P constant. This idea results in the recursive Constant Trace Least Squares (CTLS) algorithm (Shibata et al., 1988), implemented by the following set of equations:
θ̂(i) = θ̂(i-1) - P(i-1)φ(i)e(i)
e(i) = (θ̂^T(i-1)φ(i) - y(i)) α(i)
P(i) = λ^-1(i)(P(i-1) - P(i-1)φ(i)φ^T(i)P(i-1) α(i))
α(i) = (1 + φ^T(i)P(i-1)φ(i))^-1
λ(i) = 1 - α(i)‖P(i-1)φ(i)‖² / tr P(0)    (3.25)
Here θ̂(0) and P(0) must be defined. This method eliminates the estimator wind-up problem which occurs when a constant forgetting factor is used, and provides rapid convergence after the onset of a parameter change.
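A direct transcription of (3.25) into Python might look as follows (a sketch under the assumption that tr P(0) is supplied by the designer; the names are illustrative):

```python
import numpy as np

def ctls_update(theta, P, phi, y, trP0):
    """One step of the constant-trace LS algorithm, eqs (3.25)."""
    Pphi = P @ phi
    alpha = 1.0 / (1.0 + phi @ Pphi)               # alpha(i)
    e = (theta @ phi - y) * alpha                  # scaled prediction error e(i)
    lam = 1.0 - alpha * (Pphi @ Pphi) / trP0       # forgetting factor lambda(i)
    theta = theta - Pphi * e                       # estimate update
    P = (P - np.outer(Pphi, Pphi) * alpha) / lam   # covariance update
    return theta, P
```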
Kalman filters: Assuming that the parameters are constant, the underlying model
y(t) = φ^T(t)θ + e(t)
can be described as a state space equation,
x(t+1) = x(t)    (3.26)
y(t) = φ^T(t)x(t) + e(t)    (3.27)
where the "state vector" x(t) is given by,
x(t) = [a_1 ... a_na  b_1 ... b_nb]^T = θ    (3.28)

The optimal state estimate x̂(t+1) can be computed as a function of the measurements till time t using the Kalman filter. Note that usually the Kalman filter is presented for state space equations whose matrices may be time varying but do not depend on the data. The latter condition fails in the case of (3.27), since φ(t) depends on data up to (and inclusive of) time (t-1). However, it can be shown that also in such cases the Kalman filter provides the optimal (mean square) estimate of the system state vector (Åström, 1971).
Applying the Kalman filter to the state model (3.26) will give precisely the basic recursive LS algorithm. One way of modifying the algorithm so that time-varying parameters can be tracked better is to change the state equation (3.26) to
x(t+1) = x(t) + v(t);  E{v(t)v^T(s)} = R1 δ_{t,s}    (3.29)
This means that the parameter vector is modeled as a random walk or a drift. The covariance matrix R1 can be used to describe how fast the different components of θ are expected to vary. Applying the Kalman filter to the model (3.29), (3.27) gives the following recursive algorithm:
k(t) = P(t-1)φ(t) / (1 + φ^T(t)P(t-1)φ(t))
P(t) = {P(t-1) - k(t)φ^T(t)P(t-1)} + R1
θ̂(t) = θ̂(t-1) + k(t)(y(t) - φ^T(t)θ̂(t-1))    (3.30)

Observe that for both algorithms (3.20) and (3.30) the basic method has been modified so that P(t) will no longer tend to zero. In this way k(t) also is prevented from decreasing to zero. The parameter estimates will therefore change continually.
In the algorithm (3.30), R1 has a role similar to that of λ in (3.20). These design variables should be chosen by a trade-off between fast detection (which requires λ "small" or R1 "large") on the one hand, and reliability on the other (which requires λ close to 1 or R1 "small"). This trade-off may be resolved by fault simulation.
The Kalman filter interpretation of the RLS algorithm is also useful in another respect. It provides suggestions for the choice of the initial values θ̂(0) and P(0). These values are necessary to start the algorithm. Since P(t) (scaled by the noise variance) is the covariance matrix of θ̂(t), it is reasonable to take for θ̂(0) an a priori estimate of θ and to let P(0) reflect the confidence in this initial estimate θ̂(0). If P(0) is small then k(t) will be small for all t, and the parameter estimates will therefore not change too much from θ̂(0). On the other hand, if P(0) is large, the parameter estimates will quickly jump away from θ̂(0). Without any a priori information it is common practice to take,
θ̂(0) = 0;  P(0) = αI
where α is a "large" number.
Increased flexibility in the choice of design parameters can be achieved if additionally to (3.29) one assumes,
E{e(t)e(s)} = r2(t)δ_{t,s}
which results in the modified set of updating equations,
θ̂(t) = θ̂(t-1) + k(t)(y(t) - φ^T(t)θ̂(t-1))
k(t) = P(t-1)φ(t) / (r2(t) + φ^T(t)P(t-1)φ(t))
P(t) = P(t-1) - P(t-1)φ(t)φ^T(t)P(t-1) / (r2(t) + φ^T(t)P(t-1)φ(t)) + R1(t)    (3.30a)
In this context r2(t) describes the confidence of the incoming measurements.
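The equations (3.30a) are again a one-line-per-equation transcription; a Python sketch follows (illustrative; R1 and r2 are design quantities to be chosen as discussed above):

```python
import numpy as np

def kf_param_update(theta, P, phi, y, R1, r2=1.0):
    """One step of the Kalman filter parameter estimator, eqs (3.30a)."""
    Pphi = P @ phi
    denom = r2 + phi @ Pphi
    k = Pphi / denom                              # gain
    theta = theta + k * (y - phi @ theta)         # estimate update
    P = P - np.outer(Pphi, Pphi) / denom + R1     # covariance update with random walk
    return theta, P
```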
Recursive sliding window estimators. This approach utilizes rectangular sliding windows of length n_w, thus using only information contained in the last n_w samples. In this way similar behaviour to that of the forgetting factor approach is obtained. A recursive version of a least-squares sliding window algorithm is described by Stavrakakis and Pouliezos (1991):
Consider as usual the process described by,
y(k) = φ^T(k)θ + e(k);  dim(θ) = n    (3.31)
and define,
Φ(k) = [φ(1) ... φ(k)]^T    (3.32)
and,
y = [y(1) ... y(k)]^T
Furthermore, for a moving window of length n_w, define,
Φ(k, k-n_w+1) = [φ(k-n_w+1) ... φ(k)]^T    (3.33)
Then as shown in Appendix 3.A,
θ̂(k+1) = θ̂(k) - P(k+1)[Γ(k+1)θ̂(k) - δ(k+1)]
P^-1(k+1) = P^-1(k) + Γ(k+1)
where,
Γ(k+1) = φ(k+1)φ^T(k+1) - φ(k-n_w+1)φ^T(k-n_w+1)
δ(k+1) = φ(k+1)y(k+1) - φ(k-n_w+1)y(k-n_w+1)    (3.34)

It should be remembered that θ̂(k) is estimated using information from the last n_w samples. Equations (3.34) form the sliding window least squares estimator (SWLSE). Note that in this simple case a further reduction of P^-1 is not needed, since only one inversion is required. The improvement in speed is proportional to the length of the window, since the dimensions of P, Γ and δ are independent of the window size.
The improvement in speed over the classical batch sliding window LSE is shown in the operations count table for the scalar case in Table 3.1. No special methods for better performance of individual operations (matrix inversion) are taken into account, since these would apply equally well to both cases. It should be noted however that memory requirements are not reduced, since at any one time all the window values must be accessible. The scalar case considered may serve as a guideline for speed improvement in the vector versions.

Table 3.1 Operations count for window size n_w (scalar output case).

                        Recursive version            Batch
                        Additions    Multiplies      Additions        Multiplies
Estimate updating       3n²          4n²+2n          (n_w-1)n         n_w n²
Covariance updating     n²           0               n²+(n_w-2)n      n²+n_w n
Total                   4n²          4n²+2n          n²+(2n_w-3)n     n²(n_w+1)+n_w n

(For the batch estimator, estimate updating corresponds to forming [Φ^TΦ]^-1 Φ^T y and covariance updating to forming Φ^TΦ.)
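Before turning to implementation issues, a minimal Python sketch of one SWLSE step (3.34) is given below; the caller is assumed to keep the last n_w samples accessible, as noted above, and the names are illustrative:

```python
import numpy as np

def swlse_update(theta, Pinv, phi_new, y_new, phi_old, y_old):
    """One step of the sliding-window LS estimator, eqs (3.34).
    (phi_old, y_old) is the sample leaving the window, taken at time k-n_w+1."""
    Gamma = np.outer(phi_new, phi_new) - np.outer(phi_old, phi_old)
    delta = phi_new * y_new - phi_old * y_old
    Pinv = Pinv + Gamma                     # P^-1(k+1) = P^-1(k) + Gamma(k+1)
    P = np.linalg.inv(Pinv)                 # the single matrix inversion per step
    theta = theta - P @ (Gamma @ theta - delta)
    return theta, Pinv
```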

3.3.3 Implementation issues

3.3.3.1 Covariance instability

In all the preceding algorithms it is necessary to recursively calculate the estimate's covariance matrix. Let us consider the updating formula (3.20),
P(t) = (1/λ(t))(P(t-1) - P(t-1)φ(t)φ^T(t)P(t-1) / (λ(t) + φ^T(t)P(t-1)φ(t)))
where λ(t) is generated by any of the previous methods.
Since P(t) is interpreted as a covariance matrix, it should be positive definite. However, computational rounding errors may accumulate and make the computed P(t) non-positive definite, leading to numerical instability problems which manifest themselves as divergence of the parameter estimates. To avoid this problem, factorization algorithms can be used to update factors of P(t) in square-root or U-D form.
Square root algorithms. A square root algorithm is based on the following decomposition:
P(t) = Q(t)Q^T(t)    (3.35)
where Q(t) is a non-singular matrix. Potter, (1963), proposes the following square root algorithm for P(t):
f(t) = Q^T(t-1)φ(t)
β(t) = λ(t) + f^T(t)f(t)
α(t) = 1 / (β(t) + √(β(t)λ(t)))
k̄(t) = Q(t-1)f(t)
Q(t) = (Q(t-1) - α(t)k̄(t)f^T(t)) / √λ(t)    (3.36)
The algorithm is initialised by,
Q(0)Q^T(0) = P(0)
The quantity k̄(t) is a normalized form of the gain vector, since,
k(t) = k̄(t) / β(t)
As an added bonus, k(t) need not be computed, since,
θ̂(t) = θ̂(t-1) + k̄(t)(ε(t)/β(t))
i.e. the single division ε(t)/β(t) is computed first. If the Kalman filter implementation (3.30) is used, then,
P̄(t) = P(t) + R1(t)    (3.37)
where P(t) is given by (3.20) with λ(t) = 1.
If a square root approach is taken, the algorithm (3.36) can be used for finding P(t) = Q(t)Q^T(t) from P(t-1) = Q(t-1)Q^T(t-1). It then remains to find Q̄(t) using (3.37). One way to do this is as follows: let R1(t) be factored as,
R1(t) = V(t)V^T(t)
where V(t) is a (n×s) matrix of full rank (recall that n = dim θ). In most cases R1(t) is a diagonal matrix with some diagonal elements equal to zero. In such cases it is easy to find V(t). Then orthogonal transformations are applied to the rectangular matrix (Q(t) V(t)). The problem is to find an orthogonal matrix T(t) and a triangular matrix Q̄(t) such that,
[ Q(t) | V(t) ] T(t) = [ Q̄(t) | 0 ]    (3.38)
Then one has,
P(t) + R1(t) = Q(t)Q^T(t) + V(t)V^T(t)
             = [Q(t) | V(t)] T(t) T^T(t) [Q(t) | V(t)]^T
             = [Q̄(t) | 0][Q̄(t) | 0]^T = P̄(t),
as required by (3.37). The matrices T(t) and Q̄(t) in (3.38) can be found using a QR factorization or a Gram-Schmidt orthogonalization. Such factorizations are common in numerical linear algebra for solving certain eigenvalue and least squares problems, and have appeared in many applications. An efficient procedure for a Gram-Schmidt orthogonalization is given in Appendix 3.B.
U-D factorisation algorithms: A U-D factorisation algorithm is based on the decomposition,
P(t) = U(t)D(t)U^T(t)    (3.39)
where U(t) is unit upper triangular and D(t) diagonal. Following Bierman (1977), consider equation (3.20) again. Then the following algorithm produces P(t) in U-D form:
• At time t, compute k(t) and update U(t-1) and D(t-1) by performing steps 1-6 below.    (3.40)
1. Compute f = U^T(t-1)φ(t), g = D(t-1)f, β_0 = λ(t).
2. For j = 1, ..., n go through steps 3-5 (subscripts denote matrix/vector elements).
3. Compute:
   β_j = β_{j-1} + f_j g_j
   D_jj(t) = β_{j-1} D_jj(t-1) / (β_j λ(t))
   v_j = g_j
   p_j = -f_j / β_{j-1}
4. For i = 1, ..., j-1, go through step 5. (If j = 1, skip step 5.)
5. Compute,
   U_ij(t) = U_ij(t-1) + v_i p_j
   v_i = v_i + U_ij(t-1) v_j
6. Compute,
   k(t) = v(t) / β_d,  where v(t) = [v_1 ... v_d]^T and d = n
The scalar β_d obtained after the d-th cycle of steps 3-5 is the innovations variance,
β_d = λ(t) + φ^T(t)P(t-1)φ(t).
The algorithm is initialised by U(0)D(0)U^T(0) = P(0).
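Steps 1-6 can be coded compactly if P is held as a unit upper triangular U and the diagonal of D as a vector. The following Python sketch (illustrative, with λ held constant) follows the cycle above:

```python
import numpy as np

def bierman_update(U, D, phi, lam=1.0):
    """Bierman U-D measurement update (steps 1-6). U: unit upper triangular
    (2-D array); D: diagonal of D(t-1) as a 1-D array. Returns U(t), D(t)
    and the gain k(t); beta accumulates the innovations variance."""
    n = len(phi)
    f = U.T @ phi                  # step 1
    g = D * f
    v = g.copy()
    U, D = U.copy(), D.copy()
    beta = lam
    for j in range(n):             # steps 2-5
        beta_prev = beta
        beta = beta_prev + f[j] * g[j]
        D[j] *= beta_prev / (beta * lam)
        p = -f[j] / beta_prev
        for i in range(j):
            Uij_old = U[i, j]
            U[i, j] = Uij_old + v[i] * p
            v[i] += Uij_old * v[j]
    k = v / beta                   # step 6: gain; beta = innovations variance
    return U, D, k
```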
The U-D analogue of the Kalman filter updating given by (3.30) is discussed in Thornton and Bierman (1977) and consists of the following equations:
At time t-1, U(t-1) and D(t-1) are given, as well as the factorisation R1(t) = V(t)V^T(t) with V(t) a full rank (n×s) matrix.
1. Compute k(t), Ũ(t) and D̃(t) by performing steps 1-6 of (3.40) (Ũ(t) and D̃(t) are the matrices called U(t) and D(t) in (3.40)).
2. Define the (n+s)-column vector w_k^(0) as the kth column of Ũ^T(t) stacked on top of the kth column of V^T(t); k = 1, ..., d.
3. Define the (n+s)×(n+s) diagonal matrix D̄ as the block diagonal matrix formed from D̃(t) and the s×s identity matrix.
4. For j = n, n-1, ..., 2 go through steps 5-8.
5. Compute,
   D_jj(t) = [w_j^(d-j)]^T D̄ w_j^(d-j)
6. For i = 1, 2, ..., j-1 go through step 7.
7. Compute,
   U_ij(t) = [w_i^(d-j)]^T D̄ w_j^(d-j) / D_jj(t)
   w_i^(d-j+1) = w_i^(d-j) - U_ij(t) w_j^(d-j)
8. Compute,
   D_11(t) = [w_1^(d-1)]^T D̄ w_1^(d-1)
3.3.3.2 Covariance singularity

Identifiability conditions dictate that the monitored system be persistently excited in order for the covariance matrix to be nonsingular. However, there are cases when φ(t) becomes constant, and in such cases a procedure of regularization must be employed. This happens if essentially the prediction is unaffected by changing certain linear combinations of the model parameters. This in turn implies that either the model contains too many parameters or the input signal is not general enough. The latter cause may be present at some time in every process, therefore remedies must be applied. Pot et al. (1984), proposed the following modification to the covariance updating equation (3.24):
1. Compute P'(t) defined as,
   P'(t) = P(t-1) - k(t)α(t)k^T(t)    (3.40)
2. Choose the forgetting factor such that,
   - If tr P'(t) ≥ tr_0 then λ(t) = 1, P(t) = P'(t)
   - If tr P'(t) < tr_0 then λ(t) = tr P''(t)/tr_0, P(t) = (1/λ(t)) P''(t)
with,
   P''(t) = P'(t) + δ(Diag P'(t) + tr P'(t)·I)    (3.41)
This algorithm possesses the additional property of keeping the trace of P(·) equal to tr_0. Note that the two last terms of the right-hand side of (3.41) are introduced to prevent the eigenvalues of P(t) from becoming too small. Favier et al. (1988) computed the P''(t) matrix defined by (3.41) in U-D factorized form.
A similar problem might arise if a Kalman filter is used based on equations (3.30) or (3.30a). One way around this problem is to model the parameter variations as:
θ(t+1) = (1-a)θ(t) + a θ_p(t) + n(t),  0 ≤ a < 1    (3.42)
where θ_p(t) represents a priori information on θ(t). The Kalman filter built on the model (3.27) and (3.42) is given by,
α(t) = 1/μ(t) + φ^T(t)P(t-1)φ(t)
k(t) = (1-a)P(t-1)φ(t) α^-1(t)
P(t) = (1-a)² P(t-1) - k(t)α(t)k^T(t) + R1(t)
θ̂(t) = (1-a)θ̂(t-1) + a θ_p(t-1) + k(t)(y(t) - φ^T(t)θ̂(t-1))    (3.43)
When φ(t) tends to zero, θ̂(t) tends to θ_p(t) and P(t) tends to (2a-a²)^-1 R1(t). Favier et al. (1988), proposed a U-D factorised version of (3.43) consisting of the following steps:
Let P(t) = P = UDU^T, P̄ = P(t-1) = ŪD̄Ū^T.
1. Form the matrices,
   U_a = [ 1   φ^T Ū   0 ]
         [ 0  (1-a)Ū   I ]
   D_a = diag(1/μ(t), D̄, R1(t))
where U_a is (n+1)×(2n+1) and D_a is (2n+1)×(2n+1).
2. Apply the modified Gram-Schmidt orthogonalization procedure,
   U_a = Ū_a T
where Ū_a is a (n+1)×(2n+1) unit lower triangular matrix and T is (2n+1)×(2n+1) such that T D_a T^T = D̃, with D̃ a (2n+1)×(2n+1) diagonal matrix.
3. Write Ū_a and T as,
   Ū_a = [ 1   0  ]
         [ u  U_0 ],
   T = [ v_1 ... v_{2n+1} ]^T
where v_i is the i-th row vector of T. The gain k and the factors U and D are given by,
   k = u,  U = U_0,
   D = diag(‖v_2‖²_{D_a}, ..., ‖v_{n+1}‖²_{D_a})
with,
   ‖v_i‖²_{D_a} = v_i^T D_a v_i    (3.44)
4. Use (3.43) to compute the parameters.
3.3.3.3 Speed - Fast algorithms

Computational speed is a fundamental requirement in every algorithm designed for on-line fault detection. Speed improvement can be achieved in two ways:
• By employing special algorithms (software improvement).
• By employing special instrumentation (hardware improvement).
In what follows we focus on the Kalman filter equations. Since LS algorithms are a special form of a Kalman filter, the discussion applies equally well to these cases.
Software improvement relies on fast algorithms which, in general, reduce the computational load required to calculate the covariance matrix by an order of magnitude. By looking at Table 3.2, adapted from Bierman (1977), one can see that the conventional algorithms compute P(t) in O(dim²θ) operations. It should be noted however that the updating of P(t) is only a part of the total algorithm, so the total number of operations needed per time update is higher than that given in Table 3.2.

Table 3.2 Number of arithmetic operations used for updating P(t) once. The number of parameters is n.

Method                                  Additions     Multiplications   Divisions   Square roots
Conventional Kalman equation (3.30a)    1.5n²+3.5n    1.5n²+4.5n        1           0
Stabilized Kalman equation              4.5n²+5.5n    4n²+7.5n          1           0
Potter's square root                    3n²+3n        3n²+4n            2           1
U-D factorization                       1.5n²+1.5n    1.5n²+5.5n        n           0

Fast algorithms rely on the exploitation of the so-called shift structure. To illustrate this notion consider (3.9) again,
y(t) + a_1 y(t-1) + ... + a_k y(t-k) = b_1 u(t-1) + ... + b_k u(t-k) + e(t)
Note that the orders of the polynomials are the same and equal to k. Now define,
x(t) = [-y(t)  u(t)]^T,  φ(t) = [x^T(t-1) ... x^T(t-k)]^T    (3.45)
These definitions lead to the following shift structure for φ: introduce the vector,
φ*(t) = [x^T(t)  φ^T(t)]^T    (3.46)
Then,
φ*(t) = [x^T(t)  φ^T(t)]^T = [φ^T(t+1)  x^T(t-k)]^T
Here the dimension of x is 2, since scalar signals are considered, and the dimension of φ is n = 2k.
Since the relevant calculations for the derivation of fast formulae are rather involved, the necessary recursive equations are only cited, and the interested reader is referred to the work of Kalouptsidis and co-workers and Ljung and co-workers.

Fast Calculation of the Gain Matrix

1. Initialize:
   A(0) = 0, B(0) = 0, R_e(0) = δI (for some δ > 0), k(1) = 0.
2. Given A(t-1), B(t-1), R_e(t-1), and k(t), update:
   e(t) = x(t) + A^T(t-1)φ(t)    (3.47)
   A(t) = A(t-1) - k(t)e^T(t)    (3.48)
   β(t) = k^T(t)φ(t)    (3.49)
   e1(t) = (1 - β(t))e(t)    (3.50)
   R_e(t) = λ R_e(t-1) + e1(t)e^T(t)    (3.51)
   k*(t) = [ [R_e(t)]^-1 e1(t) ; k(t) + A(t)[R_e(t)]^-1 e1(t) ]    (3.52)
3. Partition k*(t) as,
   k*(t) = [ M(t) ]  } 2k rows
           [ μ(t) ]  } 2 rows    (3.53)
4. Compute,
   r(t) = x(t-k) + B^T(t-1)φ(t+1)
   B(t) = (B(t-1) - M(t)r^T(t))(I - μ(t)r^T(t))^-1
   k(t+1) = M(t) - B(t)μ(t)    (3.54)


The algorithm computes the gain vector k(t), which is subsequently used to update the parameter estimate:
θ̂(t) = θ̂(t-1) + k(t)(y(t) - θ̂^T(t-1)φ(t))    (3.55)
Now the vector x(t) contains y(t), since the first row of x(t) equals -y(t), and therefore, by comparison of (3.55) with (3.47), (3.48), it is found that the first column of A(t) will be equal to θ̂(t). Hence, when A(q^-1) ≠ 1, the updating of θ̂(t) is obtained as a by-product of the algorithm given by equations (3.47)-(3.54) to determine the gain.
A count of the number of operations involved gives,
68n + 76/3 multiplications, 80n + 17 additions,
which is an order of magnitude less (in n) than previous algorithms (Table 3.2).
This fast algorithm has therefore a distinct advantage over conventional methods when the model order n is high: it is much faster and requires much less memory.
Speed improvement using special hardware is usually based on implementations on parallel processors (transputers). These implementations require rewriting of the original algorithms so that full advantage of parallelism is taken. There are three important considerations involved in the implementation of an algorithm on a parallel architecture:
• Subdivision of the algorithm into parallel processes.
• Interprocess communication.
• Efficient execution of each process.
Maguire and Irwin, (1991), consider transputer implementations of three forms of the Kalman filter algorithm: the conventional filter, a mixed covariance/information filter and a square root filter. Two approaches were compared, one involving heuristic partitioning of the conventional algorithm and another employing a new strategy for mapping the systolic array descriptions onto parallel processors. Simulations on the T414 and T800 transputers showed a marked superiority of the systolic array approach. Rhodes, (1990), adopting a control systems viewpoint, proposes a Kalman filter decomposition based on a system representation which is a direct sum of observability subspaces. Kalouptsidis and Theodoridis, (1987), developed a highly concurrent implementation for LS algorithms requiring O(dim θ) computing time and O(dim θ) processors. The basis of the formulation are Levinson-type recursions, while dot products are computed via a scheme based on Schur's algorithm. Finally, Stavrakakis et al. (1990) have implemented the recursive sliding window estimator on an INMOS transputer, and applied the whole scheme to fault detection in robotic systems. This example is elaborated in the applications section.
A different strategy would be to code the Kalman filter algorithm directly onto hardware. Yeh, (1991), implemented a conventional and a U-D filter on a DSP32C processor using assembly and C languages.

3.3.3.4 Data weights selection

As explained in previous sections, the selection of the quantities that affect the contribution of past data to the calculation of the estimate plays a crucial role in the fault detection process. In fact, it may be argued that the tuning of these parameters is the weak link in the detection chain, since usually they can be assigned only by simulation. In a parameter identification context, especially for slow- or rapidly-varying systems, the role of the weighting parameters is well understood. Consider for example the process model,
y(t+1) = a(t)y(t) + e(t+1)
where,
a(t) = -0.9 for t < 100
     = -0.3 for t ≥ 100.

This model is a typical fault situation, where the estimated parameter jumps in a step-like fashion. Less rapid changes can usually be well approximated by a series of step changes. In Figure 3.3 the outcome of the estimation procedure using (3.20) for three different values of λ is shown. A clear conclusion is therefore drawn: the larger the value of λ, the better the quality of the estimate, but at the expense of slow response. On the other hand, smaller values of λ result in poorer estimates, but which are reached faster. A useful relation of λ and effective window length is given by,
T_0 = 1 / (1 - λ)
This means that data older than T_0 time units have a weight less than e^-1 ≈ 36% of that of the most recent data. Analogous conclusions can be reached for the algorithm of (3.30) for parameter R1(t). However, in this case R1 lies on the opposite side of the scale, close to 0. Window length for sliding window algorithms can also be determined by similar arguments, since λ is directly related to effective data length.
To develop appropriate weight sequences for fault monitoring schemes one has to observe that fault monitoring differs from time-varying system identification in that the quality of the parameter estimates does not have the same significance at every time instant. In fault monitoring, parameters are assumed known until a change occurs, therefore until then the most important operation is change detection. Following a positive change decision, the part of parameter identification becomes important, while change detection may be even suspended. This is acceptable if subsequent faults can be assumed to happen at longer intervals than those needed for parameter estimation.
This reasoning points to the use of two different weighting sequences: a pre-fault and a post-fault one. The whole procedure is shown in Figure 3.4. Two problems remain: how to decide on a fault occurrence and how to change the weighting sequence. The first question will be answered in subsequent chapters. Let us concentrate on the second.

Figure 3.3 Effect of different forgetting factors on the quality of estimates.

When a fault is detected, the gain in the estimation algorithm should be increased. This means that P(t) should be increased. This can be achieved in many ways, but there are mainly two methods that have been previously used. The first one is to decrease the forgetting factor λ. The growth of P(t) is then nearly exponential. This approach can be implemented by the variable forgetting factor method of Fortescue et al., (1981). The necessary recursions are:
1. Prediction   ŷ(t) = φ^T(t-1)θ̂(t-1) + u(t-1)
2. Error        ε(t) = y(t) - ŷ(t)
3. Gain         k(t) = P(t-1)φ(t-1) / (1 + φ^T(t-1)P(t-1)φ(t-1))
4. Estimate     θ̂(t) = θ̂(t-1) + k(t)ε(t)
5. Forgetting   λ(t) = 1 - (1 - φ^T(t-1)k(t)) ε²(t) / Σ_0
                note: if λ(t) < λ_min then set λ(t) = λ_min
6. Covariance   P(t) = (1/λ(t)) (I - k(t)φ^T(t-1)) P(t-1)    (3.56)
This strategy is based on the requirement that the information content of the filter, expressed recursively as,
Σ(t) = λ(t)Σ(t-1) + (1 - φ^T(t-1)k(t)) ε²(t)    (3.57)
is kept constant and equal to Σ_0. In other words, the amount of forgetting will at each step correspond to the amount of new information in the latest measurement, thereby ensuring that the estimation is always based on the same amount of information. Thus from (3.57),
λ(t) = 1 - 1/N(t)
where,
N(t) = Σ_0 / (1 - φ^T(t-1)k(t)) ε²(t)
is the equivalent asymptotic memory length if λ = λ(t) were to be used throughout the estimation. Since Σ_0 is related to the sum of the squares of the errors, one possible guideline on how to choose it is to express Σ_0 as,
Σ_0 = σ² N_0
where σ² is the expected measurement noise variance, based on real knowledge of the process. Then N_0 will control the speed of adaptation, as it corresponds to a nominal asymptotic memory length. Simulations have shown that if Σ_0 is chosen in this manner, then for a stationary process,
E{H_T} ≈ H̄_T
where H̄_T is a steady-state information matrix related to system dynamics and the choice of Σ_0. Further details of this are given by Ydstie (1981). The sensitivity of the system is governed by the choice of N_0. A small value of N_0 will give a large covariance matrix and a sensitive system; a larger value will give a less sensitive estimator and slower adaptation.
An exact solution to the problem of keeping Σ(t) = Σ_0 would give an algorithm where the forgetting factor is updated before the gain calculation in step 3, and would involve solving a quadratic and hence more complex relationship for λ(t). The practical difference between this and the algorithm above is in most cases small, but when the simpler algorithm is used, a test on λ(t) must be introduced to prevent this from becoming too small or even negative. A discussion of some of the convergence characteristics of the algorithm is presented by Cordero and Mayne (1981).
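Collecting steps 1-6, with the guard on λ(t) just mentioned, gives a compact update. The Python sketch below is illustrative: the prediction is written simply as φ^Tθ̂, and Σ_0 and λ_min are supplied by the designer:

```python
import numpy as np

def vff_rls_update(theta, P, phi, y, Sigma0, lam_min=0.5):
    """RLS with the Fortescue variable forgetting factor, eqs (3.56)."""
    Pphi = P @ phi
    k = Pphi / (1.0 + phi @ Pphi)                     # gain (step 3)
    eps = y - phi @ theta                             # prediction error (steps 1-2)
    theta = theta + k * eps                           # estimate (step 4)
    lam = 1.0 - (1.0 - phi @ k) * eps ** 2 / Sigma0   # forgetting factor (step 5)
    lam = max(lam, lam_min)                           # guard against lam too small
    P = (P - np.outer(k, Pphi)) / lam                 # covariance (step 6)
    return theta, P, lam
```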

Figure 3.4 A unified strategy for fault detection based on parameter estimation.

A typical simulation is illustrated below. The system to be identified is described by the equation,
y(t) - 1.2y(t-1) + 0.2y(t-2) + 0.02y(t-3) = u(t-1) - 0.62u(t-2) - 0.03u(t-3) + e(t)
where e(t) is a sequence of pseudo-random numbers with zero mean and standard deviation 0.005. To simulate one aspect of a nonlinear plant, after a period of operation the setpoint is changed and the system model is modified simultaneously to:
y(t) - 1.4y(t-1) + 0.3y(t-2) + 0.05y(t-3) = u(t-1) - 0.5u(t-2) - 0.08u(t-3) + e(t).
The results shown in Figure 3.5 indicate the behaviour of the forgetting factor and estimates. Under steady-state operation λ(t) is close to one and the estimator behaves very much like an unweighted filter. During the setpoint change the system description is changed; this results in a poor fit between model and process, and a smaller forgetting factor is produced. After a while, the parameters are "retuned" and the forgetting factor returns to its former value close to 1.

Figure 3.5 Simulation of self-tuning estimator with variable forgetting factor, Σ_0 = 0.0125. (Panels: forgetting factor and output; parameter estimates; residual error; time axis in seconds.)

This feature is extremely useful in a fault monitoring situation: if a fault occurs, all information prior to the fault is really useless. The small value of λ(t) at this instant actually automatically "restarts" the algorithm at the time of the fault. A drawback of this approach is that "restarting" is introduced to faulty and non-faulty parameters simultaneously.
The second method is to add a constant times the unit matrix to P(t), in which case it is increased instantaneously. Equation (3.20) will then be substituted by,
P(t) = (1/λ)[P(t-1) - P(t-1)φ(t)φ^T(t)P(t-1) / (λ + φ^T(t)P(t-1)φ(t))] + β(t)I    (3.58)
where β(t) is a nonnegative scalar variable which is zero except when a fault is detected. When a fault is detected, a positive β(t) has the effect of increasing P(t). The final problem is to choose a suitable β(t). When no fault is detected, β(t) is zero. When a fault is detected, it is reasonable to let β(t) depend on the actual value of P(t) and on how significant the alarm is. This may be done in many ways, and the following proposal is just one possibility.
In the noise-free case, the progress of the estimation error, when θ(t) is constant, is given by,
θ_e(t) = θ_e(t-1) - P(t)φ(t)ε(t)
       = (I - P(t)φ(t)φ^T(t)) θ_e(t-1)
       = U(t) θ_e(t-1)
All eigenvalues of U(t) are one, except the one corresponding to the eigenvector P(t)φ(t). This eigenvalue determines the step length in the algorithm. A small eigenvalue causes large steps, while an eigenvalue close to one means that the step length in the algorithm is small. Using (3.58) the eigenvalue can be written as:
1 - φ^T(t)P(t)φ(t) = 1 - φ^T(t)P(t-1)φ(t) / (λ + φ^T(t)P(t-1)φ(t)) - β(t)φ^T(t)φ(t)
                   = λ / (λ + φ^T(t)P(t-1)φ(t)) - β(t)φ^T(t)φ(t)
When β(t) = 0, the eigenvalue is thus,
ν_0(t) = λ / (λ + φ^T(t)P(t-1)φ(t))
The eigenvalue is obviously between zero and one as long as P > 0. Suppose now that an eigenvalue equal to ν(t) is desired when a fault is detected. Then β(t) has to be chosen as,
β(t) = (ν_0(t) - ν(t)) / (φ^T(t)φ(t))
The eigenvalue ν(t) should lie in the interval,
0 < ν(t) ≤ ν_0(t)
in order to keep P(t) positive definite. In practice, this choice of β(t) must also be combined with a test that φ^T(t)φ(t) is nonzero.
It remains to determine a suitable ν(t). One choice is to model ν(t) as a piecewise linear function of the significance of the fault alarm (Figure 3.6).

Figure 3.6 Choice of ν(t).

3.3.5 Robustness issues

The preceding sections indicate that most, if not all, traditional approaches to estimation problems assume that the model is perfect, and hence the only source of modeling error that is considered is due to process or measurement noise. However, in practical situations, it is frequently the case that the major source of modeling errors is the approximate description of the system's response. This gives rise to the need for robust (in the presence of modeling errors) parameter estimation techniques. The importance of this problem has already been discussed in Chapter 2, with reference to residual-based fault detection methods.
Efforts have also been made to produce robust parameter-estimation methods that could be used in fault detection applications. This is a very important development, since it is well known that if model mismatch exists, it is the source of false alarms, a condition that will invalidate even the most sophisticated algorithm. Weiss, (1988), computed an uncertainty bound in the frequency domain which accounts for modeling errors. This bound is then used to construct a test variable for fault detection. Carlsson et al. (1988), developed a related strategy in which the unmodeled response is embedded in a stochastic process, so that bounds can be computed on the unmodeled errors.
In this section two approaches to robust parameter estimation-based fault detection are briefly summarized: the "black box" approach suggested by Wahlberg (1990) and the "embedding" approach of Carlsson, extended by Kwon and Goodwin (1990). It must be noted that both approaches are in fact off-line and have been mainly used for model validation.

The "stochastic embedding" approach.


This approach provides robust fault detection when unmodeled dynamics, linearization errors and noisy inputs are present. It functions by calculating upper bounds for parameter estimates in the face of the previous anomalies.
The model mismatch is described by incorporating into the system description additive unmodeled dynamics. System dynamics are described by the general linear model of Equation (3.7). Two models are needed: the nominal model, with transfer function,
G(q^-1; θ) = B(q^-1; θ) / A(q^-1)    (3.59)
and the unmodeled dynamics model with transfer function G_Δ(q^-1). The system output then satisfies,
y(k) = [G(q^-1) + G_Δ(q^-1)]u(k) + v(k)
     = B(q^-1; θ)u_F(k) + η(k)    (3.60)
where v(k) models measurement noise of zero mean and spectral density φ(ω), and,
u_F(k) = (1/A(q^-1)) u(k)
η(k) = G_Δ(q^-1)u(k) + v(k)
Equation (3.60) can be represented in standard linear regression form as,
y(k) = φ^T(k)θ + η(k)    (3.61)
where,
φ(k) = [u_F(k-1) u_F(k-2) ... u_F(k-n_B)]^T    (3.62)
The parameters are estimated using ordinary least squares as,
θ̂ = [Φ^TΦ]^-1 Φ^T Y    (3.63)
where,
Φ = [φ(1) φ(2) ... φ(N)]^T    (3.64)
Y = [y(1) y(2) ... y(N)]^T    (3.65)
From (3.61) and (3.63), the following expression for the estimation error is derived:
θ_e = θ̂ - θ = [Φ^TΦ]^-1 Φ^T ξ
where,
ξ = [η(1) η(2) ... η(N)]^T    (3.66)
Denoting the impulse response of G_Δ as {h_Δ(·)}, η(k) can be expressed as,
η(k) = Σ_{i=0}^k h_Δ(i)u(k-i) + v(k)
Then the following relationship is obtained from (3.66):
ξ = ΨH + V
where,
V = [v(1) v(2) ... v(N)]^T
Ψ = [ u(1)    0      ...   0   ]
    [ u(2)   u(1)    ...   0   ]
    [  ...                     ]
    [ u(N)  u(N-1)   ...  u(1) ]
H = [h(0) h(1) ... h(N-1)]^T    (3.67)
A priori knowledge is assumed to be available which allows one to give a prior distribution to {h(·)}. This is known as stochastic embedding, i.e. the unmodeled dynamics are embedded in a stochastic process (Goodwin et al., 1989). In the present context, knowledge of the mean and covariance function for these distributions is adequate. Given these, the expected value of the estimation error can be calculated. This will be the basis of the fault detection method.
Assume that two sets of data, I_n and I_f, corresponding to normal and suspected faulty operation, are available. In this situation, estimated parameters θ̂ and corresponding nominal transfer function Ĝ(z) may differ for the two cases:
θ̂ = θ̂_n for data set I_n;  θ̂_f for data set I_f    (3.68)
Ĝ(q^-1, θ̂) = Ĝ_n(q^-1) = G_n(q^-1, θ̂_n) for I_n
            = Ĝ_f(q^-1) = G_f(q^-1, θ̂_f) for I_f    (3.69)

The fault detection procedure now amounts to comparing θ̂_n and θ̂_f (or Ĝ_n and Ĝ_f) and deciding if the observed changes can be explained satisfactorily in terms of the effects of noise or undermodeling. If not, then it may be concluded that a system fault has occurred. The covariance functions of (θ̂_n - θ̂_f) and (Ĝ_n - Ĝ_f) under nonfaulty conditions will be used as measures of the uncertainty due to noise and undermodeling. An upper bound for the covariance of (θ̂_n - θ̂_f) is given by Kwon and Goodwin, (1990), as:
E{(θ̂_n - θ̂_f)(θ̂_n - θ̂_f)^T} ≤ C    (3.70)
where C is built from the input matrices Ψ of the two experiments, the prior covariance R = E{HH^T} of the unmodeled impulse response, and σ_v², the equivalent noise variance corresponding to the upper bound S of the power spectrum of v.
The first term on the right side of (3.70) accounts for the effects of undermodeling and
the difference in input signals for the two experiments. Note that if there is no
undermodeling or if the inputs are identical, these terms vanish. The second term
corresponds to measurement noise. The higher the SNR (signal-to-noise ratio) is, the
smaller the norm of this term. The matrix C can now be used to formulate appropriate
test variables for fault detection. For example, one may use,

(3.71)

(3.72)

These test variables are based on a comparison between the observed value and the expected value of [θ̂_n - θ̂_f][θ̂_n - θ̂_f]^T. If the test variable is larger than a fixed threshold, then this is evidence that a fault has occurred.
Uncertainty bounds in the frequency domain can easily be obtained by extending the previous results. Kwon and Goodwin, (1990), have shown that the expected value of the difference between the estimated transfer functions in two nonfaulty experiments is given by,
E{[Ĝ_n(ω) - Ĝ_f(ω)][Ĝ_n(ω) - Ĝ_f(ω)]*} = V(ω, n) C V*(ω, n)    (3.73)
where,
V(ω, n) = [∂G(e^{jω_1}, θ)/∂θ ... ∂G(e^{jω_n}, θ)/∂θ]^T
ω = [ω_1 ω_2 ... ω_n]^T
and G is the nominal transfer function, C is given by (3.70), n is the number of frequencies and * denotes conjugate transpose. Based on this result, test variables analogous to the ones of equations (3.71), (3.72) can be devised.

Evaluation of the test variables requires knowledge of σ_v² and R. This knowledge can be acquired from experiments performed on non-faulty operation. For example, σ_v² can be estimated by supplying a constant input and recording the output variations around the mean value. For R, simplification may be necessary. For example, it may be assumed that,
E{h(k)h(j)} = r(k)δ_kj
where
r(k) = σ_0² e^{-βk};  k = 0, 1, ...    (3.74)
In (3.74), 2/β can be considered an average time constant for the class of unmodeled dynamics. With this simple description, the parameters of (3.74) can now be estimated from experiments of non-faulty operation.
This idea can also be extended to nonlinear situations by including terms which will account for linearization errors around the operating point. In this situation, the system output, given by equation (3.60), will be,
y(k) = G(q^-1)u(k) + G_Δ(q^-1)u(k) + G_{NΔ}(q^-1)u²(k)sign(u(k)) + v(k)    (3.75)
where G_{NΔ} accounts for model mismatch due to input nonlinearity. Proceeding similarly to the linear case, bounds in the parameter space are obtained, where,
Ψ_N = [ u²(1)sign(u(1))        0               ...   0 ]
      [ u²(2)sign(u(2))     u²(1)sign(u(1))    ...   0 ]
      [  ...                                           ]
      [ u²(N)sign(u(N))   u²(N-1)sign(u(N-1))  ...  u²(1)sign(u(1)) ]
R_N = E{H_N H_N^T}
H_N = [h_N(0) h_N(1) ... h_N(N-1)]^T
In the above, h_N(k) denotes the impulse response of G_{NΔ} and the remaining variables are as in (3.70). An example of this procedure will be given in the applications section.
Black box identification approach.
This method, proposed by Wahlberg, (1990), aims at providing robust fault detection when undermodeling exists, i.e. when a finite order structure is used to model systems of infinite order. This is accomplished by using flexible models of a finite order in the frequency domain, which approximate the process in the frequency range of interest. The approach uses the high order black-box identification theory of Ljung, (1987).
The situation is again viewed as a standard hypothesis testing problem: given two data sets, I_n and I_f, decide if they relate to the same underlying system. Decision is made based on the estimates of the system parameter vector θ obtained from the two data sets. The general model described by (3.7) and parameterized by θ ∈ R^n is used:
y(k) = G(q^-1; θ)u(k) + H(q^-1; θ)e(k)    (3.77)
Equation (3.77) is supposed to model a time-invariant, exponentially stable linear system with additive noise of the form,
y(k) = G_T(q^-1)u(k) + v(k)
where v(k) is the part of the output that cannot be explained from the input signal in open-loop experiments. It is a zero mean, stationary stochastic process with spectral representation,
v(k) = Σ_{τ=0}^∞ h_T(τ)e(k-τ);  h_T(0) = 1
where {e(k)} is a sequence of independent random variables with zero mean and variances λ_T.
The two hypotheses are then given by,
H_0: θ̂_n and θ̂_f relate to the same system
H_1: H_0 is not true
To decide in a robust way between these two hypotheses, firstly a robust estimate of θ must be obtained and then a suitable test statistic must be devised. Let (3.77) be rewritten in transfer function form as,
y(k) = (B(q^-1)/A(q^-1)) u(k) + (C(q^-1)/D(q^-1)) e(k)
where A(q^-1), C(q^-1) are given stable polynomials of fixed degree, and
B(q^-1) = b_1 q^-1 + ... + b_n q^-n
D(q^-1) = 1 + d_1 q^-1 + ... + d_n q^-n
Then the parameter vector is,
θ = [b d λ]
Based on N system measurements, the system model is estimated by the two-step procedure:
1. Estimate B(q^-1) by minimising the criterion,
   (1/N) Σ_{k=1}^N [L(q^-1)(y(k) - (B(q^-1)/A(q^-1))u(k))]²
where L(q^-1) is a prefilter used to modify the bias distribution.
2. Determine D(q^-1) by minimising the criterion,
   (1/N) Σ_{k=1}^N [D(q^-1)ȳ(k)]²
where,
   ȳ(k) = (1/C(q^-1))(y(k) - (B̂(q^-1)/A(q^-1))u(k))
The spectral density estimate is then given by,
   Φ̂_v(ω) = λ̂ |C(e^{iω})|² / |D̂(e^{iω})|²
where,
   λ̂ = (1/N) Σ_{k=1}^N [D̂(q)ȳ(k)]²

A suitable test statistic is then the following:
T = Σ_{m=1}^f [|E(ω_m)|² + |F(ω_m)|²]
E(ω_m) = [Ĝ(e^{iω_m}, θ̂_n) - Ĝ(e^{iω_m}, θ̂_f)] × [(n_1/N_1)(Φ̂_v1(e^{iω_m})/Φ̂_u1(e^{iω_m})) + (n_2/N_2)(Φ̂_v2(e^{iω_m})/Φ̂_u2(e^{iω_m}))]^{-1/2}    (3.78)
where N_1, N_2 are the sample sizes for the two experiments, n_1, n_2 are the estimated orders of B(q) and f ≤ min(n_1, n_2). The input spectral density estimates may be given by a priori information or estimated using a smoothed DFT. By noting that under H_0, E(ω_m) and F(ω_m); m = 1, ..., f, are asymptotically jointly complex normally distributed with zero mean and identity variance, the test variable T is χ² distributed with 4f degrees of freedom. Hence for a given confidence level a, a confidence interval can be calculated. If T falls outside this interval, a fault is declared.
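The threshold computation is then a one-liner. Assuming SciPy is available, a minimal sketch (names are illustrative):

```python
from scipy.stats import chi2

def fault_declared(T, f, alpha=0.01):
    """Compare T (chi-square with 4f degrees of freedom under H0) with the
    (1 - alpha) quantile; True indicates a declared fault."""
    return T > chi2.ppf(1.0 - alpha, df=4 * f)
```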

3.4 Decision rules

Following the parameter estimation phase of the fault monitoring algorithm, a decision has to be made, based on the estimated values, on whether a change/fault has occurred. There are two basic approaches to this problem:
• decide based on the estimated values of the model parameters θ_i
• decide based on the calculated values of the process parameters p_i, using the inverse relationship (3.6).
It should be stressed that decisions made using the first approach do not prohibit inferences about the state of the process parameters. As an example, consider the following set of relations amongst θ and p (Pouliezos and Stavrakakis, 1989):
θ_1 = R/L,  θ_2 = K_m N/L,  θ_3 = 1/L,  θ_4 = K_m/J_m,  θ_5 = F/J_m,  θ_6 = 1/(J_m N)
resulting in,
p_1 = R = θ_1/θ_3,  p_2 = L = 1/θ_3,  p_3 = K_m = θ_2/(θ_3 N),  p_4 = J_m = θ_2/(θ_3 θ_4 N) = 1/(θ_6 N)
p_5 = F = θ_2 θ_5/(θ_3 θ_4 N)
Reasoning backwards, a change in p_3 = K_m would appear in θ_2, θ_4, and a change in p_1 = R in θ_1. It is obvious therefore that, by constructing a table of interrelations, inference on change of p_i can be made by simple observation of the θ_j.
Let us examine these two approaches in detail.
Decision based on θ. Hägglund, (1984), proposes the following procedure for detecting a change in one or more of the estimated parameters:
Define the variables,
Δθ(t) = -θ̂(t) + θ̂(t-1)    (3.79)
w(t) = r_1 w(t-1) + Δθ(t),  0 ≤ r_1 < 1    (3.80)
In the case a fault has occurred, w(t) can be viewed as an estimate of the direction of the parameter change. The test sequence that will be studied is {s(t)}, where s(t) is defined as,
s(t) = sign[Δθ^T(t) w(t-1)]    (3.81)
The sign function makes the test sequence insensitive to the noise variance. It is now clear in principle how to carry out the fault detection:
• Inspect the latest values of s(t): if s(t) is +1 unusually many times, conclude that a fault has occurred.
Under normal operation, i.e. when the parameter estimates are close to their true values, s(t) has approximately a symmetric two-point distribution with mass 0.5 each at ±1. When a fault has occurred, the distribution is no longer symmetric, but the mass at +1 is larger than the mass at -1. To add the most recent values of s(t), the stochastic variable r(t), defined as
r(t) = r_2 r(t-1) + (1-r_2)s(t);  0 ≤ r_2 < 1
is introduced. The sum of the most recent values of s(t) is replaced by an exponential smoothing in order to obtain a simple algorithm. When the parameter estimates are close to the true ones, r(t) has a mean value close to zero. When a fault has occurred, a positive mean is expected. The parameter r_2 determines, roughly speaking, how many s(t) values should be included. For example r_2 = 0.95 corresponds to about 20 values, which is a reasonable choice in many applications. A small r_2 allows a fast fault detection, although at the price of reduced security against false alarms. When the signal to noise ratio is small, it is not possible to detect the faults as fast as otherwise. It is then necessary to have more information available to decide whether a fault is present. This can be achieved by increasing r_2.
For values of r_2 close to one, r(t) will have an approximately Gaussian distribution with variance,
σ² = (1 - r_2) / (1 + r_2)
Since r_2 is generally chosen in this region, it will be assumed that r(t) has a Gaussian distribution. If r(t) exceeds a certain threshold r_0, a fault may be concluded with a confidence determined from the value of the threshold. In the present algorithm, the threshold can be computed directly as a function of the rate of false alarms f_E. If a false alarm frequency equal to f_E is acceptable, a fault should be declared every time r(t) is greater than the threshold r_0, defined by,
P[r(t) ≥ r_0] = (1/√(2π)σ) ∫_{r_0}^∞ exp[-x²/(2σ²)] dx = f_E    (3.82)
If a small value of the threshold is chosen to make it possible to detect faults quickly, the false detection rate will be high. This is seen in (3.82), where there is an inverse relation between r_0 and f_E. The determination of r_0 by this method has the advantage that it is formulated in terms of the expected frequency of false detections, which may be chosen to suit any particular application.
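The complete detector then consists of (3.79)-(3.81) plus the threshold (3.82). A Python sketch follows (illustrative; SciPy's normal quantile is used for the threshold, and w and r are assumed initialised to zero):

```python
import numpy as np
from scipy.stats import norm

def hagglund_step(theta_new, theta_old, w, r, r1=0.9, r2=0.95):
    """One step of the drift detector: eqs (3.79)-(3.81) plus the smoothing of s(t)."""
    dtheta = theta_old - theta_new            # eq (3.79)
    s = 1.0 if dtheta @ w >= 0 else -1.0      # eq (3.81), using w(t-1)
    w = r1 * w + dtheta                       # eq (3.80)
    r = r2 * r + (1.0 - r2) * s               # smoothed test quantity r(t)
    return w, r

def threshold_r0(r2=0.95, f_E=1e-3):
    """Threshold r_0 from the acceptable false alarm rate f_E, eq (3.82)."""
    sigma = np.sqrt((1.0 - r2) / (1.0 + r2))
    return sigma * norm.ppf(1.0 - f_E)
```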
This procedure is not sensitive to changes in noise level. This characteristic is not found in other fault detection methods. If this property is not needed, study of the quantity w^T(t)w(t) can result in more powerful tests. However, it does suffer from the fact that it cannot locate the faulty parameter. To overcome this shortcoming, the following test, based on the time dependency of the estimated parameters, can be performed:
Given the time-dependent quantity a(k), define the function f(a, N_1, N_2) as:
f(a, N_1, N_2) = (1/(N_2 - N_1 + 1)) Σ_{k=N_1}^{N_2} a(k)
Use the following detection criterion for each estimated parameter θ_i, i ∈ [1,n],
J_i(t) = |θ_{S,i}(t) - θ_{L,i}(t)| / |θ_{L,i}(t)|
where,
θ_{S,i}(t) = f(θ̂_i, t-N_S+1, t)
θ_{L,i}(t) = f(θ̂_i, t-N_L-N_S+1, t-N_S)
and N_S and N_L, satisfying 0 < N_S < N_L, denote the short- and long-term estimator computation windows respectively. Then a decision on fault occurrence can be made according to the value of J_i(t) as follows:
If J_i(t) ≤ J_i,min ∀i ∈ [1,n]: no change
If J_i,min < J_i(t) < J_i,max for some i ∈ [1,n]: alarm for ith parameter
If J_i(t) > J_i,max for some i ∈ [1,n]: change in ith parameter
This procedure suffers however from the difficulty of choosing the following parameters: N_S, N_L, J_i,max, J_i,min; i = 1, ..., n. These may be chosen through simulation studies only, since their probability distributions are not known. Furthermore, the conducted tests showed that this method is difficult to implement, because the detection criterion value fluctuations are much greater after model changes, making the choice of thresholds even more difficult.
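For illustration, the criterion J_i(t) can be evaluated directly from a stored history of estimates; a Python sketch (names are illustrative):

```python
import numpy as np

def window_criterion(theta_hist, Ns, NL):
    """J_i(t) from short- and long-term window means of the estimate history.
    theta_hist: array of shape (T, n), rows are theta_hat(k), most recent last."""
    short = theta_hist[-Ns:].mean(axis=0)              # mean over the last Ns samples
    long_ = theta_hist[-(NL + Ns):-Ns].mean(axis=0)    # mean over the NL samples before
    return np.abs(short - long_) / np.abs(long_)       # one J_i per parameter
```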
Decision based on p. After calculation of the physical process coefficients, consider p̂(k) as a gaussian vector with its components statistically independent and its realizations p̂(i) and p̂(j) at different sample instants i ≠ j statistically independent. It is assumed that the mean vector p̄(k) = E{p̂(k)} and the covariance matrix,
C(k) = E{(p̂(k) - p̄(k))(p̂(k) - p̄(k))^T}
are invariant for the non-error case, i.e.
p̄(k) = p̄,  C(k) = diag[σ_1², ..., σ_m²];  all k
Under these conditions, the joint probability density function over N samples is defined as,
f(p̂(1), ..., p̂(N)) = Π_{i=1}^N f(p̂(i))
A fault will be declared by a significant deviation of the mean μ̃_i and/or variance σ̃_i² of p̂_i(k), the ith component of p̂(k), from the non-error values p̄_i, σ_i².
This is a classical hypothesis test problem, and can be handled by the formulation of (m+1) hypotheses H_i, 0 ≤ i ≤ m:
H_0: no fault in the mean and/or the variance of p̂ (i = 0)
H_i: fault of type i (significant deviation of mean and/or variance of p̂_i)
Each hypothesis H_i can be associated with a gaussian conditional density function, where μ(H_i) and σ²(H_i) denote conditional mean and variance for hypothesis H_i. Therefore, the non-error case is described by μ(H_0), σ²(H_0), whereas μ(H_i), σ²(H_i), 1 ≤ i ≤ m, describe a fault of parameter i.
Fault detection and localization is possible by computing appropriate logarithmic likelihood ratios. The algorithm used here for estimating the non-error statistics and the fault detection and localization is described by Geiger (1986), where a common window technique is used in order to calculate the relevant statistics of the fault occurrence. However, the procedure employed is non-iterative and this greatly increases computational time. An iterative procedure described by Pouliezos and Stavrakakis (1989) is:
(i) For the training/non-error case, H_0:
a) μ_i(k) = (1/k)[(k-1)μ_i(k-1) + p̂_i(k)],  i = 1, ..., m,  k = 1, ..., N_s
The algorithm is initialized by μ_i(1) = p̂_i(1).
b) σ_i²(k) = ((k-2)/(k-1))σ_i²(k-1) + (1/k)(p̂_i(k) - μ_i(k-1))²,  i = 1, ..., m,  k = 2, ..., N_s
The algorithm is self-initialized by,
σ_i²(2) = ½(p̂_i(1) - p̂_i(2))².
The sample size, N_s, is chosen so that an accurate estimate for the above statistics is obtained.
(ii) For the error case, H_i, i = 1, ..., m, three quantities are needed:
a) Window sample mean
μ̃_i(k) = (1/N_w) Σ_{j=k-N_w+1}^k p̂_i(j)
        = μ̃_i(k-1) - (1/N_w)γ_i(k);  i = 1, ..., m
where,
γ_i(k) = p̂_i(k-N_w) - p̂_i(k);  i = 1, ..., m
b) Window sample variance
σ̃_i²(k) = (1/N_w) Σ_{j=k-N_w+1}^k (p̂_i(j) - μ̃_i(k))²
         = σ̃_i²(k-1) + (1/N_w)[2γ_i(k)μ̃_i(k-1) - (1/N_w)γ_i²(k) - p̂_i²(k-N_w) + p̂_i²(k)]
c) Window sample variance based on non-error mean
σ̄_i²(k) = (1/N_w) Σ_{j=k-N_w+1}^k [p̂_i(j) - μ_i]²
These three iterative schemes need a starting window of N_w sample values p̂_i in order to be initialized. The window size, N_w, is chosen so that reasonable rates for missed alarms and false detections are achieved. The advantage of using a small moving window of sample parameter values p̂_i(k) is in the improved speed of detection. This is offset, however, by an increasing false detection rate.
however, by an increasing false detection rate.
A fault in the ith parameter is declared at time k if the quantity,
Λ_i(k) = (N_w/2)[σ̄_i²(k)/σ_i²(k) - 2 ln(σ̃_i(k)/σ_i(k)) - 1]
exceeds a predetermined threshold in M consecutive time instants.
Alternatively, the comparable quantity may be:
λ_i(k) = σ̄_i²(k)/σ_i²(k) - 2 ln(σ̃_i(k)/σ_i(k))
resulting in fewer operations per time update. The threshold may be set by defining an a priori probability of no-fault, P_i0. Then a suitable test is,
Λ_i(k) ≷ ln[P_i0 / (1 - P_i0)]
with a fault declared when the threshold is exceeded.
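For one parameter, Λ_i(k) is computed directly from the three window statistics; a Python sketch (illustrative; note that -2 ln(σ̃/σ) = -ln(σ̃²/σ²)):

```python
import numpy as np

def lambda_stat(var_bar, var_tilde, var0, Nw):
    """Lambda_i(k) from the window variance about the non-error mean (var_bar),
    the window variance (var_tilde) and the non-error variance (var0)."""
    return 0.5 * Nw * (var_bar / var0 - np.log(var_tilde / var0) - 1.0)

def fault_in_parameter(var_bar, var_tilde, var0, Nw, P0):
    """Declare a fault when Lambda exceeds ln(P0 / (1 - P0))."""
    return lambda_stat(var_bar, var_tilde, var0, Nw) > np.log(P0 / (1.0 - P0))
```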
An alternative discrimination measure is the Kullback discrimination index (KDI), proposed for fault detection purposes by Kumamaru et al. (1988). The KDI can be seen as a distortion measure to compare two probability density functions. In fault detection situations one pdf refers to an interval of no-fault, while the other to subsequent intervals where fault monitoring is in effect. If these intervals are denoted I_1, I_2, their estimates θ̂_1, θ̂_2(t), their points numbered by {1, 2, ..., N}, {1, 2, ..., t} respectively, and parametric models of the form given by equation (3.7) are considered, the KDI can be written as,
I_t^N[1,2] = ∫ p(Y_N/θ̂_1, U_{N-1}) ln [p(Y_N/θ̂_1, U_{N-1}) / p(Y_N/θ̂_2(t), U_{N-1})] dY_N
where Y_i, U_i are data collections up to point i for interval I_1, and p is the likelihood function. The index can be calculated iteratively using,
I_t^N[1,2] = I_t^N[1,2]^(2) + I_t^N[1,2]^(3)
where,
I_t^N[1,2]^(2) = (1/2) Σ_{k=1}^N ‖H_2^-1(G_1 - G_2)u(k)‖²_{Λ^-1}
I_t^N[1,2]^(3) = (N/2)(1/2πi) ∮ [H_2^-1(z)H_1(z) - 1][H_2^-1(z^-1)H_1(z^-1) - 1] dz/z
where the index i in G_i, H_i denotes identified models using θ̂_i, and Λ = σ² is the noise variance.
Once the index has been calculated, it leads to a decision problem using a threshold η:
I_t^N > η ⇒ θ̂_1 ≠ θ̂_2: fault;  I_t^N < η ⇒ θ̂_1 = θ̂_2: no fault
The value of the threshold can be determined according to the statistical properties of the KDI. Under the normal situation, Söderström and Kumamaru (1985) have shown that all the terms involved in the iteration have asymptotic χ² distributions, with degrees of freedom equal to the dimension of the parameters included in their expressions. This index has performed quite well in adaptive control schemes, where a sensitive fault index is required because of the system's adaptation properties. In those cases monitoring of the estimation history alone is not enough to trigger fault alarms.
3.5 Practical examples

In this section several examples of fault detection methods using parameter estimation applied to real technological problems will be presented. These examples use different mixtures of models/estimation/decision approaches, and thus present an interesting framework for comparison.
3.5.1 Evaporator fault detection

Dalla Molle and Himmelblau (1987), have applied real time parameter estimation techniques for fault detection in an evaporator. The complexities of a real evaporator have been simplified, so that the model reduces to,
dx_1/dt = F - (w x_1 + E_c) - V    (3.83a)
dx_2/dt = [βF x_F + (V - F)(x_2 - T_B)] / x_1    (3.83b)
where,
[x_1 x_2] = [W T]
V = [UA(T_s - T) - F C_p(T - T_F) - Q_L] / ΔH_v
and,
UA : (heat transfer coefficient)×(area of heat transfer),
T_s : steam temperature in steam chest,
T_B : normal boiling point of solvent,
C_p : heat capacity of solution,
T_F : temperature of feed system,
Q_L : rate of heat loss to the surroundings,
ΔH_v : heat of vaporization of solvent,
Q_s : total rate of heat transfer from steam,
w : constant (0.6),
β : boiling point elevation per mass fraction of solute,
E_c : constant (0.1).

Figure 3.7 shows the rest of the notation.

Here the states of the model are the hold-up (W) and temperature (T), and two parameters of interest for process degradation are the heat transfer coefficient UA and the composition of feed x_F.

Figure 3.7. Evaporator configuration and notation.

As the heat transfer surface becomes fouled or scaled, the heat transfer rate is decreased and the efficiency of the process is reduced. On the other hand, composition at the input of the evaporator could be useful in determining if the previous unit was operating properly.
To illustrate the types of trajectories that occur for the two parameters, the following faults were simulated:

                                      UA            x_F
% change in value                -10.0 ramp    -20.0 square
Starting time of change (min)        75.0          165.0
Stopping time of change (min)       375.0          258.0

Noise was added to the process measurements to represent randomness, and was also introduced into the inputs. For the simulations all process parameters were assumed to remain constant (except for the fault parameters). The standard deviations of the noise factors are listed in Dalla Molle (1985).
Two fault detection methods were used:
(a) Least squares with forgetting factor. As they are, equations (3.83) are not suitable for applying the standard L.S. procedure (3.20). However, as shown by Dalla Molle, (3.83) can be put into the form:

$$S(k)p = b(k) + \varepsilon(k)$$

where,

$$b(k) = \left[x(k) - x(k-1)\right]/\tau - Ax(k) - Bu(k) - r(k)$$

and S(k), r(k) contain the coefficient terms of the fault parameters and other non-linear terms respectively, and τ is the discretization time constant. Then the parameters p(k) can be estimated from the following L.S. equations:
$$p(k+1) = \left[I_s - U(k)V(k)S(k+1)\right]\left[p(k) + U(k)b(k+1)\right]$$

$$U(k) = (1/\lambda)\left[R^T(k)R(k)\right]^{-1}S^T(k+1)$$

$$V(k) = \left[I_n + S(k+1)U(k)\right]^{-1}$$

$$\left[R^T(k+1)R(k+1)\right]^{-1} = (1/\lambda)\left[R^T(k)R(k)\right]^{-1} - U(k)V(k)U^T(k)$$

The initial values for the algorithm are given by,

$$p(0) = \left[R^T(0)R(0)\right]^{-1}S^T(0)b(0);\qquad \left[R^T(0)R(0)\right]^{-1} = \left[S^T(0)S(0)\right]^{-1}$$
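For comparison, a minimal Python sketch of the standard recursive least squares update with forgetting factor is given below; it is written in the common gain/covariance form rather than the exact form quoted above, and all names are illustrative assumptions.

```python
import numpy as np

def rls_forgetting_step(S, b, p, P, lam=0.95):
    """One update of recursive least squares with forgetting factor lam
    for the regression S(k) p = b(k) + e(k).
    S: (m, n) regressor matrix, b: (m,) observation vector,
    p: (n,) current parameter estimate, P: (n, n) current covariance."""
    m = len(b)
    # gain K = P S^T (lam I + S P S^T)^{-1}
    K = P @ S.T @ np.linalg.inv(lam * np.eye(m) + S @ P @ S.T)
    p_new = p + K @ (b - S @ p)      # correct estimate with the residual
    P_new = (P - K @ S @ P) / lam    # covariance inflation (forgetting)
    return p_new, P_new
```

Decreasing lam makes the estimator track abrupt parameter changes faster, at the cost of a larger estimate variance, which is exactly the trade-off discussed for Figure 3.9 below.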


The results of the simulations are shown in Figures 3.8 and 3.9.
These results demonstrate that the least squares estimation scheme is valid even when two faults occur simultaneously. However, it can be seen in Figure 3.9 that for abrupt changes such as that applied to x_F, the estimate responds slowly and convergence to the new value takes nearly 60 minutes. The rate of convergence of the estimate could be increased by decreasing the weighting factor, but the variance of the faster response would be larger. The effect of making process measurements at discrete intervals rather than continuously is a slower response in the estimates. A larger sampling interval reduces the effect of measurement noise on the approximation of the derivative of the states because the approximations are averaged over a longer time.
Hence, the variance of the estimates does not necessarily increase with the slower response due to sampling. The speed of the response for the discrete case can also be increased by decreasing λ, but only at the expense of the variance at a constant sampling interval.
Figure 3.8. Estimate of UA for λ=0.95; (—) Estimate; (- - -) Measured value.



Figure 3.9. Estimate of x_F for λ=0.95; (—) Estimate; (- - -) Measured value.

(b) Kalman filter. By modeling the parameter evolution as a random walk,

$$dp(t)/dt = w_p(t)$$

where $w_p(t)$ is $N(0, Q_p(t))$, and augmenting this differential equation to the state equation, the following system is obtained:

$$\dot{x}_a = \begin{bmatrix}\dot{x}\\ \dot{p}\end{bmatrix} = \begin{bmatrix}f(x,u,p,t)\\ 0\end{bmatrix} + \begin{bmatrix}G(x,p,t) & 0\\ 0 & I_p\end{bmatrix}\begin{bmatrix}w(t)\\ w_p(t)\end{bmatrix} \tag{3.84}$$
To implement an extended Kalman filter for (3.83) using the representation (3.84), initial conditions must be supplied for the states, parameters, and the error covariance matrix of the augmented state vector. For the states and parameters the design or normal operating values can be used as the initial conditions. The initial error covariance matrix is assumed to be diagonal with large values of the elements to express uncertainty in the initial values of the states and parameters. In addition to the initial values, the noise covariance matrices R(t) and Q(t) must also be supplied. Normally, the measurement noise covariance matrix, R(t), is assumed to be diagonal. The variances of each measurement can be guessed or estimated from sample output values. The input noise covariance matrix, Q(t), is also assumed to be diagonal. Although the values of the elements in R(t) and Q(t) might be obtained from process measurements, the filter is usually "tuned" to the dynamics of the process so that the response for the parameter trajectories is reasonably fast and their covariance matrix elements are of reasonable size.
Figures 3.10a and 3.10b illustrate the trajectories of x_F and UA for the Kalman filter, in which confidence limits (P=0.95) have been put in place, based on the period of normal operation.
The simulation runs demonstrated that there was a need for some heuristics in the analysis of the estimates to avoid misdiagnosing nonexistent faults when more than one fault occurs at a time. Factors such as decision rules, confidence coefficients and so on were obtained based on the selected filter parameters and the dynamics of the process.
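A minimal sketch of one predict/update cycle of a discrete-time extended Kalman filter on the augmented state of (3.84) is given below; the Euler discretization, the Jacobian routine and all names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def ekf_step(xa, P, u, z, f, F_jac, H, Q, R, dt):
    """One EKF cycle on the augmented state xa = [states; parameters],
    where the parameters follow a random walk (included in f and F_jac).
    f: continuous dynamics xa_dot = f(xa, u); F_jac: its Jacobian;
    H: (linear) measurement matrix; Q, R: noise covariance matrices."""
    n = len(xa)
    # predict: Euler discretization of the augmented dynamics
    xa_pred = xa + dt * f(xa, u)
    F = np.eye(n) + dt * F_jac(xa, u)
    P_pred = F @ P @ F.T + Q
    # update with measurement z
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    xa_new = xa_pred + K @ (z - H @ xa_pred)
    P_new = (np.eye(n) - K @ H) @ P_pred
    return xa_new, P_new
```

The "tuning" discussed above corresponds to choosing the diagonal entries of Q (particularly the parameter-noise block Q_p) and R so that the parameter trajectories respond quickly without excessive variance.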

Figure 3.10. EKF estimates of UA and x_F with confidence intervals; (—) Estimate; (- -) True value; (- -) Confidence limits.

3.5.2 Gas turbine fault detection and diagnosis

Gas turbine performance degrades over time due to the influence of many effects including tip clearance changes in the rotating components, seal wear, blade fouling, blade erosion, blade warping, foreign object damage, actuator wear, blocked fuel nozzles and sensor problems. In some applications, such as in the commercial transport field, the availability of reliable cruise data facilitates the use of performance trending techniques for alerting maintenance personnel to emerging problems. However, the successful implementation of any trending technique to gas turbine performance data still depends very largely on the skill and experience of the operator, especially when trying to diagnose some faults to module or line replaceable unit level. This situation is further exacerbated in the military area because combat aircraft, in particular, seldom operate with their engines in a steady-state condition for extended periods. Thus, the selection of a suitable data capture window to provide maintenance personnel with reliable steady-state data is often difficult without resorting to dedicated tests, either on the ground or in-flight. In view of this, it would be convenient if operational transient engine data could be used for assessing engine condition and for diagnosing some of the more difficult engine faults.
Current generation military aircraft are often equipped with an Engine Monitoring System (EMS) which can be configured to capture selected engine data under certain conditions. These conditions include, during each take-off and in-flight, if one or more of the measured parameters exceed predetermined limit values. The take-off data, in particular, has the potential to provide a consistent data base for assessing engine condition, provided the analytical means are available for extracting the fault information. Because these data comprise engine accelerations from part-power positions, the current steady-state methods for assessing engine condition are not suitable. Methods have been developed for extracting fault information from gas turbine transient data (Baskiotis et al. (1979), Carlsson et al. (1988), Henry (1988), Smed et al. (1988)). However, these methods suffer in their ability to detect small changes which usually accompany the presence of degraded engine components.
A L.S. method of analysing transient engine data based on the stochastic embedding principle has been implemented by Merrington et al. (1991). This method, which has the potential to detect the presence of degraded engine components from the actual EMS take-off measurements, follows.
Exact models of aircraft engines are highly nonlinear (Merrill, 1984) and thus simplified linearized models are usually employed (Dehoff et al., 1978). For example, taking the engine fuel flow W_F as the input and the fan-spool speed N_L as the output, an appropriate linearized nominal model is given as follows:

$$\Delta N_L(t) = \frac{b_{1e}p + b_{0e}}{p^2 + f_{1e}p + f_{0e}}\,\Delta W_F(t) \tag{3.85}$$

where p denotes the differential operator.
Taking noise and linearization errors into consideration, the underlying system can be described by the following discretized model:

$$\Delta N_L(k) = G(q^{-1})\Delta W_F(k) + G_{NL}(q^{-1})\left[\Delta W_F(k)\right]^2 + v(k) \tag{3.86}$$

where,

$$G(q^{-1}) = \frac{B(q^{-1},\theta)}{A(q^{-1})} = \frac{b_1q^{-1} + b_2q^{-2}}{1 + a_1q^{-1} + a_2q^{-2}}$$

The denominator A(q⁻¹) is determined from a priori information about the system, e.g., approximate values of dominant poles, or by some prior estimation experiments.
Using this system description the system output has the form,

$$\Delta N_L(k) = B(q^{-1},\theta)\Delta W_F^f(k) + \eta(k) \tag{3.87}$$

where,

$$\Delta W_F^f(k) = \frac{1}{A(q^{-1})}\Delta W_F(k)$$

$$\eta(k) = G_{NL}(q^{-1})\left[\Delta W_F(k)\right]^2 + v(k)$$
Equation (3.87) can be put in the standard regression form of (3.61) if,

$$\varphi^T(k) = \left[\Delta W_F^f(k-1)\;\; \Delta W_F^f(k-2)\right]$$

$$\theta^T = \left[b_1\;\; b_2\right]$$

Two noise-free non-faulty data sets (CLF6 and CLF61) and a faulty data set (LTEF) with a -2% change in the low pressure turbine efficiency were chosen for the study (Figure 3.11). Note that LTEF has the same operating point as that of CLF6 but that CLF61 has a different operating point with a very similar output as LTEF.

Figure 3.11. Non-faulty data sets (— CLF6; - - CLF61) and faulty data set (LTEF) in aircraft engines.

Using the data sets and the theory of section 3.3.5, appropriate test variables for fault detection can be formulated. For example, equations (3.71), (3.72) may be used:

$$\eta = \left[\hat{\theta}_n - \hat{\theta}_f\right]^T C^{-1}\left[\hat{\theta}_n - \hat{\theta}_f\right] \tag{3.71}$$

(3.72)

The following constants were chosen: sampling period T_s=0.02, number of data points N=350, σ_v² = 0.15² (with a reference value of 100%), and the input ΔW_F was assumed to be corrupted by white noise with variance σ² = 0.003² (with a fuel range of 0 to 1). The fixed denominator was taken from prior experiments as a₁=-1.8238 and a₂=0.8294, and the values of β_n and σ_n² as β_n = 0.0837 and σ_n² = 0.0818.
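A brief Python sketch of how the filtered input and the test statistic of (3.71) can be computed is given below; the helper names are assumptions, and only the filtering step uses the fixed denominator quoted above.

```python
import numpy as np
from scipy.signal import lfilter

def filtered_input(dWF):
    """Filtered input dWF_f = (1/A(q^-1)) dWF, with the fixed
    denominator A(q^-1) = 1 - 1.8238 q^-1 + 0.8294 q^-2."""
    return lfilter([1.0], [1.0, -1.8238, 0.8294], dWF)

def eta_test(theta_n, theta_f, C):
    """Test statistic (3.71): quadratic distance between the nominal
    estimate theta_n and a possibly faulty estimate theta_f,
    weighted by the inverse of the covariance C."""
    d = np.asarray(theta_n) - np.asarray(theta_f)
    return float(d @ np.linalg.solve(C, d))
```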

Simulation results for the test statistic η are shown in Figure 3.12 and summarized in Table 3.3.

Note that 100 trials were conducted with different noise realizations. These results show that this fault detection method works very well even under the effect of linearization errors.

Figure 3.12. Simulation results (— NF1: no-fault; — NF2: no-fault; ⋯ F3: fault).

Table 3.3. Cases in aircraft engine fault detection simulation

Case no.   Experiment n   Experiment f   η
NF1        CLF6           CLF6           2.45±2.42
NF2        CLF6           CLF61          9.67±6.18
F3         CLF6           LTEF           1459.88±85.07

3.5.3 Fault detection for electromotor driven centrifugal pumps

The early detection of process faults is especially attractive for engines. In this example a centrifugal pump with a water circulation system, driven by a speed-controlled direct current motor, is considered (Figure 3.13, after Iserman, 1984). The goal is to detect changes (faults) in the d.c. motor, the pump and the circulation system based on theoretically derived process models and parameter estimation.
The dynamic models of the d.c. motor, the centrifugal pump and the pipe system are gained by stating the balance equations for energy and momentum and by using special physical relationships. In order not to obtain too many parameters, appropriate simplifications have to be made, such as lumping more than one process coefficient together, e.g. the friction coefficients of the motor c_FM1 and the pump c_FP1 and the torque coefficient g_ω of the pump.

Figure 3.13. Scheme of a speed controlled d.c. motor and centrifugal pump. Ṁ mass flow, ω angular velocity, T torque, U voltage, I current, R resistance, L inductance.

The resulting four basic equations will be used for parameter estimation in the following form:
(a) Armature circuit:

$$\frac{d\Delta I_1(t)}{dt} = a_{11}\Delta I_1(t) + a_{12}\Delta\omega(t) + c_1\Delta U_1(t) \tag{3.87}$$

(b) Mechanics of motor and pump:

$$\frac{d\Delta\omega(t)}{dt} = a_{21}\Delta I_1(t) + a_{22}\Delta\omega(t) + a_{23}\Delta\dot{M}(t) \tag{3.88}$$

(c) Pipe system:

$$\frac{d\Delta\dot{M}(t)}{dt} = a_{33}\Delta\dot{M}(t) + d_3\Delta Y(t) \tag{3.89}$$

(d) Pump (specific energy Y):

$$\Delta Y(t) = h_\omega\Delta\omega(t) + h_M\Delta\dot{M}(t) \tag{3.90}$$

The parameters are,

$$a_{11} = -\frac{R_1}{L_1},\quad c_1 = \frac{1}{L_1},\quad a_{12} = -\frac{\Psi}{L_1},\quad a_{21} = \frac{\Psi}{\Theta},\quad \Theta = \Theta_M + \Theta_P,\quad a_{22} = -\frac{c_{F1} + g_\omega}{\Theta} \tag{3.91}$$

with $a_{23}$, $a_{33}$ and $d_3$ defined analogously from the pump and pipe-system constants.
A state variable representation,

$$\dot{x}(t) = Ax(t) + bu(t)$$
$$y(t) = Cx(t)$$

can be given with the following definitions:

$$x(t) = \begin{bmatrix}\Delta I_1(t)\\ \Delta\omega(t)\\ \Delta\dot{M}(t)\end{bmatrix},\qquad A = \begin{bmatrix}a_{11} & a_{12} & 0\\ a_{21} & a_{22} & a_{23}\\ 0 & a_{32} & a_{33}\end{bmatrix}$$

$$y(t) = \begin{bmatrix}\Delta I_1(t)\\ \Delta\omega(t)\\ \Delta\dot{M}(t)\\ \Delta Y(t)\end{bmatrix},\qquad C = \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & h_\omega & h_M\end{bmatrix}$$

A block diagram of the modeled system is given in Figure 3.14. The parameters of (3.87)-(3.90) can be estimated by bringing them into the form of (3.9) and applying the least-squares method. The simple case of the d.c. motor and pump with closed valve and measured signals ΔU₁, ΔI₁ and Δω will be considered.
In this case Ṁ(t)=0, so that only (3.87) and (3.88) are to be used. Both equations are written according to (3.9),

$$y_i(t) = \psi_i^T(t)\theta_i,\quad i = 1, 2$$

where,

$$y_1(t) = d\Delta I_1(t)/dt;\qquad y_2(t) = d\Delta\omega(t)/dt$$

$$\psi_1^T(t) = \left[\Delta I_1(t)\;\; \Delta\omega(t)\;\; \Delta U_1(t)\right],\qquad \theta_1^T = \left[a_{11}\;\; a_{12}\;\; c_1\right]$$

$$\psi_2^T(t) = \left[\Delta I_1(t)\;\; \Delta\omega(t)\right],\qquad \theta_2^T = \left[a_{21}\;\; a_{22}\right]$$

Using (3.91), the following process coefficients can be calculated based on the parameter estimates $\hat{\theta}_1$ and $\hat{\theta}_2$:

$$R_1 = -\frac{a_{11}}{c_1},\quad L_1 = \frac{1}{c_1},\quad \Psi = -\frac{a_{12}}{c_1},\quad \Theta = \frac{\Psi}{a_{21}} = -\frac{a_{12}}{c_1 a_{21}}$$
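A small Python sketch of this back-calculation is given below; the function and variable names are illustrative assumptions.

```python
def pump_process_coefficients(theta1, theta2):
    """Process coefficients from the two estimated parameter vectors:
    theta1 = [a11, a12, c1]  (armature circuit),
    theta2 = [a21, a22]      (mechanics of motor and pump)."""
    a11, a12, c1 = theta1
    a21, a22 = theta2
    R1 = -a11 / c1        # armature resistance
    L1 = 1.0 / c1         # armature inductance
    Psi = -a12 / c1       # flux linkage
    Theta = Psi / a21     # lumped moment of inertia
    cF1 = -a22 * Theta    # lumped friction coefficient, cf. (3.92)
    return R1, L1, Psi, Theta, cF1
```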

Figure 3.14. Block diagram of the linearized d.c. motor-pump-pipe system (armature circuit - mechanics of d.c. motor and pump - pipe system).

$$c_{F1} = c_{FM1} + c_{FP1} = -a_{22}\Theta = \frac{a_{22}a_{12}}{c_1 a_{21}} \tag{3.92}$$


Hence, all process coefficients which describe the linearized dynamic behaviour can be calculated. However, the friction coefficients of the motor c_FM1 and the pump c_FP1 and the moments of inertia Θ_M and Θ_P are lumped together, so that only their sums can be obtained. If only the static behaviour, and not the dynamic behaviour, could be identified, L₁ and Θ could not be obtained. This shows that by identifying the dynamics, more parameters can be estimated and therefore more process coefficients can be monitored.
A disadvantage of the linearized dynamic relationships is that the coefficient c_FM0 for the adhesive friction does not appear. However, it can be shown that, with the assumption that the friction torque depends only linearly on the speed,

$$T_{FM}(t) = c_{FM0} + c_{FM1}\omega(t)$$
$$T_{FP}(t) = c_{FP0} + c_{FP1}\omega(t)$$

$$(\Theta_P + \Theta_M)\frac{d\omega(t)}{dt} = \Psi I_1(t) - c_{F0} - c_{F1}\omega(t) \tag{3.93}$$

$$c_{F0} = c_{FM0} + c_{FP0}$$
$$c_{F1} = c_{FM1} + c_{FP1}$$

the absolute values ω(t) and I₁(t), and not their deviations, can be used and the estimation of c_F0 also becomes possible (Geiger, 1982).
Experiments were made with a centrifugal pump driven by a speed controlled d.c. motor. The technical data are,
(a) D.C. motor:
maximum power P_max = 4 kW
maximum rotation speed N_max = 3000 rev/min
(b) Centrifugal pump, one stage:
maximum total head H_max = 39 m for N_max = 3000 rev/min

(c) Pipe system:
length = 10 m
diameter d_i = 50 mm
The d.c. motor is controlled by an a.c./d.c. converter with cascade control of the speed and the armature current as auxiliary control variable. The manipulated variable is the armature voltage U₁. For the experiments the reference value W(t) of the speed control has been changed stepwise with a magnitude of 2% of N_max, i.e. 60 rev/min every 60 s. The measured signals were sampled with sampling time T₀=2 ms over a period of 2 s, so that 1000 samples were obtained.
After the 2 s of measurements, the parameters were estimated off-line, using the recursive least-squares method with state variable filters for the determination of the time derivatives. As the noise is negligibly small, the parameter estimates can be assumed to be unbiased.
In order to obtain the adhesive friction coefficient, (3.93) has been used,

$$\Theta\frac{d\omega(t)}{dt} = \Psi I_1(t) - c_{F0} - c_{F1}\omega(t)$$

together with,

$$L_1\frac{dI_1(t)}{dt} = U_1(t) - R_1I_1(t) - \Psi\omega(t)$$

for the armature circuit. Therefore the deviations of the signals have to be replaced in (3.91) by their absolute values. The process coefficients are obtained by (3.92), with $c_{F0} = -a_{23}\Theta$ in addition.
In Figures 3.15-3.18 results of the parameter monitoring are presented. Figure 3.15 shows the step responses after a speed setpoint change. The resulting process coefficients after a start of the cold engine (Figure 3.16) indicate that the armature resistance increases during the first 10 min, the flux linkage decreases during 20 min and the friction torque coefficient decreases during the first hour. Hence, small changes of the process coefficients can be detected. Figures 3.17 and 3.18 show the reaction to artificial changes (faults). A significant change of the armature resistance estimate is detectable after a 7% change (Figure 3.17). The effect of tightening and loosening the screws of the pump packing box cap is clearly seen in Figure 3.18. More details are given in Geiger (1982). Results of more experiments including multiple hypothesis testing for the fault decision are described in Geiger (1984).

Figure 3.15. Step responses for a change of the speed setpoint. u₁=U₁/U₁₀, armature voltage, U₁₀=60 V; i₁=I₁/I₁₀, armature current, I₁₀=0.5 A; ω=ω/ω₁₀, angular velocity, ω₁₀=62.83 s⁻¹ (≈600 rev/min).

Figure 3.16. Process coefficient estimates after start of the cold engine. R₁ armature resistance, Ψ flux linkage, c_F0 friction coefficient.

Figure 3.17. Change of armature circuit resistance.

Figure 3.18. Change of pump packing box friction by tightening and loosening of the cap screws.

3.5.4 Fault detection in power substations

Stavrakakis and Dialynas (1991) have used recursive least squares estimation with forgetting factor and hypothesis testing techniques on the process parameter values, for improving the reliability performance of power substations. Following a positive fault decision, the substation is reconfigured according to a detailed fault tree. The fault detection methodology adopted was applied to the following power substation components:
A. Power transformers, modeled by their one-phase equivalent circuit, described by,

$$V_i = R_1I_i + L_{1e}\frac{dI_i}{dt} - M\frac{dI_o}{dt} \tag{3.94}$$

$$V_o = M\frac{dI_i}{dt} - R_2I_o - L_{2e}\frac{dI_o}{dt} \tag{3.95}$$

where,
V_i, V_o : actual input (primary) and output (secondary) voltages,
I_i, I_o : actual input (primary) and output (secondary) currents,
R₁, R₂ : primary and secondary winding resistances,
L₁, L₂ : primary and secondary winding self-inductances,
L_m : mutual inductance between windings on the same core, and,

$$M = \frac{L_m}{a},\qquad L_{1e} = L_1 + L_m,\qquad L_{2e} = L_2 + \frac{L_m}{a^2}$$
The faults that most frequently arise in practice in power transformers were classified as follows:
1. Failures in the magnetic circuits (cores, yokes and clamping structure).
2. Failures in the windings (coils and minor insulation and terminal gear).
3. Failures in the dielectric circuit (oil and major insulation).
4. Structural failures.
By monitoring the estimated values of R₁, R₂, L₁ₑ, L₂ₑ, M and performing a hypothesis test using the likelihood ratio test, a change in these parameters can be detected, leading to a decision regarding one of the failures 1-4, described above.
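As a sketch of the kind of hypothesis test involved, the following Python fragment implements a log-likelihood ratio test for a mean shift in a window of parameter estimates; the Gaussian mean-shift formulation and all names are illustrative assumptions, not the exact test of the cited paper.

```python
import numpy as np

def lr_parameter_test(est_window, mu0, sigma0, threshold):
    """Log-likelihood ratio test on a window of parameter estimates:
    H0: estimates ~ N(mu0, sigma0^2)  (nominal parameter value)
    H1: estimates ~ N(mu1, sigma0^2), mu1 being the window mean.
    Returns True if a parameter change (fault) is declared."""
    x = np.asarray(est_window, dtype=float)
    mu1 = x.mean()
    # log-likelihood ratio for a mean shift at known variance
    llr = (((x - mu0) ** 2 - (x - mu1) ** 2).sum()
           / (2.0 * sigma0 ** 2))
    return llr > threshold
```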
B. Substation lines and cables, modeled by their equivalent one-phase circuit which neglects entirely the susceptance and leakance, and is described by the simple first order differential equation,

$$V_i = V_o + RI_i + L\frac{dI_i}{dt} \tag{3.96}$$

The most important failures occurring on the lines or cables of power substations are the short circuits, which are generally due to insulation breakdown. By applying the previously described method on the parameters R and L of this model, short circuits can be detected and localised early, in this way avoiding further degradation of the system.
C. Synchronous generators. The model used corresponds to an unsaturated cylindrical-rotor machine under balanced polyphase conditions, and is described by,

$$E_f = V_o + r_aI_o + L_s\frac{dI_o}{dt} \tag{3.97}$$

where,
V_o : actual value of terminal voltage,
E_f : actual value of the excitation voltage,
L_s : synchronous inductance (constant at constant frequency),
r_a : armature resistance.
Here, deviations of L_s, r_a from their nominal values will indicate a voltage failure of the a.c. synchronous generator, which is a result of an open in the field circuit, an open in the field rheostat or a failure of the exciter generator. The loss of field excitation to a generator operating in parallel with others causes it to lose load and overspeed. High armature current caused by the high voltage differential between the armature and the bus, and the high currents induced in the field iron and field windings by the armature current, will cause rapid heating of the apparatus. This is avoided, in the case of failure, by the fast detection which the proposed method provides.
Substation configuration after the diagnosis of a failure condition. After the diagnosis of a failure condition on a substation component has been deduced by the previously described methodology, the circuit breakers which surround the component are instructed to open. System restoration follows with automatic or manual switching sequences which aim at minimising the effects of the outage by returning to service the healthy components and circuits as quickly as possible. The effect that the outage will have on the continuity of supply to each circuit load-point can then be assessed. Two types of load point failures can generally be recognised: (1) total loss of continuity, which causes no load to be supplied; (2) partial loss of continuity, which causes part of the load to be supplied. Load-point supply restoration can be achieved by closing components (breakers and/or isolators) which are in an open condition to provide alternative routes for power supply. These routes can be deduced from the list of available normally open paths leading to the load-point of interest from all the sources. An approach must therefore be developed to identify the suitable substation configurations and evaluate the supplied load. In most cases the outages being considered can be assumed to be of first order, but a more complete assessment of substation operation would require the simulation of up to third order outages so that common mode faults can also be considered. The outages which may occur in power substations can be generally divided into the following six categories according to the type of the available restoration procedures:

Category A: Outage on components belonging to the incoming or outgoing circuits. The circuit node cannot be reconnected to the substation because no alternative route is available. However, if the fault has occurred on an outgoing circuit component, alternative restoration procedures for other load-points may exist.
Category B: Outage on the isolators belonging to the interlocking scheme of the substation. More than one alternative restoration procedure may exist and can be deduced from the list of the isolators (branches) belonging to the same interlocking sequence with those taken out. In the case that the outage being considered is of second or third order, the possible alternative restoration procedures are the respective second and third order combinations of all the isolators (branches) belonging to the corresponding interlocking sequences. If an outage occurs on isolators 16 and 24 in the substation of Figure 3.19, four possible alternative procedures are available by operating on the pairs of isolators (17,25), (17,26), (18,25), (18,26), as can be seen from Table 3.4. If a busbar configuration contains two or more bus-tie breaker schemes, one of the schemes is normally in operation while the breakers of the other schemes remain open. In the event of an outage on a component of the scheme being closed, one of the other schemes may become effective to provide additional routes of supply.
Table 3.4. Interlocking scheme of substation of Figure 3.19

No. Isolator sequence Branch sequence


1 7,8 2,3
2 9,10 4,5
3 33,34 15,16
4 35,36 17,18
5 62,63 28,29
6 64,65 30,31
7 87,88 41,42
8 89,90 43,44
9 16,17,18 7,8,9
10 24,25,26 11,12,13
11 42,43,44 20,21,22
12 50,51,52 24,25,26
13 70,71,72 33,34,35
14 78,79,80 37,38,39
15 95,96,97 46,47,48
16 103,104,105 50,51,52

Category C: Outage on the busbar section(s). All the incoming or outgoing circuits connected to these busbar sections are disconnected and each circuit can be transferred to any of the available busbars by closing the appropriate breakers and isolators.
Category D: Outage on the components (breaker, isolator) of the busbar sectionalising branches. After isolation of the outage, the respective busbars are divided into two or more parts not directly connected to each other. If this substation configuration is not operationally accepted, all the circuits connected to the affected busbar sections can be transferred to other busbars with the same restoration procedure followed after the occurrence of an outage on busbar sections (Category C).
Category E: Outage on components belonging to a branch containing a transformer. Since the power supply from the superior to the inferior voltage level is decreased, alternative restoration procedures may exist and can be deduced from the list of the transformer branches being open.
Category F: Outage on the remaining substation components. Alternative restoration procedures may exist.
The basic steps of the developed algorithm for deducing the suitable substation configuration after the diagnosis of a substation abnormality are the following:
(i) Consider the detected faults and simulate the corresponding outage.
(ii) Depending on the outage category:
(a) For outage category B, detect the isolators (and their corresponding branches) belonging to the same interlocking sequence with those taken out. Deduce their second and third order combinations if the outage contains two or three isolators of such type respectively.
(b) For outage category E, detect the substation open branches containing transformers.
(c) For outage categories F and A on outgoing circuit components, detect the substation open branches not considered in steps (a) and (b) and their second and third order combinations.
(d) For outage categories C and D, detect the breakers and isolators which may close to transfer the disconnected circuits to healthy busbars. Deduce all the alternative restoration procedures by considering the substation interlocking scheme.
(iii) Deduce the list of possible alternative restoration procedures by combining the relevant switching actions obtained in step (ii).
(iv) For each circuit load-point to be considered:
(a) Read the paths from the data base.
(b) Identify the closed and open paths.
(c) For each open path deduce the order of its discontinuity by counting the contained open components.
(d) For either total loss of continuity (no path in operation) or partial loss of continuity (one or more paths in operation, the supplied load less than required),
consider all the possible alternative restoration procedures and for each of them:
• Detect the paths which can be closed by considering only the paths with order of discontinuity less than or equal to the order of the procedure.
• If one or more paths can be closed, evaluate the load supplied to the load-point being considered by performing a load-flow on the modified substation configuration. This configuration contains a limited number of nodes since all the substation busbars connected to each other by branches having zero impedance are linked together.
In order to illustrate the increased and more meaningful information for substation operation that can be achieved using the described computational techniques, a typical 400/150 kV high voltage substation was analysed. The substation employs the triple busbar scheme for all system busbars and its detailed one line diagram is shown in Figure 3.19. It consists of 34 nodes, 58 branches and 109 components, while its interlocking scheme is shown in Table 3.4. An operational substation configuration was studied by assuming the breakers and isolators status shown in Figure 3.19. Source points are assumed to be the circuit busbar L8 and the generator busbar L17, while load-points are the nodes L25 and L33. The minimal paths leading to each load-point from all sources were deduced and retained in compact form in a data base. Finally, parameter estimation methods and hypothesis testing on the process parameters were used to deduce the alternative restoration procedures which are available after the diagnosis of faults on the substation components. The category of each fault and the components to close are shown in Table 3.5. For category E and F faults, it has also been assumed that breakers 40 and 93 are open and 32 and 86 are closed.
Table 3.5. Substation configuration after restoration of supply

Fault on component   Fault category   Components to close   Additional routes of supply to L33
101                  A                -                     no
104                  B                105                   no
                                      103                   yes
62                   B                63                    yes
                                      86                    yes
64                   B                65                    no
                                      85                    yes
24                   B                25                    yes
                                      26                    no
82(L28)              C                103                   yes
                                      105,65                yes
53                   D                26,8,10               yes
67                   E                40,93                 yes
66                   E                40,93                 yes
14                   F                40,93                 yes

Figure 3.19. Detailed one line diagram of a typical high voltage substation. L5, node; ×, circuit breaker (closed); ⊗, circuit breaker (open); /, isolator (closed); ○, isolator (open); ⊙, transformer; —, line; Ⓖ, generator.

3.5.5 Fault diagnosis in robotic systems

Stavrakakis et al. (1990) describe a fast fault detection system for robotic D.C. motor drives. The detection system is implemented on a commercially available parallel processing machine.
Using the global dynamic model of a 3 degrees of freedom robotic manipulator derived by Tzafestas and Stavrakakis (1986), the state-space representation for the actuator of the i'th link of the robot can be written as,

$$\frac{di_i(t)}{dt} = -\theta_1 i_i(t) - \theta_2\omega_i(t) + \theta_3 V_i(t)$$
$$\frac{d\omega_i(t)}{dt} = \theta_4 i_i(t) - \theta_5\omega_i(t) - \theta_6 T_{Li}(t) \tag{3.98}$$

where,
V_i : applied armature voltage,
T_Li : disturbance torque referred to the link side of the drive shaft,
i_i : armature current,
ω_i : shaft angular velocity referred to link side of the drive shaft,
N_i : gear ratio,
J_mi : moment of inertia of drive rotor,
K_mi : electromechanical constant of the motor (the back-emf constant is equal to the torque constant),
R_i : armature resistance,
L_i : armature inductance,
β_i : viscous friction coefficient.
The subscript i denotes the ith joint of the robotic manipulator. Define,

$$\theta_1 = \frac{R_i}{L_i},\quad \theta_2 = \frac{K_{mi}N_i}{L_i},\quad \theta_3 = \frac{1}{L_i},\quad \theta_4 = \frac{K_{mi}}{J_{mi}N_i},\quad \theta_5 = \frac{\beta_i}{J_{mi}},\quad \theta_6 = \frac{1}{J_{mi}N_i^2} \tag{3.99}$$

i.e.

$$\theta^T = \left[\theta_1\;\theta_2\;\theta_3\;\theta_4\;\theta_5\;\theta_6\right]\in R^6$$
The following variables are measured for each motor:
• armature current,
• angular velocity,
• armature voltage,
• shaft torque.
The former two are the system outputs, whereas the latter two are the system inputs. Input and output signal measurements are available at discrete times t = kT₀, k = 0, 1, ..., N, ..., where T₀ is the sampling time, and are denoted i_i(k), ω_i(k), V_i(k), T_Li(k). The following observation equations are therefore obtained:

$$y_1^{(1)}(k) = \psi_1^T(k)\theta_a + e_1(k)$$

$$y_2^{(1)}(k) = \psi_2^T(k)\theta_b + e_2(k)$$

where,

$$\psi_1^T(k) = \left[-y^T(k)\;\; u_1(k)\right]\in R^3$$

$$\psi_2^T(k) = \left[-y^T(k)\;\; u_2(k)\right]\in R^3$$
The fault detection algorithm for this case consists of the following steps (tasks) carried out at every sampling instant k:
Measurements: Measure i_i(k), ω_i(k), V_i(k), T_Li(k) and compute the derivatives i_i^{(1)}(k) and ω_i^{(1)}(k) by a third order backward formula:

$$i_i^{(1)}(k) = (1/2h)\left\{3i_i(k) - 4i_i(k-1) + i_i(k-2)\right\}$$

$$\omega_i^{(1)}(k) = (1/2h)\left\{3\omega_i(k) - 4\omega_i(k-1) + \omega_i(k-2)\right\}$$

where h = T₀ is the sampling interval.
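A one-line Python helper for this three-point backward difference might look as follows (names are illustrative):

```python
def backward_derivative(x_k, x_km1, x_km2, h):
    """Three-point backward difference of the Measurements task:
    x'(k) ~ (3x(k) - 4x(k-1) + x(k-2)) / (2h)."""
    return (3.0 * x_k - 4.0 * x_km1 + x_km2) / (2.0 * h)
```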


Task 1: Perform one iteration of the recursive least squares (RLS) parameter estimation algorithm for parameters,

$$\theta_a^T = \left[\theta_1\;\theta_2\;\theta_3\right]$$

Task 2: Perform one iteration of the parameter estimation algorithm for parameters,

$$\theta_b^T = \left[\theta_4\;\theta_5\;\theta_6\right]$$
Task 3(a): Calculate the physical parameters p_i(k), i = 1, 2, 3 from the previously computed estimates θ̂_a and θ̂_b using,

$$p_1(k) = R_i = \frac{\hat{\theta}_1(k)}{\hat{\theta}_3(k)},\quad p_2(k) = L_i = \frac{1}{\hat{\theta}_3(k)},\quad p_3(k) = N_iK_{mi} = \frac{\hat{\theta}_2(k)}{\hat{\theta}_3(k)} \tag{3.100}$$

The case of a fault occurrence in the gearbox is considered as an event with probability zero.
Task 3(b): Redefine the data window by accepting the new estimates p̂_i(k), i = 1, 2, 3, dropping the oldest estimates p̂_i(k - N_w - 1) and recalculating the real time parameter mean and variance estimates (i.e. the parameter statistics are estimated over the N_w+1 most recent parameter estimates). The recursive relations described in Section 3.4 are used.

Task 3(c): Compute the likelihood ratio for the hypothesis detection problem.
Task 3(d): Decide on whether a fault condition exists. The decision is taken by comparing the likelihood ratio obtained in Task 3(c) against a predefined threshold. To avoid false alarms, the fault condition is signalled if the threshold is exceeded in M consecutive instants. The optimal threshold value and M are best chosen by trial and error using simulation.
Task 4: Perform Tasks 3(a) to 3(d) for parameters p₄(k), p₅(k), using,

$$p_4(k) = N_i^2J_{mi} = \frac{\hat{\theta}_2(k)}{\hat{\theta}_3(k)\hat{\theta}_4(k)},\qquad p_5(k) = N_i^2\beta_i = \frac{\hat{\theta}_2(k)\hat{\theta}_5(k)}{\hat{\theta}_3(k)\hat{\theta}_4(k)}$$

The above procedure assumes that the algorithm is run initially on a fault free d.c. motor. From this run the non-error statistics are obtained and are used subsequently in Tasks 3(b), 3(c), 4(b) and 4(c).
The effectiveness of the method was verified using simulated data. For this purpose the d.c. motor robotic actuator parameters were chosen as,
R = 1.04 Ω, L = 0.00089 H, K_m = 0.0224 V·sec/rad,
J_m = 0.00005 kgm², β = 0.005 kgm²/sec, N = 64.
A 2 kHz sampling frequency is considered. The non-error statistics are calculated using N_s = 300 samples, whereas the detection window was N_w = 50. The first parameter estimate to be used by the detection procedure was taken at time k = 70, giving a large initial sample. The likelihood ratio fault detection threshold value is 11.2 and M = 10. From sample time k = 1 to 130, the normally operating d.c. drive was simulated. A simulated fault occurred at k = 131, indicated by a 4.8% change in the armature resistance R_i (R_if = 1.09 Ω). A recursive least squares (RLS) estimator with a forgetting factor of λ = 0.95 for estimating θ_a, and λ = 0.99 for θ_b, is used. All estimates converge quickly to their respective true values. The exact estimated values are shown in Table 3.6.
A major factor for the success of the algorithm is the 2 kHz sampling rate. This means that the algorithm must be implemented on a computer capable of performing all the above calculations in 0.5 ms. The above procedure, however, is suitable for implementation on commercially available parallel processing machines, e.g. the INMOS transputer system. This algorithm was implemented in a system employing four processors operating as a two-stage pipeline as shown in Figure 3.20. At the input a measurement unit M feeds the first two processors. At the output, a fault decision unit operates as a separate unit, having however a light computational load, and it is therefore a low cost processor system. The numbers shown in Figure 3.20 correspond to the tasks performed by each machine according to the task partition described earlier. This implementation forms a 2-stage pipeline where its first stage consists of machines 1 and 2 and its second of machines 3 and 4. The FD machine, which is underutilised by the algorithm, leaves power

for suitable presentation of the results. The computational complexity (i.e. the multiplications and divisions per recursion, MADPR) is 30 for each estimator and 60 for the detection procedure.

Table 3.6. True and estimated values for test run

                              R_i      L_i        K_mi·N_i   J_mi·N_i²   β_i·N_i²
True value                    1.09     0.000890   1.4336     0.2048      20.48
Estimated values at k=300     1.10     0.000896   1.4476     0.2038      20.82

Figure 3.20. Four processor real-time computer implementation of DC-drive fault detection algorithm.
3.6 Additional references

The field of fault detection based on parameter estimation techniques is vast. The preceding sections present only a small sample of what has been developed. The interested engineer can look at the relevant references for more information. Some additional work follows in summarised form.
The team around Iserman has published several reports of application of parameter estimation fault detection methods to industrial processes. Reiß (1991) developed models for drilling processes and applied the LS algorithm to the detection of tool wear in two machining centers. Wanke and Reiß (1991) and Reiß et al. (1990) applied similar techniques to milling machine drives. Janik and Fuchs (1991) used a singular value decomposition technique to enhance LS estimation performance in order to detect tool wear and grinding chatter of grinding processes. Neumann (1991) applied a two-step identification algorithm for the estimation of a parametric signal model with ARMAX structure. Signal spectra are then used for fault detection of machine tools. Freyermuth and Iserman (1991) have combined parameter estimation techniques with statistical feature classification methods. This idea was tested on detecting malfunctions of sensors, actuators and gears in industrial robots. Iserman (1991) and Iserman et al. (1990) proposed a general hybrid framework for machine fault detection using parameter estimation techniques with knowledge processing. Finally, Cho et al. (1992) studied the detection of broken rotor bars in induction motors. This was done by estimating the rotor resistance from measurements of stator voltage, stator current, stator excitation frequency and rotor velocity.

Appendix 3.A

Using the definitions (3.33), it follows,

$$\Phi_{k+1} = \begin{bmatrix}\varphi^T(k-n_w+2)\\ \vdots\\ \varphi^T(k+1)\end{bmatrix} = \begin{bmatrix}\Phi(k,k-n_w+2)\\ \varphi^T(k+1)\end{bmatrix}$$

The iteration for $\Phi_k^T\Phi_k$ is considered first:

$$\Phi_{k+1}^T\Phi_{k+1} = \left[\Phi^T(k,k-n_w+2)\;\big|\;\varphi(k+1)\right]\begin{bmatrix}\Phi(k,k-n_w+2)\\ \varphi^T(k+1)\end{bmatrix}$$
$$= \varphi(k+1)\varphi^T(k+1) + \Phi^T(k,k-n_w+2)\Phi(k,k-n_w+2)$$
$$= \Phi_k^T\Phi_k + \varphi(k+1)\varphi^T(k+1) - \varphi(k-n_w+1)\varphi^T(k-n_w+1)$$
$$= \Phi_k^T\Phi_k + \Gamma(k+1)$$

where,

$$\Gamma(k+1) = \varphi(k+1)\varphi^T(k+1) - \varphi(k-n_w+1)\varphi^T(k-n_w+1)$$

Secondly, the iteration on $\Phi_k^TY_k$ is considered. Define,

$$Y_{k+1} = \begin{bmatrix}y(k-n_w+2)\\ \vdots\\ y(k+1)\end{bmatrix} = \begin{bmatrix}Y(k,k-n_w+2)\\ y(k+1)\end{bmatrix}$$

Hence,

$$\Phi_{k+1}^TY_{k+1} = \left[\Phi^T(k,k-n_w+2)\;\big|\;\varphi(k+1)\right]\begin{bmatrix}Y(k,k-n_w+2)\\ y(k+1)\end{bmatrix}$$
$$= \varphi(k+1)y(k+1) + \Phi^T(k,k-n_w+2)Y(k,k-n_w+2)$$
$$= \Phi_k^TY_k + \varphi(k+1)y(k+1) - \varphi(k-n_w+1)y(k-n_w+1)$$
$$= \Phi_k^TY_k + \delta(k+1)$$

where,

$$\delta(k+1) = \varphi(k+1)y(k+1) - \varphi(k-n_w+1)y(k-n_w+1)$$

Now, defining,

$$P(k+1) = \left(\Phi_{k+1}^T\Phi_{k+1}\right)^{-1}$$

which is the covariance of the estimate $\hat{\theta}(k+1)$, yields,

$$\hat{\theta}(k+1) = P(k+1)\left[\Phi_k^TY_k + \delta(k+1)\right]$$
$$= P(k+1)\left[P^{-1}(k)\hat{\theta}(k) + \delta(k+1)\right]$$
$$= P(k+1)\left[\left(P^{-1}(k+1) - \Gamma(k+1)\right)\hat{\theta}(k) + \delta(k+1)\right]$$
$$= \hat{\theta}(k) - P(k+1)\left[\Gamma(k+1)\hat{\theta}(k) - \delta(k+1)\right]$$

and,

$$P^{-1}(k+1) = P^{-1}(k) + \Gamma(k+1)$$
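The recursion derived above translates directly into a short update routine. The following Python sketch is a minimal rendering of it (names are assumptions), adding the newest sample and dropping the oldest one at each step.

```python
import numpy as np

def sliding_window_ls_update(theta, Pinv, phi_new, y_new, phi_old, y_old):
    """Moving-window LS update of Appendix 3.A.
    theta: current estimate; Pinv: information matrix Phi_k^T Phi_k;
    (phi_new, y_new): entering sample; (phi_old, y_old): leaving sample."""
    Gamma = np.outer(phi_new, phi_new) - np.outer(phi_old, phi_old)
    delta = phi_new * y_new - phi_old * y_old
    Pinv_new = Pinv + Gamma                 # P^{-1}(k+1) = P^{-1}(k) + Gamma
    P_new = np.linalg.inv(Pinv_new)
    theta_new = theta - P_new @ (Gamma @ theta - delta)
    return theta_new, Pinv_new
```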

Appendix 3.B

Modified Gram-Schmidt orthogonalization algorithm.

The triangularization problem can be stated in the following general form:
Given:

$$Ax = b$$

where $A \in R^{(n+r)\times n}$, $x \in R^n$, $b \in R^{n+r}$, find an orthogonal transformation T such that

$$TA = \begin{bmatrix}W\\ 0\end{bmatrix}$$

where W is upper triangular (or equivalently, find W and b' = Tb directly).
The following algorithm is a numerically improved adaptation of the classical Gram-Schmidt orthogonalization procedure. When computations are made exactly (no round-off) the result is equivalent to the classical Gram-Schmidt result. However, when round-off errors occur, Björck (1967) has shown that the MGS procedure is much more accurate. The algorithm can be derived from the classical Gram-Schmidt orthogonalizing procedure, and is essentially the classical Gram-Schmidt procedure in reverse order (Kaminski, 1971). The MGS algorithm is stated in a form which computes W and b' directly.
MGS Algorithm: For k = 1, ..., n compute,

$$\sigma_k = \sqrt{A_k^{(k)T}A_k^{(k)}}$$

$$W_{kj} = \begin{cases}0, & j = 1, \ldots, k-1\\ \sigma_k, & j = k\\ \dfrac{1}{\sigma_k}A_k^{(k)T}A_j^{(k)}, & j = k+1, \ldots, n\end{cases}$$

$$b_k' = \frac{1}{\sigma_k}A_k^{(k)T}b^{(k)}$$

$$A_j^{(k+1)} = A_j^{(k)} - \frac{W_{kj}}{\sigma_k}A_k^{(k)},\quad j = k+1, \ldots, n$$

$$b^{(k+1)} = b^{(k)} - \frac{b_k'}{\sigma_k}A_k^{(k)}$$

(here a single suffix denotes a column of a matrix and a double suffix denotes an element of a matrix). If σ_k = 0 at any stage in the algorithm, then the rank of A is less than n.
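A direct Python rendering of the MGS algorithm above is sketched below; x can then be recovered by back-substitution from W x = b'.

```python
import numpy as np

def mgs_triangularize(A, b):
    """Modified Gram-Schmidt triangularization of Ax = b as stated in
    Appendix 3.B: returns the upper triangular W and transformed b'."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    m, n = A.shape
    W = np.zeros((n, n))
    b_prime = np.zeros(n)
    for k in range(n):
        sigma = np.sqrt(A[:, k] @ A[:, k])
        if sigma == 0.0:
            raise ValueError("rank(A) < n")   # rank deficiency detected
        W[k, k] = sigma
        b_prime[k] = (A[:, k] @ b) / sigma
        for j in range(k + 1, n):
            W[k, j] = (A[:, k] @ A[:, j]) / sigma
            A[:, j] -= (W[k, j] / sigma) * A[:, k]
        b -= (b_prime[k] / sigma) * A[:, k]
    return W, b_prime

# usage: W, bp = mgs_triangularize(A, b); x = np.linalg.solve(W, bp)
```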

References

Aström K.J. and P. Eykhoff (1971). Special Issue, Automatica, 7, 123-162.
Aström K.J. and B. Wittenmark (1971). Problems of identification and control. Journal of Mathematical Analysis and Applications, 34, 50-113.
Baskiotis C., Raymond J. and A. Rault (1979). Parameter identification and discriminant analysis of jet engine mechanical state diagnosis. Proceedings, IEEE Conference on Decision and Control, Fort Lauderdale.
Björck A. (1967). Solving linear least square problems by Gram-Schmidt orthogonalization. BIT, 7, 1-21.
Bierman G.J. (1977). Factorization methods for discrete sequential estimation. Academic Press, N.Y.
Carayannis G., Manolakis D. and N. Kalouptsidis (1983). A fast sequential algorithm for least squares filtering and prediction. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-31, 1394-1402.
Carayannis G., Manolakis D. and N. Kalouptsidis (1986). A unified view of parametric processing algorithms for prewindowed signals. Signal Processing, 10, 335-368.
Carlsson B., Salgado M. and G.C. Goodwin (1988). A new method for fault detection and diagnosis. Technical Report EE8842, Dept. of Electrical Eng. and Computer Science, University of Newcastle, Australia.

Cho K.R., Lang J.H. and S.D. Umans (1992). Detection of broken rotor bars in induction motors using state and parameter estimation. IEEE Transactions on Industry Applications, 28, 3, 702-709.
Cordero A.O. and D.Q. Mayne (1981). Deterministic convergence of a self-tuning regulator with variable forgetting factor. Proceedings IEE, Part-D, 128, 1, 19-23.
Dehoff R.L., Hall W.E. Jr., Adams R.J. and N.K. Gupta (1977). F100 multivariable control synthesis program. AFAPL-TR-77-35, Vol. I and II.
Dehoff R.L. and W.E. Hall Jr. (1978). Models for jet engine systems. Part II: state space techniques and modeling for control. Control and Dynamic Systems, 14, 259-299.
Dalla Molle D.T. (1985). Fault detection via parameter estimation in a single effect evaporator. MS Thesis, University of Texas, Austin.
Dalla Molle D.T. and M.D. Himmelblau (1987). Fault detection in an evaporator via parameter estimation in real time. Fault Detection and Reliability: Knowledge-based and other approaches, Pergamon Press, 131-138.
Favier G., Rougerie C., Bariani J.P., de Amaral W., Gimena L. and L.V.R. de Amanda (1988). A comparison of fault detection methods and adaptive identification algorithms. Proceedings, IFAC Identification and System Parameter Estimation, Beijing, PRC, 535-542.
Fortescue T.R., Kershenbaum L.S. and B.E. Ydstie (1981). Implementation of self-tuning regulators with variable forgetting factors. Automatica, 17, 6, 831-835.
Freyermuth B. (1991). An approach to model based fault diagnosis of industrial robots. Proceedings, IEEE International Conference on Robotics and Automation, April 7-12, 1991, Sacramento, USA.
Freyermuth B. and R. Iserman (1991). Model based incipient fault diagnosis of industrial robots via parameter estimation and feature classification. Proceedings, European Control Conference ECC '91, 2-5 July 1991, Grenoble, France.
Gantmacher F.R. (1977). The theory of matrices. Chelsea Publishing Company.
Geiger G. (1982). Monitoring of an electrical driven pump using continuous-time parameter estimation methods. Proceedings, 6th IFAC Symposium on Identification and Parameter Estimation, Washington.
Geiger G. (1984). Fault identification of a motor-pump system using parameter estimation and pattern classification. Proceedings, 9th IFAC Congress, Budapest.
Geiger G. (1986). Fault identification using a discrete square root method. International Journal of Modeling and Simulation, 6, 1, 26-31.
Goodwin G.C. and M.E. Salgado (1989). Quantification of uncertainty in estimation using an embedding principle. International Journal of Adaptive Control and Signal Processing, 8, 232-345.
Hägglund T. (1984). Adaptive control of systems subject to large parameter changes. Proceedings, IFAC 9th Triennial World Congress, Budapest, Hungary, 993-998.
Henry J.R. (1988). CF-18/F404 transient performance trending. AGARD, Paper No. 448, Quebec City.
Iserman R. (1984). Process fault detection based on modelling and estimation methods - A survey. Automatica, 20, 387-404.
Iserman R. (1987). Experiences with process fault detection methods via parameter estimation. In System Fault Diagnostics and Related Knowledge-Based Approaches, S. Tzafestas et al. (eds.), D. Reidel.
Iserman R. (1991). Fault diagnosis of machines via parameter estimation and knowledge processing. Proceedings, IFAC/IMACS Symposium "SafeProcess '91", 10-13 September 1991, Baden-Baden, Germany.
Iserman R., Appel W., Freyermuth B., Fuchs A., Janik W., Neumann D., Reiss Th. and P. Wanke (1990). Model based fault diagnosis and supervision of machines and drives. Proceedings, IFAC 11th Triennial World Congress, Tallinn, Estonia.
Janik W. and A. Fuchs (1991). Process- and signal-model based fault detection of the grinding process. Proceedings, IFAC/IMACS Symposium "SafeProcess '91", 10-13 September 1991, Baden-Baden, Germany.
Kaminski P.G. (1971). Square root filtering smoothing for discrete processes. PhD Thesis, Dept. of Aeronautics and Astronautics, Stanford University.
Kalouptsidis N. (1987). Efficient transversal and lattice algorithms for linear phase multichannel filters. IEEE Transactions on Circuits and Systems, CAS-37, 805-813.
Kalouptsidis N., Carayannis G. and D. Manolakis (1984). A fast covariance type algorithm for sequential least squares filtering and prediction. IEEE Transactions on Automatic Control, AC-29, 8, 752-755.
Kalouptsidis N., Manolakis D. and G. Carayannis (1983). A family of computationally efficient algorithms for multichannel signal processing. Signal Processing, 5, 1, 5-19.
Kalouptsidis N. and S. Theodoridis (1987). Parallel implementation of efficient LS algorithms for filtering and prediction. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-35, 11, 1565-1569.
Karaboyas S. and N. Kalouptsidis (1991). Efficient adaptive algorithms for ARX identification. IEEE Transactions on Acoustics, Speech and Signal Processing.
Kumamaru K., Söderström T., Sagara S. and K. Morita (1988). On-line fault detection in adaptive control systems by using Kullback discrimination index. Proceedings, IFAC Identification and System Parameter Estimation, 1135-1140.

Kwon O.-K. and G.C. Goodwin (1990). A fault detection method for uncertain systems with unmodeled dynamics, linearization errors and noisy inputs. Proceedings, 11th IFAC Triennial World Congress, Tallinn, Estonia, 367-372.
Liu J.S.H. (1977). Detection, isolation and identification techniques for noisy degradation in linear, discrete-time systems. Proceedings, 1977 CDC, 1132-1139.
Ljung L. (1987). System Identification: Theory for the User. Prentice Hall, Inc., Englewood Cliffs, NJ.
Ljung L., Morf M. and D. Falconer (1978). Fast calculation of gain matrices for recursive estimation schemes. International Journal of Control, 27, 1-19.
Maguire L.P. and G.W. Irwin (1991). Transputer implementation of Kalman filters. IEE Proceedings-D, 138, 4, 355-362.
Manolakis D., Carayannis G. and N. Kalouptsidis (1980). Fast inversion of vector generated matrices for signal processing. Signal Processing: Theories and Applications, North-Holland, 525-532.
Merrill W. (1984). Identification of multivariable high-performance turbofan engine dynamics from closed loop data. Journal of Guidance, 7, 677-683.
Merrington G., Kwon O.K., Goodwin G. and B. Carlsson (1991). Fault detection and diagnosis in gas turbines. Transactions of the ASME, 113, 276-282.
Neumann D. (1991). Fault diagnosis of machine-tools by estimation of signal spectra. Proceedings, IFAC/IMACS Symposium "SafeProcess '91", 10-13 September 1991, Baden-Baden, Germany.
Nold S. (1987). Fault detection in AC-drives by process parameter estimation. Proceedings, IFAC 10th Triennial World Congress, Munich, Germany.
Nold S. and R. Iserman (1986). Identifiability of process coefficients for technical failure diagnosis. Proceedings, 25th IEEE Conference on Decision and Control, Athens, Greece, Dec. 1986, 1587-1592.
Pot J., Falinower C.M. and E. Irving (1984). Regulation multivariable adaptative des fours. Colloque CNRS "Commande Adaptative: Aspects Pratiques et Theoriques", St. Martin d'Heres.
Potter J.E. (1963). New statistical formulas. Memo 40, Instrumentation Laboratory, MIT.
Pouliezos A., Stavrakakis G. and C. Lefas (1989). Fault detection using parameter estimation - A survey. Quality and Reliability International, 5, 4, 283-290.
Pouliezos A. and G.S. Stavrakakis (1989). Fast fault diagnosis for industrial processes applied to the reliable operation of robotic systems. International Journal of Systems Science, 20, 7, 1233-1258.

Reiß T. (1991). Model based fault diagnosis and supervision of the drilling process. Proceedings, IFAC/IMACS Symposium "SafeProcess '91", 10-13 September 1991, Baden-Baden, Germany.
Reiß T., Wanke P. and R. Iserman (1990). Model based fault diagnosis of a flexible milling center. Proceedings, IFAC Triennial World Congress, Tallinn, Estonia.
Rhodes I.B. (1990). A parallel decomposition for Kalman filters. IEEE Transactions on Automatic Control, AC-35, 3, 322-326.
Shibata H., Ikeda Y., Maruoka G., Aoki S. and T. Ogawa (1988). Application of estimation techniques to failure detection for A.C. electric machines. Proceedings, IFAC Identification and System Parameter Estimation, Beijing, PRC, 1147-1152.
Smed T., Carlsson B., de Souza C.E. and G.C. Goodwin (1988). Fault detection and diagnosis applied to gas turbines. Technical Report EE8815, Dept. of Electrical Eng. and Computer Science, University of Newcastle, Australia.
Söderström T. and K. Kumamaru (1985). On the use of Kullback discrimination index for model validation of fault detection. Report UPTEC 8520R, Uppsala University, Sweden.
Söderström T. and P. Stoica (1988). System Identification. Prentice Hall.
Stavrakakis G.S. and E.N. Dialynas (1991). Efficient computer based scheme for improving reliability performance of power substations. International Journal of Systems Science, 22, 9, 1527-1539.
Stavrakakis G.S. and A. Pouliezos (1991). Fatigue life prediction using a new moving window regression method. Mechanical Systems and Signal Processing, 5, 4, 327-340.
Stavrakakis G.S., Lefas Ch. and A. Pouliezos (1990). Parallel processing computer implementation of a real time DC motor drive fault detection algorithm. IEE Proceedings, Part B, 137, 5, 309-313.
Thornton C.L. and G.J. Bierman (1977). Gram-Schmidt algorithms for covariance propagation. International Journal of Control, 25, 243-260.
Tzafestas S.G. and G.S. Stavrakakis (1986). Model reference adaptive control of industrial robots with actuator dynamics. IFAC/IFIP/IMACS International Symposium on Theory of Robots, Vienna, Austria, December 3-5.
Wahlberg B. (1990). Robust frequency domain fault detection/diagnosis. Proceedings, 11th IFAC Triennial World Congress, Tallinn, Estonia, 373-378.
Wanke P. and T. Reiß (1991). Model based fault diagnosis and supervision of the main and feed drives of a flexible milling center. Proceedings, IFAC/IMACS Symposium "SafeProcess '91", 10-13 September 1991, Baden-Baden, Germany.
Watanabe K. and D.M. Himmelblau (1983). Fault diagnosis in nonlinear chemical processes; Part I. Theory. AIChE Journal, 29, 2, 243-249.


Weiss J.L. (1988). Threshold computation for detection of failures in SISO systems with transfer function errors. Proceedings, American Control Conference, 2213-2218.
Ydstie B.E. (1981). PhD Thesis, University of London.
Yeh H.G. (1991). Processing performance of two Kalman filter algorithms with a DSP32C by using assembly and C languages. IEEE Transactions on Industrial Electronics, 38, 4, 298-302.
Young P.C. (1981). Parameter estimation for continuous time models - a survey. Automatica, 17, 23-29.
CHAPTER 4

AUTOMATIC EXPERT PROCESS FAULT DIAGNOSIS AND SUPERVISION

4.1 Introduction

Automatic fault diagnosis, supervision and control of very complex systems are becoming extremely important. This is the direct consequence of the occurrence of recent disasters because of unsatisfactory control or missed diagnosis of failures (Three Mile Island and Chernobyl are but a few examples). Control and fault diagnosis cannot be realized without a good methodology of modeling, i.e. representing the structure and behavior of the systems under consideration in the significant states of their operation. The conventional methods of large scale modeling require comprehensive knowledge about the system consisting of conforming elements (e.g. a set of ordinary differential equations), and no gaps in the knowledge are allowed. Complex physical systems (e.g. a nuclear power plant, chemical processes) contain several types of elements and processes (e.g. nuclear, mechanical, electrical, electronic, etc.) with different types of description and eventually gaps in the available knowledge. The purely numerical-mathematical approach of large scale systems modeling could not offer an adequate methodology to solve the problems arising in this field; therefore, symbolic and artificial intelligence methods have been tried to obtain an adequate solution.
Diagnosis is currently one of the largest application domains of expert systems. Strategies and capabilities for diagnosis have been evolving rapidly. Most of the past applications involving diagnosis have been rule-based. That is, they use simple production rules to provide a mapping between the possible causes and inputs of a system and the possible faults.
The most primitive approach to automation would be to store diagnostic procedures in a computer and activate them when symptoms arise. This approach is valid, however, only when the symptoms are anticipated and the corresponding procedures can be predetermined by the designer of the diagnostic system.
The leading wave of technology, however, provides powerful new techniques that are applicable in a broad range of situations. These techniques give the ability to build and

reason about deep models and can operate with a wide range of information, such as learning from experience, probabilistic information, fuzzy reasoning and learning from examples. At the same time, a clearer picture has emerged about the range of strategies available and when they are most appropriate.
In this chapter these strategies will be examined and the nature of automatic expert diagnostic and supervision systems, with respect to them, will be revealed. A framework for coupling them together and their real-time implementation features will be provided. Examples from the recent expert diagnostic practice in industry will be presented to help the reader to delve into the matter.

4.2 Nature of automatic expert diagnostic and supervision systems

4.2.1 Expert systems for automatic process fault diagnosis

4.2.1.1 The terminology of knowledge engineering

An industrial Expert System (iES), in its most basic sense, is no more than a tool to
organize and codify for the computer the experience and thought processes of a human
with expertise concerning the operation of a technological process, an industrial plant
or a given piece of equipment.
Knowledge engineering (KE) is the process of building expert systems. Such systems are
medium- to large-scale software products which are designed to solve problems of differ-
ent kinds using a knowledge-based approach, where the knowledge is represented in an
explicit manner. They have a wide area of applicability, particularly in industrial control.
Generic categories of knowledge engineering applications are interpretation, prediction,
diagnosis, design, planning, monitoring, debugging, repair, instruction and control. Such
systems normally contain two main components: the inference mechanism (the problem
solving component) and the knowledge base (which may actually comprise a number of
knowledge bases). Generally speaking, expert systems work best in narrow application
domains.
The process of building an expert system consists of two main activities which usually
overlap: acquiring the knowledge and implementing the system. The acquisition activity
involves the collection of knowledge about facts and reasoning strategies from the do-
main experts. Usually, such knowledge is elicited from the experts by so-called knowl-
edge engineers, using interviewing techniques or observational protocols. However, ma-
chine induction, which automatically generates more elaborate knowledge from an initial

set of basic knowledge (usually in the form of examples), has also been extensively used.
In the system construction process, the system builders (i.e. knowledge engineers), the
domain experts and the users work together during all stages of the process, which tra-
ditionally has involved extensive prototyping.
To automate the problem solving process, the relevant task knowledge in the domain of
interest needs to be understood in great detail. However, acquiring the knowledge for
expert system building is generally regarded as a hard problem. This is not surprising, as
acquiring knowledge from an expert entails answering some really fundamental questions
such as:
• What is the relationship between knowledge and language?
• How can different domains be characterized?
• What constitutes a theory of problem solving?
The process of extracting knowledge from an expert is not the process of transferring a
mental model lying in the brain of an expert into the mind of the system builder, but the
formalization of a domain for the first time, and this is inherently a difficult process.
Ideally, models of conceptual structures of problem solving behavior are required as a
prerequisite to the knowledge transfer process. However, cognitive science approaches
have not yet yielded sufficient information to enable a full understanding of the knowl-
edge structures and problem solving strategies of experts to be applied, so that current
approaches are incomplete and often ad hoc.
The situation is further complicated by the fact that experts often have faulty memories
or provide inconsistencies. This means that separate validation of the expertise elicited
from experts is essential. Furthermore, experts exhibit cognitive biases such as overcon-
fidence, simplification, and a low preference for the abstract, the relative and conflicting
evidence. It is therefore important to test and validate expert systems both by analyzing
the expertise in the knowledge base and by examining failures in actual performance. As
far as possible, cognitive biases should be filtered out during the elicitation process.
A great deal of experimental evidence exists about the limitations of human decision
making and it has been suggested that the development of systems which mimic human
problem solving should be approached with some degree of caution. In order to reduce
the chances of bias, experts should be made aware of commonly found biases in
judgment, the elicitation process should include probes to foster the consideration of
alternatives and when experts run through sample problems in the elicitation process, it
should be borne in mind that the way in which the problems are presented, will have an
impact as to how far any derived rules will exhibit cognitive bias.
Appraising the knowledge engineering process from a cognitive engineering viewpoint,
the following six stages, termed mainstream development, are suggested:
1. Knowledge elicitation.
2. Cognitive bias filtering.
3. Knowledge representation and control scheme selection.

4. Software development and integration.
5. System evaluation and validation.
6. Advanced prototype expert system.
Stages three and four ideally should only be carried out after the elicitation and cognitive
bias stages have been completed. In reality, this is not possible and researchers' experi-
ence suggests that several iterations through the first five stages are required before
stage six can be contemplated.
It is important to realize that experts change their solution strategies dependent upon the
boundedness of the problem. In well-bounded problems an expert's approach differs
dramatically from that of a novice. Conceptual models of experts reflect the physical
processes that actually occur. By contrast, the models used by non-experts do not
account for all the process parameters driving the problem. When problems concerning
the processes are less well understood, expert and novice models appear superficially to
be similar, though the experts seem to recognize that the simple models are not accurate
and that the use of a precise model is not practicable. So experts know what they do not
know and can readily identify features of uncertainty that preclude the use of precise
solution strategies; in such circumstances they will adopt simple and somewhat
inaccurate process models. Knowledge acquisition techniques must be able to cope with
this variation in expert strategy.
There are a number of terms used to describe the expert system building process which
are not well defined and appear to overlap. Such terms include knowledge elicitation,
knowledge acquisition, system implementation, machine induction and even the term
knowledge engineering itself. Knowledge acquisition is defined as the transfer and
transformation of problem-solving expertise from some knowledge source to a program.
This definition covers the whole process including identification of the problem, its con-
ceptualization, formalization, implementation, testing and prototype revision.
The process of building knowledge-based systems (KBS) is essentially one of knowledge
engineering, thus the relationship between the different terms should be as in fig. 4.1.

KNOWLEDGE ENGINEERING
 ├── KNOWLEDGE ACQUISITION
 │     ├── KNOWLEDGE ELICITATION
 │     └── MACHINE INDUCTION
 └── SYSTEM IMPLEMENTATION

Figure 4.1 Relationship between terms in knowledge engineering.



Developing knowledge-based systems is a far from trivial process. Those who build
knowledge-based systems for industrial systems supervision and diagnosis know that no
significant systems development can sensibly take place without a structured approach.
Systematic and structured approaches to KBS development available today can be found
in Hickman et al. (1989) and Luger and Stubblefield (1989).
Expert systems can be introduced into industrial systems to provide support for different
classes of people such as designers, operators and maintenance personnel. In general
such systems will be off-line (for designers and maintenance personnel) and on-line (for
operators). The knowledge engineering task will be different for each of these applica-
tions since the tasks involved will comprise different knowledge sources and structures.
One difference is that between technological/scientific knowledge and experimental
knowledge. This difference was described as a knowledge of functioning versus a knowl-
edge of utilization. The former knowledge is used by designers and maintenance person-
nel whereas the latter characterizes the one used by operators.
Off-line knowledge-based systems are not time critical. They may utilize several knowl-
edge sources including technical documents, reference literature, handbooks, ergonomic
knowledge, and knowledge about operator personnel (for use in user modeling). Whilst
their operation is not time critical, operator time constraints may still have to be taken
into account.
The most critical and challenging industrial expert systems are those developed for sys-
tem operation. They may encompass support for the automatic monitoring system as
well as support for the operators, and may provide heuristic control, fault diagnosis, con-
sequence prediction and procedural support. The latter is particularly suitable for consis-
tency checking of input sequences or for operator intent recognition (Johannsen and
Alty, 1991).
Expert systems will be more effective when linked to dynamic databases. Knowledge can
then be applied to new situations by periodically executing rules and queries. Because of
this linkage, knowledge acquisition costs can be amortized over many instances of reuse.
Users will not lose their motivation to employ the system because of the pain of
having to enter data each time. Furthermore, data-driven expert systems can propose and
rank suggestions to deal with the world as changes are observed (Kaiser et al. 1988).
Support expert systems work under time constraints because they are running in parallel
with the dynamic industrial process. These expert systems will depend upon a number of
knowledge sources related to knowledge of functioning and knowledge of utilization.
Additional knowledge such as that of senior engineers will be required.
Whilst a support expert system for predicting the consequences of some technical failure
will normally need only engineering knowledge, procedural support, diagnosis and heu-
ristic control modules will need operational knowledge as well. Since they will also have
to be integrated with the supervision and control system, they will need to support nu-
merical as well as symbolic knowledge.

The importance of signal and symbol processing has been emphasized by Nawab et al.
(1987) and Rouse et al. (1989). They point out that models of symbol processing are
much harder to identify than those of signal processing because semantics and
pragmatics play a large role in symbol processing systems. In particular, they stress the
need for symbolic representations in industrial process control applications.
In all cases of knowledge-based systems development, it will be necessary to define
carefully the goals and functionalities of the various systems and their interdependencies
at an early stage. It is also important to realize that in the industrial environment, not all
applications are suitable for the application of knowledge-based techniques. For exam-
pIe, existing numerical supervision and control systems are based upon thorough engi-
neering methodologies and replacement by knowledge-based techniques would, in most
cases, lead to performance degradation.
Finally, it must be realized that most industrial applications are very complex and this
makes the problem of acquiring and assembling the knowledge in the industrial envi-
ronment much more severe than in traditional computing domains. The elicitation and
conceptualization processes are liable to be far more complex and attempts to prove the
consistency of the knowledge will be very time-consuming. The full process is likely to
take years rather than months. In the absence of a powerful methodology, one is forced
to work with inadequate tools for some time to come.

4.2.1.2 Techniques for knowledge acquisition

The most time consuming portion of constructing an expert system is the knowledge ac-
quisition phase (Forsythe and Buchanan, 1989; Adelman, 1989). Conceptually, knowl-
edge engineering is a measurement problem. This measurement problem is a complex one
because there are five sources of variation: domain experts, knowledge engineers,
knowledge representation schemes, elicitation methods and problem domains. Adelman,
(1989), suggests the use of two or three distinctly different knowledge engineers,
knowledge representation schemes and elicitation methods when working with two or
more domain experts. The expert system development team will then be able to identify
which, if any, of these sources of variability result in disagreement in the predictions of
the knowledge base and, thereby, resolve them.
The techniques used in knowledge acquisition can be broadly divided into two catego-
ries: elicitation and machine induction. Strictly speaking, there is a continuum between
human-human elicitation and automatic induction. Three general principles have been
proposed for the acquisition process by Gruber and Cohen, (1987). They are concerned
with primitives and generalizations.
The first principle prescribes that task-level primitives should be designed in order to
capture important domain concepts defined by the expert. The knowledge engineer must
use a language of task-level terms rather than imposing implementation-level primitives.

This principle stresses the importance of separating out acquisition from implementation.
These task-level primitives must be natural constructs for describing information, hy-
potheses, relations, and actions, in the language of the domain expert. This would
suggest that task analyses should be combined with knowledge analyses.
The second principle suggests that explicit declarative representational primitives are
preferable to procedural descriptions. This principle is based upon the observation that
most experts understand declarative representations more easily. Formulating procedural
aspects in this way can facilitate acquisition, explanation, and maintenance. Gruber and
Cohen, (1987), suggest that an expert should be asked "for the parameters of a domain
that affect control decisions, and then to formulate control knowledge in terms of these
parameters" .
The third principle requires representations at the same level of generalization as the
expert's knowledge. Experts should not be forced to generalize except when absolutely
necessary and they should not be asked to specify information not available to them. An
example of an oversimplified generalization would be the requirement to categorize a
process variable as high, medium or low, when the expert needs to differentiate between
many more steps or even a full range of numbers.
Knowledge elicitation.
A number of techniques for knowledge elicitation are now in use. They usually involve
the collection of information from the domain expert(s) either explicitly or implicitly.
Originally, reports written by the experts were used, but this technique is now out of fa-
vour since such reports tend to have a high degree ofbias and reflective thought. Current
techniques include interviews (both structured and unstructured), questionnaires or ob-
servational techniques such as protocol analyses and walkthroughs.
Knowledge elicitation methodologies have more in common with the field-work orienta-
tion of anthropology and qualitative sociology than with the experimental orientation of
many cognitive sciences. It is suggested that knowledge engineers also use the large
amount of literature and experience as well as the much longer tradition of the social
sciences in field work, particularly data-gathering methods such as face-to-face inter-
viewing. Some pitfalls of knowledge elicitation are described on the basis of this experi-
ence in the social sciences. In particular, some interviewing problems such as obtaining
data versus relating to the expert as a person, fear of silence and failing to listen, diffi-
culty in asking questions, interviewing without a record, as well as conceptual problems
such as treating interview methodology as unproblematic or blaming the expert, are
explained.
Interviews. In a structured interview, the knowledge engineer is in control. Such inter-
views are useful for obtaining an overall sense of the domain. In an unstructured inter-
view, the domain expert is usually in control; however, such interviews can, as the name
implies, yield a somewhat incoherent collection of domain knowledge. The result can be
a very unstructured set of raw data that needs to be analyzed and conceptualized. It is

obviously important for the knowledge engineer to have some knowledge of the domain
before wasting the valuable time of the expert. This might be obtained through text-
books, manuals and other well-documented sources. Group interviews can be useful
particularly in the phase of cognitive bias filtering.
Far from coming naturally, interviewing is a difficult task that requires planning, stage-
management, technique and a lot of self-control. Forsythe and Buchanan, (1989), present
some ethnographic techniques that can be applied to the problem of identifying and miti-
gating difficulties of communication between knowledge engineers and experts during
interviews.
Questionnaires and rating scales. Questionnaires can be used instead of or in addition
to interviews. The interviews can be standardized in question-answer categories or
questionnaires can be applied in a more formal way. However, the latter should be
handled in most cases in a relaxed manner for reasons of building up an atmosphere of
confidence and not disturbing the expert too much when applied in actual work
situations.
Rating scales are formal techniques for evaluating single items of interest by asking the
expert to cross-mark a scale. Verbal descriptions along the scale such as from "very low"
to "very high" or from "very simple" to "very difficult" are used as a reference for the
expert. The construction, use and evaluation of rating scales is described very well in the
psychological and social sciences literature. Rating scales can also be combined with in-
terviews or questionnaires.
Observations. Observations are another technique for knowledge elicitation. They re-
quire little or no active participation of the expert. All actions and activities of the expert
are observed as accurately as possible by the knowledge engineer who makes recordings
of all the observed information. A special mixture of interview and observation tech-
niques are the observation interviews. Sequences of activities are observed and questions
about causes, reasons and consequences asked by the knowledge engineer during these
observations. The combined technique is very powerful because the sequence of activi-
ties is observable whereas decision criteria, rules, plans etc. are elicited in addition
through what-, how- and why- questions.
Protocol analysis. Protocol analyses are useful for obtaining detailed knowledge. They
can involve verbal protocols in which the expert thinks aloud whilst carrying out the
task, or motor protocols in which the physical performance of the expert is observed and
recorded (often on videotape). Eye movement analysis is an example of a very
specialized version of this technique. Motor protocols, however, are usually only useful
when used in conjunction with verbal protocols.
In a verbal protocol, the expert thinks aloud and a time-stamped recording is made of his
utterances. In such protocols, the expert should not be allowed to include retrospective
utterances. He or she should avoid theorizing their behavior and should "only report
information and intentions within the current sphere of conscious awareness". As a verbal
protocol is transcribed, it is broken down into short lines corresponding roughly to
meaningful phrases. This technique can collect the basic objects and relations in the do-
main and establish causal relationships. From these a domain model can be built.
Experience with the use of verbal protocols for the analysis of trouble-shooting in
maintenance work of technicians, is described by Rasmussen, (1984).
The critical decision method (CDM) as described by Klein et al., (1989), is a special
protocol analysis which elicits knowledge from experts and novices in a retrospective
way. Non-routine cases such as critical incidents are selected in order to discriminate the
true knowledge of the expert(s). Sources of bias are minimized by asking for
uninterrupted incident descriptions. Subsequently, the history of the incident is
reconstructed by means of time lines and decision points are identified and probed. It is
claimed that, by using the critical decision method, knowledge can be elicited with
relatively little effort.
The cognitive task analysis approach. Roth and Woods, (1989), and Lancaster et al.
(in SIGART newsletter, p. 152, 1989) suggest a multi-phase progression from initial
informal interview techniques (to derive a preliminary mapping of the semantics of the
domain), to more structured knowledge elicitation techniques (to refine the initial
semantic structure), to controlled experiments designed to reveal the knowledge and
processing strategies utilized by domain practitioners.
The first phase (categorial knowledge structure) gives preliminary cognitive description
of the task as a guide for further analysis. It is important here not to home in on specific
rules. One possibility is to get the experts to provide an overview presentation. Only
when an overview of the semantics of the application has been developed, can more
structured techniques be used.
The second phase (temporal event organization) concentrates on how practitioners per-
form their tasks; thus, there is an emphasis on observation and analysis of actual task
performance. It involves techniques such as critical incident review, discussion of past
challenges, or the construction of test cases on which to observe the experts at work.
During this phase the use of expert panels is also recommended in order to obtain a
corpus of challenging cases for identifying critical elements and strategies for handling
them.
The third phase (causal structures in knowledge) uses observational techniques under
controlled conditions to observe expert problem solving strategies. The practitioner is
observed and asked to provide a verbal commentary (i.e. the why and how of a particular
domain). The task can be deliberately manipulated, for example, by forcing the expert to
go beyond reasonable routine procedures. In some cases, the expert himself controls the
information gathering. Alternatively, it is controlled by the observer. Each approach
provides useful information; the former provides data on the diagnostic search process
and the latter on the effect (or bias) of particular types of information on expert
interpretations. Another useful technique is to compare the performance of experts with different
levels of expertise, so as to isolate what factors really account for superior performance.
Teachback interviewing. In this technique, the expert first describes a procedure to the
knowledge engineer, who then teaches it back to the expert in the expert's terms until the
expert is completely satisfied with the explanation. Johnson and Johnson, (1987),
describe this technique and illustrate its use in two case studies. Their approach is guided
by Conversation Theory, in which interaction takes place at two levels: specific and gen-
eral. The paper gives a useful set of guidelines on the strengths and weaknesses of this
technique.
Walkthroughs. More detailed than protocol analysis and often better because they can
be done in the actual environment, resulting in better memory cues. They need not, how-
ever, be carried out in real time. Indeed, such techniques are useful in a simulated envi-
ronment where states of the system can be frozen and additional questions pursued.
Time lines. Tables in which several items of knowledge are contained in columns. The
left column has to be filled with the time of occurrence of particularly interesting events
such as failures or operator actions. Related information about the behavior of the tech-
nical process, the automatic system and the human operators at these times is recorded in
separate columns with as much detail as is felt appropriate.
Formal techniques. These include multidimensional scaling, repertory grids and hierar-
chical clustering. Such techniques tend to elicit declarative knowledge. The most com-
monly used is the repertory grid technique based on the personal construct theory. It is
used in ETS, (Boose, 1986), which assists in the elicitation of knowledge for
classification type problems. In ETS, the expert is interviewed to obtain elements of the
domain. Relationships are then established by presenting triads of elements and asking
the expert to identify two traits which distinguish the elements. These are called
constructs. They are then classified into larger groups called constellations. Various
techniques such as statistical, clustering and multidimensional scaling are then used to
establish classification rules which generate conclusion rules and intermediate rules
together with certainty factors. The experts are interviewed again to refine the
knowledge. ETS is said to save 2-5 months over conventional interviewing techniques.
The system has been modified and improved and is now called AQUINAS (Boose and
Bradshaw, 1988). To obtain procedural knowledge, techniques such as verbal protocols
can be used.
Hypertext as a means of knowledge acquisition. Hypertext is an approach to informa-
tion management in which information is organized as a network of nodes connected by
links. Nodes may contain text, graphics, audio, video and generally software for operat-
ing on numeric and/or symbolic data. While other software paradigms are promising
similar things, the essence of hypertext is that linking is machine-supported. At the de-
velopment level most hypertext environments feature control buttons (link icons) which
can be arbitrarily embedded within the content material by a user. Hypertext allows for

easy and intuitive access to documents and programs by linking dispersed yet interrelated
information throughout a document, a program or a series of documents/programs.
Traditional document structure is sequential, in other words there is a single linear se-
quence defining the order in which the text is to be accessed. Hypertext is nonsequential,
i.e. there is no single order that determines the sequence in which text is to be read.
Hypertext is simply the nonlinear presentation of any informational medium. Some com-
munities prefer to reserve this term for textual information only and use hypermedia as a
more general one. The term hypertext will be used here in the more general sense, as
applied to the broad scope of informational media including graphics, video disc, and
other such media. Hypertext systems have been used for information management and
intelligent computer aided instruction systems, and they should prove equally useful in
the area of expert systems technology.
A general knowledge acquisition tool designed around a hypertext concept could allow a
knowledge engineer to list important concepts, create nodes attached to these concepts
which explain their relevance, connect related concepts by linking their nodes, use
graphics to explain difficult concepts, and even critique information entered into the
system previously. In such a system, knowledge acquisition would not be confined to
linear input of information. The knowledge engineer could use the hypertext system to
compile knowledge gathered from an expert after interviewing, or (s)he could enter the
knowledge into the system as the expert sits there saying what information to encode.
It is the nonsequential capabilities of hypertext systems that make them attractive as
automated knowledge acquisition tools. The user of a hypertext system can dynamically
create new nodes for information, make notes to her/himself in these nodes, and attach
these nodes to the places in the hyperspace which have caused her/him to think of the
nodes. The user could even make a node and leave it unattached to any reference, but
displayed on the screen as a reminder of information which needs to be added to the
hypertext database. A hypertext system facilitates linking together related information in
hierarchical structures that resemble the relationships between the nodes of information,
regardless of how complex the structure of the relationship may be.
Consider an example session with a knowledge engineer or an expert sitting down for a
first pass at amassing the knowledge relevant to a project at hand. The user brings up the
hypertext knowledge acquisition tool and enters a name for the knowledge base. The
system confirms a new knowledge base and displays a node for relevant concepts. The
user enters in list format short phrases to describe the topics which (s)he believes the ex-
pert system will have to know about in order to complete its function. Each concept will
represent a link to a node in which that concept is described in further detail. There can
be multiple links from concepts to other nodes. Also, concepts can be linked to one an-
other to indicate their relationships (similarly to semantic network connections). To the
nodes describing the concepts, other nodes can be attached. Such nodes can be definition
nodes, primarily for use in the expert system for explanation and help facilities, graphic
nodes, also for user help in the expert system, note nodes, as discussed above, and more

informational nodes describing subconcepts within the main concepts. The key here is
that the node attachments are created by the user as (s)he sees best to represent the
knowledge required for implementing the expert system.
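As a concrete illustration of the node-and-link structure just described, the following minimal Python sketch shows concept, subconcept, definition and note nodes connected by labeled, machine-supported links. All titles and contents are invented, and the design is not tied to any particular hypertext tool.

# A minimal sketch of a hypertext node/link store for knowledge capture.
class Node:
    def __init__(self, title, kind="concept", content=""):
        self.title, self.kind, self.content = title, kind, content
        self.links = []                      # outgoing labeled links

    def link_to(self, other, label=""):
        self.links.append((label, other))    # machine-supported linking

pump = Node("pump", content="centrifugal pump, 3 kW")
cavitation = Node("cavitation", kind="subconcept")
reminder = Node("ask expert about seal wear", kind="note")  # unattached node
pump.link_to(cavitation, "possible failure mode")
cavitation.link_to(Node("NPSH margin", kind="definition"), "defined by")

# Navigation amounts to following links from a starting node.
for label, target in pump.links:
    print(f"pump --{label}--> {target.title}")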
Another capability which the hypertext environment offers is the contribution of
knowledge to the knowledge base from multiple experts. Once an initial attempt has
been made at gathering the knowledge for the expert system, the experts can navigate
through the hypertext system and judge the validity of the knowledge represented. Each
expert can attach comment nodes to currently existing nodes which indicate what (s)he
thinks about the information contained within the original node. Experts can even
critique the information input by other experts (this is a capability which may or may not
be desirable, so there should be provisions for enabling and disabling such a feature).
There exists, however, a fundamental drawback in hypertext: user disorientation and
even confusion. Increasing the number of connections, or links, increases the possibility
that a user will get lost in irrelevant information. User disorientation may be particularly
severe for large scale applications such as those involved in the utilization of power plant
databases. The process of moving through a hypertext information base is referred to as
navigation. Tsoukalas et al., (1991), outline navigational tools based on the theory of
fuzzy graphs and fuzzy relations. These tools quantify context-dependent user
preferences and application-specific constraints in such a manner that a user may direct
(him)herself to an information island of interest. A numerical example and a prototype
for monitoring special material in a nuclear power plant are included.
Bottom-up and top-down knowledge capturing.
There are two competing views about the knowledge acquisition task which might be
described as bottom-up and top-down.
The bottom-up proponents aim to prise data and concepts out of the expert and then
iteratively refine it. The implication is that deeper mining will reveal more relevant
knowledge, but this assumes that there is a simple relationsbip between what is verbal-
ized by experts and what is actually going on in their minds.
The basic assumption underlying the bottom-up approach is that an expert system is
based upon a large body of domain specific knowledge, and that there are a few general
principles underlying the organization of the domain knowledge in an expert's mind.
However, the existence of underlying principles and causal relationships may be an indi-
cation that expert knowledge is somehow domain independent. So, expert behavior that
is seemingly domain-specific may originate from higher level problem solving methods
which are well structured and have some degree of domain independence. Currently, the
most popular heuristic rule generation procedures are based on schemes of top-down
induction of decision trees (TDIDT) (Gray, 1990).
The essence of a TDIDT program for decision tree generation is depth-first search. One
starts with some training set of example objects, each characterized by attribute values
and a class designation. The program selects and tests a binary attribute, resulting in two

recursive subproblems. Each subproblem involves a subset of the original example set.
The first subproblem is then analyzed in the same manner leading to two further recur-
sive subproblems. Search proceeds depth-first until either all objects associated with a
general subproblem are in the same class (completing a decision tree branch), or all at-
tributes have been utilized (demonstrating that either data are incorrect or the attribute
set is inadequate). When a branch is completed, the program backtracks to a previous
choice point and from there explores the subproblem(s) associated with alternative val-
ues for the tested attribute.
If the attribute set is sufficient, many decision trees may exist that correctly classify
training set examples. The program uses inductive inference to construct a decision tree
that correctly classifies other objects in addition to those in the training set.
Consequently, a decision tree must capture some meaningful relation between an object
class and its attribute values. Given several trees that correctly classify training set data,
the "simplest" is usually chosen on the grounds that it is more likely to capture the
inherent structure of the problem.
TDIDT methods restrict search by employing a heuristic measure: they consider combi-
nations of attributes appearing to have a high information content. This useful restriction
can make rule generation feasible, but other aspects of the TDIDT approach are unhelp-
ful. Its depth-first recursive search with backtracking causes it to impose inappropriate
context restrictions on rule search. These restrictions lead to opaque knowledge repre-
sentations and excessive sensitivity to noise.
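For reference, the following compact Python sketch shows the conventional scheme on a toy fault/no-fault training set: binary attributes, depth-first recursion and entropy-based information gain as the heuristic attribute-selection measure. The data and attribute names are invented.

# A compact sketch of conventional TDIDT (ID3-style) induction.
import math
from collections import Counter

def entropy(examples):
    counts = Counter(cls for _, cls in examples)
    total = len(examples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def tdidt(examples, attributes):
    classes = {cls for _, cls in examples}
    if len(classes) == 1:
        return classes.pop()                 # decision tree branch complete
    if not attributes:                       # attribute set exhausted
        return Counter(cls for _, cls in examples).most_common(1)[0][0]
    def gain(a):                             # information content heuristic
        split = [[e for e in examples if e[0][a] == v] for v in (True, False)]
        return entropy(examples) - sum(
            len(s) / len(examples) * entropy(s) for s in split if s)
    best = max(attributes, key=gain)
    rest = [a for a in attributes if a != best]
    return {best: {v: tdidt([e for e in examples if e[0][best] == v], rest)
                   for v in (True, False)
                   if any(e[0][best] == v for e in examples)}}

# (attribute-vector, class) pairs: a toy fault/no-fault training set.
data = [({"vibration": True,  "hot": True},  "fault"),
        ({"vibration": True,  "hot": False}, "fault"),
        ({"vibration": False, "hot": True},  "ok"),
        ({"vibration": False, "hot": False}, "ok")]
print(tdidt(data, ["vibration", "hot"]))
# -> {'vibration': {True: 'fault', False: 'ok'}}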
Gray, (1990), proposed algorithmic modifications so that users can still measure heuristi-
cally the information content of attributes in order to guide the search. The program
iteratively examines all positive instances remaining to be covered, along with negative
training set instances. Moreover, the search does not take place with irrelevant context
restrictions. This algorithm is no more complex than usual TDIDT, it is just as fast, it is
less sensitive to noise, and it leads to clearer representations of the information present in
training set data.
Machine induction.
Machine induction is a special case of machine learning which encompasses heuristics for
generalizing data types, candidate elimination algorithms, methods for generating deci-
sion trees and rule sets, function induction and procedure synthesis. MacDonald and
Witten, (1989), developed a framework for describing such techniques that allows an
evaluation of the usefulness of any method in solving particular knowledge engineering
problems. They have concentrated upon decision tree and rule set generation approaches
because these techniques have been successfully used in a number of knowledge acquisi-
tion situations.
It is commonly observed that experts have great difficulty in explaining the procedures
which they use to arrive at decisions. Indeed, experts often make use of assumptions and
beliefs which they do not explicitly state, and are surprised when the consequences of

these hidden assumptions are pointed out. The inductive approach relies on the fact that
experts can usually supply examples of their expertise even if they do not understand
their own reasoning mechanisms. This is because creating an example set does not re-
quire any understanding of how different evidence is assessed or what conflicts were
resolved to reach a decision. Sets of such examples are then analyzed by an inductive
algorithm (one of the most popular being the ID3 algorithm, see knowledge acquisition
tools in the following) and rules are generated automatically from these examples.
The problem with inductive techniques is that the rules induced depend both upon the
example set chosen and the inductive algorithm used. There is no guarantee that the rules
induced will be valid knowledge. Therefore, the approach normally involves a check with
the expert to validate the induced rules. It is not uncommon to cycle a number of times
through the induction process, refining the knowledge base with the domain expert.
The most important guidelines in the appropriate use of inductive techniques are:
• The technique is useful if there are documented examples or if they can be obtained
easily. It is not suitable where an unpredictable sequence of observations drives the
system (e.g. as in some real-time situations).
• The technique is consistent and unbiased and is very suitable for domains where
rules form a major part of the knowledge representation.
• Induction provides the knowledge engineer with questions, results and hypotheses
which form a basis for consultation with the expert.
• There is no explanation for the rules produced. All output must be examined criti-
cally.
• The process assumes that the example set is complete and current.
• Results should not be sensitive to small changes in the training set.
The inductive technique has been used successfully for weather prediction, predicting the
behavior of a new chemical compound, diagnosing plant disease, symbolic integration,
improved debt collection, and designing gas-oil separators.
Knowledge acquisition tools.
A large number of tools for supporting the knowledge acquisition process have been
developed in the academic environment and some of these have been mentioned already.
The general aim of all these tools is to minimize the number of iterations needed for the
whole knowledge engineering process by bridging the gap between the problem domain
and the implementation. Boose and Gaines, (1988), and Johannsen and Alty, (1991),
give a brief summary of the main tools under development. Some
tools endeavour to make the process fully automatic. KRITON for example, has a set of
procedures and pre-stored interviews, and caters for incremental text analysis and
protocol analysis. Repertory grids are used to pull out declarative knowledge. An
intermediate knowledge representation system is suggested for supporting the
knowledge elicitation techniques. The knowledge representation scheme involves a
propositional calculus for representing transformations during the problem solving

process and a descriptive language for functional and physical objects. This is then
translated semi-automatically into the run-time system but this commits the knowledge
engineer to a particular representation.
Other tools, for example KADS and ACQUIST, merely provide a set of tools to aid in a
more methodological approach. Thus, KADS aims only at producing a document de-
scribing the structure of the problem in the form of a documentation handbook.
The KADS methodology is based upon the following principles (Hickman et al., 1989):
• Knowledge and expertise should be analyzed before the design and implementation
starts, i.e. before an implementation formalism is chosen.
• The analysis should be model driven as early as possible.
• Expert problem solving should be expressed as epistemological knowledge.
• The analysis should include the functionality of the prospective system.
• The analysis should be breadth-first allowing incremental refinement.
• New data should only be elicited when previous data has been analyzed.
• All collected data and interpretations should be documented.
KRITON supports only bottom-up knowledge acquisition but KADS supports both top-
down and bottom-up through a hypertext protocol editor (PED) and hierarchies are
developed and manipulated by a context editor (CE). Top-down is supported by a set of
interpretation models each describing the meta-level structure of a generic task.
KEATS-1 provides a cross reference editing facility (CREF) and a graphical interface
system (GIS), to support data analysis and domain conceptualization. CREF organizes
the verbal transcript text into segments and collections and GIS allows the knowledge
engineer to draw and manipulate domain representations on a sketch pad. In KEATS-2,
these have been replaced by ACQUIST, a hypertext application for structuring the
knowledge from the raw text data. Fragments from the data are collected around con-
cepts, concepts are factored into groups, and groups into meta-groups. Links can then be
defined between any of these entities. The emerging structure is displayed graphically.
ACQUIST provides support for both bottom-up approaches (fragments to concepts to
groups to meta-groups) and top-down approaches (using what are called coding sheets
on which a caricature of the observed behavior of the domain expert is captured). In this
approach, the knowledge engineer uses a predefined abstract model to guide the
knowledge acquisition process. Use of such models (even if incomplete or inadequate)
can dramatically improve the knowledge acquisition process. The coding sheet is a set of
hypertext cards.
A further knowledge acquisition tool is ROGET. It conducts a dialogue with a domain
expert in order to acquire his/her conceptual structure. ROGET gives advice on the
basis of abstract categories and evidence. Initial conceptual structures are selected on
this basis. Only a small set of example cases have been tested on this system.
The systematic acquisition of knowledge about the faulty behavior of a technical system
was suggested by Narayanan and Viswanadham, (1987). A procedure involving the

development of a hierarchical failure model with fault propagation digraphs and cause-
consequence knowledge bases for a given system, is proposed. It uses the so called
augmented fault tree as an intermediate knowledge representation. Fault propagation
digraphs describe the hierarchical structure of the system with respect to faults in terms
of propagation. The cause-consequence knowledge bases characterize failures of
subsystems dependent on basic faults by means of production rules. The knowledge
acquisition process can be reduced to defining parameters required by the knowledge
representation scheme and transforming human expertise into these parameter values.
The augmented fault tree is a conceptual structure, which describes causal aspects of
failures as in conventional fault trees as well as probabilistic, temporal and heuristic
information (see also Contini, 1987). The production rules of cause-consequence
relations are derived from the augmented fault tree by decomposing it into mini fault
trees. The proposed methodology has reached a relatively high level of formal
description. However, it cannot yet deal with inexact knowledge by using ranges of pa-
rameters. An example of a failure event in a reactor system is given.
Strategic knowledge acquisition tools have been built especially for diagnosis purposes:
TEIRESIAS, ASK, CART, ODYS, NEOMYCIN, HERACLES, BDM-KAT, CATS, ID3,
ELf, HELIOS, MILKAM, KSSO assistants and their evolutions are some of the more
recent knowledge elicitation systems which are in use in medical and industrial diagnosis
practice. The reader is referred to the SIGART Newsletter (1989), Mussi and Morpurgo
(1990) and Brule and Blount (1989) for details.

4.2.1.3 Expert system approaches for automatic process fault diagnosis

Having concluded that the human's cognitive limitations, biases, errors, and lack of
knowledge are major obstacles in diagnostic performance, it seems reasonable to attempt
full or partial automation of the diagnostic procedure in order to aid the human
diagnostician in real-time. One approach in this category that has recently received
widespread attention is that of expert systems (ES).
Some recent surveys of ES appropriate in fault detection/diagnosis of technological
processes are provided by Milne (1987), Tzafestas (1987, 1989, 1991), Majstorovic
(1990), Prasad and Davis (1993) and Dounias et al. (1993).
Several designs have been efficiently employed for fault diagnosis in various domains
such as medical, electronic, computer H/W and S/W, and industrial process diagnosis. A
brief description of the main characteristics of the most important current approaches is
given in the following.
Shallow reasoning approach.
Since shallow reasoning is highly domain-specific, diagnosis is fast if the symptom has
been experienced and thus has been included in the knowledge base. This reasoning

typically uses (production) rules which consist of antecedents and consequents. An ante-
cedent is a condition part and a consequent is an action part of a rule. If certain condi-
tions are met, then some actions are performed. For this reason, such a rule is often
called an IF-THEN rule. These rules can be classified by their behavior, i.e. self-managing
rules or meta-rules. A self-managing rule is one in which actions are performed without
referring to any other rules. A meta-rule is one in which its actions result from the
triggering of other rules.
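To make the flavor of such rule bases concrete, the following minimal Python sketch shows two shallow IF-THEN diagnostic rules and a meta-rule whose action results from the triggering of the other rules; all symptom names, thresholds and fault labels are invented for illustration.

# A minimal sketch of shallow (rule-based) diagnosis.
RULES = [
    {"name": "R1",
     "if":   lambda s: s["pump_pressure"] < 2.0 and s["flow_rate"] < 0.5,
     "then": "blocked suction line"},
    {"name": "R2",
     "if":   lambda s: s["motor_temp"] > 90.0,
     "then": "motor overheating"},
]

# A meta-rule: it reacts to the firing of other rules, not to raw symptoms.
META_RULES = [
    {"name": "M1",
     "if":   lambda fired: {"R1", "R2"} <= fired,
     "then": "suspect common cause: loss of coolant circulation"},
]

def diagnose(symptoms):
    fired, diagnoses = set(), []
    for r in RULES:                          # shallow matching: one pass
        if r["if"](symptoms):
            fired.add(r["name"])
            diagnoses.append(r["then"])
    for m in META_RULES:                     # meta-rules inspect what fired
        if m["if"](fired):
            diagnoses.append(m["then"])
    return diagnoses

print(diagnose({"pump_pressure": 1.2, "flow_rate": 0.3, "motor_temp": 95.0}))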
A disadvantage however, is that shallow reasoning is rigid in the sense that substantial
changes in the rules may have to be made if a single component is added or deleted.
Consequently, the number of rules becomes practically unmanageable as the number of
components of the system being diagnosed increases. If multiple faults can occur
simultaneously, then this approach becomes combinatorially explosive.
Classical examples of shallow expert diagnostic systems are MYCIN and
NEOMYCIN (see Section 4.2.2). The main shortcomings of this approach are:
• Difficult knowledge acquisition.
• Unstructured knowledge requirements.
• Diagnosability or knowledge-base completeness not guaranteed.
• Excessive number of rules.
• Knowledge-base highly specialized to the individual process.
These disadvantages can be overcome by decomposing the problem into smaller prob-
lems either in a hierarchical manner or according to unit operations.
Deep knowledge approach.
This approach is appropriate for man-made technological system diagnosis (causal ori-
ented systems) and is based on a structural and functional model of the problem domain.
A deep knowledge ES attempts to capture the underlying principles of a domain (or
process) explicitly, and so the need to predict every possible fault scenario is eliminated.
Obviously, this approach leads to expert system tools that are able to handle a wider
range of problem types and larger problem domains (Yoon and Hammer, 1988).
Deep reasoning, compared to shallow reasoning, is more flexible and thorough, but
slower. "More flexible" means that since deep reasoning is not domain-specific, it is eas-
ier to modify the model when a single component is added or deleted. Deep reasoning
may not be sensitive to the change. "More thorough" means that deep reasoning may
answer "what-if' type questions which may not be possible in shallow reasoning. This
implies that there is no limitation of fault coverage in deep reasoning. "Slower" means
that the speed of reasoning is slower than that of shallow reasoning because a deep
knowledge base does not contain every detail of a symptom.
The principal deep-knowledge diagnosis methods are:
• The causal knowledge search method.
• The physical system mathematical model method.

• The hypothesis test method.


The causal knowledge search method is based on the tracing of process malfunctions to
their source. Causality is usually represented by directed graphs (digraphs), the nodes of
which represent state variables, alarm conditions or failure origins, and the branches
(edges) represent the influences between nodes. The digraph can include (besides the
positive (+) or negative (-) influences) the intensity of the effect, the time delays and
probabilities of fault propagation.
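The following Python sketch suggests the search involved: an observed deviation (+1 high, -1 low) is traced backwards through a signed digraph to nodes with no parents, the candidate failure origins. The three-variable process and its influence signs are hypothetical.

# A sketch of causal search over a signed digraph.
import collections

# EDGES[x] lists (y, sign): variable or fault x influences variable y.
EDGES = {
    "valve_fault": [("flow", -1)],
    "flow":        [("level", +1), ("pressure", +1)],
    "leak":        [("level", -1)],
}

REVERSE = collections.defaultdict(list)      # reversed digraph for back-tracing
for src, outs in EDGES.items():
    for dst, sign in outs:
        REVERSE[dst].append((src, sign))

def root_causes(alarmed_var, deviation):
    """Trace a deviation upstream to nodes with no parents (failure origins)."""
    causes, stack = [], [(alarmed_var, deviation)]
    while stack:
        var, dev = stack.pop()
        parents = REVERSE.get(var, [])
        if not parents:
            causes.append((var, dev))        # a candidate failure origin
        for parent, sign in parents:
            stack.append((parent, dev * sign))   # propagate the sign backwards
    return causes

print(root_causes("level", -1))   # low level <- leak, or low flow <- valve fault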
Since the space of possible hypotheses can be very large if multiple disorders can be
present simultaneously, some means are required to focus an expert system's attention on
those hypotheses that are most likely to be valid. A domain-independent algorithm is
proposed by Peng and Reggia, (1987), that uses symbolic causal knowledge and numeric
probabilistic knowledge to generate and evaluate plausible hypotheses during diagnostic
problem solving with multiple simultaneous disorders.
The use of causal knowledge for diagnosis has been more popular in applications such as
electronics and mechanics because the problems and mechanisms there are better defined
(Fink and Lusth, 1987).
Hudlicka and Lesser, (1987), integrated a number of known techniques (diagnostic
paths, simulation, qualitative reasoning, constraint networks) and described two new
ones (comparative reasoning and the use of under-constrained abstracted objects) in an
attempt to solve, using a causal model of the system, the problems encountered in repre-
senting and reasoning about problem-solving system behaviour. The diagnosis system
selects its own "correet behavior criteria" from objects within the problem-solving sys-
tem which did achieve some desired situation.
The physical system mathematical model method relies on redundancy between process
measurements and consists of an analytic problem solution, a process knowledge base, a
knowledge acquisition component and a rule-based inference mechanism. The analytic
problem solution uses a physico-mathematical model of the process through which
process state or parameter estimation can be performed. Detection of changes in process
variables or parameters are then considered to be symptoms of process faults.
Classification of the changes which may occur in the process state can be performed. The
process knowledge base is comprised of analytical knowledge in the form of process
models, and heuristic knowledge in the form of fault and/or event trees and fault
statistics (Contini, 1987). In the phase of knowledge acquisition, the process specific
knowledge, ego theoretical process models, classification of changes, normal behavior,
event trees and fault trees, is compiled. The inference mechanism performs the fault
diagnosis, based on the observed symptoms, the fault and/or event trees, fault
probabilities and the process history (Pouliezos and Stavrakakis (1987), Frank (1990),
van Soest et al. (1990), Isermann and Freyermuth (1991), Rhodes and Karakoulas
(1991), Gertler and Anderson (1992)).
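A minimal sketch of the residual idea follows, under the assumption of a known first-order discrete model and an invented additive fault; the residual between measurement and model prediction is smoothed over a short window and thresholded before being declared a symptom for the rule-based inference.

# A sketch of model-based (analytical redundancy) fault detection.
import numpy as np

def simulate_model(u, a=0.9, b=0.1):
    """Nominal discrete model y[k+1] = a*y[k] + b*u[k], assumed known."""
    y = np.zeros(len(u))
    for k in range(len(u) - 1):
        y[k + 1] = a * y[k] + b * u[k]
    return y

def detect(y_measured, u, threshold=0.05, window=5):
    residual = y_measured - simulate_model(u)
    # Declare a symptom only when the windowed mean residual exceeds the
    # threshold (a crude protection against measurement noise).
    smoothed = np.convolve(np.abs(residual), np.ones(window) / window, "same")
    return np.where(smoothed > threshold)[0]     # flagged sample indices

u = np.ones(100)
y = simulate_model(u)
y[60:] += 0.2               # inject an additive fault at sample k = 60
print(detect(y, u)[:5])     # first flagged samples, near the fault onset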

The hypothesis formulation/hypothesis testing method follows the usual human diagnos-
tics path, i.e. a cause for a system malfunction (upset) is postulated, the symptoms of the
postulated fault are determined, and the result is compared with the process observables.
Of course the search for the location of a fault can be narrowed by using appropriate
heuristics. Hypothesis testing requires qualitative simulation of the effects of the postu-
lated malfunctions. Qualitative (non numerical) simulation requires prediction of the
direction of deviation of measured variables of the process as a result of faults.
Qualitative simulation models need to be enriched with suitable heuristics and
precedence rules in order to be able to resolve competing causal influences on the same
process variable.
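The comparison step can be suggested by the following sketch: each fault hypothesis carries a qualitatively predicted sign pattern over the measured variables (+1 high, -1 low, 0 unchanged), and hypotheses are ranked by how many observed deviations they explain. The variables, fault labels and sign patterns are invented.

# A sketch of hypothesis testing by qualitative (sign) simulation.
PREDICTIONS = {
    "heater_failure":  {"temperature": -1, "pressure": -1, "level": 0},
    "inflow_blockage": {"temperature": 0,  "pressure": -1, "level": -1},
}

def test_hypotheses(observed):
    """Rank postulated faults by the fraction of observations they explain."""
    scores = {}
    for fault, predicted in PREDICTIONS.items():
        matches = sum(predicted[v] == d for v, d in observed.items())
        scores[fault] = matches / len(observed)
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(test_hypotheses({"temperature": 0, "pressure": -1, "level": -1}))
# -> inflow_blockage explains all three observations, heater_failure only one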
A means of interpreting observations made of a physical system across time, in terms of
qualitative physics theory, is described by Forbus, (1987), and Bandekar, (1989). The
theory described is ontology-independent as well as domain-independent. This means
that it only requires a qualitative description of the domain capable of supporting
envisioning, and domain-specific techniques for providing an initial qualitative description
of numerical measurements, even when noisy data have to be handled. For the diagnosis
problem this theory provides a general method for testing fault hypotheses using an
analogous AI model, to test if it actually explains the observed behavior. Trave-
Massuyes et al., (1990), present a qualitative calculus (qualitative equations setting, the
orders of magnitude qualitative algebras, qualitative equations solving techniques) which
is a key point in qualitative simulation for automatic intelligent fault diagnosis and
supervision. Although qualitative calculus gives rise to numerous problems, this paper
and its authors' previous and current work provide a quite complete resolution scheme.
Ontological analysis approach. Another general knowledge engineering methodology
which is based on the deep knowledge approach is the so called ontological structure
analysis. Ontological analysis proceeds in a step-by-step articulation of the knowledge
structures needed to perform a task by following the objects and relationships that occur
in the task domain itself. The application of ontological analysis to practical
troubleshooting/diagnosis problems can be done very effectively by decomposing the
notion of ontological structure into three levels, namely:
1. Static ontology: Definition of actual physical objects in the problem domain and
their properties and relationships.
2. Dynamic ontology: Definition of the state space in which the problem solving must
occur and the actions that transform the problem from one state to another.
3. Epistemic ontology: Definition of the form of constraints and heuristics that control
navigation in the state space.
For example, in the case of electronic instrument troubleshooting, static ontology
encompasses the components and knobs of the instrument which are connected
electronically by nodes, and are grouped in blocks. Dynamic ontology defines the states
of the problem which consist of belief states and instrument states. A belief state consists
of diagnostic beliefs about the diagnostic condition of each module (component, knob or

block), which have associated justifications. An instrument state consists of knob set-
tings and signal inputs that are used to stimulate the instrument under test. Transfor-
mations are measurements of signals and electrical parameters. Measurements are
grouped into tests, and tests are grouped into strategies. Each test and strategy has
implications, i.e. diagnostic beliefs implied by the test results. Finally, epistemic ontology
defines appropriate types of knowledge that make it possible to choose effective
transformations and thus navigate the problem state space in a reasonable amount of
time. Most experts use the hypothesis formulation/hypothesis testing method presented
previously, i.e. they diagnose an instrument by having (or formulating) a set of
hypotheses about which modules might be good and bad. In order to test such
hypotheses, they have heuristic diagnostic strategies, which relate each module to the
method by which it may be tested.
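The three levels can be made concrete with a small Python sketch; the module names, belief values and test-selection heuristic below are hypothetical and are not taken from the cited work.

# An illustrative encoding of the three ontological levels.
from dataclasses import dataclass

@dataclass
class StaticOntology:              # level 1: physical objects and relations
    components: list
    connections: list              # (component_a, via_node, component_b)

@dataclass
class ProblemState:                # level 2: belief state + instrument state
    beliefs: dict                  # module -> "good" | "bad" | "unknown"
    knob_settings: dict
    def apply_measurement(self, module, verdict):
        # a transformation from one problem state to another
        return ProblemState({**self.beliefs, module: verdict}, self.knob_settings)

def choose_test(state):            # level 3: epistemic heuristic for navigation
    """Prefer testing the first module whose condition is still unknown."""
    unknown = [m for m, b in state.beliefs.items() if b == "unknown"]
    return unknown[0] if unknown else None

instrument = StaticOntology(["amplifier", "mixer"], [("amplifier", "n3", "mixer")])
s = ProblemState({m: "unknown" for m in instrument.components}, {"gain": 1})
print(choose_test(s))                                          # -> amplifier
print(choose_test(s.apply_measurement("amplifier", "good")))   # -> mixer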
Formal tools exist for defining and communicating the ontological analysis. These tools
are illustrated in Freiling et al., (1986), with an example domain equation language called
SPOONS (SPecification Of ONtological Structure) and the presented results are based
on the knowledge-based system HIPE (Hierarchical Inference Processing Engine).
Hybrid reasoning.
Shallow reasoning has been widely used but, because of the disadvantages mentioned above, deep reasoning has emerged. However, since it requires more search time and thus shows an undesirable speed of reasoning for some complex systems, deep reasoning alone is not satisfactory either. Hence, hybrid reasoning, combining these two approaches, has been attempted in order to perform the diagnostic process efficiently (from shallow reasoning) and effectively (from deep reasoning). In other words, hybrid reasoning utilizes both deep and shallow reasoning methodologies in an attempt to take advantage of the strengths of each. Two directions exist:
1. Deep first, then shallow (D-S).
2. Shallow first, then deep (S-D).
Examples of an S-D approach are CHECK (Combining Heuristic and Causal
Knowledge) by Torasso and Console, (1989), and IDM (Integrated Diagnostic Model)
by Fink and Lusth, (1987). An example of a D-S approach is ISA (Integrated Status
Assessment) developed by Marsh, (1988).
The current trend in diagnosis is toward hybrid reasoning. Yet, there has been no comparative study of the various types of reasoning. However, when the system to be diagnosed is relatively small (this also implies a small number of rules), an S-D approach seems to be preferred. For a large-scale system like a manufacturing plant, a D-S approach is often chosen. The D-S type hybrid reasoning diagnosis model is analytically described in Appendix 4.A for the interested reader.
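As an illustration of the S-D direction, the following is a minimal Python sketch, not drawn from CHECK, IDM or ISA: shallow experiential rules are tried first, and a (stubbed) deep model-based search is invoked only when they fail. All rule contents, thresholds and function names are hypothetical.

shallow_rules = [
    # (symptom predicate, diagnosis) pairs: fast, experiential knowledge.
    (lambda s: s["temp"] > 90 and s["flow"] < 2.0, "pump cavitation"),
]

def deep_diagnosis(symptoms):
    """Placeholder for model-based (deep) reasoning: search a structural/
    behavioral model of the plant for components whose failure would
    explain the symptoms. Slower, but covers unanticipated faults."""
    return "candidate faults from model-based search"

def diagnose(symptoms):
    # Shallow pass first: cheap rules cover the common, anticipated cases.
    for condition, diagnosis in shallow_rules:
        if condition(symptoms):
            return diagnosis
    # Deep pass: only invoked when no shallow rule covers the case.
    return deep_diagnosis(symptoms)

print(diagnose({"temp": 95, "flow": 1.2}))   # matched by a shallow rule
print(diagnose({"temp": 60, "flow": 0.1}))   # falls through to deep reasoning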
Attribute grammar approach.


The discussion on ES-based fault diagnosis will be closed with a brief discussion of the attribute grammar approach to building process fault diagnosis tools. By employing the attribute grammar model for knowledge representation, both declarative (factual) knowledge and inferential (procedural) knowledge can be combined in a single tool. Since there exist many implementations of compilers and interpreters for processing attribute grammars, this approach holds much promise for knowledge engineers and fault diagnosis experts. In cases where knowledge can be expressed in the form of logic rules, an attribute grammar interpreter is written in the form of logic rules in a PROLOG-like way, i.e.,

Ro(tOI,t02 ,···,tOkO ) is true if

R1(tll,tI2,.·.,t1k1 ) is true and

where tij; 0 ~ i me , i ~ j ~ kj, I ~ e ~ n, is a constant or a variable, and e is the


~

rule number (assuming that the rules are numbered from I to n). If m" =0, then the rule
reduces to a fact.
A syntax rule corresponding to the above logic rule for a homologous attribute grammar (i.e. a grammar which, when processed by its interpreter, will give the same results as those coming from the successful application of the above logic rules) has the form,

<R_0> ::= <R_1> <R_2> ... <R_m_e> |-|

where the combination of the last two characters means the end of the syntax rule.
The parser of the homologous attribute grammar has the following features:
• No terminal symbols are used (i.e. it is degenerate).
• An extended stack is used for saving the attribute values as well.
• Calls to the attribute evaluator are included.
• A meta-variable, named FLAG, is used to show variable matching (when a value mismatch occurs, FLAG takes the value false).
A false value of FLAG results in a backtracking of the parsing process. The semantic rules that perform the unification of variables can be written in a straightforward way. The interpreter includes a facility for providing "why" and "how" explanations.
To deal with inexact knowledge (i.e. uncertain facts and rules or imprecise items of evidence) each rule is assigned a certainty measure which can be the conditional probability
of the validity of its conclusion, given the corresponding premises (de Kleer, 1990). Each of the premises is assigned a posterior probability evaluated from previous inference. The updating of these posterior probabilities can be done using Bayes' inference rule. De Kleer, (1990), applied a similar procedure for the construction of a diagnostic engine in order to identify automatically the faulty components of a malfunctioning device in the fewest number of measurements. A minimum entropy technique is used (Pandelidis, 1990) to select the next best measurement to be used for the diagnosis procedure. Alternatively, one can use an upper or lower bound of the validity probability of each item, i.e. the so called possibility and necessity measures, or the well known Shortliffe's certainty factors employed in the expert system MYCIN, mentioned previously.
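To make the Bayesian updating step concrete, the following is a minimal Python sketch, not taken from the cited works: the posterior probability of a fault hypothesis is recomputed after each item of evidence, with all prior and likelihood values purely illustrative.

def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior probability of hypothesis h after observing one item
    of evidence, via Bayes' rule."""
    num = p_evidence_given_h * prior
    den = num + p_evidence_given_not_h * (1.0 - prior)
    return num / den

# Hypothetical numbers: a fault hypothesis with prior 0.10, and a test
# that fires in 80% of faulty cases but also in 15% of healthy ones.
p = 0.10
p = bayes_update(p, 0.80, 0.15)   # ~0.372 after the first positive test
p = bayes_update(p, 0.80, 0.15)   # ~0.760 after a second positive test
print(round(p, 3))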

4.2.1.4 High-speed implementations of rule-based diagnostic systems

In the previous sections of this chapter, rule-based knowledge systems for modeling intelligent behavior and building expert systems for automatic process fault monitoring are described.
However, most rule-based programs are extremely computationally intensive and run
quite slowly. The slow speed of execution has prohibited the use of rule-based
knowledge systems in domains requiring high performance and real-time response such
as in real-time process fault diagnosis. In this section various methods for speeding up
the execution of rule-based knowledge systems are explored. In particular, the role of
parallelism in high-speed execution of rule-based knowledge systems is examined and the
architectural issues in the design of computers for rule-based systems are studied. It is
shown that contrary to initial expectations, the speed-up that can be obtained from paral-
lelism is quite limited, only about tenfold. The reasons for this small speed-up are:
1. The small number of rules relevant to each change in data memory.
2. The large variation in the processing requirements of relevant rules; and
3. The small number of changes made to data memory between synchronization steps.
Furthermore, in order to obtain this limited factor of tenfold speed-up, it is necessary to
exploit parallelism at a very fine granularity. A suitable architecture to exploit such fine-
grain parallelism is a shared-memory multiprocessor with 32-64 processors. Using such a
multiprocessor, it is possible to obtain execution speeds of about 3800 rule-firings/sec
(Gupta et al., 1989).
A rule-based knowledge system is composed of a set of IF-THEN rules (also called productions) that make up the rule memory, and a database of assertions called the working memory. The assertions in the working memory are called working memory elements. Each rule consists of a conjunction of condition elements corresponding to the IF part of the rule (also called the left-hand side of the rule), and a set of actions corresponding to the THEN part of the rule (also called the right-hand side of the rule).
The actions associated with a rule can add, remove, or modify working memory elements, or perform input-output.
The rule interpreter is the underlying mechanism that determines the set of satisfied rules and controls the execution of the rule-based knowledge system. The interpreter executes a rule-based program by performing the following recognize-act cycle:
• Match: In this first phase, the left-hand sides of all rules are matched against the contents of working memory. As a result a conflict set is obtained, which consists of instantiations of all satisfied rules. An instantiation of a rule is an ordered list of working memory elements that satisfies the left-hand side of the rule.
• Conflict resolution: In this second phase, one of the rule instantiations in the conflict set is chosen for execution. If no rules are satisfied, the interpreter halts.
• Act: In this third phase, the actions of the rule selected in the conflict-resolution phase are executed. These actions may change the contents of working memory. At the end of this phase, the first phase is executed again.
The recognize-act cycle forms the basic control structure in rule-based programs. During the match phase, the knowledge of the program (represented by the rules) is tested for relevance against the existing problem state (represented by the working memory). During the conflict-resolution phase, the most promising piece of knowledge that is relevant is selected. During the act phase, the action recommended by the selected rule is applied to the existing problem state, resulting in a new problem state.
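A minimal Python sketch of this cycle is given below; the rule representation (a condition predicate plus an action procedure) and the pick-the-first conflict-resolution strategy are simplifications for illustration, not the strategies used by production interpreters such as OPS5.

def interpret(rules, working_memory, max_cycles=100):
    for _ in range(max_cycles):
        # Match: collect instantiations of all satisfied rules.
        conflict_set = [(rule, wme) for rule in rules
                        for wme in list(working_memory)
                        if rule["condition"](wme)]
        # Conflict resolution: halt if nothing matched, else pick one.
        if not conflict_set:
            return working_memory
        rule, wme = conflict_set[0]
        # Act: the selected rule's action changes working memory.
        rule["action"](working_memory, wme)
    return working_memory

# Toy rule: rewrite every ("raw", n) element into a ("done", n) element.
rules = [{"condition": lambda w: w[0] == "raw",
          "action": lambda wm, w: (wm.remove(w), wm.add(("done", w[1])))}]
print(interpret(rules, {("raw", 1), ("raw", 2)}))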
Parallelism can be exploited while performing each of the above three phases. It is further possible to overlap the processing performed within the match phase and the conflict-resolution phase of the same recognize-act cycle, and that within the act phase of one cycle and the match phase of the next cycle. However, it is not possible to overlap the processing within the conflict-resolution phase and the subsequent act phase, because the conflict-resolution must finish completely before the next rule to fire can be determined and its right-hand side evaluated. Thus, the possible sources of speed-up are:
1. Parallelism within the match phase.
2. Parallelism within the conflict-resolution phase.
3. Parallelism within the act phase.
4. Overlap between the match phase and the conflict-resolution phase of the same cycle.
5. Overlap between the act phase of one cycle and the match phase of the next cycle.
Parallelism within the match phase.
In the following, several ways in which parallelism may be used to speed up the match phase are discussed.
Rule-level parallelism. When using rule-level parallelism, the rules in a program are divided into several partitions and the match for each of the partitions is performed in parallel. In the extreme case, the number of partitions equals the number of rules in the
program, so that the match for each rule in the program is performed in parallel. One of the main advantages of using rule-level parallelism is that no communication is required between the processes that perform a match for different rules or different partitions. Contrary to all expectations, Gupta et al. (1989), show that the true speed-up expected from rule-level parallelism is really quite small, only about twofold. Some of the reasons for this are given below:
• Simulations show that the average number of rules affected per change in working memory is around 28. (A rule is said to be affected by a change in working memory if the new working memory element matches at least one of the condition elements of that rule). In most matches, determining the set of affected rules is much faster than processing the state changes associated with the affected rules. Thus the number of affected rules bounds the amount of speed-up that can be achieved using rule-level parallelism.
• The speed-up obtainable from rule-level parallelism is further reduced by the variance in the processing time required by the affected rules. The maximum speed-up that can be obtained is proportional to the ratio tavg/tmax, where tavg is the average time taken by an affected rule to finish match and tmax is the maximum time taken by any affected rule to finish match. The parallelism is inversely proportional to tmax because the next recognize-act cycle cannot begin until all rules have finished match. Note that nominal speed-up (or concurrency) is defined to be the average number of processors that are kept busy in the parallel implementation. Nominal speed-up is to be contrasted against true speed-up, which refers to the speed-up with respect to the highest performance uniprocessor implementation, assuming that the uniprocessor is as powerful as the individual nodes of the parallel processor. True speed-up is usually less than the nominal speed-up because some of the resources in a parallel implementation are devoted to synchronizing the parallel processes, scheduling the parallel processes, recomputing some data that are too expensive to be communicated, etc.
• The third factor that influences the speed-up is the loss of sharing in the data flow network when rule-level parallelism is used. The loss of sharing happens because operations that would have been performed only once for similar rules are now performed independently for such rules, since the rules are evaluated on different processors.
• The fourth factor that influences the speed-up is the overhead of mapping the parallel algorithm onto a parallel hardware architecture. The overheads may take the form of memory-contention costs, synchronization costs or task-scheduling costs.
Some implementation issues associated with using rule-level parallelism are now discussed. The first point that emerges from the previous discussion is that it is not advisable to allocate one processor per rule for performing match. If this is done, most of the processors will be idle most of the time and the hardware utilization will be poor. When
using only a small number of processors, two alternative mapping strategies can be considered. The first is to divide the rule-based program into several partitions so that the processing required by rules in each partition is almost the same, and then allocate one processor for each partition. The second strategy is to have a task queue shared by all processors in which entries for all rules requiring processing are placed. Whenever a processor finishes processing one rule, it gets the next rule that needs processing from the task queue. Some advantages and disadvantages of these two strategies are given below.
The first strategy is suitable for both shared memory multiprocessors and non-shared memory multicomputers, since little or no communication is required between processors. The main difficulty, however, is to find partitions of the rule-based system that require the same amount of processing. Note that even if one finds partitions with only one affected rule per partition, the variance in the cost of processing the affected rule still destroys most of the speed-up. The task of partitioning is also difficult because good models are not available for estimating the processing cost of rules, and also because the processing cost of rules varies over time. A discussion of the various issues involved in the partitioning task is presented in Carriero and Gelernter, (1989).
The second strategy is suitable only for shared memory architectures, because it requires that each processor has access to the code and state of all rules in the program (while it is possible to replicate the code in the local memories of all the processors, it is not possible to do so economically for the dynamically changing state associated with the rules). Since the tasks are allocated dynamically to the processors, this strategy has the advantage that the load distribution problem is not present. Another advantage of this strategy is that it extends very well to finer granularities of parallelism. However, this strategy loses some performance due to the synchronization, scheduling, and memory contention overheads.
Node parallelism. When node parallelism is used, activations of different multi-input
nodes in the data-flow network are evaluated in parallel.
It is important to note that node parallelism subsumes rule-level parallelism, in that node parallelism has a finer grain than rule-level parallelism. Thus, using node parallelism, both activations of two-input nodes belonging to different rules (corresponding to rule-level parallelism) and activations of two-input nodes belonging to the same rule (resulting in the extra parallelism) are processed in parallel.
The main reason for going to this finer granularity of parallelism is to reduce the value of tmax, the maximum time taken by any affected rule to finish match. This decreased granularity of parallelism, however, leads to increased communication requirements between the processes evaluating the nodes. When using node parallelism, a process must communicate the results of a successful match to the successors of that two-input node. However, no communication is necessary if the match fails. To evaluate the effectiveness of exploiting node parallelism, it is necessary to weigh the advantages of reducing tmax
against the cost of increased communication and the associated limitation on feasible
hardware architectures.
Another advantage of using node parallelism is that some of the sharing lost when using
rule-level parallelism is recovered. If two rules need a node with the same functionality,
it is possible to keep only one copy of the node and to evaluate it only once, since it is no
longer necessary to have separate nodes for different rules. The gain due to the increased
amount of sharing is a factor of 1.3, which is quite significant.
Action parallelism. Usually when a rule fires, it makes several changes in the working
memory. Processing these changes concurrently, instead of sequentially, leads to in-
creased speed-up from rule, node, and intranode parallelism. This source of parallelism is
named action parallelism, since matches for multiple actions in the right-hand side of the
rule are being processed in parallel.
Data parallelism. A still finer grain of parallelism may be exploited by performing the
processing required by each individual node activation in parallel. This task can be
speeded up using data parallelism (Carriero and Gelernter, 1989). Such parallelism is
expected to reduce tmax even further, and thus help increase the overall speed-up. The disadvantage of exploiting data parallelism on conventional shared-memory multiprocessors is that the overhead of scheduling and synchronizing these very fine grained tasks (a few instructions) nullifies the advantages. However, exploiting data parallelism is not as hard on highly parallel machines.
Parallelism in conflict resolution.
The conflict-resolution phase is not expected to be a bottleneck in the near future. The
reasons for this are:
• Current rule-based interpreters spend only about 5 percent of their execution time
on conflict-resolution. Thus the match phase has to be speeded up considerably be-
fore conflict-resolution becomes a bottleneck.
• In rule-level and node parallelism, the matches for the affected rules finish at differ-
ent times because of the variation in the processing required by the affected rules.
Thus many changes to the conflict set are available to the conflict-resolution proc-
ess while some rules are still performing match. Thus much of the conflict-resolu-
tion time can be overlapped with the match time, reducing the chances of conflict-
resolution becoming a bottleneck.
If the conflict-resolution does become a bottleneck in the future, there are several
strategies for avoiding it. For example, to begin the next execution cycle, it is not neces-
sary to perform conflict-resolution for the current changes to completion. It is only nec-
essary to compare each current change to the highest priority rule instantiation so far.
Once the highest priority instantiation is selected, the next execution cycle can begin.
The complete sorting of the rule instantiations can be overlapped with the match phase
for the next cycle. Hardware priority queues provide another strategy.
Parallelism in RHS evaluation.


The RHS-evaluation step, like the conflict-resolution phase, takes only about 5 percent of the total time for the current rule-based systems. When many rules are allowed to fire
in parallel, it is quite straightforward to evaluate their right-hand sides in parallel. Even
when the right-hand side of only a single rule is to be evaluated, it is possible to overlap
some of the input/output with the match for the next execution cycle. Also when the
right-hand side results in several changes in the working memory, the match phase can
begin as soon as the first change to working memory is determined.
It was stated before that the conflict-resolution phase must finish completely before the right-hand side can be evaluated (until that time it is not known which rule will fire next). However, if one takes a speculative approach, it is possible to overlap the conflict resolution and the RHS evaluation step. The solution is to make an intelligent guess about which rule is going to fire next. For example, one may guess that the second best rule from the previous conflict resolution phase is the rule that is going to fire next. After making the guess, one can go ahead and evaluate the RHS of that rule; that is, determine what changes are going to be made in the working memory. Actually, the working memory is not modified at this point. When the winning rule is found out at the end of the conflict resolution phase, if the guess was correct, the RHS evaluation step is already done. If the guess was wrong, then some processing resources have only been wasted, which is not too bad, especially if they were idle in any case.
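The following is a minimal sketch of such speculative evaluation, assuming hypothetical rule names and a toy priority function; the precomputed changes are kept aside and committed only if the guess matches the actual winner.

def evaluate_rhs(rule):
    """Compute the changes a rule's RHS would make, without applying them."""
    return [("add", f"result-of-{rule}")]

def speculative_cycle(conflict_set, previous_runner_up):
    guess = previous_runner_up                # e.g. second best last cycle
    precomputed = evaluate_rhs(guess)         # done on otherwise idle CPUs
    winner = max(conflict_set, key=priority)  # actual conflict resolution
    if winner == guess:
        changes = precomputed                 # speculation paid off
    else:
        changes = evaluate_rhs(winner)        # only some work was wasted
    return winner, changes

priority = {"R3": 5, "R9": 2}.get             # illustrative priorities
print(speculative_cycle(["R3", "R9"], previous_runner_up="R3"))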
Application parallelism.
There is extra speed-up to be gained from application parallelism, where a number of cooperating, but loosely coupled, rule-based tasks execute in parallel. The cooperating tasks may arise in the context of search, where there are a number of paths to be explored, and it is possible to explore each of the paths in parallel (similar to OR-parallelism in logic programs). Alternatively, the cooperating tasks may arise in the context where there are a number of semi-independent tasks, all of which have to be performed, and they can be performed in parallel (similar to AND-parallelism in logic programs). It is also possible to have cooperating tasks that have a producer-consumer relationship among them (similar to stream-parallelism in logic programs). The maximum speed-up that can be obtained from application parallelism is equal to the number of cooperating tasks, which can be significant. Unfortunately, most current rule-based systems do not exploit such parallelism, because:
1. The rule-based programs were expected to run on a uniprocessor, where no advantage is to be gained from having several parallel tasks, and
2. Current rule-based languages do not provide the features to write multiple cooperating rule-based tasks easily.
Gupta et al., (1989), describe the architecture of the Production System Machine (PSM), a hardware structure suitable for executing in parallel rule-based systems of a dataflow-like nature. The performance that would be obtained as a result of parallel
implementation of rule-based knowledge systems was evaluated through a large number of simulations. They also compared this performance rating to that expected from other proposed architectures for parallel implementation of rule-based knowledge systems. Very useful information for eventual application problems concerning parallel implementation of rule-based diagnosis systems can be found there.

4.2.1.5 Validating expert systems

Like typical software development, expert system development has a life cycle. Validation is formally included in most expert system development frameworks, in the form of phased or task-stepwise decomposition of the complete development process. The term validation is often used inconsistently and is frequently confused with evaluation. Validation is defined here to be distinct from evaluation.
Validation is the process of determining that an expert system accurately represents an expert's knowledge in a particular problem domain. This definition of validation focuses on the expert system and the expert. In contrast, evaluation is defined as the process of examining an expert system's ability to solve real-world problems in a particular problem domain. Evaluation focuses on the expert system and the real world. Grogono et al., (1991), outline some of the issues involved in evaluating expert systems and cite almost 200 significant papers on this topic.
Validation has two dimensions, verification and substantiation. Verification is the
authentication that the formulated problem contains the actual problem in its entirety and
is sufficiently well structured to permit the derivation of a sufficiently credible solution.
Substantiation is defined as the demonstration that a computer model within its domain
of applicability possesses a satisfactory range of accuracy consistent with the intended
application of the model.
Among the many concerns expressed about developing and validating expert systems are
the following:
• What should be validated?
• How is it validated?
• What are the procedures for validation?
• How is bias controlled?
• How is validation integrated into development?
• How are costs controlled?
These concerns are particularly relevant when developing demonstration prototypes, where cost and time resources are constrained. In these situations, it is easy to minimize or overlook validation. All too often validation becomes highly informal and, as a result, does not become an integral part of development. O'Leary et al., (1990), extending Buchanan's and previous testing tasks, presented a specific formal validation para-
digm for prototype expert system development within time and cost constraints. It incor-
porates many of the descriptive elements addressed by others, and explicitly incorporates
validation into the development life-cycle approach for prototype development.
The validation process involves verification that the model sufficiently addresses the real
problem in its entirety, and substantiation that the model possesses a sufficient range of
accuracy. Verification and substantiation are evaluated through a three-stage procedure: ensuring face validity, establishing subsystem validity, and comparing input-output transformations.
These stages and processes are related by the interaction of the knowledge engineering team, the expert(s), the prototypical expert system and the real world. Central to the
validation process are the expert(s) and the knowledge engineering team, consisting of at
least two members. One member, the system designer, has primary responsibility for
knowledge acquisition and encoding the prototypical expert system. The other member,
the third-party validator, has primary responsibility for validation.
The development process begins as the system designer interacts with the expert to de-
velop a view of the expert system. (S)he then creates a tangible representation of this
view in the form of an initial prototype (Buchanan's identification, conceptualization,
formalization, and implementation tasks).
During formal validation (Buchanan's testing task), the third-party validator, the system designer and the expert work closely together. The validator examines the prototype to ensure that the system designer's view and the expert's view are consistently represented and that the prototype is able to respond to domain-specific real world situations. This examination iterates through three stages: face validity, subsystem validity, and input-output comparison. As the team members find inconsistencies or unacceptable limitations in the prototype, they make system reformulations, redesigns, and refinements, and revisit appropriate tasks. In this manner, validation becomes the driver as the initial prototype evolves into a demonstration prototype.
This paradigm is especially relevant to expert system endeavors where demonstrating
feasibility and potential performance is necessary or appropriate before making a sub-
stantial resource investment. As organizations consider integrating expert system tech-
nology into their repertoire of computer-based applications, it is important that experi-
ence precede development work.

4.2.2 Event-based architecture for real-time fault diagnosis

A new class of diagnostic systems is emerging from recent programs directed toward vehicle operator aids for fighter aircraft, submarines and helicopters. These systems are neither static off-line aids nor real-time controllers. Instead they are expert control advisory systems which span the time scales of both regimes. These systems interface with
controllers to interpret the error codes and to conduct tests and implement reconfigura-
tions. On the other hand, these systems also interact with the vehicle operator to priori-
tize their activity consistent with the operator's goals and to recommend diagnos-
tic/emergency procedures. The extension of the applicability of these methods to the
industrial fault diagnosis practice is straightforward.
System status (SS) is the function responsible for in-flight diagnosis of aircraft equipment failures, and SS examples will be used here to describe the requirements for diagnosis in expert control advisory systems (Pomeroy et al., (1990), Passino and Antsaklis, (1988)).
The diagnostic architecture developed for SS integrates a number of separate technolo-
gies to achieve coverage of all the requirements. This architecture is a fusion of statistical
fault detection techniques like Kalman filters (see Chapters 2 and 3) with artificial
intelligence techniques such as rule-based logic, blackboards, causal nets and model-
based reasoning (see Section 4.2.1). This approach exploits the strengths of each
technique and provides a mechanism for automated reasoning using both quantitative
and qualitative information. Furthermore, the concept of an "event" has been introduced
to track multiple faults and maintain diagnostic continuity through priority interrupts
from the SS controller. A specific application of this approach to jet engine diagnosis is
described by Pomeroy et al. (1990).
Levels of architecture. In the real-time environment of system status any diagnostic
activity must be structured so that it can be interrupted and restarted as SS control reacts
to new events and changing priorities. The diagnostic process must also provide answers
with varying degrees of resolution depending upon the time available for processing.
Both of these requirements are met by dividing the diagnostic process into four levels:
1. Monitor for abnormal data.
2. Generate hypotheses that might explain the abnormal data.
3. Evaluate the available data to confirm or rule out the hypothesized faults; if more data are required, request tests to be done.
4. Execute the tests, and monitor for the results. Tests may consist of running models
of the systems, initiating non-intrusive built-in tests (BITs) in the systems, or
requesting operator approval for intrusive or operator-initiated tests.
These levels communicate through messages as shown in fig. 4.2, and each level is a
knowledge source within the SS blackboard control scheme. While these messages pro-
vide the internal/external communication functions of diagnosis, something more is
needed to provide coordination of the multiple diagnostic processes which can occur
with overlapping time frames. This problem is solved by linking the overall diagnostic
procedure to the concept of an event.
Events. An event is triggered by a new abnormality appearing in the bus data stream. An
event includes all of the subsequent diagnostic steps leading to isolation of the fault
which caused the abnormality. A frame-based data-structure is used to track each event
and keep it untangled from other events which may be proceeding through processing at
the same time. This structure also provides a record of the event that may be useful for
post-operation maintenance.

[Figure: message flow among four knowledge sources: the Fault Monitor (consuming bus data, emitting new-data and data-abnormal* messages), the Hypothesis Generator (emitting fault-suspected* messages), the Hypothesis Evaluator (emitting eval-complete and test-requested messages) and Hypothesis Testing (emitting test-complete, fault-found* and fault-corrected* messages); hypothesis testing draws on faulted models and operator-initiated tests.]
Figure 4.2 Event based diagnostic architecture and messages.

Each event is an instance of a general event class; event frames have the following slots:
BUS DATA: a list of data samples connected with the event; this is a "snapshot" of the situation near the event, and may include later samples collected during testing.
ANOMALIES: a list of abnormal data items which triggered this event. This list is used by the Fault Monitor to suppress further data-abnormal messages once an event has been spawned; it provides a "we know about that and we're working on it" sort of behavior.
HYPOTHESES: a list of possible faults.
TESTS PENDING: a list of tests that are to be performed.
TESTS COMPLETED: a list of tests and their results.
FAULTS CONFIRMED/RULED-OUT: the hypotheses are sorted into one of these two categories.
STATUS OF EVENT: is pending until it becomes resolved or unresolved.
Diagnosis stops when there are no new hypotheses.
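A sketch of this event frame as a Python dataclass follows; the field types and the example values are assumptions made for illustration, not part of the SS design.

from dataclasses import dataclass, field

@dataclass
class Event:
    bus_data: list = field(default_factory=list)         # data snapshot
    anomalies: list = field(default_factory=list)        # triggering items
    hypotheses: list = field(default_factory=list)       # possible faults
    tests_pending: list = field(default_factory=list)
    tests_completed: list = field(default_factory=list)  # (test, result) pairs
    faults_confirmed: list = field(default_factory=list)
    faults_ruled_out: list = field(default_factory=list)
    status: str = "pending"      # becomes "resolved" or "unresolved"

ev = Event(anomalies=["EGT high"], hypotheses=["sensor drift", "bleed leak"])
ev.tests_completed.append(("sensor BIT", "pass"))
ev.faults_ruled_out.append("sensor drift")
print(ev.status, ev.hypotheses)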
Interaction with other functions. Communication between SS Diagnosis and the SS
Limits Estimation and Corrective functions is provided by the activity of a causal net-
work (see Section 4.2.1.3).
Communication with the outside world consists of the input and output streams dis-
cussed earlier in connection with fig. 4.2. All communication between the system subsys-
tems is by means of the bus data stream, which implements the following division of la-
bor between SS diagnosis and the local system diagnosis:
1. All fault detection is performed within the local systems. Detection requires con-
tinuous screening of sensor data at the sampling rate of the local controller, and
detection processes are typically included in the control loop to protect against sensor failure. Transmitting the sensor data to a central detection process in most cases
would require high bandwidth communication. Fault detection can be done more
efficiently in the local systems.
2. Isolation of faults is shared between SS and the local systems. In general, fault iso-
lation can be most efficiently done by the central diagnostic process (SS) which can
bring multiple sources of information to bear on the problem, and which can exe-
cute tests beyond the scope of the local systems.
3. On the other hand, there are classes of faults which must be isolated by the local
system in order to reconfigure quickly enough to avoid loss of control.
Thus the bus interface to SS normally reports only the results of continuously running
built-in tests (BITs), i.e. error alerts; in the case of jet engines these BITs are generated
by a Kalman filter that continuously compares the engine sensor data to outputs from an
engine model. Only when a fault occurs and SS begins isolation, does SS request access
to detailed data sampling streams.
Multiple faults that are related through a common mode can be addressed within the
event-based architecture by adding a Fault Predictor to the four functions in fig. 4.2.
Whenever a fault is found this predictor searches for common mode relations, e.g. func-
tionally connected or physically connected, and posts the names of components which may be affected, to act as a focus mechanism for the hypothesis generator.

4.2.3 Curve analysis techniques for real-time fault diagnosis

Process parameters and some process observables are gathered during the process exe-
cution, so they may be represented as discrete curves with time as the independent vari-
able. Frequently, an ideal curve can be associated with each process. This is what is
expected from a perfectly executed process.
Problems in operation are often identifiable when the input curve deviates from the ideal
curve. The deviation may be a difference in slope, amplitude, or duration between the
input and ideal curves. The difference in curves may be caused by malfunctioning equip-
ment, processing an already damaged part, or processing problems (e.g., operator er-
rors). In all of these cases, it is important to identify the problem in order to make ap-
propriate corrections. Analysis of curves is therefore an important tool for diagnosis.
Diagnostic techniques have been developed to analyze process parameters and observ-
ables that change over time (Dolins and Reese, 1992). These techniques can use specific
digital signal-processing algorithms to transform the input signal into symbolic data.
Knowledge-based diagnosis is performed on the symbolic data to determine malfunc-
tions. The monitoring system informs appropriate personnel of problems by sounding an
alarm or printing a message.
Curve analysis involves detecting and identifying deviations of an input curve from an
ideal curve. There are two alternative ways to perform analysis: one approach is to com-
pare the input curve to a set of curves that result from unsuccessful processing. Another
compares the input curve only to the ideal curve using qualitative analysis of the differ-
ences.
In the first approach, a knowledge base of abnormal curves is defined, where each curve is a characteristic representative of a particular problem. Associated with each characteristic abnormal curve is a diagnosis. If the input curve closely matches one of the abnormal curves, then the associated cause of the problem is reported. The advantage of this approach is implicit diagnosis; when the input curve matches successfully, it already has an associated diagnosis. However, this approach has two disadvantages. First, it may be difficult to build a complete knowledge base, as the anomalous curves must be defined to match closely with actual erroneous measurements. The second disadvantage is that curves are hard-wired, i.e., if the process changes, then the entire knowledge base must be changed to support the new data describing the correct and incorrect behavior of the process.
The second approach compares an input curve to the ideal curve only. Ideal and input curves are composed of regions. A region is a continuous group of data points where each point has approximately the same slope. Regions can be inclining, flat, or declining. If a process engineer is uninterested in several contiguous regions, then (s)he may elect to aggregate them into one region. In general, region divisions correspond to significant changes in the process, e.g., an abrupt change in the value of a parameter. This approach is possible if the user has some technique available to describe anomalous curves with respect to the ideal curve. Such a description should allow the user to express deviations using qualitative as well as quantitative criteria, and associate causes using symbolic processing. Suppose one uses a technique based on this approach to interpret an input curve that has a flat region with a longer duration than the ideal curve. The technique should allow him or her to describe the problem in terms of the flat region having a duration that lasts too long. Also, the user must be able to associate causes of problems with the different anomalous curves. Several diagnostic systems have been developed to diagnose manufacturing problems based on this second approach (Dolins and Reese, 1992).
Dolins and Reese (1992), developed a technique that allows manufacturing and process
engineers to describe abnormal curves. The abnormal curves are described in terms of
their differences from the ideal curve, which is the curve that best describes a process
parameter or observable after a given industrial process successfully finishes processing.
Manufacturing engineers can describe the differences symbolically, e.g., "if the first region of the curve lasts too long then the machine must have a gas leak". The user can
also input numeric values to set tolerances for determining unacceptable input curves.
The technique is independent of any industrial process, and all domain specific informa-
tion is input by the user, who is an expert in the process, to the program.
The technique has two operating modes: process definition and process monitoring/diagnosis.
In process definition, the human expert has to describe the ideal curve and anomalies. An ideal curve is initially input into the computer program, and the user manually selects regions. Each region is an interesting feature in the ideal curve which corresponds to a specific manifestation of the process.
After defining the ideal curve, the human expert describes input curve anomalies by creating a knowledge base of process-specific rules. Process-specific rules relate generic tests to input and ideal curve regions for a given process. Generic tests are built-in functions provided by the diagnostic technique that compare different symbolic attributes of input and ideal regions. For example, length is a symbolic attribute of a region, and the result of a comparison of the length of two regions can be described as either too long, too short, or okay.
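A minimal sketch of one such generic test is given below; the region representation (index pairs) and the 10% tolerance are illustrative assumptions, and the gas-leak rule merely echoes the example above.

def compare_length(input_region, ideal_region, tolerance=0.10):
    """Return 'too long', 'too short' or 'okay' for a region pair, where
    a region is a (start_index, end_index) pair in the sampled curve."""
    ideal_len = ideal_region[1] - ideal_region[0]
    input_len = input_region[1] - input_region[0]
    deviation = (input_len - ideal_len) / ideal_len
    if deviation > tolerance:
        return "too long"
    if deviation < -tolerance:
        return "too short"
    return "okay"

# Hypothetical process-specific rule: "if region 1 lasts too long,
# the machine may have a gas leak".
if compare_length((0, 140), (0, 100)) == "too long":
    print("ATTENTION: possible gas leak")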
In the process monitoring/diagnosis mode, the technique analyzes input curves in two steps: signal-symbol transformation and knowledge-based diagnosis, see fig. 4.3. The signal-symbol transformation step identifies regions of the input curve by matching all of the points of the input curve to the ideal curve. After all points are matched, the regions of the ideal curve are used to find the regions of the input curve. The second step applies the complete knowledge base of process-specific rules to compare the regions of the input curve to the regions of the ideal curve.
An expert is required to select an ideal curve for a particular process and input the curve to the program. Some machines may have idiosyncrasies that make their ideal curve differ in shape from the ideal curves generated by the other machines of the same type. In these cases, an ideal curve has to be defined for each machine.
Once the ideal curve is input, the expert divides the curve into meaningful regions, i.e., (s)he marks divisions where process-related changes occur. These regions are stored and used later in the analysis. The expert also defines a set of rules for testing input curves. Entering an ideal curve, dividing the ideal curve into regions, and defining rules are initialization tasks required of the human expert. These tasks constitute the process definition mode.
The diagnostic system can now run automatically without human intervention until an error is detected, i.e., the program can operate in a process monitoring/diagnosis mode.
The combination of signal-to-symbol transformations and rule-based reasoning has several advantages, but it is not a panacea for all diagnostic problems based on curve inter-
pretation. One disadvantage of the diagnostic technique is that two potential processing problems may have identical input curves. In this case, a better diagnosis can only be provided if more data are available and more reasoning provided. A second disadvantage of this technique is that an abnormality in a curve may mask other problems. One approach is to explain only the first difference between the ideal and input curves.

[Figure: a knowledge base of process-specific rules (Rule 1: if region 1 is 15% too long then "material is too thick", else if it is too short then "material is too thin"; Rule 2: if region 2's slope is 10% too steep then "loose seal on hose") is applied to the regions produced by the signal-symbol transformation, which matches the input curve against the ideal curve; the knowledge-based diagnosis then raises an alarm such as "ATTENTION: Rule 2: Loose seal on hose".]
Figure 4.3 Curve analysis based diagnosis combining digital signal processing and rule-based reasoning.

One advantage of this method is that the signal processing algorithm used to transform the input signal into symbolic data allows the fast analysis of regions that vary with respect to time. This is important because the durations of regions may vary due to unsuccessful processing. Regions of the input curve, with varying durations, can match directly to corresponding regions of the ideal curve. This processing allows the user to examine regions symbolically.
A second advantage is that few false alarms are generated with this method. Problems are detected by the process-specific rules, and the process engineer has complete control over the criteria for judging acceptable and unacceptable traces. False alarms can only be caused when process engineers define rules that incorrectly diagnose problems or incorrectly set thresholds.
The system's ease of use is a third advantage. Only an ideal curve and process-specific
rules have to be defined. Furthermore, few rules are needed for the system to be effec-
tive, which is unlike most knowledge-based systems. For these cases, a process engineer may only need to define a single rule to detect a commonly occurring error.
Several applications to detect manufacturing problems as soon as they occur are dis-
cussed by Dolins and Reese, (1992), to illustrate the general purpose use of this tech-
nique.

4.2.4 Real-time fault detection using Petri nets

Petri nets are a powerful tool for system description (Al-Jaar, 1990). Nevertheless up to
the present they have mainly been used only for simulation purposes. The problem of
process fault monitoring in an industrial plant can be stated as follows: The measurement
signals come from the system with a constant scanning rate. When processing these data,
a computer-based system should decide on-line in real time if an error has occurred or
not. To perform this, the computer program needs some expert knowledge about the
system (or the "total" process, which is composed of several partial processes, like big-
ger subsystems in a power plant or in a chemical factory) under consideration.
By modeling the system as a Petri net, failures with slow time constants are detectable in
real-time. Sensor or process errors which are manifested in signals related to physical
conservation quantities can be identified. After a fault is detected, a prognosis of the
future system's behavior can be provided.
The original Petri net theory only describes the causal correlation between places and
transitions within a system (an event is a consequence of another one). There were no
statements about its temporal behavior. This, however, is absolutely necessary for de-
scribing events and processes in the manufacturing area. There are different theories on how to link Petri nets with time. In manufacturing, the processes (milling, drilling, assembling, ...) are responsible for the consumption of time. This is the reason why one has to associate time with the transitions. Thus, in the case of firing a transition, the tokens of the places before the transition will be removed. When the firing time is over, they will be at the place behind the transition. An example is the time which it takes a slide to move from one limit switch to the following one.
The nets for diagnosis purposes represent, as a model, the temporal progress of a plant or machine which is to be controlled. This explains why the nets used for control form the basis for the construction of the nets for diagnosis (see fig. 4.4). The places in both
nets represent the inputs and the outputs of the PLC and, therefore, they are the interface
between control and diagnosis. Thus, in both nets the count and the indication of the
places must be identical.
The most important function of a diagnosis system is the monitoring component. Its capability defines the nature, the scope and the precision of the failure detection. Only after the detection of a failure can a specific diagnosis start. The power of monitoring is
equivalent to the quality and quantity of information from the machines. This is especially the case when sensors and actuators do not have their own information processing and, therefore, are not able to monitor themselves.
The range of methods available for monitoring depends strongly on the support for these methods that is provided directly by the model.

[Figure: a control net and the corresponding diagnosis net shown side by side. S1, S2: input signals from sensors; S3, S4: output signals to actuators; SC: secondary condition, necessary for the firing of a transition; if the transition fires, no tokens are removed from the place before the transition.]
Figure 4.4 Diagnosis of sensors.

Within the concept of monitoring, one can distinguish between a functional and a temporal comparison. The required state is determined by the interpretation of the Petri net data structure. This takes place on the facility level as well as on the station level. The actual state on the station level results from the inputs and the outputs of the PLC, which are assigned to the places of the Petri net. On the higher levels, the actual state results from the condensing of the state reports from the different PLCs which control plant components such as single machines, conveyors, robots, etc.
In order to show the different monitoring methods clearly, the following cases have to be
distinguished (see fig. 4.6):
1. The real process has kept to the required time.
2. The real process has fallen short of the required time.
3. The real process has exceeded the required time.
In the case of time monitoring, the duration of performing a real action is recorded and
compared to the required time. The required state of time is taken from the active transi-
tion in the Petri net. If more than one transition is simultaneously active, the time monitoring will be processed in a parallel way. If microcomputers are used on the facility level and PLCs on the station level, their operating system provides several timers. These timers can be used for monitoring.

[Figure: a timed transition with runtime T + Δt; the required state occurs at time T_required after the start of the transition.]
Figure 4.5 Diagnosis of sensors.

[Figure: a Petri net fragment with sensors S1 and S3, actuator S2, and a transition T1 with a required time; the markings distinguish the different monitoring states.]
Figure 4.6 Different states in the Petri net based monitoring concept.

The interpreter of the Petri nets within the diagnosis system always determines the next
required state and, by means of the time component, also the precise time of its occur-
rence one step in advance compared to the real plant. During the runtime of the system it
is important that the diagnosis program and the control program work concurrently
(Maäberg and Seifert, 1991).
In order to prevent the indication of a failure in cases of small deviations from the required time, a tolerance time is additionally implemented. A tolerance time can be individually assigned to a transition. After the required time of an active transition within the diagnosis net model has passed, the component for monitoring of the tolerance time will be activated and the required conditions of the places behind that transition will be updated. Within the monitoring of the tolerance time, a continuous comparison between the required and the actual state of the places which are directly connected to the transition is performed. If the required and the actual states of those places are equal (case 1 in fig. 4.6), a failure has not occurred, the comparison between the required and the actual state will be broken off, and control and diagnosis of the plant will be continued. If, even after the tolerance time has expired, a deviation between the required and the actual state of the places can still be determined, a failure will be reported by the time monitoring (case 3 in fig. 4.6).
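The following is a minimal Python sketch of this tolerance-time check for one active transition, assuming a polling implementation; the timings, the required marking and the plant-reading callback are all illustrative.

import time

def monitor_transition(required_time, tolerance, required_marking, read_actual):
    time.sleep(required_time)                 # wait out the nominal runtime
    deadline = time.monotonic() + tolerance
    while time.monotonic() < deadline:        # tolerance-time window
        if read_actual() == required_marking:
            return "ok"                       # case 1: required state reached
        time.sleep(0.01)
    return "failure"                          # case 3: required time exceeded

# Hypothetical plant interface: the limit switch fires 0.05 s "late",
# which is still inside the tolerance window.
t0 = time.monotonic()
actual = lambda: {"S3": 1} if time.monotonic() - t0 > 0.15 else {"S3": 0}
print(monitor_transition(0.1, 0.1, {"S3": 1}, actual))   # -> "ok"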
In order to select and define the correct reaction of the diagnosis system in case of a failure, a thorough analysis of all possible failures by the operator of the plant is necessary. The failures have to be classified according to their effects and the reactions correspondingly defined. In case of serious failures the diagnosis system must react with an emergency shutdown or emergency stop. Deviations which do not represent any failure can therefore be ignored. Alternatively, the diagnosis system can cause reactions which do not stop the plant, but make operation possible with those individual manufacturing parameters (e.g. velocity of motion, etc.) which are changed. Another possibility of the limited operation (LO) of the plant is the activation of alternative predefined control strategies, which, for example, transfer the plant into a secure condition. With this concept minor failures can be compensated or even corrected by control instructions. As soon as the classification of failures has finished and individual failures as well as combination failures have been assigned to the correct reactions, the results are made available to the diagnosis system in the form of a so-called reaction model. This model consists of IF-THEN rules. The causal correlations describe which preconditions lead to which reactions. A mechanism that handles the rules can choose, after detecting a failure, the correct rule and activate the planned reaction.
The essential module in the cooperation between the functions of monitoring, diagnosis and therapy is the mechanism that handles the rules. It not only processes the reaction model as the collection of all failure rules, but it also administers the failure vector as the interface between the above-mentioned mechanism and the module which compares the actual and the required state.
The failure vector is structurally identical with the vector of the required and actual values, and consequently identical with the structure of the IF-part of the failure rules. Each column of that part of the rule represents a vector which has as many elements as there
are states in the Petri net. This sort of data compatibility guarantees very fast processing in the mechanism of rule handling (see fig. 4.7).

[Figure: the rule-handling mechanism matches the failure vector against the IF-parts of the failure rules and activates the planned reaction, e.g. an emergency shutdown.]
Figure 4.7 Concept of the mechanism which handles the rules in Petri net based fault diagnosis.
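A minimal sketch of this matching step is given below; the binary failure vectors, the rule table and the reactions are invented for illustration and carry no plant-specific meaning.

# Failure rules whose IF-parts share the structure of the failure vector
# (one element per Petri net state), so matching is a direct comparison.
failure_rules = [
    {"if": (0, 1, 0, 1), "then": "emergency shutdown"},
    {"if": (0, 1, 0, 0), "then": "limited operation: reduce slide velocity"},
]

def react(failure_vector):
    for rule in failure_rules:
        if rule["if"] == failure_vector:
            return rule["then"]
    return "deviation ignored"    # no rule: not classified as a failure

print(react((0, 1, 0, 0)))   # -> limited operation
print(react((1, 0, 0, 0)))   # -> deviation ignored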

In the case of an emergency shutdown or emergency stop, control orders for stopping the operations are given out, the diagnosis is stopped and a report about this interruption is produced.
In case the intended reaction is a limited operation, the control orders which the operator has defined in advance in the form of a PLC program are activated. For this purpose, a message is sent from the station level to the area level, which immediately selects the corresponding Petri net for control.
Such a diagnosis system must be designed so that it can be implemented on all levels of a hierarchical control structure. This concept is supported by the capability of Petri nets to decompose complex systems. By means of Petri nets it is possible to describe a manufacturing system as a rough net in which a transition, for
example, represents an individual machine, robot or conveyor belt. The places within the net represent, according to their definition, static components such as a storage system or a buffer for workpieces. The individual transitions, however, can be specified in greater detail depending on their meaningfulness. A transition on a higher level represents an entire net on a subordinate level. With that capability it becomes possible to model individual function units, like an actuator or the movement of a slide, as well as individual places, like a limit switch. This means that on the station (PLC) level, individual units and their functional sequences can be monitored and diagnosed, whereas on the higher control levels the level of abstraction increases and, because of this, the entire plant or the cooperation between individual machines is monitored and diagnosed there.
The module which activates the reactions has the same allocation of tasks. Each control level is autonomous in its reaction behavior. In case a failure is detected on the component level, a reaction will be activated and a report will be sent to the area level computer. If the reaction is a limited operation, a request is additionally sent to the area level, which immediately transfers the replacement control program and its corresponding diagnosis program to the PLC.
An event, for example the stoppage of an individual machine by the local PLC, causes a change of the corresponding actual state on the higher level. On this level, a deviation will be recognized by the monitoring component and an appropriate reaction will be activated. Such a reaction may be the activation of a redundant machine on which the workpieces can be further processed.
The advantages of the decentralized diagnosis tasks are the relief of the area level com-
puter, the uniformity of failure recognition algorithms at all control levels and therefore
the high response speed in case of deviations.
Maäberg and Seifert, (1991), present a Computer Aided Automation Environment supporting a user during all phases of the life cycle of an automated plant, from the planning and projection phase up to the runtime of the plant. Petri nets are the integrating components. They are generated in the projection phase by translating the function charts. In the realization phase they are used for simulation and planning, and finally, in the running phase of the plant, they are used for controlling, monitoring and diagnosis of the plant.
Prock, (1991), describes a new technique of on-line fault detection in real time, in a process-independent formulation, using place/transition nets. Place/transition nets are a subclass of Petri nets. For readers who are unfamiliar with Petri net theory, some basic definitions of place/transition nets (hereafter called pt nets) are given in Appendix 4.B. This formal presentation of pt net theory will help the reader to understand the related application examples as well as to deal with related diagnostic problems from engineering practice.
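To make these definitions concrete before the application examples, the following minimal Python sketch (all class, place and transition names are ours, purely illustrative) represents a pt net by its places, transitions and marking, fires an enabled transition, and reports a deviation whenever the observed marking differs from the marking required by the net model:

# Minimal place/transition (pt) net sketch for fault monitoring.
# All names are illustrative; the formal definitions are in Appendix 4.B.

class PTNet:
    def __init__(self, places, transitions, marking):
        # transitions: {name: (input_places, output_places)}
        self.places = places
        self.transitions = transitions
        self.marking = dict(marking)          # tokens per place

    def enabled(self, t):
        inputs, _ = self.transitions[t]
        return all(self.marking[p] >= 1 for p in inputs)

    def fire(self, t):
        # Firing removes one token from each input place and
        # adds one token to each output place.
        if not self.enabled(t):
            raise ValueError(f"transition {t} is not enabled")
        inputs, outputs = self.transitions[t]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] += 1

    def deviation(self, observed_marking):
        # A fault symptom: the observed plant state differs from the
        # state required by the net model.
        return {p: (self.marking[p], observed_marking[p])
                for p in self.places
                if self.marking[p] != observed_marking[p]}

# A two-place conveyor step: a workpiece moves from buffer to machine.
net = PTNet(places=["buffer", "machine"],
            transitions={"load": (["buffer"], ["machine"])},
            marking={"buffer": 1, "machine": 0})
net.fire("load")                                    # required state after loading
print(net.deviation({"buffer": 1, "machine": 0}))   # sensors say nothing moved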
Prock, (1991), applied this method to the real time fault monitoring of a secondary cooling loop of a nuclear power plant. The detection of abnormal process behavior or measurement faults with low time constants was possible, and a prognosis of the future system behavior was given in the error case. Due to the simplicity of the fault detection criterion, no diagnosis of the failure localization could be provided. This is not a real drawback, because fast transients, as a consequence of serious faults, are well managed by the automatic plant safety systems.
The Petri net fault monitoring methods are predestined for the surveillance of complex technical systems like production lines or transport circuits (Wiele et al., 1988). Because of the lack of a diagnosis feature, this method should be considered as part of an on-line process information system which is able to trigger a (possibly off-line and thus more practical from the implementation point of view) diagnosis and interpretation unit.

4.2.5 Fuzzy logic theory in real-time process fault diagnosis

Rule-based approaches have been proposed as capable of realizing flexible diagnostic methods by paying attention to the rules describing the relationship between the causes and symptoms of failures (see the previous sections of this chapter).
It is important to recognize the fact that a large part of expertise consists of heuristic knowledge, which relies mostly upon subjective judgments and may include incomplete, ambiguous and imprecise information. As a consequence, the application of such uncertain knowledge results in inexact reasoning that the expert system has to deal with. There are numerous methods which show how the expert system copes with uncertain knowledge and inexact reasoning (de Kleer (1990), Maruyama and Takahashi (1985), Rhodes and Karakoulas (1991)).
Generally it can be said that the theory of probability is employed to solve the problems of plausible reasoning, while fuzzy set theory is used to solve the problems of approximate reasoning. Approximate reasoning, as opposed to plausible reasoning, means drawing conclusions by taking into account the linguistic consistency of the facts. In all expert systems based on symbolic manipulation and plausible reasoning, uncertainty resides in the state of one's knowledge. In expert systems based on semantic manipulation and approximate reasoning, the emphasis is on fuzziness viewed as an intrinsic property of natural language.
The elicited knowledge, which allows interpretation and diagnostics, is organized in the knowledge base as a set of fuzzy conditional statements that relate test results to conclusions about process condition or possible failures. The natural, logical way of reasoning and data reduction is applied. At the first level, a statement which estimates the possible situation based on the analytical input data is selected. Then, at the second level, adequate additional input data are collected and the situation is specified more closely.
The fuzzy conditional statements are of the form:

If A1,1 and A1,2 and ... and A1,N then B1, or
If A2,1 and A2,2 and ... and A2,N then B2, or
...
If AM,1 and AM,2 and ... and AM,N then BM

where Ai,j is a linguistic variable. A linguistic variable is a variable whose value can be represented by a linguistic term used by experts such as "high", "normal" or "low" (i.e. words or sentences in a synthetic language). A linguistic variable includes an adjective-like term and its antonym, a modifier and a connective. The modifier is a measure of intensity which is associated with a possibility distribution. This is often referred to as the membership function in the literature. The fuzzy logic connectives are the well known conjunction, disjunction and negation operations. The value of a linguistic variable can be represented by a fuzzy set, which permits the definition of a membership function μ reflecting the degree to which an element belongs to the set. The membership function for elicited expert knowledge about the fuzzy test limits can be represented by a piecewise linear function. Such a function is presented in fig. 4.8. The four values a, b, c, and d are numerical values stated by the experts in the process of knowledge acquisition. Bi is a possible conclusion.
[Figure: membership degree μ plotted against the variable, rising from a to b, equal to 1 between b and c, and falling from c to d.]
Figure 4.8 Representation of the fuzzy function.
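Such a piecewise linear (trapezoidal) membership function is straightforward to code from the four expert-supplied breakpoints; the sketch below is an illustrative Python rendering, with invented breakpoint values:

def trapezoid(x, a, b, c, d):
    """Membership degree of x in the fuzzy set defined by a <= b <= c <= d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:                        # rising edge between a and b
        return (x - a) / (b - a)
    return (d - x) / (d - c)         # falling edge between c and d

# "high temperature" with hypothetical expert breakpoints (degrees C)
for t in (55, 65, 80, 95, 105):
    print(t, round(trapezoid(t, 60, 70, 90, 100), 2))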

Since the value Ai,j is represented by a fuzzy set, it is possible to associate it with the grade of membership of a conclusion by means of the rules of fuzzy logic, even in cases where the input value A* is not equal to that in the implication part of the rule, contrary to the "modus ponens" of traditional logic:

A*                      input value
A → B                   fuzzy statement
A* ∘ (A → B) = B*       fuzzy conclusion

The truthfulness, designated as a grade of membership, for this simple implication is evaluated through the operation called Zadeh's composition, provided by the min-max operator:

$\mu_{B^*}(y) = \max_x\ \min\big(\mu_{A^*}(x),\ \mu_{A\times B}(x,y)\big)$

Another way of writing the above equation is:

$\mu_{B^*}(y) = \min\big(\mu_B(y),\ \max_x\ \min(\mu_A(x),\ \mu_{A^*}(x))\big)$

The practical solution of the above equation, for $\mu_B = 1$, is shown in fig. 4.9.

[Figure: x̄ = measured value; r = repeatability of measurement; z1 and z2 = intersections of the two lines y and y*; a, b, c, d = predetermined values for the definition of the fuzzy sets; abscissa marks at a, b, x̄−r, c, x̄, d, x̄+r.]

Figure 4.9 Determination of the maximum ordinate of intersection between A and A*.

The explanation is as follows. As the fuzzy set A is defined by the four values a, b, c, d and by their membership degrees, the two lines of different slope and known equation describing this set are y(+) and y(−). Similarly, as the set A* is also fuzzy, taking into account the measuring errors that may occur, it is described by the lines y*(+) and y*(−). The maximum ordinate of the intersection z between these two sets can be found by means of the following three rules:
1. If z(1,2) ≤ 1 and z(2,1) > 1 then μB*(y) = z ≤ 1
2. If z(1,2) > 1 and z(2,1) > 1 then μB*(y) = 1
3. If z(1,2) > 1 and z(2,1) ≤ 1 then μB*(y) = z ≤ 1

The quantitative analysis of the possibility of a certain situation in the system described by the fuzzy conditional statements is made through the evaluation of its grade of membership according to Zadeh's compositional equation,

$\mu_{B^*}(y) = \max_{1\le i\le m}\ \min\Big(\mu_{B_i}(y),\ \min_{1\le j\le n}\ \max_{x_j}\ \min\big(\mu_{A^*}(x_j),\ \mu_{i,j}(x_j)\big)\Big)$

taking into account the solution for the intersections mentioned above. Since each Bi, i = 1, 2, ..., n, in the fuzzy conditional statements can be considered as a fuzzy singleton over a domain consisting of certain situations y, the starting value for μBi while evaluating μB*(y) at the first level is 1, and at the second level the evaluated μB*(y) becomes the starting value.
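Over a discretized universe, the compositional equation above reduces to nested min/max operations. The following Python sketch (rule data and names invented for illustration) evaluates the grade of membership of a conclusion for a small base of fuzzy conditional statements in exactly this min-max fashion:

# Hedged sketch of the min-max evaluation of mu_B*(y) for M fuzzy rules,
# each with N antecedents, over a discretized universe.

def match(mu_input, mu_antecedent):
    # Degree to which the fuzzy input matches the antecedent:
    # max over x of min(mu_input(x), mu_antecedent(x)).
    return max(min(u, a) for u, a in zip(mu_input, mu_antecedent))

def infer(rules, inputs):
    # rules:  list of (antecedent membership vectors, mu_B value)
    # inputs: observed fuzzy input vectors, one per antecedent slot
    result = 0.0
    for antecedents, mu_B in rules:
        # A rule fires to the degree of its weakest antecedent match.
        firing = min(match(x, a) for x, a in zip(inputs, antecedents))
        result = max(result, min(mu_B, firing))
    return result

# Two rules with two antecedents each, over a 5-point universe.
A11, A12 = [0, .5, 1, .5, 0], [0, 0, .5, 1, .5]
A21, A22 = [1, .5, 0, 0, 0], [.5, 1, .5, 0, 0]
rules = [([A11, A12], 1.0), ([A21, A22], 1.0)]
observed = [[0, .4, 1, .4, 0], [0, 0, .6, 1, .6]]
print(infer(rules, observed))   # grade of membership of the conclusion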
The uncertainty of the knowledge in the knowledge base is taken into consideration by giving different weight factors to the fuzzy conditional statements. The choice of weight factors is rather subjective. Trained artificial neural networks (ANNs), as generators of membership functions and weight factors in fuzzy conditional statements, are potential tools for fuzzy logic process fault monitoring. Details and ANN application examples are given in Chapter 5.
Zadeh's compositional inference rule is adopted as the inference mechanism. It accepts fuzzy descriptions of the process symptoms and infers fuzzy descriptions of the process faults by means of the fuzzy relationships described above.
The main characteristics of a fuzzy logic diagnosis system's performance are:
1. Automatic interpretation of the relations among the tests (observation results) and the possible situations, pointing out the process condition.
2. Detailed explanation of how the particular conclusion has been reached.
3. Indication of the possible causes of failures.
4. Description of the possible consequences.
5. Recommendations for process maintenance and repair under new circumstances.
It may be difficult for the above techniques to capture the elements of the failures and symptoms and their logical connections perfectly. In other words, it is a plausible criticism against rule-based (fuzzy or not) diagnosis that its design may be beyond human knowledge, since exceptional events can be introduced as soon as an improvement has completed the system. A practical answer should be provided for this argument. How should the exception be expressed and included to reinforce the diagnosis? In the present chapter the exception is expressed in a practical form of fuzzy logic. First, the logical form of the exception is derived as the conjunction of the dictative functions. Second, the cancellation law in binary logic is fuzzified in order to give an arithmetic for calculating the linguistic truth value for reasoning. The logical form of the exception is derived in Appendix 4.C, and the requirements for its use are clarified there as well. The cancellation law is extended to fuzzy logic in order to devise the diagnostic method with the exception (Maruyama and Takahashi, 1985).

The introduction of a practically defined exception may be a solution to a plausible criticism against rule-based diagnosis, which emphasized that its design was beyond human knowledge since exceptional events always exist. As will be shown in Section 4.3, where application examples are presented, complementary utilization of the exception reinforces the existing expert fuzzy diagnostic method to identify the leaking location of a Boiling Water Reactor power plant (Takahashi and Maruyama, 1987).

4.3 Application examples

4.3.1 Automatic expert diagnostic systems for nuclear power plant (NPP) safety

4.3.1.1 Diagnostic expert systems for NPP safety

Research in applying expert systems software to nuclear power plants (NPP) has substantially increased in the last two decades. The dynamically complex system of a NPP is a most challenging topic for artificial intelligence (AI) specialists. Malfunction diagnosis of NPP systems generally uses shallow knowledge incorporated in fault models. This derives from the fact that most NPP systems receive signals from sensors and that their possible malfunction causes and effects on system variables are well known.
In recent years many important results have been obtained about representation and reasoning on the structure and behavior of complex physical systems using qualitative causal models. The current AI trend in this respect is qualitative reasoning using deep knowledge representation of physical behavior (Soumelidis and Edelmayer, 1991).
The suitability and the limits of a qualitative model based on deep knowledge for fault detection and diagnosis of the emergency feed water system (EFWS) in a NPP is presented here. The EFWS has been chosen because of its importance in the safe functioning of the NPP.
The EFWS is a standby system which is not operated during normal plant operation. The role of the EFWS is to provide full cooling of the Reactor Coolant System in emergency conditions. The EFWS is automatically activated in three cases of NPP malfunction:
1. Loss of offsite power (LOOP).
2. Low-low level in any steam generator.
3. Loss of alternating current (LOAC).

The possible malfunctions which can occur in the EFWS and their causes and effects on system variables are well known. They are associated with cracks in the pump or condensate storage tank (CST) casing, pipe or valve ruptures, and pump or valve operation failures.
As the EFWS operates only in emergency conditions, the occurrence of a malfunction in the EFWS can lead to catastrophic results. Safety assurance is an acute problem in NPPs. It is expected that expert systems can contribute to the improvement of flexibility and man-machine communication in NPPs.
The expert system diagnostic process is performed by a forward-chaining inference engine that operates on the knowledge base. The inference mechanisms adopted in deep modeling techniques are used.
The diagnostic process module consists of two modules: Fault Detection and Fault Diagnosis (see fig. 4.10).
The process starts with the Fault Detection module, which detects a symptom of malfunction by observing any qualitative variation of the output parameters. Several information sets are instantiated in the initialization phase. The process then continues with the identification of the causes of malfunction by exploiting the information contained in the Model. The Model actually represents the Knowledge Base. It contains descriptions of the Physical System (generic components, initial measurements, connections, possible measurements, actual components). Note that only the correct system behavior is described in the model (Obreja, 1990). The Fault Diagnosis module then propagates the observed qualitative variation through the system model using a constraint propagation method (de Kleer, 1987). Thus, all possible fault models are generated. This step ends when some input parameters (i.e. parameters in the LHS of the rules) are unknown, making further propagation impossible. The qualitative reasoning process can continue only if new measurements are taken.
The decision on the choice of optimum measurements is taken according to heuristic criteria, i.e. probabilities of component failures. From these probabilities one can compute candidate probabilities and Shannon's entropy function (de Kleer, 1990, 1987). After the most appropriate point to measure next has been identified, the measurement is taken and the qualitative propagation is continued for this measurement.
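As an illustration of this selection heuristic (the data layout and numbers below are ours, not taken from the cited works), the next measurement point can be chosen as the one minimizing the expected Shannon entropy of the remaining candidate distribution:

import math

def entropy(probs):
    # Shannon entropy of a normalized probability distribution.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_entropy(candidates, outcomes):
    # outcomes: {outcome_value: set of candidates consistent with it}
    total = sum(candidates.values())
    h = 0.0
    for consistent in outcomes.values():
        mass = sum(candidates[c] for c in consistent)
        if mass == 0:
            continue
        posterior = [candidates[c] / mass for c in consistent]
        h += (mass / total) * entropy(posterior)
    return h

# Three hypothetical fault candidates with prior probabilities.
candidates = {"pump": 0.5, "valve": 0.3, "tank": 0.2}
# Predicted qualitative outcome of each measurement point per candidate.
points = {
    "flow@P1":  {"low": {"pump"}, "normal": {"valve", "tank"}},
    "level@T1": {"dec": {"tank", "pump"}, "std": {"valve"}},
}
best = min(points, key=lambda p: expected_entropy(candidates, points[p]))
print(best)   # measurement point with the lowest expected remaining entropy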
The Knowledge Base contains qualitative information derived from the EFWS model. This information is used by the Diagnostic Process as presented above. The EFWS model is described by components, connections, and equations involving process variables and design parameters. Components are manual isolation valves, pumps, tanks and "t-connections", i.e. pipes.
Components are connected together by process variables. A component's behavior involves process variables and design parameters. Design parameters have nominal values stated by design. Qualitative analysis considers design parameters as constants.

[Figure: the MODEL (component descriptions and component interconnections) supplies the FAULT DETECTION module (propagation facts; search depth and cutoff threshold; measurement point selection set), which starts the diagnosis; the FAULT DIAGNOSIS module then iterates: create/update the set of possible faulty components, select the next measurement point, get the value at the selected point, update the measurement point selection set, create/update the set of component predictions, remove improbable diagnoses, recompute the probabilities of the measurement points.]

Figure 4.10 The expert system diagnostic process for NPP safety.

A symptom of malfunction is detected when a variation in the variables' qualitative values is observed. The variations of the affected variables caused by EFWS malfunctions are:

tank malfunction:   dec, std
pipe break:         inc, inc
valve malfunction:  dec, dec

where dec = decreases, inc = increases, std = steady.

Each possible malfunction affects some variables in a known direction (inc or dec), but by an unknown amount. Thus, the model and its analysis are intrinsically qualitative. The use of simple dynamic models for the system and the components is indispensable for the real-time implementation of the proposed diagnostic procedure. Even quite complicated technical systems, such as NPPs, are made up of rather simple components. It is usually quite clear what is the input and what is the output of each component, and what their interactions are. Therefore, it is usually easy to conceptually split the process into subprocesses with simple interactions between them. For each process which is to be surveyed, a submodel is written. Each submodel is fed, in real time, with measurements of the variables that influence the corresponding subprocess. If the usual relationship between the process variables is broken, this indicates that something is wrong. A given submodel receives the same input as the corresponding subprocess and should also give the same output. A fault in a given subprocess may after some time spread its influence over a large part of the total plant and give abnormal values to all variables, but the normal relation between these abnormal variables is still valid, except in the faulty subprocess. The ability to say where the fault is situated, in addition to saying that there is a fault, is an important advantage of this procedure.
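A hedged sketch of the submodel idea follows: each submodel predicts its subprocess output from the measured inputs, and a residual above threshold localizes the fault to that subprocess. The models, names and thresholds below are invented placeholders, not the EFWS models:

# Illustrative per-submodel residual check; models and thresholds are
# placeholders, not taken from the EFWS application.

submodels = {
    # subprocess: (model of output from inputs, residual threshold)
    "pump":  (lambda u: 2.0 * u["speed"],           0.5),
    "valve": (lambda u: u["inflow"] * u["opening"], 0.3),
}

def check(measured_inputs, measured_outputs):
    faulty = []
    for name, (model, threshold) in submodels.items():
        residual = abs(measured_outputs[name] - model(measured_inputs[name]))
        if residual > threshold:
            faulty.append((name, residual))
    return faulty   # only the faulty subprocess breaks its own relation

inputs = {"pump": {"speed": 10.0}, "valve": {"inflow": 20.0, "opening": 0.5}}
outputs = {"pump": 20.1, "valve": 14.0}   # valve output far from the predicted 10.0
print(check(inputs, outputs))             # -> [('valve', 4.0)]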
Each diagnosis can be considered an independent, time-stamped object completing within seconds from the point of invocation. Parallel diagnosis invocations may exist, facilitating the simultaneous analysis and detection of multiple faults.
In implementing expert diagnostic procedures in real-world problems, considering PC platforms is nothing more than a waste of time. For an efficient real-time implementation, the model-based fault detection part should be coded in FORTRAN on a general purpose number-cruncher, while the qualitative diagnostic reasoning should be performed on a dedicated AI workstation, utilizing for example the convenient and powerful LISP development environment. Coupling between the two modules can be facilitated by means of Ethernet hardware and multivendor network software glue like the TCP/IP Arpanet protocols. Use of TCP (transmission control protocol) ensures a reliable transfer of critical data between the computers involved in the system. As TCP is a connection-oriented communications protocol, some overhead in connection management can be noticed. More speed could be gained by using connectionless protocols, e.g. UDP (user datagram protocol) or the lower level IP (Internet protocol) directly, but this would come at the expense of reduced safety and reliability, unless carefully programmed by the application developer.
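As a generic illustration of such a TCP coupling (the host name, port and message format here are assumptions, not the actual system's protocol), the detection computer could push fault records to the diagnosis workstation as newline-delimited JSON over a socket:

import json, socket

# Generic sketch of a reliable TCP link between the numerical fault
# detection computer and the AI diagnosis workstation.

def send_fault_record(record, host="ai-workstation", port=5050):
    with socket.create_connection((host, port), timeout=5.0) as s:
        # Newline-delimited JSON keeps message framing trivial over TCP.
        s.sendall((json.dumps(record) + "\n").encode("utf-8"))

record = {"time": "12:00:03", "subsystem": "EFWS", "residual": 4.0,
          "symptom": "valve output deviates from submodel prediction"}
# send_fault_record(record)   # uncomment when a listener is running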
The methodology described above is applicable in a straightforward manner to other
complex plants (chemical plants, conventional power plants, marine equipment, etc.).

4.3.1.2 Fuzzy reasoning diagnosis for NPP safety

Fuzzy relation of symptoms with leaking locations in NPP


The fuzzy diagnosis to be presented was applied to identifying the leaking location in the cooling system of a Boiling Water Reactor (BWR) plant, in order to ascertain whether the concept of exception was effective or not. Fig. 4.11 shows a schematic representation of the cooling system of this power plant. Typical examples are assumed for the leaking locations and the induced symptoms, in conformity with the text. It is also assumed that the leakages inside the dry-well were found generally by eqs. (4.15), (4.16), derived from the implications of eqs. (4.4), (4.5), while those inside the building were identified especially by eqs. (4.26), (4.27), resulting from the exception in eq. (4.14). Table 4.1 defines the failure and symptom vectors in the cooling system.

[Figure: schematic of the BWR cooling system, showing the reactor building (with the RHR and RCIC systems) and the turbine building (with the turbine and generator).]

Figure 4.11 Diagram of boiling water reactor cooling system.

MS: Main steam system; RHR: Residual heat removal system; RFW: Reactor feed water system; RCIC: Reactor core isolation cooling system; PLR: Primary loop recirculation system; CUW: Reactor water clean-up system.

Table 4.1 Definition of failure and symptom vectors.

(a) Elements of the failure vector.

x1 = {main steam line in dry-well}
x2 = {residual heat removal system in dry-well}
x3 = {steam line of reactor core isolation cooling system in dry-well}
x4 = {reactor water clean-up system in dry-well}
x5 = {feed water system in dry-well}
x6, ..., x10 = {main steam line in building}
x11, ..., x14 = {residual heat removal system in building}
x15, x16 = {steam line of reactor core isolation cooling system in building}
x17 = {water line of reactor core isolation cooling system in building}
x18, x19 = {reactor water clean-up system in building}
x20, x21 = {feed water system in building}

(b) Elements of the symptom vector.

y1 = {flow rate increase in dry-well sump}
y2 = {flow rate increase in air condenser drain}
y3 = {pressure increase in dry-well}
y4 = {pressure decrease in steam line}
y5 = {flow rate increase in main steam line}
y6 = {flow rate increase in residual heat removal system}
y7 = {high differential flow rate of reactor water clean-up system}
y8 = {high temperature in building}
y9 = {flow rate increase in building sump}
Table 4.2 shows the matrices Rij,l and Pij,l appearing in eqs. (4.15), (4.16), while Table 4.3 gives Eij,l and Hj,l appearing in eqs. (4.26), (4.27). Unity was specified for the upper bound Eij,u only if Eij,l was greater than zero in eq. (4.26); the upper bound was zero at a zero lower bound. The domain (1 ≤ i ≤ 5, 1 ≤ j ≤ 7) corresponds to Rij, and (6 ≤ i ≤ 21, 4 ≤ j ≤ 9) to Eij.
Examples of diagnosis through exceptions.
To ascertain whether the arithmetic given by eqs. (4.B-26), (4.B-27) is capable of detecting a failure or not presents a problem. Several examples were solved by utilizing only the exceptions, where the input to this diagnostic method was provided by the linguistic truth value of the proposition "the j-th symptom is recognized", which determined both the lower and upper bounds.

Table 4.2 Matrices of the fuzzy relation of failures with symptoms.

(a) Lower bounds Rij,l appearing in eq. (4.15) (rows i = 1, ..., 5; columns j = 1, ..., 7):

1.0 1.0 1.0 1.0 1.0 0.0 0.0
1.0 1.0 0.0 0.0 0.0 1.0 0.0
1.0 1.0 1.0 1.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0 0.0 0.0 1.0
1.0 1.0 0.0 0.0 0.0 0.0 0.0

(b) Lower bounds Pij,l appearing in eq. (4.16) (rows i = 1, ..., 5; columns j = 1, ..., 7):

0.9 0.9 0.6 0.6 0.3 0.0 0.0
0.9 0.6 0.0 0.0 0.0 0.3 0.0
0.9 0.6 0.3 0.3 0.0 0.0 0.0
0.9 0.0 0.0 0.0 0.0 0.0 0.6
0.9 0.6 0.0 0.0 0.0 0.0 0.0

Table 4.3 Matrix of the alternative fuzzy relation Eij and vector of the exceptional proposition Hj.

(a) Lower bounds Eij,l appearing in eq. (4.27) (rows i = 6, ..., 21; columns j = 4, ..., 9):

0.6 0.0 0.0 0.0 0.6 0.6
0.6 0.0 0.0 0.0 0.6 0.0
0.9 0.0 0.0 0.0 0.0 0.9
0.0 0.9 0.0 0.0 0.9 0.0
0.0 0.9 0.0 0.0 0.0 0.9
0.9 0.0 0.0 0.0 0.6 0.0
0.6 0.0 0.9 0.0 0.0 0.6
0.0 0.0 0.9 0.0 0.6 0.0
0.0 0.0 0.9 0.0 0.0 0.6
0.9 0.0 0.0 0.0 0.6 0.0
0.9 0.0 0.0 0.0 0.0 0.6
0.0 0.0 0.0 0.0 0.0 0.9
0.0 0.0 0.0 0.9 0.9 0.0
0.0 0.0 0.0 0.6 0.0 0.6
0.0 0.0 0.0 0.0 0.9 0.0
0.0 0.0 0.0 0.0 0.0 0.9

(b) Lower bounds Hj,l appearing in eq. (4.27) (j = 4, ..., 9):

Hl = [0.6 0.3 0.3 0.6 0.6 0.6]

Example 4.1. "No symptom has been recognized". This example takes the linguistic value "completely false" for all j, and consequently the range of the α-cut is [0, 0] for all α, so that
bj,l = 0.0, bj,u = 0.0; j = 1, ..., 9.
The calculated failures were
Aiα = [0, 0]; i = 1, ..., 21.
This resulted in the announcement that no failure exists.
Example 4.2. This next example diagnoses a hypothetical state. Suppose a large leakage occurs at the main steam line, not in the dry-well but in the reactor building, and recall that such an event was assumed to be recognized as the exception in the present exercise. In this example, the isolation valve for the main steam shall be closed due to a decrease in pressure in the steam line, and the reactor may be stopped if the pressure in the reactor vessel decreases. Consequently the temperature of the atmosphere rises and the flow rate of the sump increases in the building. However, the flow rate of the main steam line decreases rapidly because the main steam isolation valve is closed.
The symptoms "pressure decrease in the main steam line", "high temperature in the building" and "an increase in the flow rate of the sump in the building" are sharply observed. The values b4,l = 1.0, b8,l = 1.0 and b9,l = 1.0 were substituted for the lower bounds of these symptoms. The solution Ai (i = 1, ..., 21) is obtained as:
A6α = [1, 1], A7α = [1, 1], A8α = (0.6, 1],
A11α = (0.6, 1], A15α = (0.6, 1], A16α = (0.6, 1],
A17α = (0.6, 1], A20α = (0.6, 1], A21α = (0.6, 1],
Aiα = 0 (for the other i's).
The solution accurately indicates a leakage in the main steam line inside the building, by reading the α-cuts of A6, A7 and A8. At the same time, it suggests that a warning should be given about leakage in the building for the residual heat removal system (A11), the steam line of the reactor core isolation cooling system (A15, A16), the water line of the reactor core isolation cooling system (A17) and the feed water system (A20, A21).
Participation of the exception in diagnosis. It is not easy to decide how the exception, as presented, should participate practically in identifying a failure. It is important for the diagnostic system to use a large amount of fragmentary information concerned with particular events. A small number of sharply aggregated implications becomes much more efficient when mixed with consideration of the fragmentary information, which serves as the exception in the present sense.
Fig. 4.12 indicates a procedure for reinforcement of the diagnostic system by mixing exceptions with implications. Substitution of the recognized symptoms into eqs. (4.15), (4.16) generally yields a failure in terms of linguistic truth values. When an engineer fails to identify the failure firmly and/or an alternative failure weighs on her (his) mind, (s)he should consider the exceptional proposition. In other words, the truth value of Pj of eq. (4.9) must be close to true, and consequently (s)he may fall back on the exceptions. The next example elaborates this procedure.
Example 4.3. "The flow rate of the building sump has slightly increased and the differential flow rate of the clean-up system for the reactor water becomes large". Lower bounds b7,l = 0.6 and b9,l = 0.6 were adopted for these symptoms. The present method first tries a failure inside the dry-well with the conventional fuzzified implication written by eqs. (4.15), (4.16). The solution is of the form
Aiα = [0, 0.1]; i = 1, ..., 5

which indicates "no leakage exists inside the dry-well". This is a case where no failure is obtained although several symptoms are recognized. It is found by calculation that the antecedent of eq. (4.10) is deduced to be very true for the proposition

$\big(\neg\exists k\,(A_k \wedge R_{kl})\big)_\alpha = (0.9,\,1];\quad k = 1, \ldots, 5.$

This enables the engineer to decide that there might exist a failure elsewhere, and steps may be taken to examine by exception; the solution is then of the form
A17α = (0.6, 1], A19α = [1, 1], A21α = (0.6, 1], Aiα = 0; i = 1, ..., 21 and i ≠ 17, 19, 21.

[Figure: flow chart from the recognized symptoms through reasoning by implication and, when needed, by exception, to the identified failures and the end of diagnosis.]

Figure 4.12 Flow of failure diagnosis with implication and exception.

The above equation reveals that a leakage exists in the clean-up system for the reactor water inside the building (A19), and that leakages in the water line of the reactor isolation cooling system (A17) and the feed water system (A21) in the building are found with a possible grade.

Figure 4.13 shows the interactive procedure of the present diagnosis on the CRT terminal of a personal computer. The truth value b7,l = 0.6 serves as a symptom for "the differential flow rate increased a little in the clean-up system". The method builds up a hypothesis from this symptom that the clean-up system is leaking in the dry-well, and asks the engineer to ascertain whether the hypothesis generates the various symptoms or not. However, no symptom based on the hypothesis is recognized (fig. 4.13(a)), and then no failure is identified by the implications (fig. 4.13(b)). In the case that the engineer is able to recognize the symptom "the flow rate increased in the building sump", the input b9,l = 0.6 and all the symptoms yield the solution on the CRT (fig. 4.13(c)).

Start
Flow rate increase in dry-well sump **0.0
Flow rate increase in air condenser drain **0.0
Pressure increase in dry-well **0.0
Pressure decrease in steam line **0.0
Flow rate increase in main steam line **0.0
Flow rate increase in residual heat removal system **0.0
High differential flow rate of reactor water clean-up system **0.6
Possibly leaking in
reactor water clean-up system in dry-well
"Check follows"
Flow rate increase in dry-well sump **0.0

(a) First segment of a session with the diagnostic system. User responses follow the double asterisks.

**** Kind of failure ****
**** Possibly ****
main steam line in dry-well [0.0 0.1]
residual heat removal system in dry-well [0.0 0.1]
steam line of reactor core isolation cooling system in dry-well [0.0 0.1]
reactor water clean-up system in dry-well [0.0 0.1]
feed water system in dry-well [0.0 0.1]

(b) Inferred failures by implication, listed on the CRT terminal with their truth values.

**** Possibly ****
water line of reactor core isolation cooling system in building (0.6 1.0]
reactor water clean-up system in building [1.0 1.0]
feed water system in building (0.6 1.0]

(c) Final segment of a diagnosis. Inferred failures by exception are displayed with their truth values.

Figure 4.13 Example of fuzzy fault diagnosis on a CRT terminal.



It is the subject of further research to integrate fragmentary information about failures in a refined implication by means of repeated experiences of the diagnosis and/or the learning process.

4.3.2 Automatic expert fault diagnosis incorporated in a process SCADA system

During a disturbance of a complex industrial process (e.g. power system, chemical plant, etc.), many alarms are presented to the operator, making it difficult to determine the cause of the disturbance and delaying the corrective action needed to restore the power system to its normal operating state.
In order to provide continuous real-time analysis of the alarms generated by a SCADA
(Supervisory, Control And Data Acquisition) system, a knowledge-based system, being
immune to emotional factors, can be used to assist the operators in analyzing a crisis
situation so that an optimal solution may be found as rapidly as possible.
A knowledge-based alarm processor can replace a large number of alarms with a few diagnostic messages that describe the event(s) that generated the alarms. It may also present advice to the operators when certain situations occur. The knowledge-based system performs much of the analysis that a power system operator would have to perform. Since it can quickly analyze the alarms and present a small number of concise messages, the operator is given a clearer picture of the condition or conditions that caused the alarms, making it easier for the operator to take corrective action in a timely manner.
Because the system operator (in a power system, for example) is very busy during a disturbance, a basic requirement of a knowledge-based alarm processor is that it may not query the operator for any type of information. Since the SCADA system is also busy processing alarms, collecting disturbance data and performing its normal functions, such as Automatic Generation Control, the knowledge-based system should not strain the computer resources of the SCADA system. The knowledge-based system must be able to handle multiple, independent power system disturbances, presenting diagnostic messages to the operators within a short period of time. Also, a diagnostic message must be retracted if the conditions that caused the message to be generated are no longer valid.
Two basic approaches are possible in incorporating a knowledge-based alarm processor (KBAP) into an Energy Management System (EMS) or other complex industrial process environment: an embedded approach and an appended approach. In an embedded approach, the knowledge-based system is incorporated in the SCADA system. In an appended approach, a separate computer is used, with a data link connecting the KBAP with the SCADA computer.
The appended approach is selected here, mainly because a knowledge-based system is processor and memory intensive. By implementing the KBAP on a separate computer, SCADA resource contention is minimized. Also, implementation, maintenance and testing of the KBAP do not disrupt normal SCADA operations. The two main disadvantages of this approach are that a data link must be established between the two computers and that there is no direct access to either the SCADA database or the SCADA man/machine interface.
Since the system operators are comfortable with the SCADA alarm displays, the KBAP uses the SCADA alarm format to present diagnostic messages to the operators. A new message appears as an alarm. When the conditions that caused the message to be generated are no longer valid, the message is presented in a manner similar to an alarm returning to its normal state.
Fig. 4.14 shows the components of a general KBAP. The SCADA alarm processor sends
each change of state packet across the data link to the KBAP computer. The KBAP uses
the data link to present diagnostic messages to the operator and, on operator request, an
explanation on how a KBAP conclusion was reached. The data link is also used by the
KBAP to retrieve database information from the SCADA system. When sending alarm
packets and database information, the SCADA system sends the symbolic value of a
point so that the KBAP does not have to be concerned with the various operating ranges
of analog values and the normal positions of digital values. Examples of analog symbolic
values are low, normal and high.
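The mapping from raw analog readings to such symbolic values can be a simple range lookup; the following sketch is illustrative, not the actual SCADA code:

def symbolic_value(value, low_limit, high_limit):
    """Map an analog reading to the symbolic value sent to the KBAP."""
    if value < low_limit:
        return "low"
    if value > high_limit:
        return "high"
    return "normal"

# Bus voltage with illustrative operating limits of 0.95-1.05 p.u.
print(symbolic_value(0.92, 0.95, 1.05))   # -> "low"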
In the case of a power system, since information related to a station is maintained on a station-by-station basis, a separate node must be created in memory for each station. Each node contains information about the station, including current facts, working hypotheses and validated conclusions. The station node is also the head node of the configuration database for the station. Information is maintained in the station node because the order of incoming alarms cannot be predicted. Each station node is used to keep track of the facts related to the station during the inference process. General rules, along with the configuration database, simplify the management of the rules and also simplify the KBAP analysis process.
The system is intended to run on a standalone computer under a multiprocessing operating system. The KBAP consists of the following components: a Rule Preprocessor, Configuration Preprocessor, Alarm Preprocessor, Inference Engine, Conclusion Processor and an Explanation Facility. The Rule Preprocessor and Configuration Preprocessor execute only during KBAP initialization. Once initialized, the Alarm Preprocessor, Inference Engine and Conclusion Processor each run as a separate process. This allows the Alarm Preprocessor to queue incoming SCADA alarm packets, because alarms could arrive at a rate faster than the Inference Engine is able to process them.
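This decoupling is a classic producer/consumer arrangement; a minimal threaded Python sketch (illustrative only, not the actual multiprocess implementation) is:

import queue, threading

alarm_queue = queue.Queue()   # buffers bursts of SCADA alarm packets

def alarm_preprocessor(packets):
    # Producer: convert change-of-state packets and queue them.
    for p in packets:
        alarm_queue.put({"point": p[0], "value": p[1]})
    alarm_queue.put(None)                 # end-of-stream marker

def inference_engine():
    # Consumer: process alarms at its own pace.
    while (alarm := alarm_queue.get()) is not None:
        print("inferring on", alarm)

packets = [("breaker-12", "open"), ("bus-3-voltage", "low")]
t = threading.Thread(target=inference_engine)
t.start()
alarm_preprocessor(packets)
t.join()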

[Figure: the SCADA system (SCADA Alarm Processor and Data Base Management System) is connected over a data link to the KBAP system (Rule File, Knowledge Base and Explanation Facility).]

Figure 4.14 The general appended KBAP configuration.

The processing speed of the KBAP depends both on the hardware that the KBAP is implemented on and on the rate at which the SCADA system supplies alarms to the KBAP. In other words, the limitation of the KBAP in presenting diagnostic messages to the operators is mainly due to the limitation of the SCADA system in detecting and generating alarms. The SCADA limitation is a result of varying RTU (Remote Terminal Unit) scan rates as well as power system relay actions.
The collection of knowledge in the KBAP is referred to as the knowledge base. One way of organizing the knowledge is to form rules and facts. The rules contain accumulated knowledge in the form of IF-THEN constructs. The facts in the knowledge base are collected pieces of information related to the problem at hand. Rules express the relationships between facts. Using the current facts, the Inference Engine decides how to apply the rules to infer new knowledge. It also decides the order in which the rules should be applied in order to solve the problem.
The rules in the KBAP may be fired using forward chaining or backward chaining (see Section 4.2.1). The difference between the two approaches is the method in which the facts and rules are searched. In forward chaining, the Inference Engine searches the IF portion of the rules. When a rule is found in which the entire IF portion is true, the rule is fired. Forward chaining is a data driven approach, because the firing of the rules depends on the current data. In backward chaining, the Inference Engine begins with a goal that is to be proved. It searches the THEN portion of the rules looking for a match. When a match is found, the IF portion of the rule is established. The IF portion may consist of one or more unproven facts. These unproven facts become separate goals that must be proved. Backward chaining is goal driven, because the order of the firing of the rules is chosen in an attempt to prove a goal.
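The two search directions can be contrasted in a few lines of Python; the rules below are toy stand-ins for the KBAP's object-level rules, not taken from the system itself:

RULES = [  # (premises, conclusion) - toy stand-ins for object-level rules
    ({"breaker-1-closed", "bus-1-voltage-low"}, "section-1-low"),
    ({"breaker-2-closed", "bus-2-voltage-low"}, "section-2-low"),
    ({"section-1-low", "section-2-low"}, "bus-voltage-low"),
]

def forward_chain(facts):
    # Data driven: fire every rule whose full IF-part is true,
    # until no new facts can be derived.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward_chain(goal, facts):
    # Goal driven: prove the goal by recursively proving the premises
    # of every rule whose THEN-part matches it.
    if goal in facts:
        return True
    return any(all(backward_chain(p, facts) for p in premises)
               for premises, conclusion in RULES if conclusion == goal)

facts = {"breaker-1-closed", "bus-1-voltage-low",
         "breaker-2-closed", "bus-2-voltage-low"}
print("bus-voltage-low" in forward_chain(facts))   # True
print(backward_chain("bus-voltage-low", facts))    # True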
Metalevel control rules improve system performance by selecting the object-level rules. The object-level rules in the KBAP are the forward and backward chaining rules described above. Metalevel actions provide context switching and goal selection. Fig. 4.15 shows a conceptual representation of how the metalevel control rules are implemented in the KBAP.
When a SCADA alarm arrives, the metalevel control rules are used by the Inference Engine to generate one of two metalevel actions. A context switching action selects data driven, or forward chaining, rules. That is, the metalevel control rules are used to select the proper context based on the incoming alarm. When the context is selected, metalevel control rules are used to produce one or more goal selection actions (if possible). A goal selection action selects a hypothesis that the goal driven, or backward chaining, rules attempt to prove. As can be seen in figure 4.15, object-level actions result from applying the forward and backward chaining object-level rules. For the KBAP, two object-level actions are possible: diagnostic messages and advice for the operators.
The metalevel control rules contain heuristics that guide the Inference Engine in forming hypotheses. Heuristics are rules of thumb that an expert uses when solving a problem. Heuristics are used to narrow the search space for a solution. Backward chaining object-level rules are generic in nature and are not related to any particular station, in the case of a power system, or process subsystem, in the general case.

[Figure: incoming alarms trigger the metalevel rules, whose metalevel actions are context switching and goal selection; these select the object-level rules, whose object-level actions are diagnostic messages and advice.]

Figure 4.15 Metalevel control in a KBAP.

Fig. 4.16 shows how the metalevel control rules are internally organized. The order of the metalevel control rule nodes is the same as the order of the rules in the rule file. Each metalevel node is the head of a linked list of one or more premise nodes. A premise node contains a premise clause. All of the premise clauses of a rule must be true in order for the rule to fire. The Inference Engine scans the metalevel control list beginning with "metahead", the head node. If all of the premise clauses of a rule are true, the Inference Engine fires the metalevel control rule, triggering one of the two metalevel actions, context switching or goal selection. In the case of goal selection, the metalevel node contains the goal, or hypothesis, that the object-level rules attempt to prove. The station node (for the case of a power system) shown in fig. 4.16 has two working hypotheses. Each hypothesis is represented by a hypothesis node linked to the station node. A hypothesis node points to the metalevel node that it is associated with. When a working hypothesis is proved to be valid by the Inference Engine, the hypothesis node is linked to the station node's validated conclusion chain.
Each change of state detected by the SCADA system is reported by the SCADA alarm processor to the KBAP Alarm Preprocessor. A state change includes points going into alarm, points returning to normal, and supervisory control actions. The SCADA change of state packets are converted by the Alarm Preprocessor into a form suitable for the Inference Engine. Incoming packets are queued until the Inference Engine is able to process them. The configuration point nodes contain the current symbolic value of the point. When a change of state occurs on a point, the value of the point in the node is updated.

[Figure: the metalevel control list starts at "metahead"; each metalevel node heads a linked list of premise nodes, and the hypothesis nodes of the station (or process subsystem) node point back to their associated metalevel nodes; validated conclusions are chained to the station node.]

Figure 4.16 The internal organization of the metalevel control rule nodes.

Because knowledge is often added incrementally to a knowledge-based system, a Test Facility exists so that the firing of the KBAP rules may be observed and validated under controlled conditions. The Test Facility reads each change of state record from an event file and passes the change of state events to the Alarm Preprocessor. The event file is built off-line using a text editor.
If a metalevel hypothesis has been proved true, the Conclusion Processor is invoked to pass the validated conclusion to the SCADA system for presentation to the operator. A validated conclusion may result in one or more new hypotheses being formed. A change of state may also result in a previously validated conclusion becoming no longer true. When this occurs, the Conclusion Processor is invoked to inform the SCADA system that a previous message is no longer valid. This is similar to an alarm returning to normal.
When an operator requests information on how a particular conclusion was reached, the SCADA system man/machine interface sends the request over the data link to the KBAP. The Explanation Facility is invoked to process the request (see fig. 4.14). The Explanation Facility passes back to the SCADA system the rules that fired and the facts that caused the rules to fire, for presentation to the operators.

System operation.
This part of the section describes the operation of the KBAP for processing different system components. Power system components and general industrial plant components, such as motor pumps and rotating machinery, are discussed.
Low voltage bus and electricity supply networks. For this example, the low voltage bus has three bus sections. Fig. 4.17 shows some of the object-level rules which can be used for fault diagnosis.
Object-level rule names are enclosed in colons and metalevel control rules are surrounded with double colons. Comments are enclosed in /* and */. Backward chaining is used on the object-level rules in fig. 4.17 in an attempt to prove the goal "C6". If all of the bus sections on a bus have low voltage and the bus breakers are closed, a single low bus voltage message is presented to the operators. On the SCADA system, the operators would receive a separate low voltage alarm for each bus section and possibly other alarms, such as under-voltage relay alarms. This simple example illustrates how a single message can be presented in place of numerous SCADA alarms. As well as reducing the number of messages that the operators receive, the KBAP also contains rules that diagnose the situation(s) that triggered the alarms.
Tesch et al., (1990), present a case study of a KBAP implementation for Wisconsin's Electric Energy Management System. The KBAP, written in the C programming language, uses a configuration database that contains the structure of the power system as well as symbolic data values for each point monitored by the SCADA system. The knowledge-based system continuously analyzes and processes SCADA alarms in real-time, presenting diagnostic messages to the power system operators without requiring any information from the operators.
Brailsford et al., (1990), present a prototype KBAP system, named FAUST, for use in 132 kV and 33 kV electricity supply networks. All items of the electricity distribution network, especially those at the higher voltages, are telemetered. Each telemetered item is polled regularly (every 5 to 20 seconds) and any changes of state and alarms are reported to the operator. FAUST is equipped with a network database, a plant database and advanced graphics facilities, and can continuously perform the following tasks: manual input of current network status, input from the telemetry stream, telemetry message filtering, fault and outage hypothesis generation and user interface (continuous mode, off-line mode, hard copy mode, fault simulator mode). Causal reasoning based on a model of the operation of the underlying network, heuristic knowledge, complex graph-searching algorithms, the use of blackboards for communication amongst system modules and message passing between objects are employed to resolve the complex real-time diagnosis problem for the distribution network.

/* If the first circuit breaker on the device is closed, establish conclusion "C1" as a fact.
   The rule name is "Rule-1". The underscore character denotes a blank. The word "first"
   is a position indicator. */
:Rule-1: If (first circuit-breaker=closed) then C1;
/* Establish conclusion "C2" if the second circuit breaker on the device is closed. */
:Rule-2: If (second circuit-breaker=closed) then C2;
/* This rule is fired if the first bus section voltage is low and conclusion "C1" is an
   established fact. */
:Rule-3: If (first bus-voltage=low) and (C1) then C3;
/* The rule order is not important for general rules. Rules are entered free format.
   Comments may appear anywhere. */
:Rule-6: If (C4) and (C3) then C6;
/* Fire this rule if the third bus section voltage is low and conclusions "C2" and "C5"
   are established facts. */
:Rule-4: If (third bus-voltage=low) and (C5) and (C2) then C4;
/* If the second bus section voltage is low, establish conclusion "C5". */
:Rule-5: If (second bus-voltage=low) then C5;
/* Fire this rule if the bus voltage on the bus at the opposite end of a transmission line
   is low and the bus voltage in the current station is normal. */
:Rule-15: If (opposite bus-voltage=low) and (adjacent bus-voltage=normal) then R15;

Figure 4.17 General object-level rule examples for a low voltage bus.

Power transmission substations. Several additional benefits can be obtained when the substation integrated control and protection system (ICPS) is used as part of the overall Energy Management System (EMS). A number of substation ICPSs are being developed around the world today, where the protective relaying, control and monitoring functions of a substation are implemented using microprocessors. In this design, conventional relays and control devices are replaced by clusters of microprocessors, interconnected by multiplexed digital communication channels using fiber optic, twisted wire pair or coaxial cables. The ICPS incorporates enhanced functions of value to the utility and leads to further advancement of the automation of transmission substations. More powerful processing capabilities can be established if an ICPS is used instead of the conventional SCADA Remote Terminal Units at the substation. In addition, an extensive database can be available at the substation level. This data can be used to assist the dispatcher, protection engineer and maintenance personnel during an emergency.
Fault diagnosis is carried out by operators using information on activated relays and tripped circuit breakers. The faulty components are inferred by imagining a protective relay sequence related to the incident and simulating the relay sequence backwards from the observed data. An expert system will be very useful for these types of tasks, since the problem involves a mass of data and uncertainties and cannot be described by a well defined analytical model. For example, a rule to identify a failed breaker is:

Rule?
{
  &1 (Relay operated = yes; considered = yes;);
  &2 (Breaker name = &1.br1; open = no; failed = no; status = on);
  =>
  modify &2 (failed = yes);
};

This rule states that, if a relay has operated and one of its corresponding breakers connected in the circuit has not opened, that breaker is identified as a failed breaker.
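The same logic, written out in Python over simple record dictionaries (an illustrative rendering, not the semantics of the actual rule language), reads:

def mark_failed_breakers(relays, breakers):
    # If a relay operated and a corresponding breaker stayed closed,
    # mark that breaker as failed.
    for relay in relays:
        if relay["operated"] and relay["considered"]:
            for name in relay["breakers"]:
                br = breakers[name]
                if not br["open"] and not br["failed"] and br["status"] == "on":
                    br["failed"] = True

relays = [{"operated": True, "considered": True, "breakers": ["BR1", "BR2"]}]
breakers = {"BR1": {"open": True,  "failed": False, "status": "on"},
            "BR2": {"open": False, "failed": False, "status": "on"}}
mark_failed_breakers(relays, breakers)
print(breakers["BR2"]["failed"])   # -> True: BR2 did not open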
Stavrakakis and Dialynas, (1991), describe models and interactive computational techniques that were developed to model and detect automatically the available restoration operations following a fault on substation equipment or a substation abnormality. These abnormalities can be detected before the component enters the failure stage, using the computer-based diagnostic techniques described in Chapter 3. The developed computer-based scheme can be installed easily, through a rule-based expert system, in power substation ICPSs in order to determine the optimal switching operations which must be executed after a component fault has been detected. The development of a database containing all the necessary information concerning the component failure characteristics and the average repair or replacement and switching times is possible. A supply reliability index of substation load-points is also evaluated to quantify the reliability performance of the substation.

Underground distribution cables. Kuan and Warwick, (1992), developed a prototype real-time expert system aid for fault location on high voltage underground distribution cables. Diagnostic tests are performed by measuring the continuity and insulation resistance of the cable.
Different types of faults (e.g. series, low resistance, high resistance, flashing, etc.) produce different transient waveforms, which an experienced engineer would be able to distinguish by looking at certain characteristics of the waveform. The knowledge used is mostly heuristic in nature and in the form of IF-THEN guidelines; e.g. if the second crest of the waveform is smaller than the first crest, then it is most likely to be a short circuit or low resistance fault. Coupled with information from the diagnosis stage, the type of fault can be confirmed; e.g. if the insulation resistance of the cable is low, then it is likely to be a short circuit or low resistance fault. After confirmation of the type of fault, the time interval between the start of the waveform and the fault point is measured and calculations are done to determine the distance to the fault.
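The distance computation itself is a time-of-flight relation: if v is the pulse propagation velocity in the cable and Δt the measured interval between the launched pulse and its reflection from the fault, the one-way distance is d = vΔt/2. A minimal sketch, with an assumed velocity figure:

def distance_to_fault(delta_t_us, velocity_m_per_us=160.0):
    """Pulse-echo distance: half the round-trip travel at cable velocity.

    velocity_m_per_us is an assumed figure (roughly half the speed of
    light, typical of solid-dielectric cables); in practice the cable's
    own velocity of propagation must be used.
    """
    return velocity_m_per_us * delta_t_us / 2.0

print(distance_to_fault(12.5))   # 12.5 us round trip -> 1000 m to the fault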
Power systems considered as a whole. Nebiacolomobo et al., (1989), describe HOLMES, an ES for automatic fault diagnosis and restoration planning in industrial power systems. The whole power system is divided into modules according to topological and functional criteria. A power substation can be considered as a module. Each module contains a list of all the possible faults that may occur in the power equipment it contains. Each module is connected only to its adjacent ones, according to the system topology. The only information flowing across the modules are the over-currents and/or over-voltages caused by a fault. With such a model, the propagation of a fault throughout the power system can be easily detected, together with the protective relays involved in the fault itself. Petri nets (see Section 4.2.4) are used to solve the representation problem of the protection dynamics under faulty conditions. For every defined module, a Petri net is built to describe its internal dynamics and its input-output behavior. The causal connections among the modules are represented by input and output conditions. For every output from a module, a corresponding input to another module should exist. By aggregating the Petri nets of all the modules according to the system topology, the global system model is obtained.
After its activation, HOLMES performs pre-fault network analysis, local analysis of received events and generation of possible faults for every module, global propagation analysis with correlation of the faults generated by different modules, fault validation, transmission of the results to the interface and recording to archives. HOLMES implements a KBAP mechanism to consider the fault diagnosis and propagation in the controlled plant.
In the area of power system security monitoring and control, transient stability is considered to be one of the most important and at the same time most problematic issues, strongly related with the on-line fault diagnosis procedure. Indeed, in a broad sense, transient stability is concerned with the system's capability to withstand severe disturbances and/or faults causing important electromechanical transients. This is a strongly non-linear problem. It is also high-dimensional, since power systems are by essence large scale.
In general, on-line transient stability assessment (TSA) aims at appraising the power system's robustness at the inception of a sudden, severe disturbance or fault and, whenever necessary, at suggesting remedial actions. A measure of robustness is the critical clearing time (CCT); this is the maximum time a disturbance may act without causing the irrevocable loss of synchronism of the system machines. In particular, on-line TSA is used in real-time operations and aims at performing on-line analysis and preventive control. Indeed, because the transient stability phenomena evolve very quickly (in the range of a very few seconds), a good way to face them efficiently is to prevent them.
Wehenkel et al., (1989), proposed a new approach to on-line transient stability
assessment of power systems, suitable for implementation in the SCADA system. The
main concern of this approach has been the application of an inductive inference method
in conjunction with analytic dynamic models and numerical simulations to the automatic
building of decision trees (DTs).
A DT is a tree stmctured upside down. It is composed of test and terminal nodes, start-
ing at the top node (or root) and progressing down to the terminal ones. Each test node
is associated with a test on the attribute values of the objects, to each possible outcome
of which corresponds a successor node. The terminal nodes carry the information re-
quired to classify the objects. The methodology developed there is based on inductive
inference and more specifically on ID3 which is a member ofthe TDIDT (top-down in-
duction of decision trees) family (see Section 4.2.1). Most of the inductive inference
methods infer decision mIes from large bodies of reclassified data sampies. The TDIDT
aims at producing them in the form of decision trees, able to uncover the relationship be-
tween a phenomenon and the observable variables driving it. Adapted and applied to on-
line transient stability assessment, the method intends to uncover in real-time the intrinsi-
cally very intricate relationships between static, pre-fault and/or pre-disturbance condi-
tions of a power system and their impact on its transient behavior in order to discover
the appropriate control actions needed.
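The attribute-selection step at the heart of ID3 can be sketched in a few lines; the code below (Python) picks, for one tree node, the attribute with the highest information gain over preclassified samples. The attribute names and stability labels are purely hypothetical:

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(samples, attributes):
    """ID3 step: pick the attribute whose split maximizes information gain.
    samples is a list of (attribute_dict, class_label) pairs."""
    base = entropy([label for _, label in samples])
    def gain(attr):
        remainder = 0.0
        for v in {feat[attr] for feat, _ in samples}:
            subset = [label for feat, label in samples if feat[attr] == v]
            remainder += len(subset) / len(samples) * entropy(subset)
        return base - remainder
    return max(attributes, key=gain)

# Hypothetical pre-fault operating points labeled by the simulated outcome:
data = [({"load": "high", "topology": "n-1"},  "unstable"),
        ({"load": "high", "topology": "full"}, "unstable"),
        ({"load": "low",  "topology": "n-1"},  "stable"),
        ({"load": "low",  "topology": "full"}, "stable")]
print(best_split(data, ["load", "topology"]))   # -> load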
Industrial motor pumps.
To investigate the applicability of expert systems in the area of equipment diagnostics of
industrial plants, the principles of a knowledge-based system for real-time diagnosing of
motor pump malfunctions is presented.
The diagnostic approach used in the development of the knowledge base is based on the
method of decision tree (DT) analysis (see previously, Wehenkel (1989)). Using infor-
mation derived from the equipment maintenance manual and mechanical-electrical
drawings, and through observation of the procedures used by efficient engineering me-
chanics, a decision tree can be developed to mimic the way a human expert makes deci-
sions and arrives at conclusions. The decision tree is then translated directly into the IF-
THEN rules which make up the expert system's knowledge base. The manner in which a
decision tree can be translated into production rule language (PRL) rules is illustrated in
fig. 4.18, which shows a small portion of a decision tree developed to diagnose a faulty
pump starting switch and the corresponding rules written in PRL.
Using on-line DT analysis, the system leads the user through the appropriate procedures
required to quickly identify the faulty pump circuit component. Graphical displays can be
incorporated within the system to assist the user in locating the various components and
test points. Once the faulty component has been isolated, the system is capable of access-
ing a database which can provide personnel with information concerning specific
component part numbers, the availability and location of spare parts and the proper re-
pair action to be taken.
Rotating machinery. During the past few years, condition monitoring of gas turbine
and other industrial engines has blossomed into an economically viable activity for some
of the larger constructors and commercial operators. These gains have been spurred to a signifi-
cant degree by the development of sophisticated software algorithms for the interpreta-
tion of the limited available sensors.
Software packages are being employed to realize many of these gains using artificial in-
telligence techniques (Doel, 1990). Classical tools that are currently being employed for
commercial engine condition monitoring, such as EGT (exhaust gas temperature) margin
trending, vibration monitoring, oil monitoring, under cowl leak detection etc. can be
expressed analytically and performed automatically using the statistical aids and signal
analysis techniques described in Chapter 1.
These basic fault occurrence knowledge sources can be used for the creation of the
knowledge base and the inference engine of an expert fault diagnosis tool for rotating
machinery.
Before designing a failure diagnostic system, damage statistics of the specific system
should be considered, indicating those sections in which failures occur with a cer-
tain frequency. Taking the rotating system of a turbomachine as an example, the defects
on moving blades occupy the highest percentage, concerning the number (37%) and the
repair costs (26%), followed by casing failures (24%), where the financial effect is not as
severe (13%).
In contrast, less frequent failures of bearings (5%) are classified at a high cost level
(27%). During the last years therefore, research in monitoring and diagnosing of systems
focuses on the improvement of detecting damages of the rotating system, including
blades, rotor unbalance, cracks, bearings and others.
Pattern recognition methods have also proven to be very effective in failure detection
(see Section 1.3.2.C). For diagnostic purposes, time signals are utilized, which are ob-
tained at the machine with special sensors, like pressure transducers mounted directly
above moving or stator blades, accelerometers placed at the casing of the machine, and
shaft displacement measuring systems, which detect the shaft oscillations.
RULE #1
IF specific pump problem IS pump will not start from machine
AND control power is available
AND switches are set properly
AND NOT red LED on remote box lights when trying to start pump
AND pump will start from remote box
THEN probable defective pump starting switch
AND end of pump diagnosis

RULE #2
IF operating mode IS machine
AND main control switch is ON
AND cutter switch is OFF
THEN switches are set properly

RULE #3
IF 12 VDC LED is ON
OR 110 VDC LED is ON
THEN control power is available

Figure 4.18 Partial decision tree diagram and corresponding PRL rules for a motor pump fault
diagnosis.
Consider a vibrating part of a rotating machine. The behavior of the part can be modeled
in real-time by the auto spectral density function of the signal of an accelerometer
attached to it. The goal is to make diagnosis about the state of the vibrating part. Two
diagnoses can be made locally for this part using the peak on the spectra representing its
behavior approximately at 10.5 Hz. For example if this peak is,
a) missing, then the part is probably broken;
b) too large, then the part is excessively vibrating.
The following mies can be defined for the part:

(rule-1
  if (null? peak)
  then (set! peak (peak-nearest-to
                    (spectral-peaks
                      (apsd <signal> <parameters>)) 10.5)))

(rule-2
  if (and peak
          (or (< (frequency peak) 10)
              (> (frequency peak) 11)))
  then (conclude '(broken part)))

(rule-3
  if (and (not (broken part))
          (> (amplitude peak) amplitude-limit))
  then (conclude '(excessively-vibrating part)))

where,
• peak and amplitude-limit are instance variables of the part;
• (apsd <signal> <parameters>) and (spectral-peaks <spectral function>) are exter-
nally defined global functions;
• (peak-nearest-to <spectral peaks> <frequency>), (frequency <peak>), and
(amplitude <peak>) are internal functions (i.e. methods) connected to instance vari-
able peak;
• (broken part) and (excessively-vibrating part) are assertions.
This example is intended to illustrate how numerical models (in the present case a
spectral density function of a signal) presented in Chapter 1 can be integrated in mies
aimed at solving a rotating machine diagnostic problem. For this purpose two steps are
taken:
1. The model is activated (i.e. the spectral function is computed) under specific condi-
tions (in the present case if peak has no value).
2. The results (having numerical nature) are compared to limit values (frequency and
amplitude limits in the present example) resulting in Boolean values, which can be
combined with the assertions.
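As a concrete illustration of these two steps, a rough Python equivalent of the rule set is sketched below, assuming SciPy is available for the spectral computation; the function names, tolerance and amplitude limit are illustrative choices, not the original system's:

import numpy as np
from scipy.signal import welch, find_peaks

def diagnose_part(signal, fs, f_expected=10.5, f_tol=0.5, amplitude_limit=1e-3):
    """Locate the spectral peak nearest to f_expected on the auto power
    spectral density and classify the part's condition (rules 1-3)."""
    f, psd = welch(signal, fs=fs, nperseg=1024)         # step 1: activate the model
    idx, _ = find_peaks(psd, height=psd.mean())         # candidate spectral peaks
    if len(idx) == 0:
        return "broken part"                            # rule-2: no peak at all
    peak = idx[np.argmin(np.abs(f[idx] - f_expected))]  # peak-nearest-to 10.5 Hz
    if abs(f[peak] - f_expected) > f_tol:               # step 2: compare to limits
        return "broken part"                            # rule-2: peak missing
    if psd[peak] > amplitude_limit:
        return "excessively-vibrating part"             # rule-3: peak too large
    return "normal"

# Hypothetical accelerometer record: 10.5 Hz component plus noise, fs = 256 Hz
t = np.arange(0, 8, 1 / 256)
x = 0.1 * np.sin(2 * np.pi * 10.5 * t) + 0.01 * np.random.randn(t.size)
print(diagnose_part(x, fs=256))   # -> 'excessively-vibrating part' for this record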
Given these features, it is not difficult to envision very powerful, linked diagnostic and
maintenance systems to diagnose engine problems using all available input in assisting
the maintenance process.
Detecting and masking transient failures of the SCADA system itself.
The effect of lightning poses significant problems to computer systems and electronic
equipment used in process supervision due to the possibility of a direct strike. Protection
against a direct strike is very expensive and seldom used (Noore et al., 1992).
When lightning strikes the plant, it produces both electric and magnetic fields that vary
with distance, frequency, and time. These fields can couple into power transmission and
communication lines and adversely affect the performance of computers and other
electronic equipment. Protection from equipment damage and power supply interrup-
tions through shielding, insulation, lightning arrestors, and filtering circuits helps in reduc-
ing the effects due to lightning but are not completely effective. The field-induced volt-
ages and currents cause transients that can create an upset in a digital or computer sys-
tem. Problems due to transients have not been a significant area of research until re-
cently. The reason for this is that electronic equipment using vacuum tubes or power
semiconductor devices is relatively immune to such transients. In addition, with these
devices, a transient's short duration usually is not of serious consequence. However,
the rapid advancement of technology in recent years has led to the emergence of several
complex VLSI devices used in the design of computer-based data acquisition systems,
process monitoring systems, surveillance systems, and control systems to support both
critical and non critical industrial plant activities. With the attendant problems that are
exacerbated as a direct consequence of the submicron geometry of advanced microelec-
tronics semiconductor technology, transients of smaller magnitudes and shorter time
duration can have a dramatic effect and cause a system upset in computers.
To detect transient faults efficiently within a short time period, the proposed detection
mechanism focuses on a set of hardware modules and control paths that are utilized by
the microcomputer system. For example, if a transient fault alters the contents of a regis-
ter that is not used by the microcomputer for performing process monitoring or control
operations, it is not necessary to detect this fault. On the other hand, if a transient fault
causes a change in the contents of a register used by a specific application, then it is im-
perative to detect this critical fault. Thus, by monitoring only the control paths of the
application program and selected modules used by the microcomputer, the proposed
approach detects all critical transient faults and maintains a high fault coverage.
Let the global memory space M(0, n-1) be partitioned into several disjoint contiguous
physical blocks of memory M(0, i), M(i+1, j), M(j+1, ...), ..., M(..., k), M(k+1, n-1). Each
memory partition M(x, y) is bounded by the lower memory address x and the upper
memory address y, and the size of each memory partition is y - x + 1. Each physical
memory block M(x, y) is logically mapped to a set of instruction sequences I(x, y) that
are executed within the boundary (x, y). Depending on the type of application program,
the logical execution of the instruction sequences can, in general, be globally distributed
in a noncontiguous manner. Each instruction sequence I(x, y) is modular and represents a
subtask of the application program or a subroutine. Although instructions in the instruc-
tion sequence I(x, y) are physically contained in M(x, y) sequentially, the program can
logically execute several loops within the boundary (x, y). Instructions contained in I(x,
y) can either be of single word length or multiple word lengths. The time T(x, y) for exe-
cuting I(x, y) is exactly computed from the type of processor used in the design, the op-
erating frequency of the clock, the response time of the memory or peripheral device
selected, the number of cycles required to execute a single or multiword length condi-
tional or unconditional instruction, the number of loops in I(x, y), the manner in which
the loops are structured, and the number of passes per loop.
Given an application-specific program that is structured as described, the starting ad-
dress x or the lower boundary of each logical set of instruction sequence I(x, y) can be
extracted. Associated with each I(x, y), the execution time T(x, y) can be obtained from
the program structure. This value is stored in the EPROM, whose address is given by x.
Similarly, all execution times are stored under the corresponding addresses. Addresses
that do not correspond to the starting address of the instruction sequences are used to
store all 0s in the EPROM. The transient fault-detection circuitry is shown in fig. 4.19,
and the detection algorithm is outlined as follows:

<µC address> = <EPROM address>;

if <C> = 0 then begin
    enable EPROM;
    if <E> ≠ 0 then begin
        <timer> = <timer> - 1;
    end;
    else transient-fault-detected;
else begin
    disable EPROM;
    <timer> = <timer> - 1;
end;

If the system clock-out frequency of the microcomputer system is denoted by fsystem and
the countdown timer clock frequency is denoted by ftimer, then the length of the counter
is given by:

log2{(ftimer/fsystem)·[T(x, y)]max} bits.
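The following behavioral sketch (Python; the addresses and execution times are hypothetical, and [T(x, y)]max is assumed to be expressed in system clock cycles) mimics the detection loop and the counter sizing; it is not a model of the circuit of fig. 4.19:

import math

# Hypothetical EPROM content: start address of each instruction sequence I(x, y)
# mapped to its precomputed execution time in timer ticks; all other addresses hold 0.
EPROM = {0x0100: 40, 0x0200: 25, 0x0300: 60}

def counter_length_bits(f_timer, f_system, T_max_cycles):
    """Counter length: log2{(f_timer / f_system) * [T(x, y)]max} bits."""
    return math.ceil(math.log2(f_timer / f_system * T_max_cycles))

def monitor(address_trace):
    """Reload the timer at every sequence start; a timer underflow before the
    next start address is treated as a critical transient fault."""
    timer = None
    for addr in address_trace:
        t = EPROM.get(addr, 0)
        if t:                        # start of a monitored instruction sequence
            timer = t                # reload countdown timer from the EPROM
        elif timer is not None:
            timer -= 1               # <timer> = <timer> - 1
            if timer < 0:
                return "transient fault detected near address %#06x" % addr
    return "no fault"

print(counter_length_bits(f_timer=1e6, f_system=8e6, T_max_cycles=4096))  # -> 9
print(monitor([0x0100] + list(range(0x0101, 0x0120))))                    # -> no fault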
For real time monitoring and control operations, rollback of either instructions, segments
of programs, or entire programs is necessary after a transient fault is detected. A rollback
operation makes use of the concept of checkpoints. A checkpoint is a preselected point
in the execution sequence when the state of the system is saved. Program rollback forces
the execution to restart at the last checkpoint. The selection of the number of check-
points is important. If too many checkpoints are used, the program execution time is in-
creased since the status of the processor has to be saved often. On the other hand, if the
checkpoints are fewer in number, the recovery time is increased since the program has to
be executed, starting from the previous checkpoint. Strategies for introducing optimal
checkpoints in a program can be found in Noore et al. (1992). Furthermore, it is shown
that transient fault detection and recovery can be performed in real time, which is an
important factor when monitoring, processing, and controlling critical industrial
processes operations. The approaches presented have a wider range of applications and
can be extended to real-time, high-risk, and life-critical systems used in the aerospace,
banking, military, chemical and nuclear industries.
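A minimal sketch of the checkpoint/rollback idea (Python; the step functions are hypothetical, and a real system would save processor status rather than a dictionary):

import copy

def run_with_rollback(steps, checkpoint_every=3, max_retries=2):
    """Save state every few steps; on a detected transient fault, restart
    execution from the last saved checkpoint."""
    state = {"pc": 0, "acc": 0}
    checkpoint = copy.deepcopy(state)
    retries = 0
    while state["pc"] < len(steps):
        if state["pc"] % checkpoint_every == 0:
            checkpoint = copy.deepcopy(state)    # preselected checkpoint
        try:
            steps[state["pc"]](state)            # execute one step
            state["pc"] += 1
            retries = 0
        except RuntimeError:                     # transient fault detected
            if retries >= max_retries:
                raise
            state = copy.deepcopy(checkpoint)    # program rollback
            retries += 1
    return state

hits = {"n": 0}
def flaky(s):
    hits["n"] += 1
    if hits["n"] == 2:                           # one-shot transient upset
        raise RuntimeError("upset")
    s["acc"] += 1

print(run_with_rollback([flaky] * 5)["acc"])     # -> 5 despite the upset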

[Figure: the microcomputer system and the EPROM share the address bus; the EPROM data output feeds the comparison logic, and a countdown timer clocked at ftimer (system clock fsystem) times the monitored sequences.]
Figure 4.19 Circuit to detect transient faults in a microcomputer system.


4.3.3. Expert systems for quick fault diagnosis in the mechanical and
electrical systems domains

A particular mechanical or electrical device works based on certain underlying principles.


Due to the components that make it up and on how these components are connected, the
device manifests a certain behavior. The connectivity and functionality of the device
constitute some of the fundamental knowledge that a human learns first on the road to
becoming an expert diagnostician for that device. This knowledge is often the basis on
which the human acquires all other knowledge about diagnosing and repairing the de-
vice.
Regardless of the ultimate functionality of a mechanical or electrical device, most are
built based on a finite set of fundamental building blocks. These building blocks represent
some general functionality that can be found witbin many different devices. For example,
no matter what device a copper wire is found in, it is usually used to transport electricity,
and regardless ofwhere a battery is found, it usually functions as a reservoir or source of
electricity. These building blocks also have a fairly well-defined set of potential fault
modes. For example, a copper wire can fail to allow electricity to flow by not making a
complete circuit. This could be due to a break in the wire, a short, or a missing ground.
A battery can fail to store electricity. This could be due to the age of the battery, to a
lack of recharge from the system to which it is connected, etc. A human expert uses this
fundamental knowledge about the purpose of the various components within a device to
diagnose problems and devices that are unfamiliar to him/her.
Devices are not only composed of primitive components, but they themselves can serve
as a functional primitive in a higher level view of the device. For example, the generator
belt within the generating system of an automobile can serve as a conduit, transporting
mechanical energy from one component to another. At a higher level the entire generat-
ing system could be considered a transformer, taking mechanical energy and producing
electrical energy. Such a hierarchy can easily be represented using functional primitives
(for example, power supply is reservoir, safety switch is regulator, and all links are con-
duits in a gas heating system) by simply allowing a component at one level to be de-
scribed as a set of components represented by functional primitives at a lower level. This
allows for a hierarchical breakdown of the problem, and a diagnosis can thus be made to
any desired level of detail.
This generalization of information can also be used by human experts when they are
faced with diagnosing an unfamiliar, but similar, device. Any particular device is usually a
member of a class of devices of a particular type. For example, a specific car battery is a
member of a set of different kinds of car batteries which are, in turn, members of a more
general set that also includes batteries for flash lights and calculators. Each type of bat-
tery may look quite different, but they all work on the same basic principles and they all
perform a similar job: providing a power source. The different types of batteries create a
hierarchy of types of power supplies. An expert can use this knowledge about similarity
of function to diagnose problems on devices that may not be familiar but that are similar
to one that he/she has expertise in.
Once the device has been represented to the computer using these functional primitives,
the resulting model can be used in two ways. First, it can be used as a basis for qualita-
tively simulating the device. Given specific values for the states of certain components in
the system (i.e., switch settings), it can determine the effects on the device by simulation.
This can be done both for correct and malfunctioning behavior. Malfunctioning behavior
is handled by providing symptoms of the failure at the beginning of the simulation (i.e.,
the switch is on, but the light is not). This simulation is based on the substances and their
qualitative levels as input and output to the components of the device.
The second way that the model can be used is to guide diagnosis. To do this, some very
general diagnostic reasoning rules to examine the functional model and isolate the prob-
lem to a specific functional unit or units are used by the functional expert system (a small
sketch of this tracing logic is given after the list below). These rules are based upon the
values of the inputs and outputs of the functional units. They are:
1. If an output from a functional unit is unknown, ask the user;
2. If an output from a functional unit appears incorrect, check its input;
3. If an input to a functional unit appears incorrect, check the source of the input;
4. If the input of a functional unit appears to be correct but the output is not, assume
that something is wrong with the functional unit being examined.
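A small sketch of this tracing logic (Python; the gas-heater model and readings are hypothetical, and rule 1 falls back to asking the user when a reading is unknown):

def diagnose(unit, observe, model):
    """Trace the four generic rules through a functional model. model maps
    each unit to the units feeding its input; observe(unit) returns 'ok',
    'bad', or None for unknown."""
    out = observe(unit)
    if out is None:
        out = input("Is the output of %s correct? (ok/bad) " % unit)  # rule 1
    if out == "ok":
        return None
    for src in model.get(unit, []):                                   # rules 2 and 3
        culprit = diagnose(src, observe, model)
        if culprit:
            return culprit
    return unit                                                       # rule 4

model = {"burner": ["safety_switch"], "safety_switch": ["power_supply"]}
readings = {"burner": "bad", "safety_switch": "bad", "power_supply": "ok"}
print(diagnose("burner", readings.get, model))   # -> safety_switch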
Once the problem has been isolated, more specific analysis can proceed in the same man-
ner at a lower level, since the knowledge representation can be hierarchical. When a fault
hypothesis is confirmed with enough confidence, an attempt is made to repair the be-
lieved problem. If the repair fails, then the expert restarts the diagnostic process. This
process is followed no matter what type of knowledge is guiding the human at any given
point in time, whether it be experiential, fundamental, common sense, or some other
type.
In Yoon and Hammer, (1988), an aiding approach has been described and evaluated for
novel fault diagnosis in complex electromechanical systems. This approach is an alterna-
tive to the one presented in Section 4.3.1.2. The emphasis is on novel rather than rou-
tine faults. It contains a qualitative model that may correspond to the human's internal
model of the system. This model represents knowledge only of how the system behaves.
Therefore, this aiding approach does not rely on proceduralized knowledge. The qualita-
tive model is the basis for much of the aiding that takes place.
Although the definition of novel fault diagnosis was rather narrow in this research, its
results may find generalization in any diagnostic task that involves causal reasoning
based on deep knowledge. The experimental results (fault diagnosis of an orbital
refueling system to refuel orbital satellites with hydrazine) confirmed that a deep-
reasoning diagnosis can be aided, without disturbing the human diagnostic procedure, by
providing relevant information.
However, the results also suggested that the aiding information should be compatible
with the human information processing. Thus, to design a successful information aid,
human diagnostic activity should be first understood from the viewpoint of human infor-
mation processing.
In diagnosis tasks, system behavior in different modes is investigated: normal behavior,
actual behavior, and hypothetical behavior. The results suggest that normal system be-
havior, while intuitively important, should not be presented prominently. This is perhaps
because the human is interested in actual behavior. Abnormal system behavior was found
to be very important in diagnosis. Simply presenting integrated, actual (observed) system
behavior topologically has a comparable effect in improving diagnostic performance.
There are several applications of troubleshooting expert systems which are currently
used in the industrial electrical-mechanical sector to perform quick and accurate diagno-
sis. By examining some of the practical existing systems, one can acquire information
which can aid the development of new fault diagnosis systems (see Chen and Ishiko,
1990).
Conceivably, General Electric's (GE's) Computer-Aided TroubleShooting (CATS-1),
mainly used at locomotive minor maintenance repair shops, is the earliest and most well-
known expert system in troubleshooting. In its current status, CATS-1 limits its trouble-
shooters to the ideas and disciplines of the expert whose knowledge was programmed in
the system. What GE intends to add to the system in future development is a rule editor
that facilitates the creation of new rules for the technician without having to concern
himself with knowledge engineers. AT&T Bell's Automated Cable Expertise (ACE), a
knowledge-based, cable-trouble analysis system to diagnose cable troubles is used in the
Texas facilities at Dallas and Houston. The ACE was designed to analyze, automatically,
the large amounts of daily complaints and maintenance reports and help the managers to
investigate the data contained in the complaint system. The Testing Operations
Provisioning Administration System-Expert System (TOPAS-ES) is a real time,
distributed, multi-task expert system for switched circuit network maintenance. TOPAS-
ES is one of the most eminent troubleshooting expert systems because of its ability to
handle the volume of trouble reports, perform its analysis, and continuously support
proper trouble resolution. AT&T Bell's Interactive Repair Assistant (IRA) is a
knowledge-based (KB) system that provides expert troubleshooting advice to the tele-
phone company's mobile field technicians. Unlike most KB systems, IRA is capable of
delivering real time expert advice to many remote users.
For the development of the following three expert systems, the vendors chose commer-
cial expert system shells. Termi-point Expert Maintenance System (TEMS), is an expert
system for supporting wire-bonding machine maintenance. TEMS can diagnose certain
machine problems in few minutes while the human experts solve the same problems in an
hour or more. Coordinate Measuring Machine Professional (CMM-PRO), is used by
Rockwell International in Colorado, to assist in the precise calibration and servicing of
state-of-the-art computer-controlled coordinate measuring machines. When CMM-PRO
was put in use, it became a powerful training and learning tool for novice engineers. The
COOKER, Cooker sterilizer expert system, used by Campbell Soup Co., can check large
sterilizer cookers for malfunctions and can guide the start-up and shutdown procedures
of the sterilizer operations. Since the system has been installed in eight plants, COOKER
has been used day and night by the plant personnel to diagnose cooker problems without
the need of consulting human experts.
EXACT (Expert system for Automobile air-conditioner Compressor Troubleshooting) is
an expert system developed by Toyoda Automatic Loom Works to support field engi-
neers in conducting quickly an effective diagnosis of compressor troubles. EXACT runs
under three software environments in the PC: EXSYS, FOXBASE+, and AutoCAD.
The expert system module is handled by the EXSYS environment; EXSYS shows the
results of the diagnosis using an executable external program created in QuickBASIC.
The FOXBASE environment handles the user interface and database management fea-
tures. On the results screen of the expert system, a pictorial information module is pro-
vided under the AutoCAD environment for locating the trouble area in the compressor.
Fifty real-world cases have been used to evaluate the system performance (Chen and
Ishiko, 1990).
The ALLY™ Plant Monitoring and Diagnostics (M&D) system. The integration of
various monitoring and diagnostic systems with plant process parameters is realized by
the Westinghouse ALLY™ Plant Monitoring and Diagnostic System. It is the envi-
ronment where diagnostic evaluations are made on plant-wide equipment and systems
and the mechanism where information is managed and displayed.
Monitoring and diagnostics (M&D) systems are well recognized as a vital tool to help
maintain structure, system and component (SSC) integrity and reliability. These systems
are generally designed to measure key SSC parameters such that an assessment of their
performance can be made. Some of the more advanced systems have the added capability
of performing a diagnosis and determining equipment malfunctions and associated
contributing factors. For example, not only is it important to detect equipment perform-
ance deviations in a timely manner but it is equally necessary to have the information ex-
plaining the occurrence. Then, and only then, can proper maintenance actions be pre-
scribed and future failures/degradations be eliminated.
In order for such systems to be effective, they need to have access to a relatively large
and distributed set of information. Information directly acquired trom the M&D system's
local sensor set is necessary but usually insufficient by itself. Global plant process pa-
rameters measuring the dynamic behavior of other associated equipment and systems are
equally necessary for a complete diagnosis and root cause analysis. Although this infor-
mation has always been available through plant computers, accessing this information has
been difficult because these computers do not typically have sufficient I/O for diverse
M&D networks. Moreover, most M&D systems are designed without a common archi-
tecture thereby making the connection to the plant computer information systems custom
engineered and uneconomic. This difficulty often results in a compromise between
standardization and cost. The M&D systems, networked together and linked to the plant
information systems, represent a tremendous wealth of information. So much, in fact,
that the distributed architecture of the M&D network becomes a necessity, simply be-
cause of the quantity of data it must process.
The basic design of ALLY™ implements a Blackboard Architecture, in which all M&D
systems and knowledge sources revolve around a common data base of information, the
Blackboard. Like the name suggests, it is a mechanism where different knowledge
representations of the plant exist and are available for any application. When a
knowledge source executes, it reads the Blackboard for its necessary inputs, processes
them, and puts the results back on the Blackboard. Other knowledge sources, in turn,
use the newly acquired information to further enhance the current model of the plant.
The Blackboard architecture consists of the Blackboard itself, a Control Mechanism,
Knowledge Sources, and Graphical User Interface.
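The cycle of reading inputs, firing a knowledge source and posting results back can be illustrated with a toy sketch (Python; the sensor name, threshold and diagnosis strings are invented, not ALLY™ internals):

class Blackboard:
    """Shared store plus knowledge sources that fire when their inputs are
    present on the Blackboard and post their results back onto it."""
    def __init__(self):
        self.data, self.sources = {}, []

    def add_source(self, needs, produces, fn):
        self.sources.append((needs, produces, fn))

    def run(self):
        fired = True
        while fired:                              # simplistic control mechanism
            fired = False
            for needs, produces, fn in self.sources:
                if produces not in self.data and all(n in self.data for n in needs):
                    self.data[produces] = fn(*[self.data[n] for n in needs])
                    fired = True

bb = Blackboard()
bb.data["vibration_rms"] = 7.2                    # raw M&D sensor reading
bb.add_source(["vibration_rms"], "vibration_alarm", lambda v: v > 5.0)
bb.add_source(["vibration_alarm"], "diagnosis",
              lambda alarm: "check bearing" if alarm else "normal")
bb.run()
print(bb.data["diagnosis"])                       # -> check bearing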
Along with time based functionality, the system has been designed with high perform-
ance and fast execution in mind. This is the means by which the end users are provided
with time relevant information. While most other commercial knowledge sources had
their beginnings in Lisp versions and then converted to C, the ALLY™ plant monitoring
and diagnostic system has been exclusively written in C and C++ (an object oriented
programming language) with performance as a primary objective. Only standard UNIX
system calls, sockets for inter-processor communication, shared memory for intra-proc-
essor communications, and signals for interrupts are used. ALLY™ is currently imple-
mented on both Sun SPARC and Hewlett Packard 9000 UNIX platforms. Requirements
are: 32 Mb of RAM, 100 Mb swap space, and 600 Mb hard disk. As a benchmark, the
HP version of the rule based knowledge source has been measured to fire over 8000
simple rules per second.
JET-X is a PC-based expert system, developed by GE, for support of the TF34-GE-100
engine on the A-10 aircraft. JET-X was designed for use with the existing TF34 Ground
Station GEMS IV. GEMS provides for the display of trend and event data acquired by
the on board Turbine Engine Monitoring System (TEMS). Both GEMS and TEMS are
capable of generating alarms that may indicate engine problems. Maintenance personnel
are expected to respond to these alarms by diagnosing potential problems and
performing appropriate maintenance actions. JET-X's principal role is to capture the
diagnostic procedures in an interactive environment that does not limit the size or
complexity of the decision trees used (Doel, 1990).
HELIX (Helicopter Integrated Expert) is a helicopter real-time diagnostic expert system
using a causal model of the helicopter's engines, transmission, flight control and rotors.
At the heart ofthe HELIX program is a Qualitative Reasoning System (QRS). The QRS
is a general mechanism to support the creation of hierarchical device models and reason-
ing about device behavior using qualitative physics. The HELIX qualitative model is rep-
resented as a set of constraints that define the normal behavior of the engines, transmis-
sion, flight control, and rotors of the helicopter. Aircraft health is assessed by
determining whether observations (sensor readings and pilot control inputs) are
consistent with the constraints of the model. If an inconsistency is detected, a process of
systematic constraint suspension is used to test various failure hypotheses.
Critical to the efficient operation of the HELIX program is the hierarchical model repre-
sentation, which enables reasoning at various levels of abstraction. Using a top-down
approach, the diagnostic process exploits the hierarchy by beginning fault isolation with
the most reduced form of the model. To refine the diagnosis, a branch of the hierarchy
may be expanded until a component-level diagnosis is made. The hierarchy also greatly
reduces the complexity of multiple failure diagnosis. Rather than considering combina-
tions of failures in all leaf components, the diagnosis can be restricted to combinations of
branches in the hierarchy.
HELIX has been successfully tested on a variety of simulated failures. By representing
only the normal behavior of the helicopter and testing hypotheses by constraint suspen-
sion, HELIX has been able to diagnose single or multiple failures without prior knowl-
edge of failure modes. The approach represents a promising technique for automating
the qualitative reasoning required to diagnose novel failures and may form the basis for
extensive automation both in airborne and ground-based diagnostic systems (Hamilton,
1988).
ENGEXP is an integrated environment in PC-TurboPascal for the development and ap-
plication of expert systems for the quick fault diagnosis and repair assistance of
equipment and engines. The main components of ENGEXP are the user interface, the
expert system core, the feedback and knowledge acquisition tool, and a help facility for
the user. As usual, the expert system core contains the inference engine, the knowledge
base(s) and the explanation subsystem. A particular feature of the inference engine is that
it can resolve situations with multiple causes. The knowledge is organized into three
layers as is done by the human experts. The qualitative analysis of the results obtained in
a large sample of applications showed a remarkable performance, while the quantitative
analysis showed a success rate over 95%. The response time is very good (always ≤ 2
sec., in many cases ≤ 0.5 sec.) despite the overhead caused by the user-friendly user
interface (Tzafestas and Konstantinidis, 1992).
FAKS is an expert system for diagnosis and recommendation reports produced on board
the ship for the Wartsila Diesel VASA series of ship engines. At the first step in the di-
agnosing process FAKS creates two mathematical models, one of the "Ideal Engine"
based on laboratory data and one of the "Real Engine" based on data from the Diesel
engine monitoring system. Every 15 minutes a comparison between these two models is
made and the result is "flashed out" through the entire knowledge base. Diagnosis is the
result of the automatic continuous evaluation of the engine and contributes to safe, con-
tinuous and economic operation. Good possibilities are also given to improve the safety
at sea (Ahlqvist, 1990).
SCAR and SCAR-2 are rule-based systems for assisting electricians in diagnosing faults in
a shuttle car quickly. A shuttle car is a vehicle used to transport coal from the working
face, where mining takes place, to the primary haulage system, such as a conveyor belt.
The shuttle car is a key element in the mining cycle, since no coal can be mined if a car is
not available. Insight 2+, a microcomputer-based expert system development tool by
Level Five Research Inc., was used to create the system. The program requires the user to
specify the initial symptoms of the failed machine, and the most probable cause of failure
is traced through the knowledge base, with the software requesting additional
information, such as voltage or resistance measurements, as needed. A causal-reasoning
approach was used to develop the production rules. Generalized systematic procedures
for creating and organizing the knowledge base, the incorporation of on-screen
presentations of the shuttle car circuit schematic, the development of reference and
tutorial programs and a microcomputer implementation that resists moisture, oil and dust
are some features of the more advanced SCAR-2 system (Novak et al., 1989).
HEPHAESTOS is an interactive expert system for quick fault diagnosis of electric ma-
chines. Electric machines play a very important part in furnishing power for all types of
domestic and industrial applications. Any fault on an electric machine installed in a
production line results in off-line repair, which disturbs the production process.
HEPHAESTOS is a rule-based expert system with efficient and quick reasoning,
containing as much knowledge as possible concerning the fault occurrence, repairing and
maintenance of electric machines. The knowledge required to build this expert system
has been acquired from expert engineers working for years at workshops of the Public
Power Corporation of Greece, construction companies, many instruction manuals,
maintenance handbooks and trouble charts. One year real life tests of the system have
been performed. The knowledge base can be expanded according to the comments of the
users (Protopapas et al., 1990).
IDM (Integrated Diagnostic Model) is a hybrid expert system for real-time fault diagno-
sis and repair assistance of mechanical and electrical devices which integrates shallow
and deep knowledge. The shallow knowledge is stored in the experiential knowledge
base and the deep knowledge is stored in the physical knowledge base. The physical
knowledge base contains functional models of subsystems. Each of these two types of
knowledge is structured and represented in a way that is natural for that type of knowl-
edge. Each of the two knowledge bases has its own inference engine. Thus, each is an
expert system in its own right. These two expert systems are then integrated into a single
expert system via an executor that can draw on either expert system in diagnosing a par-
ticular problem. The executor contains its own knowledge base that maintains a global
representation of what is known and unknown so far about the device under diagnosis.
This knowledge base also provides a means of "translating" the knowledge between the
experiential and fundamental knowledge bases. The result is an expert system that can
solve problems even when no experiential knowledge exists to handle the situation, re-
sulting in a more graceful degradation of capabilities at the periphery of its knowledge. It
can also provide different levels of explanation. Two prototype systems have been im-
plemented, the first for fault diagnosis in a simple thermostat-controlled gas heating sys-
tem, the second for fault diagnosis in the electrical system of an automobile. Recent
work on the IDM involves implementation to handle truly analog devices (Fink and
Lusth, 1987).
Finally, DESPLATE is an expert system designed by Ng et al., (1990), to diagnose
abnormal plan view shapes of steel plates in a plate mill. A brief overview of the
production process in the plate mill and the problems that motivated the development of
DESPLATE are provided. The goal of DESPLATE is to locate the possible causes such
as electrical failures, mechanical breakdowns or wear, operational errors, and pre-rolling
conditions for a particular abnormal shape, and to suggest appropriate remedies. Some
of the issues arising in developing DESPLATE are addressed, including the use of
graphics for user interface, forward and backward chaining techniques, and knowledge
acquisition methodologies from multiple experts from different disciplinary backgrounds.

4.3.4. Automatic expert fault diagnosis for machine tools, robots and
CIM systems

Computer Integrated Manufacturing (CIM) is an up-to-date research and engineering


application field involving many different fields of technology: manufacturing manage-
ment, industrial engineering, computer aided design-engineering-manufacturing, material
handling, control and automation.
Although control methodologies are being practiced for some time and experience has
been gained, this is not the case for the monitoring requirements of CIM projects. In
controlling a production system workstation, it is required to integrate the monitoring
functions of all components and of the global controller (the components being the ma-
chine tools, the robots, local transport system and tool management, etc.). At the process
level, part scheduling and dispatching are also to be monitored.
Monitoring was, in the past, limited to reading sensors (monitoring sensors) for detection
of abnormal conditions often followed by invoking an emergency action such as machine
shutdown. Advanced techniques for monitoring are required to cope with actual system
complexity (Computers in Industry, 1986). Some requirements in monitoring such as
diagnosis and decision can be built using AI techniques. The integration of such tech-
niques is not a new concept in complex automation projects, but it gives rise to a deep
discussion on the use of such tools for specific problems, also relating some equivalent
available tools in a complementary way. One of the available tools is Petri nets, a
commonly used tool for FMS control system specification and implementation. An ap-
proach has been developed by Sahraoui et al., (1987), to design a production system
workstation monitoring system. The approach is best illustrated by fig. 4.20.
Fuzzy classification is also applicable in on-line machine tool and wear fault diagnosis
(Computers in Industry, 1986). Classification is a subdivision of a set of elements or
objects specified by features into a set of different subsets each with special importance.
In the case of complex systems, one should use all possible information about the system.
This is primary information from measurements and expert consultation and secondary
information that is a result of signal processing (FFT, regression analysis, statistical aids,
parameter estimation and so on). All information together is called features.

[Figure: the monitoring function, connected to the operator and to higher levels, supervises the process through its sensors.]
Figure 4.20 A production system workstation monitoring system.

The capabilities of classification are fundamental ordering and structure building. In the
structure building step one gets a cluster configuration. Clusters found by cluster
methods are of a natural type. Classical non-fuzzy clustering methods have the disadvan-
tage that the classifier yields only crisp membership values (0 or 1) and also a well de-
fined result for an object with a great distance from every class. This is very risky be-
cause there could be an unknown class. The fuzzy concept is then suitable for the
description of overlapping classes and is able to refuse objects. This is possible because
the membership of every class in such a case is very small. The essence of the fuzzy view
is that the membership value of a set element cannot only be 0 or 1 but lies between 0
and 1 (in diagnosis this is the grade of membership to the normal or to a special failure
class). In Computers in Industry, (1986), a wear fault diagnosis system of conveyor belt
rolls based on fuzzy classification is described.
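A small sketch of such a fuzzy membership computation in the style of fuzzy c-means is given below (Python; the two-dimensional features and class prototypes are hypothetical):

import numpy as np

def fuzzy_memberships(x, prototypes, m=2.0):
    """FCM-style membership of feature vector x in each class. Memberships
    lie in (0, 1) and sum to 1; an object far from every prototype receives
    no dominant membership, which can be used to refuse it as unknown."""
    d = np.array([np.linalg.norm(x - p) for p in prototypes])
    if np.any(d == 0):                     # x coincides with a prototype
        return (d == 0).astype(float)
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

prototypes = [np.array([0.2, 1.0]),        # class: normal roll
              np.array([0.8, 3.5])]        # class: worn roll
u = fuzzy_memberships(np.array([0.7, 3.0]), prototypes)
print(u)   # high membership in the wear class, small residual in 'normal'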
Automatic monitoring is not limited to cell and plant monitoring as explained in the pre-
ceding section. In fact there are already some embedded monitoring functions in actual
machine tools such as:
• Drilling machines with detection of tool breakage at the end of cycle.
• Turning machines with torque measurement.
• Lathe machines using consumed electric power measurement in order to estimate
tool wear.
Concerning the tool wear and prediction of breakage with pattern recognition tech-
niques, Neumann, (1990), has already published promising results.
In order to maintain continued operation, it is essential to detect a tool breakage because
such an event can occur at any time and can cause damage to the tool and the part. In
certain operations such as milling, small breakages cannot be recognized by monitoring
an increase of the torque. A common event is the breakage of the cutting edge and this
also can hardly be detected since the average torque in one turn does not change.
An on-line breakage detection algorithm which can be exploited by the fault diagnosis
ES can be the following (Computers in Industry, 1986):
The discrete cutting torque signal x(k), where k = 0, 1, ... is the integer sample interval, is
considered. It is assumed that x(k) can be expressed as a linear combination of the
preceding sampled cutting torque signals as,
x(k) = a0 + a1x(k-1) + ... + anx(k-n) + w(k)
where w(k) is gaussian white noise whose mean value and variance are zero and σ² re-
spectively. This expression is an AR (autoregressive) model of order n. The coefficient
vector,
θ(k) = [a0 a1 ... an]T
of the above AR model can be estimated in real time from the preceding observations by
the recursive least squares estimation algorithms extensively described in Chapter 3.
The predicted value of the next observation x̂(k) is obtained from the previous
observations φ(k) = [1 x(k-1) x(k-2) ... x(k-n)]T and the estimated coefficient vector
θ̂(k-1) at the (k-1)th instant of time as: x̂(k) = φT(k)θ̂(k-1).
The observed value x(k) is compared with the predicted one, and the residual
e(k) = x(k) - x̂(k) is calculated. When tool breakage occurs, the cutting state changes
suddenly. This leads to an unexpected change of the cutting torque and thus to a sudden
increase of the residual. In order to judge the breakage of the tool, the "normalized re-
sidual" is used, which is given by the following equation:

d(k) = |e(k)| {[e²(k-m) + e²(k-m+1) + ... + e²(k-1)] / m}^(-1/2)


When plotting the normalized residual against the cutting time, tool breakage is pre-
dicted by a peak. The Generalized likelihood ratio (GLR) techniques, described in
Chapter 2 and applied also in Chapter 3 for automatic detection of abrupt signal or pa-
rameter changes, can be used here.
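A compact sketch of the complete loop (RLS with forgetting factor plus the normalized residual test) follows; the model order, window m, forgetting factor and threshold are illustrative choices, not values from the cited study:

import numpy as np

def breakage_monitor(x, n=4, lam=0.98, m=20, threshold=6.0):
    """Fit the AR(n) torque model recursively and flag the sample where the
    normalized residual d(k) exceeds the threshold."""
    theta = np.zeros(n + 1)                        # [a0 a1 ... an]
    P = np.eye(n + 1) * 1e3
    residuals = []
    for k in range(n, len(x)):
        phi = np.concatenate(([1.0], x[k - n:k][::-1]))  # [1 x(k-1) ... x(k-n)]
        e = x[k] - phi @ theta                     # residual e(k) = x(k) - x^(k)
        if len(residuals) >= m:
            rms = np.sqrt(np.mean(np.square(residuals[-m:])))
            if rms > 0 and abs(e) / rms > threshold:
                return k                           # peak in d(k): breakage predicted
        K = P @ phi / (lam + phi @ P @ phi)        # RLS gain with forgetting factor
        theta = theta + K * e
        P = (P - np.outer(K, phi @ P)) / lam
        residuals.append(e)
    return None

# Hypothetical torque record with a sudden level shift (breakage) at k = 300
rng = np.random.default_rng(0)
x = 2.0 + 0.05 * rng.standard_normal(400)
x[300:] += 1.0
print(breakage_monitor(x))   # -> 300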
A second type of detection which can be designed, is the signal processing of the
acoustic signal (see Chapters 1 and 6). It has been shown that it is relatively easy to ana-
lyze the cross-correlation existing between the frequency of the signal due to the parts
turning and the cutting force measurement. The conclusions of such analysis can be a
basis for implementing an adaptive algorithm to improve the quality of machining, since
the forming of parts turnings is an important parameter (Sahraoui, 1987, Neumann,
1990).
Tool wear can also be detected through power consumption using the empirical formula,
kWh = T · P
where kWh is the power required, T is the specific power (per 10⁻⁶ m³ of removed
material) and P is the amount of removed material per unit time.
The monitoring of sequential steps can be ensured by using a Petri net-based
specification. Three modules MI, M2 and M3 can be envisaged: MI to monitor the
initialization of the machine, M2 for monitoring during operations and M3 for exception
handling.
Faults in the machine tool motor (load, electric components) can be detected via pattern
recognition and the spectrum of the current (see also Section 1.3.2). The quotient of amplitudes
at the odd harmonics of 50 Hz in the current spectrum shows the form of the magnetic
field. Changes in load or geometric changes in the machine influence the waveform of the
magnetic field. Furthermore, the current amplitudes give some information about the
momentary load and position in the working cycle. Faults in other mechanical compo-
nents (gear box, belt drive, eccentric) can be detected by evaluation of rotation speed and
parameter estimation (see Chapter 3).
A rule-based system can be designed to detect on-line tool wear and machine malfunc-
tion from such information. The action on the machine tool is to correct the tool in ad-
vance and to invoke an emergency shutdown. These issues and implementation aspects
are fully discussed by Arreguy et al., (1990), and Monostori et al. (1990).
Freyermuth, (1991), introduced a computer assisted incipient fault diagnosis system for
industrial robots, with the objective to detect and diagnose faults in the mechanical part
of the devices at a relatively early stage to prevent subsequent damage. For this purpose
a suitable combination of analytical and heuristic tools was developed. The analytical
symptom generation procedure comprises the detailed mathematical modeling of the
robot's different axes. The parameters of the models directly represent characteristic
physical quantities (process coefficients) being identified by specific continuous-time
parameter estimation algorithms. The estimated quantities then undergo a statistical
evaluation. Deviation of coefficients from nominal values are considered as symptoms
(see also Chapter 3). The subsequent heuristic evaluation processes these symptoms
based on specific fault-symptom-trees and knowledge of fault statistics and process
history using a specific inference mechanism (see also Appendix 4.A). The developed
diagnosis system can be realized on a PC with a process interface.
Pouliezos and Stavrakakis, (1989), proposed a fault detection mechanism for a ro-
bot/controller combination to achieve optimum robot performance at all times. A detec-
tion mechanism based on logarithmic likelihood ratios combined with parameter estima-
tion through RLS with forgetting factor, making it essentially a moving window estima-
tor, has been shown to work quite satisfactorily (see Chapters 2 and 3). Fault symptom
trees can then be elaborated to build an ES fault diagnosis tool as previously (see
Appendix 4.A).
An application example concerning the finding of cause(s) for an observed symptom in a
CIM setting, using hybrid expert diagnosis procedures (see Appendix 4.A) is described
by Lee, (1990). As shown in fig. 4.21, this CIM system consists of 3 workstations (i.e.
machines), 1 main conveyor, 3 subconveyors, 2 material handling robots, 1
programmable controller, and 1 computer terminal. The three machines are a CNC
(Computer Numerical Control) machine, a GCA robot and a General Electric P50 robot.
The two material handling robots are a Unimation PUMA 761 and a General Electric
MH33.

[Figure: layout with the computer terminal and programmable controller, Conveyors 1-3 around the main conveyor, and the CNC, GCA, P50, PUMA and MH33 stations.]
Figure 4.21 CIM system layout.

As the raw material comes in on Conveyor 1 (in which there is no pallet), the PUMA
picks it up and places it on the CNC. The CNC processes the raw material into the spe-
cific pattern by performing the turning operation. When the CNC finishes its operation,
the PUMA picks it up and places it on a pallet on the main conveyor. The pallet arrives
at the GCA on the main conveyor and is first identified as type x or y depending on its
bar code to be read by the laser bar code reader located at the GCA. If the pallet is of
type x, then the GCA does operation A and releases the pallet x onto the main conveyor.
Similarly, if the pallet type is y, then operation B is performed. At the P50, operation C is
performed if the pallet is of type y, and no operation if it is of type x. When the proc-
essed material (or part) arrives at the MH33, it is unloaded from the main conveyor to
Conveyor 2 if it is of type x, and to Conveyor 3 if it is of type y. Finally empty pallets
travel to the CNC (at which point the cycle begins) via the main conveyor.
For the purpose of the diagnosis model, the CIM system is modeled in the way shown in
fig. 4.22. Since there are 9 terminal nodes in the functional hierarchy, there are 9 shallow
KBs. One of these shallow KBs is shown in the lower half of fig. 4.22 for the GCA.
Every node in this shallow KB is represented by a pseudo-rule, defined as a rule of which
the consequent part is not explicitly specified. In this sense, this shallow KB is a pseudo-
shallow KB. To be complete, one has to specify an antecedent and a consequent. For
example,

R111: If GCA will not respond
      then select the rule that has lower entropy between R211 and R212.

R311: If there is no pallet at GCA
      then the gate must be broken.

Referring to the strategy outlined in Appendix 4.A, it will be explained how it works
through use of fig. 4.22, which shows both the deep KB and the shallow KB.
Suppose that one (or a sensor) observes a symptom indicating that a part is not being
processed on a machine. This leads to the deep KB starting at F00 (however, if there is a
good mapping heuristic, it leads to a specific node other than F00). At Stratum 0, using
the breadth-first search, q11 (=0.65) is greater than q12 (=0.35). So Machines (F11) is the
next destination. By the same search, one arrives at the terminal node GCA (F32) via
Special-purpose Machines (F22). At this point, one moves on to the shallow KB attached
to F32. That is, one is at R111, at which point the entropy calculation begins. To calculate
the entropy at R211 (i.e. H211), the data for all child nodes at Echelon 3 must be known.
H211 is given by,
H211 = -{w311 p311 ln p311 + w312 p312 ln p312 + w313 p313 ln p313}
= -{(0.2)(0.2)ln(0.2) + (0.2)(0.4)ln(0.4) + (0.2)(0.4)ln(0.4)}
= 0.2110
where,
w311 = test cost of R311 / maximum test cost of shallow KB
= $10/$50 (from the test cost of R323)
= 0.2
and w312 and w313 are obtained similarly. In the same manner, H212 can be obtained (i.e.
H212 = 0.9130).
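The weighted entropy computation itself is a one-liner; the sketch below (Python, with the example's test costs and beliefs) reproduces the H211 figure:

import math

def node_entropy(children, max_cost=50.0):
    """H = -sum(w_i p_i ln p_i) over child nodes, w_i = test cost / max cost."""
    return -sum((cost / max_cost) * p * math.log(p) for cost, p in children if p > 0)

# Children of R211: (test cost in $, belief p)
print(round(node_entropy([(10, 0.2), (10, 0.4), (10, 0.4)]), 4))   # -> 0.211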

Figure 4.22 CIM system example diagnosis.

Thus the decision can be made at R111. Since H211 is less than H212, the search moves
down to R211. This procedure is repeated until the terminal node is reached. The results
of entropy calculations (and cost/belief ratios if there is a tie in entropy values) are
shown in Table 4.4 where (a) is for the R211 group and (b) is for the R212 group. Note
that the entropy for a terminal node is zero, because the terminal node has no children.
From Table 4.4, the first terminal node to be tested is R313. If it turns out that there is no
bar code on the pallet after the test, then this pallet is faulty. The diagnostic session stops
at this point. However, if the problem still exists although this pallet is already taken care
of, the observed symptom may have multiple faults. In this case, the diagnostic session
resumes from R311 which has the next lowest entropy (actual cost/belief ratio) at the
same echelon. If the test result for R313 indicates no faults, the diagnostic session also
resumes from R311. Should the entire shallow KB contain no faults, the diagnostic pro-
cedure has to go back to the terminal node of the deep KB (i.e. F32) and update q32.
Table 4.4 Calculation results for all nodes.

(a) R211 group                          (b) R212 group

Node (R)  Entropy (H)  Ratio (r)        Node (R)  Entropy (H)  Ratio (r)
211       0.2110                        212       0.9130
311       0.0000       1.0              321       0.0000       2.0
312       0.3576                        322       0.2220
313       0.0000       0.5              323       0.0000       3.3
421       0.0000       1.5              324       0.0000       2.7
422       0.0000       1.0              451       0.0000       0.9
423       0.0000       0.5              452       0.0000       0.7

By the updating heuristic in the foregoing section, one will find the parent node of F32
(where b = 3 and e = 2) as shown in Table 4.5 (see Appendix 4.A).
The second subscript GN(2) is 2. Therefore the parent node is F22. Also,
EN(3) = e - (cumulative N before GN(2)) = 2 - 1 = 1
Based on this result, it follows that q′32 = 0.

Table 4.5 Finding the parent node.

Node (F)   No. of elements (N)   Cumulative N
21         1                     1 < e = 2
22         2                     3 > e = 2
23         4                     7
24         2                     9

(since Nb-1,GN(b-1) = N22 = 2 and t ≥ EN(b) = EN(3) = 1, t = 2). Also considering
k ≥ e = 2, k = [e - EN(b)] + t = (2 - 1) + 2 = 3
q′33 = q33/(1 - q32) = 0.25/(1 - 0.75) = 1.0
At Stratum 2, the updated responsibility probability for the parent node F22 becomes,
q′22 = q22 - q22 × (q32 - q′32)
= 0.75 - 0.75 × (0.75 - 0) = 0.1875.
By applying the same methods, the parent node of F22 is F11. The updating for F21 is,
q′21 = q21 + {q21 / (1 - q22)} × q22 × (q32 - q′32)
= 0.25 + {0.25 / (1 - 0.75)} × 0.75 × (0.75 - 0) = 0.8125
At Stratum 1, similarly one has,

q′11 = q11 - q11 × (q22 - q′22)
= 0.65 - 0.65 × (0.75 - 0.1875)
= 0.2844
q′12 = q12 + {q12/(1 - q11)} × q11 × (q22 - q′22)
= 0.35 + {0.35/(1 - 0.65)} × 0.65 × (0.75 - 0.1875)
= 0.7156
The updating procedure ends here because the current stratum is Stratum 1. This update
is summarized in fig. 4.23 where the number in the parenthesis shows the previous re-
sponsibility probability.
Hence the diagnostic session continues from F12 on since, after updating, q′12 is far
greater than q′11 (note that before updating, q12 < q11).
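The deflate/inflate arithmetic of this updating heuristic can be checked with a few lines of Python (the function mirrors only the two-sibling case of the example):

def propagate_update(q_parent, q_sibling, q_child, q_child_new):
    """One stratum of the deep-KB update: the exonerated child's drop in
    responsibility deflates its parent and inflates the parent's sibling."""
    delta = q_child - q_child_new
    new_parent = q_parent - q_parent * delta
    new_sibling = q_sibling + (q_sibling / (1 - q_parent)) * q_parent * delta
    return new_parent, new_sibling

# Stratum 2: q32 drops from 0.75 to 0; F22 deflated, its sibling F21 inflated
q22n, q21n = propagate_update(0.75, 0.25, 0.75, 0.0)
print(q22n, q21n)                        # -> 0.1875 0.8125
# Stratum 1: F11 deflated and F12 inflated using q22's drop
q11n, q12n = propagate_update(0.65, 0.35, 0.75, 0.1875)
print(round(q11n, 4), round(q12n, 4))    # -> 0.2844 0.7156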

Figure 4.23 Updated probabilities in deep KB for the CIM system diagnosis.

The model is implemented by Lee et al., (1990), on IntelliCorp's Knowledge
Engineering Environment (KEE). The implementation is realized by specifying each of
the KBs, including one deep KB and its corresponding shallow KBs.

4.4 Conclusions

Methodologies for modeling failure knowledge about complex industrial systems and for
utilizing these models for real-time reasoning about, and diagnosing, failures have been
presented in this chapter. Their key component is the modular framework used for expert
knowledge-based failure analysis. This modularity permits easy implementations of fault
diagnosis programs, based on this framework, for different industrial systems, since the
only domain-dependent components that must be replaced are the specific failure models.

The necessary theoretical background regarding the more recent methods is given in the
Appendices of the Chapter for the interested reader. Many application examples from a
wide industrial practice have been cited, with sufficient detail to give the reader the
ability to develop his/her own applications.
In parallel with the development of automatic expert diagnosis and knowledge-based
decision support systems (DSS), concern has been growing regarding the potential for
catastrophic errors created by these systems and, what is worse, the potential for
catastrophes whose causes cannot be established. Concern for the risks associated with
expert systems is now so strong that it has spilled over into public discussion, such as a
British television program in which American and British practitioners and critics argued
the dangers of using expert systems in medical, industrial, military and other
applications. Also, there have been calls for restrictions on the deployment of
unsupervised or autonomous systems in safety-critical situations (The Boden Report,
1989).
Additional issues for the designers of ES, whether these are statistical, knowledge-based
or hybrid systems, fall into two main categories: performance and responsibility.
Performance issues.
1. The decision procedure used by the ES must perform well (make or recommend
   good decisions), even in the face of degraded data. Robustness entails being able to
   assess the reliability of information sources and to seek alternatives where necessary,
   as well as merely cope with uncertainty.
2. Few practical situations involve just one class of decision (e.g. diagnosis); ES theory
   must surely address the problem of deciding what decision is required.
3. Many practical automata must face rapidly changing situations, not only in the infor-
   mation available but also in the problem that needs to be solved. Expert systems must
   incorporate capabilities for altering their decision goals as circumstances develop.
The central requirements for meeting these demands include the ability of an ES to be
rationally flexible, i.e.:
1. Recognize that a decision is needed.
2. Identify the kind of decision it is.
3. Establish a strategy for making it.
4. Formulate the decision options.
5. Revise any or all of the above in the light of new information.
An expert decision system should be capable of autonomously invoking and scheduling
these processes as circumstances demand. Classical ES theory offers little guidance for
developing the necessary techniques.

Responsibility issues.
One must also achieve a high level of communication between human supervisors and/or
auditors wishing to examine, and potentially to intervene in, any aspect of the automatic
expert decision process.
1. If the automatic expert decisions lead to errors, it must be possible to establish the
   reasons for those errors.
2. Where it is practical and appropriate, provision should be made for a skilled supervi-
   sor to exercise overriding control.
In general an automatic expert decision maker needs to be able to reflect on the decision
procedure, to be able to examine the:
3. Decision options (what choices exist).
4. Data (the information available that is potentially relevant to a choice).
5. Assumptions (about viability of options, reliability of data etc.).
6. Conclusions (in light of data and knowledge of the setting).
Reflective capabilities should extend to the decision process itself, including:
7. The goals of the decision (what is the decision supposed to achieve).
8. The methods being pursued (what justifies the current strategy).
9. Characteristics of specific procedures (applicability conditions, reliability, complete-
   ness etc.).
A theory of expert rational decision making must acknowledge these requirements.
Classical expert decision procedures may be optimal in the sense that they promise to
maximize the expected benefits to the decision maker, but they must be viewed as unsat-
isfactory in other ways. It seems that connectionist models are drawing increasing inter-
est as useful tools for expert decision making which can accomplish the above require-
ments. These methods allow an automatic, clever combination of different types of knowl-
edge bases for a specific application, thereby reducing the effort required for an equiva-
lent expert system development. The connectionist models also resemble the fine-grained
parallelism of conventional symbol processing. The theory and practice of their
application to robust real-time fault diagnosis are given in Chapter 5.
Expert systems can be written in conventional languages like C, FORTRAN or Pascal,
which have the advantage of being widely known and of being the language in which
other programs are written, so that no integration problems occur. However, they are
poorly adapted to the expression and the handling of knowledge expressed by words.
Languages specific to artificial intelligence, like Lisp or Prolog, do not have this disad-
vantage, but they are not yet widely used, and their standards are not well established.
All these languages have the common drawback that they lack programming aids. In
contrast, expert system building tools, especially shells, have the advantage that the
knowledge representation, the inference engine, and many other facilities, such as re-
ports, windows, menus and forms, are largely preprogrammed, facilitating the accom-
plishment of the above requirements. In the early days, these tools were written in special
languages, but nowadays conventional languages are used more and more often.

References

Adelman L. (1989). Measurement issues in knowledge engineering. IEEE Transactions
on Systems, Man and Cybernetics, 19, 3, p. 483.
Ahlqvist I. (1990). An expert system based diagnosis system for ship machinery.
Proceedings, 5th International Conference on Marine Technology - Athens 90, May
1990, Athens, Greece.
Al-Jaar R. V. and A.A. Desrochers (1990). Petri nets in automation and manufacturing.
In G. Saridis (Ed.), Advances in Automation and Robotics, 2, JAI Press Inc., CN, p.
153.
Arreguy D. et al. (1990). Monitoring machine tools. IEEE Ind. Electr. Conf.,
November 1990.
Bandekar V. R. (1989). Causal models for diagnostic reasoning. Artificial Intelligence
in Engineering, 4, 2, p. 79.
Boden M. (1989). Benefits and risks of knowledge-based systems. Report of Council
for Science and Society, Oxford University Press, Oxford, U.K.
Boose J. H. (1986). Expertise transfer for Expert Systems Design. Elsevier, New York.
Boose J. H. and J.M. Bradshaw (1988). Expertise transfer and complex problems: Using
AQUINAS as a knowledge acquisition workbench for knowledge-based systems. Int. J.
Man-Machine Studies, 2, p. 39.
Boose J. H. and B.R. Gaines (1988). Knowledge acquisition tools for expert systems. In
B. R. Gaines and J. H. Boose (Eds), Knowledge Acquisition Tools for Expert Systems,
Vol. 2, Academic Press, London.
Brailsford J. et al. (1990). FAUST: a 132 kV fault diagnosis system. The Electricity
Council Research Center, ECRC/M2460, Capenhurst, Chester, CH1 6ES, U.K.
Brule J. F. and A. Blount (1989). Knowledge acquisition. McGraw-Hill, N.Y.
Carriero N. and D. Gelernter (1989). How to write parallel programs: a guide to the
perplexed. ACM Computing Surveys, 21, 3, p. 323.
Chen J. G. and K. Ishiko (1990). Automobile air-conditioner compressor troubleshoot-
ing - an expert system approach. Computers in Industry, 13, p. 337.
Computers in Industry (1986). Special Issue on Computer Aided Testing and Diagnosis,
7, p. 3-82.

Contini S. (1987). SALP-PC. A fault-tree analysis package on PC. Technical Note I.
87.165 - PER 1427/87, JRC-Ispra Establishment, Commission of European
Communities.
Doel D. L. (1990). The role for Expert Systems in commercial gas turbine engine
monitoring. Proceedings, Gas Turbine and Aeroengine Congress, June 11-14, 1990,
Brussels, Belgium.
Dolins S. B. and J.D. Reese (1992). A curve interpretation and diagnostic technique for
industrial processes. IEEE Transactions on Industry Applications, 28, 1, p. 261.
Dounias G., Vassilakis P. and V. Moustakis (1993). A model of fault diagnosis using
inductive learning techniques. Proceedings, IEEE "Athens Power Tech '93", Sept. 5-8,
Athens, Greece.
Fink K. P. and J. Lusth (1987). Expert Systems and diagnostic expertise in the mechani-
cal and electrical domains. IEEE Transactions on Systems, Man and Cybernetics, 17, 3,
p. 340.
Forbus K. D. (1987). Interpreting observations of physical systems. IEEE Transactions
on Systems, Man and Cybernetics, 17, 3, p. 350.
Forsythe D. E. and B.G. Buchanan (1989). Knowledge acquisition for Expert Systems:
some pitfalls and suggestions. IEEE Transactions on Systems, Man and Cybernetics,
19, 3, p. 435.
Frank P. M. (1990). Fault diagnosis in dynamic systems using analytical and knowledge-
based redundancy - a survey and some new results. Automatica, 26, 3, p. 459.
Freiling M. J. et al. (1986). The ontological structure of a troubleshooting system for
electronic instruments. In D. Sriram and R. Adey (Eds.), Proceedings, 1st International
Conference on Applications of AI in Engineering Problems, Springer-Verlag, Berlin, p.
609.
Freyermuth B. (1991). Knowledge based incipient fault diagnosis of industrial robots.
Proceedings, IFAC/IMACS Symposium "SAFEPROCESS '91", September 10-13,
Baden-Baden, Germany.
Gertler J. J. and K.C. Anderson (1992). An evidential reasoning extension to
quantitative model-based failure diagnosis. IEEE Transactions on Systems, Man and
Cybernetics, 22, 2, p. 275.
Gray N. A. B. (1990). Capturing knowledge through top-down induction of decision
trees. IEEE Expert, June 1990, p. 41.
Grogono P. et al. (1991). Expert System evaluation techniques: a selected bibliography.
Expert Systems, 8, 4, p. 227.

Gruber T. R. and P.R. Cohen (1987). Design for acquisition: Principles of knowledge
system design to facilitate knowledge acquisition. International Journal of Man-
Machine Studies, 26, p. 143.
Gupta A., Forgy C. and A. Newell (1989). High-speed implementations of rule-based
systems. ACM Transactions on Computer Systems, 7, 2, p. 119.
Hamilton T. P. (1988). HELIX: a helicopter diagnostic system based on qualitative
physics. Artificial Intelligence in Engineering, 3, 3, p. 141.
Hickman F. R. et al. (1989). Analysis for knowledge-based systems. A practical guide
to the KADS methodology. Ellis Horwood Ltd., West Sussex, England.
Hudlicka E. and V. Lesser (1987). Modelling and diagnosing problem-solving system
behavior. IEEE Transactions on Systems, Man and Cybernetics, 17, 3, p. 407.
Isermann R. and B. Freyermuth (1991). Process fault diagnosis based on process model
knowledge - Part I and Part II. ASME Journal of Dynamic Systems, Measurement and
Control, 113, p. 621.
Johannsen G. and L. Alty (1991). Knowledge engineering for industrial expert systems.
Automatica, 27, 1, p. 97.
Johnson L. E. and N.E. Johnson (1987). Knowledge elicitation involving teachback in-
terviewing. In A. Kidd (Ed.), Knowledge Acquisition for Expert Systems: a practical
Handbook, Plenum Press, New York, p. 91.
Kaiser G. E. et al. (1988). Database support for knowledge-based engineering
environments. IEEE Expert, Summer 1988, p. 18.
de Kleer J. and B.C. Williams (1987). Diagnosing multiple faults. Artificial
Intelligence, 32, p. 97.
de Kleer J. (1990). Using crude probability estimates to guide diagnosis. Artificial
Intelligence, 45, p. 381.
Klein G. A., Calderwood R. and D. Mac Gregor (1989). Critical decision method for
eliciting knowledge. IEEE Transactions on Systems, Man and Cybernetics, 19, p. 462.
Kuan K. K. and K. Warwick (1992). Real-time expert system for fault location on high-
voltage underground distribution cables. IEE Proceedings-C, 139, 3, p. 235.
Lee W. Y., Alexander S. M. and J.H. Graham (1990). A hybrid approach to a generic
diagnosis model. Proceedings, Fourth International Conference on Expert Systems in
Production and Operations Management, May 14-16, Hilton Head Island, SC, p. 264.
Luger G. F. and W.A. Stubblefield (1989). Artificial Intelligence and the design of
Expert Systems. Benjamin/Cummings, N.Y.

MacDonald B. A. and I.H. Witten (1989). A framework for knowledge acquisition
through techniques of concept learning. IEEE Transactions on Systems, Man and
Cybernetics, 19, p. 499.
Majstorovic V. D. (1990). Expert Systems for diagnosis and maintenance: the state of
the art. Computers in Industry, 15, p. 43.
Marsh C. A. (1988). The ISA expert system: a prototype system for failure diagnosis on
the space station. Proceedings, 1st International Conference on Industrial and
Engineering Applications of AI and ES '88, Vol. 1, June 1-3, p. 60.
Maruyama Y. and R. Takahashi (1985). Application of fuzzy reasoning to failure diag-
nosis. Journal of Atomic Energy Society of Japan, 27, p. 851.
Maßberg W. and H.J. Seifert (1991). Petri net based system for monitoring, diagnosis
and therapy of failures in complex manufacturing systems. Proceedings, IFAC Fault
Detection, Supervision and Safety for Technical Processes SAFEPROCESS '91,
September 10-13, Baden-Baden, Germany, 1991.
Milne R. (1987). Strategies for diagnosis. IEEE Transactions on Systems, Man and
Cybernetics, 17, 3, p. 499.
Monostori L. et al. (1990). Concept of a knowledge based diagnostic system for ma-
chine tools and manufacturing cells. Computers in Industry, 15, p. 95.
Mussi S. and R. Morpurgo (1990). Acquiring and representing strategic knowledge in
the diagnosis domain. Expert Systems, 7, 3, p. 157.
Narayanan N. H. and N. Viswanadham (1987). A methodology for knowledge acquisi-
tion and reasoning in failure analysis of systems. IEEE Transactions on Systems, Man
and Cybernetics, 17, 2, p. 274.
Nawab H., Lesser V. and E. Milios (1987). Diagnosis using the formal theory of a sig-
nal-processing system. IEEE Transactions on Systems, Man and Cybernetics, 17, 3, p.
369.
Nebiacolombo G. et al. (1989). Metodi di formalizzazione della conoscenza basati sulle
Reti di Petri per la realizzazione di Sistemi Esperti di diagnosi. Proceedings, ANIPLA
'89, 33rd National Annual Conference on Automation, Nov. 21-23, 1989, Rome, Italy.
Neumann D. (1990). A knowledge based fault diagnosis system for the supervision of
periodically and intermittently working machine tools. Proceedings, IFAC 11th World
Congress, Tallinn, Estonia, 1990.
Ng T. S., Cung L. D. and J.F. Chicharo (1990). DESPLATE: an ES for abnormal shape
diagnosis in the plate mill. IEEE Transactions on Industry Applications, 26, 6, p. 1057.
Noore A., Cooley W. L. and R.S. Nutter (1992). Detecting and masking transient fail-
ures in computers used for coal mining operations. IEEE Transactions on Industry
Applications, 28, 1, p. 186.

Novak T. et al. (1989). Development of an Expert System for diagnosing component-
level failures in a shuttle car. IEEE Transactions on Industry Applications, 25, 4, p. 691.
Obreja I. (1990). Diagnosis of power plant faults using qualitative models and heuristic
rules. 1990 ACM 089791-372-8/90/007/0047.
O'Leary T. et al. (1990). Validating Expert Systems. IEEE Expert, June 1990, p. 51.
Pandelidis I. (1990). Generalized entropy and minimum system complexity. IEEE
Transactions on Systems, Man and Cybernetics, 20, 5, p. 1234.
Pappis C. P. and G.I. Adamopoulos (1992). A software routine to solve the generalized
inverse problem of fuzzy systems. Fuzzy Sets and Systems, 47, p. 319.
Passino K. M. and P.J. Antsaklis (1988). Fault detection and identification in an intelli-
gent restructurable controller. Journal of Intelligent and Robotic Systems, 1, p. 145.
Peng Y. and J.A. Reggia (1987). A probabilistic causal model for diagnostic problem
solving. Part II: Diagnostic strategy. IEEE Transactions on Systems, Man and
Cybernetics, 17, 3, p. 395.
Pomeroy B. D., Spang H. A. and M.B. Dausch (1990). Event-based architecture for di-
agnosis in control advisory systems. Artificial Intelligence in Engineering, 5, 4, p. 174.
Pouliezos A. and G.S. Stavrakakis (1987). Linear state estimation in the presence of
sudden system changes - An Expert System. Applied Modeling and Simulation of
Technological Systems, North-Holland, Amsterdam, p. 41.
Pouliezos A. and G.S. Stavrakakis (1989). Fast fault diagnosis for industrial processes
applied to the reliable operation of robotic systems. International Journal of Systems
Science, 20, 7, p. 1233.
Prasad P. R. and J.F. Davis (1993). A framework for knowledge-based diagnosis in
process operations. In P. J. Antsaklis and K. M. Passino, "An introduction to intelligent
and autonomous control", Kluwer Academic Press, Massachusetts.
Prock J. (1991). A new technique for fault detection using Petri nets. Automatica, 27,
2, p. 239.
Protopapas C. A. et al. (1990). An Expert System for fault repairing and maintenance of
electric machines. IEEE Transactions on Energy Conversion, 5, 1, p. 79.
Rasmussen J. (1984). Strategies for state identification and diagnosis in supervisory
control tasks, and design of computer-based support systems. In W. B. Rouse (Ed.),
Advances in Man-Machine Systems Research, Vol. 1, JAI Press, Greenwich, CN, pp.
139-193.
Rhodes P. C. and G.J. Karakoulas (1991). A probabilistic model-based method for diag-
nosis. Artificial Intelligence in Engineering, 6, 2, p. 86.

Roth E. M. and D.D. Woods (1989). Cognitive task analysis: an approach to knowledge
acquisition for intelligent system design. In G. Guida and C. Tasso (Eds), Topics in
Expert System Design, North Holland, Amsterdam.
Rouse W. B., Hammer J. M. and C.M. Lewis (1989). On capturing human skills and
knowledge: algorithmic approaches to model identification. IEEE Transactions on
Systems, Man and Cybernetics, 19, p. 558.
Sahraoui A. E. K. et al. (1987). Combining Petri nets and AI techniques for monitoring.
Proceedings, IEEE Conference on Robotics and Automation, Raleigh, U.S.A., April
1987.
SIGART-Newsletter (April 1989). Knowledge Acquisition Special Issue, 108.
Soumelidis A. and A. Edelmayer (1991). Modeling of complex systems for control and
fault diagnostics: a knowledge based approach. In S. Tzafestas (Ed.), Engineering
Systems with Intelligence, Kluwer Academic Publ., The Netherlands, p. 147.
Stavrakakis G. S. and E.N. Dialynas (1991). Efficient computer-based scheme for im-
proving the reliability performance of power substations. International Journal of
Systems Science, 22, 9, p. 1527.
Takahashi R. and Y. Maruyama (1987). Practical expression of exception to diagnosis.
Bull. Research Laboratory for Nuclear Reactors, Tokyo Institute of Technology, 12, p.
50.
Tesch D. B. et al. (1990). A knowledge-based alarm processor for an energy manage-
ment system. IEEE Transactions on Power Systems, 5, 1, p. 268.
Torasso P. and L. Console (1981). Diagnostic problem solving: Combining heuristic,
approximate and causal reasoning. Van Nostrand Reinhold, N.Y.
Trave-Massuyes L., Missier A. and N. Piera (1990). Qualitative models for automatic
control process supervision. IFAC 11th World Congress, Tallinn, Estonia, p. 211.
Tsoukalas L. H., Upadhyaya B. R. and N.T. Clapp (1991). Hypertext-based in-
tegration for nuclear plant maintenance and operation. Proceedings, AI91 Conference,
September 15-18, 1991, Jackson Lake, Wyoming, U.S.A.
Tzafestas S. G. (1987). A look at the knowledge-based approach to system fault diag-
nosis and supervisory control. In S. Tzafestas et al. (Eds), System fault diagnosis, reli-
ability and related knowledge-based approaches, D. Reidel Publishing Company.
Tzafestas S. G. (1989). Knowledge engineering approach to system modeling, diagno-
sis, supervision and control. Syst. Anal. Model. Simul., 6, 1, p. 3.
Tzafestas S. G. (1991). Second generation diagnostic Expert Systems: requirements,
architectures and prospects. In R. Isermann and B. Freyermuth (Eds.), Fault detection,
supervision and safety for technical processes, IFAC Symposia Series, No 6, 1992.

Tzafestas S. G. and N.I. Konstantinidis (1992). ENGEXP - an integrated environment
for the development and application of expert systems in equipment and engine fault
diagnosis and repair. Advances in Engineering Software, 14, p. 3.
van Soest D. C., Bakker R. R. and N.J.L. Mars (1990). Model-based diagnosis of large
systems. The Second AAAI Workshop on Model Based Reasoning, Boston,
Massachusetts, July 30, 1990.
Wehenkel L. et al. (1989). Inductive inference applied to on-line transient stability as-
sessment of electric power systems. Automatica, 25, 3, p. 445.
Wiele H., Liermann G. and H. Raschke (1988). Monitoring, diagnosis and operator
leading systems for flexible automated manufacturing. Computers in Industry, 10, p. 61.
Yoon W. C. and J.M. Hammer (1988). Deep-reasoning fault diagnosis: an aid and a
model. IEEE Transactions on Systems, Man and Cybernetics, 18, 4, p. 659.

Appendix 4.A A generic hybrid reasoning expert diagnosis model

The diagnosis model developed here consists of a single Deep Knowledge Base (DKB)
and a number of Shallow Knowledge Bases (SKBs) attached to each of the terminal
nodes in the DKB. The DKB is constructed by viewing the whole system under consid-
eration as a hierarchical system. The SKB is also organized hierarchically, based on the
level of decision complexity. The reasoning process employed is a D-S type of hybrid
reasoning, as shown in fig. 4.24.

Figure 4.24 D-S (deep-shallow) type of expert hybrid reasoning.

Deep knowledge base. A system is represented by functional blocks within a deep KB
in the form of a tree (i.e. hierarchy), as shown in fig. 4.25. Even though the entire figure
resembles a multiechelon system, each level is called a stratum and not an echelon, be-
cause the figure represents levels of (functional) description or abstraction of the system.
In the deep KB, a "functional block" is referred to as a "node". No test is associated with
any node in the deep KB and, in turn, "no test" implies "no decision-making". For this
reason, every node is connected downwards by a one-way arrow. Except for F00, which
represents the given system as a whole, it is assumed that functional blocks are inde-
pendent of each other within each stratum. Every terminal node has its own shallow KB.
With the exception of F00, a functional block Fij is defined as follows:
• i indicates the stratum number, i = 1, 2, ..., v
• j indicates the "overall" element number, j = 1, 2, ..., n
where v and n are arbitrary integers. This definition of a functional block is shown in fig.
4.25.
Let fij denote the failure probability of a node Fij. An fij value is obtained from a historical
failure data set from the system under consideration. However, one can convert fij to qij,
termed a responsibility probability, that is, a probability that Fij will be responsible for
causing the symptom. Since the fij value represents an absolute probability from the his-
torical data, the summation of failure probabilities of all the children nodes of any node
may not be unity. Thus, for convenience in probability updating (explained later in this
chapter), this sum is forced to unity by normalization of fij into qij:

Σ qij = 1 for i = 1, 2, ..., v and j = 1, 2, ..., n
j∈K

where K is the set of children nodes which have the same parent node.

Figure 4.25 Functional hierarchy as deep knowledge base.

Shallow knowledge base. A shallow KB, attached to each of the terminal nodes in the
deep KB, is organized in multiechelon form, because every rule involves its own decision-
making. This decides whether or not the rule is responsible for the observed symptom
after testing the consequent of the rule. For this reason, every rule is connected by a two-
way arrow. Each level is called an echelon, as shown in fig. 4.26. Unlike Fij, a rule Rijk is
defined as follows:
• i indicates echelon i; i = 1, 2, ..., e.
• j indicates group j; j = 1, 2, ..., s.
where s is the total number of elements in echelon i-1 and k indicates element k; k = 1, 2,
..., t; e, s, and t being arbitrary integers.
In the shallow KB, two attributes are associated with every rule Rijk at and below eche-
lon 3 (i.e. Rijk where i ≥ 3, and j and k are arbitrary). One is pijk, defined as a degree of
belief, e.g. obtained from the opinions of experts, which acts as a probability that Rijk is
believed to be responsible for the observed symptom. Furthermore, similar to that of qij
in the deep KB, pijk is designed to have the following property:

 t
 Σ pijk = 1
k=1

where i ≥ 3 and j is arbitrary.

Figure 4.26 Rule hierarchy as shallow knowledge base.

In other words, at and below echelon 3, the sum of the degrees of belief of all the ele-
ments within any arbitrary jth group is unity. The other attribute is cijk, defined as a test
cost (in dollars), which accounts for the cost of testing the consequent part of Rijk.

Diagnostic strategy. The diagnostic strategy is an extension of the strategy described
by Lee et al. (1990). The strategy requires the failure probabilities of functional blocks
for the deep KB, and the degrees of belief and the associated test costs of rules for the shal-
low KB. Given these requirements, this model identifies and isolates a fault by searching
the deep KB first and then the shallow KB. Backtracking is allowed if needed. The out-
line of the diagnostic strategy is illustrated in fig. 4.27. Given a single symptom, the basic
cycle of the strategy is to find a single fault for this symptom. The basic cycle
consists of 8 steps, which are described below.
A single symptom means that a symptom is observed only once, and it is determined
which component(s) is (are) faulty. Multiple faults for the observed symptom can be
handled by testing the possible faulty components one by one, sequentially. That is,
multiple faults are found by continuing the current diagnostic session even if one fault
has already been found. Multiple symptoms can also be treated similarly. However, it
will be much better if we can classify these multiple symptoms based on functional
levels. The reason is that the further we go down in the functional hierarchy, the more we
reduce the search time required to find the fault.

Figure 4.27 Schematic diagram for diagnostic strategy.



The basic cycle of 8 steps embedded in fig. 4.27 is as follows (a code skeleton of this
cycle is sketched after the list):
Step 1. Construct a functional hierarchy of the given system.
Step 2. When a symptom is observed, go to the deep KB and start at the root F00.
Step 3. Conduct a breadth-first search using the responsibility probability based on the
        failure probability of each of the functional blocks at the immediate lower stra-
        tum. Select the most unreliable branch.
Step 4. Repeat Step 3 with the functional block selected until the terminal node is
        reached. Then go to the shallow KB attached to the functional block finally
        found.
Step 5. Use entropy to determine the rule yielding the lowest entropy. Repeat the en-
        tropy calculations until the terminal node is reached.
Step 6. Test the "consequent" part of the rule (i.e. the terminal node) found in the previ-
        ous step. If the consequent is malfunctioning, then stop and assume it is faulty.
        If it is functioning, then continue the search.
Step 7. Back up to the immediate higher echelon and select the rule having the next low-
        est entropy. Repeat Step 6.
Step 8. If the right rule is not found in the shallow KB, then go back to the terminal
        node of the deep KB and update the responsibility probabilities. Starting at
        the root F00, conduct the same breadth-first search until the terminal node is
        reached. Then go to Step 5.
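A compact skeleton of this cycle may make the control flow clearer. It is purely
illustrative: the node attributes and the three helper functions are hypothetical names
standing in for the mechanisms detailed in the remainder of this appendix, not an actual
implementation by Lee et al.

# Illustrative skeleton of the 8-step basic cycle. The three helpers are
# stand-ins for the entropy-guided ordering of rules, the physical test of
# a rule's consequent, and the responsibility-probability update of Step 8.
def basic_cycle(root, entropy_ordered, test_consequent, update_responsibilities):
    while True:
        node = root                                        # Step 2: start at F00
        while node.children:                               # Steps 3-4: descend along
            node = max(node.children, key=lambda n: n.q)   # the most suspect branch
        for rule in entropy_ordered(node.shallow_kb):      # Steps 5-7: test rules in
            if test_consequent(rule):                      # increasing-entropy order
                return rule                                # fault isolated
        update_responsibilities(node)                      # Step 8, then search again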
In the following, the breadth-first search, the concept of entropy, and the probability
updating method are described. Details for these steps can be found in Lee et al. (1990).
Breadth-first search. In Step 3, since a functional block which fails most frequently is
to be found, a breadth-first search is performed. Obviously a more reliable branch and its
own sub-branches do not have to be explored. The breadth-first search here, however, is
somewhat restricted: it looks at the nodes from "only one" parent at any level, while in a
usual breadth-first search all the nodes from "all" parents at any level are evaluated.
Hence, this search enables us to reach the terminal nodes faster than the usual
breadth-first search.
The search procedure presented here resembles the depth-first search, but it is different
in that it does not go back up to the nearest ancestor node with unexplored children
during the backtracking process; it goes to the root.
Entropy. The concept of entropy, employed in Step 5, has been used in selecting the
best next measurement in electronic circuit diagnosis, in measuring the routing flexibility
in flexible manufacturing systems, and in measuring manufacturing flexibility (de Kleer,
1987 and 1990; Pandelidis, 1990).
Shannon's entropy, by definition, does not incorporate any attributes other than the
estimated probabilities. In order to accommodate other attributes, such as test cost,
Shannon's entropy must be modified. This modified form of entropy is called "useful"
information, on which the entropic measure applied here is based. Given the degrees of
belief and the test costs, the entropy of a rule Rijk is defined as follows:

                     t
Hijk = Hijk(w,p) = - Σ wi+1,k,x pi+1,k,x ln pi+1,k,x,   i ≥ 2
                    x=1

where,
 t
 Σ pi+1,k,x = 1.
x=1

Here j just indicates the group number from the immediate higher echelon and x denotes
the element k from echelon i+1. Moreover, note that the two units are non-commensu-
rate, i.e. cijk is in dollars and pijk is non-dimensional. For this reason, cijk is normalized
as follows:

wijk = cijk / max{cijk}

the maximum being taken over all i, j and k, where wijk represents the weight of a rule Rijk.


When wijk = 1 for all i, j, and k (i.e. all test costs are equal), the above entropy equation
reduces to Shannon's entropy. Notice that all the distributed or assigned beliefs within a
group sum to unity.
In short, the rule which has the lowest entropy must be determined. Referring to the
shallow KB in fig. 4.26, start by making a decision at R111 to select the rule to be tested
next. What is needed at this point are the entropies of the rules at echelon 2. This, in turn,
means that the probabilities (degrees of belief) and the test costs at echelon 3 must be
known. This procedure is repeated until the terminal rule is reached. A tie occurs at the
terminal rules, since the entropies for these rules are zero. The ratio, weight/probability,

rijk = wijk / pijk

is computed in order to break the tie. So, the tie is broken by selecting the lowest ratio,
which gives the lowest test cost. A tie at other rules can be broken "arbitrarily", since
these rules do not directly involve the testing (i.e. the entropy of each of these rules is
not zero). The overall feature of the search in the shallow KB is similar to the best-first
search, because the search is performed in increasing order of entropy.
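As an illustration of these formulas (not the authors' code), the short Python sketch
below normalizes the test costs into weights, computes the modified entropy of a rule
from the beliefs and weights of its children, and breaks ties among terminal rules with
the ratio rijk = wijk/pijk.

import math

def weights(costs):
    """wijk = cijk / max{cijk}: normalize test costs to dimensionless weights."""
    cmax = max(costs.values())
    return {key: c / cmax for key, c in costs.items()}

def useful_entropy(children):
    """H = -sum w*p*ln(p) over the (p, w) pairs of a rule's children.
    With all w = 1 this reduces to Shannon's entropy; it is 0 for a
    terminal rule, which has no children."""
    return -sum(w * p * math.log(p) for p, w in children if p > 0)

def next_rule(candidates):
    """Pick the candidate with the lowest entropy; ties between terminal
    rules (H = 0) are broken by the lowest ratio r = w / p, i.e. the
    cheapest test relative to its degree of belief."""
    return min(candidates,
               key=lambda r: (useful_entropy(r["children"]), r["w"] / r["p"]))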
In Step 8, if after going through the shallow KB a fault cannot be found, one goes back
to the deep KB and updates qij. The next subsection describes an updating heuristic (Lee
et al., 1990).
Heuristic for updating probabilities. Since the search path in the shallow KB did not
yield any faults, the responsibility probability of the terminal node (called Fbc) which
owns this shallow KB changes from the current value to zero. This enables one to update
the probabilities in the deep KB.
In the light of the test results, the updated responsibility probability of Fbc, q'bc, becomes
0. The ripple effect of this updating spreads out to the rest of the elements within K, that
is, the set whose elements have the same parent node. This K can be interpreted as a group.
Then, in this fashion, the updating is propagated upwards through the parent nodes until
the root is reached. In the following this updating heuristic is described.
To apply the updating heuristic, the number of elements of each of the nodes (i.e., how
many children each node has) in the deep KB must be known. So let,
N00 = Number of elements that F00 owns
Nij = Number of elements that Fij owns, where i = 1, 2, ..., and j = 1, 2, ..., n.
This data can be obtained when the hierarchy of the deep KB is constructed.
Finding the parent node: First, the last element (called Fb-1,g) at Stratum b-1 is
searched for, by summing Nb-2,j over all j. That is,

g = Σj Nb-2,j   if 2 < b ≤ v
g = N00         if b = 2

This g signifies the total number of elements at Stratum b-1 and is used in determining
the group number for Fbc at Stratum b (i.e. the overall element number for the parent
node at Stratum b-1).
Next, compare c (from Fbc) with the group-wise cumulative number of elements to de-
termine the second subscript (i.e. the element number) of the parent node at Stratum b-1.
The second subscript will be the element number of the node whose cumulative number
of elements is equal to or greater than c. Note here that the first subscript (i.e. the stra-
tum number) is automatically known, because the parent node is in the immediate higher
stratum (i.e. just one stratum higher; b-1) than the current stratum (i.e., b). The second
subscript is denoted by GN(b-1). Hence the parent node sought is Fb-1,GN(b-1).
Finding the element number at the current stratum: Let EN(b) be the element num-
ber of Fbc within the group GN(b-1). EN(b) is calculated by subtracting from c the cumu-
lative number of elements in the groups just before GN(b-1) [i.e. up to GN(b-1)-1].
Updating the probabilities at Stratum b: In the following, the symbol ' is used to rep-
resent the updated value.

q'bc = 0,   q'bk = qbk / (1 - qbc),   k ≠ c, where k = [c - EN(b)] + t

for t = 1, 2, ..., Nb-1,GN(b-1) and t ≠ EN(b). Here, q'bk is proportionally increased, de-
pending on its original weight (i.e. qbk), and thus normalized.
Updating at Stratum b-1: The parent node of Fbc will be less responsible for the ob-
served symptom, since one of its children turned out to be innocent. Thus the heuristic
equation that represents this amount of reduction in responsibility (probability) is given
below:
q'b-1,GN(b-1) = qb-1,GN(b-1) - qb-1,GN(b-1) x (qbc - q'bc).

Apply the methods described above to determine the parent node and the element num-
ber at Stratum b-1. Let this parent node be Fb-2,GN(b-2) and this element number be
EN(b-1).
Since one of the elements is innocent, the rest of the elements at Stratum b-1 will be more
suspect. That is, the amount of reduction used in updating the parent node of Fbc will be
proportionally distributed to the rest of the elements at Stratum b-1 with a positive
sign. Thus the following equation is given:

q'b-1,k = qb-1,k + {qb-1,k / (1 - qb-1,GN(b-1))} x qb-1,GN(b-1) x (qbc - q'bc),  k ≠ GN(b-1)

where k = [GN(b-1) - EN(b-1)] + t for t = 1, 2, ..., Nb-2,GN(b-2) and t ≠ EN(b-1).
Updating at Stratum u (b-2 ≥ u ≥ 1): Similarly to the methods above, one has,

q'u,GN(u) = qu,GN(u) - qu,GN(u) x (qu+1,GN(u+1) - q'u+1,GN(u+1))

q'u,k = qu,k + {qu,k / (1 - qu,GN(u))} x qu,GN(u) x (qu+1,GN(u+1) - q'u+1,GN(u+1)),  k ≠ GN(u)

where k = [GN(u) - EN(u)] + t for t = 1, 2, ..., Nu-1,GN(u-1) and t ≠ EN(u).


The updating ends at Stratum 1, because the responsibility probability is not involved in
the root itself (q00 was not defined). The root at Stratum 0 has only its number of
elements (i.e. N00).
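The whole heuristic can be condensed into a few lines of code. The sketch below is a
minimal illustrative reading of the above equations, assuming a node object carrying its
responsibility probability q, a parent link and a siblings() method (the children of the
same parent, the node itself excluded); it is not the authors' implementation. Applied to
the numerical example earlier in this chapter, it reproduces the values shown in fig. 4.23.

# Minimal sketch of the updating heuristic (illustrative node interface).
def update_after_no_fault(terminal):
    """Exonerate the terminal node Fbc (q'bc = 0), renormalize its
    siblings, and ripple the reduction up to Stratum 1."""
    delta = terminal.q                    # qbc - q'bc, with q'bc = 0
    scale = 1.0 - terminal.q
    terminal.q = 0.0
    for sib in terminal.siblings():       # q'bk = qbk / (1 - qbc)
        sib.q /= scale
    node = terminal.parent
    while node.parent is not None:        # stop after Stratum 1 (root has no q)
        reduction = node.q * delta        # q x (q_child - q'_child)
        for sib in node.siblings():       # redistribute the reduction to the
            sib.q += sib.q / (1.0 - node.q) * reduction   # other group members
        node.q -= reduction               # q' = q - reduction
        delta = reduction                 # amount passed to the next stratum up
        node = node.parent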
In this way, a generic diagnosis model is developed which takes advantage of a hybrid
approach, using the theory of hierarchical systems for both deep and shallow knowledge.
In case a deep KB needs to be updated, the updating heuristic is applied.
The requirements of the model are failure probabilities for deep knowledge, and degrees
of belief and test costs for shallow knowledge. The techniques employed are the breadth-
first search and the updating heuristic for deep knowledge, and the concept of entropy
for shallow knowledge. Also, backtracking is allowed if necessary, and the updating is
carried out if one shallow KB is not sufficient to locate a fault.

The developed diagnostic model is applicable to many domains. Once the hierarchies for
both a deep KB and its dependent shallow KBs are constructed, the diagnostic strategy
of fig. 4.27 can work.
The probability updating is simple and straightforward relative to other techniques, such
as Bayes' Rule and Dempster's Rule.
Multiple faults can be handled by multiple diagnosis sessions. In addition, it is desirable
to have a scheme to classify symptoms when multiple symptoms are observed.
The main difficulty is that it is not obvious how to delimit the level of abstraction in a
deep KB (i.e. how many strata are needed). The number of strata in the deep KB can
affect the size (in terms of the number of rules) of each shallow KB because, as the deep
KB is specified in more detail, the degree of specificity of the shallow KBs might de-
crease. This, in turn, implies a smaller size of the shallow KBs. There may be a trade-
off between the larger deep KB (the smaller shallow KBs) and the smaller deep KB (the
larger shallow KBs).
One of the critical factors with respect to the speed of diagnosis here is a mapping
scheme which relates a symptom to a certain node of the deep KB. This node will serve
as the start node in the entire diagnostic process. The lower the stratum of the start node,
the faster the speed of diagnosis. Unless a mapping scheme is provided, the diagnostic
process begins with the root node of the deep KB (see also Section 4.2.1.4).

Appendix 4.B Basic definitions of place/transition Petri nets and their use
for on-line process failure diagnosis

A Petri net (pt net) is a 6-tuple N = (S, T, F, K, W, M(0)) where,
(i) S is a finite set of places S = {s1, s2, ..., s|S|};
(ii) T is a finite set of transitions T = {t1, t2, ..., t|T|};
(iii) F is a binary relation F ⊆ (S×T) ∪ (T×S) which is represented by directed arcs
between the places and transitions;
(iv) K is a mapping, K: S → N ∪ {∞} (N denoting the natural numbers), which describes
the capacity of each place K(s);
(v) W is a mapping, W: F → N\{0}, which attaches a weight to each arc; and
(vi) M(0) is a mapping, M(0): S → N, which gives the initial marking of the places,
taking the capacity K(s) of each place into account.
A pt net is a bipartite graph consisting of 2 types of elements, a finite number |S| of
places [definition (i)] and a number |T| of transitions (ii). Definition (iii) states that these
places and transitions are coupled alternately by arcs, which are directed from a place to
a transition or vice versa. Pt nets are introduced to describe the transport of a quantity,

such as mass, information or a sort of goods. To remain in a system independent formu-


lation, tbis transport quantity will be called a token. Places, the passive elements of the
net, store a certain number (finite or infinite) of tokens according to their capacity K
[definition (iv)]. Under special conditions a transition transports a number oftokens from
the previous to the next place(s), the transition fires. The number of tokens transferred
depends on the transport capacity [the weight W, definition (v)] of the connecting arcs.
The transport process starts at time 0 with an initial distribution M(O) of the tokens per
place (vi).
The static structure ofthe pt net, defined in (i), (ii), (iii) and (v) can also be described in
an algebraic manner by the so-called ISI-row /TI-column incidence matrix N. The ele-
ment Nij of tbis matrix indicates if place si is reached by a weighted arc coming from
transition Ij (+sign) or is left by an arc wbich is directed to transition ~ (-). Assuming that
no arc simultaneously starts and ends at the same place (the no-Ioop condition (s, t)EF
=>(1, s)~F), Nij is defined:
        +W(tj, si), if (tj, si) ∈ F
Nij = { -W(si, tj), if (si, tj) ∈ F     (4.1)
        0,          otherwise
The dynamic behavior of the pt net is represented by the firing rule. A transition tj will
be able to fire if the following relation holds, where M(k) is a marking vector and k is a
discrete time point (M(k): S → N),
0 ≤ M(k-1) + tj ≤ K (4.2)
This means that firing will only be possible if the token content per place after firing
does not surpass its capacity. K is the capacity vector of dimension |S| and the transition
tj is the jth column vector of the incidence matrix N. After firing, the subsequent marking
M(k) arises,
M(k) = M(k-1) + tj (4.3)
With the help of equation (4.3), and beginning at time k = 0, it follows that:
M(k) = M(0) + Nv (4.4)
where the vector v describes the firing frequencies of each transition which lead from the
initial marking to the actual state. Of particular importance are special sets of places of
the pt net called S-invariants. They are integer solutions of the linear equation,

is^T N = 0 (4.5)

With the help of these S-invariants a new principle for fault detection in complex systems
can be formulated.

Supposing it is possible to map the structure of the total process as a pt net, the transport
of the physical conservation quantity is represented by the firing of tokens. If the conser-
vation quantity takes only a few discrete values and the signals measuring the number of
tokens are not noisy, the process monitoring is easy: using eq. (4.4) it can be tested at
each scanning time point if the actual marking vector M(k), beginning from the initial
marking M(0), is reachable. If M(k) is not reachable it can be concluded that an error has
occurred. The algorithmic evaluation of this failure detection criterion is simple.
Keeping in mind that the marking vector is integer valued, eq. (4.4) is a linear
Diophantine equation system. It is sufficient to test the existence condition of (4.4) at each
time step. Examples of systems governed by such well-defined noiseless physical quanti-
ties are industrial production systems or automatic shunting yards.
In the case of plant fault monitoring, the measurement signals are noisy and their domain
of definition is much larger than in the former case. Because of this, the simple evaluation
of eq. (4.4) must fail.
Multiplying (4.4) with the transpose of the S-invariant and taking (4.5) into account,
yields:
is^T M(k) = is^T M(0) (4.6)
For applications in power and other industrial plants, it is correct to assume that the net
token flow across the envelope surface of the total process under consideration vanishes
or is zero in the mean. Otherwise continuous plant operation is not possible. Moreover it
is assumed that the transitions fire without changing the number of tokens; in other
words, the sums of arc-weights in front of and after a transition should be equal:

 Σ W(s,t) =  Σ W(t,s)    (4.7)
s∈*t        s∈t*

with *t := {s∈S: (s, t)∈F}, t* := {s∈S: (t, s)∈F}. Eq. (4.7) is a conservation law for the
firing of tokens. Under both these conditions it is clear that an S-invariant exists which
does not contain any elements other than 1, because each column sum vanishes. Such an
S-invariant is called hereafter a covering S-invariant. Therefore equation (4.6) can be
rearranged as

 |S|        |S|
 Σ Mi(k) -  Σ Mi(0) = 0    (4.8)
 i=1        i=1

The second sum in eq. (4.8) must be calculated only once, at the initial time k = 0.
Taking the noisy nature of the measurement values into consideration, a new fault crite-
rion for continuous total processes can be formulated:

 |S|
 Σ [Mi(k) - Mi(0)] < ε    (4.9)
 i=1

Eq. (4.9) is well suited for on-line process monitoring. The actual number of tokens per
place Mi(k) is compared with the initial token content of the total process. This is possi-
ble because the continuous total process is naturally an initial boundary problem, in con-
trast to the partial processes of the analytical redundancy methods. Therefore each slowly
varying fault can be detected as soon as it surpasses the threshold ε. The height of ε de-
pends on the sensor noise and can easily be determined in an initial learning period.
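A minimal Python sketch of this monitor is given below (illustrative only; the incidence
matrix N, the capacity vector K and the threshold eps would come from the net model
and from the initial learning period respectively).

import numpy as np

def fire(M, N, j, K):
    """Fire transition j according to the firing rule (4.2)-(4.3):
    allowed only if the new marking stays within 0 and the capacities K."""
    M_new = M + N[:, j]
    if np.all(M_new >= 0) and np.all(M_new <= K):
        return M_new
    return M                      # transition not enabled; marking unchanged

def conservation_fault(M_k, M_0, eps):
    """Fault criterion (4.9) with a covering S-invariant (all ones):
    the total token count, measured with noise, must stay within eps
    of its initial value."""
    return abs(float(np.sum(M_k - M_0))) >= eps

At each scanning instant the measured marking M(k) is substituted into
conservation_fault; a violation flags a fault, whose possible causes are listed next.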
If eq. (4.9) does not hold, one of the following reasons must be true:
(i) A sensor fault has occurred, and one of the measured token numbers is erroneous;
(ii) Inside the total system a source or sink of tokens has arisen, which means the
structure of the pt net has changed; or
(iii) The net token flow across the envelope surface of the total process is no longer zero
mean, and the operation of the plant has become discontinuous.
Which exactly of these different faults has occurred cannot be recognized by using eq.
(4.9); that is, fault location is not possible. For this to be done, more knowledge about
the total process under consideration, in the form of quantitative or qualitative physical
models, is needed. Details of the rule-based techniques and the way they can be com-
bined with the present technique can be found in Section 4.2.1. It should be noted here
that the process description in terms of pt nets is analogous to the state space formula-
tion
M(k) = A M(k-1) + B u(k-1) (4.10)
With the assumption that the state M is totally measurable and that it only represents
physical quantities of the same kind (i.e. only masses or only temperatures), the state
transition matrix A is equal to the identity matrix. Eliminating all previous time points
k-1, k-2, ..., 1 in eq. (4.10) one gets,

                k-1
M(k) = M(0) + B  Σ u(j)    (4.11)
                j=0

This condition of observability (4.11) will be equivalent to the condition of reachability
(4.4) if the sum of the control vectors in (4.11) is set to a vector v and the input matrix B
is identified with the incidence matrix N. The analogy demonstrates that it is suitable to
apply the Petri net description to problems of process fault monitoring. Moreover, the
possibilities of the methods of Chapter 3 could be exploited through on-line monitoring
of the evolution of the vector M(k).

Appendix 4.C Analytical expression for exception using fuzzy logic and
its utilization for on-line exceptional events diagnosis

Definition of exception. Human thinking is characterized by the argument that the sym-
bol (→) always appears in the universal proposition while the symbol (∧) occurs in the
existential proposition. This can be represented formally by
∀x [P(x) → Q(x)] (4.12)
∃x [P(x) ∧ S(x)]  (S(x) ≠ Q(x)) (4.13)
where P(x), Q(x) and S(x) are appropriate dictative functions. These equations give the
interpretation that human thinking depends dominantly on the principle and simultane-
ously permits inconsistent remarks.
Conventional fuzzy diagnosis consists of two implications which take the form of eq.
(4.12), viz., "For all j there exists a failure xi that correlates with a symptom yj, if yj is
recognized" and "For all i there is a symptom yj that should be observed if xi ap-
pears". The sets of failures and symptoms X, Y are given in the form,
X = {x1, ..., xm}, Y = {y1, ..., yn} (4.14)
where xi, yj are elements of X, Y respectively. Here the diagnosis system is given by the
propositions,
Pj: B(yj) → ∃xi [A(xi) ∧ R(xi, yj)];  j = 1, ..., n (4.15)
Pi': A(xi) → ∃yj B(yj);  i = 1, ..., m; j = 1, ..., n (4.16)
where the function A(xi) means that the failure xi appears, and B(yj) gives that the
symptom yj is being recognized. R(xi, yj) indicates the correlation between xi and yj.
The exception is defined by the logical form of eq. (4.13). The aphorism "Exceptio pro-
bat regulam" means actually that the exception can serve as an examiner of the rule. This
emphasizes that the exception has a large capability of testing rules in diagnosis. This
aphorism motivated Maruyama and Takahashi (1985, 1987) to introduce the exception
into simply-structured diagnosis, while neglecting a hierarchy on rules and/or a classifica-
tion of the failures. How should the exception be represented and utilized to reinforce
the diagnosis?
Suppose a fully-experienced engineer who makes the comment "there can be special
logic for finding a failure, while it is identified generally by such implications as eq. (4.12)
or eqs. (4.15), (4.16)". When he is averse to using eq. (4.12), his logic might be trans-
formed in the following manner:
¬(∀x [P(x) → Q(x)]) = ∃x ¬(¬P(x) ∨ Q(x)) = ∃x [P(x) ∧ ¬Q(x)] (4.17)
where "¬" denotes negation.

Since he possessed a subconscious proposition "If ∃x[P(x) ∧ ¬Q(x)] then there would
exist an alternative proposition S(x) which satisfied ∃x[P(x) ∧ S(x)]", a negation of eq.
(4.12) can produce the relation,
∃x[P(x) ∧ ¬Q(x)] → ∃x[P(x) ∧ S(x)] (4.18)
where the proposition,
∃x[¬Q(x) → S(x)] (4.19)
should be assumed. This argument indicates that a negation of eq. (4.12) is transformed
to a form capable of serving as an exception, as in eq. (4.13).
This formal definition of the exception enables its practical expression for diagnosis
purposes by the same procedure as eqs. (4.17)-(4.19).
The negation of eq. (4.15) is given by,
¬Pj: ∃yj [B(yj) ∧ ¬∃xi (A(xi) ∧ R(xi, yj))];  i = 1, ..., m; j = 1, ..., n (4.20)
In order for eq. (4.20) to hold, the condition requires an alternative relation E(xi, yj)
consistent with the implication,
¬∃xk (A(xk) ∧ R(xk, yt)) → ∃xi (A(xi) ∧ E(xi, yj)) (4.21)
for k = 1, ..., m; t = 1, ..., n; i = 1, ..., M; j = 1, ..., N, as shown in eq. (4.19). The logical
expression of eq. (4.21) could then be read as "in the special case where an engineer
cannot identify the failure xk with A(xk) and R(xk, yt); 1 ≤ k ≤ m, 1 ≤ t ≤ n, there may
exist another failure xi that is correlated to E(xi, yj); 1 ≤ i ≤ M, 1 ≤ j ≤ N, M > m, N > n,
with the symptom yj". The range was assumed to be larger with E(xi, yj) (1 ≤ i ≤ M,
1 ≤ j ≤ N, M > m, N > n) than with R(xi, yj) (1 ≤ i ≤ m, 1 ≤ j ≤ n). The perfect sets of
failure and symptom might be in the range of (1 ≤ i ≤ M0, 1 ≤ j ≤ N0), as shown in fig.
4.28.
Connecting eq. (4.20) with eq. (4.21), the practical form of the exception ~Pj to Pj is
given by:
~Pj: ∃yj [B(yj) ∧ ∃xi (A(xi) ∧ E(xi, yj))];  i = 1, ..., M, j = 1, ..., N (4.22)

Figure 4.28 Sets of relation between failure and symptom.



For simplicity, derivation of eq. (4.22) has been discussed from the binary logic point of
view. Here, fuzzification of these equations shall be performed in accordance with fuzzy
set theory, since recognition depends on the subjective tasks of a human. Defining the
fuzzy sets on the spaces X, Y and X×Y:
f = (set of appearing failures)
s = (set of recognized symptoms) (4.23)
e = (set of relations (xi, yj))
one can obtain fuzzy propositions Ai, Bj and Eij from A(xi), B(yj) and E(xi, yj) in the
form,
Ai = (xi is f),  Bj = (yj is s),  Eij = ((xi, yj) is e) (4.24)
where the truth values of these propositions are represented by linguistic truth values of
Ai, Bj and Eij. The linguistic truth values are fuzzy sets defined on the truth value
space. Substituting eq. (4.24) into eq. (4.22), the fuzzification of eq. (4.22) is written in
the logical form,
~Pj: ∃j [Bj ∧ ∃i (Ai ∧ Eij)];  i = 1, ..., M, j = 1, ..., N (4.25)

Utilization of exception in diagnosis.
Cancellation law. Usually diagnosis terminates when the failures are identified using
the fuzzified form of eqs. (4.15), (4.16), which is capable of providing the two equations

Bjα = (∨i (ail ∧ rijl), 1];  j = 1, ..., n (4.26)

(4.27)

where ail, rijl and pijl stand for the lower bounds of the linguistic truth values Aiα, Rijα
and Pijα respectively, and bju is the upper bound of Bjα. However, only when an engineer
fails to find the failure by eqs. (4.26), (4.27), should the exceptional proposition of eq.
(4.25) be utilized.
An effective technique for inferring failure should be introduced to deal with exceptions
in the actual situation. The cancellation law will be applied as a tool, since eq. (4.25)
basically takes the form P ∧ Q. The cancellation law is written

P ∧ Q
-----    (4.28)
  Q

which is then read practically as "If P ∧ Q is true then Q must be true". Defining the
truth values of P, Q and P ∧ Q by P, Q and T respectively, eq. (4.28) is transformed into
the fuzzified expression,
T = P ∧ Q (4.29)

where the symbol α stands for an α-cut defined by the ranges,
Tα = (tl, tu),  Pα = (pl, pu),  Qα = (ql, qu) (4.30)
and the suffixes l and u stand for the lower and upper bound respectively. With the use
of eq. (4.30), eq. (4.29) is transformed into,
(tl, tu) = (pl ∧ ql, pu ∧ qu) (4.31)
The cancellation law in fuzzy logic serves as the axiom to determine both the lower and
upper bounds of Qα which satisfy eq. (4.31) at given Tα and Pα. Thus a solution of Qα
is described by,

(4.32)

where ∅ represents the empty set.


It may be meaningful for reinforcement of the present diagnosis to consider only the case
where the linguistic truth value of P ∧ Q is close to "true". Then, Tα is given by,
Tα = (t, 1] (4.33)
Connecting eq. (4.33) with eq. (4.32), the inferred upper and lower bounds of Qα are ob-
tained as,

     { 1,  pu = 1
qu = {                  (4.34)
     { ∅,  pu < 1

     { t,        pl > t, pu = 1
ql = { (t, 1],   pl = t, pu = 1     (4.35)
     { ∅,        otherwise
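These case distinctions translate directly into code. The sketch below is an illustrative
reading of eqs. (4.33)-(4.35) (the function and variable names are ours): given the
bounds (pl, pu) of Pα and the lower bound t of the "true" truth value Tα = (t, 1], it
returns the inferred bounds of Qα, with None standing for the empty set.

def inferred_q_bounds(p_l, p_u, t):
    """Bounds (q_l, q_u) of Q_alpha inferred from T_alpha = (t, 1] and
    P_alpha = (p_l, p_u); None encodes the empty set."""
    q_u = 1.0 if p_u == 1.0 else None        # eq. (4.34)
    if q_u is None or p_l < t:
        q_l = None                           # eq. (4.35), "otherwise" branch
    elif p_l > t:
        q_l = t                              # q_l pinned at t
    else:                                    # p_l == t: any value in (t, 1]
        q_l = (t, 1.0)
    return q_l, q_u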

Diagnosis by utilization of exception. Assuming that the exception of eq. (4.25) holds,
the truth value of "very true" is given by,
~Pjα = (hjl, 1] (4.36)
and the failure, symptom and fuzzy relation are written,
Aiα = (ail, aiu),  Bjα = (bjl, bju),  Eijα = (eijl, eiju) (4.37)
Then, substitution of eqs. (4.33)-(4.35) into eq. (4.25) generates the solution,

 m                  { 1,  bju = 1
 ∨ (ai ∧ eij)u =    {                       (4.38a)
i=1                 { ∅,  bju < 1;  j = 1, ..., N

(4.38b)

Equations (4.37) and (4.38) show that the truth value of the failure becomes meaningful
only when Bj is closer to "completely true" than ~Pj. Satisfaction of this condition allows
one to obtain Ai by solving the inverse problem of the form (Pappis and Adamopoulos,
1992),

 m
 ∨ (Ai ∧ Eij) = Bj;  j = 1, ..., N (4.39)
i=1

This can be illustrated conceptually in fig. 4.29. The failure is calculated from
∨(Ai ∧ Eij) at the given smooth line of "true" ~Pj and the broken line of "very true"
Bj, but there is no solution for the failure, since ∨(Ai ∧ Eij) = ∅ at the given
broken line of "more or less true" Bj.

Figure 4.29 Linguistic truth value of failure derived from exception.


CHAPTER 5

FAULT DIAGNOSIS USING ARTIFICIAL NEURAL NETWORKS (ANNs)

5.1 Introduction

The theory and practical applications of Artificial Neural Networks (ANNs) are expand-
ing at a very high rate, and the fields of application are increasing. It is not surprising,
therefore, that fault diagnosis is one of the main areas in which ANNs have been used
with promising results, along with similar progress in control and identification of non-
linear dynamical systems.
Fault diagnosis using neural networks has the same structure as model-based methods: a
set of signals carrying fault information is fed into a neural machine which outputs a
vector fault signal indicating normal or faulty system operation. Thus, it can be seen that
the main difference between the two approaches is in the diagnosis engine. The selection
of the input set (called the training set), neural machine and output signal/classification
method will be the central theme of this Chapter and will be examined in detail in subse-
quent sections.
ANN-based fault diagnosis is aimed at overcoming the shortcomings of model-based
techniques. These techniques require mathematical process models that represent the
real process satisfactorily. The model should not, however, be too complicated, because
calculations easily become very time-consuming. In methods relying on state-variable
estimation, state variables are seldom measurable, and so nonmeasurable state variables
have to be estimated. For estimation, a non-linear dynamic process model must be lin-
earized around an operating point. This approach requires a relatively exact knowledge
of the parameters of the linearized or linear model. In addition, the process must operate
near the point where linearization was done, because the model is valid only in the neigh-
borhood of the operating point.
In fault detection based on parameter estimation, the process model parameters have a
complicated relationship to physical process coefficients. Malfunctions usually affect the
physical coefficients and the effect can also be seen in the process parameters. Because
370 Real-time fault monitoring of industrial processes

not all physical process parameters are directly measurable, their changes are calculated
via estimated process model parameters. The relationsbip between the model parameters
and the physical coefficients should be unique and preferebly exactly known. This is
seldom the case.
The performance of model-based methods depends strongly on the usefulness of the
model. The model must include every situation under study. It must be able to handle
changes in the operation point. If the model fails, the whole diagnostic system fails. The
sensitivity to modeling errors has become the key problem in the application of model-based
methods. On the other hand, ANN methods do not require an analytical model of
the process, but they need representative training data. The idea is that the operation of the
process is classified according to measurement data. Formally this is a mapping from
measurement space into decision space. Thus, data play a very important role in this
method.
Model-based methods are usually computationally demanding, especially when nonlinear
models are used. On-line modeling of the process requires even more computational
expenditure. ANN methods are usually computationally easier, but the calculation task
depends very much on the data and the actual problem.
Model-based methods are also difficult to change afterwards. The build-up of a model-based
diagnostic system requires a lot of effort, and changing one equation easily leads to
changes in many other equations or parameters. ANN methods are more flexible in the
sense that changing the data amounts to changing the properties of the diagnostic
system. On the other hand, changing the data may mean repeating the diagnostic task all
over again.
ANN-based fault diagnosis can be seen as a Pattern Recognition (PR) problem.
Traditional pattern recognition and classification can be divided into three stages: meas-
urements, feature extraction, and classification. First the appropriate data is measured.
Then a feature vector is computed. The extraction should remove redundant information
from measurement data and create simple decision surfaces. Finally the feature vector is
classified into one or more classes. When fault detection and diagnosis are combined, the
classes are the following: normal operation, fault number 1, fault number 2, etc.
Traditionally, pattern recognition concentrates on finding the classification of features.
The problem is how to calculate the features. There is no a priori basis for the choice of
the calculation. It is difficult to know which of the features are essential and which are
irrelevant. Inappropriate choices lead to the need for complex decision rules, whereas
good choices result in simple rules (Himmelblau, 1978; Pao, 1989).
A human being has an amazing skill at recognizing patterns. A human often uses very
complex logic in recognizing patterns and in classifying them. A human can pick up
some examples of the classes but cannot determine any formal law for the classification.
Mathematically this can be seen as an opaque mapping from pattern space into class-membership
space. In computer pattern recognition, the opaque mapping has to be
replaced by an explicitly described procedure - a transparent mapping (Pao, 1989).

Like human beings, neural networks are also trained with a group of examples. When a
classification is realized with neural networks, the whole mapping from measurement
space into decision space is done at the same time and the mapping is learned from training
examples.
It should be evident from the introductory comments that ANN-based fault diagnosis
methods were first applied to complex, non-linear processes where previous attempts
using conventional methods had failed. Chemical processes are such an example:
Hoskins and Himmelblau (1988) illustrated an artificial neural network approach to the
fault diagnosis of a simple example process composed of three continuous-stirred-tank
reactors in series. Watanabe et al. (1989) presented a network architecture to estimate
the degree of failures. They used an example system of three measurements and five
faults. Venkatasubramanian and Chan (1989) used a binary-input network to diagnose
faults of a fluidized catalytic cracking process. They also compared the neural network
approach with a knowledge-based approach. Sorsa et al. (1993) applied radial basis
networks to detecting deactivation of catalysts in jacketed reactors.
In the Computer Integrated Manufacturing (CIM) field, relevant work has been reported
by Barschdorff et al. (1991), who applied back propagation and condensed nearest
neighbour techniques to wear estimation and state classification of cutting tools. Miguel
et al. (1993) investigated the applicability of an Adaptive Resonance Theory (ART)-3
based neural network to the detection and diagnosis of rotative machine failure. Syed et
al. (1993) used Kohonen maps for the real-time monitoring and diagnosing of robotic
assemblies. Chow et al. (1991) implemented a three-layer feed-forward neural network
for the real-time condition monitoring of induction motors. Yamashina et al. (1990)
considered neural networks as a failure diagnosis tool for servovalves. Suna and Berns
(1993) considered a back-propagation structure for pipeline fault diagnosis.
In the aerospace industry, Rauch et al. (1993) performed fault detection, isolation and
reconfiguration of an F-22 aircraft using neural networks, while Feng et al. (1993) applied
an ART-2 neural network for automatic diagnosis of a variable thrust liquid rocket
engine. Passino et al. (1989) used a multilayer perceptron as a numeric-to-symbolic
converter in a failure diagnosis application on an aircraft example.
Finally, Naidu et al. (1990) and Konstantopoulos and Antsaklis (1993) implemented a
neural model of a four-parameter controller suitable for sensor and actuator failures.
The outline of this chapter is as follows: in the first sections the theory of neural
networks is outlined together with the basic topologies of ANN-based fault diagnosis. This
is followed by a detailed examination of the principles of ANN-based fault diagnosis.
Finally, specific applications from representative fields are presented.

5.2 Introduction to neural networks

Biological systems implement pattern recognition computations via interconnections of


physical cells called neurons. This provides motivation to consider emulation of this
computational mechanism for automated PR applications. Researchers from such diverse
areas as neuroscience, mathematics, psychology, engineering, and computer science are
attempting to relate underlying models for pattern recognition, the computation that is
desired, the potential parallelism that emerges, and the operation of biological neural
systems. In fact, a whole field of study, centered around the creation and study of intelli-
gent systems by recreating the computational structures of the human (or animal) brain,
has fully emerged in only the last decade. This movement is known by several names,
including Connectionist Modeling, Neuromorphic Modeling and Parallel Distributed
Processing (PDP).
The idea that computations underlying the emulation of intelligent behavior may be
accomplished via interactions of large numbers of simple processing units is hardly new.
For one, the neuron models of the behavior of brain cells provide biological inspiration
and plausibility for ANNs. The adaptability, context-sensitivity, error tolerance, large memory
capacity, and real-time capability of the human information processing system (mostly
the brain) suggest an alternative architecture to emulate. The mere fact that the basic
computing element of the human information processing system is relatively slow (in the
millisecond range), while the overall processing operation is achieved in a few hundred
milliseconds, suggests that the basis of the biological computation is a small number of
serial steps, each massively parallel. Furthermore, in this inherently parallel architecture,
each of the processing elements is locally connected and relatively simple. Thus, connectionist
or neural computing is not new or revolutionary. It is, rather, evolutionary, with
roots in a number of well-understood concepts, including biological pattern recognition,
perceptrons and linear machines, adaptive networks, and fine-grained parallel computing
paradigms.
Basically, three entities characterize an ANN:
1. The characteristics of individual units or artificial neurons;
2. The network topology, or interconnection of neural units; and
3. The strategy for pattern learning or training.
To some extent, the ANN approach is a nonalgorithmic, black-box strategy which is
trainable. One hopes to train the neural black box to learn the correct response or output
(e.g. classification) for each of the training samples. This strategy is attractive to the
system designer, since the required amount of a priori knowledge and detailed knowledge
of the internal system operation is minimal. Furthermore, after training the internal
(neural) structure of the artificial implementation, it is hoped that it will self-organize to
enable extrapolation when faced with new, yet similar, patterns, on the basis of experience
with the training set.

The connectivity of a neural network determines its structure. Groups of neurons could
be locally interconnected to form clusters that are only loosely, weakly, or indirectly
connected to other clusters. Alternatively, neurons could be organized in groups or layers
that are directionally connected to other layers.
Several different generic neural network structures are useful for ANN-based fault detection.
Examples are:
• The Pattern Associator (PA). This neural implementation is exemplified by
feedforward networks. A sample feedforward network is shown in Fig. 5.1a. In section
5.4.1.1 this type of network structure is explored in detail. Its learning (or training)
mechanism (the backpropagation approach and the generalized delta rule) is considered
and the properties of the approach are explored.
• The Content-Addressable Memory or Associative Memory model (CAM or AM).
This neural network structure is best exemplified by the Hopfield model. A sample
structure is shown in Fig. 5.1b.
• Self-Organizing Networks. These networks exemplify neural implementations of
unsupervised learning in the sense that they typically cluster, or self-organize, input
patterns into classes or clusters based on some form of similarity.

Figure 5.1
(a) Feedforward NN structure (b) CAM/AM neural network structure

Although these network structures are only examples, they seem to be receiving the
greatest amount of attention.
The feedback structure of a recurrent network shown in Fig. 5.1b suggests that network
temporal dynamics, that is, change over time, should be considered. In many instances
the resulting system, due to the nonlinear nature of unit activation-output characteristics
and the weight adjustment strategies, is a highly nonlinear dynamic system. This raises
concerns with overall network stability, including the possibility of network oscillation,
instability, or lack of convergence to a stable state. The stability of nonlinear systems is
often difficult to ascertain.

Learning in ANN-based fault detection may be either supervised or unsupervised. An
example of supervised learning in a feedforward network is the generalized delta rule. An
example of supervised learning in a recurrent structure is the Hopfield (CAM) approach.
Unsupervised learning in a feedforward (nonrecurrent) network is exemplified by the
Kohonen self-organizing network, whereas the ART approach exemplifies unsupervised
learning with a recurrent network structure.

5.3 Characteristics of Artificial Neural Networks

An ANN is a distributed information processing system composed of many simple
computational elements (nodes) locally interacting across very-low-bandwidth channels
(connections). Most ANNs can be described as a directed graph (i.e., digraph) where the
vertices are called processing nodes and the edges are called connections. Inputs and
outputs of the network are simply additional connections originating or terminating in the
external environment. The structure of these connections to the external environment is
determined by the problem to which the network is applied.
Nodes in artificial neural networks are simple processors inspired by their biological
counterparts. They can be described by a simple function that provides a mapping from
n-dimensional space (inputs received via connections to other nodes or from the external
environment) to one-dimensional space (a single output value, which becomes the input
connections to other nodes or to the external environment). The network is inherently
parallel, so that many nodes can carry out their computations simultaneously. Two fea-
tures of a typical artificial neuron are illustrated in Fig. 5.2:
1. The activity a (or state of the node) resulting from the inputs, and
2. the output, which is a function of the activity, f(a).
Processing nodes contain an internal structure that consists of a number of functions and
related data items. Typical functions include an activation function and an output function,
while typical data items include input connection weights and local memory variables
(e.g. activity value, output value, learning parameters, etc.). All operations performed
by a processing element are determined by its transfer function. The transfer
function is usually composed of at least two functions: the activation function, which is a
rule for combining the input connection signals, as modulated by the connection weights,
to provide a state of activation, and the output function, which maps (using the local
memory values) the current state of activation to an output signal.
Activation functions. Examples are shown in Figs. 5.3a-5.3b. This activation characteristic
may, for example, be simply a threshold test, thus emulating a relay characteristic.
Conversely, the possibility of external (i.e. not the output of other neurons) inputs to the
neural unit, inhibitory inputs (as in the perceptron, Minsky and Papert, 1969), and
weighted and nonlinear combinations of inputs are also possible. The perceptron and
other types of threshold logic were early attempts to employ single-layer, linear neural-like
networks for classification problems in pattern recognition. In a more general sense,
neural units may be thought of as programming objects. A number of characterizations of
this concept are described in Feldman and Ballard (1982). Although there exist significant
performance differences, analog and discrete firing characteristics are not distinguished.

Figure 5.2 Features of artificial neurons: (a) processing node with adjustable weighted inputs $w_1, \dots, w_n$ (with $w_{n+1}$ = bias and $I_{n+1}$ = 1); (b) node output, as a function of its activity.

Output functions. As shown in Fig. 5.4, a variety of functions that map neuron input
activation into an output signal are possible. The simplest example is that of a linear unit,
where,

$$o = f(a) = a \qquad (5.1)$$

One particular functional structure that is often used is the sigmoid characteristic,
where,

$$o = f(a) = \frac{1}{1 + e^{-\lambda a}} \qquad (5.2)$$

Equation (5.2) yields $o \in [0,1]$. Here $\lambda$ is an adjustable gain parameter that controls the
"steepness" of the output transition, as shown in Fig. 5.4e. Typically, $\lambda = 1$ and (5.2) is
often referred to in the literature as a logistic function. A computational advantage of
the activation function of (5.2), which is useful in training, is that when $\lambda = 1$,

$$\frac{do}{da} = o(1 - o) \qquad (5.3)$$

An interesting observation is that the average firing frequency of biological neurons as a
function of excitation follows a sigmoidal characteristic.

Figure 5.3 Neuron activation characteristics.
(a) McCulloch-Pitts model: $o = 1$ if $E \ge T$ and $I = 0$; $o = 0$ otherwise ($E$: sum of activated excitatory inputs; $I$: sum of activated inhibitory inputs; $T$: threshold).
(b) Linear weighted threshold model: $o = 1$ if $\sum_{j=1}^{d} x_j w_j \ge T$; $o = 0$ otherwise.
Another particularly interesting class of activation functions are bilevel mappings or thresholding
units. For example,

$$o = 1(a) = \begin{cases} 1 & a \ge 0 \\ 0 & a < 0 \end{cases} \qquad (5.4)$$
Fig. 5.4b shows this unit characteristic using a general threshold, T. Alternatively,

$$o = 1(a) = \begin{cases} +1 & a \ge 0 \\ -1 & a < 0 \end{cases} \qquad (5.5)$$

Figure 5.4 Neuron output function characteristics.
(a) Linear, (b) threshold (relay), (c) threshold (linear), (d), (e) sigmoidal (panel (e) shows curves for λ > 1, λ = 1, λ < 1 and λ = 0).
Thresholding units may be viewed as a limiting case of the sigmoidal unit characteristic
of (5.2). This is shown in Fig. 5.4e. In addition, thresholding units may be used to com-
pute Boolean functions.
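As a simple illustration of these characteristics, the unit mappings of eqs. (5.1)-(5.4) can be written down directly; the following Python sketch uses our own function names and is not taken from the text:

```python
import numpy as np

def linear(a):
    """Linear unit, eq. (5.1): output equals activity."""
    return a

def sigmoid(a, lam=1.0):
    """Sigmoid (logistic) unit, eq. (5.2); lam controls the steepness."""
    return 1.0 / (1.0 + np.exp(-lam * a))

def sigmoid_slope(o):
    """Derivative do/da of eq. (5.2) for lam = 1, expressed through the
    output itself, eq. (5.3)."""
    return o * (1.0 - o)

def threshold(a):
    """Bilevel (thresholding) unit, eq. (5.4)."""
    return 1.0 if a >= 0 else 0.0
```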
The characteristics of the previous activation functions suggest an instantaneous activation-to-output
mapping. A more realistic model would involve delay or dynamics in the
unit response. A model incorporating dynamics might be,

$$\frac{da_i(t)}{dt} = -\frac{1}{\tau_i} a_i(t) + \frac{1}{\tau_i} a_i'(t) \qquad (5.6)$$

where $a_i(t)$ is the activation, $a_i'(t)$ is the actual input activation and $\tau_i$ is the time constant
of the $i$th unit. Equation (5.6) constrains the time change of individual unit states
and enables a local "memory." Biases may be used, for example, to selectively inhibit the
activity of certain neurons.
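A forward-Euler discretization of eq. (5.6) makes this "local memory" explicit: the state decays toward the instantaneous input activation with time constant τ. The step size h below is our choice:

```python
def unit_state_step(a, a_in, tau, h=0.01):
    """One Euler step of eq. (5.6): da/dt = (a_in - a) / tau."""
    return a + (h / tau) * (a_in - a)
```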

5.4 ANN topologies and learning strategies

As noted earlier, an ANN can be regarded as a collection of processing nodes and
connections. However, almost all neural networks have considerable structure beyond this
simple representation. For example, most ANN architectures group the processing nodes
into disjoint subsets, called layers, in which all the processing nodes have essentially the
same transfer function. Processing nodes can send connections to other processing nodes
in the same layer as well as to processing nodes on other layers.
Many ANN topologies have been proposed (Hopfield, 1982; Feldman and Ballard, 1982;
Rumelhart and McClelland, 1986; Kohonen, 1984). Each differs in the number and
character of the processing nodes, the connections, the training procedures, and whether
the input/output values are continuous or discrete. An extensive review of fundamental
developments in feedforward artificial neural networks from the past 30 years is given by
Widrow and Lehr (1990).

5.4.1 Supervised learning ANNs

Supervised learning requires the pairing of each input vector with a target vector representing
the desired output; together these are called a training pair. Usually a network
is trained over a number of such training pairs. An input vector is applied, the output of
the network is calculated and compared to the corresponding target vector, and the difference
(error) is fed back through the network and weights are changed according to an
algorithm that tends to minimize the error. The vectors of the training set are applied
sequentially, and errors are calculated and weights adjusted for each vector, until the
error for the entire training set is at an acceptably low level.

5.4.1.1 Multilayer, feedforward networks

In this section a neural network with a layered, feedforward structure and an error gradient-based
training algorithm is presented. Although a single-layer network of this type,
known as the perceptron, has existed since the late '50s (Minsky and Papert, 1969), it did
not see widespread application owing to its limited classification ability and the lack of a
training algorithm for the multilayer case. Furthermore, the training procedure evolved
from the early work of Widrow (Widrow and Hoff, 1960) in single-element, nonlinear
adaptive systems such as ADALINE (Widrow and Lehr, 1990).
The feedforward network is composed of a hierarchy of processing units, organized in a
series of two or more mutually exclusive sets of neurons or layers. The first, or input,
layer serves as a holding site for the values applied to the network. The last, or output,
layer is the point at which the final state of the network is read. Between these two
extremes lie zero or more layers of hidden units. Links, or weights, connect each unit in
one layer to only those in the next-higher layer. Fig. 5.5 illustrates the typical feedforward
network. The network as shown consists of a layer of d input units ($L_I$), a layer of c
output units ($L_O$), and a variable number of internal or hidden layers ($L_{h_i}$) of units.
The topology of the multilayer, feedforward network is not its essential point. What is
more important is the rule by which the topology acquires intelligence, i.e. its learning
rule. In fact, the delay in developing a proper rule can be blamed for the early stalling of
neural network research. Even though an explicit learning formula was not developed
until around 1986 by Rumelhart and McClelland, an existence proof by Kolmogorov
laid the theoretical foundations for such a rule.
Kolmogorov's Mapping Neural Network Existence Theorem: Given any continuous
function $\varphi: I^d \to R^c$, $\varphi(x) = o$, where $I$ is the closed unit interval [0, 1] (and therefore $I^d$ is the
d-dimensional unit cube), $\varphi$ can be implemented exactly by a three-layer neural network
having d processing elements in the input layer, (2d+1) processing elements in the
(single) hidden layer, and c processing elements in the output layer. The processing elements
in the hidden layer implement the mapping function,

$$z_k = \sum_{j=1}^{d} \lambda^k \psi(x_j + \varepsilon k) + k \qquad (5.7)$$

where $x_j$ are the network inputs and the real constant $\lambda$, as well as the continuous real
monotone increasing function $\psi$, are independent of $\varphi$ (although they do depend on d).
The constant $\varepsilon$ is a rational number, $0 < \varepsilon \le \delta$, where $\delta$ is an arbitrarily chosen positive
constant. Further, it can be shown that $\psi$ can be chosen to satisfy a Lipschitz condition
$|\psi(x) - \psi(y)| \le c\,|x - y|^{\beta}$ for any $0 < \beta \le 1$.
Figure 5.5 Structure of multiple-layer feedforward neural network (input layer $L_I$, internal or "hidden" layers $L_{h_i}$, output layer $L_O$).

The output layer elements implement the following mapping:

$$o_i = \sum_{k=1}^{2d+1} g_i(z_k)$$

where the functions $g_i$, $i = 1, 2, \dots, c$ are real and continuous (and depend on $\psi$ and $\varepsilon$).
The utility of this result is somewhat limited, however, since no indication of how to
construct the $\psi$ and $g_i$ functions is given. For example, it is not known whether the
commonly used sigmoidal characteristics even approximate these functions.
The Generalised Delta Rule (GDR) or back-propagation learning rule is the most
widely used procedure for "tuning" multilayer, feedforward networks. It was proposed,
as stated earlier, by Rumelhart and McClelland in 1986. At about the same
time, Hecht-Nielsen reintroduced Kolmogorov's Theorem, thus opening a new era in
neural network research. Since then, a number of variants of the GDR have been produced,
aiming at overcoming certain shortcomings of the original approach or trying to
fine-tune the algorithm to specific tasks such as system identification, control or pattern
recognition. The back propagation algorithm belongs to the class of back-coupled error
correction procedures, which use a term that has to be computed from the network's
output node in question, and is then transmitted back to adjust the weights.
Before giving the details of the rule, let us describe the basic operations in step format:
1. Apply input (stimulus) vector to network.

2. "Feed forward" or propagate input pattern to determine outputs.


3. Compare outputs in output layer with desired pattern response.
4. Compute and propagate error measure backward (starting at output layer) through
network.
5. Minimize error at each stage through weight adjustments.
The error measure or objective function to be minimized in GDR learning is the mean-squared
error between the actual outputs from the output layer and the desired or target
outputs for all the input patterns. This function is given by,

$$E(w) = \sum_p E_p(w) \qquad (5.8)$$

where,

$$E_p(w) = \frac{1}{2} \sum_j (d_{pj} - y_{pj})^2 \qquad (5.9)$$

and $w$ is the vector containing the network weights. Given a training input vector $x_p$, $y_p$
is the output vector generated by the forward propagation of the activation through the
network, and $d_p$ represents the desired output vector associated with $x_p$.
Whether this objective function is appropriate in specific situations, e.g. in classification
problems, is debatable, and modifications may be made if this is not the case.
The objective function is minimised by a gradient descent technique. Applying the minimisation
criteria to the output node yields the following weight adjustment rule:

$$\Delta_p w_{ji} = \eta\, \delta_{pj}\, o_{pi} + \alpha\, \Delta_{p-1} w_{ji} \qquad (5.10)$$

where $\Delta_p w_{ji}$ is the change in the weight from node $i$ to node $j$ after training sample $p$, $\eta$ is
a gain term usually called the learning rate and $\alpha$ is a momentum term that smoothes the
effect of dramatic weight changes by adding a fraction of the weight change, $\Delta_{p-1} w_{ji}$,
from the previous training sample $p-1$. The error signal $\delta_{pj}$ is a measure of the distance
from the activation level of node $j$ to its desired level after training sample $p$.
The GDR provides two rules for calculating the error signal of a node. For an output
node,

$$\delta_{pj} = (d_{pj} - y_{pj})\, f_j'(a_{pj}) \qquad (5.11)$$

where $d_{pj}$ is the desired activation level for node $j$ with respect to the activity generated by
the input pattern $p$.
The major contribution of the GDR lies in its formula for computing the delta values for
the hidden nodes, since before the GDR, there was no learning procedure for multilayer,
feed-forward networks. For an arbitrary node $j$ in a hidden layer, the rule for calculating
the error signal is,

$$\delta_{pj} = f_j'(a_{pj}) \sum_k \delta_{pk}\, w_{kj} \qquad (5.12)$$

where the summation is over all the $k$ nodes to which node $j$ sends output.
As the name back propagation suggests, the basic idea behind this computation of error
signals for the hidden nodes is to propagate back through the system errors that are
based on observed discrepancies between the values of the output nodes and the expected
output for a training pattern. The error signal focuses on adjusting the connection strengths
that are most responsible for the output error. Then, if,

$$f_j(a_j) = \frac{1}{1 + e^{-a_j}}$$

performing the differentiation yields for the error signal,

$$\delta_{pj} = \begin{cases} (d_{pj} - y_{pj})\, y_{pj}(1 - y_{pj}) & \text{if } j \in L \\ y_{pj}(1 - y_{pj}) \sum_k \delta_{pk}\, w_{kj} & \text{otherwise} \end{cases} \qquad (5.13)$$

where $L$ is the set of indices for the output nodes.


Learning involves two phases. First, the inputs are propagated in a feed-forward fashion
through the network to produce output values that are compared to the desired outputs,
resulting in an error signal for each of the output nodes. Second, the error is propagated
backward through the network using the following procedure. The deltas (error signals)
for the output layer are calculated first, and these deltas are used recursively to calculate
the deltas for the adjacent hidden layer using (5.13). These deltas can then be used to
update all the weights in the network.
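To make the two-phase procedure concrete, a compact sketch of one training-pair update for a network with a single hidden layer of logistic units follows, based on eqs. (5.10)-(5.13); the momentum term is omitted (α = 0) and all names are ours:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gdr_step(x, d, W1, W2, eta=0.5):
    """One back-propagation update following eqs. (5.10)-(5.13).
    W1: hidden-layer weights (n_hidden x n_in);
    W2: output-layer weights (n_out x n_hidden)."""
    # phase 1: forward propagation
    h = sigmoid(W1 @ x)                          # hidden-layer outputs
    y = sigmoid(W2 @ h)                          # output-layer outputs y_p
    # phase 2: error signals, eq. (5.13)
    delta_o = (d - y) * y * (1.0 - y)            # output nodes (j in L)
    delta_h = h * (1.0 - h) * (W2.T @ delta_o)   # hidden nodes
    # weight adjustments, eq. (5.10) with alpha = 0
    W2 += eta * np.outer(delta_o, h)
    W1 += eta * np.outer(delta_h, x)
    return y
```

Iterating this step over all training pairs until the total error (5.8) is acceptably low implements the procedure described above.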
Cybenko (1989), amongst others, has proved that a two-layer perceptron network can
approximate any continuous mapping with arbitrary accuracy, provided sufficiently
many hidden units are available. He has also shown that arbitrary decision regions can be
arbitrarily well approximated by a two-layer perceptron network. Unfortunately, the
papers do not tell how these accurate approximations can be achieved.
The back-propagation algorithm does not always find the global minimum but may stop
at a local minimum. Baba (1989) proposed a random optimisation method for finding
the global minimum of the error function in neural networks. In practice the type of minimum
has little importance as long as the desired mapping or classification is reached with
a desired accuracy. The optimization criterion of the back-propagation algorithm is not
very good from the pattern recognition point of view. The algorithm minimizes the
square error between the actual and the desired output - not the number of faulty classifications,
which is the main goal in pattern recognition. The algorithm is too slow for
practical applications, especially if many hidden layers are used (Pao, 1989; Wasserman,
1989). To improve on speed, Cho and Kim (1993) present three categories for rapid
backpropagation learning. Their performance is evaluated and their merits and shortcomings
are briefly discussed. Apart from this, a back-propagation network has poor
memory. When the network learns something new, it forgets the old one. Despite its
shortcomings, back-propagation is very widely used.

5.4.1.2 Recurrent high-order neural networks (RHONNs)

Recently, there has been a concentrated effort towards the design and analysis of learning
algorithms that are based on Lyapunov stability theory. Polycarpou and Ioannou
(1992) present a general formulation for modeling, identifying and controlling nonlinear
dynamical systems using various neural network architectures, and obtain analytical
results concerning the stability of these schemes. Gaussian radial-basis-function
networks have also been used for adaptively controlling dynamical systems with
unknown nonlinearities.
A special feature of RNN architectures is that instead of having so-called hidden layers, one can
enhance the input pattern with additional high-order terms and then usually find that a
flat net with no hidden layers suffices for the purpose. This kind of architecture is called
a Recurrent High-Order Neural Network (RHONN).
High-order networks are expansions of the first-order Hopfield and Cohen-Grossberg
models (described later) that allow higher-order interactions between neurons. RHONNs
have a superior storage capacity, while stability properties of these models for fixed
weight values have been proved. Furthermore, several authors have demonstrated the
feasibility of using these architectures in applications such as grammatical inference and
target detection.
Kosmatopoulos et al. (1992) present efficient learning algorithms for recurrent high-order
neural models and analyze their stability properties.
The performance of RHONNs during the learning phase is superior to the performance
of conventional architectures (backpropagation, etc.). Convergence is achieved in many
fewer iterations; e.g., for a task of learning 75 input/output pairs (with each input being a
pattern of 30 real-number feature values and each associated output a single scalar), a
backpropagation architecture needs 50-80 iterations, while a RHONN architecture needs only
6 iterations in order to converge with a system error in the order of $10^{-3}$.
Recurrent neural network (RNN) models are characterized by a two-way connectivity
between neurons. This distinguishes them from feedforward neural networks, where the
output of one neuron is connected only to neurons in the next layer. In the simple case,
the state history of each neuron is determined by a differential equation of the form

$$\dot{x}_i = -a_i x_i + b_i \sum_j w_{ij}\, y_j$$

where $x_i$ is the state of the $i$-th neuron, $a_i$, $b_i$ are constants, $w_{ij}$ is the weight connecting
the $j$-th input to the $i$-th neuron, and $y_j$ is the $j$-th input to the above neuron. Each $y_j$ is
either an external input or the state of a neuron passed through a sigmoidal function, i.e.,
$y_j = S(x_j)$, where $S(\cdot)$ is a sigmoidal nonlinearity.
The dynamical behavior and stability properties of neural network models of this form
have been extensively studied by Hopfield, as well as by other researchers. These studies
showed encouraging results in application areas such as associative memories, but they
also revealed limitations of this simple model.
In a recurrent second-order neural network the total input to the neuron is not only a
linear combination of the components $y_j$, but also of their products $y_j y_k$. Moreover, one
can pursue along this line to include higher-order interactions represented by triplets
$y_j y_k y_l$, quadruplets, etc. This class of neural networks forms a recurrent higher-order
neural network (RHONN).
Consider now a RHONN consisting of $n$ neurons and $m$ inputs. The state of each neuron
is governed by a differential equation of the form

$$\dot{x}_i = -a_i x_i + b_i \left[ \sum_{k=1}^{L} w_{ik} \prod_{j \in I_k} y_j^{d_j(k)}(t) \right]$$

where $\{I_1, I_2, \dots, I_L\}$ is a collection of $L$ not-ordered subsets of $\{1, 2, \dots, m+n\}$, $a_i$, $b_i$
are real coefficients, $w_{ik}$ are the (adjustable) synaptic weights of the neural network, and
$d_j(k)$ are non-negative integers. The state of the $i$-th neuron is again represented by $x_i$,
and $y = [y_1\ y_2\ \cdots\ y_{m+n}]^T$ is the vector consisting of inputs to each neuron, defined
by,

$$y = [y_1, \dots, y_n, y_{n+1}, \dots, y_{n+m}]^T = [S(x_1), \dots, S(x_n), u_1, \dots, u_m]^T$$

where $u = [u_1\ u_2\ \cdots\ u_m]^T$ is the external input vector to the network. The function
$S(\cdot)$ is a monotone increasing, differentiable sigmoidal function of the form

$$S(x) = \frac{\alpha}{1 + e^{-\beta x}} - \gamma$$

where $\alpha$, $\beta$ are positive real numbers and $\gamma$ is a real number. In the special case where
$\alpha = \beta = 1$, $\gamma = 0$, we obtain the logistic function, and by setting $\alpha = \beta = 2$, $\gamma = 1$, one obtains the
hyperbolic tangent function; these are the sigmoidal functions most commonly used in
neural network applications.

The weights of the RHONN are adjusted using the learning algorithm proposed by
Kosmatopoulos et al. (1992),

$$w_k(t+1) = w_k(t) + \eta \left[ \prod_{j \in I_k} y_j^{d_j(k)}(t) \right] e(t)$$

where $e(t) = x(t) - c(t)$ denotes the state error.


An important advantage of the RHONN model is that one can obtain efficient learning
algorithms that also guarantee the stability of the overall system. In addition, real-time
performance can be achieved due to the fast learning phase (e.g. exponential weight
convergence) as well as due to their dynamical and adaptive nature (dynamic adjustment
of the weights); see Kosmatopoulos et al. (1993).
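A discrete-time sketch of the RHONN state equation and the above weight adjustment might look as follows; the forward-Euler step, the particular index sets I_k and the target trajectory c are our assumptions, not part of the original formulation:

```python
import numpy as np

def S(x, alpha=1.0, beta=1.0, gamma=0.0):
    """Sigmoidal nonlinearity; alpha=beta=1, gamma=0 gives the logistic."""
    return alpha / (1.0 + np.exp(-beta * x)) - gamma

def high_order_terms(x, u, I_sets, d_exps):
    """z_k = prod_{j in I_k} y_j^{d_j(k)}, with y = [S(x); u]."""
    y = np.concatenate([S(x), u])
    return np.array([np.prod(y[list(idx)] ** np.asarray(d))
                     for idx, d in zip(I_sets, d_exps)])

def rhonn_step(x, W, u, a, b, c, I_sets, d_exps, eta=0.1, h=0.01):
    """Euler step of the RHONN state equation plus the weight update.
    x: neuron states (n,), W: synaptic weights (n, L), c: target states (n,)."""
    z = high_order_terms(x, u, I_sets, d_exps)
    x_new = x + h * (-a * x + b * (W @ z))   # state equation
    e = x_new - c                            # state error e(t) = x(t) - c(t)
    W_new = W + eta * np.outer(e, z)         # weight adjustment rule
    return x_new, W_new
```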

5.4.2 Unsupervised learning

The multilayer feedforward and the Hopfield neural networks both exemplify supervised
learning. In this section, neural networks based on unsupervised learning are presented.
Specifically, networks that are used to determine natural clusters or feature similarity
from unlabeled samples are explored. The "cluster discovery" capability of such
networks leads to the descriptor self-organizing.
Fundamentally, unsupervised learning algorithms (or "laws") may be characterized by
first-order differential equations (Kosko, 1990). These equations describe how the
network weights evolve or adjust over time. Often, some measure of pattern associativity
or similarity is used to guide the learning process, which usually leads to some form of
network correlation, clustering, or competitive behavior.

5.4.2.1 Adaptive Resonance Architectures (ART)

Neural self-organizing architectures based on Adaptive Resonance Theory (ART)
(Carpenter and Grossberg, 1987) consist principally of a pair of interacting neural
subsystems, as shown in Fig. 5.6. Neural unit interconnections are both intra- and inter-system.
The concepts of competitive learning and interactive activation are fused in this
approach, in a manner that leads to a stable learning algorithm. The intersystem feedback
structure is apparent from Fig. 5.6. The control signals shown are used to regulate the
system operational mode, as described subsequently, and distinguish this system from a
simple Hopfield network.
Fig. 5.7 shows an expanded view of the neural subsystems of Fig. 5.6. Note that the FA
and FB layers are totally interconnected, that is, the activation of each FA unit is fed to
all FB units and vice versa. This interlayer feedback structure is used to facilitate
resonance when a match between an encoded pattern and an input pattern occurs. This
typifies an ART1 architecture. The FA subsystem may be viewed as the "bottom" layer
that both "holds" the input pattern and, through the bottom-up weights $b_{ij}$ (the
interconnection strength from FA unit $j$ to FB unit $i$), forms the FB layer excitation.

Figure 5.6 Structure of ART network (neural subsystems FA and FB linked by bottom-up and top-down activations under external control; input pattern x applied to FA).

The FB layer is composed of "grandmother" cells, each representing a pattern class. FB
unit activations are fed back to the FA layer units via the interlayer $t_{ij}$ interconnections.
These may be viewed as long-term memory (pattern storage) interconnections. Most
importantly, the FB layer units employ a self-exciting, competitive, neighbor-inhibiting
interconnection structure, whereby each FB unit reinforces its own output through a
positive interconnection between its output and one of its inputs, while maintaining a
negative (inhibitory) connection to every other FB layer unit. A sample implementation
of this structure and resulting action is as follows:
Interconnection weights $w_{ij}$, for units totally within layer FB, are determined by,

$$w_{ij} = \begin{cases} 1 & i = j \\ -\varepsilon & i \ne j \end{cases} \quad i, j \text{ correspond to units in FB} \qquad (5.14)$$

This corresponds to an "on-center, off-surround" interconnection strategy. Inter-layer
interconnection weight values $t_{ij}$, where $i$ corresponds to units in the FA layer, are
described later. The competition or inhibition parameter $\varepsilon$ is a design parameter, with the
constraint,

$$\varepsilon < \frac{1}{N_{FB}} \qquad (5.15)$$

where $N_{FB}$ is the number of units in layer FB. Competition dynamics in layer FB are
modeled via,
$$o_i(n+1) = f_i\left[o_i(n) - \varepsilon \sum_{k \ne i} o_k(n)\right] \quad i = 1, 2, \dots, N_{FB} \qquad (5.16)$$

where $o_i = f_i(a_i)$ and $f_i(\cdot)$ is a unit activation-output mapping function that must be
monotonically nondecreasing for positive $a_i$ and zero for negative $a_i$, and is a fundamental
part of the MAXNET structure (Lippman, 1987). Thus, only one pattern class
is designed to "win" if the overall network converges for a given input pattern. The
reader will note some similarity of this local competition or inhibition-based structure
with the Kohonen structure discussed subsequently. The overall ART1 architecture,
then, is a cooperative-competitive feedback (recurrent) structure.
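The competition of eq. (5.16) is straightforward to simulate; in the sketch below the ramp output function and the stopping test are our choices:

```python
import numpy as np

def maxnet(o, eps=None, max_iter=100):
    """Iterate eq. (5.16) until (ideally) a single unit stays active.
    o: initial FB activations (non-negative)."""
    o = np.asarray(o, dtype=float)
    if eps is None:
        eps = 0.9 / len(o)          # must satisfy eps < 1/N_FB, eq. (5.15)
    for _ in range(max_iter):
        # each unit is inhibited by the sum of all other outputs
        o = np.maximum(0.0, o - eps * (o.sum() - o))
        if np.count_nonzero(o) <= 1:
            break
    return int(np.argmax(o))        # index of the winning unit
```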

Figure 5.7 Expanded view of ART networks (FB layer with inhibitory interconnections $-\varepsilon$).

The previous static network architecture description only partially characterizes the
operation of the network. Because of the feedback structure, temporal dynamics are an
important component of both recognition (recall) and learning (encoding). These actions
are governed by the additional control signals shown in Fig. 5.7 and enable two phases of
operation: an attentional phase engages FA units only when an input pattern is
presented; an orienting phase successively "thins" units in FB, until a winner, or pattern
class, is found. If it is not possible to determine a winner, an uncommitted FB unit is used
to represent this new pattern class, thus facilitating learning.
When presented with an input pattern, the ART network implements a combined
recognition/learning paradigm. If the input pattern is one that is the same as, or close to,
one previously memorized, desired network behavior is that of recognition, with possible
reinforcement of the FB layer on the basis of this experience. The recognition phase is a
cyclic process of bottom-up adaptive filtering (adaptive since the weights $b_{ij}$ are
changeable at each iteration) from FA to FB, selection of a stored pattern class in FB
(the competition), and mapping of this result back to FA, until a consistent result at FA is
achieved. The top-down feedback of the competition winner output from FB forms FA
activations that may be viewed as encoded or learned expectations. This is then the
network state of resonance and represents a search through the encoded or memorized
patterns in the overall network structure. If the input pattern is not recallable, desired
behavior is for the FB layer to adapt or learn this class, by building or assigning a new
node henceforth representing this pattern class.
An algorithm that accommodates binary input {-1, 1} features is as follows:
1. Select $\varepsilon$, $\rho$ and initialize the interlayer connections as follows:

$$t_{ij}^0 = 1 \quad \forall i, j \qquad (5.17)$$

$$b_{ij}^0 = \frac{1}{1+n} \qquad (5.18)$$

Equations (5.17) and (5.18) are specific cases of more general constraints that must
be placed on the initial values of $t_{ij}$ and $b_{ij}$. Equation (5.17) satisfies the so-called
template learning inequality, whereas (5.18) satisfies the direct access inequality.
2. Present the d-dimensional binary pattern $x = (x_1, x_2, \dots, x_d)^T$ to the FA layer.
3. Using $b_{ij}$, determine the activations of the FB layer units, that is, each unit has
activation,

$$a_i = \sum_j b_{ij}\, x_j \qquad (5.19)$$

4. Use the competition-based procedure of (5.16) to determine a "winner" or unit with
maximum activation (and therefore output) in FB. Each unit in FB therefore
"competes" with all others in FB, until iteration within FB yields only one active
unit. Denote the output (activation; $f_i(\cdot)$ is the identity function) of the winning unit
as $o_j^{FB\,win}$, that is,

$$o_j^{FB\,win} = \max_{o_k \in FB} \{o_k^{FB}\} \qquad (5.20)$$

Related to the winner unit in (5.20) is the function $m(j)$, used for weight updates
and shown in (5.25).
5. The top-down verification phase begins. Using the winner unit found in step 4, this
result is then fed back to FA via the top-down, or $t_{ij}$, interconnections using,

$$a_i^{FA} = t_{ij}\, o_j^{FB\,win} \qquad (5.21)$$

for each unit in FA. The fed-back FA unit activations (or outputs) are then
compared with the given input pattern. This is an attempted confirmation of the
winning unit class found in step 4. Numerous comparisons are possible, with the
overall objective of determining whether the top-down and input activations are
sufficiently close. For example, since the inputs are binary, the comparison

$$\frac{\|a^{FA} \wedge x\|}{\|x\|} \ge \rho \qquad (5.22)$$

may be used. In ART1, $\|x\| = \sum_i |x_i|$. Here $\rho$ is a design parameter representing the
"vigilance" of the test, that is, how critically the match should be evaluated.
6. If (5.22) is true, that is, the test succeeds, the $b_{ij}$ and $t_{ij}$ interconnections are updated
to accommodate the results of input $x$ using discrete versions of the slow
learning dynamic equations:

$$\dot{t}_{ij} = \alpha_1 m(j)\left[-\beta_1 t_{ij} + f_i(x_i)\right] \qquad (5.23)$$

where we recall that $t_{ij}$ is the strength of the interconnection from unit $j$ in FB to unit $i$
in FA. $\alpha_1$ is a positive parameter that controls the learning rate, and $\beta_1$ is a positive
constant that allows gradual "forgetting" or decay. $f_i(x_i)$ is the output of FA unit $i$
using input $x_i$ as activation, and $m(j)$ is described more fully below and in (5.25).
Note that the competitive interconnections within FB defined in (5.14) are not
adjusted through (5.23); only the top-down interlayer weights $t_{ij}$ are modified.
Similarly, the bottom-up weights are adjusted by,

$$\dot{b}_{ji} = \alpha_2 m(j)\left[-\beta_2 b_{ji} + f_i(x_i)\right] \qquad (5.24)$$

where $b_{ji}$ is the strength of the interconnection from unit $i$ in layer FA to unit $j$
in layer FB. The parameters $\alpha_2$ and $\beta_2$ are analogous to those in (5.23). The function $m(j)$
is used to restrict the updating of weights to those involving only the winning
class $o_j^{FB\,win}$ as defined in (5.20), using,

$$m(j) = \begin{cases} 1 & \text{if } o_j = o_j^{FB\,win} \\ 0 & \text{otherwise} \end{cases} \qquad (5.25)$$

If the test of (5.22) fails, this unit is ruled out and step 4 is repeated until a winner
can be found or there are no remaining candidates.
Equations (5.23)-(5.25) represent one example of a learning strategy in the ART
approach. Carpenter and Grossberg (1987) provide separate "slow" and "fast" learning
procedures. Parameters $\alpha_1$ and $\alpha_2$ control the rate at which the system learns or adapts
and must be chosen carefully. Learning rates that are too slow yield systems that are
rigid (or nonadaptive in the extreme case). Conversely, learning rates that are too fast
cause the system to display chaotic (or what is termed "plastic") behavior. In the
extreme case, the system tries to learn every input pattern as a new class. Thus, a trade-off
exists between system insensitivity to novelty (truly new patterns) and an overly plastic
behavior. Simplified versions of these updating strategies are shown in Pao (1989).
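For concreteness, a compact sketch of one ART1 pattern presentation is given below. It uses the common fast-learning weight updates rather than the slow-learning dynamics of eqs. (5.23)-(5.24), it assumes binary {0, 1} patterns, and the bottom-up normalization constant is our assumption:

```python
import numpy as np

def art1_present(x, T, B, rho=0.7):
    """One presentation of a binary pattern x to an ART1-style network.
    T: top-down weights (classes x d), B: bottom-up weights (classes x d).
    Returns the index of the resonating class, or None if all are ruled out."""
    active = list(range(T.shape[0]))
    while active:
        a = B[active] @ x                    # bottom-up activations, eq. (5.19)
        win = active[int(np.argmax(a))]      # competition winner, eq. (5.20)
        match = np.minimum(T[win], x)        # top-down verification
        if match.sum() / x.sum() >= rho:     # vigilance test, cf. eq. (5.22)
            T[win] = match                   # fast-learning commit of winner
            B[win] = match / (0.5 + match.sum())  # beta + |match| normalization
            return win
        active.remove(win)                   # rule out and repeat step 4
    return None                              # no winner: assign a new FB unit
```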

5.4.2.2 Kohonen maps

Kohonen (1984) has suggested an alternative unsupervised neural learning structure
involving networks that perform dimensionality reduction through conversion of feature
space, to yield topologically ordered similarity graphs or maps or clustering diagrams
(with potential statistical interpretations). In addition, a lateral unit interaction function is
used to implement a form of local competitive learning.
Fig. 5.8 shows possible 1-D and 2-D configurations of units to form feature or pattern
dimensionality-reducing maps. For example, a 2-D topology yields a planar map,
indexed by a 2-D coordinate system. Of course, 3-D and higher dimensional maps are
possible. Notice that each unit, regardless of the topology, receives the input pattern
$x = [x_1\ x_2\ \cdots\ x_d]^T$ in parallel. Considering the topological arrangement of the chosen
units, the d-D feature space is mapped into 1-D, 2-D, 3-D, and so on. The coordinate
axes used to index the unit topology, however, have no explicit meaning or relation to
feature space. They may, nevertheless, reflect a similarity relationship between units in
the reduced dimensional space, where topological distance is proportional to
dissimilarity.

Figure 5.8 Topological map configurations (the input pattern x is applied in parallel to every unit of the 1-D or 2-D array).

Choosing the dimension of the feature map involves engineering judgment. Some PR
applications naturally lead to a certain dimension; for example, a 2-D map may be
developed for speech recognition applications, where 2-D unit clusters represent
phonemes. The dimensions of the chosen topological map may also influence the training
time of the network. It is noteworthy, however, that powerful results have been obtained
by just using 1- and 2-D topologies.
Once a topological dimension is chosen, the concept of an equivalent-dimension
neighborhood (or cell or bubble) around each neuron may be introduced. An example for
a 2-D map is shown in Fig. 5.9. This neighborhood, denoted $N_c$, is centered at neuron
$u_c$, and the cell or neighborhood size (characterized by its radius in 2-D, for example)
may vary with time (typically in the training phase). For example, initially $N_c$ may start as
the entire 2-D network, and the radius of $N_c$ shrinks as iteration proceeds. As a practical
matter, the discrete nature of the 2-D net allows the neighborhood of a neuron to be
defined in terms of nearest neighbors; for example, with a square array, the 4 nearest
neighbors of $u_c$ are its N, S, E, and W neighbors; the 8 nearest neighbors would include
the "corners." In 1-D, a simple distance measure may be used.

Figure 5.9 A topological neighborhood $N_c$ of unit $u_c$, shrinking with training iteration $n_i$.

Each unit $u_i$ in the network has the same number of weights as the dimension of the input
vector and receives the input pattern $x$ in parallel. The goal of the self-organizing
network, given a large, unlabeled training set, is to have individual neural clusters self-organize
to reflect input pattern similarity. Defining a weight vector for neural unit $u_i$ as
$w_i = (w_{i1}, w_{i2}, \dots, w_{id})^T$, the overall structure may be viewed as an array of matched filters,
which competitively adjust unit input weights on the basis of the current weights and
goodness of match. A useful viewpoint is that each unit tries to become a matched filter,
in competition with other units.
Assume that the network is initialized with the weights of all units chosen randomly.
Thereafter, at each training iteration $k$ and for an input pattern $x(k)$, a distance measure
$d(x, w_i)$ between $x$ and $w_i$, $\forall i$ in the network is computed. This may be an inner product
measure (correlation), Euclidean distance, or another suitable measure. For simplicity,
the Euclidean distance is adopted. For a pattern $x(k)$, a matching phase is used to define
a winner unit $u_c$, with weight vector $w_c$, using,

$$\|x(k) - w_c(k)\| = \min_i \{\|x(k) - w_i(k)\|\} \qquad (5.26)$$
Thus, at iteration $k$, given $x$, $c$ is the index of the best matching unit. This affects all units
in the currently defined cell, bubble or cluster surrounding $u_c$, $N_c(k)$, through the global
network updating phase as follows:

$$w_i(k+1) = \begin{cases} w_i(k) + \alpha(k)\left[x(k) - w_i(k)\right]; & i \in N_c(k) \\ w_i(k); & i \notin N_c(k) \end{cases} \qquad (5.27)$$

Note that (5.27) corresponds to a discretized version of the differential adaptation law:

$$\frac{dw_i}{dt} = \alpha(t)\left[x(t) - w_i(t)\right]; \quad i \in N_c(t) \qquad (5.28)$$

$$\frac{dw_i}{dt} = 0; \quad i \notin N_c(t) \qquad (5.29)$$

Clearly, (5.28) shows that $d(x, w_i)$ is decreased for units inside $N_c$, by moving $w_i$ in the
direction $(x - w_i)$, whereas after the adjustment the weight vectors outside $N_c$ are left
unchanged. The competitive nature of the algorithm is evident, since after the training
iteration, units outside $N_c$ are relatively further from $x$. That is, there is an opportunity
cost of not being adjusted. Again, $\alpha$ is a possibly iteration-dependent design parameter.
The resulting accuracy of the mapping depends on the choices of $N_c(k)$, $\alpha(k)$, and the
number of iterations. Kohonen cites the use of 10,000-100,000 iterations as typical.
Furthermore, $\alpha(k)$ should start with a value close to 1.0 and gradually decrease with $k$.
Similarly, the neighborhood size $N_c(k)$ deserves careful consideration in algorithm design.
Too small a choice of $N_c(0)$ may lead to maps without topological ordering. Therefore, it
is reasonable to let $N_c(0)$ be fairly large (Kohonen suggests 1/2 the diameter of the map),
shrinking $N_c(k)$, perhaps linearly, with $k$ to the fine-adjustment phase, where $N_c(k)$
consists only of the nearest neighbors of unit $u_c$. Of course, a limiting case is where $N_c(k)$
becomes one unit.
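A sketch of the complete training loop of eqs. (5.26)-(5.27) for a 2-D map follows; the linear schedules for α(k) and the neighborhood radius are our choices, in the spirit of the guidelines above:

```python
import numpy as np

def train_som(X, rows, cols, n_iter=10000, alpha0=0.9):
    """Self-organizing map training, eqs. (5.26)-(5.27).
    X: training patterns (N x d); returns weights of shape (rows, cols, d)."""
    rng = np.random.default_rng(0)
    W = rng.random((rows, cols, X.shape[1]))          # random initial weights
    grid = np.indices((rows, cols)).transpose(1, 2, 0)
    r0 = max(rows, cols) / 2.0                        # N_c(0) ~ half the map
    for k in range(n_iter):
        x = X[rng.integers(len(X))]
        dist = np.linalg.norm(W - x, axis=2)          # distances d(x, w_i)
        c = np.unravel_index(np.argmin(dist), dist.shape)  # winner, eq. (5.26)
        alpha = alpha0 * (1.0 - k / n_iter)           # decreasing learning rate
        radius = 1.0 + r0 * (1.0 - k / n_iter)        # linearly shrinking N_c(k)
        in_Nc = np.linalg.norm(grid - np.array(c), axis=2) <= radius
        W[in_Nc] += alpha * (x - W[in_Nc])            # update rule, eq. (5.27)
    return W
```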

5.5 ANN-based fault diagnosis

As pointed out in the introduction, an ANN-based fault monitoring scheme has, in a way,
the same structure as a corresponding model-based scheme: a number of signals, which
are deemed typical of the state of the process that is being monitored, is fed into a neural
machine. The neural machine outputs a fault vector which is manipulated by a decision
logic to decide if a fault has occurred and possibly isolate it and estimate its size.

5.5.1 Choice of neural topology

As far as the choice of neural topology is concerned, no general guidelines exist. It is
true, however, that feed-forward, multilayered networks employing the GDR (or back
propagation) learning rule are used in the majority of published applications. Its elegant
structure is, however, offset by two factors:
• It may not converge.
• It has a slow convergence of $O(N^3)$, where $N$ is the number of weights.
The first problem can usually be overcome by multiple starts with different random
weights and by a low value of the learning rate $\eta$ (Lippman, 1987). To accelerate the
learning procedure, dedicated parallel hardware can be used for the computations. The
extent of both drawbacks seems to depend on the parameters $\eta$ (learning rate) and $\alpha$
(momentum factor). Unfortunately, their optimum values cannot be determined a priori
and, furthermore, they may change during the training (i.e. they are time-varying). Their
adaptive setting is the subject of ongoing research (Cho and Kim, 1993).
Since the appropriate choice of a network topology cannot be made a priori, it is good
practice to compare the performance of various topologies and choose the best
performer. This procedure is not straightforward, however. Not only does one have to
compare different topologies, but also different configurations of the same topology. In a
feedforward, multilayered ANN the values of $\eta$ and $\alpha$ have to be found empirically. Moreover, the
number of hidden layers and the number of nodes per hidden layer must also be found by
experiment. This is a well known drawback in implementing this kind of ANN. Node
activator functions must also be chosen amongst the class of possible alternatives
(threshold, sigmoid, hyperbolic, Gauss etc.). It follows, therefore, that a logical procedure
for optimum network topology is to search each proposed topology for its best
configuration and then choose the best amongst the best. Sorsa et al. (1991) have used
this idea in comparing three topologies: a single-layer perceptron, a multilayer
perceptron and a counter-propagation network which combines a Kohonen layer for
classification with an ART architecture for mapping. Results were obtained on a
simulated model of a heat exchanger and a continuous stirred tank reactor. Their results
will be detailed in the examples section.

5.5.2 Choice of output fault vector and classification procedure

What is really asked of an ANN-based fault diagnosis system is to recognize fault
patterns inherent in signals carrying fault information. Thus, as already pointed out, a
fault diagnosis problem can be viewed as a pattern recognition problem.
Neural networks have been used for pattern recognition for some time and there exist
some powerful theorems in this area. In fact, Mirchandani and Cao (1989) have shown
that in a d-dimensional space, the maximum number of regions that are linearly separable
using h hidden nodes is given by,

$$M(h,d) = \sum_{j=0}^{d} \binom{h}{j} \quad \text{if } h > d$$

$$M(h,d) = 2^h \quad \text{if } h \le d$$
This theorem holds for hard-limiting nonlinearities, i.e. binary outputs. However, the
conclusions can be extended to other types of nonlinearities (sigmoid etc.). The behaviour
of such networks is of course more complex, because the decision regions are bounded
by smooth curves instead of straight line segments.
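The count M(h, d) is easy to evaluate directly:

```python
from math import comb

def max_separable_regions(h, d):
    """Maximum number of linearly separable regions produced by h hidden
    nodes in d-dimensional pattern space (Mirchandani and Cao, 1989)."""
    return 2 ** h if h <= d else sum(comb(h, j) for j in range(d + 1))

# e.g. max_separable_regions(3, 2) == 7 (three hyperplanes in the plane)
```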
In traditional pattern recognition techniques the pattern classification is carried out
through a series of decision functions. A classification of a d-dimensional pattern space with
M clusters may be viewed as a problem of defining hyperplanes to divide the d-dimensional
Euclidean space into M decision regions. More complex decision functions
will be needed for linearly inseparable decision regions. Moreover, probability models
are often employed under the premise of prior probabilities, because perfect typical
reference pattern examples are not easy to obtain. How to select suitable decision
function forms, and how to modify the relevant parameters of the decision functions, are
not easy questions for the traditional pattern recognition methods.
Similarly to traditional decision theory, neural networks perform the classification by
creating decision boundaries to separate the different pattern classes. However, unlike
traditional classifiers, when a classification is realized with neural networks, the decision
functions need not be given beforehand. The whole mapping from sample space
into decision space is developed automatically by using the learning algorithm. The
knowledge of fault patterns is stored distributively in the highly interconnected nonlinear
neuron-like elements. Moreover, it is these nonlinear activations in the network that lead
to the strong classification ability of artificial neural networks in high-dimensional
pattern spaces.
The usual pattern vector employed in ANN-based fault diagnosis has dimension equal to
the number of faults that must be detected. In theory, a 1 in the ith position indicates an
ith type fault, while a zero pattern vector signals normal operation. In practice, however,
the network is trained with the values 0.9 and 0.1 for the fault and no-fault cases
respectively, since 0 and 1 are limiting cases for sigmoid activators (usually employed),
which would stall the learning procedure if used. After training, a fault of type i is declared if
the ith element of the output pattern vector exceeds a threshold. This threshold must be
defined considering false alarm rates and it is usually calculated by simulation. A value of
0.5 is a safe guess.
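The resulting decision logic then reduces to a simple comparison against the threshold; the fault numbering below is illustrative:

```python
def declare_faults(y, threshold=0.5):
    """Map the network output pattern vector y to declared fault numbers;
    an empty list corresponds to normal operation."""
    return [i + 1 for i, v in enumerate(y) if v > threshold]
```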
Note that with this formulation, multiple faults can be detected, if the network has been
trained with this situation. An alternative procedure, mimicking parameter estimation
techniques, would be to produce a system parameter vector as an output pattern vector.
In this way, the neural network would act as a parameter estimator. Fault decision would
then be accomplished, using any of the methods discussed in Chapter 2. This decision
phase could also be implemented by a neural network. Thus the inputs to this second
network would be the parameter estimates, while its output would be a pattern vector
having the structure discussed previously.
5.5.3 Training sample design

The appropriate selection of input training data is a very important stage in the de-
velopment of an ANN-based fault diagnosis system. There is little guidance in the
literature regarding the choice of representative sets of examples for training with
continuous inputs, because most studies involve binary inputs. Most studies also use a
closed set of possible inputs. Training on dynamical systems, however, requires continuous
signals.
The first step of the procedure is to decide on the system parameters that are repre-
sentative of the system's condition. This is of course application-dependent, but it may be
safely assumed that the input/output signals of a state-space representation of the plant
will be adequate. It may be necessary to do some pre-processing on the input signals,
such as scaling or filtering. The total number of samples needed depends on the
network's characteristics, i.e. topology, activator functions, learning rules etc. It is
evident that a small training sample is a desirable system characteristic.
The training sample must contain signals from every possible fault situation of the plant,
in a representative range of values. This may be impractical or even dangerous in
certain situations involving critical faults (e.g. nuclear reactors, aircraft), and simulation data
are then needed. This in part offsets the comparative superiority of neural networks
with respect to model necessity. Indeed, most published research
in ANN-based fault diagnosis relies on simulated process models. Is this a sign that the
approach is not implementable? The question cannot be answered yet, since it is
acceptable to use simulated models in the early stages of development of new ideas.

5.6 Application examples

In this section, it is hoped to clarify many of the points discussed earlier and illustrate the
applicability of the various methods.
The cited examples span a considerable part of the industrial fields where ANN-based fault
diagnosis is proposed as an alternative to other techniques. The presentation is structured
in such a way as to highlight the following crucial points:
• process model and fault models
• network topology, configuration and learning rule
• input training signals
• output pattern vector
• results
The examples that follow are only a representative sample of the available literature;
additional references are cited at the end of the chapter.
5.6.1 Applications in Chemical Engineering

The field of Chemical Engineering is especially suited to the application of ANN-based fault
diagnosis systems. The nature of chemical processes, i.e. nonlinear, nonstationary and
uncertain dynamic plants, can be accommodated by neural network structures.
Because modern chemical plants are extremely complex, they are susceptible to
equipment malfunction and operator error. The complexity hampers the operator's ability
to diagnose and eliminate potential process upsets or equipment failures before they can
occur (Himmelblau, 1978). Hence, a continuing question in chemical engineering is how
to use the process state vector to make or aid decisions about possible action or control
at each time increment. Current techniques rely on expert systems, modeling using
classical techniques in the time or frequency domains, and statistical analysis.
Figure 5.10 Three continuous stirred tank reactors in series.

As an example, consider an application reported by Hoskins and Himmelblau (1988),
illustrating the diagnosis of faults for a simple system composed of three continuous
stirred tank reactors (CSTR), as shown in Fig. 5.10.
Process and fault models. Each piece of equipment is operated isothermally and with
constant fluid volume (i.e., no fluid accumulation is permitted). The state variables used
as fault indicators are the flow rate, the temperature and the concentration of
components A and B in streams 1 and 4. All six state variables are monitored, and their
sensors are assumed to function properly. All readings are taken at steady state.
Table 5.1 lists six selected faults, labeled A through F, each affecting the operation of
this process. The possible faults involve the system flow rate, the temperature and the
inlet concentration of component A. Table 5.2 shows the sensor measurement patterns of
the values of the six state variables associated with the six selected faults used for
training the network.
Table 5.1 List of selected faults

A  Inlet Concentration of Component A Low
B  Inlet Concentration of Component A High
C  Inlet Flow Rate Low
D  Inlet Flow Rate High
E  Temperature Low
F  Temperature High

Table 5.2 Sensor measurement patterns of six selected faults

FR = flow rate (ft³/min), T = temperature (°F), C = concentration (lb-moles/ft³);
superscript denotes stream, subscript denotes component.

Faults   FR     T     C_A^1   C_B^1   C_A^4    C_B^4
A        18.0   190   0.3     3.18    0.2275   3.252
A        18.0   190   0.6     2.88    0.3755   3.104
B        18.0   190   1.3     2.18    0.5990   2.881
B        18.0   190   1.6     1.88    0.6681   2.812
C        13.0   190   1.0     2.48    0.4475   3.033
C        15.0   190   1.0     2.48    0.4777   3.002
D        22.0   190   1.0     2.48    0.5600   2.920
D        26.0   190   1.0     2.48    0.5958   2.884
E        18.0   150   1.0     2.48    0.8960   2.584
F        18.0   210   1.0     2.48    0.3102   3.170
F        18.0   230   1.0     2.48    0.1703   3.310

Network architecture. Fig. 5.11 shows the network architecture used for fault
detection and diagnosis. It consists of six inputs corresponding to the six state variables
of the system, three hidden nodes, and six output nodes corresponding to the six
respective process faults listed in Table 5.1. The GDR rule with various learning rates
was used to train the network (0.25 < η < 0.9, α = 0.99).
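For reference, one GDR weight-update step with momentum can be sketched as follows; a minimal sketch, with illustrative shapes and numerical values:

import numpy as np

def gdr_update(w, grad, dw_prev, eta=0.40, alpha=0.99):
    """Generalized delta rule step with momentum:
    dw(t) = -eta * dE/dw + alpha * dw(t-1);  w(t+1) = w(t) + dw(t)."""
    dw = -eta * grad + alpha * dw_prev
    return w + dw, dw

# Usage: keep one momentum term per weight matrix across iterations.
w = np.random.uniform(-0.5, 0.5, (6, 3))   # input-to-hidden weights (6 inputs, 3 hidden)
dw = np.zeros_like(w)
grad = np.random.randn(6, 3) * 0.01        # stand-in for dE/dw from backpropagation
w, dw = gdr_update(w, grad, dw)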
Input training signals. Twelve measurement patterns were used to train the network:
the 11 measurement patterns listed in Table 5.2, plus one measurement pattern for a
normally operating system. These measurement patterns were obtained from a digital
computer simulation program designed to model the dynamics of an arbitrary
combination of reactors and/or vessels. A second-order reaction (2A → 2B) was assumed
to occur between components A and B in each tank, with a frequency factor of 5.0×10¹⁴
(lb-mole)/(ft³)(min) and an activation energy of 4.47×10⁴ Btu/lb-mole. Because the
values of the sensor readings in the input patterns to the network are factors in the
equations that update the values of the weights in the learning procedure, measurements
with larger magnitudes exert a greater influence on learning. To remove this bias, the
simulated measurement data were scaled (as indicated by the preprocessing boxes
depicted in Fig. 5.11) so that the inputs to the network varied continuously over the
range of -1.0 to 1.0.

Figure 5.11 Trained network (numbers in circles represent biases of nodes).

Since one of the handicaps that fault detection and diagnosis must overcome is imprecise
sensor measurements, random noise was added to the inputs, modifying them to

x' = x + η

where η is a noise vector in which each element contains a random real value in the
range of 0-10% of the corresponding input, and x' is the input vector with the added noise.
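The two preprocessing steps just described (scaling to [-1, 1] and 0-10% noise corruption) can be sketched as follows; the per-variable ranges below are read off Table 5.2 and the random generator is illustrative:

import numpy as np

def scale_to_unit_range(x, x_min, x_max):
    """Linearly map each measurement into [-1, 1], removing the bias of
    large-magnitude readings on the weight updates."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def add_measurement_noise(x, rng, max_fraction=0.10):
    """Corrupt each input element with random noise of 0-10% of its value."""
    eta = rng.uniform(0.0, max_fraction, size=x.shape) * x
    return x + eta

rng = np.random.default_rng(0)
x = np.array([18.0, 190.0, 1.0, 2.48, 0.4475, 3.033])     # a fault-C pattern
x_min = np.array([13.0, 150.0, 0.3, 1.88, 0.1703, 2.584])  # per-variable minima
x_max = np.array([26.0, 230.0, 1.6, 3.18, 0.8960, 3.310])  # per-variable maxima
x_train = scale_to_unit_range(add_measurement_noise(x, rng), x_min, x_max)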
Output pattern vector. During training, the target values used for the output nodes
were set to 0.1 and 0.9 rather than 0 and 1. A learning criterion of 0.01 for each pattern
error Ep (of Eq. 5.9) was set to terminate the learning process.
Results. Noisy data require more complex decision regions. The learning results are
shown in Fig. 5.12. By using a sigmoid discriminant function, a multilayer feedforward
ANN and the GDR learning procedure, the network could properly classify inputs. The
plots in Fig. 5.12 exhibit the same general trends as in the linearly separable case.
Note, however, that the range of the number of hidden nodes for low convergence rates
becomes smaller, especially for the high learning rates.
To demonstrate the generalization capabilities of the ANN, the percent misclassification
was examined for this example with three hidden nodes and the learning rate
parameter set to 0.40. To test the ability of the network to recognize new input patterns,
35 measurement patterns not used in the training process were presented to the trained
network. These new measurement patterns were chosen to be representative of the
measurement space for the six faults and the normal operating system. Fig. 5.13 shows
the percent of correct classifications as a function of the number of input patterns (with
added noise) used to train the network.

Figure 5.12 Experimental results: training time steps vs. number of hidden nodes,
for learning rates 0.25, 0.40, 0.65 and 0.90.



Perfect generalization occurred when only two measurement patterns in the training set
had been used for each fault group (i.e., 12 input patterns total in the training set, two of
which were used for the normal operating system). Increasing the number of
measurement patterns maintained the same excellent performance level. However, some
failure to generalize efficiently was observed for the training set containing only seven
training patterns, because that training set was so restricted that it was not representative
of the general mapping. This result was not surprising, since the seven-pattern
training set included no example patterns representative of the
crossover between the normal and faulty regions.

Figure 5.13 Generalization capacity vs. training set size (% correct responses on new
input patterns as a function of the number of input patterns in the training set).

Additional references. Venkatasubramanian (1989, 1990) and his research team have
reported experiments on a similar CSTR. They used a feedforward neural network
with backpropagation and compared two methods of presenting input patterns: raw time-
series data and moving-average data values. Two methods of discretizing the desired
output were also compared: a linear and an exponential one. Extensive experimentation
showed better performance for the "linear" discretization, while raw time-series data input
produced slightly better results.
Sorsa et al. (1991) conducted a comparison study on a simulated process consisting of a
heat exchanger and a CSTR. They compared the performance of a single-layer
perceptron, a multilayer perceptron and a counterpropagation network consisting of a
Kohonen and an ART layer. The process had 14 noisy measurements and 10 typical
faults. The multilayer perceptron with 4 hidden nodes, using a hyperbolic tangent as the
nonlinear element, was able to correctly identify the faults in all cases. The same group
(Sorsa et al., 1993) has also investigated the use of radial basis function ANNs for fault
diagnosis of dynamical processes not in steady-state operation. An orthogonal least
squares algorithm developed by Chen et al. (1991) is used to train the network. A
simulated CSTR with set-point changes is used to test the validity of the proposed
approach, and preliminary results are promising.
5.6.2 Applications in CIM

Due to their efficient problem solving capabilities, parallel processing model, and ability
to react spontaneously to environment changes, neural networks have prompted interest
in their application to various real-time dynamic manufacturing systems (Moon, 1990;
Lo and Bavarian, 1991; Gien et al., 1993).
Diagnostic problem solving methods, based on either deep reasoning (from first
principles) or shallow reasoning (Davis, 1984), are considered to be unsuitable for
domains with changing and short-lived processes (Reed et al., 1988). In a typical
application of robotic assembly, a sequence of short-lived processes (typical of robot
operations) brings about continuous changes to the state of the assembly components.
Such characteristics necessitate flexible and adaptable solutions with efficient real-time
response capabilities for the detection of, and recovery from, unexpected problems
during execution. The commonly used expert system solution for monitoring and
diagnosing can be inefficient and inflexible (Schutte et al., 1987; Zeilingold and Hoey,
1990), particularly when it involves a large number of rules (leading to a large and
computationally expensive search space) which require frequent updates due to
environment changes.
Specifically, the monitoring and diagnosing of assembly execution errors, although
recognized as an important problem in robotic assembly (de Mello and Sanderson,
1990; Kusiak and Finke, 1988; Chang and Wee, 1988), has not been addressed
adequately. A solution to this problem is generally difficult, due to the real-time assembly
constraints and the complex dynamic interactions between the various components, such as
robot, conveyor system, tools and parts. The real-time constraints are particularly
critical when computationally intensive sensory information (tactile,
vision, etc.) is to be processed and accessed for the purpose of monitoring and
diagnosing during each assembly step.
Syed et al. (1993) have proposed a neural network approach to the solution of this
problem, by implementing an unsupervised map, namely the Kohonen map, in a robotic
subassembly involving a fastener in a dishwasher power module.
Process and fault models. As just mentioned, a subassembly involving a fastener in a
dishwasher power module is considered. A tactile sensor is attached to the end-effector
of the robot arm. A generic robot operation, such as (PICKUP fastener FROM table),
may have several execution instances for the same part. These instances can differ from
one another in terms of the robot-part surface contact point and/or the approach angle of
the robot end-effector. Therefore, each operation instance has its own particular part
handling error characteristics.
To construct a neural map, one needs to establish a correlation between the error
characteristics and the observed assembly interaction data. For example, a mean part
surface contact area of between 0.0 and 0.5, measured using a tactile sensor, is cor-
related with a possible error in the execution of the (LIFT part) operation. The size of
the surface contact area indicates how properly the part is grasped.
The input to each dimension in the neural map must be represented numerically. The
output data from sensory systems are normally in numerical form and can thus be
directly used as input to neural maps. However, robot operation identifiers have to be
converted into corresponding numerical values. This can be accomplished by assigning
each operation an equal numerical interval. Table 5.3 summarizes the numerical
representation for a 2-D type-II map, when the fastener is used in the subassembly. For
the input vector (ξ₁, ξ₂), the value of ξ₁ denotes the operation type and ξ₂ represents the
mean measurement value.
In Fig. 5.14, the shaded areas in the 2-D space correspond to the abnormal regions. The
abnormal regions are defined by all possible values of the input vector (ξ₁, ξ₂).

Table 5.3 An example 2-D input space

   PICKUP fastener FROM table                      ξ₁        ξ₂
1  TOP  {loc=(0.5, 10.0, 15.4), angle=90.0}     0.0-0.2   0.75-1.00
2  TOP  {loc=(0.5, 19.0, 15.4), angle=90.0}     0.2-0.4   0.75-1.00
3  SIDE {loc=(0.5, 10.5, 15.4), angle=0.0}      0.4-0.6   0.00-0.25
4  SIDE {loc=(0.5, 20.5, 15.4), angle=0.0}      0.6-0.8   0.00-0.25

Network architecture. A self-organizing Kohonen map is used for diagnosis. Details
of its learning are discussed in section 5.4.2.2. The learning factor a(k) is given by:

a(k) = \varepsilon \, e^{-\mathrm{dist}^2 / (2\Delta^2)}

where,
• dist is the distance between the winning neuron and the actual one in the neural
grid. As the nodes are plotted in the input space by using their input weights, the
distance associated with nodes must be relative to their positions in the grid and not
to their positions in the input space.
• ε and Δ are two parameters that have to be adapted to optimize the training. Their
values decrease with the number of iterations to improve and optimize the training
process. The following equations give the relation between the initial parameter
values and their values at iteration (10×p×q):

\varepsilon(10 \times p \times q) = (\mu_\varepsilon)^q \times \varepsilon(0)
\Delta(10 \times p \times q) = (\mu_\Delta)^q \times \Delta(0)

where 0 < \mu_\varepsilon < 1 and 0 < \mu_\Delta < 1.
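A minimal sketch of this learning-factor schedule (the parameter values are illustrative, and p is taken to be the number of patterns per block implied by the 10×p×q iteration count):

import numpy as np

def learning_factor(dist, eps, delta):
    """a(k) = eps * exp(-dist**2 / (2 * delta**2)), with dist measured
    on the neural grid, not in the input space."""
    return eps * np.exp(-(dist ** 2) / (2.0 * delta ** 2))

def decayed_parameter(value0, mu, k, p):
    """After q complete blocks of 10*p iterations, the parameter equals
    mu**q times its initial value (0 < mu < 1)."""
    q = k // (10 * p)
    return (mu ** q) * value0

eps = decayed_parameter(0.8, 0.95, k=400, p=4)    # q = 10 -> eps ~ 0.48
delta = decayed_parameter(2.0, 0.90, k=400, p=4)  # q = 10 -> delta ~ 0.70
a = learning_factor(dist=1.0, eps=eps, delta=delta)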
Training. Fig. 5.15 shows the topology of the output nodes after 100,000 training it-
erations. The incoming weights, w_ij, are indicated by the intersections of lines on the
array of output nodes. Fig. 5.16 displays the final organization of the array of output
nodes after 200,000 iterations. The final organization preserves the topology of the
abnormal regions of the input space.

Figure 5.14 Network training inputs (mean measurement value vs. operation type).
Figure 5.15 Intermediate node positions.

Results. The final map obtained through training is utilized during the assembly
process to monitor the robot operation. A winning node is determined whenever a
sample of sensory input vectors (ξ₁, ξ₂) is applied to the network. The correctness of
the robot operation is deduced from the winning node j, the density of nodes (d) in its
neighborhood region, and a density threshold d_t.
The threshold value and the neighborhood region are defined heuristically by an op-
erator. An example size for the neighborhood region is a rectangle of 0.08×0.15 relative
to the ξ₁ and ξ₂ axes, respectively. For this example, the value of the density threshold d_t is
set to 10. With these values for threshold and neighborhood region, consider two
different operation instances from Table 5.3. For operation instance 2, an
input vector (0.38, 0.5) will indicate an execution error, since d will be 15. On the
other hand, an input vector (0.25, 0.9) for operation instance 4 will not
show an execution error, as the value of d will be zero.
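A minimal sketch of this decision rule (the node positions are random stand-ins for the trained map, and the 0.08×0.15 rectangle is treated here as half-widths around the winner):

import numpy as np

def execution_error(x, nodes, d_t=10, region=(0.08, 0.15)):
    """Find the winning node for input x, count the nodes falling inside
    the rectangular neighborhood around it, and flag an execution error
    when that density exceeds the threshold d_t."""
    winner = nodes[np.argmin(np.linalg.norm(nodes - x, axis=1))]
    inside = np.all(np.abs(nodes - winner) <= np.asarray(region), axis=1)
    density = int(inside.sum())
    return density > d_t, density

nodes = np.random.default_rng(1).uniform(0.0, 1.0, size=(100, 2))  # stand-in map
error, d = execution_error(np.array([0.38, 0.50]), nodes)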
Additional references. Yamashina et al. (1990) used a feedforward, multilayer ANN to
diagnose failures in a pneumatic servovalve used in automated production systems. A
time-series vibration signal is monitored by an accelerometer and the resulting data are
summarized by six characteristic parameters. Four types of failures were considered, and
a separate neural network was designed for each case. A conjugate gradient method
coupled with a variable metric method was seen to produce more reasonable search
directions and avoid oscillations. The diagnostic performance was very promising,
reaching false alarm probabilities of 0.01 and lower.

Figure 5.16 Final node positions.

Fault diagnosis in rotating machines has also been an application area of ANNs. Chow et
al. (1991) have again used a feedforward, multilayer ANN to study the diagnosis of two
of the most common types of incipient faults in a single-phase squirrel-cage induction
motor: stator winding fault and bearing wear. Accuracies of 97.3% were reported using
a network of 16 hidden nodes, trained from 35 training data patterns by the back-
propagation algorithm.
Tinghu et al. (1993) have used similar techniques to analyze and diagnose five types of
typical faults in rotating machinery (unbalance, seal rub, misalignment, rotor crack, oil
whirl), based on standard frequency spectrum waveform features represented by
power ratios in nine different frequency intervals.
Barschdorff et al. (1993) have investigated the ability of neural networks to diagnose
tool wear in cutting processes like turning or grinding. They used cutting force
components and vibrations of the workpiece holder as suitable indicators of tool wear. A
typical feedforward network is compared to a purpose-developed Condensed Nearest Neighbor
(CNN) network (Barschdorff and Bothe, 1991). Results showed some benefits of the
CNN over back-propagation networks. They also indicated that process parameters
can be used as inputs to increase the variety of cutting conditions under which the system
operates efficiently.

5.6.3 Power systems diagnosis

Fault location in power systems is defined as the identification of a fault or double fault
among system components such as transmission lines, buses, transformers and circuit
breakers in substations, through analyzing the on/off status of several relaying systems or
the tripped order of circuit breakers. The difficulties in the estimation derive from the
possible malfunction of the relaying systems or circuit breakers themselves; that is, they
sometimes do not operate when they should, or they switch when they should not.
Conventional methodologies which have been applied so far to this problem in power
systems include:
• Logical expressions (Wake and Sakaguchi, 1984).
• Expert systems (Matsumoto and Sakaguchi, 1983).
• Parameter estimation techniques (Stavrakakis and Dialynas, 1991).
Ogi et al. (1991) have used a modular neural network approach for power system and
equipment diagnosis. Despite its shortcomings, the GDR was used as the learning rule in
this application as well.
Plant and fault models. An example system with six buses, two transformers and two
transmission lines with their protective relaying systems is used, as shown in Fig. 5.17. A
fault location related to buses, transmission lines and transformers must be estimated
from the on/off status of relaying systems or circuit breakers, in addition to the hypothesis
of malfunctions in the relaying systems or circuit breakers themselves. The components
of the sample power system are:
• Relays: 26
• Circuit breakers (CB_i): 11
• Fault components given: 10
  Buses (A1, A2, B1, B2, C1, C2): 6
  Transmission lines (L1, L2): 2
  Transformers (T1, T2): 2
The names of the relays include a suffix which has the following meanings:
m: Main protective
p: Primary backup protective
s: Secondary backup protective
t: Third backup protective (which covers the opposite direction of s)
A 3-layer feedforward network was used for fault location. Its input layer receives the
on/off status of relays and circuit breakers and its output layer indicates which
components have failed. This indication is given by the largest element of the output
vector.
Training. To train the network, a back-propagation learning algorithm with epoch
training was used; that is, weight updates were performed after presentation of the entire
data set. The convergence criterion used was the usual maximum absolute error between
desired and actual output. Training patterns which satisfied the criterion were
progressively excluded from the weight update sequence. The training patterns consisted
of 41 input/output pairs. The first 10 of them were concerned with normal operation, 22
with a circuit breaker malfunction and 9 with a relaying system malfunction (Table 5.4).
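A minimal sketch of epoch training with this progressive-exclusion criterion; `net` stands for a hypothetical network object exposing forward, gradient and apply_update methods:

import numpy as np

def train_with_exclusion(net, patterns, targets, tol=0.1, max_epochs=5000):
    """Epoch training: accumulate gradients over the whole active set,
    update once per epoch, and drop every pattern whose maximum absolute
    output error already satisfies the convergence criterion."""
    active = list(range(len(patterns)))
    for _ in range(max_epochs):
        if not active:
            break                       # all patterns have converged
        grads = [net.gradient(patterns[i], targets[i]) for i in active]
        net.apply_update(np.mean(grads, axis=0))
        active = [i for i in active
                  if np.max(np.abs(net.forward(patterns[i]) - targets[i])) >= tol]
    return active                       # non-empty if training stalled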
Results. To test the efficiency of the network, its response to non-trained patterns was
examined. Two experiments were conducted: in the first, two fault components with all
relays and circuit breakers operating normally were simulated, while in the second a single
fault with two circuit breakers malfunctioning was applied. The results showed that
ANNs with more than 50 hidden units were able to classify the non-trained patterns
correctly.
Table 5.4 Input patterns

A1m CB1 CB2 |A1|
A2m CB1 CB3 |A2|
B1m CB6 CB4 CB7 |B1|
B2m CB6 CB5 CB8 |B2|
C1m CB9 CB11 |C1|
C2m CB10 CB11 |C2|
T1m CB2 CB4 |T1|
T2m CB3 CB5 |T2|
L1Bm CB7 L1Cm CB9 |L1|
L2Bm CB8 L2Cm CB10 |L2|
A1m CB1 T1t CB2 CB4 |A1|
A1m CB2 T2t CB3 CB5 |A1|
A2m CB1 T2t CB3 CB5 |A2|
A2m CB3 T1t CB2 CB4 |A2|
B1m CB4 CB6 L1Cs CB9 |B1|
B1m CB4 CB7 L2Cs CB10 T2s CB3 CB5 |B1|
B1m CB6 CB7 T1s CB2 CB4 |B1|
B2m CB6 CB5 L2Cs CB10 |B2|
B2m CB6 CB8 L1Cs CB9 T1s CB2 CB4 |B2|
B2m CB6 CB8 T2s CB3 CB5 |B2|
C1m CB9 L2Bs CB8 |C1|
C1m CB11 L1Bs CB7 |C1|
C2m CB10 L1Bs CB7 |C2|
C2m CB11 L2Bs CB8 |C2|
T1m CB2 T1p CB4 |T1|
T1m CB4 T1p CB2 |T1|
T2m CB3 T2p CB5 |T2|
T2m CB5 T2p CB3 |T2|
T1m T1p CB2 CB4 |T1|
L1Bm CB7 L1Cm L1Cp CB9 |L1|
L1Cm CB9 L1Bm L1Bp CB7 |L1|
L2Bm CB8 L2Cm L2Cp CB10 |L2|
L2Cm CB10 L2Bm L2Bp CB8 |L2|
T1t CB2 CB4 T2t CB3 CB5 |A1 A2|
T1s CB2 CB4 T2s CB3 CB5 L2Cs CB10 L1Cs CB9 |B1 B2|
L1Bs CB7 L2Bs CB8 |C1 C2|
T1p CB2 CB4 |T1|
T2p CB3 CB5 |T2|
L1Bp CB7 L1Cm CB9 |L1|
L1Bm CB7 L1Cp CB9 |L1|
L2Bp CB8 L2Cm CB10 |L2|
L2Bm CB8 L2Cp CB10 |L2|
Figure 5.17 Example power system.

5.6.4 Neural four-parameter controller

The four-parameter controller (Nett, 1988), illustrated in Figs. 5.18 and 5.19, is a
generalization of the familiar two-parameter controller (Antsaklis, 1992). This controller
has two vector inputs and two vector outputs, resulting in a controller with four matrix
parameters. Its various elements are: r is the reference input; a the diagnostic controller
output, designed to reproduce the failures; y_c the controller output that is manipulated by
the plant and should be considered the ideal actuator input; u_c the manipulated controller
input; η_a an exogenous input accounting for unmodeled sensor signals; u the actual
actuator output; y the plant output fed into the sensor; z the plant variables not used
by the controller; and w the unmanipulated plant input.
The linear controller can be described by the following relation:

\begin{bmatrix} y_c \\ a \end{bmatrix} =
\begin{bmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{bmatrix}
\begin{bmatrix} r \\ u_c \end{bmatrix}

The objective of the additional controller output a is to identify and reproduce sensor
and actuator failures f_s and f_a. Thus, the overall control objectives are:
• Achieve set point tracking.
• Reject the unmeasured disturbance w at the plant output z.
• Reject both actuator and sensor noises (η_a, η_s) and failures (f_a, f_s) at z.
• Achieve the above objectives with limited u and in the face of plant modeling errors.

Figure 5.18 Four-parameter controller.

Figure 5.19 General controller and plant.

These requirements lead to certain conflicts (Nett et al., 1988): reproducing sensor or
actuator failures at the diagnostic output contradicts the requirement for noise and
disturbance rejection. Also, sensor diagnostic performance has to be traded against
actuator diagnostic performance.
Relying on the fact that a nonlinear controller should outperform its linear version,
Konstantopoulos and Antsaklis (1993) implemented a four-parameter controller in a
neural network. The general structure of a system designed for actuator failures is
shown in Fig. 5.20.
The neural controller has two main inputs: a reference signal r and the output of the plant
y_p. Experience has shown that delayed system signals enhance training performance;
for this reason, delayed reference inputs, plant outputs and controller outputs were also
input to the neural network. It has two outputs: a diagnostic output a_act, and y_c, which
can be considered the ideal actuator input. The controller was trained with the following
objectives:
• To achieve set point tracking and isolate and reproduce actuator faults.

Figure 5.20 Neural network CDC controller for actuator failures.

The nonlinear plant model used is described by

y_p(k+1) = \frac{y_p(k)\, y_p(k-1)\, (y_p(k) + 2.5)}{1 + y_p^2(k) + y_p^2(k-1)} + u(k)

A neural model of the above process was used to train the neural controller, as described
in the sequel.
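A minimal simulation sketch of this plant under a random input, assuming zero initial conditions:

import numpy as np

def plant_step(yk, ykm1, uk):
    """y(k+1) = y(k)*y(k-1)*(y(k)+2.5) / (1 + y(k)**2 + y(k-1)**2) + u(k)."""
    return yk * ykm1 * (yk + 2.5) / (1.0 + yk**2 + ykm1**2) + uk

rng = np.random.default_rng(0)
y = np.zeros(502)
for k in range(1, 501):
    y[k + 1] = plant_step(y[k], y[k - 1], rng.uniform(-2.0, 2.0))
# y now holds one open-loop trajectory usable for training a neural model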
Network architecture. The usual multilayer, feedforward neural network was
employed. Experimentation revealed that a good performer was a network with two
hidden layers consisting of 20 and 5 nodes respectively. The network was trained with
the back-propagation GDR rule, with a momentum of 0.9 and a learning rate of 0.01.
Training. The reference input used was a random signal in [-2, 2]. Failures were
introduced as a sinusoidal function at specific time instants. The number of delayed
signals was experimentally found to be 3. Training was done in two stages. First, the
network was trained as a conventional controller. The resulting weights were used as
initial weights for the second stage, where the network was trained to meet the diagnostic
requirements as well.
Results. The overall structure met the design requirements satisfactorily. In Fig. 5.21
some sample results are given. In this example, the real plant was used and a ramp
failure occurred at the actuator.
The design trade-off was accomplished by assigning different weights to the diagnostic and
control outputs. In this way, by assigning a larger weight to the control objective, better
reference tracking was achieved, whereas better reproduction of actuator failures was
obtained by assigning a larger weight to the diagnostic output.
Figure 5.21 Real plant during test. Actuator failure (solid) and diagnostic output
(dotted line); 3 delays, 200,000 iterations, actmean=0.1059, ramp failure.

Additional references. Naidu et al. (1990) have developed a neural network sensor
failure detection system along the lines suggested by Nett's work on the four-parameter
controller just discussed. The back-propagation topology was used and compared to
Finite Integral Squared Error (FISE) diagnostics, as well as to the nearest neighbor
classifier, for an Internal Model Control (IMC)-controlled system involving an uncertain
linear, time-invariant, first-order plant and linear or nonlinear plants that lie within the
model uncertainty bounds. Detailed studies produced promising results.

5.6.5 Application of neural networks in nuclear power plants monitoring

Historically, utilities and other operators of nuclear plants have relied on human op-
erators to monitor the plants and to diagnose any problems which occur. With the
notable exceptions of Three Mile Island in the United States and Chernobyl in the former
Soviet Union, this approach appears to have worked reasonably well. However, there is
clear evidence linking these accidents, and a number of troublesome operational incidents
("near accidents") over the years, to "operator error". One possible solution is to
automate the plants and "take the operator out of the loop". For a variety of reasons
(legal, regulatory, and other), such a solution is not practical at the present time. The
alternative approach of "backstopping" the operators, by providing them with the results
of automated surveillance (including diagnostics) of the overall plant, is considered here.
The large number of process parameters and system interactions pose difficulties for the
operators, particularly during abnormal operation or emergencies. During such
situations, individuals can be affected by stress or emotion which may influence their
performance in varying degrees. Taking some of the uncertainty out of their decisions, by
providing real-time diagnostics and assistance, has the potential of increasing plant
availability, reliability and safety by avoiding errors that lead to trips or that endanger the
safety of the plant. The emerging technology of ANNs offers a method of implementing
real-time monitoring and diagnostics in a nuclear power plant. The various advanced
technologies generally regarded as being within the scope of artificial intelligence,
especially neural networks, are believed to be appropriate for these tasks. The overall
objective is to provide the operator with the necessary information about the power plant in
a way that is useful, timely and non-intrusive. Special emphasis is given to the
early detection of abnormalities and deviations from normal operation, with the intent
that the operator could take corrective or compensating action, if appropriate.
Generally, the developed technology involved three specific tasks, undertaken
using a variety of methods. These were:
1. Diagnostics based on pattern recognition in time-records and related
representations of variables (e.g. spectral densities).
2. Feature detection based on recognition of patterns in data.
3. Modeling of phenomena and systems, with interpretation of input-output relation-
ships.
Many projects involved both pattern recognition and modeling. In most cases, a
comparison is involved between predicted results (based on models developed from data
taken when the system was working properly) or patterns (learned by neural network
models from data presented to them) and actual results or patterns. Often, data had to be
preprocessed to put them into an acceptable form (e.g., a fast Fourier transformation of the
time-series to produce a spectral plot of the data) before they can be introduced into a
neural network. Once a neural network has been trained to recognize the various
conditions or states of a complex system, it takes only one recall cycle of the neural
network, typically a few milliseconds, to detect or identify a specific condition or state. If
the neural network is implemented in hardware, the detection or identification is almost
instantaneous. Typically, the measured variables from the nuclear power plant systems
are analog signals that must be sampled, digitized, preprocessed, and normalized to
expected peak values before they are introduced into neural networks. The neural
networks are usually simulated on modern high-speed digital computers that carry out
the calculations serially. However, it is possible to implement neural networks using
specially designed microchips where the network calculations are truly carried out in
parallel, thereby providing virtually instantaneous outputs (microsecond response times)
for each set of inputs.
Monitoring of nuclear power plant sensors and systems.


Sensor validation. A neural network-based process for automated sensor validation
during both steady-state and transient operations has been developed by Upadhyaya and
Eryurek (1992). A neural network model of the process (or a portion of the process) can
be developed using experimental data. The input layer of artificial processing elements
(PEs) has several input signals, and the output layer usually has just one signal, which is
to be predicted. The neural network is trained over the range of anticipated operating
conditions while the system is known to be operating properly. In operation, the neural
network model repeatedly predicts the output value based on the actual inputs. As a
sensor deteriorates or fails, the measured value (which is erroneous) deviates from the
predicted value (based on the input values and the neural network model). A comparison
of the predicted and actual values will indicate the validity of the sensor reading. Tests
with data from the Experimental Breeder Reactor II (EBR-II) and a commercial four-
loop PWR nuclear power plant have shown that this technique can be used to validate
data from sensors placed in nuclear power plants. Networks for the validation of
readings of hot leg temperature, reactor power and control rod position for a four-loop
PWR have demonstrated the usefulness of this approach.

Plant-wide monitoring using neural networks. The technique for sensor validation has
been expanded into a plant-wide monitoring system by Upadhyaya and Eryurek (1992),
using an autoassociative neural network where the inputs and the outputs are the same
variables. The number of artificial neurons in the intermediate layer (usually about two or
three times the number of input nodes) was selected to minimize the training time and
maximize the ability of the neural network to generalize. The neural network was trained
over the range of operation, using the same data for the input vector and the desired
output vector. Backpropagation using a sigmoidal function with an adjustable coefficient
was used to train the network when the system was operating properly. Under these
conditions, the neural network outputs represent estimates of the instantaneous values of
the output variables, and all of these estimates are virtually identical to the actual
outputs. When a sensor begins to drift or a failure is introduced into a data channel, the
actual value (neural network input) changes, but the corresponding predicted value
(neural network output) remains virtually unchanged. Hence, monitoring the differences
between the estimates predicted by the neural network (outputs) and the actual values
from the system (inputs) provides a method of identifying drift or instrumentation system
(or sensor) failure. An alternative interpretation of these differences might be that the
input-output relationship of the system from which the signals come may have changed,
due to system failure or changes of some sort in the system. This technique was applied
to data from eighteen signals from the Experimental Breeder Reactor II (EBR-II)
during an increase in power from 45% to 100%. Errors between the estimates predicted by the
trained neural network and the actual values during normal operation were usually less
than 0.5%. When one of the sensors failed or an error was introduced into one of the
sensor outputs (the inputs to the neural network), the corresponding output of the neural
network changed only slightly. Hence, a difference between the network output (the
predicted value of the variable) and the actual variable identified that sensor or
instrumentation channel as the one with a problem.
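The monitoring logic amounts to thresholding the reconstruction residuals; a minimal sketch, where the `predict` interface of the trained autoassociative model and the 0.5% alarm limit are illustrative:

import numpy as np

def suspect_channels(autoassoc_net, x, limit=0.005):
    """Compare each measured value with the value the autoassociative
    network reconstructs for it; channels whose relative residual
    exceeds the limit are flagged as drifting or failed."""
    x_hat = autoassoc_net.predict(x)          # hypothetical model interface
    residuals = np.abs(x - x_hat) / np.maximum(np.abs(x), 1e-12)
    return np.flatnonzero(residuals > limit), residuals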
Monitoring of check valves for operability. Although there are many possible failure
mechanisms for check valves, the most common problems associated with check valve
failures are due to system flow oscillations or system piping vibrations, which induce
check valve component wear and thus component failure. A technique involving the use
of a neural network for the analysis of acoustical data from check valves, to evaluate their
status, has been reported by Ikonomopoulos, Tsoukalas and Uhrig (1992). The power
spectral density (PSD) of the sampled time-series at a point on the check valve body
near the hinge pin is used as the input to the neural network, and the PSD of the sampled
time-series at another point on the check valve body near the backstop is the desired
output of the neural network. The network is trained while the flow varies over the
normal range of operation, when the valve is known to be operating properly. The neural
network is then used in a monitoring mode to predict the output sensor PSD from the
input PSD, and a comparison is made between the predicted and actual output PSDs.
Deviations indicate that the interrelationship between the input and output signals has
changed due to a change (failure) of the valve. Analysis of time-records from two
piezoelectric accelerometers attached to the body of a check valve on a large Boiling
Water Reactor Nuclear Power Plant has been used to demonstrate this process.
Comparison of spectra between identical 30-inch check valves (one broken and one
normal), operating under identical conditions, clearly demonstrated that this technique
can identify the failed valve. The index of normal system behavior is the mean square
difference, obtained by summing the squares of the differences in individual spectral values
between the predicted and actual spectra of the failed valve, divided by the number of
spectral values. Records for three 6-inch check valves (one normal and two that failed
for different reasons) operating under identical conditions indicated that failures with
different degrees of severity give different values of the mean square difference.
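Written out, with S and \hat{S} the actual and predicted output spectra and N the number of spectral values, the index is:

\mathrm{MSD} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{S}(f_i) - S(f_i) \right)^2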

Loose parts monitoring. The detection of loose parts in the primary or secondary
system of a nuclear power plant is based on the identification of sounds produced by
tumbling parts hitting a pipe wall, the tube sheet of a steam generator, or other surfaces
that are part of the coolant system boundary. The spectrum of the sound is dependent
upon the size and shape of the loose part and the materials of construction of both the
part and the system boundary. Once the sound spectrum for parts of different sizes and
shapes (those most likely to break loose, e.g., the hinge pin of an upstream check valve)
has been measured, a pattern matching technique can then be used to identify the part or
parts. To date, neural networks have not been used for this purpose, but they have been
used in Germany to negate false alarms of a loose parts monitor. In one of the German
plants, there is a metal-to-metal contact (not caused by a loose part) that produces a
sound that is detected by the loose parts monitor. To overcome this problem, a neural
network was trained to identify the unique sound of the metal contact. Then, when this
sound occurs, the neural network identifies it and disables the loose parts monitor alarm.

Monitoring of performance and efficiency. Of the hundreds of values of quantities
used to calculate indices of performance (efficiency, heat rate, specific fuel/energy
consumption, etc.) of power plants, many remain constant because they are related to the
design of the system. With neural networks, it is possible to model the dynamic portion
of a process using only those variables that change during operation. Once the neural
network model has been developed and trained, it is possible to determine the
sensitivity of each output (e.g., efficiency or an appropriate performance index) with
respect to changes in each input. The larger sensitivity coefficients indicate the input
variables that most influence efficiency and performance. Hence, efforts to improve
performance should concentrate on these quantities. This technique was applied to
TVA's Sequoyah Unit #1 Nuclear Power Plant by Guo and Uhrig (1992). Of
approximately 130 quantities used to calculate the heat rate (inversely related to
efficiency), only 26 actually changed with time. Records of these variables for 45 weekly
measurements were used as inputs to a neural network to develop a model of the
thermodynamic processes of the plant. The training of the network involved a technique
developed by Pao (1989), in which a Kohonen neural network was used to cluster the
data into 22 clusters with a Euclidean distance of about 1.0. The centroids of these
clusters were then used as inputs to a neural network. Backpropagation was used for training
until the overall system error was about 0.01. When the original data (not the centroids
of the clusters used in training) were introduced into this model for recall, the results
were usually within 0.1% of the measured values.
Optimization of efficiency using sensitivity analysis. The neural network model for
the thermodynamic behavior of the Sequoyah Nuclear Power Plant described above was
then used for a sensitivity analysis, in which the partial derivative of each output variable
with respect to each individual input variable was evaluated. The larger the partial
derivative, the more influence the particular input variable has on the output (efficiency
or heat rate), and the sign (+ or -) indicates the direction of the influence. The ratio of the
largest sensitivity coefficient to the smallest was greater than 10 for the Sequoyah
Nuclear Power Plant heat rate output. The sensitivity of the heat rate with respect to
the different variables can suggest which efficiency-improvement efforts are likely to be most
effective (i.e., the utility should devote its efforts to improving the quantities that have
the largest sensitivity coefficients).

On-line thermal margin (DNBR) estimation. On-line thermal margin monitoring in
nuclear power plants using neural networks has recently been described by Kim, Lee and
Chang (1992), where a neural network model was introduced to predict the DNBR
(departure from nucleate boiling ratio) for a given set of operating conditions in a
Korean nuclear power plant. Since the DNBR is used to estimate the core thermal margin
(i.e., the difference between the predicted DNBR and the limiting DNBR), it is a
parameter that is critically important for safety in nuclear power plants. The approach
used was to train a neural network to map the plant variables being monitored to the
DNBR as calculated by the computer code COBRA. The neural network was trained
over the range of input variables expected to occur during the fuel cycle. A fully
connected three-layer feedforward neural network was used for estimating the DNBR
performance of the core. The output layer had a single processing element (PE)
representing the DNBR for the given plant operating parameters, which were the input
variables. A statistical sensitivity analysis relating the DNBR to the various parameters
indicated that the major parameters affecting the DNBR of a PWR during plant
operation were the core inlet temperature, the core power (or heat flux), the enthalpy rise
peaking factor, the core inlet flow rate and the system pressure; hence, the input layer
had five PEs. The DNBR obtained from the neural network using data not used in the
training process showed that, under steady state conditions, the results agreed with those
obtained from COBRA calculations within ±2.5% in virtually all cases.

Diagnosis of nuclear power plant transients. When a nuclear power plant is operating
normally, the readings of the instruments in a typical control room form a pattern (or
unique set) of readings that represents a normal state of the plant or system. When a
disturbance occurs, the instrument readings undergo a transition to a different pattern,
representing a different state that may be normal or abnormal depending upon the nature
of the disturbance. The fact that the pattern of instrument readings undergoes a transition
to a new, different state is sufficient to provide a basis for identifying the transient
or the change of the state of the system. In implementing such a transient diagnosis
system in a nuclear power plant, a large number (perhaps 20 to 200) of output variables
from the plant are sampled simultaneously, normalized to expected maximum values,
preprocessed if necessary, and transmitted to the input layer of a neural network. The
unique pattern among these 20 to 200 variables represents the condition of the plant at
that particular instant. When the system is operating at a steady state or changing slowly,
the pattern of variables at each sampling instant remains the same or changes slightly,
and the output of the neural network remains the same. However, at a time Δt after a
transient begins, the sampled values form a different pattern (i.e., the relationship
between the variables has changed and continues to change as the transient progresses).
When the sampled values are fed to a trained neural network, it gives an indication of the
state of the system. Successive sets of sampled inputs will indicate that the same transient is
under way if the pattern is adequately developed. Indeed, there is a whole group of pat-
terns associated with each unique transient that must be included in the training set.
Neural networks trained on simulator transients. Work by Bartlett et al. (1992) has
demonstrated the validity of the concepts discussed above. The training
simulator at TVA's Watts Bar Nuclear Power Plant provided data for some 22 to 27
variables for seven different accident transients (loss of coolant in the hot leg of the
reactor coolant system (RCS), loss of coolant in the cold leg of the RCS, main steam-line
break in containment, main feedwater line break in containment, total loss of off-site
power, control rod ejection, and steam generator tube leak). Simultaneously sampled
values of these time records at equally spaced time intervals constituted the input
vectors to the neural network for training. An "auto-adaptive" stochastic learning
technique was developed by Bartlett and Uhrig (1992) and applied to a special neural
network with a dynamic node structure (i.e., the number of nodes in each of the three
hidden layers used was optimized). A new method for the stochastic optimization of
these interconnections, using a Monte Carlo training procedure, was developed to train
this network to identify these seven different nuclear power plant transients. This general
approach has been continued by Kim, Aljundi and Bartlett (1992), using data from the
San Onofre Nuclear Power Training Simulator.
Guo and Uhrig (1992) simplified the diagnostic neural network by using a modified
backpropagation technique to train a neural network with extensive lateral inhibition in
the middle layer. Twenty-two inputs and seven outputs (one for each transient used in
the training) were used. The middle layer had sixteen PEs arranged in a four-by-four
array, with negatively weighted connections in both directions between adjacent
(including diagonal) PEs and self-feedback on each node with positive weights. In all
cases, this neural network was able to detect the transient before the plant tripped, even
in the presence of 2% noise. For fast transients, the diagnosis was almost instantaneous.
This work was extended through the use of a sensitivity analysis to determine the most
important input variables for each transient. Then, individual modular neural networks
with the five or six most important input variables and a single output were used to
detect each transient. These modular networks were much simpler, did not require lateral
feedback in the middle layer, and gave equally good, if not better, results. The problem
was that it was necessary to develop the complex neural network with lateral feedback in
order to utilize sensitivity analysis to identify the most important variables for each
transient. To overcome this problem, a genetic algorithm optimization was performed to
select the most important variables for each transient.

Using neural networks to identify abnormal events. Neural network techniques have
been applied by Ohga and Seki (1991) to identify an abnormal event that caused a trip
in a BWR in Japan. A primary feature of the system was that the result of the neural
network analysis was confirmed using a knowledge base on the plant status when each
event occurs. The neural network recognized the change patterns of the state variables
and output the event code corresponding to the abnormal event. The neural network had
three layers, with 40, 4, and 3 nodes in the respective layers. Five kinds of state variables were
used in the neural network. For each state variable, eight data values were acquired
before, at, and after a plant trip. Sampling times were different before and after the trip.
Data were normalized and sent to the neural network. The event identification method
was tested using a workstation. The test data were prepared based on the simulated
results of a transient analysis program. Data were prepared for different plant
configurations (changing fuel burn-up, beginning or end of fuel cycle, abnormal
progression speed of any variable, etc.). Test results showed that the neural network
could identify a trained event even when the plant conditions were different from those
used during training and when the data acquisition system contained noise.

Prediction of plant parameters using artificial neural network models. Artificial
neural networks were applied to the prediction of nuclear reactor parameters in load-
following operations by Roh, Cheon, Kim and Chang (1992). The system used consisted
of four parameter-processing banks of neural networks. Each bank contained two types
of neural networks, a general multilayer network and a hybrid functional-link network,
that attempted to learn or to infer signal behaviors. The overall prediction results agreed
well with the actual plant data, thereby providing the plant operators with accurate
indications of system state variables. Such information is necessary in control problems
and for validating instrumentation outputs and monitoring system performance.
Data from the Korean Kori-4 Nuclear Power Plant were used for training the neural
networks. The five input parameters were axial offset, control rod position, critical boron
rates, burn-up rates and a code-calculated trend search parameter. Each parameter
corresponded to the desired output for the only two load change modes (100% to 80%
and 70% to 100%). A total of 48 load change patterns were used for learning by the
neural networks. The results obtained from the neural networks agreed with calculations
within 3.4% and 4.4% for the hybrid functional-link network and the general multilayer
network, respectively.

Connectionist expert systems. An expert system that has a neural network in its
knowledge base is called a connectionist expert system. A backpropagation neural
network model was applied to a connectionist expert system for the identification of
transients in a nuclear power plant by Cheon et al. (1992). Connectionist expert systems
that incorporate neural networks into the diagnostic process yield great benefits in
terms of speed, robustness, and knowledge acquisition, and demonstrate the feasibility of
connectionist expert system applications to the identification of transients in nuclear
power plants. When a transient disturbance occurs, the sensor outputs or instrument
readings undergo a transition from the existing pattern to a different pattern that
represents a different state of the plant. The transient identification is approached from a
pattern-matching perspective, in that an input pattern is constructed from symptoms, and
that symptom pattern is matched to an appropriate output pattern that corresponds to the
transient that occurred.
The connectionist expert system has significant advantages over the traditional rule-
based expert system. Results showed that once the network had been fully trained with
various patterns, it could identify the transient easily, even with incomplete or distorted
patterns. Furthermore, multiple transients were identified.
The connectionist expert system approach is most appropriate for classification problems
in environments where data are abundant and noisy, and where humans tend to generate
brittle and perhaps contradictory IF-THEN rules. Since connectionist expert systems are
very fast, they are well suited for real-time applications.

Severe accident management. The severe accident management system on-line network
(SAMSON) is a computational tool used in the event of a nuclear power plant accident.
Doremus (1992) claims that SAMSON examines over 150 status points monitored by
plant process computers after a severe accident has begun, and makes predictions about
when core damage, support plate failure, reactor vessel failure, and containment failure
will occur. These predictions are based on the current state of the plant, assuming that all
safety equipment not already operating will fail. The status points analyzed include
radiation levels, flow rates, pressure levels, temperatures and water levels.
SAMSON uses neural networks trained using backpropagation learning to make its
predictions. Previous training on data from an accident analysis code has allowed
SAMSON to associate different states in the plant with different times to critical failures.
The accidents currently recognized by SAMSON include steam generator tube ruptures
and loss of coolant accidents, with breaks ranging from less than 0.001 square feet in size
to breaks greater than 3 square feet. SAMSON contains several neural networks for each
accident type and size, and chooses the correct network after the accident has been
classified by an expert system. SAMSON also provides information concerning recovery
strategies and the status of plant sensors.

Hybrid (neural-fuzzy) system for transient identification. A unique hybrid system has
been developed by Tsoukalas, Ikonomopoulos, and Uhrig, (1991), for the identification
of transients in complex systems. It couples a rule-based expert system using fuzzy logic
to a pretrained artificial neural network and uses a model-reference approach to help in
the identification of noisy data. The expert system performs the basic interpretation and
processing of the model data. A set of pretrained artificial neural networks provides the
models of the transients. Membership functions (from fuzzy logic) that condense
information about a transient into a form convenient for a rule-based identification
system and characterize the transients are the outputs of the neural networks. After
training, the system is capable of performing faster than real time. To demonstrate the use
of this system, two classical transients, (a) a rupture of the main steam line and (b) a
rupture of a main feedwater line, were simulated on a computer. Three parameters,
pressurizer pressure, hot leg temperature and steam level indication were chosen for
differentiating these transients. Time series of these three variables during the transient
were the inputs to the neural network. The output is a membership function of a fuzzy
system. Tests showed that this system is capable of differentiating between the two
accidents, even when the three inputs are corrupted with up to 20% random noise.
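How a rule-based layer might consume membership grades produced by such networks can be
sketched with the standard fuzzy operators, min for AND and max for OR. The grades, rule
and threshold below are invented for illustration and are not those of the cited system.

    #include <stdio.h>

    static double f_and(double a, double b) { return a < b ? a : b; }  /* min */
    static double f_or (double a, double b) { return a > b ? a : b; }  /* max */

    int main(void)
    {
        /* memberships produced by the pretrained networks for each model */
        double m_steam = 0.82;   /* "behaves like a steam-line rupture"     */
        double m_feed  = 0.15;   /* "behaves like a feedwater-line rupture" */

        /* rule: steam-line rupture AND NOT feedwater-line rupture */
        double grade = f_and(m_steam, 1.0 - m_feed);

        printf("steam-line rupture grade: %.2f\n", grade);
        printf("diagnosis: %s\n",
               grade > f_or(m_feed, 0.5) ? "steam line rupture" : "inconclusive");
        return 0;
    }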

Control rod wear recognition. Wear of cladding on power control rods of nuclear
power plants causes rod clusters to be replaced prematurely. To mitigate wear by
repositioning the control rods, it is necessary to identify the location of each wear scar
and to measure its depth during inspections. Boshers et al. (1992) have described a
method that combines algorithmic, rule-based and neural network methods to perform
the inspection (currently performed by human operators), to identify the wear scars and
to find certain quantitative properties of the scar. The functions of the neural network based
software are:
1. To identify and extract wear sections.
2. To determine wear information including wear position, peak wear depth and
cross-sectional area loss, and
3. To organize this information into a data base.
Initial feature extraction and neural network processing for wear recognition had been
implemented. The neural network grouped all types of wear (single peak, double peak,
etc.) into one class for the initial recognition purpose. The prototype neural network
used eight features, initially extracted from the time records, as inputs, and the proper
values for these eight quantities as the desired outputs. A threshold was established
which, if exceeded, indicated that a specific fault feature had been detected. In preliminary
tests, output values had exceeded 50% of the threshold value, but there were no
misclassifications.
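The threshold test itself is elementary; a minimal sketch follows, with the feature values
and the threshold invented for illustration.

    #include <stdio.h>

    #define NFEAT 8   /* the eight wear features used as network outputs */

    int main(void)
    {
        double out[NFEAT] = {0.1, 0.7, 0.2, 0.0, 0.9, 0.3, 0.1, 0.6};
        double threshold  = 0.5;   /* assumed value */
        int i;
        for (i = 0; i < NFEAT; i++)
            if (out[i] > threshold)     /* exceeded: fault feature detected */
                printf("feature %d indicates a wear scar (%.2f > %.2f)\n",
                       i, out[i], threshold);
        return 0;
    }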

5.7 The integration of neural networks in real-time expert systems

Expert systems are especially useful in situations where the knowledge of an expert is
explicitly accessible. In other words, one must be capable of translating the knowledge
into a model which consists of data and rules and which describes sufficiently the
behavior of the real world. However, if an expert cannot explain how (s)he solves a
certain problem (when intuition is used), then an expert system to model this knowledge
is not of much help. This is not the case for neural networks: they are able to find by
self-learning the solving path of the problem. However, typical problems that exist when
using neural networks are:
(a) finding the right architecture and (b) the right pre-processing of the inputs.
It is therefore recommended, before solving a complex problem using neural networks,
to look for appropriate literature on similar topics. In general though, neural nets can be
quite successful in certain areas and are, especially, more flexible than other methods.
From this perspective one could say that in certain cases it would be appropriate to
partition the knowledge into "formalisable" (precise) knowledge and "unformalisable"
(vague) knowledge. The formalisable knowledge could be modeled by an expert system,
the unformalisable knowledge by a neural network. A global project cycle is shown in
fig. 5.22.
Problems which are likely to occur during knowledge acquisition are:
1. Knowledge seems to be intuitive, but in reality is not. The reason is that the expert
has difficulties in formalising knowledge;
2. There is no example data available for the intuitive part of the expert's knowledge.
The first problem is very difficult to recognise. A good preparation of the interviews is
usually very important (time is very often a bottleneck).
The second problem can be very difficult. If there are no (or not sufficient) examples
available, then there should be at least a possibility to "judge" the neural net outputs on
their correctness. If this is not the case, the use of a neural net is certainly not the correct
solution.
Topology of integration. There are several possibilities for integrating neural networks
in expert systems:
a. Neural net functions called by the inference engine of the expert system.
b. Neural net functions to pre-process (and post-process) the data in the inference
engine.

[Figure 5.22 is a flowchart: definition of the classes, objects, rules, etc.; investigation of
which parts are suitable for neural nets; if practicable with neural nets, implementation
of the neural net and translation into C-code, integrated with the implementation of the
expert system; if not, other possibilities must be investigated.]

Figure 5.22 Global development cycle for integrating neural nets in Expert Systems.

5.7.1 The AI components

A number of reasons for justifying the need for real-time expert systems are the
following:
• There exist too many inputs to be monitored effectively by humans.
• There exist too many complex relations between the inputs, which are essential to
make a decision.
• Decisions should be taken faster than a human could take them.
• The system should run 24 hours a day, 7 days a week without loss of quality.
• Qualified personnel are hard to find.
• Too many personnel are needed to run the system effectively.
Some real-world examples are:
• Monitoring of the Hubble Space Telescope (NASA) (6000 sensors).
• Network monitoring (Bank of Canada).
• Monitoring car manufacturing.
• Nuclear Power Plant Risk simulator.
• Satellite monitoring and control (ESA).
• Process monitoring and control of chemical processes.
• Traffic Control.
One of the tools that can be used for this kind of application is RTworks. RTworks is a
software development tool for real-time monitoring and control applications. RTworks
makes use of the client/server concept: it breaks its major tasks into three types of
processes: inference engine processes, data acquisition processes and human interface
processes (graphical interface). With a traditional expert system shell, the inference engine,
data acquisition and user interface would all be grouped together into one large process,
potentially tying up resources, such as memory and CPU, and making it difficult to react
quickly to critical events. In this case, one could distribute processes over several
computers (e.g. a LAN).
The inference engine of a real-time expert system should be adapted to cope with real-
time applications. In addition to the inference strategies of forward chaining and backward
chaining, the inference engine should also offer time-driven rules. For instance, a rule
could run periodically (every 10 seconds). Another relevant feature is the use of
temporal reasoning in rules (e.g. if during the last 5 minutes the power has been
decreased for 15 seconds, then ...). The speed of the inference engine should be fast
enough for the real-time application at hand. Typically, the inference engine of
RTworks processes about 12,000 rules per second.
What does real-time mean? There exist many definitions of real-time. It is commonly
assumed to mean "fast", in the sense that a system is considered real-time if it processes
data quickly. A better definition states that "the system responds to incoming data at a
rate faster than it arrives". An overview of typical response times is shown in Table 5.5.

It shows that real-time does not have to mean really fast; many traditional business
applications have "real-time" elements to them.

Table 5.5 Response times of typical real-time problems.

    Turning valves at a paint factory        200 ms
    Satellite Ground Station monitoring      1-10 Seconds
    Network Monitoring Alarm System          1 Minute
    Ship tracking                            1 Hour
    Movie Theater Advertising                Daily
    Tracking movement of continents          Yearly

Integration of user-defined processes into real-time expert systems. The RTworks
concept is one of an open system, whereby the integration of other languages (like C)
should be possible. User-defined programs in C can be directly linked to the inference
engine. In this case, a compiled neural net in C-code could be integrated as a new
function inside the inference engine.
Neural network software suitable for integration. The situations in which neural
networks are appropriate have already been discussed above. The technical integration
that a neural network tool should offer will be discussed next.
A neural network software package should be capable of translating the (trained) neural
net into C-code (or any other common language) for further use in other tools. One of
the neural network development tools that are available and offer this is NeuralWorks
Professional II Plus from NeuralWare Corp. NeuralWorks has proven to be a good
neural network tool through its great flexibility and capabilities. NeuralWorks supports
about 35 learning network topologies and it is possible to implement one's own learning
functions.
An important feature of NeuralWorks, as already mentioned, is its ability to translate
networks into C code. This makes it possible to integrate neural nets in a wide variety of
tools. In this case, it can be easily linked to the inference engine of RTworks.
Technical integration of NeuralWorks in RTworks. A typical example would be the
following situation: a system which consists of a monitoring subsystem and a diagnosing
subsystem (fault diagnosis). The monitoring subsystem could typically be handled by an
expert system, while the diagnosing subsystem uses a neural network (assuming the
diagnosis cannot be comprehended in formal rules). Namely, one can often detect errors
easily with formalised rules, but a fault diagnosis, especially where there exist many
factors influencing the cause of the problem, is very difficult to describe in a couple of
simple rules. The use of neural networks could be quite useful in such a fault diagnosis
system.
A "real world" example could be the monitoring of manufacturing in a paper factory
using a real-time expert system. The supporting diagnosis in case of failure can be done
with a neural network. In this case the presence of an expert is less necessary. Example
data on diagnosis cases could be gathered and used to train the neural net.

The integration of RTworks (or any other relevant software) and NeuralWorks can be
accomplished as follows: first, the expert system performing the monitoring task is built.
Rules must be defined that will detect an error during monitoring.
Next, a neural network is trained for the fault diagnosis task, which should be able to give
an analysis of the error (cause, impact, etc.). To accomplish this, the required inputs
must be determined first and, second, relevant example data must be gathered. After
successful training one can integrate the net in RTworks (or any other relevant software)
by translating it into C-code. The next steps are as follows:
1. Translate the neural net into C-code.
2. Link the compiled C-code to the inference engine as a new user-defined function
(rtlinkie nn.c).
3. Build rules that activate the diagnosing subsystem (function).
Some typical example rules are:
    IF "deviated behavior"                                && error found?
    THEN error = TRUE;                                    && yes
    IF error                                              && when error is detected
    THEN cause = NEURALNETDIAG(input1, input2, ...);
         Info("_hci", "Cause of the problem is:", cause); && send a message to the operator
In RTworks it would furthermore be possible to divide the incoming data into a monitoring
and a diagnosis data group. On startup only monitoring data will be received. In case
there is an error, only data relevant to analysing the problem will be received. In this way
one can reduce the data transfer.
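A schematic of step 2 is the following: the compiled net is wrapped as an ordinary C
function which the rules above can call as NEURALNETDIAG. Here nn_recall() merely
stands in for the routine a code generator such as NeuralWorks would emit, and the
input count and class names are invented; this is a sketch, not actual RTworks or
NeuralWorks code.

    /* nn.c -- schematic wrapper exposing a compiled diagnosis net
       to the inference engine as the user-defined function NEURALNETDIAG */
    #include <stdio.h>

    #define NIN  3
    #define NCLS 4

    static const char *cls[NCLS] = {"bearing", "valve", "sensor", "unknown"};

    /* stub standing in for the generated network, so the sketch compiles */
    static void nn_recall(const double *in, double *out)
    {
        out[0] = in[0]; out[1] = in[1]; out[2] = in[2]; out[3] = 0.1;
    }

    const char *NEURALNETDIAG(double in1, double in2, double in3)
    {
        double in[NIN] = {in1, in2, in3}, out[NCLS];
        int i, best = 0;
        nn_recall(in, out);                     /* one forward pass */
        for (i = 1; i < NCLS; i++)
            if (out[i] > out[best]) best = i;
        return cls[best];          /* cause handed back to the rules */
    }

    int main(void)                 /* stand-alone test of the wrapper */
    {
        printf("cause: %s\n", NEURALNETDIAG(0.2, 0.9, 0.1));
        return 0;
    }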

References

Antsaklis P.J. (1992). Neural networks for the intelligent control of high autonomy
systems. Intelligent Systems Technical Report 92-9-1, Department of Electrical
Engineering, University of Notre Dame.
Barschdorff D., Monostori L. and T. Kottenstede (1993). Wear estimation and state
classification of cutting tools in turning via artificial neural networks. Proceedings,
International Conference on Fault Diagnosis TOOLDIAG '93, Toulouse, France, April
5-7, 669-677.
Baba N. (1989). A new approach for finding the global minimum of error function of
neural networks. Neural Networks, 2.
Barschdorff D., Monostori L., A.F. Ndenge and G.W. Wöstenkühler (1991).
Multiprocessor systems for connectionist diagnosis of technical processes. Computers in
Industry, 17, 131-145.

Bartlett E. and R.E. Uhrig (1992). Nuclear Power Plant Status Diagnostics Using An
Artificial Neural Network. Nuclear Technology, 97.
Boshers J.A., Saylor C., Kamadolli S., Wood R. and C. Isik (1992). Control Rod Wear
Recognition Using Neural Nets. In D.J. Sobajic (Editor), Proceedings of the 1992
Summer Workshop on "Neural Networks Computing for the Electric Power Industry",
Stanford, CA, August 17-19.
Carpenter G.A. and S. Grossberg (1987). A massively parallel architecture for a self-
organizing neural pattern recognition machine. Computer Vision, Graphics and Image
Processing, 37, 54-115.
Carpenter G.A. and S. Grossberg (1987). ART2: Self-organization of stable category
recognition codes for analog input patterns. Applied Optics, 26, 3, 4919-4930.
Chang K. and W.G. Wee (1988). A Knowledge-based planning system for mechanical
assembly using robots. IEEE Expert, 18-30.
Cheon S.W., Kang G.S. and S.H. Chang (1992). Application of Neural Networks to
Connectionist Expert System for Identification of Transients in Nuclear Power Plants.
Proceedings, 2nd International Forum, Expert Systems and Computer Simulation in
Energy Engineering, Erlangen, Germany, March 17-20, pp 22-1-1 to 22-1-5.
Chen S., Cowan C.F.N. and P.M. Grant (1991). Orthogonal least squares learning al-
gorithm for radial basis function networks. IEEE Transactions on Neural Networks, 2,
2, 302-309.
Cho S.B. and J.H. Kim (1993). Rapid back-propagation learning algorithms. Circuits,
Systems and Signal Processes, 12, 2.
Chow M.-Y., Mangum P.M. and S.O. Yee (1991). A neural network approach to real-
time condition monitoring of induction motors. IEEE Transactions on Industrial
Electronics, 38, 6, 448-453.
Cybenko G. (1989). Approximation by superpositions of a sigmoidal function.
Mathematics of Control, Signals and Systems, 2, 303-314.
Davis R. (1984). Diagnostic reasoning based on structure and behavior. Artificial
Intelligence, 24, 347-410.
Doremus R. (1992). SAMSON: Severe Accident Management System On-Line
Network. In D.J. Sobajic (Editor), Proceedings of the 1992 Summer Workshop on
"Neural Network Computing for the Electric Power Industry", Stanford, CA, August 17-
19.
Feldman J.A. and D.H. Ballard (1982). Connectionist models and their properties.
Cognitive Science, 6, 205-254.

Feng X., Zhang Y. and Q. Chen (1993). Fault simulation of a variable thrust liquid
rocket engine based on neural networks. Proceedings, International Conference on
Fault Diagnosis TOOLDIAG '93, Toulouse, France, April 5-7, 787-791.
Gien D. et al. (1993). A neuro-fuzzy approach for real-time diagnosis on flexible manu-
facturing cells. Proceedings, Sixth International Conference on Neural networks and
other industrial and cognitive applications, Nimes, France, October 25-29.
Guo Y. and K.J. Dooley (1992). Identification of change structure in statistical process
control. International Journal of Production Research, 30, 7, 1655-1669.
Guo Z. and R.E. Uhrig (1992). Using Modular Neural Networks to Monitor
Accident Conditions in Nuclear Power Plants. Proceedings of the SPIE Technical
Symposium on Intelligent Information Systems, Application of Artificial Neural
Networks III, Orlando, FL, April 20-24.
Guo Z. and R.E. Uhrig (1992). Use of Artificial Neural Networks to Analyze Nuclear
Power Plant Performance. Nuclear Technology, 99.
Himmelblau D.M. (1978). Fault detection in chemical and petrochemical processes.
Elsevier Publishers, Amsterdam.
Hopfield J.J. (1982). Neural networks and physical systems with emergent computa-
tional abilities. Proceedings of the National Academy of Sciences (Biophysics), 79,
2554-2558.
Hoskins J.C. and D.M. Himmelblau (1988). Artificial neural network models of knowl-
edge representation in chemical engineering. Comput. Chem. Eng., 12, 881-890.
Ikonomopoulos A., Tsoukalas L.H., Mullens J.A. and R.E. Uhrig (1992). Monitoring
nuclear reactor systems using neural networks and fuzzy logic. Proceedings, 1992
Topical Meeting in Advances in Reactor Physics, March 8-11, Charleston, U.S.A.
Ikonomopoulos A., Tsoukalas L.H. and R.E. Uhrig (1992). Use of Neural Networks to
Monitor Power Plant Components. Proceedings of the American Power Conference,
Chicago, IL, April 13-15, 1992.
Kim K., Aljundi T.L. and E. Bartlett (1992). Confirmation of Artificial Neural Networks:
Nuclear Power Plant Fault Diagnostics. Transactions of the American Nuclear Society,
66, Chicago, IL, November 15-20.
Kim H.K., Lee S.H. and S.H. Chang (1992). Neural Network Model for On-Line
Thermal Margin Estimation of a Nuclear Power Plant. Proceedings of the Second
International Forum, Expert Systems and Computer Simulation in Energy Engineering,
Erlangen, Germany, March 17-20, pp 7-2-1 to 7-2-6.
Kohonen T. (1984). Self-organisation and associative memory. Springer-Verlag, Berlin.

Konstantopoulos I.K. and P.J. Antsaklis (1993). The four parameter controller: A neural
network implementation. Proceedings, IEEE Mediterranean Symposium on New
Directions in Control Theory and Applications, Chania, Greece, June 21-23.
Kosko B. (1990). Unsupervised learning in noise. IEEE Transactions on Neural
Networks, 1, 1, 44-57.
Kosmatopoulos E.B., Ioannou P.A. and M.A. Christodoulou (1992). Identification of
Nonlinear Systems Using New Dynamic Neural Network Structures. Proceedings, 31st
IEEE Conference on Decision and Control, Tucson, Arizona, USA, December 16-18.
Kosmatopoulos E.B., Christodoulou M.A. and P.A. Ioannou (1993). Learning laws
that ensure exponential error convergence. Proceedings, 32nd IEEE Conference on
Decision and Control, San Antonio, Texas, USA, December 15-17.
Kusiak A. and G. Finke (1988). Selection of process plans in automated manufacturing
systems. IEEE Transactions of Robotics and Automation, 4, 4.
Lippmann R.P. (1987). An introduction to computing with neural nets. IEEE ASSP
Magazine, 4, 4-22.
Lo A. and B. Bavarian (1991). Scheduling with neural networks for flexible manufactur-
ing systems. Proceedings, IEEE International Conference on Robotics and Automation,
818-823.
Mirchandani G. and W. Cao (1989). On hidden nodes for neural nets. IEEE
Transactions on Circuits and Systems, 36, 661-664.
Moon Y.B. (1990). Forming part-machine families for cellular manufacturing: A neural-
network approach. International Journal of Advanced Manufacturing Technology, 5,
278-291.
Matsumoto K. and T. Sakaguchi (1983). Methods to determine the restoration plan of
power system by a knowledge based system. Transactions, IEE of Japan, 103 B, 3.
de Mello L.S.H. and A.C. Sanderson (1990). AND/OR Graph representation of as-
sembly plans. IEEE Transactions on Robotics and Automation, 6, 2, 188-199.
Miguel L.J., Baeyens E. and J.L. Coronado (1993). Application of an ART-3 based neu-
ral network to fault diagnosis in dynamic systems. Proceedings, International
Conference on Fault Diagnosis TOOLDIAG '93, Toulouse, France, April 5-7, 713-717.
Minsky M. and Papert S. (1969). Perceptrons - An introduction to computational ge-
ometry. MIT Press, Cambridge, Mass.
Naidu R.S., Zafiriou E. and T.J. McAvoy (1990). Use of neural networks for sensor
failure detection in a control system. IEEE Control Systems Magazine, 10, 49-55.
Nett C.N., Jacobson C.A. and A.T. Miller (1988). An integrated approach to controls
and diagnostics: the 4-parameter controller. Proceedings, 1988 American Control
Conference, 824-835.

Ogi H., Tanaka H. and Y. Akimoto (1991). Module neural network application for
power system/equipment diagnosis. Proceedings of ESAP '91, April, 1991.
Ohga Y. and H. Seki (1991). Using a Neural Network for Abnormal Event Identification
in BWRs. Transactions of the American Nuclear Society, 63, 110-111.
Pao Y. (1989). Adaptive pattern recognition and neural networks. Addison Wesley,
N.Y.
Passino K.M., Sartori M.A. and P.J. Antsaklis (1989). Neural computing for numeric-
to-symbolic conversion in control systems. IEEE Control Systems Magazine, 9, 44-52.
Peng Y. and J.A. Reggia (1989). A connectionist model for diagnostic problem solving.
IEEE Transactions on Systems, Man and Cybernetics, 19, 2, 285-298.
Polycarpou M.M. and P.A. Ioannou (1992). Neural Networks as On-Line
Approximators of Nonlinear Systems. Proceedings, 31st IEEE Conference on Decision
and Control, Tucson, Arizona, USA, December 16-18.
Rauch H.E., Kline-Schoder R.J., Adams J.C. and H.M. Youssef (1993). Fault detection,
isolation and reconfiguration for aircraft using neural networks. Proceedings, AIAA
Conference on Guidance, Navigation and Control, August '93.
Rauch H.E. and D.B. Schaechter (1992). Neural networks for control, identification and
diagnosis. Proceedings, World Space Congress, Washington, D.C., August 28-
September 5.
Ray A.K. (1991). Equipment fault diagnosis - A neural network approach. Computers
in Industry, 16, 169-177.
Reed N.E. et al. (1988). Specialized Strategies: An alternative to first principles in diag-
nostic problem solving. AAAI, 364-368.
Roh M.S., Cheon S.W., Kim H.G. and S.H. Chang (1992). Prediction of Nuclear
Reactor Parameters using Artificial Neural Network Models. Proceedings of the 2nd
International Forum on "Expert Systems and Computer Simulation in Energy
Engineering", Erlangen, Germany, March 17-20.
Rumelhart D.E. and J.L. McClelland (1986). Parallel Distributed Processing -
Explorations in the microstructure of cognition, Volume 1: Foundations. MIT Press,
Cambridge, Mass.
Rumelhart D.E. and J.L. McClelland (1986). Parallel Distributed Processing -
Explorations in the microstructure of cognition, Volume 2: Psychological and biological
models. MIT Press, Cambridge, Mass.
Schutte P. et al. (1987). An evaluation of a real-time fault diagnosis expert system for
aircraft applications. Proceedings, 26th IEEE Conference on Decision and Control.

Sorsa T. and H.N. Koivo (1991). Applications of artificial neural networks in process
fault diagnosis. Proceedings, IFAC Fault Detection, Supervision and Safety for
Technical Processes, Baden-Baden, Germany, September 10-13.
Sorsa T., Koivo H.N. and H. Koivisto (1991). Neural networks in process fault diagno-
sis. IEEE Transactions on Systems, Man and Cybernetics, 21, 4, 815-825.
Sorsa T., Suontausta J. and H.N. Koivo (1993). Dynamic fault diagnosis using radial
basis function networks. Proceedings, International Conference on Fault Diagnosis
TOOLDIAG '93, Toulouse, France, April 5-7, 160-169.
Stavrakakis G.S. and E.N. Dialynas (1991). Efficient computer-based scheme for im-
proving the reliability performance of power substations. International Journal of
Systems Science, 22, 9, 1527-1539.
Suna R. and K. Berns (1993). Pipeline diagnosis using backpropagation networks.
Proceedings, Sixth International Conference on Neural networks and other indus-
trial and cognitive applications, Nimes, France, October 25-29.
Syed A., El-Maraghy H.A. and N. Chagneux (1993). Application of Kohonen maps in
real-time monitoring and diagnosing of robotic assembly. Proceedings, International
Conference on Fault Diagnosis TOOLDIAG '93, Toulouse, France, April 5-7, 780-786.
Tinghu Y., Binglin Z. and H. Ren (1993). A neural network methodology for rotating
machinery fault diagnosis. Proceedings, International Conference on Fault Diagnosis
TOOLDIAG '93, Toulouse, France, April 5-7, 170-178.
Tsoukalas L.H., Ikonomopoulos A. and R.E. Uhrig (1991). Hybrid Expert System-
Neural Network Methodology for Transient Identification. Proceedings of the American
Power Conference, Chicago, IL, April 29-May 1.
Wake T. and T. Sakaguchi (1984). Method to determine the fault components of power
system based on description of structure and function of relay system. Transactions,
IEE of Japan, 101 B, 10.
Wasserman P.D. (1989). Neural Computing: Theory and Practice. Van Nostrand
Reinhold, N.Y.
Widrow B. and M.A. Lehr (1990). 30 years of adaptive neural networks: Perceptron,
Madaline and Back-propagation. Proceedings of the IEEE, 78, 9.
Upadhyaya B.R. and E. Eryurek (1992). Application of Neural Networks for Sensor
Validation and Plant Monitoring. Nuclear Technology, 97, 170.
Vaidyanathan R. and V. Venkatasubramanian (1990). Process fault detection and diag-
nosis using neural networks: Dynamic processes. Proceedings, AIChE Annual National
Meeting, Chicago, U.S.A.
Venkatasubramanian V. and K. Chan (1989). A neural network methodology for proc-
ess fault diagnosis. AIChE Journal, 35, 1993-2002.

Watanabe K., Matsuura I., Abe M., Kubota M. and D.M. Himmelblau (1989). Incipient
fault diagnosis of chemical processes via artificial neural networks. AIChE Journal, 35,
1803-1812.
Widrow B. and M.E. Hoff (1960). Adaptive switching circuits. 1960 IRE WESCON
Conv. Record, Part 4, August 1960, 96-104.
Widrow B. and R. Winter (1988). Neural nets for adaptive filtering and adaptive pattern
recognition. IEEE Computer Magazine, March, 25-39.
Yamashina H., Kumamoto H., Okumura S. and T. Ikesaki (1990). Failure diagnosis of a
servovalve by neural networks with new learning algorithm and structure analysis.
International Journal of Production Research, 28, 6, 1009-1021.
Zeilingold D. and J. Hoey (1990). Model for a space shuttle safing and failure detection
expert system. Proceedings, 5th Conference on Artificial Intelligence for Space
Applications.
CHAPTER 6

IN-TIME FAILURE PROGNOSIS AND FATIGUE LIFE PREDICTION OF STRUCTURES

6.1 Introduction

The in-time failure prognosis and safety assessment of today's high risk industrial
structures implies the accurate estimation of the residual lifetime of the structure in the
course of its service. Reduction of the operation cost, estimation of the structural aging,
structure life extension, prevention of catastrophic accidents, environmental protection,
are some of the aspects that have to be considered in the management of complex high
risk industrial systems such as nuclear power plants, chemical plants, off-shore structures,
marine structures, gas (LNG, LPG) installations, etc.
To achieve a realistic safety assessment, the capability of modeling correctly the
uncertainties, of updating the estimates on the basis of any new data available and of
using field expert knowledge and heuristics is required.
Thus, a collection of data and information has to be achieved on material properties,
defect distribution (position and size), degradation mechanisms affecting the structure,
records and forecasts about loads and environment, and assumptions about the states
which are considered dangerous for the component.
A whole series of inspection instruments and techniques have evolved over the years and
new methods are still being developed to assist in the process of assessing the integrity
and reliability of parts and assemblies. Non-destructive testing (NDT) evaluation
methods are widely used in industry for checking the quality of production, and also as
part of routine inspection and maintenance in service.
Because of the obvious importance of the subject, and the fact that most of the
inspection methods are based on well-established scientific principles, there is a great
number of publications suitable for use in engineering practice (see periodicals such as
"NDT International", "Materials evaluation", etc.). In the present chapter the concept of
the in-time failure prognosis and realistic safety assessment of, mainly, metallic structures
will be first defined and the recent NDT methods with some of their representative
applications will be presented. The concept of inspection will also be clarified. Analytical
modeling and expert knowledge modeling approaches will be described for damage
mechanism analysis and in-time failure prediction in structures with the ability to use
fresh data and information continuously or periodically coming from the component or
the structure during operation for modifying and improving the prediction.
Application examples from the nuclear, marine, mechanical and manufacturing sectors
will be presented to illustrate the matter.

6.2 Recent non-destructive testing (NDT) and evaluation methods with applications

6.2.1 Introduction

Any safety assessment of metallic structures can be thought of as a global procedure
relying on a number of steps or single procedures to be concatenated: non-destructive
testing (NDT) and evaluation (NDE), material characterization, analysis of loads, stress
analysis, fracture mechanics analysis, fatigue crack growth (FCG) analysis, failure
analysis.
Any system for in-time structural safety assessment and early failure prognosis must have
the ability to analyze and interpret large quantities of information and potentially allow a
time dependent updating of the interpretations in order to achieve the following goals in-
time:
• Identification of the actual state of the structure and of the damage process actually
taking place.
• Prediction of the future behavior of the structure.
• Decision and planning of appropriate actions.
From the above formal definition of the in-time safety assessment and failure prognosis
concept, NDT and evaluation (NDE) is obviously the first important procedure which
must be performed successfully in order to achieve the above goals.
Engineers are well used to assessing the properties of a material by means of
standardized tests on prepared test pieces. Much valuable information is obtained from
these tests, including data on the tensile, compressive, shear and impact properties of the
material, but such tests are of a destructive nature. In addition, the material properties, as
determined in a standard test to destruction, do not necessarily give a clear guide to the
performance characteristics of a complex-shaped component which forms part of some
larger engineering assembly.
Defects of many types and sizes may be introduced to a material or a component
during manufacture and the exact nature and size of any defects will influence the
subsequent performance of the component. Other defects, such as fatigue cracks or
corrosion cracks, may be generated within a material during service. The origins of
defects in materials and components are shown in fig. 6.1 (from Hull and Vernon,
1988). It is therefore necessary to have reliable means for detecting the presence of
defects at the manufacturing stage and also for detecting and monitoring the rate of
growth of defects during the service life of a component or assembly.
[Figure 6.1 is a tree diagram with four groups of defect origins:
• Defects which may be introduced during the manufacture of raw materials or the
production of castings: stress cracking, shrinkage porosity, gas porosity, slag inclusions,
segregation.
• Defects which may be introduced during the manufacture of components: machining
faults, heat treatment defects, welding defects, residual stress cracks.
• Defects which may be introduced during component assembly: missing parts,
incorrectly assembled parts, additional welding defects, additional stress cracking.
• Defects generated during service life: fatigue, corrosion, stress corrosion (and
corrosion fatigue), creep, wear, thermal instability.]

Figure 6.1 Origins of some defects found in materials and components.

Using well-established physical principles, a number of non-visual inspection systems
have been developed which will provide information on the quality of a material or
component and which do not alter or damage the components or assemblies which are
tested. The basic principles and major features of the main non-destructive testing (NDT)
systems are given in Table 6.1.

Table 6.1

    System               Features                               Applicability

    Liquid penetrant     Detection of defects which             Can be used for any metal, many
                         break the surface                      plastics, glass and glazed ceramics

    Magnetic particle    Detection of defects which             Can only be used for ferromagnetic
                         break the surface and sub-surface      materials (most steels and irons)
                         defects close to the surface

    Electrical methods   Detection of surface defects           Can be used for any metal
    (Eddy currents,      and some sub-surface defects.
    acoustic emission)   Can also be used to measure the
                         thickness of a non-conductive
                         coating, such as paint, on a metal

    Ultrasonic testing   Detection of internal defects          Can be used for most materials
                         but can also detect surface flaws      without limitations on the maximum
                                                                material thickness

    Radiography          Detection of internal defects,         Can be used for many materials but
                         surface defects and the                there are limitations on the maximum
                         correctness of part assemblies         material thickness

The various non-destructive test methods can be used, in practice, in many different ways
and the range of equipment available is extensive. Compact and portable equipment is
available which can be used, both inside a test house or out on site, or the basic test
principle can be incorporated in some large inspection system dedicated to the
examination of large quantities of a single product or a small range of products or the
components of a structure.
This applies to all the test methods described in this Chapter. When non-destructive
testing systems are used, care must be taken and the processes controlled so that not
only qualitative but also quantitative information is received and that this information is both
accurate and useful. If non-destructive testing is mis-applied it can lead to serious errors
of judgment of component quality.

It is necessary that the most dangerous possible failure modes of a component be
anticipated, and from this the types and limiting sizes of potentially dangerous defects
deduced. In the first instance this is the responsibility of the product or structure
designer, and thus it is he who should specify initially what defects are unacceptable and
give guidance on the appropriate method of inspection. Van Dijk and Boogaard
(1992) describe an approach and format which allows effective selection of inspection
systems by defining general objectives regarding detection performance and reliability.
For the successful application of non-destructive testing the test system and procedures
must be suited to the inspection objectives and the types of flaws to be detected, the
operator must have sufficient training and experience, and the acceptance standards must
be appropriate in defining any undesirable characteristics of a non-conforming part.
In conventional design, a "design stress" is established by dividing a specific value of
yield or proof stress by a suitable safety factor, and this "strength" is assumed to be
representative of the material used to make a component. The fracture mechanics
approach to design, however, recognises that flaws can exist in a component, before and
during life in service, and attempts to describe quantitatively the effects of such flaws on
component integrity. Fracture mechanics describes the capacity of critical structural
components to resist the onset of rapid crack growth. Components are characterized by a
material property called the critical stress intensity factor or jracture toughness and the
largest flaw that can be tolerated in any specific section of a component. In addition,
service environment is taken into account in this quantitative assessment.
The role of non-destructive inspection is to guarantee with a level of confidence that
cracks corresponding to a critical size for fracture, at the design load, are absent from a
component when the component is used in service. It might be necessary to guarantee,
with confidence, that cracks smaller than the critical size are absent also. It is important
to allow for sub-critical crack growth, especially in components subjected to fatigue
loading or to corrosive environments, so that such components can achieve a minimum
specified service life before catastrophic failure occurs. In some situations, periodic
service inspection or constant monitoring might be necessary to ensure that cracks do
not reach a critical size. The use of fracture mechanics concept in design places a
premium on the ability of the various non-destructive methods to detect small cracks.
The difference between the critical size and the smallest detectable size becomes the level
of safety (Lucia 1985, Dufresne et al. 1988).
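This level of safety can be quantified with the standard fracture mechanics relation
K = Yσ√(πa): for a given fracture toughness K_IC, geometry factor Y and design stress σ,
the largest tolerable crack is a_c = (1/π)(K_IC/(Yσ))². The sketch below uses illustrative
values only, not design data.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double Kic   = 60.0e6;    /* fracture toughness, Pa*sqrt(m)        */
        double Y     = 1.12;      /* geometry factor for an edge crack     */
        double sigma = 200.0e6;   /* design stress, Pa                     */
        double a_det = 2.0e-3;    /* smallest reliably detectable crack, m */

        double a_c = (1.0 / M_PI) * pow(Kic / (Y * sigma), 2.0);

        printf("critical crack size a_c = %.1f mm\n", a_c * 1000.0);
        printf("level of safety (a_c - a_det) = %.1f mm\n",
               (a_c - a_det) * 1000.0);
        return 0;
    }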
The introduction of any inspection system incurs cost but very often the effective use of
suitable inspection techniques will give rise to very considerable financial savings.
While effective quality control inspection can result in financial savings and help to
prevent catastrophic failures in service, it is also true to say that the imposition of too
many or too sensitive inspection systems can be very wasteful in terms of both time and
money. Excessive inspection may not result in an increase in product performance or
reliability. Absolute perfection in a product is impossible to achieve and attempting to get
very close to the ideal can prove to be very expensive.

The main non-destructive testing systems are briefly described in the next section.

6.2.2 The main non-destructive testing methods

6.2.2.1 Liquid penetrant inspection

Liquid penetrant inspection is a technique which can be used to detect defects in a wide
range of components, provided that the defect breaks the surface of the material. The
principle of the technique is that a liquid is drawn by capillary attraction into the defect
and, after subsequent development, any surface-breaking defects may be rendered visible
to the human eye. In order to achieve good defect visibility, the penetrating liquid will
either be coloured with a bright and persistent dye or else contain a fluorescent
compound. In the former type the dye is generally red and the developed surface can be
viewed in natural or artificial light, but in the latter case the component must be viewed
under ultra-violet light if indications of defects are to be seen. There are five essential
steps in the penetrant inspection method. These are:
Surface preparation. All surfaces of a component must be thoroughly cleaned and
completely dried before it is subjected to inspection. It is important that any surfaces to
be examined for defects must be free from oil, water, grease or other contaminants if
successful indication of defects is to be achieved.
Application of penetrant. After surface preparation, liquid penetrant is applied in a
suitable manner, so as to form a film of penetrant over the component surface. The liquid
film should remain on the surface for a period sufficient to allow for full penetration into
surface defects.
Removal of excess penetrant. It is usually necessary to remove excess penetrant from
the surface of the component. Some penetrants can be washed off the surface with water,
while others require the use of specific solvents. Uniform removal of excess penetrant is
necessary for effective inspection.
Development. The development stage is necessary to reveal clearly the presence of any
defect. The developer is usually a very fine chalk powder. This may be applied dry,
but more commonly is applied by spraying the surface with chalk dust suspended in a
volatile carrier fluid. A thin uniform layer of chalk is deposited on the surface of the
component. Penetrant liquid present within defects will be slowly drawn by capillary
action into the pores of the chalk. There will be some spread of penetrant within the
developer and this will magnify the apparent width of a defect. When a dye penetrant is
used the dye colour must be in sharp contrast to the uniform white of the chalk-covered
surface. The development stage may sometimes be omitted when a fluorescent penetrant
is used.

Observation and inspection. After an optimum developing time has been allowed, the
component surface is inspected for indications of penetrant "bleed back" into the
developer. Dye-penetrant inspection is carried out in strong lighting conditions, while
fluorescent-penetrant inspection is performed in a suitable screened area using ultra-
violet light. The latter technique causes the penetrant to emit visible light, and defects are
brilliantly outlined.
The liquid penetrant process is comparatively simple as no electronic systems are
involved, and the equipment necessary is cheaper than that required for other non-
destructive testing systems. The establishment of procedures, and inspection standards
for specific product parts, is usually less difficult than for more sophisticated methods.
The technique can be employed for any material except porous materials, and, in certain
cases, its sensitivity is greater than that of magnetic particle inspection. Penetrant
inspection is suitable for components of virtually any size or shape and is used for both
the quality control inspection of semi-finished and finished production items and for
routine in-service inspection of components.
The system is used in the aerospace industries by both producers for the quality control
of production and by users during regular maintenance and safety checks. Typical
components which are checked by this system are turbine rotor discs and blades, aircraft
wheels, castings, forged components and welded assemblies. Many automotive parts,
particularly aluminum castings and forgings, including pistons and cylinder heads, are
subjected to this form of quality control inspection before assembly. Penetrant testing is
also used for the regular in-service examination of the bogie frames of railway
locomotives and rolling stock in the search for fatigue cracking.

6.2.2.2 Magnetic particle inspection

Magnetic particle inspection is a sensitive method of locating surface and some sub-
surface defects in ferro-magnetic components. The basic processing parameters depend
on relatively simple concepts. In essence, when a ferro-magnetic component is
magnetised, magnetic discontinuities that lie in a direction approximately perpendicular
to the field direction will result in the formation of a strong "leakage field". This leakage
field is present at and above the surface of the magnetised component, and its presence
can be visibly detected by the utilization of finely divided magnetic particles. The
magnetic particles which are used for inspection may be made from any ferro-magnetic
material of low remanence and they are usually finely divided powders of either metal
oxides or metals. The particles are classified as dry or wet according to the manner in
which they are carried to a component. Dry particles are carried in air or gas suspension
while wet particles are carried in liquid suspension. The application of dry particles or
wet particles in a liquid carrier, over the surface of the component, results in a collection
of magnetic particles at a discontinuity. The "magnetic bridge" so formed indicates the
location, size and shape of the discontinuity.

Magnetisation may be induced in a component by using permanent magnets, electro-
magnets or by passing high currents through or around the component. The latter
technique is widely used for production quality control applications because high-intensity
magnetic fields can be generated within components. Hence, good sensitivity in flaw
indication and detection is attained.
The effectiveness of defect indication will depend on the orientation of the flaw to the
induced magnetic field and will be greatest when the defect is perpendicular to the field,
as shown in fig. 6.2.

Figure 6.2 Magnetic flaw detection. Detectable surface leakage fields produced by
defects A and B; defect C is likely to remain undetected.

A component with a continuous hole through it can be magnetised by energising a
straight conducting cable passing through the hole. This inspection technique is often
used in the examination of parts such as pipe connectors, hollow cylinders, gear wheels
and large nuts.
The principal industrial uses of magnetic particle inspection are in-process inspection,
final inspection, receiving inspection and in maintenance and overhaul.
Although in-process inspection is used to highlight defects, as soon as possible in the
processing route, a final inspection gives the customer a better guarantee of defect-free
components.
During receiving inspection, both semi-finished purchased parts and raw materials are
inspected to detect initial defects. Incoming rod and bar stock, forging blanks and rough
castings are inspected in this way.
The transportation industries (road, rail, aircraft and shipping) maintain planned overhaul
schedules at which critical parts are inspected for cracks. Crank-shafts, frames, flywheels,
crane hooks, shafts, steam turbine blades and fasteners are examples of components
vulnerable to failure, in particular fatigue failure. Hence, there is a need for regular
inspection.

6.2.2.3 Electrical test methods (eddy current testing (ECT))

If a coil carrying an alternating current is placed in proximity to a conductive material,
secondary or eddy currents will be induced within the material. The induced currents will
produce a magnetic field which will be in opposition to the primary magnetic field
surrounding the coil. This interaction between fields causes a back e.m.f. in the coil and,
hence, a change in the coil impedance value. If a material is uniform in composition and
dimensions, the impedance value of a search coil placed close to the surface should be
the same at all points on the surface, apart from some variation observed close to the
edges of the sample. If the material contains a discontinuity, the distribution of eddy
currents (and their magnitude) will be altered in its vicinity and there will be a consequent
reduction in the magnetic field associated with the eddy currents, so the coil impedance
value will be altered.
Eddy currents flow in closed loops within a material and both the magnitude and the
timing or phase of the currents will depend on a number of factors. These factors include
the magnitude of the magnetic field surrounding the primary coil, the electrical and
magnetic properties of the material, and the presence or otherwise of discontinuities or
dimensional changes within the material. Several types of search coil are used, two
common types being the flat or pancake type coil which is suitable for the examination of
flat surfaces, and the solenoid type coil which can be used in conjunction with solid or
tubular cylindrical parts. For tubes, a solenoid type coil may be placed around the tube or
inserted into the bore.
These techniques are highly versatile and, with the appropriate equipment and test
method, can be used to detect surface and sub-surface defects within components,
determine the thickness of surface coatings, provide information about structural
features, such as crystal grain size and heat treatment condition, and also to measure
physical properties including electrical conductivity, magnetic permeability and physical
hardness.
In the case of ferro-magnetic materials there is a continuous spectrum of effects ranging
from predominantly magnetic effects at low frequencies to eddy current effects at the
higher frequencies, where the magnetic effect is suppressed. At the higher-frequency end,
the techniques used relate to assessment of the distortion and reduction of the eddy
current fields induced within the material. Changes in eddy current fields will indicate
those defects which affect the flow of eddy currents in the surface layers of the material,
such as cracks. At low frequencies it is the magnetic effects which predominate and the
effect that the material has on the B-H loop is observed, and this relates to structural
properties such as hardness. With non-magnetic materials, only eddy current effects
occur, irrespective of the frequency but, in general, inspection techniques for non-
magnetic materials use the higher frequencies, that is greater than 1 kHz.
For many test and inspection purposes, a coil or coils are mounted in a holder as an
inspection probe. As stated earlier, a coil is frequently wound around a ferrite core and,
while the search coil is generally protected by a plastic casing, the end of the ferrite core
often projects beyond the plastic case. Eddy current test probes do not need any coupling
fluid between them and the testpiece, unlike ultrasonic probes, because they are coupled
to the material by a magnetic field, and consequently little if any surface preparation is
necessary prior to inspection. Many types of inspection probes have been designed but
generally they can be divided into surface probes and hole probes.
It is necessary to calibrate eddy current test equipment, and reference testpieces for
calibration purposes should be made from material of similar type and quality to that
which is to be tested so as to have the same conductivity value. A test-block should
contain a series of defects of known size and shape and these are frequently made by
making several fine saw-cuts of varying but known depth.
In many situations users will also use defective parts containing, for example, fatigue
cracks as reference and calibration pieces.
One method of representing the signals from eddy current inspection probes is by the
phasor technique or phase analysis. When it is only necessary to detect changes in one of
the parameters which affect impedance and all other factors are constant, then the
measurement of a change in impedance value will reflect a change in that parameter.
However, there are many instances where it becomes necessary to separate the responses
from more than one parameter, and to separate the reactive and resistive components of
impedance. This requires the use of more sophisticated instruments but in this way it
becomes possible to identify the type of defect present and not merely its position.
There is a phase difference between the reactive and resistive components of the
measurement voltage. Consider the voltages as vectors A and B. The frequency is the
same for both and, therefore, the angular velocity ω will be the same for both (ω = 2πf).
The resistive and reactive components of a measurement (probe coil) voltage can be fed
to the "X" plates and "Y" plates respectively of a cathode ray oscilloscope and displayed
as a two-dimensional representation.
The impedance changes caused by various types of defect or by changes in conductivity
will give screen displays as shown in figs. 6.3b and c.
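The arithmetic behind such a display is elementary: the resistive and reactive components
of the coil voltage fix a point on the screen, and the phase angle follows by trigonometry.
The coil values and test frequency below are illustrative.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double f     = 100.0e3;           /* test frequency, Hz            */
        double omega = 2.0 * M_PI * f;    /* angular velocity, 2*pi*f      */
        double R     = 12.0;              /* resistive component, ohm      */
        double L     = 45.0e-6;           /* coil inductance, H            */
        double X     = omega * L;         /* reactive component, ohm       */

        double Z   = sqrt(R * R + X * X); /* impedance magnitude           */
        double phi = atan2(X, R);         /* phase between the components  */

        printf("screen point: X-plate %.1f ohm, Y-plate %.1f ohm\n", R, X);
        printf("|Z| = %.1f ohm, phase = %.1f degrees\n",
               Z, phi * 180.0 / M_PI);
        return 0;
    }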
The eddy current system is a highly versatile system and can be used to detect not only
cracks but several other conditions, including corrosion. Corrosion of hidden surfaces as,
for example, within aircraft structures, can be detected using phase-sensitive equipment.
It is a comparative technique in that readings made in a suspect area are compared with
instrument readings obtained from sound, non-corroded material (Hagemaier et al.
1985).
An eddy current test system can also be used for the routine inspection of aircraft
undercarriage wheels. The wheel is placed on a turntable and the probe coil which is
mounted at the end of an adjustable arm, is positioned near the bottom of the wheel. As
the wheel turns on the turntable so the probe arm moves slowly up the wheel, giving a
close helical search pattern. It is necessary to use a second probe to check under the
wheel flange. A hand-held probe is used for this part of the inspection.

[Figure 6.3b marks an "air point" and points for steel, titanium, stainless steel,
magnesium, Al-Cu-Zn alloy and copper at differing conductivities.]

Figure 6.3
(a) Vector point.
(b) Impedance plane display on oscilloscope, showing differing conductivities.
(c) Impedance plane display, showing defect indications.

The ability of eddy current techniques to determine the conductivity of a material has
been utilized for the purpose of checking areas of heat-damaged skin on aircraft
structures. If the type of aluminum alloy used in aircraft construction becomes over-
heated it could suffer a serious loss of strength. This is accompanied by an increase in the
electrical conductivity of the alloy. The conductivity of sound material is generally within
the range of 31 to 35 per cent IACS. Defective or heat-damaged material would show a
conductivity in excess of 35 per cent IACS.

6.2.2.4 Ultrasonic testing

Ultrasonic techniques are very widely used for the detection of internal defects in
materials, but they can also be used for the detection of small surface cracks. Ultrasonics
are used for the quality control inspection of part-processed material, such as rolled slabs,
as well as for the inspection of finished components. The techniques are also in regular
use for the in-service testing of parts and assemblies.
Sound waves are elastic waves which can be transmitted through both fluid and solid
media. The audible range of frequency is from about 20 Hz to about 20 kHz but it is
possible to produce elastic waves of the same nature as sound at frequencies up to 500
MHz. Elastic waves with frequencies higher than the audio range are described as
ultrasonic. The waves used for the non-destructive inspection of materials are usually
within the frequency range 0.5 MHz to 20 MHz.
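The test frequency fixes the wavelength through λ = v/f and hence, roughly, the smallest
flaw that can give a usable echo; a common rule of thumb puts the limit at about half a
wavelength. Both this rule and the velocity value below are illustrative, not figures
quoted by the text.

    #include <stdio.h>

    int main(void)
    {
        double v = 5900.0;                     /* longitudinal velocity in steel, m/s */
        double f[3] = {0.5e6, 5.0e6, 20.0e6};  /* test frequencies, Hz                */
        int i;
        for (i = 0; i < 3; i++) {
            double lambda = v / f[i];          /* wavelength = velocity / frequency   */
            printf("%5.1f MHz: wavelength %.2f mm, ~half wavelength %.2f mm\n",
                   f[i] / 1.0e6, lambda * 1000.0, lambda * 500.0);
        }
        return 0;
    }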
Piezo-electric materials form the basis of electro-mechanical transducers for ultrasonic
NDT. The original piezo-electric material used was natural quartz. Quartz is still used to
some extent but other materials, including barium titanate, lead metaniobate and lead
zirconate, are used widely. When an alternating voltage is applied across the thickness of
a disc of piezo-electric material, the disc will contract and expand, and in so doing will
generate a compression wave normal to the disc in the surrounding medium. When
quartz is used the disc is cut in a particular direction from a natural crystal, but the
transducer discs made from ceramic materials such as barium titanate are composed of
many crystals fused together, the crystals being permanently polarised to vibrate in one
plane only.
Wave generation is most efficient when the transducer crystal vibrates at its natural
frequency, and this is determined by the dimensions and elastic constants of the material
used. Hence, a 10 MHz crystal will be thinner than a 5 MHz crystal. A transducer for
sound generation will also detect sound. An ultrasonic wave incident on a crystal will
cause it to vibrate, producing an alternating current across the crystal faces. In some
ultrasonic testing techniques two transducers are used - one to transmit the beam and the
other acting as the receiver - but in very many cases only one transducer is necessary.
This acts as both transmitter and receiver. Ultrasound is transmitted as a series of pulses
of extremely short duration and during the time interval between transmissions the crystal
can detect reflected signals.
The presence of a defect within a material may be found using ultrasonics with either a
transmission technique or a reflection technique.
Normal probe transmission method. In this method a transmitter probe is placed in
contact with the testpiece surface, using a liquid coupler, and a receiving probe is placed
on the opposite side of the material (see fig. 6.4).
If there is no defect within the material, a certain strength of signal will reach the
receiver. If a defect is present between the transmitter and receiver, there will be a
reduction in the strength of the received signal because of partial reflection of the pulse
by the defect. Thus, the presence of a defect can be inferred.

Figure 6.4 Normal probe transmission technique

This method possesses a number of disadvantages. These are:


(a) The specimen must have parallel sides and it must be possible to reach both sides of
the piece.
(b) Two probes are required, thus doubling the possibility of having inefficient fluid
coupling.
(c) Care must be taken that the two probes are exactly opposite one another.
(d) There is no indication of the depth of a defect.
Angle probe transmission method. There are certain testing situations in which it is not
possible to place a normal probe at right angles to a defect and the only reasonable
solution is offered by angle probes. A good example of this technique is in the inspection
of butt welds in parallel-sided plates. The transmitter and receiver probes are arranged as
in fig. 6.5a. If there is any defect in the weld zone, this will cause a reduction in the
received signal strength. Distance AB is known as the "skip distance" and for the
complete scanning of a weld the probes should be moved over the plate surface as shown
in fig. 6.5b. In practice, both probes would be mounted in a jig so that they are always at
the correct separation distance.
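The skip distance follows directly from the probe geometry: for a refracted beam at angle θ in a plate of thickness t, the beam returns to the scanning surface a distance 2t·tanθ away. A minimal sketch of this calculation follows; the numerical example values are illustrative only.

    import math

    def skip_distance(thickness_mm, probe_angle_deg):
        # Full skip distance for an angle probe on a parallel-sided plate:
        # the beam travels to the backwall and back to the surface,
        # advancing 2 * t * tan(theta) along the plate.
        return 2.0 * thickness_mm * math.tan(math.radians(probe_angle_deg))

    # Example: a 60 degree probe on a 20 mm plate gives about 69.3 mm.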
Angle probe reflection method. Defects can also be detected using one angle probe in the
reflection mode, as shown in fig. 6.6. It is important when using an angle probe in this
type of test that the flaw detector be accurately calibrated using a reference test-block.
The design and use of a calibration block are covered in Hull and Vernon (1988).
A Rayleigh, or surface, wave can be used for the detection of surface cracks (see fig.
6.7). The presence of a surface defect will reflect the surface wave to give an echo signal
in the usual way. Surface waves will follow the surface contours and so the method is
suitable for shaped components such as turbine blades.

Figure 6.5 Angle probe transmission method:
(a) probe positions and skip distance;
(b) scanning method for complete inspection of butt weld.


Figure 6.6 Reflective technique with angle probe.



Figure 6.7 Crack detection using a surface wave probe.

The reflection method has certain advantages over the transmission method. These are:
(a) The specimen may be of any shape.
(b) Access to only one side of the testpiece is required.
(c) Only one coupling point exists, thus minimizing error.
(d) The distance of the defects from the probe can be measured.
The information obtained during an ultrasonic test can be displayed in several ways.
"A" scan display. The most commonly used system is the "A" scan display (see fig. 6.8).
A blip appears on the CRT screen at the left-hand side, corresponding to the initial pulse,
and further blips appear on the time base, corresponding to any signal echoes received.
The height of the echo is generally proportional to the size of the reflecting surface but it
is affected by the distance travelled by the signal and attenuation effects within the
material. The linear position of the echo is proportional to the distance of the reflecting
surface from the probe, assuming a linear time base. This is the normal type of display for
hand probe inspection techniques.
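Since, with a linear time base, the echo position maps directly to the distance of the reflector from the probe, the round-trip time can be converted to depth as sketched below. The sound velocity used is a typical longitudinal velocity for steel and is an assumption of the sketch, not a property of any particular flaw detector.

    def echo_depth_mm(round_trip_time_us, velocity_mm_per_us=5.9):
        # Pulse-echo: the pulse travels to the reflector and back, so the
        # depth is half the round-trip path; 5.9 mm/us is a typical
        # longitudinal velocity in steel (an assumed value).
        return 0.5 * velocity_mm_per_us * round_trip_time_us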

Figure 6.8 "A" scan display.
(a) reflections obtained from defect and backwall.
(b) representation of "A" scan screen display.

A disadvantage of the "A" scan is that there may be no permanent record, unless a
photograph is taken of the screen image, although more sophisticated modern equipment
has the facility for digital recording.
"B" scan display. The "B" scan enables a record to be made of the position of defects
within a material. The system is illustrated in fig. 6.9. There needs to be co-ordination
between the probe position and the trace, and the use of "B" scan is confined to
automatic and semi-automatic testing techniques. With the probe in position "1" the
indication on the screen is as shown in fig. 6.9, with (i) representing the initial signal and
(ii) representing the backwall. When the probe is moved to position "2", line (iii) on the
display represents the defect. This representation of the testpiece cross-section may be
recorded on a paper chart, photographed, or viewed on a long-persistence screen.

Figure 6.9 "B" scan display.

"e" scan display. While the "B" sean gives a representation of a side elevation of the
testpieee, another method, termed "C" sean ean be used to produee a plan view. Again,
the "C" scan display is eonfined to automatie testing (Nielsen (1981), Yanagi, (1983».
Identijication 0/ de/ects.
By means of uItrasonic methods not only ean the exact position of internal defeets be
determined but it is also possible, in many eases, to distinguish the type of defeet. In the
following, the various types of signal response received from particular types of defect
will be eonsidered.
(a) Defect at right angles to the beam direction. When no defect is present, a large echo
signal should be received from the backwall. The presence of a small defect should give a
small defect echo and some reduction in the strength of the backwall echo. When the
defect size is greater than the probe diameter the defect echo will be large and the
backwall echo may be lost (fig. 6.10), depending on the depth of the defect in relation to
beam spread in the far zone.



Figure 6.10 Effect of defect size on screen display.
(a) defect free-initial pulse and backwall echo only;
(b) small defect echo but large backwall echo;
(c) large defect echo with small backwall echo.

(b) Defects other than plane defects. Areas of micro-porosity will cause a general
scattering of the beam, giving some "grass" on the CRT trace and with loss of the
backwall echo (fig. 6.11a). A large spherical or elliptical inclusion or hole would tend to
give a small defect echo coupled with a small backwall echo (fig. 6.11b), while a plain
trace showing no echo at all could be an indication of a plane defect at some angle other
than normal to the path of the beam (fig. 6.11c).

Figure 6.11 (a) Micro-porosity, (b) Elliptical defect, (c) Angled defect.

(c) Laminations in thick plate. The plate should be completely scanned in a methodical
manner, as shown in fig. 6.12. The indications of laminations are a closer spacing of
echoes and a more rapid fall-off in the size of the echo signals. Either or both of these
indications are signs of lamination (see fig. 6.13).
(d) Lamination in thin plate.
A thin plate may be considered to be a plate of thickness less than the dead zone of the
probe. A sound plate will show a regular series of echoes with exponential fall-off of
amplitude. A laminated region will show a close spacing with a much faster rate of
amplitude fall-off. The pattern may change from an even to an irregular outline. It is this
pattern change which, in many cases, gives the best indication of lamination in thin plate
(fig. 6.14).


Figure 6.12 Method of scanning a large surface.

Figure 6.13 Indication of lamination in thick plate: (a) good plate; (b) laminated plate

(e) Weld defects. Ultrasonic testing using angle probes in either the reflection or
transmission mode is a reliable method for the detection of defects in butt welds and for
determining their exact location. It is, however, fairly difficult to determine with certainty
the exact nature of the defect, and much depends upon the skill and experience of the
operator. If, following ultrasonic inspection, there is any doubt in the mind of the
operator about the quality of a weld, then it would be wise to check radiographically the
suspect area.

Figure 6.14 Indication of lamination in thin plate: (a) good plate; (b) laminated plate

Figure 6.15 Detection of radial defects in:
(a) tubes,
(b) solid bar; normal probe in position A will not show defect but angle probe at B will.

(f) Radial defects in cylindrical tubes and shafts. A radial defect in a cylindrical member
is not generally detectable using normal probe inspection, as the defect will be parallel to
the ultrasonic beam. In these circumstances the use of an angle probe reflection
technique will clearly show the presence of defects (fig. 6.15).
As has been seen in the foregoing paragraphs, ultrasonic test methods are suitable for the
detection, identification and size assessment of a wide variety of both surface and sub-
surface defects in metallic materials, provided that there is, for reflection techniques,
access to one surface. There are automated systems which are highly suitable for the
routine inspection of production items at both an intermediate stage and the final stage of
manufacture. Using hand held probes, many types of components can be tested, including
in situ testing. This latter capability makes the method particularly attractive for the
routine inspection of aircraft and road and rail vehicles in the search for incipient fatigue
cracks (Yanagi, 1983). In aircraft inspection, specific test methods have been developed
for each particular application and the procedures listed in the appropriate manuals must
be followed if consistent results are to be achieved. In many cases a probe will be specially
designed for one specific type of inspection.
In nuclear plants, chemical plants, pipelines, vessels, off-shore and marine structures,
material damage due to fatigue loading can be detected and sized
efficiently using ultrasonic NDT methods (Landez et al., 1992). Continuous
ultrasonic monitoring can help to survey zones of high stress concentration or cracked zones.
Ultrasonic probes are permanently stuck on the region to monitor the critical area or (in
case of an existing crack) the crack tip and the crack root (by diffraction and reflection
respectively). Any change in the UT signal amplitude indicates that a modification took
place in the inspected zone (formation of a crack or propagation of the existing one).
Under the assumption that no additional non-linear effects affect the measurement, it has
been shown that this method can detect damage from nearly 10% of life span.

6.2.2.4 Radiography

Very-short-wavelength electromagnetic radiation, namely X-rays or γ-rays, will penetrate
through solid media but will be partially absorbed by the medium. The amount of
absorption which will occur will depend upon the density and thickness of the material
the radiation is passing through, and also the characteristics of the radiation. The
radiation which passes through the material can be detected and recorded on either film
or sensitised paper, viewed on a fluorescent screen, or detected and monitored by
electronic sensing equipment. Strictly speaking, the term radiography implies a process
in which an image is produced on film. When a permanent image is produced on
radiation-sensitive paper, the process is known as paper radiography. The system in
which a latent image is created on an electrostatically charged plate and this latent image
used to produce a permanent image on paper is known as xeroradiography. The process
in which a transient image is produced on a fluorescent screen is termed fluoroscopy, and
when the intensity of the radiation passing through a material is monitored by electronic
equipment the process is termed radiation gauging.
It is possible to utilize a beam of neutrons rather than X-rays or γ-rays for inspection
purposes, this being termed neutron radiography. The process of neutron radiography
involves the transmission of neutrons through a component or assembly and the
production of a radiograph on film.
After an exposed radiographic film has been developed, an image of varying density will
be observed with those portions of the film which have received the largest amounts of
radiation being the darkest. As mentioned earlier, the amount of radiation absorbed by
the material will be a function of its density and thickness. The amount of absorption will
also be affected by the presence of certain defects such as voids or porosity within the
material. Thus radiography can be used for the inspection of materials and components
to detect certain types of defect.
The use of radiography and related processes must be strictly controlled because
exposure of humans to radiation could lead to body tissue damage.
Radiography is capable of detecting any feature in a component or structure provided
that there are sufficient differences in thickness or density within the testpiece. Large
differences are more readily detected than small differences. The main types of defect
which can be distinguished are porosity and other voids and inclusions, where the density
of the inclusion differs from that of the basis material. Generally speaking, the best results
will be obtained when the defect has an appreciable thickness in a direction parallel to the
radiation beam. Plane defects such as cracks are not always detectable and the ability to
locate a crack will depend upon its orientation to the beam. The sensitivity possible in
radiography depends upon many factors but, generally, if a feature causes a change in
absorption of 2 per cent or more compared with the surrounding material, then it will be
detectable.
Radiography and ultrasonics (see § 6.2.2.3) are the two methods which are generally
used for the successful detection of internal flaws that are located well below the surface,
but neither method is restricted to the detection of this type of defect. The methods are
complementary to one another in that radiography tends to be more effective when flaws
are non-planar in type, whereas ultrasonics tends to be more effective when the defects
are planar.
Radiographic inspection techniques are frequently used for the checking of welds and
castings, and in many instances radiography is specified for the inspection of
components. This is the case for weldments and thick-wall castings which form part of
high-pressure systems.
Radiography can also be used to inspect assemblies to check the condition and proper
placement of components. It is also used to check the level of liquid in sealed liquid-filled
systems. One application for which radiography is very well suited is the inspection of
electrical and electronic component assemblies to detect cracks, broken wires, missing or
misplaced components and unsoldered connections.
Radiography can be used to inspect most types of solid material but there could be
problems with very high or very low density materials. Non-metallic and metallic
materials, both ferrous and non-ferrous, can be radiographed and there is a fairly wide
range of material thicknesses that can be inspected (Grangeat et al., 1992). The
sensitivities of the radiography processes are affected by a number of factors, including
the type and geometry of the material and the type of flaw.
Although radiography is a very useful non-destructive test system, it possesses some
relatively unattractive features. It tends to be an expensive technique, compared with
other non-destructive test methods. The capital costs of fixed X-ray equipment are high
but coupled with this considerable space is needed for a radiography laboratory,
including a dark room for film processing. Capital costs will be much less if portable X-
ray sets or γ-ray sources are used for in situ inspections, but space will still be required
for film processing and interpretation.
The operating costs for radiography are also high. The setting up time for radiography is
often lengthy and may account for over half of the total inspection time. Radiographic
inspection of components or structures out on sites may be a lengthy process because the
portable X-ray equipment is usually limited to a relatively low energy radiation emission.
Similarly, portable radio-active sources emitting γ-radiation tend to be of fairly low
intensity. This is because high-intensity sources require very heavy shielding and thus
cease to be truly portable. In consequence, on-site radiography tends to be restricted to a
maximum material thickness of 75 mm of steel, or its equivalent. Even then, exposure
times of several hours may be needed for the examination of thick sections. This brings a
further disadvantage in that personnel may have to be away from their normal work posts
for a long time while radiography is taking place.
The operating costs for X-ray fluoroscopy are generally much lower than those for
radiography. Setting-up times are much shorter, exposure times are usually short
and there is no need for a film processing laboratory.
Another aspect which adds to radiography costs is the need to protect personnel from
the effects of radiation, and stringent safety precautions have to be employed. This safety
aspect will apply to all those who work in the vicinity of a radiography test as well as to
those persons directly concerned in the testing.

6.2.2.5 Acoustic emission (AE)

High-frequency waves, at frequencies within the range 50 kHz to 10 MHz, are emitted
when strain energy is rapidly released as a consequence of structural changes taking
place within a material. Plastic deformation, phase transformations, twinning, micro-
yielding and crack growth result in the generation of "acoustic" signals which can be
detected and analysed. Hence, it is possible to obtain information on the location and
structural significance of such phenomena.
Basically there are two types of acoustic emission from materials - a continuous type and
an intermittent or burst type. Continuous emission is normally of low amplitude and is
associated with plastic deformation and the movement of dislocations within a material,
while burst emissions are high-amplitude short-duration pulses resulting from the
development and growth of cracks.
Acoustic emission inspection offers several advantages over conventional non-
destructive testing techniques. For example, it can assess the dynamic response of a flaw
to imposed stresses. When a crack or discontinuity approaches critical size there is a
marked increase in emission intensity, and hence, a warning is given of instability and
catastrophic failure. Also, it is possible to detect growing cracks of about 2×10⁻⁴ mm in
length. This is a much smaller size than is detectable by conventional techniques. In
addition, acoustic emission inspection requires only limited access and may be performed
directly on components in service.
The usual type of transducer employed for the sensing of acoustic emissions from
components uses a lead zirconium titanate element with a high electromechanical
coupling coefficient. The signal from the transducer is amplified, filtered and processed
to give an audio and/or visual recording.
The development of new broad-band transducers and the use of powerful tools for signal
analysis allow quite important steps forward in the application of AE techniques. It has
been possible to achieve quantitative source characterization and 3-dimensional source
location.
This last achievement, enabling the through-thickness advance of a crack point to be
monitored, has been obtained on laboratory specimens, but its extension to real structures
seems feasible (Godfrey et al., 1986). Because the information carried by acoustic
emission is not explicit, the application of DSP (Digital Signal Processing) and statistical
pattern recognition techniques is of great help in order to bring to light the information
content of the AE signals. It is in particular possible to discriminate noise and reject it,
and also characterize the acoustic emission source (Grangeat et al. (1992), Singh and
Udpa (1986), Tukuda and Mitsuoka (1986)).

6.2.2.6 Other non-destructive inspection techniques

Optical inspection probes. Optical inspection probes are a major aid to visual
inspection as they permit the operator to see clearly inside pipes, ducts, cavities and other
openings to which there is limited access. The basic parts of an inspection probe system
are the objective lens head which is inserted into the cavity, the viewing eyepiece, and the
illumination system. The development of fiber optical systems has permitted major
advances to be made in the design and construction of inspection probes.
Optical inspection probes are of two general types, rigid or flexible, but within both of
these categories there are many different sizes and designs available.
A rigid inspection probe comprises an optical system with a viewing eyepiece at one end.
Illumination is conveyed to the inspection point through an optical fiber bundle and both
the optical and illumination systems are enclosed within a stainless steel tube. Light from
an external source, which is usually a variable intensity mains and/or battery-operated
quartz-halogen lamp, is conveyed to the probe through an optical fiber light guide.
Rigid probes are produced in many sizes from the smallest, with tube diameters of 2 mm
or less, up to large probes with tube diameters of 15 or 20 mm. The maximum usable or
working length of a probe is the extent to which it can be inserted into an opening; it is
not a constant and varies with the probe diameter. Probes of all diameters are
made in a variety of lengths but the maximum working length of a 2 mm diameter
instrument is about 150 mm. The maximum working length for an 8 mm diameter probe
can be up to 2 m and for larger sized probes usable lengths may extend up to 4 or 5 m. It
is not practical, even for large diameter devices, to increase the usable length beyond
about 5 m, without incurring considerable loss of quality in the eyepiece image.
Inspection probes may be designed to give either direct viewing or to view at some angle
to the line of the probe. Some of the viewing angles catered for by the instrument
manufacturers are 15°, 60°, 80°, 90° and 120°. In addition to a range of viewing angles
the objective lens system can be designed to give a narrow, intermediate or wide field of
view. It is also possible to have instruments which possess adjustable prisms to vary the
viewing angle. This is only possible in probes with diameters of about 8 mm or greater,
but such probes are often made to give any viewing angle between 60° and 120°.
Rigid inspection probes are extremely useful instruments but, like all delicate
instruments, have to be handled carefully. The probes, particularly those of small
diameter, can be damaged very easily if mishandled.
The usefulness and versatility of light inspection probes is increased by the use of flexible
probes. These incorporate a fiber optic coherent image guide and a separate fiber light
guide for illumination, both contained within a flexible plastic or braided metal sheathing.
The external diameters of the flexible probes generally range from 4 mm to about 15 mm
and the working lengths may be up to about 3 m. Flexible probes are usually designed to
provide either direct viewing ahead or viewing at 90° to the probe axis, but the larger
diameter probes can be produced with a movable inspection head, the position of which
is controlled from the eyepiece end.
Inspection probes, then, are extensions to the human eye, and are used to view areas
which would otherwise be impossible to inspect without either dismantling or even
cutting open the part or assembly. The images produced at the eyepiece end of a probe
may be photographed and a permanent record secured. It is also possible to mount a TV
camera as a substitute for the normal eyepiece lens and display the resulting image on a
monitor screen. Such an installation is described by Namioka et al. (1992) to inspect the
inside condition of pipelines.
Laser-induced ultrasonics. A recent development in non-destructive testing is the
generation of ultrasonic pulses within a material without the necessity to have a
transducer crystal in contact with the testpiece. The ultrasonic pulses are produced by
focusing a series of light impulses from a laser on the surface of the testpiece.
The laser, which may be situated up to 10 meters from the testpiece, sends out a series of
very short high-energy light impulses and these impulses, each of about 20 nanoseconds in
length, are converted by thermo-mechanical effects into sound impulses at a frequency
within the range 1 MHz to 100 MHz. A laser pulse, incident on a solid surface, produces
rapid heating of the surface, at the point of incidence, resulting in a localized temperature
rise. Thermal expansion of the "hot zone" causes generation of ultrasonic waves, which
propagate across the component surface and within the component body. The intensity
of the incident laser impulses is such that no damage is caused to the surface of the
testpiece. The emission from a second laser illuminates the surface of the testpiece and
ultrasonic echoes returning to the testpiece surface cause deflections which cause a
modulation of the reflected light from the illuminating laser.
The third component of the system is an interferometer which analyses the modulated
reflected light signal and converts it into a signal which can be presented on the screen of
a cathode ray tube in a manner similar to the usual type of ultrasonic signal display.
The main advantages of this technique are that no mechanical coupling is necessary and
the acquisition of results is rapid. Laser-based ultrasonic interrogation systems are in use,
currently, to detect the existence of piping and liquid metal level in cast steel ingots.
Although the sensitivity of the system is lower than that of some of the more
conventional techniques, for example, ultrasonic pulse-echo testing, the system has
attracted some interest for the continuous monitoring of components on process lines in
the manufacturing industries.
Time-of-flight diffraction. A new ultrasonic technique has been developed, namely
time-of-flight diffraction (TOFD), which relies on the diffraction of ultrasonic waves
from crack or defect tips, rather than reflection, as in pulse-echo. The technique is very
useful in determining the true size of fatigue cracks, even though a crack may be pressed
together by the applied load or residual stress network. With conventional pulse-echo
testing, complete or partial transmission of the wave pulses across the "closed" crack can
lead to errors in the analysis of crack size, because of the reduction in amplitude of the
reflected signals. TOFD is so called because it relies on the wave propagation times to
indicate and locate the diffraction source. An example of applying the technique is shown
in fig. 6.16.

Figure 6.16 Probe and wave path geometry as used to measure the size of a crack in a welded
joint.

The low signal-to-noise ratio often necessitates signal averaging, and comparison and
subtraction of surface waves also may be necessary.
Crack depth gauges. Cracks which appear at the surface of a material can be readily
detected using liquid penetrant or magnetic particle inspection methods, but neither of
these methods will give an accurate assessment of the depth of a crack. Crack depth
gauges are frequently used in conjunction with these other non-destructive tests to give a
measure of the depth of flaws which have been located. One simple but effective device
for this consists of two closely spaced electrical contacts. The gauge is placed on the
surface of the material and the electrical resistance between the two contact points
measured. When the gauge is placed on the testpiece with the contacts on either side of a
surface crack, the measured resistance will be greater as current now has to follow an
extended path around the crack. The meter scale is generally calibrated to give a direct
reading of crack depth.
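In practice such a gauge amounts to interpolating a calibration curve, as sketched below; the calibration table (resistance ratios against reference slot depths) is entirely hypothetical and would in reality be established on reference blocks of the material under test.

    import numpy as np

    # Hypothetical calibration table: resistance ratio (cracked reading
    # divided by crack-free reading) against slot depth in mm, as would
    # be measured on reference blocks of the material under test.
    cal_ratio = np.array([1.00, 1.08, 1.20, 1.45, 1.90])
    cal_depth = np.array([0.0, 0.5, 1.0, 2.0, 4.0])

    def crack_depth_mm(resistance_ratio):
        # Linear interpolation within the calibration range.
        return float(np.interp(resistance_ratio, cal_ratio, cal_depth))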
Thermography. Thermography is concerned with the mapping of isotherms, or
contours of equal temperature, over the surface of a component. Heat-sensing materials
or devices can be used to detect irregularities in temperature contours and such
irregularities can be related to defects. Thermography is particularly suited to the
inspection of laminates. The conduction of heat through a laminate will be affected by the
presence of flaws in the structure, resulting in an irregular surface temperature profile.
Typical flaws which can be detected are unbonded areas, crushed cells, separation of the
core from the face plates and the presence of moisture in the cells of honeycomb
structures.
Thermographic methods may either be of the direct contact type, in which heat-sensitive
material is in contact with the component surface, or indirect contact, in which a heat-
sensitive device is used to measure the intensity of infrared energy emitted from the
surface.
Pulses of heat energy, from a source, are directed at the component under test. It is
usual, but not essential, to direct the incident energy on to one surface of a component
and observe the effects at the opposite surface after conduction through the material.
Flaws and irregularities in structure will affect the amount of conduction in their vicinity.
If it is impossible to have access to both surfaces, the technique can still be used. The
heat energy incident on the surface will be conducted away through the material at
differing rates, depending on whether or not flaws are present.
Direct contact methods include the use of heat-sensitive paints and thermally quenched
phosphors. Indirect contact methods, which offer greater sensitivity, involve the use of
infra-red imaging systems with a TV-video output.
Heat-sensitive paints. Heat-sensitive photo-chromic paints are effective over a
temperature range from about 40°C to 160°C. Some paints show several color changes
within their reaction temperature range and, with careful application, will have a
sensitivity of the order of ±5°C. When heat reaches the painted surface by conduction
through the material the paint colour changes, usually with a bleaching effect. Where a
flaw impedes conduction the colour will be unchanged. On the other hand, if heat energy
is directed at the painted surface the reverse effect will show up as heat is conducted
away from the surface more rapidly through good regions than through defective areas.
Thermally quenched phosphors. These are organic compounds which emit visible
light when excited by ultra-violet radiation. The brightness of the emission decreases as
the temperature of the compounds increases. Phosphors are available that are useful at
temperatures up to about 400°C and with a resolution of ±1°C.
Thermal pulse video thermography. In this system no physical contact is necessary
with the material and very rapid rates of inspection are possible. A high-intensity heating
source is used to send pulses of infra-red energy into the material. The surface is scanned
by an infra-red thermal imager with a TV-video output. This system, again, can either be
used for sensing heat transmitted through the component or for single-sided inspection
when only one surface is accessible. Very good sensitivities are possible. Digitized image
processing to provide image enhancement is also possible.

6.2.3 Signal processing (SP) for NDT

NDT is gradually adapting to the use of the most recent developments in digital signal
and image processing. By signal processing (SP) is meant digital techniques that
transform an input signal into an output signal or into parameters. In this very broad
sense, not only averaging and Fourier transform but also TOFD, SAFT, 3-D
reconstruction, expert system based signal classification, adaptive learning network and
neural networks can be classified as SP techniques (more details are given in the
following).
Non-destructive testing generally results in an inverse problem: given a set of external
measurements, compute the location and size of defects inside the material. Although the
basic equations differ from one method to the other ("direct problem" formulations are
different) their common feature is that they cannot be simply inverted because of i) noise,
ii) lack of measurements, iii) incomplete modeling and iv) all of these together.
NDT noise itself is rarely the usual additive Gaussian white noise. A first example
is the response of coarse grained materials when tested with ultrasonics: each grain
behaves as a reflector so that the whole response does not resemble a random white
noise and classical averaging does not work. In EC (eddy current) testing of some steam
generator tubes, flattening noise can be decomposed into several narrow band
components (it is therefore non-random, but it is not "stable" either). Moreover noise
and flaw frequency spectra are the same: if this flattening noise is bandpass-filtered, the
useful information is also filtered out. Another example originates from gammagraphy
testing of thick wall samples. The radiographs are corrupted by a granular noise due to
thickness and film, which has to be modeled adequately before processing.

Digital signal processing could be difficult and hence expensive to implement. Therefore
one has to implement it only in those cases where all other means have failed. This is why
smart experimental setups and acquisition schemes have been developed. Among them
the Synthetic Aperture Focusing Technique (SAFT), Time of Flight Diffraction (TOFD)
and numerous enhancements of these basic techniques have been developed (Ludwing
and Roberti, 1989). SAFT is some sort of "beamforming" already known in array
processing for underwater acoustics or RADAR. For each pixel of the insonified
specimen the A-scan signals received at n transducers are averaged after time-shifting.
The shifts are computed from the different distances between one transducer and the
pixel under study. Scanning the sample results in an enhanced image because of
constructive addition of waveforms.
The aim is to better understand the content of the measured signals and then to be able to
simulate it (together with its accompanying noise) for study and method evaluation
purposes. The field of image processing has become a big consumer of sophisticated
modelling based on stochastic processes on one hand (Boolean and Markov models)
and Bayesian procedures on the other hand. Boolean models represent the granular noisy
part of images by assuming it is a Poisson-type random spatial distribution of some basic
pattern (usually the convex part of a Gaussian whose parameters are randomly chosen).
This proved appropriate for radiograph modelling. Markov models account for the
reasonable idea that statistical relationships between one pixel and the rest of an image
are summarized in a window around this pixel. These models are used in many
processing tasks and particularly in reconstruction. Time-frequency domain methods are
required for extracting physical parameters of interest when these involve joint variations
of time and frequency. Techniques based on the Wavelet Transform appear to be suitable
for acoustic signal processing, particularly for detection and description of bursts, whose
arrival time and waveforms are unknown. Lastly the resolution procedure based on the
classical Bayes' rule is an elegant way to introduce human "knowledge" on the desired
image and on its transformation. To sum up, besides the mathematical aspects that can
discourage NDT persons, the important point in this general stochastic approach is its
ability to take prior knowledge into account as probability laws and disturbing noise as
random processes. It is amusing (although well known) that introducing such apparently
complex tools leads at the end to tractable calculations and interesting results (Singh and
Udpa (1986), Ludwing and Roberti (1989), Grangeat et al. (1992)).
An important issue in US (ultrasonics) is to restore signals from austenitic welds because
they are severely corrupted by noise. Several techniques have become popular. They are
called "averaging" although this reference to a "linear" mix could be misleading. Spatial
averaging consists in selecting for example the minimum values in a number of
waveforms produced by different probe locations close to each other, while frequency
averaging does the same but from different frequency bands of a single signal. Both
methods are based on the reasonable assumption that signal (i.e. defect) responses are
coherent whereas noise (grain reflections) responses are not. Signal to noise
enhancements of up to 10 dB have been reported.
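A minimal sketch of spatial averaging by minimisation follows, under the assumption that the A-scans are already aligned in time; the function name is illustrative.

    import numpy as np

    def spatial_min_average(ascans):
        # ascans: 2-D array, one rectified A-scan per row, recorded at
        # probe positions close to each other. Defect responses are
        # coherent (present at the same time index in every row) whereas
        # grain noise is not, so the pointwise minimum suppresses noise.
        return np.min(np.abs(np.asarray(ascans, dtype=float)), axis=0)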

The flattening noise from some steam generator tubes has to be filtered by specific
digital techniques. These are based on a noise reference either picked from the signal itself
or provided by an auxiliary signal (this is the so called correlofilter). When the signal is
not stationary, one convenient way to filter it is to let a feedback loop estimate the filter
coefficients from the measured samples. The output of the filter can be used as a noise
estimation and subtracted from the original signal.
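One standard way to realize such a feedback loop is the least-mean-squares (LMS) adaptive filter, sketched below under the assumption that a noise reference channel is available; the tap count and step size are illustrative values, not recommendations.

    import numpy as np

    def lms_noise_cancel(primary, reference, n_taps=32, mu=0.01):
        # primary: measured signal (defect response plus noise);
        # reference: auxiliary channel correlated with the noise only.
        # The filter output is a running noise estimate, subtracted from
        # the primary channel; the residual drives the coefficient
        # update (the feedback loop).
        primary = np.asarray(primary, dtype=float)
        reference = np.asarray(reference, dtype=float)
        w = np.zeros(n_taps)
        cleaned = np.zeros_like(primary)
        for n in range(n_taps, len(primary)):
            x = reference[n - n_taps:n][::-1]   # most recent samples first
            e = primary[n] - np.dot(w, x)       # residual = cleaned sample
            w += 2.0 * mu * e * x               # LMS coefficient update
            cleaned[n] = e
        return cleaned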
After clean signals have been recovered, an automatic decision about their nature is
desired. Besides the statistical techniques (principal component analysis, discriminant
analysis and others) a classification scheme gained tremendous favour at the end of the
70's: the Adaptive Learning Network (ALN). ALN is an empirical combination of
candidate parameters, in which a non-linear polynomial model is constructed. At each
iteration the model "grows", that is the coefficients and the structure of the model are
determined simultaneously. The model's output can be either a classification or an
estimation of some parameter of interest. ALNs have been tested both for US
(Ultrasonics) and EC (Eddy Currents) signals. As revealed, they performed more or less
like classical multidimensional statistics and apparently disappeared from reports.
As the question of what are the optimal parameters remains open, another approach has
been proposed for EC signals. The idea, first used for hand print character recognition,
consists in retaining only the first terms of some sort of Fourier development of the EC
complex signal. These terms are then used as features for classification. The experience
about these Fourier descriptors applied to support plates discrimination is that they are
too global to allow an accurate localization of small flaws.
Since the rediscovery of Rosenblatt's perceptron in the 80's, NNs (neural networks) have
been proposed for numerous tasks. A particular combination has proved to be fruitful:
1. A three-Iayer architecture (one hidden layer).
2. The back-propagation algorithm to estimate the weights.
3. Classification purposes.
Whereas NNs have given an opportunity to revisit once more some classical problems,
their new features are:
1. Efficient hardware implementation.
2. Some preprocessing capability (Komatsu et al. (1992), Parpaglione (1992)).
Nevertheless, the amount of relevant examples has a paramount importance in the NN
approach as well as in other approaches. It is surprising that nobody has compared ALNs
and NNs, at least from an NDT point of view.
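For concreteness, a minimal numpy sketch of the combination listed above (one hidden layer, back-propagation, classification) is given below; the layer sizes, learning rate and the omission of bias terms are simplifications made for illustration only.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_mlp(X, y, n_hidden=10, lr=0.1, epochs=1000, seed=0):
        # X: (n_samples, n_features); y: (n_samples, n_classes), one-hot.
        # One hidden layer, squared-error loss, bias terms omitted.
        rng = np.random.default_rng(seed)
        W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden))
        W2 = rng.normal(0.0, 0.1, (n_hidden, y.shape[1]))
        for _ in range(epochs):
            h = sigmoid(X @ W1)                    # forward pass
            out = sigmoid(h @ W2)
            d_out = (out - y) * out * (1.0 - out)  # output-layer error term
            d_h = (d_out @ W2.T) * h * (1.0 - h)   # error propagated back
            W2 -= lr * h.T @ d_out                 # gradient steps
            W1 -= lr * X.T @ d_h
        return W1, W2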

6.2.4 Applications of SP in automated NDT

A few successful applications of SP in automated NDT for defect detection and
characterization and for enhancing the defect detection sensitivity, are described here.
Maraging and austenitic stainless steel welds - Defect detection and characterization
using pattern and cluster analysis.
High sensitivity defect detection and characterization in weldments of these materials
continues to be of interest. This is primarily due to the fact that these weldments are used
in large numbers in critical and heavy industry applications. Dendritic (hence anisotropic)
microstructures of these weldments, especially in the thickness range of 10 to 40 mm,
pose problems to ultrasonic testing. Considering these facts, the ASME boiler and
pressure vessels code has recommended that in the case of austenitic stainless steel
weldments, any defect that is 10% of thickness should be recorded and monitored. DSP
procedures, using very effective cluster and pattern analysis algorithms, have been
developed (Raj, 1992). These enable detection and characterization of defects down to
1% of weld thickness (14.0 mm weld thickness) in austenitic stainless steel welds. Work
on higher thicknesses is in progress. The complexity of this problem is an excellent area
for the development of an expert system, for offering advice in carrying out effective
NDE on these weldments.
In the case of maraging steel weldments used in the rocket motor casings by the
aerospace industry, tight cracks (3 mm × 1 mm) produced by fatigue loading were
detected and characterized (Raj, 1992) using similar cluster and pattern analysis
principles. Detection of such small defects for this application enhances the payload
capacity of the rocket, resulting in significant economic and technological gains.
In both the above cases, the cluster analysis methods use the crosspower spectrum
(between signals from weld noise and those from defects) to obtain cluster elements.
The pattern analysis method generates a pattern called the demodulated autocorrelogram
(DMAC) pattern from the autocorrelation function of a signal and studies its features.
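The published construction of the DMAC pattern is specific to the cited work; one plausible reading, sketched below, is the envelope-demodulated autocorrelation of the A-scan, obtained here with a Hilbert transform.

    import numpy as np
    from scipy.signal import hilbert

    def dmac_pattern(x):
        # Autocorrelation of the zero-mean signal, kept for non-negative
        # lags and normalized, followed by envelope demodulation.
        x = np.asarray(x, dtype=float) - np.mean(x)
        r = np.correlate(x, x, mode="full")[len(x) - 1:]
        r /= r[0]                      # unit value at zero lag
        return np.abs(hilbert(r))      # demodulated envelope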
Characterization of foreign inclusions in composite material using pattern analysis.
Unwanted foreign inclusions find their way into composite materials during their
manufacture. It is known that these inclusions affect the load bearing capacity and
performance of these materials. Conventional immersion ultrasonic testing is able to
detect these inclusions but fails to characterize them into different categories. C-scan
imaging techniques are convenient for characterising such foreign inclusions in composite
materials. A simple and effective method utilizing the DMAC pattern analysis, for the
characterization of these inclusions has been developed by Kalyanasundaram et al.
The procedure relies on template matching and thus can be easily automated
and offers an excellent way to use neural networks for pattern matching and
classification.

Acoustic emission (AE) signal analysis. Acoustic emission signal analysis has yielded
important information in the detection of leaky components under pressure, in
pressurized heavy water reactors. In one of the above problems, the ratio of the spectral
energies present in different bands of the power spectrum of the AE signal is used in
order to detect the leaking component, since the signal-to-noise ratio (SNR) was very
poor. This is an example where problems due to poor SNR were overcome by
appropriate use of DSP.
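The band-energy-ratio computation itself is straightforward, as the following sketch shows; the choice of bands is application-specific, and the values used would have to be taken from the actual leak signatures.

    import numpy as np

    def band_energy_ratio(signal, fs, band1, band2):
        # Ratio of spectral energies in two bands of the AE power
        # spectrum; each band is a (f_low, f_high) pair in Hz and must
        # lie below the Nyquist frequency fs/2.
        spectrum = np.abs(np.fft.rfft(signal)) ** 2
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
        def energy(band):
            lo, hi = band
            return spectrum[(freqs >= lo) & (freqs < hi)].sum()
        return energy(band1) / energy(band2)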
In the NDE of rotating machinery, such as steam turbines and turbine generators, AE is
used to detect malfunctions such as rubbing and bearing tilt. In order to detect and
transmit AE signals from an operating rotor (to enable on-line processing), a wireless AE
monitor has been used, which can detect and transmit AE signals ranging from 50 kHz to
250 kHz. Acoustic emission parameters such as events, energy values, amplitude
distribution, frequency components, skewness and kurtosis values have also been
correlated with the "health" of cutting tools, used in lathes. Failure prediction in
gearboxes by the processing and analysis of their vibration (rotational) signals has been
done with success. It has been concluded that imminent failure can be predicted
accurately using cepstrum analysis (see Chapter 1). Vibrations in the gear meshings have
been monitored to detect failure in gears, where the tooth meshing vibration components
and their harmonics are eliminated from the spectrum of the time domain average. The
reconstructed time signal shows the presence of defects (if present) which otherwise
cannot be seen in the time domain average. This again underlines the importance and
usefulness of SP in the field of acoustic testing.
Time of Flight Diffraction (TOFD) technique for defect sizing. When ultrasonic
waves encounter a crack-like defect, not only reflection but also production of scattered
and diffracted waves takes place, over a wide angular range from defect tips. The
separation of diffracted waves in space and hence in time relates directly to the size of the
defect. By knowing the delays for the different waves, it is possible to compute the size and
location of the defect.
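For the common arrangement with transmitter and receiver placed symmetrically about the defect, the tip depth follows from the diffracted-wave time of flight as sketched below; the symmetric geometry and the steel velocity are assumptions of the sketch.

    import math

    def tofd_tip_depth_mm(tof_us, half_separation_mm, c_mm_per_us=5.9):
        # For a tip midway between probes placed 2*s apart, the diffracted
        # path is 2*sqrt(s**2 + d**2), hence d = sqrt((c*t/2)**2 - s**2).
        # Valid only when c*t/2 >= s (arrival later than the lateral wave).
        half_path = 0.5 * c_mm_per_us * tof_us
        return math.sqrt(half_path ** 2 - half_separation_mm ** 2)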
This technique has been used and the results are found to be in conformity with ASME
XI with respect to determination of maximum acceptable defect height and depth of
upper and lower edges for internal defects which lie deeper than 30% of the specimen
thickness in steel exceeding 12 mm thickness. Again, the results are in conformity with
modified ASME XI for all defects in steel exceeding 10 mm thickness.
Synthetic Aperture Focusing Technique (SAFT) for increased resolution. In this
procedure, a large aperture focused probe is synthesized electronically, thereby
increasing the fundamental resolution and defect sizing accuracy of the technique. A
wide angle compression probe and a point flaw in the specimen is assumed for the
purpose of simplicity. When the transducer scans over the specimen, each reflected echo
for various scan positions with respect to the position of closest approach of transducer
to the flaw, is delayed in time due to the greater distance travelled by ultrasonic waves. If
the individual scans are shifted by an amount equal to their predicted time delays, they
will come into coincidence with each other and when they are summed, the resultant will
be a large amplitude response. If the same procedure is repeated centered around another
position, the above time shift compensation does not produce a set of self-coincident
scans, which results in a significantly smaller response. The time shifts can be achieved
either electronically or digitally using a computer. This technique is an excellent example
of the advantages that accrue from the combination of conventional and advanced
techniques. Typical applications of this important technique, apart from radar, include the in-
service inspection of pressure retaining boundaries for accurate defect sizing.
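A minimal delay-and-sum formulation of SAFT is sketched below; the loop over image pixels is written for clarity rather than speed, and the interface (arrays of A-scans, probe positions and an image grid) is an assumption of the sketch.

    import numpy as np

    def saft_image(ascans, positions, grid_x, grid_z, c, fs):
        # ascans: (n_positions, n_samples) pulse-echo records; positions:
        # probe x-coordinates; grid_x, grid_z: image grid in the same
        # length units; c: wave speed; fs: sampling rate.
        ascans = np.asarray(ascans, dtype=float)
        positions = np.asarray(positions, dtype=float)
        image = np.zeros((len(grid_z), len(grid_x)))
        for i, z in enumerate(grid_z):
            for j, x in enumerate(grid_x):
                # Round-trip time from every probe position to pixel (x, z).
                t = 2.0 * np.hypot(positions - x, z) / c
                idx = np.round(t * fs).astype(int)
                ok = idx < ascans.shape[1]
                # Shift-and-sum: waveforms add constructively only at
                # pixels that contain a true reflector.
                image[i, j] = ascans[np.flatnonzero(ok), idx[ok]].sum()
        return image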
Reduction of random noise using split spectrum processing. This technique is
implemented by splitting the frequency spectrum of the received signal by using
Gaussian, overlapping band pass filters having central frequencies at regular intervals. If
the inverse Fourier transform is taken for N filters, N time domain signals are obtained.
These N time domain signals are subjected to algorithms such as minimisation
and polarity thresholding for extracting useful information. The split spectrum
processing technique is widely applied in the analysis of signals from noisy materials like
centrifugally cast stainless steels, carbon epoxy composites, welded joints and cladded
materials.
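The following sketch shows one way to combine the Gaussian filter bank with the minimisation and polarity-thresholding recombination rules described above; the filter centres and bandwidth are left as parameters, since their tuning is material-dependent.

    import numpy as np

    def split_spectrum(signal, fs, centers_hz, bw_hz):
        # Split the spectrum with overlapping Gaussian band-pass windows,
        # inverse-transform each band, then recombine.
        n = len(signal)
        spec = np.fft.rfft(signal)
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        bands = []
        for fc in centers_hz:
            window = np.exp(-0.5 * ((freqs - fc) / bw_hz) ** 2)
            bands.append(np.fft.irfft(spec * window, n=n))
        bands = np.asarray(bands)
        # Minimisation: smallest magnitude across bands at each instant.
        minimised = np.min(np.abs(bands), axis=0)
        # Polarity thresholding: keep samples where all bands agree in sign.
        agree = np.all(bands > 0, axis=0) | np.all(bands < 0, axis=0)
        return np.where(agree, minimised, 0.0)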
The ALOK technique. The ALOK technique was conceived and developed by the
Fraunhofer Institute for NDE techniques (IzfP), Saarbrucken, Germany. The principle of
this technique is to characterize a reflector by its time of flight characteristics rather than
on the basis of its reflected amplitudes. A modified version of this technique, developed
by Siemens, rapidly acquires a manifold of amplitude and corresponding time of flight
values in each A-scan, concentrating on the relevant A-scan information by a specific
pattern recognition process. ALOK provides remarkable advantages with respect to
general improvement of the inspection, increase in the information density (reduction of
documentation) and simplification of data evaluation.
Microstructure and mechanical properties characterization using acousto-
ultrasonics. This approach is based on the concept that spontaneously generated stress
waves produced during failure interact with material morphology. By introducing
ultrasonic waves into the material, simulated acoustic stress waves are produced which
are affected by the material condition. The waves are measured in the form of stress
wave factors (SWF), defined as the number of oscillations higher than a chosen threshold
in the ringdown oscillations of the output signal. The SWF is correlated to the
microstructure and mechanical strength. Damage in the specimen produces
corresponding changes in the signal attenuation, resulting in lower SWF readings.
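Counting the SWF from a digitized ringdown signal reduces to counting threshold crossings, as the short sketch below shows; the threshold value is application-dependent.

    import numpy as np

    def stress_wave_factor(signal, threshold):
        # SWF: number of ringdown oscillations exceeding a chosen
        # threshold, counted as upward crossings of the rectified signal.
        above = np.abs(np.asarray(signal, dtype=float)) > threshold
        return int(np.count_nonzero(above[1:] & ~above[:-1]))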

6.2.5 Conclusions

The techniques mentioned in the above sections are a selection of some of the more
recent developments in NDT and NDE. By the turn of this century, the number of
techniques used in the field of NDE will be significantly larger and more
varied in approach. Development of techniques would be driven by the need to
characterize the growing number of engineering materials and the processes used to
manufacture these materials. Each of these processes would demand monitoring of a
variety of parameters in real-time, in order to ensure high productivity, minimal wastage
and acceptable characterization levels.
The knowledge and databases pertaining to these aspects will be very large, but the
expertise to assimilate and apply this large amount of knowledge at every stage of the
production and NDE of materials and components, will be scarce. Intelligent and timely
application of this knowledge in the production and NDE of manufactured materials and
structural components in service, would be the key for success in industrial ventures.
The use of neural networks for NDE signal analysis and classification, fuzzy logic for
decision making and expert systems for NDE are being increasingly studied and used
(see Georgel and Zorgati, 1992, Schlicht and Zhirabok, 1992, Garribba et al., 1988). The
motivation behind the use of these new concepts is the need for the acquisition, analysis,
storage, retrieval and presentation of very large amounts of possibly imprecise and
incomplete data. NDE, being an interdisciplinary area that affects and is affected by a
large number of complex variables ("concepts" or "inputs") in any problem, is a fertile
ground for these new concepts to grow and contribute.
Stavrakakis and Psomas (1993) described popular neural network models powerful in
performing complex pattern classification for NDT data interpretation. A hybrid system
applied to the classification of ultrasonic images of weld defects was described. Rules
and neural networks were used for different aspects of the classification task. Replacing
those rule-based modules, which classified echodynamics and potential defect areas, with
neural network modules facilitated successful classification of previously unseen test
examples. The case of ECT data evaluation using NNs was also examined. A neural
network which used optimum input parameters showed good potential as a tool for
automatic ECT data evaluation. Finally, experimental results performed at an Acoustic
Emission Laboratory were presented, to evaluate the location of an acoustic emission
source in a tube which simulates a fault, using a neural network model. Comparative
simulation results showed very good performance of the ANN-based acoustic emission
data interpretation method.
The ANN-based NDT data interpretation can learn and keep the advantages and avoid
the drawbacks of the present methods. Due to the parallel computation mechanism
inherent in the ANN structure, hardware IC ANNs are much quicker and more suitable
for on-line applications.
Acoustic NDE methods as applied to new materials, on-line acoustic NDE for real-time
monitoring of plants and components, life prediction of components and plants,
intelligent processing of materials and components are some of the new directions where
acoustic NDE methods would be required to contribute in the coming decade. In order
to do so, acoustic NDE methods would increasingly rely upon basic and advanced signal
analysis methods, pattern and cluster analysis methods and exploit the advances that are
being made presently in the field of artificial intelligence (AI).
It can be conclusively stated that acoustic NDE methods at their present state of
development would gain significantly by intelligent and balanced use of these
advanced concepts.

6.3 Real-time structural damage assessment and fatigue life prediction methods

6.3.1 Introduction

Structural damage assessment is a process that involves decision-making under
uncertainty. Under the effect of the operating conditions in industrial plants and
structures, potentially dangerous phenomena affecting the structural integrity are likely
to take place, such as fatigue due to high levels of temperature and pressure, creep-fatigue
interaction, stress corrosion, material embrittlement, etc. The analysis of these
mechanisms is of basic importance in the evaluation of the structure or the component
lifetime. This may be carried out at various levels:
• Damage analysis at the level of the granular and intergranular crystalline structure of
the material (physico-metallurgical approach).
• Analysis of the macroscopic effects of damage at the defect propagation level
(phenomenological approach).
From an analytical point of view, the reliability of a structure or a component can be
thought of as the probability of absence of crossings of a safe level by accumulated
damage processes. The best approach to its assessment is constituted by the theory of
stochastic processes (Lucia, 1985) and details will be given in the next paragraphs. In
real-time structural damage assessment, starting with the initial defect distribution, which
gives the absolute number of defects according to depth after the structure's fabrication,
real-time, mainly acoustic, NDE results (US and AE) permit the initial defect
distribution to be modified to account for the growth of defects during the structure's
lifetime. The initial defect distribution and the non-detection probability of the ultrasonic
inspection are exponential functions with a negative exponent. This means that they are
decreasing with increasing defect depth. The defect distribution will be updated after
each in-service inspection and all defects larger than the acceptable size are to be
removed with the exception of those which have gone undetected.
NDE data can be processed continuously with current signal analysis computer means
(see Section 6.2), while in-service inspection cannot be performed continuously because
of obvious economical and engineering limitations. For this purpose, a new strategy for
inspection and repair of structural elements and systems is presented by Thoft-
Christensen and Sorensen (1987). The total cost of inspection and repair is minimized
with the constraints that the reliability of elements and/or of the structural system are
acceptable. The design variables are the time intervals between inspections and the
quality of the inspections. Numerical examples are presented to illustrate the
performance of the strategy. The strategy can be used for any engineering system where
inspection and repair are required.
Although the physico-metallurgical aspect is important for the understanding of rupture
mechanisms in structures, in the literature the phenomenological approach is more
commonly found, based on laboratory tests and semi-empirical models, using mainly
linear fracture mechanics (Lucia, 1985, Kozin and Bogdanoff, 1992). The fracture
mechanics relationships allow a link to be established between the defect dimensions, the
load level and the stress intensity. The stress intensity can be compared with the
material's resistance to rupture. Given the complexity of the phenomena involved,
however, experimental support is essential for the definition of rupture mechanism
models. The problem arises when transferring laboratory results, often obtained
under over-simplified environmental and loading conditions, to the real component
and, on this basis, predicting its lifetime.
During the past twenty years a rather extensive effort has been devoted to developing
techniques that permit the accurate in-time prediction of a structure's fatigue life (see
"Theoretical and Applied Fracture Mechanics", "Engineering Fracture Mechanics",
"International Journal of Fracture", "Structural Safety" and "International Journal of
Fatigue"). As the knowledge related to fatigue of structures and materials expanded, it
became clear that in many cases fatigue could be treated from a propagation point of
view. This knowledge has led to the development of phenomenological, fracture
mechanics and probabilistic fracture mechanics, stochastic process, time-series analysis
and knowledge-based approaches for the in-time assessment of fatigue crack growth (FCG)
and of the failure probability in structures. This is the "heart" of in-time fatigue life
prediction, leading to increased life of structures subjected to dynamic loads. These
approaches are presented and discussed in the following sections.

6.3.2 Phenomenological approach for fatigue failure prognosis

The interpretation of damage as the birth and propagation of defects in the elementary
structure of the material, caused by the alternating stress field acting on imperfections of
the crystalline network, on distortions due to impurities, etc., is generally accepted.
Qualitatively one can distinguish:
• An initial nucleation stage (defect generation).
• A transition stage (defect coalescence).
• A propagation stage (unstable growth of the largest defects).
In general, N_F being the number of cycles to rupture and N_0, N_T, N_C the cycles at the end
of the nucleation, transition and propagation stages, one has:
(6.1)
which expresses the fact that each stage is determined by the level reached in the
preceding stage. The damage accumulation mechanisms are presumably different in the
three stages and in each one they are significantly dependent on the environmental
conditions and the stress field; for this reason the construction of a unified model of
interpretation seems unlikely. Furthermore, the relative importance (in terms of number
of cycles) of the above three fatigue stages depends on the intensity of the alternating
load. This rather complex picture can be partially clarified by the consideration that
experimentally evident fatigue failures in operating components are mainly caused by
fabrication defects, generally introduced during welding procedures. As a consequence,
the nucleation and transition stages are relatively less important than the propagation
stage, starting from the fabrication defects. These defects are, in fact, usually larger than
the defects of the elementary structure of the material.
To date, efforts have concentrated on the development of independent models for cyclic
constitutive behavior, cyclic crack initiation and cyclic crack propagation. However, the
transition between crack initiation and crack propagation has not yet been researched
thoroughly enough to be integrated in a unified life prediction method (Bhargava et al.
(1986), Lankford and Hudak (1987), Halford et al. (1989)).
Having said that, the problem is how to estimate the time to failure of a cyclically loaded
structural component. The estimation of the lifetime (expressed, e.g., by the number of
load cycles allowed) is more than just a research problem; it is a practical problem of
current design, considered by the current standards. For this reason it appears opportune
to present first the fatigue design criteria of the ASME Sect. III standards and the
acceptance criteria for defect propagation rates of the ASME Sect. XI standards.
The fatigue design and residual lifetime criterion of the ASME standards coincides with
the Miner rule. It is still the most commonly used, not only for components subject to
ASME standards but also, and especially, for components in the aeronautical industry,
for which dimensioning for fatigue is often the main consideration.
The ASME standards for components of conventional (Section VIII, Div. 2) or nuclear
(Section III) installations are based on limitations imposed on a "cumulative factor of
use" U:

U = Σ_i U_i = Σ_i n_i / N_i ≤ 1    (6.2)

where n_i is the number of cycles envisaged for the cyclic load of type i and N_i the
allowed number of cycles as deduced from the fatigue curves (S-N curves). An S-N
curve is given as,

N = K S^{-m},   S > S_0    (6.3)
where N is the number of cycles to failure under a constant-amplitude stress range S, K and
m are the S-N curve parameters, and S_0 is a stress cut-off level below which no damage is
accumulated.
The S-N curves used are generally obtained from monoaxial cyclic load tests at constant
amplitude, on notched samples: for design purposes, a suitable safety factor, of the order
of 3 or 4, is applied to the mean experimental curve to account for the considerable
scatter of the results. With a safety factor of 4 the design reliability is of the order of 99.9%
for a standard deviation of 20%.
The fatigue damage accumulation D follows Miner's law as

D = Σ_{i=1}^{N_0} 1/N_i(S_i) = (1/K) Σ_{i=1}^{N_0} S_i^m    (6.4)

in which N_0 is the total number of stress cycles. In a deterministic design, it is assumed
that failure occurs when D = 1.
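To make the bookkeeping of eqs. (6.2)-(6.4) concrete, the following short sketch (in Python, with purely illustrative S-N parameters and load blocks, not taken from any standard) accumulates the Miner damage for a block loading programme using the S-N curve (6.3):

```python
import numpy as np

# Hypothetical S-N curve parameters (illustrative only): N = K * S**(-m) for S > S0
K, m, S0 = 1.0e12, 3.0, 50.0          # stress ranges in MPa assumed

def cycles_to_failure(S):
    """Allowed cycles N_i at stress range S, from the S-N curve (6.3)."""
    return K * S**(-m) if S > S0 else np.inf   # no damage below the cut-off S0

# Load blocks: (stress range S_i in MPa, applied cycles n_i)
blocks = [(120.0, 2.0e4), (90.0, 1.0e5), (40.0, 1.0e6)]   # last block below S0

# Cumulative usage factor (6.2) / Miner damage (6.4)
U = sum(n / cycles_to_failure(S) for S, n in blocks)
print(f"U = {U:.3f} -> {'failure predicted' if U >= 1 else 'acceptable'}")
```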
To account for the uncertainty of model (6.4) by a random variable Δ, the fatigue limit state
function is defined as:

g(Z) = Δ - D = Δ - (1/K) Σ_{i=1}^{N_0} S_i^m    (6.5)
If one further defines that any random variable x can be expressed as,

x = B_x x_c    (6.6)

in which x_c is the characteristic value of x and B_x is a normalized random variable
associated with x, the limit state function (6.5) then takes the following form:

g(Z) = Δ - (1/(B_K K_c)) Σ_{i=1}^{N_0} (B_S S_{i,c})^m    (6.7)

in which the S-N exponent m is usually treated as a deterministic constant.
Failure is defined by the event g(Z) ≤ 0, and g(Z) > 0 identifies a safe state. The failure
probability is,

P_f = P(g(Z) ≤ 0) = ∫_{g(Z)≤0} f_Z(z) dz    (6.8)

where f_Z(z) is the multivariate density function of Z.


For a large class of engineering problems, the basic random variables are generally
modeled by continuous probability functions and the failure probabilities are generally
small. Hence, it is generally preferable to apply the analytical FORM/SORM methods, as
these are very efficient and accurate for small failure probability problems. Detailed
computational procedures of FORM/SORM are given in Madsen et al. (1986), where the
reliability index β is applied, which is related to the failure probability by

β = -Φ^{-1}(P_f)    (6.9)

in which Φ(·) is the standard normal distribution function.
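The mapping (6.9) between β and P_f is easy to evaluate numerically; the sketch below uses the standard normal distribution from scipy (the P_f values shown are generic, not tied to any particular structure):

```python
from scipy.stats import norm

def beta_from_pf(pf):
    """Reliability index from failure probability, eq. (6.9)."""
    return -norm.ppf(pf)

def pf_from_beta(beta):
    """Inverse mapping: P_f = Phi(-beta)."""
    return norm.cdf(-beta)

for pf in (1e-2, 1e-4, 1e-6):
    print(f"P_f = {pf:.0e}  ->  beta = {beta_from_pf(pf):.2f}")
# e.g. P_f = 1e-4 corresponds to beta of about 3.72
```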
From technical or economical considerations, the required safety level in the design life
may sometimes have to be achieved by additional safety measures such as inspection, so
that the design safety can be updated to the required safety level. The principle of
reliability updating is based on the definition of conditional probability,

P(F|I) = P(F ∩ I) / P(I)    (6.10)

where I is the inspection event and F is the structural failure event, described by
F = {g(Z) ≤ 0}. Computational methods for updating the failure probability may be found in
Madsen et al. (1986).
The question of the accuracy of the Miner rule has often been raised, although few valid
attempts have been made to verify it experimentally because of the high cost of such
experiments. During the seventies, many experimental campaigns were undertaken, in
particular by NASA and Messerschmitt-Bolkow-Blohm, leading to the conclusion that
the Miner rule has some relevant drawbacks, since there are no criteria to estimate a
priori whether the prediction will be conservative or not, and because the parameters
which influence the estimation are not known.
The simplicity of the Miner rule is the result of the main hypothesis on which it is based:
the linear accumulation of damage. This hypothesis corresponds to the assumption of a
constant failure rate which is in contrast with the experimental evidence of fatigue
damage, characterized by an increasing rupture rate.
Recently, Ben-Amoz (1992) developed a cumulative damage theory to predict the
residual fatigue life in two-stage cycling. Based on the concept of bounds, it was shown
that the mean residual fatigue life, as well as the entire scatter of the test data, can be
bracketed by appropriate bounds. However, the use of such a theory requires prior
knowledge of the life fraction spent in initiating the fatigue crack. It is shown there that
the crack initiation life fraction can be determined from two-stage cumulative damage
tests. The bounds are derived from both high-to-low and low-to-high amplitude fatigue
tests. Furthermore, it is shown that the two bounds actually coincide to give an exact
expression for the crack initiation life fraction.

6.3.3 Probabilistic fracture mechanics approach for FCG life estimation

The prediction of the fatigue damage accumulation relies on the S-N curves where the
number of cycles to failure is based on a large visible crack size and the remaining fatigue
life is conservatively neglected. The prediction of the fatigue crack growth (FCG) using
the fracture mechanics, however, describes the crack growth physically and is able to
calculate the fatigue life up to fracture, accounting for possible inspection effects. This
approach is thus more sophisticated and its application is becoming widespread.
Some hundreds of more or less different relationships can be found in the literature for
expressing the fatigue growth of cracks (Hoeppner and Krupp (1974), Akyurek and Bilir
(1992)). Some of them are purely theoretical or based on microscopic properties of the
material, but the most widely employed are semi-empirical and have been developed
mainly as interpretative models of experimental results: they allow the prediction of the
behavior of the crack size "a" as a function of "N", the number of stress cycles. This
prediction is based on the integration of the growth rate, for fixed initial conditions of the
defect. The result arrived at is not, however, in general, representative of real growth
situations. This is due both to the fact that the initial conditions have a considerable
scatter of values and to the fact that, for the same initial conditions, there is an intrinsic
variability in the process of damage by fatigue, which leads to a distribution of values of
"a" at cycle N (Virkler et al., 1979). It thus appears natural to consider the relationships
defining the growth rate as stochastic (Ghonem and Dore, 1987). In this context,
therefore, prediction methods can be seen as being based on the integration of the FCG
relationships with the parameters, the initial conditions or the loads represented by
random variables. These procedures lead to the determination of a distribution of
dimensions of the propagated defect at cycle N, or of a distribution of the number of
cycles N for a given propagation from a_0 to a_f. These distributions are the basis for the
prediction of the residual life of structures stressed by fatigue. Three randomization
methods are presented by Lucia (1985), as an indication of the vast range of applications
of this methodological approach.
Probabilistic models for fatigue crack growth.
As mentioned before, the statistical variability in crack growth depends on many
undetermined factors which can be classified as small differences of material intrinsic
properties, loading environment, specimen geometry, measuring system, even
microstructure and the state of stress, etc. In general, the fatigue crack growth can be
expressed by the following nonlinear relation,

da/dN = Q(ΔK, K_c, R, K_th, a, ...)    (6.11)
where,
Q(·)    a non-negative function,
a       half-crack length, mm
N       number of fatigue cycles (cumulative load cycles)
ΔK      stress intensity factor range at the crack tip, given by the relation
        ΔK = S(πa)^{1/2} F(a), MPa·m^{1/2}
S       applied stress (load) range, MPa
F(a)    crack shape geometrical factor (see Verreman et al., 1987, Dufresne
        et al., 1988)
da/dN   fatigue crack growth (FCG) rate, m/cycle
R       algebraic ratio of the minimum to maximum load (stress) in a cycle
K_c     material fracture toughness (i.e. the critical value of the stress
        intensity factor), MPa·m^{1/2}
K_th    FCG threshold stress intensity, MPa·m^{1/2}
The problem of determining the fracture toughness parameter for a specific industrial
structure material (a double-sided spiral submerged arc welded pipeline) from
measurements is analytically treated by Al-Obaid (1992). The FCG threshold is the
upper bound of a set of amplitudes at a given K_max which, when applied to a fatigue
crack, do not produce crack propagation, independently of the way these amplitudes are
applied. Cycles with amplitude ΔK greater than the threshold produce FCG at that K_max.
If the lower limit for fatigue crack growth under zero-tension fatigue is defined along
with K_max, an FCG threshold has to exist and has to be measured over the fatigue
loading range between K_max and K_c. It must be noted that K_c illustrates some failure
condition. Marci (1992) clarifies these concepts and discusses the experimental
procedures to determine K_th for a specific material, as well as the requirements to ensure
transferability of the experimentally measured threshold to service-type fatigue loading.
The empirical laws most often used in engineering are the Paris-Erdogan and
Forman laws (Hoeppner and Krupp, 1974, Lucia, 1985, Akyurek and Bilir, 1992):

da/dN = C(ΔK)^m    (6.12)

da/dN = C(ΔK)^m / [(1-R)K_c - ΔK]    (6.13)

where the stress cut-off level S_0(ΔK_th) is a function of the threshold of the stress intensity
factor range (ΔK_th), below which there is no crack growth, and C, m are the crack
growth parameters.
Laws (6.12) and (6.13) are almost universally applied to stage-II FCG, that is, crack
growth at alternating stress intensity values somewhat larger than the threshold
alternating stress intensity value ΔK_th, but below the value of ΔK at which unstable crack
propagation begins to occur.
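As an illustration of how a law such as (6.12) is integrated to produce an a(N) curve, the following sketch performs a simple cycle-block integration; the material constants, the constant geometry factor F(a) = 1 and the loading are all illustrative assumptions rather than data from any referenced test:

```python
import numpy as np

# Illustrative Paris-Erdogan parameters (a in m, dK in MPa*sqrt(m)); not from any databank
C, m = 1.0e-11, 3.0        # da/dN = C * (dK)**m
S = 80.0                   # applied stress range, MPa
F = lambda a: 1.0          # crack shape factor, taken constant for simplicity

def grow_crack(a0, a_final, dN=1000):
    """Integrate da/dN = C*(S*sqrt(pi*a)*F(a))**m in blocks of dN cycles."""
    a, N = a0, 0
    while a < a_final:
        dK = S * np.sqrt(np.pi * a) * F(a)   # stress intensity factor range (see eq. (6.11))
        a += C * dK**m * dN                  # Euler step over one block of cycles
        N += dN
    return N

print(f"cycles from 9 mm to 49.8 mm: {grow_crack(0.009, 0.0498):,}")
```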
All of the factors and parameters mentioned above are treated as random variables in
probabilistic fracture mechanics (PFM). Therefore, the statistical investigation and the
accumulation of statistical data on these parameters are very necessary and important for
the reasonable and economic design of fatigue structures. In particular, for fatigue
reliability analysis of structures, a probabilistic or stochastic model is required for fatigue
crack growth. As a result, many probabilistic or stochastic fracture mechanics models
have been developed to deal with the variability of crack growth (Lucia, 1985, Journet
and Pelloux, 1987, Ghonem and Dore, 1987, Cortie and Garrett, 1988, Zhu and Lin,
1992, Nisitani et al., 1992). These models have a realistic physical or microstructural
basis for some special conditions. A major problem with these models is the difficulty of
obtaining sufficient data, due to time and cost constraints. For this reason, some models
have not been verified by experimental data, and some are difficult to apply in engineering.
The purpose here is to present a simple probabilistic model which is easy for designers to
use in predicting crack growth behavior. In the model, the crack growth parameters C and m
in the Paris-Erdogan and Forman laws are considered as random variables, and their
stochastic characterizations are found from a crack growth experiment with small sample
size. Furthermore, using the COVASTOL computer program (more details on this
program are given later), the statistical distributions of the crack growth rate da/dN and
of the cycles to reach a given crack length are obtained. The experimental results are used to
verify the theoretical prediction of the statistical properties of fatigue crack growth
behavior for aluminum 2024-T3 test specimens.
Material inhomogeneity has long been considered to be an important factor in crack
initiation. However, it also has a considerable influence on crack growth, which is not
commonly perceived in deterministic fracture mechanics. Material inhomogeneity is
usually negligible in crack growth under general laboratory conditions, especially under a
random spectrum, because the fatigue stress dominates the scatter aspects of crack
growth. However, there is considerable variability in a well-controlled test under a
constant amplitude spectrum. For a good probabilistic model of crack growth, material
inhomogeneity must be taken into account.
Several different approaches have been followed for the probabilistic modeling of
material inhomogeneity. The most common approach is to randomize the crack growth
parameters. For the crack growth equations of Paris-Erdogan and Forman there are
several randomizations possible: both C and m in the Paris-Erdogan and Forman laws
could be random variables; or C could be a random variable and m a constant; or m could
be a random variable and C a function of m. However, C is really not a material constant
(as was initially assumed by Paris), but it depends on the mean stress or stress ratio,
frequency, temperature, etc. In particular, the stress ratio R is recognized to have
significant influence on C.
Then, the Paris-Erdogan and Forman equations can be transformed respectively to yield:

ln(da/dN) = ln(C) + m ln(ΔK)  ⟹  Y = ln(C) + mX    (6.14)

where,

Y = ln(da/dN),   X = ln(ΔK)

and,

ln(da/dN) = ln(C) + m ln(ΔK) - ln[(1-R)K_c - ΔK]  ⟹  Y = ln(C) + mX    (6.15)

where,

Y = ln(da/dN) + ln[(1-R)K_c - ΔK],   X = ln(ΔK)


According to relations (6.14) and (6.15), the C and m values of each specimen can be
obtained from the test results of the "a" vs. N curves by linear regression analysis, which is
identical to the method of least-squares or the method of maximum likelihood.
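A minimal sketch of this estimation step for the Paris-Erdogan form (6.14): given (da/dN, ΔK) pairs from one specimen, ln(C) and m follow from an ordinary least-squares straight-line fit (the synthetic data below merely stand in for real test results):

```python
import numpy as np

# Synthetic (dK, da/dN) data standing in for one specimen's processed test results
rng = np.random.default_rng(0)
dK = np.array([12.0, 15.0, 20.0, 25.0, 30.0])               # MPa*sqrt(m)
dadN = 1.2e-11 * dK**3.1 * np.exp(rng.normal(0, 0.05, dK.size))

# Eq. (6.14): Y = ln(da/dN) = ln(C) + m*X, with X = ln(dK)
X, Y = np.log(dK), np.log(dadN)
m_hat, lnC_hat = np.polyfit(X, Y, 1)                         # slope, intercept

print(f"estimated m = {m_hat:.2f}, C = {np.exp(lnC_hat):.2e}")
# Repeating this fit for each specimen yields the sample of (C, m) values whose
# means, variances and correlation feed the two-variable prediction method (6.18).
```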
There is considerable experimental evidence that the crack length a(N) as a function of
the number of accumulated cycles can also be modeled by the following three
exponential forms (Hoeppner and Krupp, 1974):

a(N) = C_1 N^{m_1}
a(N) = C_2 (log_{10} N)^{m_2}    (6.16)
a(N) = C_3 e^{m_3 N}

where C_i and m_i, i = 1, 2, 3, are functions of the applied load, the material characteristics,
the geometrical configuration of the component and the initial quality of the product being
tested. Equations (6.16) can be rewritten as:

ln a(N) = ln C_1 + m_1 ln N
ln a(N) = ln C_2 + m_2 ln[log_{10} N]    (6.17)
ln a(N) = ln C_3 + m_3 N = C* + m_3 N
Thus, regression lines of various types can be obtained for: the crack growth data
reported for each test, for all data from a given specimen geometry, and for all data
considered as one group.
The raw data from a crack propagation test are the half crack length, a, and the number
of cumulative load cycles, N, needed to grow the crack to some crack length, a, from
some reference initial crack length.
The current interpretation of these data is to report the FCG rate, da/dN, vs. ΔK, ΔK
being the stress intensity factor range at the crack tip for each individual test. The
graphical representation of these data includes a log-log plot of da/dN vs. ΔK, leading to
the best-fit straight line on this plot, see fig. 6.17. This data processing method is strictly
related to the use of the well-known Paris-Erdogan law or Forman law as a model for
the FCG rate.
The overall variability encountered in FCG rate data depends on the variability inherent
in both the data collection and the data processing techniques. If C and m are taken as
random variables, C and m are related. Cortie and Garrett (1988) have shown that the C-
m correlation, while present, does not possess any fundamental significance and is purely
the result of, firstly, the logarithmic method conventionally used to plot the data and,
secondly, the nature of the dimensions of the physical quantities used in the Paris-
Erdogan equation. In the light of probability theory, the distribution of the crack growth
rate da/dN as a function of ΔK can be deduced from the stochastic characterizations of C
and m, as well as from the above logarithmic equations. The crack growth rate da/dN can
accept a log-normal distribution (i.e. ln(da/dN) can accept a normal distribution), and its
mean and variance are given, using the Paris-Erdogan equation, by (two-variable
prediction method):

E[ln(da/dN)] = E[ln(C)] + E[m ln(ΔK)]

σ[ln(da/dN)] = {σ²[ln(C)] + σ²[m ln(ΔK)] + 2ρ_Cm σ[ln(C)] σ[m ln(ΔK)]}^{1/2}    (6.18)

where ΔK can take any value, and the means, variances and correlation ρ_Cm of C
and m are taken from the statistical analysis of the raw FCG test data (Virkler et al.,
1979, Stavrakakis et al., 1990).

[Figure: log-log plot of the FCG rate, log(da/dN) (ordinate, about -8.0 to -5.8) vs. log(ΔK) (abscissa, about 0.92 to 1.40), with the best-fit straight line.]

Figure 6.17 Summary of FCG rate data for the Virkler et al. (1979) case calculated by the
ASTM E647-83 standard method.

If m is a constant and C a random variable, by the same principle the crack growth rate
da/dN as a function of ΔK can be shown to follow a log-normal distribution, with mean
and variance given by (single-variable prediction method):

E[ln(da/dN)] = E[ln(C)] + m ln(ΔK)

σ[ln(da/dN)] = σ[ln(C) + m ln(ΔK)] = σ[ln(C)]    (6.19)

The Forman law can be used similarly.


The Shapiro-Wilk and Kolmogorov-Smirnov tests can be applied to test whether the
logarithmic crack growth rate ln(da/dN) can accept a normal distribution. Virkler et
al. (1979) have shown that da/dN can accept a log-normal distribution, at least at a
10% level of significance, and that the mean and variance of the experiment agree with the
predictions. The means agree more closely with the predictions. Moreover, the two-variable
prediction method shows a slightly better correlation with experimental results than the
single-variable prediction method, based on further comparisons between predicted and
experimental results (which are omitted for conciseness).
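Both goodness-of-fit tests are available in scipy.stats; a minimal sketch of checking the log-normality of da/dN at one fixed ΔK level might look as follows (the sample is simulated as a placeholder for real rate data):

```python
import numpy as np
from scipy import stats

# Placeholder sample of da/dN values at one fixed dK level (log-normal by construction)
rng = np.random.default_rng(0)
dadN = np.exp(rng.normal(loc=-13.0, scale=0.2, size=68))

log_rates = np.log(dadN)

# Shapiro-Wilk test of normality for ln(da/dN)
W, p_sw = stats.shapiro(log_rates)

# Kolmogorov-Smirnov test against a normal with estimated mean and std
p_ks = stats.kstest(log_rates, 'norm',
                    args=(log_rates.mean(), log_rates.std(ddof=1))).pvalue

print(f"Shapiro-Wilk p = {p_sw:.2f}, Kolmogorov-Smirnov p = {p_ks:.2f}")
# p-values above 0.10 mean log-normality is not rejected at the 10% level
```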
The distribution of cycles to reach a given crack length. The two-variable prediction
method, in which C and m are random variables, is applied to obtain the distribution of
cycles to reach a given crack length. The cycles to grow from an initial crack length a_0
to another length a_i, denoted N_g(a_i|a_0), are obtained by inverting the growth law:

N_g(a_i|a_0) = ∫_{a_0}^{a_i} da / [C(ΔK)^m]    (6.20)

From eq. (6.20), N_g(a_i|a_0) is a joint random variable of C and m. The Monte-Carlo
simulation technique can be applied to obtain a convenient distribution of N_g(a_i|a_0)
by simulating the distributions of C and m.
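A sketch of this Monte-Carlo step: sample correlated (ln C, m) pairs, evaluate the integral in (6.20) numerically for each pair, and collect the resulting sample of cycle counts. All distribution parameters, the loading and the geometry factor below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
S, F = 80.0, 1.0                       # stress range (MPa) and constant shape factor
a0, ai = 0.009, 0.030                  # initial and target crack lengths, m

# Assumed joint normal distribution of (ln C, m) with negative correlation
mean = np.array([np.log(1.0e-11), 3.0])
cov = np.array([[0.04, -0.01],
                [-0.01, 0.005]])
samples = rng.multivariate_normal(mean, cov, size=5000)

a_grid = np.linspace(a0, ai, 400)      # abscissae for numerical integration
dK = S * np.sqrt(np.pi * a_grid) * F   # dK(a) along the grid

# Eq. (6.20): one N value per sampled (C, m) pair
N = np.array([np.trapz(1.0 / (np.exp(lnC) * dK**m), a_grid)
              for lnC, m in samples])

print(f"mean N = {N.mean():.3e}, std N = {N.std():.3e}")
# A histogram of N approximates the distribution of cycles to reach a_i from a_0.
```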
The distribution of crack lengths after a given service life (number of cycles). This
procedure computes the propagation of a given defect or distribution of defects in a
given position and the corresponding failure probability during accidental loading. It is
thus more meaningful for real-time fatigue life prediction than the previous one.
The competence for facing such problems of cumulative structural damage has been
acquired in the Components Diagnostics and Reliability Sector at the Joint Research
Center of the EEC, Ispra, Italy, through the development of analytical models for the
representation of the cumulative damage process and for the estimation of the lifetime
distribution under fatigue loading. Two numerical codes have been developed to this end,
namely COVASTOL and RELIEF. The COVASTOL code has been developed in the
framework of a more general study on the in-time estimation of nuclear reactor pressure
vessels' residual lifetime and failure probabilities. It is based on the application of
probabilistic linear elastic fracture mechanics to statistical distributions of data
concerning flaws, material properties and loading conditions (see Dufresne et al., 1988).
The RELIEF code is based on the representation of the process of damage accumulation
as a semi-Markovian stochastic process; no assumptions are made about the elementary
mechanisms causing the accumulation of damage. The latter approach will be presented
in the next section.
The COVASTOL code estimates the FCG rate by the Paris law with statistically distributed
coefficients. The probability of onset of unstable crack propagation is estimated through
the convolution of the distributions of the stress intensity factor and of the material
resistance expressed by the static fracture toughness. The great advantage of this model
is its simplicity, although tests are necessary to determine the coefficients m and C. It
should, of course, be kept in mind that the Paris relationship does not generally describe
correctly the behavior of cracks in the nucleation stage or near fracture; for small ΔK, for
example, the propagation rate is overestimated (Nisitani et al., 1992). However, it should
be pointed out that no model describes the crack propagation phenomenon in its entirety.
Under these conditions, the definition of at least three ranges of ΔK should allow more
accurate FCG predictions. In that respect it is also certain that the different methods of
treating the original (a, N) data introduce a scatter connected to the more or less
pronounced importance of the subjective factor in each method. The method in the
COVASTOL code is as follows (see also the ASTM E647-83 standard):
• starting with the experimental data in each of the ranges considered, (da/dN)_mean is
computed for a certain number of ΔK_i levels;
• a linear regression is performed to determine, according to these values, the
parameters m and C of Paris' law for the ΔK ranges considered. In each of
these classes of ΔK a mean value of m is computed and retained, and from this value
the distribution of C is calculated. This distribution is presented in the form of a
histogram of five class intervals.
It is quite important to mention here that in the operations connected with the fatigue
crack growth calculation, a special procedure is implemented for the combination of the
histograms, as follows:
• a given pair of values (or class intervals) a_0, b_0 (elliptical defects are considered) is
combined with every class interval of C (the coefficient of the Paris law) only for the first
stress transient;
• after that, a_0, b_0 are transformed into one pair of histograms a_1, b_1. In the subsequent
transients only combinations among class intervals of the same order are taken into
account.
Concerning the width of the defects, because no data are usually available from
manufacturers, its distribution is calculated by estimating the probability for two or more
defects (assumed of one weld bead width) to overlap, both in the horizontal and the
transversal section.
The defect length and width distributions so obtained correspond to the defects observed
in a weld or a structure after fabrication and before repair, and are corrected
automatically in order to take into account the sample size, the accuracy of the
measurement equipment, the size of acceptable defects according to the construction
rules, and the reliability of the NDT methods (probability of having undetected and
correspondingly unrepaired defects).
To consider all combinations among the a, b and C class intervals at every stress transient
would in fact mean continuously mixing the material properties, whose scattering has, on
the contrary, to be applied only once to the fatigue phenomenon considered as a whole.
The modeling defined above was introduced in the COVASTOL computer code, thus
allowing calculation of the propagation along the two axes of an elliptical defect
subjected to a periodical loading. Temperatures and stresses as a function of location and
time are given as deterministic analytical functions for each situation.
The probability of onset of unstable crack propagation is calculated as the convolution of
the fracture toughness and stress intensity factor histograms. Its evolution is followed during
the stress transients, as well as the evolution of any defect.
The COVASTOL program outputs give, on the one hand, the crack growths and, if needed,
the evolution of the defect size distribution and, on the other hand, the rupture probability
associated with each defect size. The crack growth and rupture probability
computation procedures for internal and surface defects, as well as test cases calculating
the rupture risk of welded steel pressure vessels, are well presented and analyzed by
Dufresne et al. (1986, 1988), to which the reader is referred for details.
The failure probability, when a sophisticated program like the COVASTOL code is not
available, can be calculated by using the limit state function concept. By integrating the
Paris-Erdogan law (6.12) one obtains:

∫_{a_0}^{a_N} da / [F(a)√(πa)]^m = C Σ_{j=1}^{N} S_j^m    (6.21)

where a_0 is the initial crack size and a_N is the crack size after N stress cycles. For a given
critical crack size a_c, failure occurs when a_c - a_N ≤ 0. Hence the limit state function
can be expressed as:

g(Z) = ∫_{a_0}^{a_c} da / [F(a)√(πa)]^m - C Σ_{j=1}^{N} S_j^m    (6.22)

A similar limit state function using the Forman law (6.13) can easily be evaluated. The
failure probability of the structure can be evaluated using the equations of Section 6.3.2
and the statistical distributions of the parameter C (m can be considered constant) and of
the load sequence S_i. All the above analysis concerns mainly FCG under static loading
conditions.
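Under the same kind of assumptions, the failure probability implied by the limit state (6.22) can be estimated by direct Monte-Carlo sampling, as in the following sketch (C random and log-normal, m fixed, and an illustrative random load sequence; none of the numbers come from a real structure):

```python
import numpy as np

rng = np.random.default_rng(2)
m, F = 3.0, 1.0                        # fixed Paris exponent, constant shape factor
a0, ac = 0.009, 0.050                  # initial and critical crack sizes, m
n_cycles = 200_000
S = rng.normal(80.0, 8.0, n_cycles)    # illustrative random stress-range sequence, MPa

# Resistance side of (6.22): psi = integral over a of [F(a)*sqrt(pi*a)]**(-m)
a_grid = np.linspace(a0, ac, 400)
psi = np.trapz((F * np.sqrt(np.pi * a_grid))**(-m), a_grid)

load_term = np.sum(S**m)               # sum of S_j**m over the service life

# Sample C and evaluate g(Z) = psi - C * sum(S_j**m); failure when g <= 0
C = np.exp(rng.normal(np.log(1.0e-11), 0.3, 10_000))
g = psi - C * load_term
print(f"estimated P_f = {np.mean(g <= 0):.4f}")
```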
Zhu and Lin (1992) propose a new analytical procedure to predict the fatigue life and
reliability of a mechanical or structural component with random material resistance to
crack growth and under random loading. The procedure is applicable when the fatigue
crack growth is a slow process compared with the stress process, which is the case for
the high cycle fatigue. In the special case, in which the stress is a narrow-band stationary
Gaussian process and a randomized Paris-Erdogan crack growth law is applicable,
analytical expressions have been obtained for the probability densities of the fatigue crack
size and the fatigue life, and the reliability function. A numerical example is given for the
case of a degrading system. The accuracy of the proposed analytical procedure is
confirmed by comparing the theoretical and simulation results.
Quality of the fatigue life prediction and failure prognosis. From the above
discussion it is clear that the quality of the prediction depends directly on the quality of
the method used to process the raw FCG experimental data and to estimate the
parameters of the probabilistic fracture mechanics model. A poor estimation of the
parameters will lead to an inaccurate prediction of the life-time, even if sophisticated
FCG prediction models are used (Stavrakakis et al., 1990).
The currently used standard method to estimate the parameters, ASTM E647-83, has
several weak points. The determination of the derivative da/dN required by this
method introduces a scatter in the FCG rate data which varies considerably with the data
processing technique used. Thus, a significant variation in da/dN at a given ΔK level is
introduced by the raw FCG data processing technique. This variability, introduced in
the estimated distributions of the FCG model parameters by the raw data processing
technique, leads to a pessimistic structural reliability assessment. Moreover, the ASTM
E647-83 standard method is strictly related, and thus limited, to the application of the
Paris law or Paris-like models to describe the FCG phenomenon.
Stavrakakis (1992) proposes a general method to process the raw FCG data based on
non-linear regression techniques. In this method, the parameters of any probabilistic FCG
rate model are estimated directly from its integral form, namely a = f(N). It is not
restricted to the application of the Paris law for FCG predictions and handles, because of
its generality, any probabilistic FCG rate relationship with any number of parameters.
This method permits a significantly reduced contribution of the raw data processing
method to the variation in da/dN at a given ΔK level. The performance of the method
is evaluated using the integrated computer program COVASTOL for structural reliability
assessment, with the FCG rate model coefficients determined both by the currently used
ASTM E647-83 standard method and by the new technique proposed there.
The two methods were used to process the Virkler et al. experimental data. The Virkler
et al. data comprise 68 replications under constant-amplitude cyclic loading. The
data consist of the number of cycles required to reach 164 crack lengths, starting at
9 mm and terminating at 49.8 mm, for each replication. Center-cracked tension (CCT)
aluminum 2024-T3 test specimens were employed. The 68 sample functions of time (cycles)
to reach a half crack length a are plotted, statistically analyzed and discussed in Virkler
et al. (1979). The non-linear FCG rate model considered was the Paris law, for
convenience.
In order to evaluate the influence of the FCG data processing method on the results of an
FCG prediction program, the COVASTOL program was run for the same initial crack
and stress transient conditions as in the Virkler et al. experiments, namely a_0 = 9 mm and
Δσ = 48 MPa, except for the Paris law parameters, i.e. the mean value of m and the C-
histograms (Stavrakakis, 1992). First, the prediction of the defect propagation after a
service life of 2×10^5 cycles resulting from the COVASTOL program is performed,
with the Paris law parameters derived by the standard ASTM E647-83 method.
Then, the defect propagation prediction after 2×10^5 cycles is calculated by the
COVASTOL program for the same initial and loading conditions, but with the C-
histograms and m-mean values derived by the non-linear regression method. Finally, the real
defect distribution (histogram) after a service life of 2×10^5 cycles, derived directly from
the Virkler et al. experimental data, is given.
A comparison of the predicted defect histogram for the propagated crack length after
2×10^5 cycles with the real defect histogram has shown that, even if the real crack length
classes are predicted, the predicted probability of the upper classes (crack length a
between ~31 mm and ~40 mm) is very high (~50 per cent) compared to reality (less
than ~10 per cent). Moreover, the prediction gives a small probability (~8 per cent) of
fast crack propagation and fracture (crack length a > ~55 mm up to ~71 mm) that does
not exist in reality. This is a quite conservative (i.e. pessimistic) prediction.
A comparison of the predicted defect histogram after 2×10^5 cycles with the real defect
histogram has shown that the predicted probabilities of the different crack length classes
differ by less than 10 per cent from those of reality, and a successful coincidence between
the two histograms occurs.
Thus, it is obvious that even if the variability introduced by the raw FCG data processing
techniques of the ASTM E647-83 standard does not induce a significant amount of bias
in the processed results, it can induce an unacceptable bias in the final FCG prediction and
residual lifetime results, which makes them conservative and thus less realistic.
In the above experimental evaluation the Paris law was used, because this is the case in
the COVASTOL program. This is not restrictive in any way. An analysis examining
the applicability of the unified fatigue crack propagation (FCP) approach, proposed
earlier for FCP in engineering plastics such as PMMA and PVC, is described by Chow
and Wond (1987).
A Paris-like formulation is proposed to characterize FCP in polymeric materials, and it is
found, using measurements, that it is able to assess satisfactorily the FCP in both PMMA
and PVC materials.
In this way, all the considerations of this section can easily be extended, using this
formulation, to assess in-time the FCP phenomenon in polymeric materials and plastic
pipes.
6.3.4 Stochastic process approach for FCG life prediction

In general both the loading actions and the resistance degradation mechanisms have the
characteristics of stochastic processes. They can thus be defined as random variables
which are functions of time. The particular load history which affects a component is one
of the possible realizations of the stochastic load process and the same applies for the
environmental condition or for the evolution of the dimensions of a defect inside the
component.
The prediction of the component lifetime is to a large extent based on the representation
of the stochastic processes which act on the component. The damage accumulation
mechanisms can, in general, be represented by a positive "damage rate" function such
that the measure of damage is a monotonically increasing function of time.
The physical situation to be contemplated is as follows: a structural component is in
operation in a certain environment. During cyclic operation, irreversible changes occur.
These irreversible changes accumulate until the component can no longer perform
satisfactorily. The component is then said to have failed. The time at which the
component ceases to perform satisfactorily is called the time-to-failure or the lifetime of
the component.
The process by which the irreversible changes accumulate is called a cumulative damage
(CD) process. Fatigue, wear, crack growth, creep are examples of physical processes in
which CD takes place.
The particular damage process of interest here is FCG, as experienced for instance in
failures of pressurized mechanical systems whose structure contains defects as a
result of technological operations like welding. The defect dimensions, although
continuous variables, are in fact associated with a discrete level or state, which allows
(without excessive restrictions) the use of well-known mathematical tools for discrete
Markov processes.
The damage levels are represented by the states j = 1, 2, ..., b; b being the conventional
rupture state. The loading process is represented at cycle x by the transition matrix P_x:

      | p_1  q_1  0    ...  0        0       |
      | 0    p_2  q_2  ...  0        0       |
P_x = | ...                 ...              |    (6.23)
      | 0    0    0    ...  p_{b-1}  q_{b-1} |
      | 0    0    0    ...  0        1       |

where p_j, q_j > 0, p_j + q_j = 1; j = 1, 2, ..., b-1.
As the transitions between the states are governed by eq. (6.23), the damage state
(probability) vector at cycle x is linked to that at cycle x-1 by:

p_x = p_{x-1} P_x    (6.24)

and thus,

p_x = p_0 Π_{k=1}^{x} P_k    (6.25)
which describes a unitary jump (UJ) stationary stochastic process.
Relationships (6.23)-(6.25) represent the mathematical basis of the discrete Markov
process; from them one can easily find the probability distribution of the number of
cycles to failure and of the damage level at a given number of cycles x (Bogdanoff and
Kozin, 1985).
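A minimal sketch of eqs. (6.23)-(6.25): build a UJ transition matrix, propagate the initial state vector cycle by cycle, and read off the distribution of the time to absorption (rupture). The number of states and the p_j values are illustrative, not calibrated to any material:

```python
import numpy as np

b = 10                                   # number of damage states; state b is rupture
p = np.full(b - 1, 0.995)                # illustrative per-cycle "stay" probabilities p_j

# Transition matrix (6.23): stay with p_j, jump one state with q_j = 1 - p_j
P = np.zeros((b, b))
for j in range(b - 1):
    P[j, j], P[j, j + 1] = p[j], 1 - p[j]
P[b - 1, b - 1] = 1.0                    # rupture is an absorbing state

# Eq. (6.25) for a stationary loading process: p_x = p_0 P^x
state = np.zeros(b); state[0] = 1.0      # all probability mass in state 1 initially
cdf = []                                 # P(rupture reached by cycle x)
for x in range(5000):
    state = state @ P                    # one application of eq. (6.24)
    cdf.append(state[-1])

median_life = int(np.searchsorted(np.array(cdf), 0.5)) + 1
print(f"median cycles to rupture ~ {median_life}")
```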
The sample functions (SFs), that is, the functions a(N) of each sample of the set from
FCG experiments, are the complete, even if elementary, representation of the damage
process. Starting from a set of SFs and from the first two statistical moments (mean
value and variance of the cycles) related to a given value of the crack size a, the
Markovian model (the above three equations) of the FCG process can be defined. This is
called a unitary-jump (UJ) stationary B-model of CD. The mathematical details of this
operation may be found in Bogdanoff and Kozin (1985). The important point to be
remarked here is the fundamental hypothesis of a Markovian process (that is, the
statistical independence of the damage states).
The Markovian hypothesis characterizes a process "without memory" of past events,
except those which occurred in the time immediately before. This assumption is purely
theoretical, because any damage state depends on the past history (Lucia et al., 1987,
Kozin and Bogdanoff, 1992, Bogdanoff and Kozin, 1985).
The disadvantages of the Markovian assumption are related mainly to an overestimation
of the variance of the predicted time-to-failure distributions, when the initial crack
population is different from the trivial case of a single crack, located at the origin of the
SFs set.
The way to overcome this limitation of the B-model of CD has been suggested by
Bogdanoff and Kozin (1984), who consider the propagation of a population of cracks as
the superposition of many elementary propagation processes, each one starting from a
particular crack size a_k belonging to a given initial distribution. This can be done if one
thinks in terms of many UJ stationary B-models, each starting from crack size a_k and
considering as random variables the differences in cycles (N_j - N_k), where N_k is the cycle
number corresponding to a_k. These new variables constitute a subset of the main random
variable set N_j, defined for every j ≠ k.
The statistical moments of the first and second order corresponding to these variables are
expressed taking into account the statistical dependence between N_j and N_k. The
application of the method of statistical moments to the random variables (N_j - N_k) and/or
(T_j - T_k) (T_j being the holding time in state S_j) for the estimation of the parameters of
each UJ Markovian model implicitly introduces the statistical dependence of the
theoretical random variables N_j (or T_j).
A CD model having these characteristics is called a semi-Markovian B-model, because
the fundamental assumption of an elementary Markovian model is disregarded and the
dependence between the N_j levels is considered. With this difference in mind, the computer
code RELIEF 2.0 was developed at the Joint Research Center of the EEC at Ispra, Italy,
which optimizes the efficiency of the Markovian scheme according to the above
considerations. The calculation of the first and second order statistical moments to
estimate the CD B-model parameters is now included in the code itself, due to the
dependence of this calculation step on the current crack size. In particular, the evaluation
of the covariances has now to be carried out in order to account for the statistical
dependence between the numbers of cycles at the different crack sizes describing the
process.
In their recent work, Kozin and Bogdanoff (1992) propose and study a probabilistic
macro model of FCG based upon a micro result from reaction rate theory. A center
crack panel under periodic tensile load is the basic physical situation considered. The
model's explicit dependence on the temperature and the wave form of the periodic load
indicates the importance of these two quantities in the evolution of the crack length. The
straightforward relation of the semi-Markovian B-model parameters with the parameters
of this probabilistic model has illuminated many of the complexities that are experimentally
observed in the FCG process.
The simplicity and flexibility of models based on Markov schemes is the reason for their
frequent appearance in the literature. In cases where the emphasis is on the
stochastic process of the loads and environmental conditions rather than on the mechanism of
damage accumulation, the traditional techniques for the treatment of processes of this
type become more important. It is in this context that the Caldarola and Bolotin methods
are described representatively by Lucia (1985). Many others can be found in the
literature.
Structural reliability, in its most stringent formulation, can be defined as the
probability that the largest of the loads envisaged is smaller than the smallest of the
resistances hypothesized. This means that what one needs to know is the distributions of
the extreme values of the loads and of the resistances, rather than their effective
distributions. This observation, together with the fact that the possible distributions of
the extreme values of a random variable are asymptotically independent of the
distribution of the variable itself, leads to the consideration of extreme value theory
as a fundamental ingredient of structural reliability.
Some methods, all based on the hypothesis that the lowest resistance has a Weibull
distribution, have been proposed by Freudenthal, Ang and Talreja, and are presented by
Lucia (1985).
In-time failure prognosis and fatigue life prediction of structures 481

A comparative study of the probabilistic fracture mechanics and the stochastic
Markovian process approaches was performed by Stavrakakis et al. (1990). The two
computer codes COVASTOL and RELIEF, based respectively on the randomization of a
differential crack growth law and on the theory of discrete Markov processes, were
applied to fatigue crack growth predictions using the Virkler et al. (1979) and Ghonem
and Dore (1987) sets of crack propagation curves from specimens. The results
are critically analyzed and an extensive discussion is given there on the merits and
limitations of each approach.
The analysis and the results presented in the last two sections permit one to deduce
some propositions on the applicability of the two codes (and consequently of the
corresponding theoretical approaches) for real-time structural reliability assessment.
The COVASTOL code can be used for any real structure FCG prediction and reliability
safety assessment given that any stress transient corresponding to some defined real
conditions can be applied. Code outputs giving the predicted defect distributions could
be more realistic - less pessimistic - if the scatter introduced by the treatment of the raw
(a, N) data could be reduced, i.e. by using the method proposed in Stavrakakis (1992).
For the COVASTOL code numerical experimentation, raw data on defects (number,
positions, dimensions) and material properties (FCG law parameters, static fracture
toughness, crack arrest toughness, etc.) have to be converted into histograms by
preliminary processing.
preliminary processing. Temperatures and stresses as a function of location and time are
given as deterministic analytical functions for each situation. An important advantage of
the COVASTOL code is that it is data-base independent, that is it can be applied for
FCG predictions for stress transients other than those catalogued in the component
material fatigue properties databank. Code outputs give the evolution of defect size
distribution and the rupture probability associated with each defect size. Thus the
COVASTOL code has a very large applicability to problems involving real structures.
The RELIEF code can be used in real situations where a representative databank exists
for well-defined stress transients and environmental conditions. The possibility of applying
catalogued crack-growth processes, obtained under certain elementary loading conditions, to
other more complex situations occurring in real structures is a delicate problem which
should be carefully examined in the particular context, with the approximations decided
by the analyst. This necessity restricts the applicability of the code to structures which
are loaded (even accidentally) with a limited small number of stress transients during
their life.
The RELIEF code is based on the representation of the damage accumulation as a
stochastic process, omitting any analytical fracture mechanics relation. It is important
to recall that the Markovian approach (B-model), as implemented in the RELIEF code,
may be applied to any type of damage process (creep, corrosion, material embrittlement,
etc.) when a representative description of the process is given in terms of an SFs set.
Even though the structural reliability assessment performed by the RELIEF code is more
precise than that performed by the COVASTOL code (smaller scatter), its applicability is
limited with respect to complex real situations.
FCG predictions allowed by the RELIEF code are those concerning the SF sets (same
material, environment conditions, type of load) corresponding to the different loading
intensities (stress transients) which have been catalogued in the databank.

6.3.5 Time series analysis approach for FCG prediction

The principle underlying this methodology is that the fatigue crack growth data (N, a)
occur in the form of a time series where observations are dependent. This dependency is
not necessarily limited to one step (the Markov assumption), but can extend to many steps
into the past of the series. Thus, in general, the current value N_a (= number of cycles at
crack size a) of the process N can be expressed as a finite linear aggregate of previous
values of the process and the present and previous values of a random shock u (Solomos
and Moussas, 1991), i.e.

N_a = φ_1 N_{a-1} + φ_2 N_{a-2} + ... + φ_p N_{a-p} + u_a - θ_1 u_{a-1} - ... - θ_q u_{a-q}    (6.26)
In eq. (6.26), N_a, N_{a-1}, N_{a-2}, ... and u_a, u_{a-1}, u_{a-2}, ... represent respectively the number
of cycles and the value of the random shock at the indexing, equally spaced crack sizes a,
a-1, a-2, ... The random shock u is modeled as a white noise stochastic process, whose
distribution is assumed to be Gaussian with zero mean and standard deviation σ_u
(specified by the structure's random loading conditions).
Defining the autoregressive operator of order p by,

φ(B) = 1 - φ_1 B - φ_2 B² - ... - φ_p B^p

and the moving-average operator of order q by,

θ(B) = 1 - θ_1 B - θ_2 B² - ... - θ_q B^q

eq. (6.26) can be rewritten compactly as,

φ(B) N_a = θ(B) u_a

It is recalled that B stands for the backward shift operator, defined as B^s N_a = N_{a-s}.
Another closely related operator, to be used below, is the backward difference operator
∇, defined as ∇N_a = N_a - N_{a-1} and thus equal to 1 - B.
In an attempt to physically interpret the above equations and connect them to the
observed inhomogeneous crack propagation properties, one could associate the
autoregressive terms with the mean behavior of each individual test curve and the moving-
average terms with the non-smoothness within it, which is due to the inhomogeneity of the
material ahead of the crack tip. In this manner, this spatial irregularity is approximated by
the homogeneous random field u.
The autoregressive moving-average (ARMA) model as formulated above is limited to
modeling phenomena exhibiting stationarity, i.e., broadly speaking, fluctuating about a
fixed mean. Clearly, this is not the case for the fatigue crack growth curves, for which
nonstationary processes will have to be employed. It is possible though that, even under
these circumstances, the processes still possess a homogeneity of some kind. It is usually
the case that the dth difference of the original time series (or of a nonlinear transformation
of it) exhibits stationary characteristics. The previous ARMA model can then be
applied to the new stationary process ∇^d N, and eq. (6.26) will correspondingly read,

φ(B) ∇^d N_a = θ(B) u_a    (6.27)

This equation represents the general model used here. Clearly, it can describe stationary
(d = 0) or nonstationary (d ≠ 0), purely autoregressive (q = 0) or purely moving-average
(p = 0) processes. It is called an autoregressive integrated moving-average (ARIMA) process
of order (p, d, q). It employs p + q + 1 unknown parameters φ_1, ..., φ_p; θ_1, ..., θ_q; σ_u, which
will have to be estimated from the data.
Expecting that the fatigue crack growth curves would eventually reveal some stationary
characteristics, the task of estimating the aforementioned unknown parameters is
undertaken below. A phenomenological theoretical model will thus be built, identifying
the mechanism of crack propagation under certain loading and geometrical conditions.
An outcome of direct practical importance will evidently be the possibility of forecasting
the future behavior of the series N_a from its current and past values. This, of course, will
be expressed in a probabilistic manner, in the form of a distribution.
Elaborating briefly on the terminology, if the values of N are known up to a current crack
size a and a prediction of N is desired for t steps ahead (i.e., at crack size a+t), then
one refers to "origin a", "lead time t" and "forecasted value N_a(t)". The methodology
employed is capable of providing, beyond a "best" value of the forecast, probability limits
on either side of it for a set of convenient values, for example 50%, 95%. If a+t is
chosen to represent a critical value of the crack size, these forecasted results will
obviously yield the distribution of the time-to-failure.
In the FCG type of series one expects relationships to exist (i) between observations of
successive numbers of cycles in a particular record (the previously tackled problem); and
(ii) between observations for the same crack size in successive records. Starting from the
ARIMA model, it can be deduced that a seasonal series can be mathematically
represented by the general multiplicative model (Solomos and Moussas, 1991)

φ_p(B) Φ_P(B^s) ∇^d ∇_s^D N_A = θ_q(B) Θ_Q(B^s) u_A    (6.28)

In this equation the parameters p, d, q and the operators φ_p(B) and θ_q(B) are exactly as
those defined for the ARIMA model and refer to the aforementioned point (i), while ∇_s
= 1 - B^s, and Φ_P(B^s) and Θ_Q(B^s) are proper polynomials in B^s of degrees P and Q, respectively,
representing relationships of point (ii) above. This multiplicative process is said to be of
order (p, d, q)×(P, D, Q)_s.
The building of the model for a specific physical problem is composed of the same
steps: identification, estimation, diagnostic checking. The general scheme for determining
a model includes three phases, which are:
• Model identification, where the values of the parameters p, d, q are defined.
• Parameter estimation, where the {φ} and {θ} parameters are determined in some
optimal way, and
• Diagnostic checking, for controlling the model's performance.
As is stated, however, by Box and Jenkins (1976), there is no uniqueness in the ARIMA
models for a particular physical problem. In the selection procedure, among potentially
good candidates, one is aided by certain additional criteria. Among them are Akaike's
information criterion (AIC) and Schwartz's Bayesian criterion (SBC). If L = L(φ_1, ..., φ_p,
θ_1, ..., θ_q, σ_u) represents the likelihood function formed during the parameter estimation,
the AIC and SBC are expressed, respectively, as,

AIC = -2 ln L + 2k
SBC = -2 ln L + ln(n) k    (6.29)

where k is the number of free parameters (= p + q) and n the number of residuals that
can be computed for the time series. Proper choice of p and q calls for a minimization of the
AIC and SBC. Last, in the overall efficiency of the model, the principle of parsimony
should be observed. Inclusion of an excessive number of parameters might give rise to
numerical difficulties (ill-conditioning of matrices etc.), and might render the model too
stiff and impractical.
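In current practice this identification loop can be sketched with the statsmodels library, whose ARIMA class reports both criteria (AIC and BIC, the latter being the SBC above); the series below is simulated only to make the example self-contained, standing in for a real N(a) record:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulated nonstationary record standing in for cycles N at equally spaced crack sizes
rng = np.random.default_rng(3)
N_series = np.cumsum(1000 + rng.normal(0, 50, 164))   # monotone, like N(a)

best = None
for p in range(3):                        # candidate ARIMA(p, 1, q) structures
    for q in range(3):
        try:
            res = ARIMA(N_series, order=(p, 1, q)).fit()
        except Exception:
            continue                      # skip orders that fail to converge
        if best is None or res.aic < best[0]:
            best = (res.aic, res.bic, (p, 1, q))

aic, bic, order = best
print(f"selected order {order}: AIC = {aic:.1f}, SBC/BIC = {bic:.1f}")
```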
Model building. It is well known that the ARMA model (6.26) can be written for
identification purposes in the form of an observation equation as follows (see also
Chapter 3):

y = N_a = [N_{a-1} N_{a-2} ... N_{a-p}  u_a  -u_{a-1} ... -u_{a-q}] [φ_1 ... φ_p  1  θ_1 ... θ_q]^T = u^T θ    (6.30)
An ARIMA model can also easily be written in a similar form, considering the new
stationary process ∇^d N in the place of the process N.
On the other hand, as mentioned earlier in Section 6.3.3, the Paris-Erdogan and Forman
logarithmic FCG equations (6.12) and (6.13) are the most suitable for accurate FCG
prediction purposes, because they can model satisfactorily the curves of fig. 6.17. The
experimental points of fig. 6.17 do not form exactly a straight line. However, straight
lines modeled by the Paris-Erdogan and Forman logarithmic FCG equations (6.14) and
(6.15) of Section 6.3.3 can adequately represent large portions of them. The logarithmic
equations (6.14) and (6.15) can also be rewritten in observation form as follows:

y = [1  x] [ln(C)  m]^T = u^T θ    (6.31)
The same considerations are obviously valid for the FCG laws of crack length as an
exponential function of the number of accumulated cycles, presented before.
It can therefore be claimed that quite efficient linear regression models for the fatigue
crack growth phenomenon have been constructed. In addition, they have the advantage
of being compact, easily presentable and implementable. They can thus serve in practical
situations, as they can readily furnish updated predictions of a component's residual
lifetime after periodic inspections.
Every such model is built based on the primary form of information of the crack growth,
i.e. the (N, a) sample functions, and consequently is suitable for a specific set of
geometric and loading conditions. The possibility of utilizing the same model under
different conditions, or of attaching physical significance to its parameters, can also be
envisaged.
In particular, if one considers moving windows of data of appropriate length, iterative
regression techniques can be used to track the varying conditions. In this way an adaptive
prediction method is introduced by Stavrakakis and Pouliezos (1991), which is
especially desirable in such cases, since the parameters of the logarithmic Paris-Erdogan
(6.14), logarithmic Forman (6.15), ARMA, ARIMA and logarithmic exponential FCG
models (6.17) change with time (number of cycles), due to the continuous variation of
the conditions related to the FCG process (stress transients, random overloads,
temperature, material properties, inspection technique variability, etc.).
To denote explicitly the dependence of the estimated parameters of the various
regression models (6.14)–(6.17) on the number of cycles, the observation equation
derived before for the various model cases may be written more accurately as,

$$y = \mathbf{u}^T\boldsymbol{\theta}(N) \qquad (6.32)$$
For n pairs of (a(N), N) experimental points, the well known linear least-squares
regression formula gives,

$$\hat{\boldsymbol{\theta}} = (U^T U)^{-1} U^T \mathbf{y} \qquad (6.33)$$

where U, y hold the information for the whole set of data.
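As an illustration, the one-shot formula (6.33) applied to the logarithmic Paris-Erdogan observation form (6.31) reduces to a few lines of linear algebra. The sketch below is only indicative; the measurement arrays dadn and dk are hypothetical, and base-10 logarithms are assumed.

# One-shot least-squares fit (6.33) of the logarithmic Paris-Erdogan
# form (6.31): y = log(da/dN), u^T = [1  log(dK)], theta = [log C  m]^T.
import numpy as np

def fit_log_paris(dadn, dk):
    y = np.log10(dadn)
    U = np.column_stack([np.ones_like(dk), np.log10(dk)])
    theta, *_ = np.linalg.lstsq(U, y, rcond=None)   # (U^T U)^-1 U^T y
    log_C, m = theta
    return 10.0 ** log_C, m                          # Paris coefficients C, m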


Iterative methods that update the estimate whenever new information is available can
also be used. For accurate detection purposes, a moving window regression formula is
more appropriate, since it is more sensitive to parameter changes during the variation of
the thermomechanical conditions of the structures. As shown in Stavrakakis and
Pouliezos (1991), a moving window estimate is given by the following recursive
equations (see also Appendix 3.A):
$$\hat{\boldsymbol{\theta}}(k+1) = \hat{\boldsymbol{\theta}}(k) - P(k+1)\left[T(k+1)\,\hat{\boldsymbol{\theta}}(k) - \boldsymbol{\delta}(k+1)\right]$$
$$P^{-1}(k+1) = P^{-1}(k) + T(k+1) \qquad (6.34)$$

where,
$$T(k+1) = \mathbf{u}(k+1)\mathbf{u}^T(k+1) - \mathbf{u}(k-n_w+1)\mathbf{u}^T(k-n_w+1)$$
$$\boldsymbol{\delta}(k+1) = \mathbf{u}(k+1)y(k+1) - \mathbf{u}(k-n_w+1)y(k-n_w+1)$$
and $n_w$ is the window length.
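A direct transcription of (6.34) into code is straightforward. The sketch below implements one window step literally, with an explicit matrix inverse for clarity; a numerically hardened implementation would propagate P itself recursively instead.

# Minimal sketch of one step of the moving-window recursion (6.34).
import numpy as np

def mw_update(theta, Pinv, u_new, y_new, u_old, y_old):
    """Add the newest pair (u_new, y_new), drop the oldest (u_old, y_old)."""
    T = np.outer(u_new, u_new) - np.outer(u_old, u_old)
    delta = u_new * y_new - u_old * y_old
    Pinv = Pinv + T                                  # P^-1(k+1)
    theta = theta - np.linalg.inv(Pinv) @ (T @ theta - delta)
    return theta, Pinv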
The proposed FCG prediction algorithm consists of the following steps:
Step 1: Compute θ̂(n_w) for the first n_w pairs of (ln a, N) data, using the one-shot linear
least-squares regression formula and one of the FCG linear regression models
proposed before. If an ARMA or ARIMA model is adopted, the AIC and/or
SBC criteria (6.29) must be applied off-line using a large amount of raw FCG
data in order to determine the appropriate structure of the model for the specific
case.
Step 2: Process the pair of data coming from the next inspection using the moving
window regression formulae.
Step 3: Estimate the one step ahead predicted value for a(N) using the adopted model
equation(s) and θ̂(N). The value of N (number of cycles) used in this one step
ahead predictor must be the number of cycles for the next inspection according
to the inspection-maintenance schedule of the structure.
Step 4: The predicted value of a(N) is checked against the predetermined critical crack
length threshold a_c. If a(N) ≥ a_c an emergency condition is declared and
appropriate action should be taken, otherwise go to step 2.
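A compact rendition of steps 1 to 4, reusing the mw_update() fragment above, might look as follows; the inspection data arrays, the exponential model ln a(N) = C* + m₃N and the window length are illustrative assumptions.

# Sketch of the prediction loop (steps 1-4); assumes mw_update() above.
import numpy as np

def monitor(a_obs, N_obs, a_c, nw=4):
    U = np.column_stack([np.ones(nw), N_obs[:nw]])   # u^T = [1  N]
    y = np.log(a_obs[:nw])
    theta = np.linalg.lstsq(U, y, rcond=None)[0]     # step 1: one-shot LS
    Pinv = U.T @ U
    for k in range(nw, len(N_obs) - 1):
        theta, Pinv = mw_update(                     # step 2: window update
            theta, Pinv,
            np.array([1.0, N_obs[k]]), np.log(a_obs[k]),
            np.array([1.0, N_obs[k - nw]]), np.log(a_obs[k - nw]))
        a_pred = np.exp(theta @ [1.0, N_obs[k + 1]]) # step 3: 1-step prediction
        if a_pred >= a_c:                            # step 4: threshold check
            return N_obs[k + 1], a_pred              # emergency condition
    return None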
The estimator variance P(k+1) is automatically updated by the above moving window
procedure, thus the predicted defect variance can be easily calculated. It is then possible
to determine in real time, i.e. at any number of cycles N, the probability of structural
failure, i.e. the probability U(N) that an inadmissible failure state will occur

$$U(N) = P[a(N) \geq a_c] = 1 - P[a(N) < a_c] = 1 - R(N) \qquad (6.35)$$
where R(N) is the structural reliability function.

The FCG law ln a(N) = C* + m₃N (see Section 6.3.3, eqs. (6.17)) is fitted by Stavrakakis
and Pouliezos (1991) into the Virkler et al. (1979) data, using the linear moving
window regression technique described before. The "deterministic" value of the
parameter m₃ is estimated to be 6.89×10⁻⁶ and the mean value and variance of the
parameter C* are estimated as 1.94 and 7.67×10⁻³ respectively.
In this case, the failure probability of the structure or component can be calculated in
closed form as follows (see for details Stavrakakis and Pouliezos, 1991):

$$U(N) = \Phi\!\left(\frac{N - A}{B}\right) \qquad (6.36)$$

where $\Phi$ is the standard normal distribution function and the parameters A, B are:

$$A = \frac{\ln a_c - E\{C^*\}}{m_3}, \qquad B^2 = \frac{\operatorname{var}\{C^*\}}{m_3^2}$$

Parameter A represents an estimation of the mean number of cycles in order to attain the
critical crack length a_c.
Parameter B determines the role of the quality of the product, i.e. variability of the
properties of the material and of the loading and thermal conditions, or the measurement
error introduced by the crack detection method.
The failure probability of a cracked aluminum 2024-T3 structure, under the loading
conditions of the Virkler et al. (1979) experiment, can be calculated using the above
equations. The failure probability for Virkler's experiment at N=2×10⁵ cycles and for a
critical crack length a_c=32.68 mm is found, by applying the above equations, to be
U(200000)=0.0274. Parameters A, B were found to
be 224958.15 and 12715.5 cycles respectively. From the propagated crack length
histogram at N=2×10⁵ cycles derived directly from the Virkler et al. (1979) experiment
the same probability is evaluated as U(200000)=0.0294. This represents a discrepancy of
6.8%.
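The closed-form probability (6.36) is trivial to evaluate numerically. The fragment below does so with the standard normal distribution function expressed via the error function, using the A and B values quoted above; the exact figure obtained depends on the rounding of these inputs.

# Sketch: evaluating the failure probability (6.36), Phi((N - A)/B).
from math import erf, sqrt

def failure_probability(N, A, B):
    z = (N - A) / B
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))          # standard normal CDF

# failure_probability(2e5, 224958.15, 12715.5) is of the order of a few
# percent, consistent in magnitude with the values quoted above.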
The usefulness of the moving window method is illustrated using one set of Virkler's
data. Simulation runs for the one-step ahead predictor indicated that the optimum
window length was n_w=4. This produced a maximum absolute prediction error of 0.23
over the whole range of data. If predictions of longer horizon are required, simulation
runs could establish the corresponding optimum window length.
In cases where crack length measurements are available on-line using appropriate
hardware equipment, the recursive nature of the method makes it suitable for an
integrated automatic safety alarm system.
Autoregressive integrated moving-average processes have been employed by Solomos
and Moussas (1991) for the modeling of the number of cycles over the crack size for the
fatigue crack propagation phenomenon. Even though no perfect stationarity conditions

have been obtained in the treatment of the Virkler et al. (1979) records, an overall good
performance of the derived models has been observed. It has been found that a single
record can be reproduced satisfactorily by an ARIMA process of order (p, d, q)=(2, 3,
1). The quality of the forecasts depends upon the origin; an early origin allows for short
forecasts while a later origin yields unconditionally good forecasts. A multiplicative
ARIMA process of order (p, d, q)×(P, D, Q)_s = (1, 2, 1)×(0, 1, 1)₈₉ has been found to
represent very efficiently the whole set of the fatigue crack records. Its forecasting
capabilities are excellent both at reproducing existing data, and at the monitoring and
prediction of new experiments.

6.3.6 Intelligent systems for in-time structural damage assessment

As it has already been discussed, the mathematical FCG models available for representing
the relevant physical processes are only approximate representations of the physical
reality, having peculiar, but often ill-defined characteristics of precision, sensitivity and
range of validity. Furthermore, they do not constitute an exhaustive representation of the
reality. The knowledge to be used, related to various fields, is not fully representable by
algorithms or mathematical tools but contains also qualitative and heuristic parts. Any a
priori estimate of the life span distribution of a structure shows, therefore, quite a large
scatter which can be progressively reduced by using proper updating techniques.
Traditional algorithmic approaches are unable to cope with such a complex context.
Expert systems are, potentially, the breakthrough. Expert systems, roughly consisting of
a procedure for inferring intermediate or definitive conclusions on structural damage and
remnant lifetime, using the domain knowledge and the accumulating service data, can
deal with real world problems by properly incorporating all the knowledge which may
become available. An expert system for structural reliability assessment must have the
ability to analyse and interpret large quantities of information in order to achieve the
following goals:
• Identification of the actual state of the structure and of the damage process actually
taking place.
• Prediction of the future behavior of the structure.
• Decision and planning of appropriate actions.
The backbone of the expert system can be thought of as a coordinator and manager of
operators which mutually collaborate and supply the information the system needs. Each
step of the assessment procedure (e.g. defect population identification, material
properties selection, microdamage analysis, macrodamage analysis, etc.) can constitute
one operator or be subdivided into more specialized operators. The user can exploit
interactively the functions performed by the operators. Rules and decision criteria can be
modified under a set of metarules. The modular array allows an easier representation of
the base of knowledge and an incremental construction of the system (see also Chapter 4
and Jovanovic et al. (1989)).

An expert system for assessing damage states of structures will consist of an interpreter,
data-base and rule-base. All the rules involved are described through production rules
with certainty factors. The inspection results are used as the input data. The inspection
results regarding cracks are firstly input into the system; rules concerning their damage
degree, cause and expanding speed are implemented to provide a solution for the damage
assessment. This inference procedure is performed as shown in fig. 6.18.
The uncertainties involved in the input data and rules can be taken into account by
introducing certainty factors. Damage pattern, damage cause and deterioration speed are
employed to interpret the inspection data from the multi-aspect point of view.
Certainty factor. Most of the data available in the damage assessment generally include
certain kinds of uncertainty, and experience-based knowledge may be vague and
ambiguous. Thus, an expert system should have the ability to treat these uncertainties in
a logical manner. The certainty factor is calculated hereafter. Input data and production
rules are written as follows, with certainty factors:
Data 1: C₁; Data 2: C₂; …; Data p: C_p
IF Ant. 1, Ant. 2, …, Ant. m
THEN Con. 1: C₁', Con. 2: C₂', …, Con. n: C_n'
where Ant. and Con. denote antecedent and conclusion, respectively, and C_p and C_j' are
certainty factors. p, m and n are the numbers of input data, antecedents and conclusions,
respectively. At execution of the inference procedure using the rules, including the
certainty factors, the following must be done:
1. Calculate the certainty factor for the resultant antecedent.
2. Calculate the certainty factor for the resultant conclusion.
3. Determine the final conclusion and calculate its certainty factor for more than two
rules which provide the same conclusion.
One can employ the following calculation methods corresponding to the items above:
1. C_in = min(C₁, C₂, …, C_m), where C_in is the certainty factor for the resultant
antecedent.
2. C_out,k = C_in × C_k', where C_k' is the original certainty factor for the k-th conclusion and
C_out,k is the certainty factor of the k-th output.
3. The certainty factor C for the final conclusion is calculated as follows, using C_out,k:
C = max(C_out,1, C_out,2, …, C_out,k)
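These three steps amount to a min/product/max scheme that can be stated in a few lines. In the illustrative sketch below, rules is a hypothetical list of (antecedent CFs, rule CF) pairs that all support the same conclusion; the numbers in the usage comment are made up for illustration.

# Sketch of the CF combination steps (1)-(3) above.
def combine_cf(rules):
    outs = []
    for antecedent_cfs, rule_cf in rules:
        c_in = min(antecedent_cfs)        # step 1: resultant antecedent
        outs.append(c_in * rule_cf)       # step 2: resultant conclusion
    return max(outs)                      # step 3: final conclusion

# Example: combine_cf([([0.9, 0.7], 1.0), ([0.5], 1.0)]) -> 0.7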
Suppose that inspection data are as given in Table 6.2.

[Flowchart of fig. 6.18: input inspection data → rules for damage level → occurrence time of cracks → rules for causes of cracks → rules for propagation of cracks → rules for damage pattern → rules for damage propagation pattern → output damage pattern, damage cause and damage propagation pattern.]

Figure 6.18 The expert structural damage assessment inference process.

Table 6.2 Example of inspection data

Inspection item        Result         CF
Direction of cracks    2 directions   0.9
Width of cracks        Middle         0.5
Interval of cracks     Small          0.7
Fracture               Large          0.5

By dividing the data-base and rule-base into several groups, it becomes possible to
reduce the execution time which is proportional to the number of available rules.
Fig. 6.19 shows examples of rules for the inference process.

(damage-degree-2-1
  if
    (direction-of-cracks 2-directions =CF1)
    (width-of-cracks middle =CF2)
    (interval-of-cracks small =CF3)
  then
    (*deposit (damage-degree A (*times 1.0 (*min =CF1 =CF2 =CF3)))))

(damage-degree-4-1
  if
    (fracture large =CF1)
  then
    (*deposit (damage-degree A (*times 1.0 (*min =CF1)))))

Figure 6.19 Examples of rules for the damage degree of reinforced concrete bridge decks.

In practice, the values of the certainty factors involved in the input data and production
rules are given by an expert who has been engaged in maintenance work for more than
20 years. First, matching succeeds in the rule (damage degree 2-1), where 0.9, 0.5 and
0.7 are prescribed for =CFI, =CF2, and =CF3, respectively. The symbol = denotes CF;
is a variable. Second, Cin is calculated as 0.7, using step (1), i.e., C;n=min(=CFI, =CF2,
=CF3). According to step (2), Coutk is obtained as 0.7 from 0.7xl.O. This leads to the
conclusion that damage state is A ~th CF=0.7. Similarly, the rule (damage degree 4-1)
leads to another conclusion that the damage state is A with CF=0.5. From these two
conclusions, the final conclusion is that damage state is A with CF=0.7, using step (3).
In the MYCIN approach (see Chapter 4), the certainty factors are formally defined,
extensively tested and correct results/diagnoses have been obtained in many
circumstances.
Evaluation method. Usual damage state evaluation is based only on the information
obtained from visual inspection. If one desires a high accuracy in its evaluation, the
damage degree ought to be classified into several categories, although too many categories
may induce contradictions among the assessments made by different individuals and make the
classification meaningless. To increase the evaluation accuracy, one can introduce three
damage measures: damage pattern, damage propagation pattern and damage cause. An
appropriate damage pattern is chosen among prescribed basic damage patterns. Similarly,
the most probable damage propagation pattern is determined by using the inference
results of the crack occurrence time, crack pattern, cause of crack, and serviceability of
the concrete deck. Basic damage patterns are determined by considering the following:
• Pattern 1: Severe damage is seen all over the structure.
• Pattern 2: Severe damage is concentrated at the structure edges.
• Pattern 3: Severe damage is concentrated at both ends of a structure component.

• Pattern 4: Severe damage is concentrated at the overhang portions of the structure (if
these portions exist).
• Pattern 5: Severe damage is concentrated in the structure's center region.
• Pattern 6: Severe damage is not seen all over the structure.
To demonstrate the usefulness of the expert system in FCG real-time assessment, a plate-
girder bridge with four main girders and seven cross beams is employed by Shiraishi et
al. (1991).
A large number of rules useful for the damage assessment could be acquired through an
intensive interview with well-experienced engineers on repair and maintenance work.
The use of certainty factors can lead to a reliable conclusion using vague and ambiguous
data and rules. Introducing the three damage measures such as damage pattern, damage
propagation pattern and damage cause, it is possible to give useful information to predict
the change of structural durability in the future.
The damage causes are estimated on the basis of damage degree, damage pattern, and
loss of serviceability and the estimation is important to clarify the occurrence mechanism
of damage as well as useful for establishing an efficient repair and maintenance program
(see Lucia and Volta, 1991).
Recently Vancoille et al. (1993) have developed a new module that explicitly deals with
corrosion troubleshooting. During the development of this module it was observed that
expert systems are not always suited to carry out part of the tasks involved in corrosion
troubleshooting. Therefore, the possibilities of neural networks were investigated. It was
realized that they have some potential that might open completely new perspectives in
dealing with problems where expert systems tend to fail. The combination of expert
systems and neural network techniques gives rise to powerful architectures that can be
used to solve a wide range of problems.
In cases where conventional analytical techniques cannot provide a useful means for the
evaluation of system reliability, techniques based on expert opinions may be used until
such time that either performance data can be obtained and/or mathematical modeling of
system reliability along with adequate field or laboratory data can be used. The expert
opinion technique can also be used in conjunction with an analytical approach in cases
where the performance data are sparse but system failure modes are well known
(Mohammadi et al., 1991).
Specific examples of engineering systems for which the expert opinion approach can be
used in lieu of acquiring data from conventional sources are given next.
Bridge inspection. In this problem, the evaluation of bridge components, i.e.,
determination of their levels of deterioration and extent of damage is conducted by
experts (bridge inspection personnel). The results of an inspection are then verified,
analyzed and used along with structural analyses to arrive at a specific rating for a given
bridge. The rating is indicative of the level of structural integrity of the bridge.

Interior gas piping systems. Interior gas piping systems operate under low pressure,
1.75 to 14.00 kPa (0.25 to 2.0 psi). Under normal operating conditions, the internal
stresses are low and do not impose any safety problems. However, there are many
factors (such as poor installation practice, component malfunction, loose joints due to
external factors, etc.) that can contribute to system failure resulting in a leak. An expert
opinion approach can effectively be used (i) to identify components' modes of failure; and
(ii) to compile system performance data for reliability evaluation purposes (Mohammadi
et al., 1991, Sandberg et al., 1989).
Human error. The impact of human error on the reliability of an engineering system is
another problem that may be investigated using the expert opinion approach. One typical
example is fabrication errors occurring during construction of a facility. Identification of
factors that may ultimately promote structural failure and evaluation of the likelihood of
occurrence of such factors can be done using the expert opinion approach.
In the above three examples the objective is well defined, i.e., the objective is to acquire
information on the performance of a system and to determine its reliability. In certain
non-engineering areas, however, the objectives may be unknown or not clear. Thus a
separate expert opinion survey may be used only to arrive at a set of objectives and
attributes to the problem being investigated.
In engineering problems, because the objectives are often well known, the expert opinion
approach becomes simply a data collection process that can be used for one or more of
the following tasks:
• Identification of failure modes in terms of component or system performance.
• Establishment of statistics or occurrence rates for individual modes of failure.
• Fault-tree and event-tree analyses and identification of the sequence of events
(scenarios) whose occurrence would lead to the formation of a top event (in fault
tree analysis) or a series of consequences (in event tree analysis).
The general process of the expert opinion method is very much dependent on the type of
problem. As described earlier, in cases where the problem's objectives are well defined
and the parameters influencing these objectives are also known, the procedure
degenerates to a data collection scheme for ranking or scaling the objectives and their
associated parameters. Many engineering problems fall under this category and represent
cases each with a limited number of well defined objectives. Each objective may then be
expressed with a performance level and a series of attributes. In other extreme cases
where uncertainties exist in specific objectives and their attributes, the expert opinion
approach may become very complicated. Generally problems associated with societal or
economics issues fall under this category. In such cases the method may have to be
repeated for several rounds before a final decision on the objectives can be made.
The following list presents the basic elements of the method and can well be expanded
for certain cases.
1. Discuss why the expert opinion approach is employed instead of other methods.

2. Identify a series of objectives in the study. If the objectives are not well defined, a
separate expert opinion approach may be used to arrive at definite objectives.
3. Solicit expert opinions for ranking or scaling these objectives. At this stage the final
refinement of the rankings may be done in more than one round if time and money
permit and especially if a somewhat large discrepancy in the opinions is observed.
4. Summarize the findings in a form that can be used as a mathematical tool for the
system risk analysis or merely as a support document. The findings may also be
evaluated using statistical methods. Of course, prior to these steps, experts must be
identified.
A case study is presented by Mohammadi et al., (1991), to demonstrate the applicability
of the expert opinion approach in system reliability evaluation. In this case study, the risk
associated with leak development in several interior gas piping systems is evaluated and
the results are presented. The structure considered in the case study is a simple system
made of components with binary modes of failure. For more complicated structures with
multiple independent and/or dependent modes of failure the reliability formulation and
evaluation of results require additional analyses including the translation of the expert
opinion data into numerical values that can be used in the formulation of the individual
modes of failure. One objective of the case study presented there was to compare an
existing system (black steel piping system) with a new product (corrugated stainless steel
tubing). In the absence of reliable performance data on these systems the expert opinion
approach was employed. As demonstrated in this example, the approach offers an
effective method in the analysis of system reliability of each system and the evaluation
and comparison of the performance of the two systems.
To treat the uncertainty and ambiguity involved in expressions in terms of natural
languages, it is useful to introduce the concept of fuzzy sets.
Garribba et al., (1988), present a specific application of fuzzy measures relevant to
structural reliability assessment for the treatment of imperfections in ultrasonic inspection
data.
Looking from a general point of view at the problem of combining multiple non-
homogeneous sources of knowledge, whilst the structure of the composition problem can
differ from one case to another, the preservation of a general pattern may be supposed.
Thus, the investigation and characterization of this pattern can help to highlight the
nature of the dependencies between the different sources.
Assessment of damaged structures is usually performed by experts through subjective
judgments in which linguistic values are frequently used.
The fuzzy set concept is then used to quantify the linguistic values of the variables of
damage criteria and to construct the rules. Assessments from the same group of experts
may result in rules with the following cases:
1. Similar antecedents and consequents.
2. Similar antecedents but different consequents.
3. Similar consequents but different antecedents.


4. Different antecedents and consequents.
In the case of similar antecedents and consequents (1), fuzzy set operations need not be
used. The total number of similar rules determines the weight of damage levels in the
rules. In the case where several rules have similar antecedents but different consequents,
these rules can be combined. For example, consider a case in which there are five rules
with similar antecedents but three of the consequents indicate that the damage level is
very severe or "DL is VSE" and two others indicate that the damage level is severe or "DL
is SEV". The combined consequents of rules 1 and 2 can be represented by the following:
CONS 1: DL is VSE (0.6) AND
CONS 2: DL is SEV (0.4)
where CONS denotes "consequent" and where (0.6) and (0.4) are obtained from 3/5 and
2/5, indicating the weights of CONS 1 and CONS 2, respectively.
Two rules can have similar consequents but different antecedents, as is shown below:
Rule (expert 1):              Rule (expert 2):
ANT 1: DEQ is VSE AND         ANT 1: DEQ is SEV AND
ANT 2: IOC is VSE             ANT 2: IOC is VSE
where ANT denotes the antecedent, DEQ is the equipment damage level, and IOC is the
injury level of the occupants; then these rules can be combined through the use of an OR-
gate as follows:
ANT 1: DEQ is (VSE OR SEV) AND
ANT 2: IOC is VSE
The antecedents and consequents of the rules may be different:
Rule (expert 1):              Rule (expert 2):
ANT 1: AR is MAJ AND          ANT 1': AR is VSB AND
ANT 2: RT is VLO AND          ANT 2': RT is VLO AND
ANT 3: RC is EXP AND          ANT 3': RC is VEX AND
ANT 4: RA is ABD              ANT 4': RA is ABD
CONS 1: DL is VSE             CONS 1': DL is SEV
where AR, RT, RC and RA are the amount of repair, repair time, repair cost, and resource
availability, respectively. The linguistic values MAJ, VSB, VLO, EXP, VEX, and ABD
denote major, very substantial, very long, expensive, very expensive, and abundant,
respectively.
Then if the rules do not conflict with each other, they will stay as they are. But if
conflicting rules occur as in the ANT 1 and ANT 1', ANT 3 and ANT 3', and CONS 1
and CONS 1' in the above example, a combined rule should be sought through the use of
a fuzzy relation such that R11 = MAJ×VSE, R31 = EXP×VSE, R1'1' = VSB×SEV, and R3'1'
= VEX×SEV, where Rij is the fuzzy relation between ANT i and CONS j; R11 and R1'1'
are contained in the classes of all fuzzy sets of (AR×DL); and R31 and R3'1' are
contained in the classes of all fuzzy sets of (RC×DL).
The combined relations of R11 and R1'1' can be obtained through the use of the
modified combined fuzzy relation method introduced by Boissonnade, which is an
extension of Mamdani's approach which combined all relations through fuzzy disjunctions.
The method uses modified Newton iterations to reach an optimal solution for the
combined fuzzy relations. Details of these techniques can be found in Chapter 4 and
Hadipriono and Ross (1987). Through the use of this method, the combined relations of
R11 and R1'1' yield R111'1'. A similar procedure is performed for R31 and R3'1' to
yield R313'1'. The fuzzy composition between R111'1' and R313'1' results in R131'3',
contained in the classes of all fuzzy sets of (AR×RC). The fuzzy set value for AR and RC
is the projection of R131'3' on planes AR and RC, respectively. The result now yields
two rules with similar antecedents but different consequents. Hence, similar procedures
can be applied as in cases (2), (3) and (4).
A complete rule may require the participation of the three damage criteria. Therefore, the
rules should also be combined to incorporate the functionality, repairability, and
structural integrity of the damaged structure. Zadeh developed the extension principle to
extend the ordinary algebraic operations to fuzzy algebraic operations. One method
based on this principle is the DSW technique introduced by Dong, Shah, and Wong (see
Hadipriono and Ross, 1987). The technique uses the lambda-cut representations of fuzzy
sets and performs the extended operations by manipulating the lambda-intervals. For
brevity, further details of these techniques can be obtained in the above references.
In order to accommodate the effect of each damage criterion on the total damage, in this
study, one can include the weighting factor of each criterion. For example, if the weights
of the damage level assessed, based on the above three damage criteria, are assumed to
be "high" (HIH), "fairly high" (FHI), and "moderate" (MOD), respectively, and the
values of the damage level are DL1, DL2, and DL3, respectively, then the overall
combined damage level becomes,

$$DL_{tot} = \frac{(HIH \times DL1) + (FHI \times DL2) + (MOD \times DL3)}{HIH + FHI + MOD} \qquad (6.37)$$
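As an illustration of how (6.37) can be evaluated with the lambda-cut arithmetic of the DSW technique, the sketch below propagates interval bounds of triangular fuzzy numbers through the weighted average. The membership shapes and all numerical values are hypothetical, not taken from the cited studies.

# Sketch: lambda-cut (DSW-style) evaluation of the weighted average (6.37).
def cut(tri, lam):
    """Lambda-cut [lo, hi] of a triangular fuzzy number (a, b, c)."""
    a, b, c = tri
    return a + lam * (b - a), c - lam * (c - b)

def dl_tot(weights, levels, lam=0.5):
    w = [cut(t, lam) for t in weights]                # HIH, FHI, MOD
    d = [cut(t, lam) for t in levels]                 # DL1, DL2, DL3
    num_lo = sum(wi[0] * di[0] for wi, di in zip(w, d))
    num_hi = sum(wi[1] * di[1] for wi, di in zip(w, d))
    den_lo, den_hi = sum(wi[0] for wi in w), sum(wi[1] for wi in w)
    return num_lo / den_hi, num_hi / den_lo           # interval for DL_tot

# Hypothetical triangular numbers on [0, 1]:
# dl_tot([(0.7, 0.9, 1.0), (0.6, 0.75, 0.9), (0.4, 0.5, 0.6)],
#        [(0.5, 0.6, 0.7), (0.3, 0.4, 0.5), (0.6, 0.7, 0.8)])

Repeating the computation for a grid of lambda values reconstructs the membership function of DL_tot itself.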
Based on the complete rules, new or intermediate rules can be constructed through
partial matching.
Consider the following production rule: "IF deformation (DF) is very severe (VSE),
THEN damage level (DL) is severe (SEV)". When a fact shows that DF is VSE, the
consequent is then realized. However, when the value of DF does not match exactly,
e.g., the fact shows that "DF is SEV", then partial matching is in order.

This can be performed by the following fuzzy logic operations: truth functional
modification (TFM), inverse truth functional modification (ITFM), and modus ponens
deduction (MPD). Brief descriptions of these operations follow:
TFM, first introduced by Zadeh, is a logic operation that can be used to modify the
membership function of a linguistic value in a certain proposition with a known truth
value. Suppose that damage level (DL) is "negligible" or NNE and is believed to be
"false", or FA. This proposition can be expressed as
P: (DL is NNE) is FA; NNE ⊂ DL, FA ⊂ T
where DL is a variable (universe of discourse), T is the truth space, and NNE and FA are
the values of DL and T, respectively. The symbol ⊂ denotes "a subset of". Modification
of this proposition yields,
P': (DL is DL1); DL1 ⊂ DL
where DL1 is a value of DL. A graphical solution is shown in fig. 6.21 where the fuzzy
sets NNE and FA are represented by Baldwin's model (1980), and plotted in figs 6.21.b
and 6.21.a, respectively.
Note that the axes of fig. 6.21.a are rotated 90° counterclockwise from fig. 6.21.b. Since
the elements of FA are equal to the membership values of NNE, they are represented by
the same vertical axis in fig. 6.21. This means that for any given element of NNE, one
can obtain the corresponding element of FA. Also, since the membership values of FA
and DL1 are the same, the membership values of DL1 can be found as shown by the
arrowheads and plotted in fig. 6.21.b.

Figure 6.21 TFM graphical solution.



ITFM is a logic operation that can be used to obtain the truth values of a conditional
proposition. Suppose a proposition, P, is expressed as "damage level is negligible given
damage level is severe"; then the proposition can be rewritten as,
P: (DL is NNE) | (DL is SEV); NNE, SEV ⊂ DL
The ITFM reassesses the truth of (DL is NNE) by modifying this proposition to yield,
P': (DL is NNE) is T1; T1 ⊂ T
where T1 is the new truth value for (DL is NNE). The truth value T1 can also be
obtained through the graphical solution shown in fig. 6.22. Suppose NNE and SEV are
again represented by Baldwin's model. The values NNE and SEV are first plotted as
shown in fig. 6.22.b. Since the truth level is equal to the membership value of NNE, they
lie on the same vertical axis. Hence, for each membership value of NNE, the
corresponding element of T1 is also known. Then too, since the membership value of T1
equals that of SEV, for any given element of both NNE and SEV, one can find the
corresponding element and membership value of T1. The truth value, T1, in fig. 6.22.a is
constructed by successively plotting the membership values of SEV (d1, d2, etc.) from
fig. 6.22.b at each truth level. Note that the axes in fig. 6.22.a are rotated 90°
counterclockwise from fig. 6.22.b.

Figure 6.22 ITFM graphical solution.

Modus ponens deduction (MPD) is a fuzzy logic operation whose task is to find the
value of a consequent in a production rule, given the information about the antecedent. A
simple MPD is: A implies B and given A, then the conclusion is B. Consider again the
proposition: "if deformation is very severe, then damage level is severe", (IF DF is
VSE, THEN DL is SEV). However, suppose further information is available, i.e.,
"deformation is severe" (DF is SEV). These propositions can be represented by the
following:
P: (DF is VSE) → (DL is SEV)
P': (DF is SEV)
where the symbol → represents the implication relation between (DF is VSE) and (DL is
SEV). This example can be conveniently solved through the following graphic
representation. Through the ITFM, P and P' can be combined:
P'': (DF is VSE) is T1 → (DL is SEV)
One can obtain the truth value of (DL is SEV), i.e., T2, through the use of the implication
relation operation introduced by Lukasiewicz (Hadipriono and Ross, 1987). He
incorporated the truth relation, denoted as I, of "if P1 then P2" or "P1→P2". The
parameters of the truth relations, I, are the elements of T2 and T1. These relations, for
different values of the elements of T2, are shown in fig. 6.23.a as parallel lines.
The intersections of I and T1 yield the membership values of T2. Subsequently, the truth
value, T2, can be found and plotted as in fig. 6.23.a. Now (DL is SEV) is T2 can be
modified through the TFM process to give (DL is DL1) in fig. 6.23.b, which concludes
that DL is "close to fairly severe".

Figure 6.23 MPD graphical solution.

Brown and Yao (1983) developed an algorithm to illustrate the effect of qualitative
parameters in existing structures. In their analysis, quality Q_i is used to describe the
condition, such as good, fair, poor, etc., of the i-th parameter or structural component.

This description is based on the inspector's observation. Associated with each
parameter's quality is its consequence C_i, which describes the consequence that this
parameter's quality has on the structure. For example, connections in "poor" condition
may lead to a "catastrophic" consequence. The total effect T is a union of all the
parameters inspected along with their consequences. It can be calculated as

$$T = \bigcup_i (Q_i \cap C_i) \qquad (6.38)$$

and,

$$T(j,k) = \max_i \left[\min\left[Q_i(j), C_i(k)\right]\right] \qquad (6.39)$$

in which Q_i(*) and C_i(*) are the membership or degree of belonging at the numerical
rating * of quality Q and consequence C, respectively, for the i-th parameter; T(j,k) is the
(j,k) element of the total effect matrix T; and the symbols ∪ and ∩ represent,
respectively, the relations union and intersection between two fuzzy events. A fuzzy
relation R is then developed relating the consequence to the safety reduction N. The
safety reduction N describes the level of resistance reduction, verbally, according to the
type of the resulting consequence. For example, a "catastrophic" consequence may lead
to a "very large" safety reduction. The fuzzy relation R can be calculated in the same
manner as the total effect T,

$$R = \bigcup_i (C_i \cap N_i), \qquad R(k,\ell) = \max_i \left[\min\left[C_i(k), N_i(\ell)\right]\right] \qquad (6.40)$$

Once the total effect T and the fuzzy relation R are obtained, a safety measure S can be
computed by combining T with R through the operation called composition,
S = T ∘ R
and,

$$S(j,\ell) = \max_k \left[\min\left[T(j,k), R(k,\ell)\right]\right] \qquad (6.41)$$

in which S(j,ℓ) is the (j,ℓ) element of the safety measure matrix S and R(k,ℓ) is the (k,ℓ)
element of the fuzzy relation matrix R. Using a fuzzifier, which in this case extracts the
element with the largest numerical value in each column of the safety measure matrix S,
yields a safety function F. The columns of the matrix S represent the levels of reliability
reduction. The safety function F shows the degree of belonging for each level of safety
reduction, which corresponds to an increase in probability of failure. This function will give
engineers some idea of the possible reductions of the design reliability after an
inspection has been done. The engineers may use the results to assist them in deciding on the
priority of their actions or in allocating resources in order to maintain the current usage
of the structure.
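In matrix terms the whole chain (6.38)-(6.41) is a pair of max-min operations, as the following sketch shows; the array shapes (parameters × rating levels) and their contents are hypothetical.

# Sketch of the Brown-Yao max-min computations (6.39), (6.41).
import numpy as np

def total_effect(Q, C):
    """T(j,k) = max_i min(Q_i(j), C_i(k)); Q is (i, j), C is (i, k)."""
    return np.max(np.minimum(Q[:, :, None], C[:, None, :]), axis=0)

def compose(T, R):
    """S(j,l) = max_k min(T(j,k), R(k,l)) -- composition (6.41)."""
    return np.max(np.minimum(T[:, :, None], R[None, :, :]), axis=1)

def safety_function(S):
    """Fuzzifier: largest membership in each column of S."""
    return S.max(axis=0)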

An illustration of the Brown and Yao algorithm for structural damage assessment using
fuzzy relation approach can be found in Chou and Yuan, (1992), where a typical rigidly
connected plane frame was analyzed.
The fuzzy relation approach presented by Brown and Yao, (1983), incorporating
qualitative parameters in assessing existing structures was modified by applying a filter to
the total effect T.
Since the fuzzy relation approach presented by Brown and Yao failed to differentiate the
importance of various levels of consequences from the total effect T, a filtering process
was presented by Chou and Yuan (1992), which is used to emphasize the more critical
effects over the minor effects. The total effect T can be modified to
$$T_f(j,k) = \max_i \left\{(k/m)\, T_i(j,k)\right\} \qquad (6.42)$$

in which $T_f(j,k)$ is the (j,k) element of the filtered total effect matrix $T_f$;
$T_i(j,k) = \min[Q_i(j), C_i(k)]$; and m is the total number of numerical ratings used to define
consequence C. Note that this filtering equation assumes that the numerical rating is in an
ascending order of seriousness. That is, 0, 1, 2 and 3 are the numerical rating for an
"insignificant" consequence while 15, 16, 17 and 18 are the rating for a "catastrophic"
consequence. Due to the filtration, the membership values for the less serious
consequence will be reduced substantially. This reduction may lead to a low membership
value in the safety function. However, if one is only interested in the relative degree of
belonging in the safety reduction that the existing structure may have, it would be more
appropriate to normalize the safety function with the highest membership being 1.
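Applied before the max over parameters, the linear filter of (6.42) is a one-line modification of the total-effect computation. In the sketch below, Ti is a hypothetical array of the individual effects min[Q_i(j), C_i(k)].

# Sketch of the filtering step (6.42) with the linear filter of fig. 6.24.a.
import numpy as np

def filtered_total_effect(Ti):
    """Ti has shape (parameters, j, k); rating k ascends in seriousness."""
    m = Ti.shape[2]
    k = np.arange(m)                      # numerical consequence ratings
    return np.max((k / m) * Ti, axis=0)   # T_f(j,k) = max_i {(k/m) T_i(j,k)}

# The safety function derived from the filtered T may then be normalized
# so that its highest membership equals 1, as discussed above.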
A filtering process is applied to the total effect T because the focus here is on the overall
consequence of the structure. Six different filtering processes were considered in order to
determine if the results would alter significantly. The filtering processes are shown
graphically in fig. 6.24. In each process, m is the same as defined for the filtering
equation and k is the numerical rating used for consequence C. The discontinuous
filtering functions (figs. 6.24.b and 6.24.c) yielded unsatisfactory results. The reason is
that the zero slope region of the filtering function has the same effect as no filter. The
results from the remaining filtering processes (figs. 6.24.d, 6.24.f and 6.24.g) were
similar to that obtained from the linear filtering function of fig. 6.24.a.
It was suggested that perhaps a modified membership function for consequence C_i would
be fundamentally more sound. A membership function reflects the degree of belonging of a
numerical rating to a verbal description. An individual consequence C_i is not intended to
represent the integrated effect of a structure. It only contributes to the overall effect (that
is the function of the total effect T). Thus, modifying C_i in general is not desirable. In
rating of 12 and a catastrophic consequence has a rating of 18, then these opinions
should not be altered just because a catastrophic consequence is more serious than a bad
consequence.
Figure 6.24 Types of filtering processes.

Based on the representative cases studied, the filtered fuzzy relation algorithm yields
safety functions F which are in tune with intuition. The concept will enhance the current
practice of relying heavily on the inspector's experience to analyze the qualitative
information. Although a rigid frame was used to illustrate the algorithm presented, the
application of the concept is not constrained to buildings only. It can apply to any
structural system. The information required for the analysis is the condition (through
inspection) of the parameters within the system, and the consequence and effect on the
overall performance of the system associated with the condition of each parameter.
A sensitivity analysis on the shapes, degrees of curvature, shifting, gradient and
maximum membership values was performed. It was found that all but two of the factors
examined have no effect, or only very insignificant effects, on the safety function F. The
most pronounced effects are those due to changes in the gradient and due to shifting. In
the gradient study, the gradient will influence the range of the safety reductions. The
lower gradient value will yield a wider range of the safety reductions with a high
membership value. In the shifting study, the levels of safety reduction having a high
membership value will shift in the same direction as the membership function for
consequence C or safety reduction N.
Based on the results of membership function sensitivity analysis, the safety function F, in
general, is not significantly affected by the preciseness of membership functions
developed for every parameter considered. Thus, the algorithm presented by Chou and
Yuan (1992) has practical applications in assessing aging infrastructure systems with
minimal expert input to establish the necessary membership functions.
The methods presented heretofore are among the most important for the representation
of reasoning under uncertainty. These methods, which the artificial intelligence
community refers to as uncertain inference schemes, contain also Bayesian decision and
causal network approaches. Bayesian decision theory may be described as statistical
inference using the Bayesian position, i.e. a personalistic view of probability. If
probabilities are thought to describe orderly opinions, Bayes' theorem describes how the
opinions should be updated in the light of new information. Two major problems have
been identified with the implementation of Bayesian inference schemes: (1) combinatorial
explosion for realistic networks and (2) the estimation of prior probabilities. Because it
has been shown that humans display characteristics such as representativeness,
conservatism and the gambler's fallacy when dealing with probability, it has been
assumed that a subjective estimation of probability is not meaningful. These results may
be thought of as an impetus for the development of other uncertain inference schemes, in
the sense that alternative methods for modeling the "true" meaning of what humans term
"belief" have been sought. Others have attempted to develop methods for easier
assessment in order to use the Bayesian approach.
The causal network approach is a computationally tractable form of Bayesian decision
theory. A causal network is defined as an acyclical directed graph in which probabilistic
nodes are connected in a causal manner. "Directed" means that the nodes are connected
by arrows and "acyclical" means that the arrows cannot form a circle or cycle in the
graph or network. The connection in a causal manner allows for an easier assessment of
probabilities. In real-world usage, assessment is frequently made in the direction of,
observable → unobservable
or,
effect → cause
which is generally more difficult to assess than going from,
effect ← cause
The latter assessment is made in the causal direction.

A causal network is shown in fig. 6.25. In this figure, the conditional independence
assumption is illustrated.

Figure 6.25 A typical causal network.

The following equations are used to evaluate nodes e, c and d:

$$p(e_k) = \sum_i \sum_j p(e_k \mid c_i, d_j)\, p(c_i, d_j) \qquad (6.43)$$

where $p(e_k \mid c_i, d_j)$ is the probability of $e_k$ conditioned upon $c_i$ and $d_j$. If the events
represented by nodes $c_i$ and $d_j$ are independent, then $p(c_i, d_j) = p(c_i)\,p(d_j)$.

$$p(c_i) = \sum_m \sum_n p(c_i \mid a1_m, a2_n)\, p(a1_m)\, p(a2_n) \qquad (6.44)$$

and similarly for node d in terms of nodes b1 and b2.
It can be seen from the above equations that node e is conditionally independent of nodes
a1, a2, b1 and b2 (because there are no arrows connecting these nodes) given that nodes
c and d are updated using nodes a1, a2, b1 and b2. The condition for the graph to be
acyclical, i.e. it cannot contain any cycles, means that the node can never become
conditional upon itself. It is noted that a consequence of the Bayes theorem is that it
holds in both (arrow) directions.
In some instances, a node y may be conditional upon several, say, n nodes, x_r, where r
= 1, 2, 3, …, n. In order to reduce the assessment of 2^n probability values to n values, a
technique called the noisy OR gate was developed (Reed, 1993). In the noisy OR gate,
the probability of y conditional on n nodes, x_r, r = 1, 2, 3, …, n, is estimated as

$$p(y \mid x_1, x_2, x_3, \ldots, x_n) = 1 - \prod_{r=1}^{n} \left(1 - p(y \mid x_r)\right) \qquad (6.45)$$

In this equation, $p(y \mid x_1)$, $p(y \mid x_2)$, $p(y \mid x_3)$, …, $p(y \mid x_n)$ are assessed and then used to
estimate $p(y \mid x_1, x_2, x_3, \ldots, x_n)$.
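Both the marginalization (6.43) and the noisy OR estimate (6.45) are easily expressed in code. In the sketch below all probability tables are hypothetical placeholders.

# Sketch of the noisy OR gate (6.45) and the marginalization (6.43).
import numpy as np

def noisy_or(p_single):
    """p(y | x1..xn) from the single-parent probabilities p(y | xr)."""
    p = np.asarray(p_single)
    return 1.0 - np.prod(1.0 - p)

def marginal_e(p_e_given_cd, p_c, p_d):
    """p(e_k) as in (6.43), assuming c and d independent;
    p_e_given_cd has shape (k, i, j)."""
    return np.einsum('kij,i,j->k', p_e_given_cd, p_c, p_d)

# Example: noisy_or([0.3, 0.5, 0.2]) = 1 - 0.7*0.5*0.8 = 0.72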
Dubois and Prade (1986) discuss specific features of probability and possibility theory
with emphasis on semantical aspects. They show how statistical data and possibility
theory could be matched. As a result, procedures for constructing weak possibilistic
substitutes of probability measures and for processing imprecise statistical data are
outlined. They provide new insights on the relationship between fuzzy sets and probability
theory.
In this way, fuzzy causal networks can be constructed to improve reasoning with
uncertainty in structural damage assessment. Probability, fuzzy set theory and the
Dempster-Shafer theory have been combined in SPERIL developed by Yao, (1985), and
Brown and Yao, (1983), for evaluating the post-earthquake structural system safety.
The Dempster-Shafer Theory has been developed primarily to model measures of belief
when the probability distribution is incompletely known. This theory enables one to
include the consideration of lack of evidence. Dempster's rule for combining evidence
from different sources is provided in the method. The interval notation used in this
method suggests bounded limits on probability values. Although this method has been
combined successfully with others, the main criticism of it is that many consider it a
generalization of probability.
It should be obvious from the previous presentation that it is not simple to undertake a
definitive comparative study of the various uncertain inference schemes. First of all, it is
clear that the modeling approaches are different in the definitions and assessments of
uncertainty. Secondly, the manner in which the uncertainty(ies) is (are) combined is
different. Converting from one scheme to the other numerically for all cases is not trivial,
if at all possible. However, measures for comparison were defined (i.e. clarity,
completeness, hypothetical conditioning, complementarity, consistency, etc.) and
comparison results of fundamental properties of uncertain inference schemes are
summarized in Reed (1993).

6.4 Application examples

6.4.1 Nuclear reactor safety assessment using the probabilistic fracture mechanics method

Research on incidents occurring with conventional pressure vessels has shown that in
90% of cases, the initial defects were located in a weld. For this reason, the present
analysis is primarily concerned with defects in welds; under-cladding defects have also
been considered in order to evaluate their harmfulness.
Data were collected from 3 European manufacturers: BREDA (Italy), FRAMATOME
(France) and Rotterdam Nuclear (Netherlands). Each manufacturer filled in, for each
weld, a standard form, giving complete information on NDT results (US or X-ray) before
and after repair: instrument calibration, weld size and description and position of the
defect in azimuth, in depth and in relation to the axis of symmetry of the weld. All of this
information was sent in confidence to the Ispra JRC of EEC-Italy which processed and
harmonized the data.
A total of 338 meters of PWR and BWR shell were analyzed. The main conclusions are
as follows (Dufresne et al., 1986, 1988):
• Density of defects: the number of defects per weld varies from 0 to 50 (mean value is
13).
• Position of the defects in the weld: there is no clear distribution of the defects
according to their depth and to their position in relation to the axis of symmetry of
the weld, but, for a given weld, defects are frequently gathered in some limited areas
of the weld, this probably being due to maladjustment of a parameter during the
welding process.
• Length of the defects: the cumulative distribution function before repair shows that,
for defects larger than 20 mm, the log-normal distribution is a good approximation.
With regard to the width distribution of the defects, unfortunately no data has been
obtained from manufacturers. After discussion with experienced welding operators, it
seems that a defect larger than a single pass is very unlikely. Therefore the number and
distribution of defects wider than one pass have been calculated by estimating the
probability for two or more defects to overlap, both in azimuthal and transversal section.
This probability is calculated using the Monte Carlo method.
The defect length and width distributions so obtained correspond to the observed defects
in a weld after fabrication and before repair, but the distribution to be incorporated in the
code must be processed in order to take into account the following factors: the sample
size, the accuracy of the measurement equipment, the reliability of the NDT methods and
equipment, and the size of acceptable defects according to the construction rules.

An overall statistical interpretation of all the available (da/dN) vs. ΔK measurement
points has been made for SA 508 and SA 533 steels, using four laws: Paris, Forman,
Priddle and Walker (see Hoeppner and Krupp, 1974). Numerical coefficients have been
calculated for each of these formulas by linear regression from experimental results and
by several partitions of the measurement range. Paris' law, da/dN = C(ΔK)^m, has been
found to be the most suitable for this first application to reactor pressure vessels. The
values of coefficients C and m have been defined, in 4 different domains, as a function of
the value of ΔK and for two types of environment: air and water. The values of the m
coefficient giving the best determination coefficient have been chosen and the
corresponding C values have been determined in histogram forms. The rate of
propagation of defects under water is higher than in air above a given threshold ΔK_t. It
has been assumed that for the values of ΔK < ΔK_t, the propagation rates in air and water
are identical. The values of ΔK_t are correlated with R (R = σ_min/σ_max).
When an internal defect emerges at the surface, one can assume that it becomes semi-
elliptic, the major axis does not change, and the small axis doubles with respect to its
initial value.
Loading of the vessel has been computed for 22 observable and incidental conditions
(ANS Conditions I and II and second category conditions according to the French
Standard Order dated February 26, 1974). The frequency of occurrence of these
conditions is determined on the basis of statistical data, rather than from "envelope data"
which are used in safety assessments. Fatigue crack growth was calculated according to
the above mentioned assumptions for all defect sizes and for all positions in the
thickness.
Failures can occur in different situations: either under normal, upset, emergency or
faulted conditions, or during pressure tests. In the first case, failure can be due to a through-
crack propagated by fatigue or to an instability when the stress intensity factor becomes
larger than toughness. Evaluation of those phenomena has shown that for the crack
distribution computed at the end of the plant's life the probability of rupture was
negligible.
The second category encompasses emergency or faulted conditions (ANS Conditions III
and IV, and Third and Fourth Category Conditions according to the French Order dated
February 2, 1974). The probability of occurrence of these latter conditions during the life
of the power plant is very low, and they are therefore not taken into consideration for
fatigue crack propagation. Identification and quantification of these conditions have been
performed using event tree analysis, followed by classification into a limited number of
"envelope" faulted conditions to which a probability of occurrence is assigned. Faulted
conditions so defined concern:
• LOCA with different break sizes, different temperatures of the safety injection water
(storage pool water), and different times to operate the core cooling recirculation
system.

• Steam break, with different break sizes, with and without electrical power, and
different times to depressurize the primary circuit.
• Hot and cold overpressure during operation or start-up of the plant.
Concerning the pressure tests, the rupture probability has been computed for a test
pressure of 206 bars and at different temperatures between 40°C and 100°C.
For all the situations thus defined, a thermoelastic analysis is made at different crack
depths. Temperature and the three main stresses are computed at different steps of the
transient, and the stress intensity factor is computed with plastic zone correction.
Different approaches for rupture criteria have been considered: LEFM, EPFM and plastic
instability. After comparison of these three criteria with 141 representative experimental
data it has been concluded that the criterion proposed by Dowling and Townley gives the
best evaluation of fracture conditions.
Toughness distribution as a function of the RT_NDT temperature of a 508 Cl 3 steel has
been calculated from experimental results. The shift in transition temperature (ΔRT_NDT)
due to irradiation has been calculated as a function of the neutron flux and of the
phosphorus and copper content, according to the RG 1.99 formula. From these data, it is
then possible to compute the toughness distribution at any point of the pressure vessel
and at each step of the transient.
The fracture probability is found by searching for the intersection of toughness and stress
intensity coefficient histograms at each transient step. The probability of initiating
unstable crack growth for the considered defect is equal to the maximum value of the
various probabilities thus obtained during the various steps of the transient.
The modeling of the initiation of unstable crack growth during a transient has been
determined according to the following assumptions:
• When an internal defect becomes a through-wall defect, it is transformed into a semi-
elliptical surface crack having the same eccentricity and keeping its inner front at the
same position as the initial crack.
• When a defect becomes unstable at the crack tip of the major axis, the defect
becomes an infinite length crack and the crack arrest criterion is applied at its inner
front.
• Every defect becoming unstable at its inner front is considered as leading to vessel
failure; the code does not consider the possibility of subsequent arrest in warmer and
less irradiated zones through the wall.
Calculations made with the COVASTOL code from the data compiled on French PWRs,
operated according to EDF rules, allow the following conclusions to be drawn:
• Rupture probabilities at 40 years vary from 10⁻⁸ to 10⁻¹² according to the weld
location and the safety injection water temperature; final results are presented in fig.
6.26.
• The evolution of the rupture probability of the various welds as a function of time is
calculated. It is shown that the beltline weld is more sensitive to time than the others.
This is due to the fact that, for this weld, time has two effects: crack growth and steel
embrittlement.
• The main rupture risk is due to a LOCA; the contribution of the different break sizes
is calculated, the associated conditional probability being of the order of 10⁻⁵. The
rupture probability of a PWR pressure vessel is therefore less than that of a LOCA by
several orders of magnitude.
• The protective effect of the cladding markedly decreases the fatigue growth of
defects and lowers the overall rupture probability by several orders of magnitude.
• The fatigue growth of defects located in the shell during normal and incidental
transients is very limited for the beltline weld; it is greater for the nozzle shell.
• The fatigue growth of defects located in the nozzles and their welds with the nozzle
shell is markedly larger than that of the defects located in the core shell.
• The rupture probability of the most irradiated area of the beltline weld is 10⁵ times
greater than that of the least irradiated area (the neutron shielding is not homogeneous
over the whole periphery of the vessel).
• The most severe conditions for fatigue propagation are: cooling, load increase
between 15% and 100%, and steady-state fluctuations (for the chosen modeling).
• The harmfulness of under-cladding defects located in the nozzles depends greatly on
their location. The most harmful defects are located in the vertical cross-section in the
vicinity of the internal angle.
• Residual stresses, when large, can markedly increase the fatigue growth of defects
and the rupture probability (up to a factor of 5).
• Taking the pressure vessel life duration into account, the rupture probability during a
pressure test is greater than the overall rupture probability when the pressure vessel
temperature is less than 60°C during the test.
After a new NDT inspection, the data can be exploited to update the FCG model
parameters and consequently the above risk estimates. The adaptive models
of Section 6.3.5 could be used as a complementary prediction tool, especially for the
evolution of individual defects and the related risk assessment.

6.4.2 Marine structures safety assessment using the probabilistic fracture mechanics method

In dealing with the quality control and/or the reliability of marine structures, attention
will be focused mainly on ships and offshore structures, leaving aside other types of
marine structures such as submersible vehicles, subsea installations and pipelines.
The most frequently used techniques for the detection of cracks in weld seams of underwater
constructions are the magnetic particle test methods and the ultrasonic technique (see
Section 6.2.2). The advantage of the magnetic particle testing technique is the high
detection capability in the indication of cracks in the weld seam and the simple evaluation
of the results. A disadvantage is that the area to be inspected has to be cleaned
thoroughly. Performing the inspection with a heavy magnetizing yoke is very hard
work for the divers, since in most cases it is not possible to have a secure support on the
structure. In shallow water the sunlight hinders an evaluation of the indication, since
the fluorescence of the particles cannot be recognized; the inspection must then be postponed
to the evening or night. The most important disadvantage of the magnetic particle testing
technique, however, is that it is very difficult to integrate into manipulator systems.

Figure 6.26 Probability of rupture per year of a PWR pressure vessel after 40 years of
operation (legend: temperature of the safety injection water, 10°C and 20°C).

The ultrasonic technique can easily be applied in remote-controlled manipulators and is
indeed often used in such systems. The main disadvantage of the ultrasonic technique
for the detection of fatigue cracks is that the complicated and permanently changing
geometries of the three-dimensional structure cause serious problems in the evaluation of
the signals. A further problem is the fact that ultrasonic inspection requires a very exact
sensor movement, which can be difficult for the diver, so that a mechanical guide or a
manipulator is necessary. It can be stated, however, that for measuring the depth of cracks
of known position the ultrasonic technique can be applied with good success.
The eddy current testing method, which is a traditional technique for the detection of
surface cracks, has been applied offshore only for a very short time. The reason is that the
signals of weld seam roughness, and the changes of magnetic permeability and electric
conductivity caused by the welding process, are superposed on the crack signal, which
reduces the detectability of cracks. The fact that the eddy current technique has now
become more attractive for the inspection of welds can be explained by the development
of improved sensors and evaluation techniques. The eddy current method places very low
demands on surface cleaning, so that inspections without removing dirt layers are possible.
The signal evaluation is independent of ambient conditions, and the eddy current method
can easily be applied in remote-controlled manipulators (Camerini et al., 1992).
Ship and offshore steel structures are designed to withstand the cyclic stresses created by
sea waves. Ship welding is a critical technology: it is difficult to manufacture welds in
which defects are few and far between. Some smaller defects will always be present
in the welds, and there is a risk of having large defects. The welds frequently determine
the strength and usability of a welded product more than any other single factor.
Therefore, the presence of severe stress in as-fabricated structures may cause fatigue
cracking during service. If a corrosion or a fatigue crack is detected in the steel or the
welds of the ship structure during in-service inspection, it is necessary to assess its
significance with respect to the safety of the facility. The key to the assessment of fatigue
cracks in marine structures is a validated analytical model that predicts the growth rate of
fatigue cracks and the corresponding probability of failure for the anticipated service
conditions (Stavrakakis, 1990, 1993).
Fatigue crack growth behavior in metals is best described using the fracture mechanics
approach, in which fatigue crack growth rates, da/dN, are correlated to the applied stress
intensity factor range ΔK (see Section 6.3.3).
Cyclic stresses caused by sea-wave loading are random in nature, and fatigue analysis of
structures subjected to random loading is complicated. An equivalent-stress-range
scheme is presented here, used to correlate fatigue crack growth results for
structures subjected to random loadings with those of structures subjected to constant-
amplitude loading.
If the crack growth per cycle is much less than the crack length (da/dN << a), and there
are no load-sequence interaction effects, then the total increment of crack growth in N
successive cycles, for a specific ΔK range, is

$$\sum_{i=1}^{N}\Delta a_i = C\left[(\pi a)^{1/2}F\right]^{m}\sum_{i=1}^{N}\sigma_i^{m}$$
This result is derived by summing the crack-length increment per stress cycle, calculated
using the Paris-Erdogan equation (see Section 6.3.3), for the N successive cycles, when
the applied stress range is σ_i, i = 1, 2, …, N for the 1st, 2nd, …, Nth cycle.
The average FCGR per stress cycle is then

$$\frac{\Delta a}{N} = C\left[(\pi a)^{1/2}F\right]^{m}\frac{1}{N}\sum_{i=1}^{N}\sigma_i^{m}$$

or

$$\frac{da}{dN} = C\left[(\pi a)^{1/2}F\,\sigma_{eq}\right]^{m} = C\left(\Delta K_{eq}\right)^{m}$$

where

$$\sigma_{eq} = \left[\frac{1}{N}\sum_{i=1}^{N}\sigma_i^{m}\right]^{1/m},\qquad i = 1, 2, \ldots, N$$

is the equivalent-stress range.


Here N should be large enough for the equivalent-stress range, σ_eq, to be representative
of the stress spectrum. The task then becomes the computation of σ_eq from a random load
history. In the above derivation, load-sequence interaction effects are ignored. This
means that the well-known Miner's rule is also applicable. This is adequate for sea-wave
loading (Cheng, 1985, 1988).
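A short numerical sketch of this bookkeeping may be helpful. The Python fragment below sums Paris-Erdogan increments cycle by cycle over a random stress-range sequence and checks that the equivalent-stress-range form reproduces the same block increment; the constants C, m and the geometry factor F are hypothetical.

```python
import numpy as np

C, m, F = 1.0e-11, 3.0, 1.12        # hypothetical Paris-Erdogan constants
a = 0.002                            # initial crack length (m)
rng = np.random.default_rng(1)
sigma = rng.rayleigh(scale=40.0, size=10_000)   # stress ranges (MPa)

# Cycle-by-cycle summation; the crack is treated as quasi-constant over
# the block, which is the da/dN << a assumption made in the text:
delta_a = C * ((np.pi * a) ** 0.5 * F) ** m * np.sum(sigma ** m)

# Equivalent-stress-range prediction for the same block:
sigma_eq = np.mean(sigma ** m) ** (1.0 / m)
delta_a_eq = len(sigma) * C * ((np.pi * a) ** 0.5 * F * sigma_eq) ** m

print(delta_a, delta_a_eq)           # identical by construction
```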
The equivalent-stress range for a random sea-wave load history, σ_eq, can be computed
directly from the exceedances spectrum or the power spectral density function (PSDF) of
the anticipated (or assumed) sea-wave stress spectrum.
The exceedances spectrum expresses the accumulated number of stress cycles at each
normalized (or actual) stress range (normalized to the maximum stress range) over the
design lifetime, see fig. 6.27.a. For a given stress spectrum, the exact exceedances
spectrum varies, depending upon the cycle-counting method used, except in the case of
the constant-amplitude loading condition (Cheng, 1985, 1988).
For a complete description of the random load history, frequency and irregularity have to
be specified along with the exceedances spectrum.
A histogram, or probability distribution, is used to calculate the σ_eq of a random load
history expressed as an exceedances spectrum. The histogram corresponding to an
exceedances spectrum is usually readily available. If not, a histogram can be constructed
directly from the corresponding exceedances spectrum by dividing the axis of normalized
stress range into intervals and calculating the frequency of occurrence for each interval.
For example, the ordinate in fig. 6.27.a is divided into 20 equal intervals. The
statistical data needed to construct a histogram can then be extracted from this exceedances
spectrum. The resultant histogram obtained from fig. 6.27.a is presented in fig. 6.27.b.
Figure 6.27.a Exceedances spectrum divided for construction of the histogram.
Figure 6.27.b Stress-range histogram corresponding to the exceedances spectrum shown in fig. 6.27.a.

From the histogram of fig. 6.27.b, σ_eq is evaluated from the equation

$$\sigma_{eq} = \left[\sum_j r_j\,\sigma_j^{m}\right]^{1/m}$$

where σ is the stress range, r is the frequency of its occurrence, and m is the exponent in
the Paris-Erdogan FCG rate equation.
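As an illustration of this construction, the sketch below derives interval frequencies from a hypothetical exceedances spectrum and evaluates the above expression; none of the numbers are taken from fig. 6.27.

```python
import numpy as np

m = 3.0                                    # Paris-Erdogan exponent

# Hypothetical exceedances spectrum: cumulative cycles exceeding each
# normalized stress-range level (counts grow as the level decreases).
s_norm = np.linspace(1.0, 0.05, 20)        # normalized stress-range levels
exceedances = 10.0 ** np.linspace(0.0, 7.0, 20)

# Cycles falling in each interval = difference of cumulative counts.
cycles = np.diff(exceedances, prepend=0.0)
r = cycles / cycles.sum()                  # relative frequency of occurrence

sigma_eq_norm = np.sum(r * s_norm ** m) ** (1.0 / m)
print("normalized equivalent stress range:", sigma_eq_norm)
```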
The other way to describe the random load history is to use the PSDF, as shown in fig.
6.28. The PSDF is the result of a spectral analysis of the original random load
history. If the random load history is a stationary Gaussian process, as is commonly
assumed, then a PSDF G(ω) exists which possesses all the statistical properties of the
original load history (Cheng, 1988).
The two most important parameters in random-loading fatigue analysis that can be
retrieved from the PSDF are the root-mean-square (rms) value of the load amplitude and
the irregularity factor, α, of the random load history. The rms value is the square root of
the area under the PSDF.
The irregularity factor is defined as the ratio of the number of positive-slope zero
crossings, N₀, to the number of peaks, F₀, per unit time in a load history:
$$\alpha = N_0/F_0$$

The exact values of N₀ and F₀ can be evaluated from G(ω) as follows:

$$N_0 = (M_2/M_0)^{1/2}$$

and

$$F_0 = (M_4/M_2)^{1/2}$$

where M₀, M₂ and M₄ are the zeroth, second and fourth moments of G(ω) about the
origin (zero frequency), defined as
$$M_0 = \int_0^{\infty} G(\omega)\,d\omega,\qquad M_2 = \int_0^{\infty}\omega^2 G(\omega)\,d\omega,\qquad M_4 = \int_0^{\infty}\omega^4 G(\omega)\,d\omega$$
Thus,

$$\alpha = \frac{M_2}{\sqrt{M_0\,M_4}}$$

Figure 6.28 Example of power spectral density function (double-peaked spectrum).
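These moments are readily evaluated numerically from a sampled PSDF; a minimal sketch with an assumed (double-peaked) G(ω) and simple trapezoidal integration:

```python
import numpy as np

w = np.linspace(0.01, 3.0, 1000)              # frequency axis (rad/s)
G = np.exp(-((w - 0.6) ** 2) / 0.02) \
    + 0.5 * np.exp(-((w - 1.5) ** 2) / 0.05)  # assumed double-peaked PSDF

M0 = np.trapz(G, w)
M2 = np.trapz(w ** 2 * G, w)
M4 = np.trapz(w ** 4 * G, w)

rms = np.sqrt(M0)                  # rms = square root of the area under G
N0 = np.sqrt(M2 / M0)              # rate of positive-slope zero crossings
F0 = np.sqrt(M4 / M2)              # rate of peaks
alpha = N0 / F0                    # irregularity factor = M2/sqrt(M0*M4)
print(rms, alpha)
```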

The free surface elevation of the sea can be modeled by an ergodic Gaussian process for
adequately short periods of time. This short-term description implies that the process is
homogeneous in time and in space, that is, its probabilistic properties change neither with
time nor with location. Thus, it is equivalent to estimate those properties from several
sea surface elevation records made at different times at the same point or made at the
same time at different points.
Each sea state is completely described by a wave spectrum. These spectra result from
physical processes and are therefore amenable to theoretical modeling. Various
mathematical expressions to represent average sea spectra have been proposed in the past. The
one that has become generally accepted and that has been commonly used in response
analysis is due to Pierson and Moskowitz, although it is most commonly seen in the
parametric form proposed by the International Ship Structures Congress (ISSC), see
Guedes-Soares (1984) and Hogben et al. (1976). A sea state is often characterized by an
average wave period T_z and a significant wave height H_s, which are related to the
spectral moments by:

$$H_s = 4\sqrt{m_0},\qquad T_z = 2\pi\,\frac{m_0}{m_1}$$

where $m_i = \int_0^{\infty}\omega^i S(\omega)\,d\omega$ is the moment of order i of the sea spectrum S(ω).


Developing seas have a more peaked spectrum, as was demonstrated during the
JONSWAP project by Hasselmann et al. (1976), who proposed a spectral form that
accounts for the dependence on wind speed and fetch.
The JONSWAP spectrum has also been recommended by the ISSC (Hogben et al.,
1976), where a parameterization in terms of significant wave height and average period
was proposed.
Recent evidence on the adequacy of the ISSC and JONSWAP models to represent the
average shape of measured spectra has shown that both of these formulations represent
single-peaked spectra, while many of the measured spectra exhibit two peaks.
A spectrum with two peaks is expected to occur whenever the sea state contains two
wave systems with different average periods. Often this will be a combination of a swell
component and a wind sea component.
It should be noticed that the main feature of a double-peaked spectrum is the partition of
the energy about two distinct peak frequencies. For a sea state of given H_s and T_z, a
double-peaked spectrum has its energy more spread along the frequency axis than in the
case of a single-peaked spectrum.
The main idea behind the model of double-peaked spectra described hereafter is that they
can be represented by two spectral components of the JONSWAP type. The wind sea
part of the spectrum will often be in a developing situation, to which the JONSWAP
formulation is appropriate. As to the swell part, it is mostly its narrowness that justifies
the choice (Guedes-Soares, 1984).
Having defined the shape of the two spectral components, a double-peaked spectrum
becomes completely described by the ratios of the peak frequencies and of the spectral
peaks. If the sea spectrum S is represented by the sum of a swell S_s and a wind sea S_w
component,

$$S(\omega) = S_s(\omega) + S_w(\omega)$$

its moments must be equal to the sum of the moments of the components:

$$m_0 = m_{0s} + m_{0w},\qquad m_1 = m_{1s} + m_{1w}$$

where m_{ik}, i = 0, 1 and k = s, w, is the moment of order i of spectrum k.


Having a sea state defined by the four parameters H_s, T_z, H_R and T_R, where H_R and T_R
are the ratios of significant wave height and average period of the two spectral
components, each spectral component S_k, k = s, w, can be given by the equation:

$$S_k(f) = S_{PM}(f)\,\gamma^{\,q}\quad (\mathrm{m^2\cdot sec})$$

where S_PM is the ISSC spectrum:

$$S(f) = 0.11\,H^2 T\,(Tf)^{-5}\exp\{-0.44\,(Tf)^{-4}\}\quad (\mathrm{m^2\cdot sec})$$

γ is the peak enhancement factor of the JONSWAP spectrum, and

$$q = \exp\{-(1.296\,Tf - 1)^2/(2\sigma^2)\}$$

$$f = \omega/2\pi,\qquad T = T_z\cdot F_2,\qquad H = H_s/\sqrt{F_1}$$

The JONSWAP parameter σ is used at its mean values:
σ_a = 0.07 for f ≤ 1/(1.296 T)
σ_b = 0.09 for f > 1/(1.296 T)
The quantities F₁ and F₂ are two constants that correct for the difference in peak period
and area between a Pierson-Moskowitz (P-M) and a JONSWAP spectrum. The values
of these parameters depend on γ as shown in Table 6.3.
The two additional parameters that define the double-peaked spectrum are H_R and T_R.
The latter is easily determined from a measured spectrum as the ratio of the spectral
peak frequencies. Another easily obtainable quantity is the ratio between the two spectral
ordinates, S_R. To relate this ratio to H_R, it is necessary to obtain the expression for the
spectral ordinate at the peak frequency.
The peak frequency is determined by equating to zero the derivative of S_k with respect to
f. If this is done, it follows that the ratio of the two spectral peaks S_sp and S_wp is given
by:

$$S_R = \frac{S_{sp}}{S_{wp}} = H_R^2\,T_R$$
A correction must be introduced to account for the asymmetry of the component
spectra about the peak frequency. The fact that the spectra have higher energy above the
peak frequency implies that adding two spectral components of equal energy produces a
double-peaked spectrum with the high-frequency peak larger than the low-frequency one.
This effect decreases with increasing separation between the two spectral peaks, being
thus of less importance in the case of two well-defined peaks. However, it is easy to
correct for that effect, so as to make the procedure applicable regardless of the distance
between spectral peaks.

Table 6.3 Values of the parameters F₁ and F₂ of the JONSWAP spectrum

γ      F₁ = m₀[JONSWAP]/m₀[P-M]      F₂ = (m₁/m₀)[JONSWAP]/(m₁/m₀)[P-M]
1      1.0       1.0
2      1.24      0.95
3      1.46      0.93
3.3    1.52      0.92
4      1.66      0.91
5      1.86      0.90
6      2.04      0.89
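A minimal sketch of one spectral component assembled from the above formulas, with F₁ and F₂ read from Table 6.3; the sea-state parameters are illustrative, not taken from the text.

```python
import numpy as np

# (gamma: (F1, F2)) taken from Table 6.3
F_TABLE = {1.0: (1.00, 1.00), 2.0: (1.24, 0.95), 3.0: (1.46, 0.93),
           3.3: (1.52, 0.92), 4.0: (1.66, 0.91), 5.0: (1.86, 0.90),
           6.0: (2.04, 0.89)}

def spectral_component(f, Hs, Tz, gamma=3.0):
    """One JONSWAP-type spectral component S_k(f) in m^2*sec (sketch)."""
    F1, F2 = F_TABLE[gamma]
    T = Tz * F2                        # corrected period
    H = Hs / np.sqrt(F1)               # corrected wave height
    S_pm = 0.11 * H**2 * T * (T * f)**-5 * np.exp(-0.44 * (T * f)**-4)
    sigma = np.where(f <= 1.0 / (1.296 * T), 0.07, 0.09)
    q = np.exp(-(1.296 * T * f - 1.0)**2 / (2.0 * sigma**2))
    return S_pm * gamma**q

f = np.linspace(0.02, 0.4, 500)                      # Hz
S = spectral_component(f, Hs=2.0, Tz=8.0, gamma=3.0) # swell-like component
```

A double-peaked spectrum is then obtained by summing a swell-like and a wind-sea-like component with the appropriate H_R and T_R ratios.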

Having a double-peaked spectrum defined by its 4 parameters, one first determines the
ordinates of the theoretical spectrum at the two peak frequencies and their estimated
ratio S̃_R. This value is larger than or equal to the value of the spectral parameter S_R. The
value of H_R to be used in the above equations is thus corrected by the factor k_R = S̃_R/S_R,
which upon substitution in the last equation results in:

$$H_R = \left(\frac{k_R\,S_R}{T_R}\right)^{0.5}$$

If better accuracy is desired, the procedure can be repeated until convergence is
obtained in the value of k_R. This generally occurs within two iterations.
It should be noticed that when S_R is equal to zero this double-peaked spectral
representation reduces to the single-peaked JONSWAP spectrum. If in addition γ is set
equal to one, the ISSC spectrum results.
Ship responses, regardless of whether the quantities of interest are motions or loads, are
basically inertia dominated, implying that, except in extreme situations, they are
proportional to wave height. It is the linearity of this relationship that justifies the
adoption of the spectral approach.
The response spectrum G(ω) (i.e. of the short-term wave-induced load effects), from
which σ_eq is calculated, can be determined from knowledge of the structure transfer
function H(ω) and the input sea spectrum S(ω) as:

$$G(\omega) = S(\omega)\,H^2(\omega)$$

where ω is the frequency of the wave components.
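In discrete form this is a pointwise product; a brief sketch with an assumed sea spectrum and an assumed linear (second-order-like) transfer function:

```python
import numpy as np

w = np.linspace(0.05, 2.5, 800)                  # rad/s
S = 0.8 * np.exp(-((w - 0.55) ** 2) / 0.02)      # assumed sea spectrum
# Assumed linear transfer-function magnitude (resonance near 0.9 rad/s):
H = 1.0 / np.sqrt((1 - (w / 0.9) ** 2) ** 2 + (0.2 * w / 0.9) ** 2)

G = S * H ** 2                                   # response spectrum
m0 = np.trapz(G, w)                              # area feeds the rms/sigma_eq step
```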
If the sea spectrum describes a stationary Gaussian process, the assumption that the
transfer function is linear implies that the response is also a stationary Gaussian process,
thus completely described by the response spectrum.
These theoretical formulations describe average spectral shapes expected to occur in the
presence of a given wind or in a sea state of known characteristic wave height and
period. There is, however, considerable uncertainty in the shape of an individual spectrum,
due to the large variability of the generation process and of the estimation methods.
The sources of uncertainty in the spectral shape definition are discussed by Guedes-
Soares (1984), where a method is proposed to model them and to determine their
effect on the uncertainty of the response parameters. This treatment accounts for both
fundamental and statistical uncertainties in the spectral shape. The results are given in
terms of the response quantities predicted by the standard method of calculation of ship
responses. They indicate the bias and the coefficient of variation of the standard
predictions, being thus a representation of the model uncertainty of that response
calculation method.
The main feature of the standard response method is, in this context, the use of the ISSC
spectrum to represent all sea states. Thus the results can also be interpreted as the model
uncertainty of the ISSC spectrum in representing all types of sea spectra.
In addition to the uncertainties related to the shape of the spectrum, ship responses are
also subject to other uncertainty sources, such as the relative course and speed of the ship.
Thus it is often more meaningful to operate with expected values of responses than with
responses to specific sea states. Different possible formulations of mean response and
mean bias are examined there.
If the load history under consideration is in the form of a narrow-band random process
(irregularity factor α = 1), the value of the equivalent stress range σ_eq can be calculated
from the following closed-form expression (Cheng, 1988):

$$\sigma_{eq} = 2\sqrt{2}\,(\mathrm{rms})\left[\Gamma(1 + m/2)\right]^{1/m}$$

where Γ(·) is the gamma function and m is the exponent in the Paris-Erdogan FCG rate
equation. There are no closed-form solutions for wide-band (α < 0.99) random loads.
For wide-band random loads, solutions derived from numerical analysis to convert the PSDF
to σ_eq are presented in graphical form in Cheng (1988).
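Applying the closed-form expression is then immediate once the rms value has been extracted from the PSDF. The sketch below implements the expression as reconstructed above, so the constant in front should be checked against Cheng (1988) before use:

```python
import math

def sigma_eq_narrowband(rms, m):
    """Equivalent stress range of a narrow-band Gaussian stress history
    (Rayleigh-distributed stress ranges), per the reconstructed formula
    sigma_eq = 2*sqrt(2)*rms*Gamma(1 + m/2)^(1/m)."""
    return 2.0 * math.sqrt(2.0) * rms * math.gamma(1.0 + m / 2.0) ** (1.0 / m)

print(sigma_eq_narrowband(rms=30.0, m=3.0))   # MPa, illustrative values
```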
FCG rates under sea-wave loading can thus be calculated using the probabilistic fracture
mechanics approach described in Section 6.3.3 and the equivalent-stress-range concept
described above.
The procedure starts with an anticipated (or assumed) stress spectrum acting on a
component of interest. The equivalent-stress range for the sea-wave loading is then
calculated.
The COVASTOL computer program, presented in Sections 6.3.3, 6.3.4 and 6.4.1, or any
similar FCG prediction program, can then be applied in a straightforward manner to predict
at any time the residual lifetime and to estimate, using inspection data, the failure probability
of marine structures under sea-wave action. The main objective of applying this program
is to provide information about the availability of marine structures, which is important
for their efficient design, for defining optimal after-service inspection and maintenance
policies, and for estimating the financial risk of losing marine structures.

6.4.3 Structural damage assessment using a causal network

In this section, a simple causal network is built for post-earthquake structural damage
assessment. "Uncertainty" in this context includes the description and prediction of
loading conditions, the material properties of structural components, the difference between
the simplistic mathematical modeling of the structure and the actual behavior and loading
path, and the imprecision involved in construction. Evaluation of structural damage by an
expert involves all of these "uncertainties" in some implicit fashion, which is assumed to
be appropriately characterized as a degree of belief in the severity of damage from
observations. For example, assigning a degree of belief to an event such as "global loss of
strength" being "moderate" is assumed to be an accurate characterization of damage
assessment.
The causal form of the network corresponds to the physical "pathway" of damage, i.e.
the two causes of structural failure are inadequate stiffness and insufficient strength.
Creating a causal network in which the component damage "causes" global damage,
which in turn "causes" a level of structure failure extent, corresponds to the physical model of
damage which is familiar to structural engineers. It also allows engineers to identify
in a concise manner how the damage would be evaluated at each stage.
It is cognitively easier to make (subjective) probability assessments in the causal
direction. If necessary, the noisy OR gate (eq. (6.45)) provides a method for generating
conditional probabilities.
The structural damage assessment network will be evaluated using fig. 6.25. Each node
or event can take on the values none, slight, moderate and severe, i.e. the indices in
equations (6.44) will all be defined on the range from unity to four. In fig. 6.25, the
events that the nodes represent are as follows:
Node   Meaning
e      Structure failure extent
c      Global loss of strength
d      Global loss of stiffness
a1     Component loss of strength
a2     Component loss of stiffness
b1     Global damage to strength
b2     Global damage to stiffness

The network is interpreted as follows. The probability of "structure failure extent" taking
on the values "none", "slight", "moderate" or "severe" is denoted by p(e_k), where k = 1,
2, 3, 4 represents the four values. The event "structure failure extent" is "caused" by
global loss of strength and stiffness, nodes c and d, respectively. The events these nodes
represent are "caused" by "component loss of strength" and "component loss of
stiffness", respectively, and by "global structural changes" such as a change in the natural
frequency. It could be argued that component loss of stiffness is influenced by
component loss of strength; however, this influence is assumed to be small enough that
the two can be treated as different, distinct causes of failure. Preliminary conditional
probability values can be assessed on the basis of experience and limited consultation
with colleagues; additional values are estimated on the basis of the noisy OR gate. It is
necessary to normalize the probabilities generated in this manner.
This aspect of the evaluation process is the most complex, and the most uncertain. Given
the input or marginal probabilities for these nodes, the probability p(structure failure
extent) may be calculated using eqns. (6.43)-(6.45) of Section 6.3.6. First, marginal
probability values are assessed for nodes a1, a2, b1 and b2. These values are input into
eqns. (6.44). Multiplication and summation of the marginal probabilities yield p(c_i) and
p(d_j), where i, j = 1, …, 4. These values are input into eqn. (6.43), where
multiplication and summation yield p(e_k), k = 1, …, 4.
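The forward propagation through the network can be sketched compactly in code. In the fragment below the conditional probability tables are hypothetical placeholders (a crude noisy-OR-like rule), chosen only to illustrate the multiply-and-sum mechanics of eqns. (6.43)-(6.44); the marginal inputs are those of Table 6.4.

```python
import numpy as np

K = 4  # damage states: none, slight, moderate, severe

def propagate(p_x, p_y, cpt):
    """p(z_k) = sum_ij p(z_k | x_i, y_j) p(x_i) p(y_j)  (eqns. 6.43/6.44 form)."""
    return np.einsum("kij,i,j->k", cpt, p_x, p_y)

def toy_cpt():
    """Hypothetical CPT: the child state tends to follow the worse parent."""
    cpt = np.zeros((K, K, K))
    for i in range(K):
        for j in range(K):
            k = max(i, j)                    # a crude 'noisy OR'-like rule
            cpt[k, i, j] += 0.8
            cpt[max(k - 1, 0), i, j] += 0.2
    return cpt / cpt.sum(axis=0, keepdims=True)  # normalize over child states

p_a1 = np.array([0.0, 0.33, 0.34, 0.33])   # component loss of strength
p_b1 = np.array([0.0, 0.33, 0.34, 0.33])   # global damage to strength
p_c = propagate(p_a1, p_b1, toy_cpt())     # global loss of strength
p_a2 = np.array([0.0, 0.0, 0.5, 0.5])      # component loss of stiffness
p_b2 = p_b1                                # global damage to stiffness
p_d = propagate(p_a2, p_b2, toy_cpt())     # global loss of stiffness
p_e = propagate(p_c, p_d, toy_cpt())       # structure failure extent
print(p_e)
```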
For purposes of illustration, numerical examples are given in Table 6.4. It can be seen
that the degrees of belief in the structure failure extent seem reasonable for the given
inputs. For example, extreme inputs yield extreme results. For mixed inputs at the
component level, the structure failure extent has the greatest degree of belief
associated with moderate damage, as would be expected. Given this information, a
structural engineer would be able to decide whether structural rehabilitation was
required. Structures for which the degree of belief in the structure failure extent being
moderate or severe was high would be required to undergo rehabilitation.
A more realistic causal network would include damage observations for each component
and individual types of global structural changes. The total component loss of strength
and stiffness would therefore be conditional upon the type and severity of the damage
observed. Another extension of the present efforts would be to evaluate diagrams which
have horizontal arrows connecting nodes at each level. Any set of influences is permitted
in a causal network as long as the network is acyclic.

Table 6.4 Numerical example for the network of fig. 6.25

State      p(a1)   p(a2)   p(b1)=p(b2)   p(c)    p(d)    p(e)
none       0       0       0             0       0       0
slight     0.33    0       0.33          0.29    0.14    0.19
moderate   0.34    0.5     0.34          0.41    0.48    0.5
severe     0.33    0.5     0.33          0.3     0.38    0.32

e = structure failure extent, c = global loss of strength, d = global loss of stiffness,
a1 = component loss of strength, a2 = component loss of stiffness, b1 = global damage to
strength, b2 = global damage to stiffness.

Reed (1993) presents a complete multiply-connected network for post-earthquake
damage assessment of a reinforced concrete building, see fig. 6.29. In this figure, the
final node represents "building failure extent". It can be seen that the two basic causes of
structural failure, global loss of strength and global loss of stiffness, are related to the
framing type and to the component loss of strength and loss of stiffness. Observations of
permanent drift, and of stiffness and damping changes, are "caused" by the global loss of
strength and global loss of stiffness, so the arrows point into these variables. The
component damage, which is broken down into strength and stiffness, includes
observed damage to structural as well as non-structural components. This damage
includes observations of spalling, cracking and buckling. Although the diagram shows
only one component, in reality there are many components, which would be shown in a
3-D diagram.
At the present time, the only other structural damage assessment programs are SPERIL
(Yao, 1985) and RAM/NO (Lucia and Volta, 1991). They both employ the uncertain
inference approach described in Section 6.3.6.
This formulation considers both global and local damage, but not in an explicitly causal
manner. An experimental comparison can only be meaningful when the knowledge is
extracted from the same source(s) and the degree of detail is equivalent. Although the
input will be different in the sense that, say, fuzzy set theory will require membership
functions, as opposed to subjective probability assessments, the type of input, e.g. "what is
the damage to this beam", must be similar.
Of course, the true test will be to evaluate actual damage cases. However, it does not
seem premature to recommend the causal network approach as one in which the
Bayesian updating is accomplished in an effective and reasonable manner.
Figure 6.29 Network for post-earthquake damage assessment of a reinforced concrete building.


References

Akyurek T. and O.G. Bilir (1992). A survey of fatigue crack growth life estimation
methodologies. Engineering Fracture Mechanics, 42, 5, p. 797.
Al-Obaid Y.F. (1992). Fracture toughness parameter in a pipeline. Engineering
Fracture Mechanics, 43, 3, p. 461.
Ben-Amoz M. (1992). Prediction of fatigue crack initiation life from cumulative damage
tests. Engineering Fracture Mechanics, 41, 2, p. 247.
Bhargava V. et al. (1986). Analysis of cyclic crack growth in high strength roller
bearings. Theoretical and Applied Fracture Mechanics, 5, p. 31.
Baldwin J.F. and B.W. Pilsworth (1980). Axiomatic approach to implication for
approximate reasoning with fuzzy logic. Fuzzy Sets and Systems, 3, p. 193.
Bogdanoff J.L. and F. Kozin (1985). Probabilistic Models of Cumulative Damage. John
Wiley and Sons, N.Y.
Bogdanoff J.L. and F. Kozin (1984). Probabilistic models of fatigue crack growth.
Engineering Fracture Mechanics, 20, 2, p. 225.
Box G.E.P. and G.M. Jenkins (1976). Time Series Analysis, Forecasting and Control.
Holden-Day, San Francisco, CA.
Brown C.B. and J.T.P. Yao (1983). Fuzzy sets and structural engineering. Journal of
Structural Engineering, ASCE, 109, 5, p. 211.
Camerini et al. (1992). Application of automated eddy current techniques for off-shore
inspection. In C. Hallai and P. Kulcsar (Eds.), "Non-Destructive Testing '92", Elsevier
Science Publishers.
Cheng Y.W. (1985). The fatigue crack growth of a ship steel in saltwater under
spectrum loading. International Journal of Fatigue, 7, 2, p. 95.
Cheng Y.W. (1988). Fatigue crack growth analysis under sea-wave loading.
International Journal of Fatigue, 10, 2, p. 101.
Chou K.C. and J. Yuan (1992). Safety assessment of existing structures using a filtered
fuzzy relation. Structural Safety, 11, p. 173.
Chow C.L. and K.H. Wong (1987). A comparative study of crack propagation models
for PMMA and PVC. Theoretical and Applied Fracture Mechanics, 8, p. 101.
Cortie M.B. and G.G. Garrett (1988). On the correlation between the C and m in the
Paris equation for fatigue crack propagation. Engineering Fracture Mechanics, 30, 1,
p. 49.
D'Attelis C. et al. (1992). A bank of Kalman filters for failure detection using acoustic
emission signals. In C. Hallai and P. Kulcsar (Eds.), "Non-Destructive Testing '92", Elsevier
Science Publishers.
Dubois D. and H. Prade (1986). Fuzzy sets and statistical data. European Journal of
Operational Research, 25, p. 345.
Dufresne J., Lucia A., Grandemange J. and A. Pellissier-Tanon (1986). The
COVASTOL program. Nuclear Engineering and Design, 86, p. 139.
Dufresne J., Lucia A., Grandemange J. and A. Pellissier-Tanon (1988). Probabilistic
study of the failure of pressurized water reactor (PWR) vessels. Report EUR No
8682, JRC-Ispra (Italy), Commission of the European Communities.
Fukuda T. and T. Mitsuoka (1986). Pipeline inspection and maintenance by applications
of computer data processing and robotic technology. Computers in Industry, 7, p. 5.
Garribba S. et al. (1988). Fuzzy measures of uncertainty for evaluating non-destructive
crack inspection. Structural Safety, 5, p. 187.
Georgel B. and R. Zorgati (1992). EXTRACSION: a system for automatic eddy
current diagnosis of steam generator tubes in nuclear power plants. In C. Hallai and P.
Kulcsar (Eds.), "Non-Destructive Testing '92", Elsevier Science Publishers.
Ghonem H. and S. Dore (1987). Experimental study of the constant-probability crack
growth curves under constant amplitude loading. Engineering Fracture Mechanics, 27,
1, p. 1.
Godfrey M.W., Mahcwood L.A. and D.C. Emmony (1986). An improved design for a
point contact transducer. NDT International, 19, 2.
Grangeat P. et al. (1992). X-ray 3D cone beam tomography application to the control
of ceramic parts. In C. Hallai and P. Kulcsar (Eds.), "Non-Destructive Testing '92", Elsevier
Science Publishers.
Guedes-Soares C. (1984). Probabilistic models for load effects in ship structures.
Report UR-84-38, Marine Technology Dept., The Norwegian Institute of Technology,
Trondheim, Norway.
Hadipriono F. and T. Ross (1987). Towards a rule-based expert system for damage
assessment of protective structures. Proceedings of International Fuzzy Systems
Association (IFSA) Congress, Tokyo, Japan, July 20-25.
Hagemaier D.J., Wendelbo A.H. and Y. Bar-Cohen (1985). Aircraft corrosion and
detection methods. Materials Evaluation, 43, p. 426.
Halford et al. (1989). Fatigue life prediction modeling for turbine hot section materials.
ASME Journal of Engineering for Gas Turbines and Power, 111, 1, p. 279.
Hasselmann K. et al. (1976). A parametric wave prediction model. Journal of Physical
Oceanography, 6, p. 200.
Hoeppner D.W. and W.E. Krupp (1974). Prediction of component life by application of
fatigue crack growth knowledge. Engineering Fracture Mechanics, 6, p. 47.
Hogben H. et al. (1976). Environmental conditions. Report of Committee 1.1,
Proceedings of 6th International Ship Structures Congress, Boston.
Hull B. and J. Vernon (1988). Non-Destructive Testing. MacMillan Education,
London.
Journet B.G. and R.M. Pelloux (1987). A methodology for studying fatigue crack
propagation under spectrum loading: application to rail steels. Theoretical and Applied
Fracture Mechanics, 8, p. 117.
Jovanovic A.S. et al. (1989). Expert Systems in Structural Safety Assessment.
Springer-Verlag, Berlin.
Kalyanasundaram P. et al. (1991). British Journal of NDT, 33, 5, p. 221.
Komatsu H. et al. (1992). Basic study on ECT data evaluation method with neural
network. In C. Hallai and P. Kulcsar (Eds.), "Non-Destructive Testing '92", Elsevier
Science Publishers.
Kozin F. and J.L. Bogdanoff (1992). Cumulative damage model for fatigue crack growth
based on reaction rate theory. Engineering Fracture Mechanics, 41, 6, p. 873.
Landez J.P. et al. (1992). Ultrasonic inspection of vessel closure head penetrations. In
C. Hallai and P. Kulcsar (Eds.), "Non-Destructive Testing '92", Elsevier Science Publishers.
Lankford J. and S.J. Hudak Jr. (1987). Relevance of the small crack problem to lifetime
prediction in gas turbines. International Journal of Fatigue, 9, 2, p. 87.
Lucia A.C. (1985). Probabilistic structural reliability of PWR pressure vessels. Nuclear
Engineering and Design, 87, p. 35.
Lucia A.C., Arman G. and A. Jovanovic (1987). Fatigue crack propagation:
probabilistic models and experimental evidence. In Trans. 9th SMiRT Conf., Vol. M,
Lausanne, p. 313.
Lucia A.C. and G. Volta (1991). A knowledge-based system for structural reliability
assessment. Trans. SMiRT 11, Vol. SD1, Tokyo, Japan.
Ludwig and Roberti (1989). A nondestructive ultrasonic imaging system for detection
of flaws in metal blocks. IEEE Transactions on Instrumentation and Measurement, 38, 1.
Madsen H.O., Krenk S. and N.C. Lind (1986). Methods of Structural Safety. Prentice-
Hall, N.J., USA.
Marci G. (1992). A fatigue crack growth threshold. Engineering Fracture
Mechanics, 41, 3, p. 367.
Mohammadi J. et al. (1991). Evaluation of system reliability using expert opinions.
Structural Safety, 9, p. 227.
Namioka T. et al. (1992). Development and experience of pipeline inspection robots by TV
camera. In C. Hallai and P. Kulcsar (Eds.), "Non-Destructive Testing '92", Elsevier Science
Publishers.
Nielsen N. (1981). P-scan system for ultrasonic weld inspection. British Journal of
NDT, March 1981, p. 63.
Nisitani H., Goto M. and N. Kawagoishi (1992). A small-crack growth law and its
related phenomena. Engineering Fracture Mechanics, 41, 4, p. 499.
Parpaglione M.C. (1992). Neural networks applied to fault detection using acoustic
emission. In C. Hallai and P. Kulcsar (Eds.), "Non-Destructive Testing '92", Elsevier
Science Publishers.
Raj B. (1992). Reliable solutions to engineering problems in testing through acoustic
signal analysis. In C. Hallai and P. Kulcsar (Eds.), "Non-Destructive Testing '92",
Elsevier Science Publishers.
Reed D.A. (1993). Treatment of uncertainty in structural damage assessment.
Reliability Engineering and Systems Safety, 39, p. 55.
Sandberg G. et al. (1989). The application of a continuous leak detection system to
pipelines and associated equipment. IEEE Transactions on Industry Applications, 25, 5,
p. 906.
Schicht A. and A. Zhirabok (1992). The integrated expert systems for NDT in quality
control systems. In C. Hallai and P. Kulcsar (Eds.), "Non-Destructive Testing '92", Elsevier
Science Publishers.
Shiraishi N. et al. (1991). An expert system for damage assessment of a reinforced
concrete bridge deck. Fuzzy Sets and Systems, 44, p. 449.
Singh G.P. and S. Udpa (1986). The role of digital signal processing in NDT. NDT
International, 19, 3, p. 125.
Solomos G.P. and V.C. Moussas (1991). A time series approach to fatigue crack
propagation. Structural Safety, 9, p. 211.
Stavrakakis G.S. (1990). Quality assurance of welds in ship structures. Quality and
Reliability Engineering International, 6, p. 323.
Stavrakakis G.S. (1992). Improved structural reliability assessment using non-linear
regression techniques to process raw fatigue crack growth test data. Quality and
Reliability Engineering International, 8, p. 341.
Stavrakakis G.S. (1993). An efficient computer program for marine structures reliability
and risk assessment. The Naval Architect, July/August '93, p. E342.
Stavrakakis G.S., Lucia A.C. and G. Solomos (1990). A comparative study of the
probabilistic fracture mechanics and the stochastic Markovian process approaches for
structural reliability assessment. International Journal of Pressure Vessels and Piping, 41, p. 25.
Stavrakakis G.S. and A. Pouliezos (1991). Fatigue life prediction using a new moving
window regression method. Mechanical Systems and Signal Processing, 5, 4, p. 327.
Stavrakakis G.S. and S.M. Psomas (1993). NDT data interpretation using neural
networks. In "Knowledge based system applications in power plant and structural
engineering", SMiRT 12 post-conference Seminar no. 13, August 23-25, Konstanz,
Germany.
Thoft-Christensen P. and J.D. Sorensen (1987). Optimal strategy for inspection and
repair of structural systems. Civil Engineering Systems, 4, p. 17.
Van Dijk G.M. and J. Boogaard (1992). NDT reliability - a way to go. In C. Hallai and
P. Kulcsar (Eds.), "Non-Destructive Testing '92", Elsevier Science Publishers.
Vancoille M.J.S., Smets H.M.G. and F.L. Bogaerts (1993). Intelligent corrosion
management systems. In "Knowledge based system applications in power plant and
structural engineering", SMiRT 12 post-conference Seminar no. 13, August 23-25,
Konstanz, Germany.
Verreman Y. et al. (1987). Fatigue life prediction of welded joints - a reassessment.
Fatigue and Fracture of Engineering Materials and Structures, 10, 1, p. 17.
Virkler D.A., Hillberry B.M. and P.K. Goel (1979). The statistical nature of fatigue
crack propagation. ASME Journal of Engineering Materials and Technology, 101, p. 148.
Yanagi C. (1983). Robotics in material inspection. The NDT Journal of Japan, 1, 3,
p. 162.
Yao J.T.P. (1985). Safety and Reliability of Existing Structures. Pitman Publishing,
Marshfield.
Zhu W.Q. and Y.K. Lin (1992). On fatigue crack growth under random loading.
Engineering Fracture Mechanics, 43, 1, p. 1.
Author index

Boden 344
A Bogdanoff464, 479, 480
Adamopoulos 368 Bonivento 101
Adams 101 Boogaard 434
Adelman261 Boose 265, 269
Ahlqvist 333 Bothe404
Akyurek 468, 469 Box 28, 484
Al-Obaid 469 Bradshaw 265
Ali 18, 19,20 Brailsford 317
Aljundi 416 Brown 499,501,505
Alty 269 Brole 271
Anderson 9, 16, 17, 19, 104, 105, 111, 273 Buchanan 261,263
Antsaklis 138, 285, 371, 408
Armstrong 171
C
Arreguy 338 Camerini 511
Ast 66 Cao 393
Athans 102, 113 Carlsson 211, 229
Carpenter 385, 389
B Carriero 280, 281
Baines 43 Cecchin 22
Ballard 375, 378 Chan 371
Bandekar 274 Chang 401, 414, 417
Baram 116 Chen 100, 139, 140, 148,330,331,401
Barschdorff371,404 Cheng 512, 513, 518
Bartlett 416 Cheon 417
Baskiotis 229 Chien 101
Basseville 3, 103, 118, 119, 120 Chin 57
Bavarian 401 Chitturi 18, 19
Beattie 163 ChoI67,247,382,393
Ben-Amoz 467 Chou 501, 503
Bennett 7,9 Chow93, 101, 129, 149,371,404,477
Benveniste 118, 120 Clark 125
Bems 371 Coats 167
Bhargava 465 Cohen 261, 262, 383
Bickel105 Console 275
Bierman 199 Contini 271,273
Bilir 468, 469 Cordero 208
Blazek 29, 37, 40 Cortie 470,471
Blount 271 Cue 50, 54, 64
Blumen 17 Cybenko 382
D Froechte 167
Fuchs 246
Daley 135
Dalla Molle 224, 225 G
Danai 57
Gaines 269
Darenberg 157
Gantmacher 142, 190
Davis 271, 401
Garrett 470, 471
De Kleer 277, 302, 356
Geiger 221,234,235
De Mello401
Gelernter 280, 281
Deckert 101, 120, 129
Gertler 93,95, 168,273
Dehoff229
Ghonem 468,470,481
DeLaat 161
Gien 401
Dempster 505
Godfrey452
Desai 101
Goodwin 211,213,214
Dialynas 237, 406
Grangeat 450, 452, 457
Dixon 8
Gray 267,268
Dobner 167
Greene 113
Doel 322, 332
Grizzle 167
Dolins 288, 291
Grogono283
Dong496
Grossberg 383, 385, 389
Dore 468,470,481
Grober 261,262
Doremus 418
Guedes-Soares 514, 515, 518
Dounias 271
Guo415,417
Dowdle 120
Gupta 277, 279, 282
Dubois 505
Gustaffson 120
Dufresne 434,468,473,475,506
H
E
Hadipriono 496
Edelmayer 301
Hagemaier 439
Elkasabgy 75
Halford465
Engell144
Hamilton 333
Eryurek412
Hammer 272, 329
F Hasselmann 515
Hawkins 29
Favier 200
Hedrick 167
Feldman 375, 378
Henry 229
Feng 371
Hickman 260, 270
Fink 273, 275, 335
Himmelblau 26, 40, 100, 224, 370, 371,
Finke 401
396
Forbus 274
Hoeppner 468,469,471,507
Forsythe 261,263
Hoerl28
Fortescue 206
Hoey401
Frank 100, 124, 126, 127, 141, 148, 149,
Hoff 379
170,171,273
Hogben 514,515
Franklin 7, 9
Hopfield 373, 374, 378, 383, 384, 385
Freiling 275
Hoskins 371,396
Freyermuth 246, 273, 338
Hudak465 Kramer 28
Hudlicka 273 Krupp 468, 469, 471, 507
Hull 432, 442 Kuan 320
Hunter 29, 30 Kumamaru 102,223
Kusiak 402
I
Kwon 211,213,214
Ikeuem 56, 59
L
Ikonomopoulos 414,419
Ioannou 383 Lainiotis 102, 113
Irwin 204 Landez449
Iserman 181, 182, 183,246,247,273 Lankford 465
Ishiko 330, 331 Laws54
Lee 339, 343, 355, 356, 357, 415
J Lehr 378, 379
Janik 246 Lesser 273
Janssen 127 Li 66,70
Jenkin 111 Ligget 18
Jenkins 484 Lin470, 475
Johannsen 269 Ljund 36
Johnson265 Ljung 203,216
Jones 102, 116 Lo401
Journet 470 Lou 129, 146
Jovanovic 488 Loukis 71, 75
Lucas 29, 31
K
Lucia 434, 463, 464, 468, 469, 470, 479,
Kaiser 260 480, 492, 521
Kalouptsidis 203, 204 Ludwig 457
Kalyanasundaram 459 Luger 260
Kaminski 249 Lusth 273, 275, 335
Kangethe 100 Lyon 43, 45, 53
Karakoulas 273, 297
Karkanias 138 M
Kasper 159 MaeDonald 268
Kawagoe 107 MaeNeilll7,18
Kendall 3, 4, 6, 9, 10, 12, 17, 19 Madsen 467
Kim 382, 393, 415, 417, 418 Maguire 204
Klein 264 Majstorovie 271
Kohonen 371, 374, 378, 387, 390, 392, Marei 469
393,401,402,403,415 Marsh 275
Komatsu 458 Maruyama 297,300,301,364
Konik 144 Massoumnia 100, 129
Konstantopoulos 371, 409 Matsumoto 405
Kosko 385 Mayne 208
Kosmatopoulos 383, 385 MeClelland 378, 380, 379
Kouvaritakis 138 Mehra 12, 101, 103
Kozin 464,479,480 Merrill 161,229
Merrington 229 Patton 100, 136, 139, 140, 145, 148, 153,
Miguel371 154
Milne 271 Pelloux 470
Minsky 374, 379 Peng 273
Mirchandani 393 Pengelly 66
Mironovskii 10 1 Peschon 101, 103
Mitchell 48, 50, 55, 62, 63 Pignatiello 29,35
Mitsuoka 452 Polycarpou 383
Mohammadi 492, 493, 494 Pomeroy285
Monostori 338 Pot 200
Moon 401 Potter 101, 135, 197, 202
Moore 140 Pouliezos 12, 102, 106, 111, 112, 116, 121,
Morpurgo 271 122, 182, 195, 218, 221, 273, 339,
Moskwa 167 485,486,487
Moussas 482, 483, 487 Prade 505
Müller 102, 116 Prasad271
Mussi 271 Prock 296
Protopapas 334
N
Psomas 462
Naidu 371, 411
Namioka453
R
Narayanan 270 Raj 459
Nawab 261 Randall 45, 50, 55, 59, 63, 64, 66, 71
Nett 408, 409, 411 Randles 16, 17
Neumann 246,337,338 Rasmussen 264
Nielsen445 Rauch 371
Nikiforov 103 Ray 101
Nisitani 470, 474 Reed 401,504,505,521
Nold 183 Reese 288, 291
Noore 325, 327 Reggia273
Novak334 Reiss 246
o Rhodes204,273,297
Rizzoni 167
Obreja302 Robert 18
Ogi 406 Roberti 457
Ohga 417 Roh 418
Ono 102 Ross 496
p Roth264
Rouse 261
Palm 28 Rummelhart 378, 379, 380
Pandelidis 277,356
Pao 370,382,390,415 S
Papert 374, 379 Saccucci 29, 31
Pappis 368 Sahraoui 335, 338
Parpaglione 458 Sakaguchi 405
Passino 285, 350, 371 Sandberg 493
Sanderson 401 Upadhyaya 413


Schneider 170, 171
Schutte 402
V
Seki 417 Van Dijk 434
Shafer 505 Van Soest 273
Shah496 Van Trees 119
Shibata 193 Vance 26
Shiraishi 492 Vancoille 492
Singh 452, 457 Vander Wie129
Smed229 Vasilopoulos 37
Solomos 482, 483, 487 Venkatasubramanian 371, 401
Sorensen 464 Vemon 432,442
Sorsa 371,393,401 Verreman 468
Soumelidis 301 Virkler 468,472,473,476,477,481,487
Spee 21 Viswanadham 270
Stavrakakis 102, 111, 116, 121, 182, 195, Volta 492,521
204, 218, 221, 237, 242, 273, 339,
406, 462, 472, 476, 477, 481, 485,
W
486,487, 511 Wahlberg 211,215
Stephens 21 Wake 405
Stubblefield 260 Wald 106, 151
Suman 101, 135 Wallace 21
Suna 371 Wanke 246
Syed 371, 402 Warwick 320
Wasserman 382
T
Watanabe 100, 101,371
Takahashi 297, 300, 301, 364 Watts 111
Tanaka 102, 116 Wee 402
Tesch 317 Wehenkel 321
Theodoridis 204 Weiss 211
Thoft-Christensen 464 Widrow 378, 379
Thompson 14, 15 Wiele 297
Thomton 199 Willcox 136
Tinghu 404 Willner 102, 113
Tonielli 101 Willsky 93, 101, 102, 116, 120, 129, 149
Torasso 275 Witten 268
Tracy 33,40 Wond 477
Trave-Massuyes 274 Wong496
Tsoukalas 267, 414, 419 Woods 264
Tukuda452 Wu 66, 70
Tzafestas 242, 271, 351 Wünnenberg 100, 141, 148
U Y
Udpa 452,457 Yamashina 371,404
Uhrig 414,415,417,419 Yanagi 445,448
Uosaki 101, 107 Yao 499,501,505
Yashchin 37
Ydstie 207
Yeh205
Yoon 272, 329
Yoshimura 101
Young 182
Yuan 501, 503
Z
Zeilingold 401
Zhu 470, 475
Subject index

testing 101
A
cluster 373, 385, 391, 392, 394,415
accumulated cycles 471, 485
diagrams 390
activation function 374, 376, 388, 389
COBRA416
activity 374, 381
competition 388
adaptive resonance theory 374, 385, 393,
condensed nearest neighbor 405
401
connectionist expert system 418
algorithm
content addresable memory model 373
modified Gram Schmidt 249
continuous spectrum 438
square root 197,204
control chart
U-D factorisation 198
CUSUM 29, 34, 36
analytical redundancy 99
exponentially weighted moving average
ARMAX model 187
29
ARX model 188
multivariate 33
ASCOS scheme 127
multivariate Shewhart 34
associative memory model 373
univariate Shewhart 26
attentional phase 387
correct detection 97
autocorrelation matrix 110
correlation coefficient 111
autoregressive model with exogenous signals
cost function 192
188
covanance
autoregressive moving average model with
matrix 197
exogenous signals 187
instability 196
autospectrum 60, 61
singularity 200
B cross spectrum 54, 60, 61
back-propagation 373, 380, 383, 393, 401, crosspower spectrum 459
405,406,410,411,413,417,418,419 CTLS 193
backward likelihood ratio function 107 curve analysis fauIt diagnosis 287
Bayes rule 114 cyc1e-counting 512
bearings failure diagnosis 48, 50, 66, 68, cyc1es to failure 466, 467
322 cyc1es to rupture 465
bilevel function 376 cyc1ic load 465
black box identification 215
D
bubble 391
data weights 187, 190,205,226
C decision function 100, 158, 159, 160
causal network 286, 519 deconvolution 148
chi -squared decoupling
distribution 105, 109 approximate 145, 147
non-central 119 dedicated observer scheme 124
random variable 119 departure from nuclear boiling ratio 415
detectability 118 methods 166


detection ontology 274
delay 97, 104 physical system, mathematical model
sensor noise 112 272,273
direct access inequality 388 shallow reasoning 271, 272
discrete Fourier transform 87 validation 283
distinguishability 119, 135 with neural networks 420
distribution
F
matrix 137
FA layer 385
of cycles 473
failure detection, see fault detection
Wishart 105
failures
disturbance decoupling 139, 147
hard 124
dynamic profile 116
soft 124
E false alarm 97, 98, 100, 106, 110, 119, 138,
effect of misalignment 50, 51 164, 165
eigenstructure assignment 100, 136 fast Fourier transform 89
equation, Riccati 189 fatigue cycles 468
equivalent stress range 518 fatigue life prediction
error methods 463
parity 151 of structures 430
prediction 189 quality 476
ESCOS scheme 127 real time 473
estimation fault
instrumental variable 191 additive 95,96,97,99, 116, 120
least squares 212 isolability 97
estimator, self-tuning 209 isolation 93, 94, 97, 98, 99, 100, 101,
event-based fault diagnosis 284 102, 125, 126, 128, 162, 163
exceedances spectrum 512, 513 multiplicative 95, 96, 97, 116
exceptional events fuzzy logic diagnosis 364 non-additive 120
expert structural damage assessment partial isolation 111
causal network 503, 504, 519, 520 robustness 97, 100, 102, 105, 116, 118,
fuzzy causal network 505 120, 122, 123, 124, 126, 129, 136,
fuzzy relation 496, 500 137, 144, 147, 154, 158, 159, 160,
fuzzy set concept 494 165
expert system sensitivity 97, 153
attribute grammar 276 signature 42, 44, 57, 94, 99, 101, 118,
automatic process fault diagnosis 257, 131, 135, 153
271 vector 393
automatic process fault monitoring 277 fault detection, see also fault monitoring
causal knowledge 272, 273, 275 actuator 94, 96, 123, 124, 137
connectionist 418 component 94, 97, 101, 123, 124, 126,
deep knowledge 272, 274, 301, 329, 334, 127,137
352,359 decision on 218
hybrid reasoning 275, 352 deterministic 94, 99, 124
hypothesis formulation/hypothesis testing of abnormal events in nuclear plants 417
274,275 in aerospace engineering 161
in check valves 413 in low voltage bus 317


.in CIM 402 in machine tools 335
in control rod wear 419 in motor pumps 321
in cutting processes 404 in nuclear power plant 301, 411
in electrical drives 20 in power systems 320
in evaporators 224 in power transmission substations 318
in gas turbines 228 in robots and CIM systems 335
in grinding processes 21, 246 in rotating machinery 322
in heat exchanger 400 in SCADA system 311
in induction motors 75, 247 in supply networks 317
in industrial processes 246 parameters 205
in internal combustion engines 166 process 182
in jet engines 153 schemes 205
in loose parts 413 system 181, 182
in machining operations 246 FB layer 385
in nuclear plant (transients) 415 feedforward neural networks 373, 374, 378,
in power systems 237, 405 379, 383, 393, 399, 401, 404, 405, 406,
in pumps 231 410,416
inrobots 170,242,247 filter
in rotating machines 322, 404 extended Kaiman 227
in servovalve 403 KaIman 193, 194, 197, 199, 200, 202,
in stirred tank reactors 396 204,227
in system noise 121 normal-mode 103
in system parameters 122 filtering
in transition matrix 121 state variable 182, 183,235
in transportation engineering 156 Finite Integral Squared Error 410
instrument 94, 124, 158 firing frequency 376
neural networks 369 forgetting factor 191, 192, 193, 195, 200,
neural-fuzzy system 419 206,207,209,225,237,245
observer-based 93, 100, 122, 124, 125, time-varying 192
126,127, 136,141,155,170,171 variable 207
parameter estimation 179 four-parameter controller 408
parity relations 100 Fouriertransfarm 59,60,61,89,90,91
parity space 159, 167 frequency analysis 44, 45, 49
pattern recognition approach 122 function
qualitative 93 backward likelihood ratio 107
quantitative 93, 100 cost 192
sensor 123 likelihood ratio 101, 106, 164
stochastic 94, 101, 102 loss 191
unknown input observer 100 fuzzy logic fault diagnosis 297, 300
fault monitoring
G
ANN based 392
gain 206
algorithm 218
gas turbine diagnosis 70, 71, 74, 153, 228,
in mechanical and electrical domains
322
328
gearbox failure 64, 244
in distribution cables 320
General Machinery Criterion Chart 48
generalised delta-rule 373, 374, 380, 381, J


393,397,399,406,410 jet engine fault diagnosis 153
generalised likelihood ratio test 102, 116, K
118, 119, 120 Kalman filter 101, 102, 114, 117, 163, 193,
generalised observer scheme 126 194, 197, 199, 200, 202, 204, 227
Kaiman filter 101,102, 114, 117, 163, 193,
generalised ob server scheme 126
194,197,199,200,202,204,227
global minimum 382
bank 115
GLR 102, 116, 118, 119, 120
extended 115, 227
Gram-Schmidt orthogonalization 198, 201,
gain 102, 163, 188
249
innovations 103
graphicaI
knowledge
aids 37
acquisition 259, 260, 261, 265, 267,
display 37, 322
272,273,298
form 518
acquisition process 270, 271
interface 422
acquisition too1266, 269, 270, 271, 333
interface system 270
capturing 267
means of monitoring I
elicitation 258,262,263,269,271
representation 471
engineering 257, 258, 259, 261, 268,
solution 497, 498
269,274,284,343
user interface 332
representation 258, 261, 268, 269, 271,
Grossberg model 383
276,329,345
H -based systems 259
hidden layer 379 Kohonen model 374, 387, 390, 401, 402,
hierarchicaI ob server scheme 127 403,415
Hopfield model 373, 374, 378, 383, 385 Kronecker canonica1 form 142
hybrid Kullback discrimination index 223
expert diagnosis 339
system 334
L
layer 373, 378, 379
systems 344, 352
calculation 393
hyperbolic tangent fimction 384
FA385
hypertext266, 267, 270
FB385
hypothesis testing 4,221
hidden 379
I input 379
identifiability 183, 200 multiple 380
induction machine output 379
broken bars detection 75 single 374, 379
innovations leaming
sequence 101, 114, 115, 118, 122 ofneural networks 374,378
standardized 103 rate 381, 389, 390, 393, 397, 399, 403,
variance 108, 199 410
input layer 379 least squares 182, 212, 226, 233, 235, 237,
instability, covariance 196 244,245
instrumental variable 191 forgetting factor 225
inverse transform 88,89,91 non-recursive 189
isolability 97, 123 recursive 187, 190
recursive constant trace 193
recursive sliding window 195, 196 neighborhood 391


recursive weighted 192 neural network
life cycle activation function 374, 376, 388, 389
of an automated plant 296 aetivity 374,381
validation 283 adaptive resonanee theory models 374,
lifetime 385,393,401
distribution 473 back-propagation 373, 380, 383, 393,
estimation 465 401, 405, 406, 410, 411, 413, 417,
ofthe eomponent 478 418,419
prediction 478 characteristics 374
likelihood ratio 221 condensed nearest neighbor model 405
funetion 101, 106, 164 • feedforward 373, 374, 378, 379, 383,
LLR 101, 106, 164 393, 399, 401, 404, 405, 406, 410,
reeursive 106 416
load eycles 465, 471 generalised delta rule 373, 374, 380,
logistie funetion 376 381,393,397,399,406,410
loss funetion 191 global minimum 382
LS 182, 189,212 Kohonen model 374,387,390,401,402,
403,415
M leaming 374, 378
maehine
supervised 374, 378
health through noise and vibration
unsupervised 373, 374, 385, 390, 402
analysis 1,43, 50
leaming rate 381, 389, 390, 393, 397,
monitoring 43,49, 58
399,403,410
induetion 257,259,268
MAXNET models 387
marine struetures safety 509
momentum 381, 410
Markov ehain 110
multilayer feedforward 393
MAXNET387
nodes 374, 375, 381
MMAF 102, 113, 163
output funetion 374, 375, 387, 394
model
radial basis funetion 383
autoregressive moving average with
reeurrent high-order 383
exogenous signals 187
self-organizing 373, 374, 385,403
autoregressive with exogenous signals
strueture 373
188
topologies 378
modified Gram Sehmidt algorithm 249
weights 379, 381
momentum 381, 409
with expert systems 420
moving window 104, 106, 119, 134, 187
neural-fuzzy 418
least squares algorithm 195, 196
neurons 372, 373
length 105
aetivation eharacteristies 376
mean 104
features 374, 375
multilayer
firing frequency 376
neural networks 393
output eharaeteristies 377
multiple model adaptive filters 102, 113,
state history 383
163
winning 403
N nodes 374, 375, 381
nearest neighbors 391 noise analysis 53, 64
non-destructive testing
   acoustic emission 433, 451, 452, 460
   crack depth gauges 455
   eddy current 433, 438, 456, 511
   laser-induced ultrasonics 453
   liquid penetrant 433, 435, 455
   magnetic particle 433, 436, 455, 509
   optical inspection probes 452
   radiography 433, 449, 450, 451
   thermography 455
   time-of-flight diffraction 454
   ultrasonic testing 433, 440, 447, 459
non-persistent excitation 191
nuclear reactor safety 506
O
observer 122
   ASCOS scheme 127
   dedicated observer scheme 124
   dynamics 137
   eigenvectors 139
   ESCOS scheme 127
   fault detection 100, 122
   full order 100, 123
   gain 100
   generalised observer scheme 126
   hierarchical observer scheme 127
   reduced order 100
   robust 136, 155
   simplified observer scheme 125
   unknown input 100, 141
   -based fault detection 93, 136, 170, 171
orienting phase 387
orthogonal
   complement 131
   transformation 198
output
   function 374, 375, 387, 394
   layer 379
   zeroing 138
P
parallel processors 204
parameter estimation 93
   in fault detection 179
parity
   checks 99, 101
   coefficients 150, 151
      normalized 153
   equation 132, 152
   error 151
   failure direction in 135
   function 132, 134, 150
   function structure 150
   generalized vector 132
   generalized parity space 132
   primary equation 167
   relation 132, 149, 151
   signature-to-parity error ratio 153
   space 93, 101, 129, 136, 159
   structure 152
   vector 131, 134, 167
pattern
   associator 373
   recognition 370, 372
perceptron 372, 374, 379, 382, 393, 401
Petri net fault diagnosis 291, 295, 338, 360
post-earthquake damage assessment 519, 521
power
   cepstrum 64, 92
   spectrum 73, 89, 90, 91, 460
prediction error 189
process model 182
R
radial basis function neural networks 383
reciprocating machine diagnosis 45, 70, 72
recurrent high-order neural networks 383
recursive estimation
   constant trace least squares 193
   least squares 182, 187, 190, 226, 233, 235, 237, 244
   least squares with forgetting factor 225, 245
   sliding window least squares 195, 196
   weighted least squares 192
redundancy 119, 124
   analytical 93, 94, 101, 129
   direct 100, 129
   hardware 93
   parallel 101
   relations 123
   software 170
   system 119
   temporal 129, 133
residual 94, 98, 99, 100, 101, 104, 112, 118, 123, 129, 130, 131, 137, 148
   bias 135
   covariance 103, 112
   generation 93
   lifetime 430, 485
resonance 388
Riccati equation 189
RLS 187, 190, 225, 226, 233, 235, 237, 244, 245
robust
   black box identification 215
   disturbance observer 158
   estimate 216
   exponentially weighted moving average 31
   fault detection 93, 94, 136, 141, 155, 212, 215
   observer 141, 145
   parameter estimation 211
   parity relations 149
   residual generation 137, 138, 149, 154, 155
   signal detector 13
   stochastic embedding 212
robustness 97, 100, 102, 105, 116, 118, 120, 122, 123, 124, 126, 129, 136, 137, 144, 147, 154, 158, 159, 160, 165
   eigenstructure assignment 136
   observer based 136
RTWorks 422
rule-based
   diagnosis 301
   diagnostic systems 277
   expert system diagnosis 334
   high-speed implementations 277
   inference 273
   interpreter 281
   knowledge systems 277
   languages 282
   program 278, 282
   programs 277
   reasoning 289
   task 282
   techniques 363
RWLS 192
S
S-N curves 465
sample variance, recursive window calculations 222
SAMSON 418
sea spectrum 515, 517, 518
self-organizing neural network 373, 374, 385, 403
self-tuning estimator 209
sensitivity 97, 153
sensor noise detection 112
sequential probability ratio test 101, 106
   backward 107
   mean detection time 108
SGLR 117
sgn function 105
shift structure 202
ship responses 517, 518
sigmoid function 375, 376, 384, 399, 413
sign statistic 105
signal processing
   acoustic signal 338, 460
   applications in automated NDT 456, 459
   features 336
   importance 261
   in acoustic emission 452
   symbolic data transformation 290
simplified generalised likelihood ratio test 117
simplified observer scheme 125
singular value decomposition 146
spectrum processing 461
SPRT 101, 106
   backward 107
   mean detection time 108
square root algorithm 197, 204
state estimation error 123
state variable filtering 182, 183, 235
statistical aids
   multivariate 15
   autocorrelation 18
   hypothesis testing 4, 13
   limit checking 3, 20
   observation windows 3
   sample mean 5, 6, 8, 9, 16
   univariate 2
statistical process control, polyplot 38, 40
stochastic embedding 212, 229
stress
   cycles 466, 468, 475
   spectrum 518
   -range histogram 513
structural damage 463, 473, 488, 490, 501, 519, 521
structural damage assessment
   intelligent systems 488
   phenomenological approach 463, 464
   probabilistic fracture mechanics 464, 467, 481, 506, 509, 518
   stochastic process approach 478
   time series analysis approach 482
subspace
   invariant 138
   observable 130
   unobservable 130
supervised learning 374, 378
SVF 182, 235
system analysis 58, 59, 60
T
template learning inequality 388
test
   autocorrelation 18
   covariance 17, 105
   drift 13, 23, 24
   generalised likelihood ratio 102, 116, 118, 119, 120
   mean 4, 6, 7, 15
   non-parametric 105
   multivariable component sign test 105
   residual mean 104
   robustness 116
   sequential probability ratio 101, 106
   sign 7, 8, 17
   simple statistical 103
   simplified generalised likelihood ratio 117
   standard deviation 23, 28
   statistical 94
   steady state 12, 23
   T2 16, 17, 104
   two stage methods 111
   variance 10, 12
   whiteness 9, 10, 18, 110, 111
threshold function 100, 378
transformation, orthogonal 198
U
U-D factorisation algorithms 198, 199, 200, 201, 202, 205
uncertainty
   modeling 145
   structured 97, 136, 137, 154
   unstructured 97, 147
unknown input 137
   observer 141
unsupervised learning 373, 374, 385, 390, 402
V
VDI 2056 48
vibration
   analysis 44
   and noise analysis application examples 64
   components 460
   criterion chart 48
   signals 460
   signature 52
   transducers
      accelerometers 51
      mechanical levers 52
      proximity probes 51
      velocity probes 51
W
wind-up 193
window
   moving 104, 106, 119, 134
winner
   neurons 403
   unit 388
Wishart distribution 17, 105
