Abstract—Most of the dynamics in real-world systems are compiled by shifts and drifts, which are difficult for omnipresent neuro-fuzzy systems to overcome. Learning in nonstationary environments entails a system with a high degree of flexibility, capable of assembling its rule base autonomously according to the degree of nonlinearity contained in the system. In practice, rule growing and pruning are carried out benefiting merely from a small snapshot of the complete training data, in order to keep the computational load and memory demand at a low level. A novel algorithm, namely the parsimonious network based on fuzzy inference system (PANFIS), is to this end presented herein. PANFIS can commence its learning process from scratch with an empty rule base. Fuzzy rules can thereafter be added and expelled by virtue of the statistical contributions of the fuzzy rules and of the injected data. Identical fuzzy sets may be detected and blended into one fuzzy set in pursuit of a transparent rule base escalating human interpretability. The learning and modeling performances of the proposed PANFIS are numerically validated using several benchmark problems from real-world and synthetic datasets. The validation includes comparisons with state-of-the-art evolving neuro-fuzzy methods and showcases that our new method can compete with, and in some cases even outperform, these approaches in terms of predictive fidelity and model complexity.

Index Terms—Evolving neuro-fuzzy systems (ENFSs), incremental learning, sample-wise training.

I. INTRODUCTION

A. Preliminary

Most real-world systems can be merely interpreted qualitatively, inexactly, or uncertainly. Traditional approaches to designing a fuzzy system are unfortunately overdependent on expert knowledge [2] and usually necessitate tedious manual interventions. This in turn leads to a static rule base, which cannot be tuned after its initial setting to gain a better performance. The designers have to spend laborious time examining all input–output relationships of a complex system to elicit a representative rule base, which constrains the practicability of such systems in evolving, dynamic, and time-critical environments.

This issue has led to the development of neuro-fuzzy systems (NFSs) [42], a powerful hybrid modeling approach that assimilates the learning ability, parallelism, and robustness of neural networks with the human-like linguistic and approximate reasoning traits of fuzzy logic systems. The complex and dynamic nature of real-world engineering problems is in general complicated by time-varying or regime-shifting issues. Classical NFSs are, in contrast, trained completely from offline data and remain static models, which is impractical for nonstationary environments.

Recently, coping with nonstationary environments has drawn intensive research work on NFSs that are able to adapt their parameters and to automatically expand their design contexts simultaneously. The evolving NFS (ENFS), based on the concept of incremental learning [3], accordingly opens a new uncharted territory as a plausible solution.
In conventional NFSs, a retraining phase with the use of an up-to-date dataset ought to be reciprocally executed whenever new knowledge needs to be incorporated. Progressive learning of a new pattern will in essence modify the initially trained model in such a way that the previously learned, but possibly still valid, knowledge is completely erased. That is, the newly appearing information catastrophically discards the model's memory of the previously learned knowledge because of its fixed learning capacity [8]. The dynamic fuzzy neural network (DFNN), generalized DFNN, and self-organizing fuzzy neural network were lodged as solution providers for the aforementioned drawbacks. These ENFSs do not necessitate a complete dataset to be available at hand; yet, they gather the preceding training stimuli and in turn reuse them in the next training cycles. Such a strategy intrinsically retards model updates and results in an excessive utilization of memory capacity, which is not in line with the crux of an efficient learning machine, especially when facing a system with rapidly varying attributes and a vast amount of training data. The sole realistic way to obviate this demerit is to process a single sample in every training episode, as achieved, e.g., in [11] and [12]. Parameter adaptation and rule evolution are settled in a sample-wise or single-pass manner, i.e., a single sample is loaded, the fuzzy model is updated with respect to this data point, the loaded sample is immediately discarded afterward, and so on. This warrants prompt updates with low virtual memory usage.

Apart from automatically proliferating fuzzy rules, an ENFS is much more desirable if it is capable of evicting inactive components (neurons and rules), which are no longer descriptive of newly acquired data trends, so as to strike a coherent tradeoff between predictive accuracy and model simplicity. This deficiency is indeed suffered by the evolving Takagi–Sugeno (eTS) system, simplified eTS (simp_eTS), and the flexible fuzzy inference system (FLEXFIS) [11], [12], [19], which are unable to switch off superfluous fuzzy rules. By extension, in a noisy environment, there arises the possibility that a training data point, which may be an outlier, is wrongly recruited as a new rule. Fortunately, such a cluster is usually occupied by few data points, thus contributing little during its lifespan. A mismatch in generating a new rule can be counterbalanced with a so-called rule base simplification strategy for the sake of a compact and parsimonious rule base. In addition, an overcomplex rule base is always conjectured to be a subject of an overfitting issue, deteriorating the generalization ability of a model and concealing rule semantics (i.e., consider a rule base with a few hundred rules). Another paramount prerequisite of an effective ENFS to remedy these bottlenecks is an ad hoc manner of eliminating inconsequential rules, mitigating its structural complexity and affirming its robustness against an oversized rule base, while not being greedy in crafting complementary rules and retaining its best predictive accuracy [13], [14].

On the one side, a major technical flaw in most ENFSs is induced by unidimensional membership functions generating hyperspherical clusters. One can observe that the self-organizing fuzzy modified least squares network, the sequential adaptive fuzzy inference system (SAFIS), and the growing–pruning fuzzy neural network [13], [39], [40] engage this type of fuzzy system. The rules evolved by this type of membership function cannot reflect real data distributions properly, and such systems impose upper and lower bounds of the training data to be available before the learning process commences. On the other side, another major shortcoming is the fact that ubiquitous approaches in devising ENFSs employ a product or a t-norm operator. This approach is deemed deficient as it does not underpin possible input variable interactions. To remedy this bottleneck, a normalized distance (a Mahalanobis distance) is one of the recipes, as it emphasizes the possible input variable interactions [15]. We can furthermore envisage multidimensional membership functions, stimulating a more appealing property of delineating ellipsoidal clusters in any direction, whose axes are not necessarily parallel to the input variable axes [14], which is compatible with real-time or online requirements.

Another pivotal shortcoming of the state-of-the-art ENFSs is their proneness to outliers, which is exemplified by the dynamic evolving neuro-fuzzy inference system (DENFIS) and the self-constructing neuro-fuzzy inference network [34], [45]. This is engendered by these approaches solely paying attention to the distance of the fuzzy rules to the newest datum, making use of the firing strength of the fuzzy rule or the Euclidean distance, improbably undoing the likelihood that outliers are unified with the fuzzy rules, as outliers can be distant from or outside the zone of influence. The so-called Learn++ algorithm was proposed in [49], which is in substance an adaboost-like algorithm. It was expanded in [50] to deal with a dynamic number of target classes and in [51] to cope with the regime-drifting nature of training samples. The state-of-the-art algorithm of this machine learning type was pioneered by Street and Kim [56] with the streaming ensemble algorithm. Another prominent work, namely dynamic weighted majority, was proposed by Kolter and Maloof [57]. As with Learn++, it devises passive drift detection. It is worth stressing that the Learn++ family emphasizes the concept of an ensemble, and the parsimonious network based on fuzzy inference system (PANFIS) is obviously capable of serving as a base learner in the Learn++ working framework.

An evolving algorithm was developed under the framework of type-2 fuzzy systems in [52]. Reference [53] outlines the extended version of [52] not only in the type-2 fuzzy system environment, but also in a recurrent network topology. Research work [54] exhibits the implementation of an EFS in an embedded system. A variant of EFS for classification cases was presented in [55]. Note that enhanced versions of PANFIS in type-2 fuzzy systems, recurrent network topologies, or classification problems are the subjects of future investigations.

C. Proposed Algorithm

A seminal ENFS, namely PANFIS, is proposed herein. PANFIS is likewise capable of starting its rehearsal process from scratch with an empty rule base. Fuzzy rules can henceforth be extracted and removed during the training process based on the novelty of a new incoming training pattern and the contribution of an individual fuzzy rule to the system output. A single-pass manner prevails in the training process, thus keeping the virtual memory demand at a low level and expediting model updates. A prominent aspect of PANFIS
is the building of ellipsoids in arbitrary position (respecting local correlations between variables), connected with a new projection concept to form the antecedent parts in terms of linguistic terms (fuzzy sets). This is opposed to [15], where all operations are drawn directly on the multidimensional, non-axis-parallel level and are thus hardly interpretable for an expert/user. This also means that we achieve fuzzy rules in a classically interpretable sense. The inference scheme, however, still uses the high-dimensional ellipsoidal representation in arbitrary directions as follows:

R_i = \exp\big(-(X - C_i)\,\Sigma_i^{-1}(X - C_i)^T\big) \qquad (1)

where C_i is the center or template vector of the ith rule, C_i \in \Re^{1\times u}, and X \in \Re^{1\times u} is the input vector of interest. \Sigma_i \in \Re^{u\times u} is the dispersion or covariance matrix of the ith rule, whose elements are the spreads of the (axis-parallel) multidimensional Gaussians in each direction (dimension), \sigma_{ki}, k = 1, 2, \dots, u and i = 1, 2, \dots, r. The inference scheme is written as follows:

y = \sum_{i=1}^{r} w_i \varphi_i = W\Psi = \frac{\sum_{i=1}^{r} R_i w_i}{\sum_{i=1}^{r} R_i}. \qquad (2)

In this TS-type NFS, W labels the output parameters, W_i = k_{0i} + k_{1i}x_1 + \cdots + k_{ui}x_u, W \in \Re^{1\times(u+1)r}, and \Psi \in \Re^{(u+1)r\times 1} epitomizes the cluster or generalized (fuzzy) firing strengths stemming from the firing strength of each fuzzy rule R_i.

Another novel aspect concerns the rule pruning methodology, which employs an extended rule significance (ERS) concept supplying the blueprint of rule contributions. It is extended from the concept in the SAFIS approach [13] by integrating hyperplane (in lieu of constant) consequents and by generalizing to ellipsoids in arbitrary position, allowing the rules to be pruned directly in the high-dimensional learning space. Two fuzzy sets that are similar to each other are grouped and in turn blended into one single fuzzy set, exploiting a kernel-based metric [18] in conjunction with a transparent explanatory module (rule).

The rules are dynamically evolved based on the potential of the data point being learned, by the use of the datum significance (DS) criterion. The initialization of new rules is furthermore consummated in a new way assuring ε-completeness of the rule base as well as of the fuzzy partitions, thus leading to an adequate coverage of the input space. Whenever no new rule is evolved, the focal points and radii of the fuzzy rules are adjusted by means of the extended self-organizing map (ESOM) theory to update neighboring rules with a higher intensity than rules lying farther away.

Another important facet of PANFIS is the use of the enhanced recursive least squares (ERLS) method, an extension of conventional recursive least squares (RLS), which is widely used in the ENFS community to adjust the fuzzy consequents. It has been formally proven to underpin the convergence of the system error and of the weight vectors being updated (a detailed proof can be found in the appendix).

The remainder of this paper is organized as follows: Section II details the projection of arbitrary ellipsoids to fuzzy sets. Section III elaborates the incremental learning policy of PANFIS, which involves rule base management. Section IV explores the empirical studies and discussions on numerous benchmark problems, including artificial and real-world datasets. With these, the new evolving method is compared against various other state-of-the-art evolving neuro-fuzzy approaches in terms of predictive quality and model complexity. Section V concludes this paper.

II. PROJECTION OF ELLIPSOIDAL CLUSTERS IN ARBITRARY POSITIONS TO FUZZY SETS

The multidimensional membership function offers a convenient property in abstracting the data distribution in a more natural way, capable of triggering ellipsoids in any direction in the feature space, whose axes are not necessarily parallel. The underlying grievance is, however, to stipulate the fuzzy set representation of the hyperellipsoid in arbitrary positions, which entails more faithful investigation as the covariance matrix constitutes a nondiagonal matrix, thus allowing for more tractable rule semantics. The projection of ellipsoidal clusters in arbitrary positions to fuzzy sets is detailed herein to produce classically interpretable fuzzy rules (antecedent parts of rules can then be read as linguistic IF–THEN parts). The projection is undertaken after each incremental learning cycle, i.e., whenever a model update took place (see subsequent sections), thus representing the latest model version.

We advocate two possibilities of extraction, one faster but less accurate, one slower but more accurate, with the hope of landing on the fuzzy set representation of the multivariate Gaussian function. Fig. 1 shows these two methods.

We suppose that C_i is the centroid of the ith fuzzy rule, whereas \Sigma_i^{-1} is the inverse covariance matrix, whose element \Sigma_{ij} occupies the ith row and jth column. Therefore, the fuzzy set is built as follows:

\mu_i = c_i \qquad (3)

\sigma_i = \max_{k=1,\dots,u} \frac{r}{\sqrt{\lambda_k}} \cos\big(\varphi(e_k, a_i)\big) \qquad (4)

where \lambda_k is the eigenvalue with respect to the kth dimension and e_k is the eigenvector for the corresponding kth dimension. \mu_i is the modal value of the fuzzy set (usually the center, where the membership value is equal to 1) and \sigma_i is the spread of the set, according to a predefined α-cut; in our case, using Gaussian fuzzy sets, we apply an α-cut value of 0.6, whose cut reaches the inflection point, thus becoming equal to the width σ. Meanwhile, a_i epitomizes the vector representing the ith axis, a_i = (0, 0, \dots, 1, \dots, 0), where the value of 1 occupies the ith position. \varphi illustrates the angle spanned between a_i and e_k:

\varphi(e_k, a_i) = \arccos\left(\frac{e_k^T a_i}{\|e_k\|\,\|a_i\|}\right). \qquad (5)

The demerit of the first approach is that it is computationally prohibitive, especially for high-dimensional input features, as the eigenvalues of the inverse covariance matrix would have to be computed in every training episode. In lieu of the former, the second approach is put forward as an alternative.
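To make the eigendecomposition-based extraction route of (3)-(5) more concrete, the following Python sketch is a minimal illustration of our own (not the authors' reference implementation; the function names, the rule radius argument r, and the toy data are assumptions). It evaluates the multidimensional rule activation of (1) and projects a rotated ellipsoid onto the input axes.

```python
import numpy as np

def firing_strength(x, center, inv_cov):
    """Multidimensional Gaussian rule activation, cf. (1)."""
    d = x - center
    return float(np.exp(-d @ inv_cov @ d))

def project_to_fuzzy_sets(center, inv_cov, r=1.0):
    """Project an ellipsoid in arbitrary position onto the input axes,
    yielding a Gaussian fuzzy set (modal value, spread) per dimension
    in the spirit of (3)-(5)."""
    lam, E = np.linalg.eigh(inv_cov)   # eigenpairs of the inverse covariance
    u = len(center)
    sigmas = np.empty(u)
    for i in range(u):
        # eigh returns unit-norm eigenvector columns, so cos(phi(e_k, a_i))
        # of (5) reduces to the i-th component of each eigenvector
        cosines = np.abs(E[i, :])
        # semi-axis length r / sqrt(lambda_k), projected onto axis i, cf. (4)
        sigmas[i] = np.max(r / np.sqrt(lam) * cosines)
    return center.copy(), sigmas       # modal values (3) and spreads (4)

# toy usage: a rotated 2-D ellipsoid with a nondiagonal covariance matrix
C = np.array([1.0, 2.0])
S = np.array([[2.0, 0.8], [0.8, 1.0]])
mu, sigma = project_to_fuzzy_sets(C, np.linalg.inv(S))
print(mu, sigma, firing_strength(np.array([1.2, 2.1]), C, np.linalg.inv(S)))
```

Note that only the projection requires the eigendecomposition; the inference itself keeps operating on the high-dimensional ellipsoid.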
The system error denotes the degree of accuracy of PANFIS at the nth training observation and can be written as follows:

|e_n| = |t_n - y_n| \qquad (12)

where y_n is the output of PANFIS in the nth episode and, conversely, t_n is the target value in the nth episode. By performing u-fold numerical integration, we may obtain the following:

D_n = |e_n| \frac{\det(\Sigma_{r+1})^u}{S(X)}. \qquad (13)

The significance of the new datum is defined as its statistical contribution to PANFIS's output. Conversely, the premise part of PANFIS is a hyperellipsoid in arbitrary position, whereby each membership function delineates a partition of the input space. Without loss of generality, the sampling data distribution S(X) can be replaced by the total volume of the existing fuzzy rules. Therefore, (10) can be further written as follows:

D_n = |e_n| \frac{\det(\Sigma_{r+1})^u}{\sum_{i=1}^{r+1} \det(\Sigma_i)^u}. \qquad (14)

If an observation complies with (11), the new datum suffices to be hired as an extra rule. When g < D_n, the DS method appraises this datum as having high descriptive power and generalization potential. Conversely, if g > D_n, the current fuzzy rules have confirmed their completeness in seizing the available training stimuli. In this regard, the ESOM theory takes place with the hope of refining the positions of the current fuzzy rules. The guideline for allocating the value of g is illustrated elsewhere in this paper.

1) Initialization of New Fuzzy Rules Based on ε-Completeness Criterion: The selection of a new antecedent fuzzy set constitutes an indispensable constituent in the automatic proliferation of fuzzy rules. It should be comprehensively formulated to achieve adequate coverage of the universes of discourse. In the PANFIS learning platform, the allocation of the new premise parameters is enforced according to the profound concept of the ε-completeness criterion deliberated in [25]. In this viewpoint, for any input in the operating range, there exists at least one fuzzy rule such that the match degree is no less than ε for every individual input injected. Whenever a new fuzzy rule is appended as a reaction to a troublesome training datum, PANFIS plugs in this input datum as the focal point or center of the new ellipsoidal rule. Furthermore, the size of the new ellipsoidal region is triggered by setting the width of the Gaussian function as follows:

C_{i+1} = X_n \qquad (15)

\mathrm{diag}(\Sigma_{i+1}) = \frac{\max\big(|C_i - C_{i-1}|,\; |C_i - C_{i+1}|\big)}{\sqrt{\ln(1/\varepsilon)}}. \qquad (16)

Remark: In fuzzy applications, ε is usually selected as 0.5 [30].

One should envisage the width of the multidimensional membership function playing a prescriptive role in PANFIS's final output, because it forms the spread of the antecedent and the zone of influence of the ith fuzzy rule. For instance, too large a value of the width of the Gaussian function induces an averaging effect, whereas too small a value of it leads to overfitting [29].
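As a rough sketch of the growing step just described (our own simplification, not the paper's code: the rule dictionary layout is hypothetical, and the neighboring-center distance of (16) is approximated by the distance to the nearest existing center), the snippet below checks the datum significance of (14) against the threshold g and, on success, initializes a new rule per (15)-(16).

```python
import numpy as np

EPSILON = 0.5  # epsilon-completeness level; 0.5 is the usual choice noted above

def maybe_grow_rule(x_new, err, rules, g, u):
    """Grow one rule if the datum significance (14) exceeds g."""
    # candidate width via the epsilon-completeness rule (16), simplified
    # here to the distance to the nearest existing rule center
    d = min((np.linalg.norm(x_new - r["center"]) for r in rules), default=1.0)
    sigma = d / np.sqrt(np.log(1.0 / EPSILON))
    cov = np.eye(u) * sigma ** 2
    det_new = float(np.linalg.det(cov))
    dets = [r["det"] for r in rules] + [det_new]
    D_n = abs(err) * det_new ** u / sum(dt ** u for dt in dets)   # cf. (14)
    if D_n <= g:
        return False   # datum lacks novelty; tune existing rules (ESOM) instead
    rules.append({"center": x_new.copy(), "cov": cov, "det": det_new})
    return True
```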
C. Pruning of Inconsequential Rules and Merger of Similar Fuzzy Sets

This section explores the rule pruning and rule merging adornments of PANFIS. These peculiar leverages are intended to attain an economical and interpretable rule base while sustaining its best predictive quality.

1) Pruning of Inconsequential Fuzzy Rules Based on the ERS Concept: PANFIS inherits the rule base simplification technology of the so-called generalized growing and pruning radial basis function (GGAP-RBF) network and SAFIS [10], [13]. As foreshadowed, its predecessor is hampered in the PANFIS algorithm as it is only compatible with the zero-order TSK fuzzy system and unidimensional membership function environments. To suit the PANFIS learning scenario, numerous customizations are made. The contribution of the ith rule to the overall system output for an input vector X_n is defined, in a very similar way to the method for online input sensitivity analysis and input selection used in [34], by

E(i, n) = |\delta_i| \frac{R_i(x_n)}{\sum_{i=1}^{r} R_i(x_n)} = |\delta_i| E_i \qquad (17)

E_i = \frac{1}{S(X)} \int_X \exp\big(-(X - C_{in})^T \Sigma_i^{-1} (X - C_{in})\big)\, dx \qquad (18)

where \delta_i = w_{1i} + w_{2i}x_1 + \cdots + w_{ki}x_k + \cdots + w_{(u+1)i}x_u outlines the output contribution of the ith fuzzy rule, whereas E_i denotes the input contribution of the ith fuzzy rule. Furthermore, the rule significance is defined as the statistical contribution of the fuzzy rule when the number of observations approaches infinity, n → ∞. Referring to [10] and [13], the input contribution E_i can be derived by means of (18). Generally speaking, the size of the inverse covariance matrix \Sigma_i^{-1} is much less than the size of the range X. Hence, (18) can be simplified as follows:

E_i \approx \frac{1}{S(X)} \Big(2 \int_0^{\infty} \exp\Big(-\frac{X^2}{\Sigma_i}\Big)\, dx\Big)^u \qquad (19)

E_i = \frac{2^u}{S(X)} \Big(\lim_{z\to\infty} \int_0^{z} \exp\Big(-\frac{X^2}{\Sigma_i}\Big)\, dx\Big)^u. \qquad (20)

Via u-fold numerical integration for an arbitrary probability density function p(x) of the input data manifold x (x \in \Re^u), the solution of (20) can be expressed as follows:

E_i \approx \frac{\pi^{u/2} \det(\Sigma_i)^u}{S(X)}. \qquad (21)

On the one hand, the size of the range X, namely S(X), need not be computed; on the other hand, extracting the proper form of S(X) is nontrivial and time intensive. In fact, the sensitivity of a fuzzy rule is signified by the contribution of the individual fuzzy rule over the total contributions of all fuzzy rules. Therefore, we can justify replacing the size of the range S(X) by the overall contributions of the fuzzy rules, pinpointed by the cluster volumes, for the sake of flexibility. Accordingly, the statistical contribution of a fuzzy rule can be estimated as follows:

E_{\inf}(i) = |\delta_i| \frac{\det(\Sigma_i)^u}{\sum_{i=1}^{r} \det(\Sigma_i)^u}. \qquad (22)

If the contribution vector contains fuzzy rule contributions less than or equal to k_{err}, these fuzzy rules are classed as outdated. They should be dispossessed to mitigate the rule base complexity, thereby eradicating the vulnerability of gaining overcomplex network structures. To the best of our knowledge, this method may excel most other rule pruning mechanisms, as those solely consider the rule contributions at the time an assessment is undertaken, distorting the fact that a fuzzy rule may be a paramount ingredient of the system in the next training episodes and thus inflicting unstable fluctuations of the rule base size. This method nonetheless pays attention to the significance of the consequent parameters, which demonstrate a particular operating region of a cluster, whereas many other variants relax this. Small consequent parameters incur little significance to the system.

2) Merger of Similar Fuzzy Sets: Although the fuzzy sets are well scattered originally, they are still susceptible to overlapping each other significantly. This is undoubtedly inflicted by the projection concept; an example is shown in Fig. 2. If the membership functions are very similar to each other, they can be fused into one new membership function to obviate fuzzy rule redundancies and in turn to support an interpretable explanatory module (rule) of PANFIS. The similarity between two Gaussian membership functions A and B can be found by benefiting from a kernel-based metric comparing the centers and widths of the two fuzzy sets in one joint formula [18] as follows:

S_{ker}(A, B) = e^{-|c_A - c_B| - |\sigma_A - \sigma_B|}. \qquad (23)

Another similarity measure, as in [45], can be applied in lieu of the kernel-based metric. The similarity measure in [45] is exploited to designate whether a new fuzzy set is tailored or whether the use of a current fuzzy set is sufficient as a component of the new fuzzy rule. The first condition means that only in the case of a perfect match does the similarity measure between two fuzzy sets reach the maximal degree of one. Conversely, the second condition assures that an embedded set is not coupled with a covering set having a significantly larger width, which would result in a too inexact representation of the data in one local region. Two fuzzy sets are solely coalesced if the similarity degree exceeds a tolerable value, S_{ker} ≥ 0.8. The threshold is chosen as 0.8 as it was already identified in [18] as a coherent option to steer the tradeoff between dissimilarity and readability.

The merging itself is performed by the following formulas [18]:

c_{new} = (\max(U) + \min(U))/2 \qquad (26)

\sigma_{new} = (\max(U) - \min(U))/2 \qquad (27)

where U = \{c_A \pm \sigma_A,\; c_B \pm \sigma_B\}. The underlying construct is to reduce the approximate merging of two Gaussian kernels to the exact merging of two of their α-cuts for a specific value of α. Here, we choose α = e^{-1/2} ≈ 0.6, which is the membership degree of the inflection points c ± σ of a Gaussian kernel. This also guarantees ε-completeness of the fuzzy partition, as the merged set cuts the outer contours of the two sets at membership value 0.6, thus arriving at a larger coverage span than the original sets. An example is shown in Fig. 3. This concept can be easily generalized to arbitrary fuzzy sets by employing the characteristic spread of the sets as σ, along a specific α-cut (usually in [0.4, 0.6]).
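The merging step can be summarized in a few lines. The sketch below is a minimal univariate rendering under our own naming (the 0.8 threshold is the one advocated above); it applies the kernel-based similarity (23) and the alpha-cut merging formulas (26)-(27).

```python
import math

def kernel_similarity(c_a, s_a, c_b, s_b):
    """Kernel-based similarity of two Gaussian fuzzy sets, cf. (23)."""
    return math.exp(-abs(c_a - c_b) - abs(s_a - s_b))

def merge_if_similar(c_a, s_a, c_b, s_b, threshold=0.8):
    """Merge two Gaussian sets per (26)-(27) when similar enough."""
    if kernel_similarity(c_a, s_a, c_b, s_b) < threshold:
        return None                      # keep both sets
    U = [c_a - s_a, c_a + s_a, c_b - s_b, c_b + s_b]
    c_new = (max(U) + min(U)) / 2.0      # cf. (26)
    s_new = (max(U) - min(U)) / 2.0      # cf. (27)
    return c_new, s_new

# two heavily overlapping sets collapse into one set covering both
print(merge_if_similar(1.0, 0.5, 1.1, 0.55))
```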
D. Derivations and Adaptations of Fuzzy Consequent Parameters

As outlined in Section I, PANFIS plugs in the ERLS method for deriving and updating the fuzzy consequent parameters, featuring a synergy between a local learning approach [11] and an extended RLS. Suppose we arrive at the n = z training episode and we are interested in minimizing the locally weighted cost function

J_L = \sum_{n=1}^{z} (Y - X^T w_i)^T \Phi_i (Y - X^T w_i) \qquad (28)

where \Phi_i is a diagonal matrix whose main diagonal elements comprise \varphi_i(x_k). Meanwhile, X is formed by x_e = [1, x_1, \dots, x_u], X \in \Re^{z\times(u+1)}. Thereafter, (28) can be solved assuming the linear subsystems are loosely coupled, with the level of interaction expressed by \varphi_i:

J_L = \sum_{i=1}^{r} J_{L_i}. \qquad (29)

Finally, the optimum solution of the local subsystem w_i that minimizes the locally weighted cost function is obtained as follows:

w_i = (X^T \Phi_i X)^{-1} X^T \Phi_i T = \Lambda T \qquad (30)

where \Lambda is the Moore–Penrose generalized inverse; the weight vector w_i is unique when X^T \Phi_i X is a nonsingular matrix. There are various ways to compute the Moore–Penrose generalized inverse disclosed in the literature [31], involving the orthogonal projection method, the orthogonalization method, the iterative method, and the singular value decomposition method [32]. On the one hand, if X^T \Phi_i X is a nonsingular matrix, the pseudoinversion or orthogonal projection is ideally applicable. On the other hand, if X^T \Phi_i X tends to be singular, we use Tikhonov regularization:

(X^T \Phi_i X) \;\rightarrow\; (X^T \Phi_i X + \alpha I) \qquad (31)

where we elicit the regularization parameter α by [39]

\alpha \approx \frac{2\lambda_{\max}}{\mathrm{threshold}} \qquad (32)

where the threshold is set to 10^{15} and \lambda_{\max} is the largest eigenvalue. After the weight matrix W has been optimally assembled by the local approach, it is allowed to be polished up recursively on the fly by the ERLS method. To detail this concept, the point of departure is to assure the convergence of the system error and of the weight vector being adapted.

To obviate the convergence issue in RLS deliberated in [33], the RLS can be extended by merely inserting an additional constant α, conferring a noteworthy effect to foster the asymptotic convergence of the system error and the weight vector being adapted, which acts like a binary function. As a reciprocal impact, the consolidated constant α shall be one if the approximation error ê is bigger than the system error e. Otherwise, it shall stay constant at zero, implying a preference for the existing weight vector rather than adjusting it. In other words, the constant α is in charge of regulating the current belief in the weight vector W. It can be presented mathematically as follows:

\alpha = \begin{cases} 1, & \hat{e} \ge |e| \\ 0, & \text{otherwise} \end{cases} \qquad (33)

L(n) = Q_i(n-1)\,x_e(n)\,\big(\Psi(n)^{-1} + x_e^T(n)\,Q_i(n-1)\,x_e(n)\big)^{-1} \qquad (34)

Q_i(n) = \big(I - \alpha L(n)\,x_e^T(n)\big)\,Q_i(n-1) \qquad (35)

w_i(n) = w_i(n-1) + \alpha L(n)\big(t(n) - x_e^T(n)\,w_i(n-1)\big) \qquad (36)

where we assign w_{i0} = 0 and Q_{i0} = \omega I. Note that the system error is defined by (12), whereas the approximation error ê is the distortion between the true output values and the predictions of PANFIS while modeling one step ahead of the system behavior, formulated as follows:

\hat{e} = |t_n - \Psi_n W_{n-1}|. \qquad (37)

We ought to comprehend that when the rule growing module is turned on, the covariance matrix Q_{r+1} is set as Q_{r+1} = \omega I. Regarding the above equations, they imply that adaptations of the weight vector W are disallowed when the approximation error ê is less than the system error e. The consummation of the ERLS method substantiates PANFIS's predictive quality, as it aids PANFIS to always strike a reasonable tradeoff between the system error and the weight vector convergences. The proofs are given in the appendix. Moreover, we also compare PANFIS's performance employing ERLS and default RLS in the numerical examples. Generally speaking, ERLS boosts the predictive fidelity of PANFIS when it predicts the footprints of the system.
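For illustration, one ERLS step can be coded directly from (33)-(37). The sketch below is our own reading of the update, not the authors' implementation; the argument names, the use of the firing strength psi in the gain denominator, and the initialization constant omega are assumptions.

```python
import numpy as np

def erls_step(w, Q, x_e, t, psi, e_sys):
    """One ERLS update of a consequent vector w with covariance Q.
    psi: the rule's firing strength entering the gain (34);
    e_sys: the current system error |t_n - y_n| from (12)."""
    e_hat = abs(t - float(x_e @ w))            # approximation error, cf. (37)
    alpha = 1.0 if e_hat >= e_sys else 0.0     # binary gate, cf. (33)
    denom = 1.0 / psi + float(x_e @ Q @ x_e)   # cf. (34)
    L = (Q @ x_e) / denom                      # gain vector
    Q = (np.eye(len(w)) - alpha * np.outer(L, x_e)) @ Q   # cf. (35)
    w = w + alpha * L * (t - float(x_e @ w))              # cf. (36)
    return w, Q

# initialization as in the text: w_i0 = 0 and Q_i0 = omega * I,
# with omega a large constant (an assumption here)
u, omega = 3, 1e5
w, Q = np.zeros(u + 1), omega * np.eye(u + 1)
x_e = np.array([1.0, 0.2, -0.4, 0.9])          # extended input [1, x1, ..., xu]
w, Q = erls_step(w, Q, x_e, t=1.5, psi=0.7, e_sys=1.5)
```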
E. Sensitivity Analysis of the Preset Parameters

This section confers the sensitivity of PANFIS's predefined parameters. A coherent way of setting the preset parameters is explored to render PANFIS more ergonomic. Accordingly, the Box–Jenkins gas furnace dataset disseminated in [46] is cast to probe the sensitivity of k_{err} and g in the PANFIS algorithm. This dataset is compiled of 290 input/output data points, where the input attributes are the process input variable u(k) and the CO2 concentration in off gas; conversely, the process output y(k) denotes the output attribute. More specifically, the overall 290 training stimuli feed PANFIS, and the nonlinear dependence of the system is orchestrated as follows:

y(k) = f\big(y(k-1),\; u(k-4)\big). \qquad (38)

In the following, the parameters k_{err} and g are opted as 0.1, 0.01, and 0.001. Table I encapsulates the experimental results in terms of the number of rules and the nondimensional error index (NDEI).

Arguably, the threshold g is a noteworthy parameter in controlling whether PANFIS augments its structure (plasticity) or tunes the current structure (stability). Hence, the threshold g fully steers the stability–plasticity dilemma disseminated by [28], being a cursor to notify which action is to be carried out in the rehearsal process. On the one hand, one may comprehend that a lower value of g would induce a better predictive accuracy of PANFIS. It, however, always lands on a more complex fuzzy rule base, invoking more fuzzy rules to be crafted and thereby suppressing the structural efficiency and transparency, and vice versa. On the other hand, a lower value of k_{err} yields a more frugal rule base, while a higher value of it evokes a more demanding rule base and in turn solidifies the predictive quality of PANFIS. PANFIS is in essence supposed to gain a balance between them. To actualize a synergy between the predictive accuracy and the compactness of the rule base, we fix k_{err} = 10% \cdot g and g = 10^{-u}, usually acquiring good results with respect to the numerical results in Table I.

IV. NUMERICAL EXAMPLES

To evaluate the efficacy of the PANFIS learning policy, three empirical studies on real-world and synthetic streaming data are carried out to evaluate the self-reorganizing and self-correcting mechanisms of PANFIS. The streaming data employed herein are: Standard and Poor's (S&P) index data, hyperplane data, and NOx emission data from a car engine. PANFIS is benchmarked against its counterparts in the ENFS field to obtain profound insight into PANFIS's efficacy, in which the qualities of the benchmarked algorithms are assessed by virtue of the root-mean-square error (RMSE) and the NDEI, written in compact mathematical form as follows:

\mathrm{RMSE} = \sqrt{\frac{\sum_{n=1}^{N} \big(t(n) - y(n)\big)^2}{N}}, \qquad \mathrm{NDEI} = \frac{\mathrm{RMSE}}{\mathrm{STD}(t)}. \qquad (39)
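Both quality measures of (39) are straightforward to compute; the helper below is a small self-contained sketch (the names are our own, not from the paper).

```python
import numpy as np

def rmse_ndei(targets, outputs):
    """Root-mean-square error and nondimensional error index, cf. (39)."""
    t = np.asarray(targets, dtype=float)
    y = np.asarray(outputs, dtype=float)
    rmse = float(np.sqrt(np.mean((t - y) ** 2)))
    return rmse, rmse / float(np.std(t))

print(rmse_ndei([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```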
A. Prediction of S&P Index Time Series

This section analyzes the sustainable self-reorganizing behavior of PANFIS in addressing the S&P-500 market index dataset. The dataset encompasses 60 years of daily index values, which can be found on the Yahoo! Finance website. A total of 14,893 data were acquired from January 3, 1950 to March 12, 2009. Importantly, this dataset is frequently unable to be modeled with traditional NFSs, which have a prefixed structure, because of the nonlinear, erratic, and time-variant behaviors of the underlying dataset. All the training data embark on the PANFIS algorithm to pursue the self-correcting mechanism of PANFIS in recognizing the underlying characteristics of the S&P-500 index time series. PANFIS is benchmarked against its counterparts such as DENFIS, eTS, simp_eTS, the adaptive network-based fuzzy inference system (ANFIS), linear regression (LR), the evolving fuzzy neural network (EFuNN), and the self-organizing fuzzy associative machine (SeroFAM) [11], [29], [34]–[36], [47], so as to benchmark the superiority of the PANFIS algorithm against the state-of-the-art works. Apart from that, the ERLS and RLS methods are contrasted to abstract the merit of ERLS against RLS and in turn to illustrate the effect of the ERLS method in fortifying PANFIS's performance. Table II tabulates the consolidated results of all benchmarked systems.

The input and output relationship of the system is governed by the following equation:

y(t+1) = f\big(y(t-4),\, y(t-3),\, y(t-2),\, y(t-1),\, y(t)\big). \qquad (40)

TABLE II. Forecasting 60 years of S&P-500.

Note that the target value is the absolute value of the S&P-500 daily index. Fig. 4(a) pictorially shows the plot of the one-step-ahead outputs of PANFIS versus the target values of the training samples. Fig. 4(b) shows the same as Fig. 4(a); however, it merely portrays the last 200 samples. Fig. 4(c) shows the deviations between the output predictions and the true output values occurring in the rehearsal process. Fig. 4(d) shows the trace of PANFIS rules while attempting to learn all regimes of the training data.

Fig. 4. S&P-500 dataset. (a) and (b) Modeling of the S&P-500. (c) System error. (d) Fuzzy rule evolution in one of the trials.

From Table II, we may deduce that PANFIS showcases a favorable performance, which outperforms the other approaches, conferring the highest modeling accuracy while employing the most compact and parsimonious structure. Surprisingly, the ANFIS and LR methods, which are subsumed as nonevolving algorithms, can marginalize or be comparable to the predictive accuracy of DENFIS, simp_eTS, EFuNN, and SeroFAM. It is worth stressing that these methods necessitate a complete dataset at hand a priori, enduring a retraining phase benefiting from an up-to-date dataset whenever they gather new training stimuli and imposing an explosion of the number of parameters stored in memory.

Note that the total parameter count shown in Table II encapsulates the number of rule base parameters and the training data made use of for model updates, whereas the rule base parameter count implies the parameters invoked to form the fuzzy rules, mainly affected by the number of input dimensions and fuzzy rules. Conversely, the evolving methods are remarkable in favoring to learn any data point once in time without being underpinned by expert knowledge. One may argue that there is no substantial deterioration of LR's predictive accuracy when a smaller number of training data is maintained in the memory. The LR method is, even so, incompetent to commence its learning process with a single training datum, thus disallowing viability in a truly online situation where usually only a small snapshot of the training data is available at hand before the process runs.

Generally speaking, SeroFAM and EFuNN are inferior to the other algorithms. Obviously, this condition is stimulated as both approaches adopt a Mamdani-based fuzzy system, whose premise and consequent parameters comprise the membership functions. It is tangible that the TSK fuzzy system employs precise mathematical models in the consequent part, which arguably incur a higher predictive accuracy. Nevertheless, the trait of the Mamdani-based fuzzy system is fairly endorsed, as it yields more transparent rule semantics owing to being endued with fully linguistic rules.

For clarity, the evolving layer in EFuNN solely self-reorganizes [48], imposing the worst performance. An in-depth look at the multidimensional membership function may point to additional parameters to dwell in the memory. This type can, nonetheless, trigger a more natural representation of ellipsoidal clusters whose axes are not necessarily parallel to the input axes. One may envision the number of parameters hinging on the number of rules bred, where the number of network parameters of PANFIS is still manageable in this case, notwithstanding that simp_eTS consumes a smaller number of parameters than PANFIS. It is worth noting that simp_eTS is devised with unidimensional membership functions flourishing spherical clusters.

It can be seen in Fig. 4 that PANFIS is capable of swiftly addressing the nonstationary components of the training data, automatically recruiting and evicting fuzzy rules in a timely fashion. A trivial analogy is the evolution of individuals in nature during the life cycle, specifically the autonomous mental development of humans, which usually starts from scratch without any knowledge, and important information can be memorized in the form of linguistic rules afterward. Vice versa, outdated knowledge can be forgotten without substantially affecting their development. More interestingly, PANFIS mimics the true output values closely, in which the system error hovers around quite small values. In the following, the ERLS method navigates to a substantial improvement in PANFIS's predictive fidelity, which weakens when PANFIS exploits the standard RLS method. Indeed, the rule merging strategy of PANFIS is beneficial to diminish the number of fuzzy sets and to warrant the transparency of the rule base. In contrast, the other algorithms, except SeroFAM, exclude this component in this circumstance.

B. Hyperplane Data Streams

This case study alludes to the viability of PANFIS in the classification problem using artificially generated data streams produced by the synthetic stream generator from massive online analysis (MOA) [44]. In a nutshell, this task embodies a binary classification problem laying out a random hyperplane in d-dimensional Euclidean space as a decision boundary. In the following, a point x_i = (x_1, \dots, x_d) is placed on the hyperplane if it complies with the following equation:

\sum_{i=1}^{d} w_i x_i = w_0. \qquad (41)

In general, the hyperplane can establish a binary classification problem in which \sum_{i=1}^{d} w_i x_i > w_0 pinpoints a positive class, whereas \sum_{i=1}^{d} w_i x_i < w_0 approbates a negative class. The landmark of this dataset interestingly manifests a drift, in which the underlying characteristic is to blend two pure distributions in a probabilistic fashion. It can be delved that the data are emanated by the first distribution with a probability of one at the beginning. Henceforth, this probability attenuates and in turn lands on the second distribution. In connection with the experimentation, the data streams entangle in sum 120k points, where the drift is asserted after 40k samples.

In this viewpoint, we accomplish a periodic hold-out test to figure out the self-reorganizing property of PANFIS swiftly coping with the drift in the streaming examples. That is, the first 1200 data samples are embarked in the first experiment, in which 1000 data points are enumerated for the training process and 200 data are planned for the validation phase. In the second experiment, the next 1200 data are cast, and the proportion of training and testing data is tantamount to the first experiment, and so on. Our brand new algorithm is benchmarked against its counterparts such as ANFIS [47], eTS [11], simp_eTS [29], and FLEXFIS+ [18]. Fig. 5 shows the trace of the classification rate, the rule evolution in the 41st experiment, and the number of fuzzy rules scattered in each hold-out test. Meanwhile, Table III summarizes the consolidated results of the benchmarked systems.
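For reproducibility, a drifting hyperplane stream of the kind used here can be sketched as below. This is our own simplified rendering of an MOA-style generator [44]; the uniform input range, the linear decay of the mixing probability, and all names are assumptions.

```python
import numpy as np

def hyperplane_stream(n, d, drift_at, seed=0):
    """Binary hyperplane stream, cf. (41), blending two concepts:
    after drift_at samples the first hyperplane is progressively
    replaced by a second one in probabilistic fashion."""
    rng = np.random.default_rng(seed)
    w1, w2 = rng.normal(size=d), rng.normal(size=d)
    w0 = 0.0
    for i in range(n):
        x = rng.uniform(-1.0, 1.0, size=d)
        # probability of the first concept: one before the drift point,
        # then decaying linearly toward the second concept
        p1 = 1.0 if i < drift_at else max(0.0, 1.0 - (i - drift_at) / (n - drift_at))
        w = w1 if rng.random() < p1 else w2
        yield x, int(float(x @ w) > w0)   # label per (41)

for x, label in hyperplane_stream(5, d=4, drift_at=2):
    print(label, np.round(x, 2))
```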
Fig. 5. Hyperplane dataset. (a) Trace of the classification rate in each trial. (b) Fuzzy rule evolution in one of the trials. (c) Fuzzy rule evolution of the whole process.

TABLE III. Modeling of the hyperplane dataset.

From Table III, one may appraise that PANFIS sheds a fruitful impetus on learning in the nonstationary environment context, in which PANFIS provides a more convincing performance than FLEXFIS+, eTS, and simp_eTS. It is worth noting that PANFIS holds the best predictive accuracy and salvages the rule base cost. More specifically, Fig. 5(b) shows the automatic rule growing and pruning of PANFIS, which is effectual to prevail over the drifts contained in the system without detrimental costs to the model evolution. The rule pruning mechanisms, in the following, are triggered in the early episodes of the training process. PANFIS commences its training process with the first training datum, and the first fuzzy rule is crafted by virtue of this datum. This rule probably does not contribute significantly during its lifespan; it is therefore a subject of the rule pruning module. The assembled rule base is not sufficiently mature in the early training episodes, thus leading to high system errors in learning the training samples and breeding extraneous fuzzy rules. As these rules later become inactive, the rule pruning module is activated to evict them.

Fig. 5(b) shows the 41st periodic hold-out process, where the drift commences to ensue. That is, the changing data distribution evokes that the previously appended rules are invalid to delineate the current data trend. As more training stimuli are injected, the rule base evolution ends up in a more stable rule evolution, where rule growing and pruning do not ensue too frequently. That is, refinements of the input and output parameters are merely carried out, as the current rule base is already representative to capture the training data.

An in-depth look elicits that the performance of PANFIS worsens in the 41st trial owing to the drift. As the rule base is refurbished in the next training episode, the performance of PANFIS is boosted, which is in line with the increase of the classification rates in the next experiments. By extension, Fig. 5 also shows an overview of the number of fuzzy rules in every trial, which endorses the flexibility of the PANFIS learning engine, as it is capable of orchestrating its rule base size according to the knowledge recognized. It is worth stressing that ANFIS is inherently an offline method and depletes much training overhead, emphasized by the considerable number of parameters saved in the memory, notwithstanding that it excels simp_eTS and FLEXFIS+, which are fully online algorithms. In contrast, simp_eTS manipulates unidimensional membership functions, albeit it deposits a smaller number of parameters than PANFIS.
C. Modeling of NOx Emissions in Car Engine Tests

The last study case is to model a real-world dataset derived from tests with a real car [41]. Interestingly, the car engine data implicitly comprise various process changes in the form of different operation modes, occurring over a wide range of the engine operation map. In particular, the two essential main influencing factors for engine control, namely rotation speed and torque, were not kept within a specific driving mode range (i.e., constant to simulate driving on a motor highway), but were from time to time varied to simulate different driving behaviors. This brings in some sort of changing dynamics of the car engine behavior, also with respect to NOx emissions, and is thus expected to be modeled by static models (not being able to adapt to new system states) with insufficient accuracy. We will also verify this when comparing our novel method with nonevolving fuzzy systems/models.

Nonetheless, the main task of PANFIS in this problem is to predict the NOx emissions from the exhaust of a car engine by virtue of four input attributes as follows: N: engine rotation speed (rpm); P2: pressure offset in cylinders (bar); T: engine output torque (Nm); Nd: speed of dynamometer (rpm). The nonlinear dependence of the system is regulated by the following equation:

\mathrm{NOx}_n = f\big(N_{n-4},\, P2_{n-5},\, T_{n-5},\, Nd_{n-5}\big). \qquad (42)

This dataset comprises 826 data pairs, of which 667 data are captured for the training purpose, whereas 159 are injected to test the generalization ability of the produced rule base. By extension, PANFIS is benchmarked against its counterparts such as eTS, simp_eTS, ANFIS, FAOS-PFNN, and the Bayesian ART-based fuzzy inference system [11], [12], [29], [37], [38], [47], in which the predictive accuracies are assessed by the RMSE. Table IV tabulates the consolidated results of all benchmarked systems on the withheld evaluation set of 159 data points, whereas Fig. 6 displays the evolution of fuzzy rules in the training phase.

TABLE IV. Modeling of the NOx emissions dataset.

From Table IV, PANFIS showcases the best result, in which PANFIS can accurately resemble the footprints of the 159 testing data while swiftly retaining the most frugal rule base. On the one side, FAOS-PFNN and ANFIS, which are not devised for an online learning purpose, can land on a competitive modeling quality with PANFIS. Nevertheless, we contend with the oversized rule bases yielded by these methods due to the absence of a forgetting mechanism for inconsequential fuzzy rules. On the other side, FLEXFIS strikes the second best quality in terms of the structural load and in prognosticating the dynamics of the NOx emissions. One may discern that FLEXFIS undermines the logic of the online learning scenario because of a compulsory pretraining process with the use of some recorded data.

V. CONCLUSION

This paper explores an agile ENFS termed PANFIS. The efficacy and viability of the PANFIS prototype have been exemplified through three study cases on miscellaneous benchmark problems. Furthermore, comparisons against numerous state-of-the-art works have been enforced, which infer PANFIS as a notable breakthrough in the evolving neuro-fuzzy field, marginalizing its counterparts in the predictive accuracy and structural complexity facets. As future work, the application of PANFIS to classification cases is the subject of our further investigation.

VI. APPENDIX

1) Proof of System Error Convergence:

Theorem 1: If a nonlinear system is stable and it is desired to be modeled by benefiting from PANFIS, it will always deliver a small finite value of the system error [6].

Proof: Assuming that the approximation error ê converges at n ≥ n_0, the approximation error always complies with both of the two constraints [7], [46], \lim_{n\to\infty} \hat{e} = 0 and \max|\hat{e}(n)| \le \delta, with the use of the ERLS method, where δ is an arbitrarily small constant. The system error is defined in (12); conversely, the approximation error is written in (37). The absolute value of the system error is illustrated as follows:

|e_n| = |t_n - W_n \Psi_n| \qquad (A1)

= |I - L_n^T x_n|\,|t_n - W_{n-1} \Psi_n| \qquad (A2)

= \frac{\hat{e}_n}{1 + \Psi_n x_e^T Q_{i,n-1} x_e} \le \delta. \qquad (A3)
It should be emphasized that if 1 + \Psi_n x_e^T Q_{i,n-1} x_e is bounded and is not equal to -1 throughout the training process, it can be derived as follows:

|e_n|\big(1 + \Psi_n x_e^T Q_{i,n-1} x_e\big) = \hat{e}_n \le \big(1 + \Psi_n x_e^T Q_{i,n-1} x_e\big)\,\delta. \qquad (A4)

To guarantee that (35) is encountered during the training process, this leads to:

1 \le 1 + \Psi_n x_e^T Q_{i,n-1} x_e \le S \qquad (A5)

where S is a positive constant. Hence, because of the arbitrarily small value δ, (1 + \Psi_n x_e^T Q_{i,n-1} x_e)\,\delta is certainly an arbitrarily small value. Therefore, we acquire that the approximation error is convergent:

\lim_{n\to\infty} \hat{e} \cong 0. \qquad (A6)

Herewith, Theorem 1 is proven.

2) Proof of Parameter Convergence:

Theorem 2: If a nonlinear system modeled by PANFIS is stable, the weight vector of PANFIS adapted by the ERLS approach will be bounded to a finite vector as time approaches infinity [7].

Proof: Using the l1 matrix norm, we consider two consecutive time instants, t and t-1, which are detailed as follows:

\|w_{i,n} - w_{i,n-1}\|_1 = \|L_n(t_n - x_{e,n}^T w_{i,n-1})\|_1 = \|Q_{i,n} x_{e,n} \hat{e}_n\|_1 \le \|Q_{i,n}\|_1 \|x_{e,n}\|_1 \|\hat{e}_n\|_1 \qquad (B1)

where

\|Q_{i,n}\|_1 \le \sqrt{M}\,\|Q_{i,n}\|_2 = \sqrt{M}\sqrt{\rho(Q_{i,n}^H Q_{i,n})} = \sqrt{M}\,\lambda_{\max}(Q_{i,n}) \qquad (B2)

and \lambda_{\max}(Q_{i,n}) is the maximum eigenvalue of the Hermitian matrix Q_{i,n}. From (34), Q_{i,n} x_{e,n} may be presented as follows:

Q_{i,n} x_{e,n} = Q_{i,n-1} x_{e,n} \big(\Psi_n^{-1} + x_{e,n}^T Q_{i,n-1} x_{e,n}\big)^{-1}. \qquad (B3)

As foreshadowed in Appendix A, 1 \le 1 + \Psi_n x_{e,n}^T Q_{i,n-1} x_{e,n} \le S, thereby leading to:

Q_{i,n} x_{e,n} = Q_{i,n-1} x_{e,n} \big(\Psi_n^{-1} + x_{e,n}^T Q_{i,n-1} x_{e,n}\big)^{-1} \le Q_{i,n-1} x_{e,n}. \qquad (B4)

After that, Q_{i,n} may be written as follows:

\|Q_{i,n}\| = \frac{\|Q_{i,n} x_{e,n}\|}{\|x_{e,n}\|} \le \frac{\|Q_{i,n-1} x_{e,n}\|}{\|x_{e,n}\|}. \qquad (B5)

Thus, Q_{i,n} always complies with the following condition:

\|Q_{i,n}\| \le \|Q_{i,n-1}\| \le \dots \le \|Q_{i,1}\|. \qquad (B6)

By utilization of (B2), Q_{i,n} can be bounded as a function of the largest eigenvalue as follows:

\sqrt{M}\,\lambda_{\max}(Q_{i,n}) \le \sqrt{M}\,\lambda_{\max}(Q_{i,n-1}) \le \dots \le \sqrt{M}\,\lambda_{\max}(Q_{i,1}). \qquad (B7)

We suppose that, at time t, the upper bound of x_e is b:

\|x_e\|_1 = \sum_{k=1}^{u+1} |x_{e,k}| = b. \qquad (B8)

Finally, we are able to express (B2) as the following inequality:

\|W_n - W_{n-1}\|_1 \le \sqrt{M}\,\lambda_{\max}(Q_1)\, b\, \hat{e}_n. \qquad (B9)

As highlighted in (A6), \lim_{n\to\infty} \hat{e} = 0. Hence, it follows that the difference of the weight vectors at two consecutive time instants also approaches zero as time reaches infinity, or is bounded to a small value:

\lim_{n\to\infty} \|W_n - W_{n-1}\| \cong 0. \qquad (B10)

Hence, we may infer that Theorem 2 is proven.

ACKNOWLEDGMENT

This paper reflects only the authors' views.

REFERENCES

[1] L. A. Zadeh, "Soft computing and fuzzy logic," IEEE Softw., vol. 11, no. 6, pp. 48–56, Nov. 1994.
[2] L. T. Whye and Q. Chai, "eFSM-A novel online neural fuzzy semantic memory model," IEEE Trans. Neural Netw., vol. 21, no. 1, pp. 136–157, Jan. 2010.
[3] E. Lughofer, Evolving Fuzzy Systems-Methodologies, Advanced Concepts and Applications. New York, NY, USA: Springer-Verlag, 2011.
[4] W. A. Farag, V. H. Quintana, and G. Lambert-Torres, "A genetic-based neuro-fuzzy approach for modeling and control of dynamical systems," IEEE Trans. Neural Netw., vol. 9, no. 5, pp. 756–767, Sep. 1998.
[5] E. Lughofer and P. Angelov, "Handling drifts and shifts in on-line data streams with evolving fuzzy systems," Appl. Soft Comput., vol. 11, no. 2, pp. 2057–2068, 2011.
[6] S. Wu and M. J. Er, "Dynamic fuzzy neural networks-A novel approach to function approximation," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 30, no. 2, pp. 358–364, Apr. 2000.
[7] G. Leng, G. Prasad, and T. M. McGinnity, "An on-line algorithm for creating self-organizing fuzzy neural networks," Neural Netw., vol. 17, no. 10, pp. 1477–1493, 2004.
[8] R. M. French, "Catastrophic forgetting in connectionist networks," in Encyclopedia of Cognitive Science, vol. 1, L. Nadel, Ed. London, U.K.: Nature Publishing Group, 2003, pp. 431–435.
[9] E. Lughofer and P. Angelov, "Handling drifts and shifts in on-line data streams with evolving fuzzy systems," Appl. Soft Comput., vol. 11, no. 2, pp. 2057–2068, 2011.
[10] G.-B. Huang, P. Saratchandran, and N. Sundararajan, "A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation," IEEE Trans. Neural Netw., vol. 16, no. 1, pp. 57–67, Feb. 2005.
[11] P. Angelov and D. Filev, "An approach to online identification of Takagi-Sugeno fuzzy models," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 1, pp. 484–498, Feb. 2004.
[12] E. Lughofer, "FLEXFIS: A robust incremental learning approach for evolving Takagi-Sugeno fuzzy models," IEEE Trans. Fuzzy Syst., vol. 16, no. 6, pp. 1393–1410, Dec. 2008.
[13] H.-J. Rong, N. Sundararajan, G.-B. Huang, and P. Saratchandran, "Sequential adaptive fuzzy inference system (SAFIS) for nonlinear system identification and time series prediction," Fuzzy Sets Syst., vol. 157, no. 9, pp. 1260–1275, 2006.
[14] J. A. Dickerson and B. Kosko, "Fuzzy function approximation with ellipsoidal rules," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 26, no. 4, pp. 542–560, Aug. 1996.
[15] A. Lemos, W. Caminhas, and F. Gomide, "Multivariable Gaussian evolving fuzzy modeling system," IEEE Trans. Fuzzy Syst., vol. 19, no. 1, pp. 91–104, Feb. 2011.
[16] P. Angelov, "Evolving Takagi-Sugeno fuzzy systems from data streams (eTS+)," in Evolving Intelligent Systems: Methodology and Applications, P. Angelov, D. Filev, and N. Kasabov, Eds. New York, NY, USA: Wiley, Apr. 2010, pp. 21–50.
[17] M. Setnes, "Simplification and reduction of fuzzy rules," in Interpretability Issues in Fuzzy Modeling (Studies in Fuzziness and Soft Computing, vol. 128), J. Casillas, O. Cordón, F. Herrera, and L. Magdalena, Eds. New York, NY, USA: Springer-Verlag, 2003, pp. 278–302.
[18] E. Lughofer, J.-L. Bouchot, and A. Shaker, "On-line elimination of local redundancies in evolving fuzzy systems," Evolving Syst., vol. 2, no. 3, pp. 380–387, 2011.
[19] E. H. Mamdani and S. Assilian, "An experiment in linguistic synthesis with a fuzzy logic controller," Int. J. Man-Mach. Stud., vol. 7, no. 1, pp. 1–13, 1975.
[20] W. Pedrycz, "An identification algorithm in fuzzy relational systems," Fuzzy Sets Syst., vol. 13, no. 2, pp. 153–167, 1984.
[21] P. Angelov and R. Yager, "A new type of simplified fuzzy rule-based systems," Int. J. General Syst., vol. 41, no. 2, pp. 163–185, 2011.
[22] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modeling and control," IEEE Trans. Syst., Man, Cybern., vol. 15, no. 1, pp. 116–132, Jan./Feb. 1985.
[23] J. L. Castro and M. Delgado, "Fuzzy systems with defuzzification are universal approximators," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 26, no. 1, pp. 149–152, Feb. 1996.
[24] L. Wang, "Fuzzy systems are universal approximators," in Proc. IEEE Int. Conf. Fuzzy Syst., Mar. 1992, pp. 1163–1170.
[25] C. C. Lee, "Fuzzy logic in control systems: Fuzzy logic controller, Part II," IEEE Trans. Syst., Man, Cybern., vol. 20, no. 2, pp. 404–436, Mar./Apr. 1990.
[26] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biol. Cybern., vol. 43, no. 1, pp. 59–69, 1982.
[27] B. Vigdor and B. Lerner, "The Bayesian ARTMAP," IEEE Trans. Neural Netw., vol. 18, no. 6, pp. 1628–1644, Nov. 2007.
[28] S. Grossberg, Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control. Boston, MA, USA: Reidel, 1982.
[29] P. Angelov and D. Filev, "A simplified method for learning evolving Takagi-Sugeno fuzzy models," in Proc. IEEE Int. Conf. Fuzzy Syst. (FUZZ), May 2005, pp. 1068–1073.
[30] S. Chiu, "Fuzzy model identification based on cluster estimation," J. Intell. Fuzzy Syst., vol. 2, no. 3, pp. 267–278, 1994.
[31] C. R. Rao and S. K. Mitra, Generalized Inverse of Matrices and Its Applications. New York, NY, USA: Wiley, 1971.
[32] G. Huang, Q. Zhu, and C. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, pp. 489–501, Dec. 2006.
[33] K. J. Astrom and B. Wittenmark, Adaptive Control, 2nd ed. Reading, MA, USA: Addison-Wesley, 1995.
[34] N. Kasabov and Q. Song, "DENFIS: Dynamic evolving neural-fuzzy inference system and its application for time series prediction," IEEE Trans. Fuzzy Syst., vol. 10, no. 2, pp. 144–154, Apr. 2002.
[35] N. K. Kasabov, "Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 6, pp. 902–918, Dec. 2001.
[36] T. Javan and Q. Chai, "A BCM theory of meta-plasticity for online self-reorganizing fuzzy-associative learning," IEEE Trans. Neural Netw., vol. 21, no. 6, pp. 985–1003, Jun. 2010.
[37] N. Wang, M. J. Er, and X. Meng, "A fast and accurate online self-organizing scheme for parsimonious fuzzy neural networks," Neurocomputing, vol. 72, nos. 16–18, pp. 3818–3829, 2009.
[38] R. J. Oentaryo, M. J. Er, L. San, L.-Y. Zhai, and X. Li, "Bayesian ART-based fuzzy inference system: A new approach to prognosis of machining process," in Proc. IEEE Annu. Conf. Prognostics Health Soc., Jun. 2011, pp. 1–10.
[39] H. Han and J. Qiao, "A self-organizing fuzzy neural network based on a growing-and-pruning algorithm," IEEE Trans. Fuzzy Syst., vol. 18, no. 6, pp. 1129–1143, Dec. 2010.
[40] J. D. J. Rubio, "SOFMLS: Online self-organizing fuzzy modified least squares network," IEEE Trans. Fuzzy Syst., vol. 17, no. 6, pp. 1296–1309, Dec. 2009.
[41] E. Lughofer, V. Macian, C. Guardiola, and E. P. Klement, "Identifying static and dynamic prediction models for NOx emissions with evolving fuzzy systems," Appl. Soft Comput., vol. 11, no. 2, pp. 2487–2500, 2011.
[42] C.-T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Upper Saddle River, NJ, USA: Prentice-Hall, 1996.
[43] N. K. Kasabov and Q. Song, "DENFIS: Dynamic evolving neural-fuzzy inference system and its application for time-series prediction," IEEE Trans. Fuzzy Syst., vol. 10, no. 2, pp. 144–154, Apr. 2002.
[44] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: Massive online analysis," J. Mach. Learn. Res., vol. 11, pp. 1601–1604, May 2010.
[45] C. F. Juang and C. T. Lin, "An on-line self-constructing neural fuzzy inference network and its applications," IEEE Trans. Fuzzy Syst., vol. 6, no. 1, pp. 12–32, Feb. 1998.
[46] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Netw., vol. 2, no. 5, pp. 359–366, 1989.
[47] J.-S. R. Jang, "ANFIS: Adaptive-network-based fuzzy inference system," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 3, pp. 665–684, May/Jun. 1993.
[48] M. J. Watts, "A decade of Kasabov's evolving connectionist systems: A review," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 39, no. 3, pp. 253–269, May 2009.
[49] R. Polikar, L. Udpa, S. S. Udpa, and V. Honavar, "Learn++: An incremental learning algorithm for supervised neural networks," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 31, no. 4, pp. 497–508, Nov. 2001.
[50] M. D. Muhlbaier, A. Topalis, and R. Polikar, "Learn++.NC: Combining ensemble of classifiers with dynamically weighted consult-and-vote for efficient incremental learning of new classes," IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 152–168, Jan. 2009.
[51] R. Elwell and R. Polikar, "Incremental learning of concept drift in nonstationary environments," IEEE Trans. Neural Netw., vol. 22, no. 10, pp. 1517–1531, Oct. 2011.
[52] C.-F. Juang and Y.-W. Tsao, "A self-evolving interval type-2 fuzzy neural network with on-line structure and parameter learning," IEEE Trans. Fuzzy Syst., vol. 16, no. 6, pp. 1411–1424, Dec. 2008.
[53] C.-F. Juang, R.-B. Huang, and Y.-Y. Lin, "A recurrent self-evolving interval type-2 fuzzy neural network for dynamic system processing," IEEE Trans. Fuzzy Syst., vol. 17, no. 5, pp. 1092–1105, Oct. 2009.
[54] C.-F. Juang, T.-C. Chen, and W.-Y. Cheng, "Speedup of implementing fuzzy neural networks with high-dimensional inputs through parallel processing on graphic processing units," IEEE Trans. Fuzzy Syst., vol. 19, no. 4, pp. 717–728, Aug. 2011.
[55] G.-D. Wu, Z.-W. Zhu, and P.-H. Huang, "A TS-type maximizing-discriminability-based recurrent fuzzy network for classification problems," IEEE Trans. Fuzzy Syst., vol. 19, no. 2, pp. 339–352, Apr. 2011.
[56] W. N. Street and Y. Kim, "A streaming ensemble algorithm (SEA) for large-scale classification," in Proc. Int. Conf. Knowl. Discovery Data Mining, 2001, pp. 377–382.
[57] J. Z. Kolter and M. A. Maloof, "Dynamic weighted majority: An ensemble method for drifting concepts," J. Mach. Learn. Res., vol. 8, pp. 2755–2790, Dec. 2007.

Mahardhika Pratama (S'12) was born in Surabaya, Indonesia. He received the B.Eng. (Hons.) degree in electrical engineering from the Sepuluh Nopember Institute of Technology, Surabaya, in 2010, and the M.Sc. degree in computer control and automation from Nanyang Technological University, Singapore, in 2011. He is currently pursuing the Ph.D. degree with the University of New South Wales, South Wales, Australia.

His current research interests include machine learning, computational intelligence, evolutionary computation, fuzzy logic, neural networks, and evolving adaptive systems.

Dr. Pratama has achieved the Prestigious Engineering Achievement Award from the Institution of Engineers, Singapore. He received the Best and Most Favorite Final Project award. He was nominated to Who's Who in the World by Marquis in 2013. He is a member of the IEEE Computational Intelligence Society, the IEEE Systems, Man, and Cybernetics Society, and the Indonesian Soft Computing Society, and is an active reviewer for several top journals, such as the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART B: CYBERNETICS, Neurocomputing, and Applied Soft Computing.

Sreenatha G. Anavatti received the Bachelor of Engineering degree in mechanical engineering from Mysore University, Mysore, India, in 1984, and the Ph.D. degree in aerospace engineering from the Indian Institute of Science, Bangalore, India, in 1990.

He is currently a Senior Lecturer with the School of Engineering and Information Technology, University of New South Wales, Australian Defence Force Academy, South Wales, Australia. His current research interests include control systems, flight dynamics, robotics, aeroelasticity, artificial neural networks, fuzzy systems, and unmanned systems.
Plamen P. Angelov (SM'04) is with the Computational Intelligence group and is Coordinator of Intelligent Systems Research, Infolab21, Lancaster University, Lancaster, U.K. He has internationally recognized pioneering results in on-line and evolving methodologies and algorithms for knowledge extraction in the form of human-intelligible fuzzy rule-based systems and autonomous machine learning. His current research interests include the competitiveness of industry, defense, and quality of life.

He is the founding Editor-in-Chief of the Springer journal Evolving Systems and serves as an Associate Editor of several other international journals. He also chairs annual conferences organized by the IEEE and is a Visiting Professor in Brazil, Germany, and Spain. He is the Chair of the Technical Committee on Evolving Intelligent Systems of the IEEE Systems, Man and Cybernetics Society. He is a member of the U.K. Autonomous Systems National TC, the Autonomous Systems Study Group, NorthWest Science Council, U.K., and the Autonomous Systems Network of the Society of British Aerospace Companies. He was recognized by The Engineer Innovation and Technology Award in 2008 in two categories, Aerospace and Defense, and The Special Award.

Edwin Lughofer received the Ph.D. degree from the Department of Knowledge-Based Mathematical Systems, University of Linz, Linz, Austria.

He is currently a Post-Doctoral Fellow with the University of Linz. He has participated in several research projects on both the European and national levels. He has published more than 80 journal and conference papers in the fields of evolving fuzzy systems, machine learning and vision, clustering, fault detection, and human-machine interaction, including a monograph on Evolving Fuzzy Systems (Springer, Heidelberg, 2011) and an edited book on Learning in Non-Stationary Environments (Springer, New York, 2012).

Dr. Lughofer acts as a reviewer for peer-reviewed international journals and as a co-organizer of special sessions and issues at international conferences and journals. He is an Editorial Board Member and an Associate Editor of the international Springer journal Evolving Systems and of the Elsevier journal Information Fusion, and a member of the European Society for Fuzzy Logic and Technology Working Group on Learning and Data Mining. Currently, he serves as a key researcher for the national K-project "Process Analytical Chemistry" (18 partners).