ADDITION AS FUZZY MUTUAL ENTROPY
BART KOSKO
Department of Electrical Engineering-Systems, Signal and Image Processing Institute,
University of Southern California, Los Angeles, California 90089-2564
ABSTRACT
A sum of real numbers equals the mutual entropy of a fuzzy set and its complement
set. A “fuzzy” or multivalued set is a point in a unit hypercube. Fuzzy mutual (Kullback)
entropy arises from the logarithm of a unique measure of fuzziness. The proof uses the
logistic map, a diffeomorphism that maps extended real space onto the fuzzy cube
embedded in it. The logistic map equates the sum of a vector’s components with the
mutual entropy of two dual fuzzy sets. Diffeomap projection offers a new way to study
the fuzzy structure of real algorithms.
Any sum of real numbers x_1, ..., x_n equals the fuzzy mutual entropy of a fuzzy set F in the unit hypercube [0,1]^n:

$$\sum_{i=1}^{n} x_i = H(F/F^c) - H(F^c/F). \tag{1}$$

We can replace the two H terms in (1) with an entropy operator $\mathscr{H}$ applied to the fuzzy set F:

$$\sum_{i=1}^{n} x_i = \mathscr{H}(F). \tag{2}$$

The operator $\mathscr{H}$ replaces each sum with the value of a map from fuzzy sets to real numbers.
The infinity "corners" of $-\infty$ and $\infty$ in $\bar{R}^n$ correspond to the 0-1 vertices in $I^n$. The origin in $R^n$ corresponds to the midpoint of $I^n$, the unique fuzzy set F such that $F = F^c = F \cap F^c = F \cup F^c$. The next three sections review the needed fuzzy information theory and develop the new measure of fuzzy mutual entropy. Section 5 proves (1).
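As a quick numerical check of (1), the short Python sketch below (ours, not from the paper) draws a random real vector, pushes it through the logistic map of Section 5, and compares the raw sum with the difference of the two fuzzy mutual entropies:

    import math
    import random

    n = 5
    x = [random.uniform(-3.0, 3.0) for _ in range(n)]   # arbitrary real numbers
    f = [1.0 / (1.0 + math.exp(-xi)) for xi in x]       # logistic map: fit values in (0, 1)

    # Fuzzy mutual entropies of F given F^c and of F^c given F
    H_F_Fc = sum(fi * math.log(fi / (1.0 - fi)) for fi in f)
    H_Fc_F = sum((1.0 - fi) * math.log((1.0 - fi) / fi) for fi in f)

    print(sum(x))              # the raw sum of the x_i
    print(H_F_Fc - H_Fc_F)     # agrees with the sum, per theorem (1)

The two printed values agree to floating-point precision for any choice of x.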
$$\|A - B\|^p = \|A - B^*\|^p + \|B^* - B\|^p \tag{3}$$
$$\|x\|^p = \sum_{i=1}^{n} |x_i|^p. \tag{4}$$
$$B^* = A \cap B. \tag{5}$$
For example, let $A = (1/3 \;\; 3/4)$ and $B = (1/2 \;\; 1/2)$. Then componentwise minimum and maximum give

$$A \cap B = (1/3 \;\; 1/2)$$
$$A \cup B = (1/2 \;\; 3/4)$$
$$A^c = (2/3 \;\; 1/4)$$
$$A \cap A^c = (1/3 \;\; 1/4)$$
$$A \cup A^c = (2/3 \;\; 3/4).$$
Note that $A \cap A^c \neq \emptyset$ and $A \cup A^c \neq X$ in this example and for all properly fuzzy sets A. Aristotle's bivalent "laws" of noncontradiction and excluded middle no longer hold. They hold only to some degree. They hold 100% only for the bit vectors at cube vertices. They hold 0% at the cube midpoint where $A = A^c$. For fit vectors between these extremes they hold only to some degree. The next section shows how the overlap term $A \cap A^c$ and underlap term $A \cup A^c$ give a unique measure of the fuzziness [5] or entropy of A.
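A two-line Python illustration (our sketch, using the standard min-max fit operators) makes the nonzero overlap and non-exhaustive underlap concrete for the example A above:

    A = [1/3, 3/4]
    Ac = [1 - a for a in A]                       # complement fit vector
    print([min(a, c) for a, c in zip(A, Ac)])     # overlap A ∩ A^c = [1/3, 1/4] != [0, 0]
    print([max(a, c) for a, c in zip(A, Ac)])     # underlap A ∪ A^c = [2/3, 3/4] != [1, 1]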
If A and B are not fuzzy sets, then the 100% subsethood relation $A \subset B$ holds if and only if $a_i \leq b_i$ for all i. The same condition holds if A and B are fuzzy sets: $S(A, B) = 1$ iff $a_i \leq b_i$ for all i. Then all of B's 100% subsets define a hyperrectangle in $I^n$ with a long diagonal that runs from the origin to the point B. $S(A, B) = 1$ iff A lies in or on this hyperrectangle, the fuzzy power set of B, $F(2^B)$. $S(A, B) < 1$ iff A lies outside the hyperrectangle. The closer A lies to the hyperrectangle, the larger the value $S(A, B)$. The minimum distance lies between A and $B^*$, the 100% subset of B closest to A in any $l^p$ metric [8]. This distance gives the $l^p$ "orthogonal" projection of A onto $F(2^B)$ shown in Figure 1 and gives the term $\|A - B^*\|^p$ in the general $l^p$ Pythagorean theorem (3).
Fig. 1. Pythagorean geometry of the subsethood theorem of fuzzy sets.
If $A = (1/3 \;\; 3/4)$ and $B = (1/2 \;\; 1/2)$ as above, then $S(A, B) = (5/6)/(13/12) = 10/13$ and $S(B, A) = (5/6)/1 = 5/6$. So B is more a subset of A than A is of B.
The derived ratio in (9), $S(A, B) = c(A \cap B)/c(A)$, has the same form as the conditional probability $P(B/A)$. In general the event probability $P(A)$ is the degree to which the sample space X is a subset of its own subset or event A: $P(A) = S(X, A)$. This looks like the identity $P(A) = P(A/X)$. The subsethood theorem (9) also implies that the whole-in-the-part term $S(X, A)$ gives the relative frequency $n_A/n$ if A denotes a bit vector with $n_A$ 1s or successes and $n - n_A$ 0s or failures: $S(X, A) = c(A \cap X)/c(X) = c(A)/c(X) = n_A/n$. The subsethood theorem (9) also implies $S(\{x_i\}, A) = a_i$, since the singleton set $\{x_i\}$ maps to the unit bit vector $(0 \cdots 0 \; 1 \; 0 \cdots 0)$ with a 1 in the ith slot and 0s elsewhere, and since $A = (a_1, \ldots, a_n)$. Then $c(\{x_i\}) = 1$ and $c(\{x_i\} \cap A) = a_i$. So $S(\{x_i\}, A) = a_i$ and subsethood formally subsumes elementhood.
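The count-ratio form of subsethood is easy to compute. The sketch below (ours) implements $S(A, B) = c(A \cap B)/c(A)$ with the sigma-count $c(A) = \sum_i a_i$ and checks both the example above and the singleton identity $S(\{x_i\}, A) = a_i$:

    def count(A):
        # sigma-count: sum of the fit values
        return sum(A)

    def S(A, B):
        # subsethood degree of A in B: c(A ∩ B) / c(A), per theorem (9)
        return count([min(a, b) for a, b in zip(A, B)]) / count(A)

    A = [1/3, 3/4]
    B = [1/2, 1/2]
    print(S(A, B), S(B, A))    # 10/13 vs. 5/6: B is more a subset of A
    print(S([0, 1], A))        # singleton {x_2} as a bit vector: returns a_2 = 3/4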
Maps between unit cubes define fuzzy systems $S: I^n \to I^p$. Fuzzy systems associate output fuzzy sets with input fuzzy sets and so generalize if-then rules. Fuzzy systems are uniformly dense in the space of continuous functions [9]: a fuzzy system can approximate any real continuous (or Borel measurable) function on a compact set to any degree of accuracy. The fuzzy system contains fuzzy rules of the form IF X = A, THEN Y = B that associate an output fuzzy set B with an input fuzzy set A. The rule defines a fuzzy Cartesian product $A \times B$ or patch in the input-output state space $X \times Y$. A fuzzy system approximates a function by covering its graph with patches and averaging patches that overlap. All the rules fire to some degree as in a neural associative memory [10]. The approximation theorem shows that finite discretizations of A and B suffice for the covering. So the patch or fuzzy Cartesian product $A \times B$ reduces to a fuzzy n-by-p matrix M or relation or point in $I^{np}$. Then M defines the system mapping $M: I^n \to I^p$, and the subsethood measure in (9) applies to M. In the same product space each fuzzy system is a subset to some degree of all other fuzzy systems. Then (10) below shows that each fuzzy system has a unique numerical measure of fuzziness [5] or entropy.
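Since a discretized rule patch is just a point in $I^{np}$, subsethood between two fuzzy systems reduces to the count ratio over np fit values. A minimal Python sketch (ours, with made-up 2-by-2 rule matrices):

    def subsethood(M1, M2):
        # Flatten each rule matrix to a point in the fuzzy cube I^(np),
        # then apply S(M1, M2) = c(M1 ∩ M2) / c(M1).
        flat1 = [v for row in M1 for v in row]
        flat2 = [v for row in M2 for v in row]
        inter = sum(min(a, b) for a, b in zip(flat1, flat2))
        return inter / sum(flat1)

    M1 = [[0.2, 0.7], [0.5, 0.1]]    # hypothetical fuzzy systems as points in I^4
    M2 = [[0.4, 0.6], [0.5, 0.3]]
    print(subsethood(M1, M2))        # 1.4 / 1.5
    print(subsethood(M2, M1))        # 1.4 / 1.8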
How fuzzy is a fuzzy set? A nonfuzzy set lies at a vertex of the cube $I^n$ and has 0% fuzziness. The cube midpoint P equals its own opposite, $P = P^c$, and it alone has 100% fuzziness. In between, fuzziness varies. The fuzziness of a set F grows as the distance falls between F and $F^c$, as F and $F^c$ lie closer to the midpoint P.

This cube geometry motivates the ratio measure of fuzziness $E(F) = a/b$, where a is the distance $l^1(F, F_{near})$ from F to the nearest vertex $F_{near}$ and b is the distance $l^1(F, F_{far})$ from F to the farthest vertex $F_{far}$. A long diagonal connects $F_{near}$ to $F_{far}$. The fuzzy entropy theorem [7] reduces this ratio to a ratio of counts:

$$E(F) = \frac{c(F \cap F^c)}{c(F \cup F^c)}. \tag{10}$$
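In code the count ratio (10) is a one-liner over the overlap and underlap fits. A small Python sketch (ours):

    def E(F):
        # fuzzy entropy E(F) = c(F ∩ F^c) / c(F ∪ F^c)
        overlap = sum(min(f, 1 - f) for f in F)
        underlap = sum(max(f, 1 - f) for f in F)
        return overlap / underlap

    print(E([0, 1, 1]))          # vertex: 0% fuzzy -> 0.0
    print(E([0.5, 0.5, 0.5]))    # midpoint: 100% fuzzy -> 1.0
    print(E([1/3, 3/4]))         # the example A: (7/12) / (17/12) = 7/17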
The probabilistic entropy $H(P)$ [3, 5] holds for fit vectors on the simplex in $I^n$. Then

$$E(P) = \frac{c(P \cap P^c)}{c(P \cup P^c)} = \frac{1}{n-1} \tag{11}$$

for the uniform distribution $(1/n, \ldots, 1/n)$ and for all other probability vectors with no $p_j$ above $1/2$. If some $p_j > 1/2$, then $E(P) < 1/(n-1)$. So the uniform set maximizes $E(P)$ but does not uniquely maximize it. So E differs from H.
Now consider how E resembles H. Consider the probability element $p_i$ and the motivation for the logarithmic measure (12) as the average information or entropy of a message or event: "information is inversely related to the probability of occurrence" [3]. The more improbable the event, the more informative the event if the event occurs. So information increases with $1/p_i$. The same intuition holds for monotone-increasing transforms of $1/p_i$. This includes the logarithmic transform $\ln(1/p_i)$, and only the logarithmic transform in the additive case. The weighted average over the system or alphabet gives the entropy as the expected information:

$$H(P) = \sum_{i=1}^{n} p_i \ln\frac{1}{p_i}. \tag{12}$$
In the one-fit case E(F) reduces to $f/(1-f)$ if $f \leq 1/2$ and to $(1-f)/f$ if $f \geq 1/2$. This ratio grows to 1 as f moves to the midpoint $1/2$ and falls to 0 as f moves to 0 or 1. The more vague or fuzzy the event, the more informative the event if it occurs. The operator E is subadditive on fuzzy sets since in a fuzzy space all events connect to one another to some degree. Integration also shows that $f/(1-f)$ and $(1-f)/f$ define a continuous probability density on [0, 1] if normalized by $\ln 4 - 1$, as the worked integral below confirms. So far we have only reviewed fuzzy entropy. We now extend it to mutual entropy to set up the proof of the main theorem.
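The normalizing constant follows from direct integration of the one-fit fuzziness over the unit interval; written out (a step the text leaves implicit):

$$\int_0^{1/2} \frac{f}{1-f}\, df + \int_{1/2}^{1} \frac{1-f}{f}\, df
= 2\int_0^{1/2} \frac{f}{1-f}\, df
= 2\Big[-\ln(1-f) - f\Big]_0^{1/2}
= 2\ln 2 - 1 = \ln 4 - 1.$$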
Fuzzy mutual entropy arises from a natural question: Why not take the logarithm of the unit fuzziness $f/(1-f)$? Any monotone transform will preserve its shape. So why not follow the probability example and use a logarithm? Then we can weight the log terms with the fit values and get a more proper measure of the entropy of a fuzzy set. The idea is to replace the intuition chain

$$\frac{1}{p_i} \;\to\; \ln\frac{1}{p_i} \;\to\; \sum_i p_i \ln\frac{1}{p_i} \tag{13}$$

with the fuzzy chain

$$\frac{f_i}{1-f_i} \;\to\; \ln\frac{f_i}{1-f_i} \;\to\; \sum_i f_i \ln\frac{f_i}{1-f_i}. \tag{14}$$
The new fuzzy entropy term in (14) uses the natural logarithm to simplify
the proof of the main theorem. The sum term defines a fuzzy mutual
entropy.
For probability vectors P and Q in the $I^n$ simplex, define the mutual entropy $H(P/Q)$ of P given Q [11] as

$$H(P/Q) = \sum_{i=1}^{n} p_i \ln\frac{p_i}{q_i}. \tag{15}$$

The mutual entropy measures distance in the simplex in the rough sense that $H(P/Q) = 0$ if $P = Q$, and $H(P/Q) > 0$ if $P \neq Q$. This follows from the Gibbs inequality [3]. Some stochastic learning automata and neural networks [4] minimize $H(P/Q)$ as the learning system's distribution P tries to estimate the distribution Q of the sampled environment. In the cube $I^n$, the fuzzy mutual entropy term in (14) is the usual mutual entropy $H(F/F^c)$ defined on fit vectors.
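A small Python check (ours) of this distance-like behavior of $H(P/Q)$ under the Gibbs inequality:

    import math

    def H(P, Q):
        # mutual (Kullback) entropy H(P/Q) = sum_i p_i ln(p_i / q_i), eq. (15)
        return sum(p * math.log(p / q) for p, q in zip(P, Q))

    P = [0.2, 0.3, 0.5]
    Q = [0.25, 0.25, 0.5]
    print(H(P, P))             # 0.0 when P = Q
    print(H(P, Q), H(Q, P))    # both positive when P != Q, and in general unequal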
The sum of the fuzzy information units $\ln(f_i/(1-f_i))$ splits into the mutual entropies of the fuzzy sets F and $F^c$:

LEMMA:

$$\sum_{i=1}^{n} \ln\frac{f_i}{1-f_i} = H(F/F^c) - H(F^c/F). \tag{16}$$

Proof:

$$\sum_{i=1}^{n} \ln\frac{f_i}{1-f_i} = \sum_{i=1}^{n} \left[ f_i + (1-f_i) \right] \ln\frac{f_i}{1-f_i} \tag{17}$$

$$= \sum_{i=1}^{n} f_i \ln\frac{f_i}{1-f_i} - \sum_{i=1}^{n} (1-f_i) \ln\frac{1-f_i}{f_i} \tag{18}$$

$$= H(F/F^c) - H(F^c/F). \quad Q.E.D.$$
Fuzzy cubes map smoothly onto extended real spaces of the same dimension and vice versa. The $2^n$ infinite limits of extended real space $[-\infty, \infty]^n$ map to the $2^n$ binary corners of the fuzzy cube $I^n$. The real origin 0 maps to the cube midpoint. Each real point x maps to a unique fuzzy set F as Figure 3 shows.
A diffeomorphism $f: R^n \to I^n$ is a one-to-one and onto differentiable map f with a differentiable inverse $f^{-1}$. Different diffeomaps reveal different fuzzy structures of operations in real space. The theorem (1) follows from one of the simplest diffeomaps, the logistic map used in neural models [2, 10] to convert an unbounded real input $x_i$ to a bounded signal or fit value $f_i$:

$$f_i = \frac{1}{1 + e^{-x_i}}. \tag{20}$$

In extended real space $\bar{R}^n$ the logistic map applies to each term of the vector $x = (x_1, \ldots, x_n)$. Note that $f_i = 0$ iff $x_i = -\infty$, $f_i = 1$ iff $x_i = \infty$, and $f_i = 1/2$ iff $x_i = 0$. Each real x picks out unique dual fuzzy sets F and $F^c$ in fuzzy space.
The proof of (1) follows from the lemma (16) and from the inverse of the logistic map (20):

$$x_i = f^{-1}(f_i) = \ln\frac{f_i}{1-f_i}. \tag{21}$$

Then

$$\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \ln\frac{f_i}{1-f_i} \tag{22}$$

$$= H(F/F^c) - H(F^c/F) = \mathscr{H}(F) \tag{23}$$

in operator notation. Q.E.D.
The logistic map (20) also allows a direct proof for each term $x_i$:

$$x_i = \ln\frac{f_i}{1-f_i} \tag{24}$$

$$= \left[ f_i + (1-f_i) \right] \ln\frac{f_i}{1-f_i} \tag{25}$$

$$= f_i \ln\frac{f_i}{1-f_i} + (1-f_i) \ln\frac{f_i}{1-f_i} \tag{26}$$

$$= f_i \ln\frac{f_i}{1-f_i} - (1-f_i) \ln\frac{1-f_i}{f_i}. \tag{27}$$
6. CONCLUSIONS
REFERENCES