1. Introduction
In the present contribution we derive the Generalized Khinchin–Shannon (GKS) inequalities [1,2] associated with the entropy measures of the Sharma–Mittal (SM) set [3]. We stress that the derivations presented here are a tentative implementation of ideas from the interdisciplinary literature on Statistical Mechanics and Information Theory [4,5,6]. The algebraic structure of the escort probability distributions seems to be essential to these derivations, in contrast with the intuitive derivation of the usual Khinchin–Shannon inequalities for the Gibbs–Shannon entropy measures. We start in Section 2 with the construction of a generic probability space whose elements, the probabilities of occurrence, are arranged in blocks of m rows and n columns. This is followed by the definitions of simple, joint, conditional and marginal probabilities through the use of Bayes’ law. In Section 3, we make use of the assumption of concavity in order to unveil the synergy of the distribution of values of Gibbs–Shannon entropy measures [2]. In Section 4, we present the same development for the SM set of entropy measures, after introducing the concept of escort probabilities. We then specialize the derivations to the Havrda–Charvat, Rényi and Landsberg–Vedral entropies [7,8,9]. A detailed study is undertaken in this section of the possible orderings between the probabilities of occurrence and their associated escort probabilities. This is enough for deriving the GKS inequalities for the SM entropy measures. In Section 5, we present a proposal for an information measure associated with SM entropies and derive its related inequalities [10]. At this point we stress once more the emergence of the synergy effect in the comparison of the information obtained from the entropy calculated with joint probabilities of occurrence with that obtained from the entropies corresponding to simple probabilities. In Section 6, we present an alternative derivation of the GKS inequalities based on Hölder inequalities [11]. These provide, in association with Bayes’ law, the same concavity assumptions used in Section 3 and Section 4, and hence a derivation of the GKS inequalities identical to that of Section 4.
2. The Probability Space. Probabilities of Occurrence
We consider that the data could be represented on two-dimensional arrays of m rows and n columns. We then have blocks of data on which to undertake the statistical analysis. The joint probabilities of occurrence of a set of t variables in t chosen columns are defined in Equation (1) as relative frequencies, where m is the number of rows of the m × t subarray of the full m × n array and the numerator is the number of occurrences of the corresponding set of values in those columns. The values assumed by the variables are respectively given by either of the two forms that follow.
There are then as many objects of t columns each as there are choices of t columns out of the n available, and if the variables take on W values, we will have W^t components for each of these objects. In the study of distributions of nucleotide bases or of distributions of amino acids in proteins, the related values are W = 4 and W = 20, respectively.
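As a minimal numerical sketch of this counting (with an invented data block and our own notation, not taken from the original derivation), the joint probabilities of occurrence of a chosen set of t columns can be obtained as relative frequencies of the observed t-sequences:

```python
import numpy as np
from collections import Counter

# An invented m x n block of symbols, with W = 4 as for nucleotide bases.
rng = np.random.default_rng(0)
bases = np.array(list("ACGT"))
m, n = 1000, 8
block = bases[rng.integers(0, 4, size=(m, n))]

def joint_probabilities(block, columns):
    """Relative frequencies of the t-sequences read off the chosen columns."""
    m_rows = block.shape[0]
    counts = Counter(tuple(row) for row in block[:, columns])
    return {seq: c / m_rows for seq, c in counts.items()}

p_joint = joint_probabilities(block, columns=[1, 4, 6])     # a set of t = 3 columns
print(len(p_joint), "observed t-sequences out of", 4 ** 3, "possible ones")
print("normalization:", sum(p_joint.values()))               # sums to 1 by construction
```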
Bayes’ law for the probabilities of occurrence of Equation (1) is written as:
where the conditional factor stands for the probability of occurrence of the values associated with the variables in the chosen columns when the value associated with the variable in the jth column is given a priori. This also means that:
The corresponding marginal probabilities are then given by:
We then have, from Equations (6) and (8):
which is the same result as that of Equation (5).
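The consistency between Bayes’ law and the marginal probabilities discussed above can be illustrated with a small numerical sketch (invented joint distribution, our own variable names):

```python
import numpy as np

# An invented joint distribution over W = 4 values in each of two columns.
rng = np.random.default_rng(1)
p_joint = rng.random((4, 4))
p_joint /= p_joint.sum()

p_j = p_joint.sum(axis=1)              # marginal of the conditioning column
p_cond = p_joint / p_j[:, None]        # conditional probabilities given that column

# Bayes' law recovers the joint distribution, and each conditional
# distribution is normalized, as required.
assert np.allclose(p_cond * p_j[:, None], p_joint)
assert np.allclose(p_cond.sum(axis=1), 1.0)
print("Bayes' law and normalization checks passed.")
```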
3. The Assumption of Concavity and the Synergy of Gibbs–Shannon Entropy Measures
A concave function of several variables should satisfy the following inequality:
We shall apply Equation (10) to the Gibbs–Shannon entropies:
where Equation (12) stands for the definition of the Gibbs–Shannon entropy related to the conditional probabilities. It is a measure of the uncertainty [2] of the distribution of probabilities of the chosen columns when we have previous information on the distribution of the jth column.
From Bayes’ law, Equation (6), and from Equations (8), (11) and (12), we get:
We now use the correspondences:
and we then have:
and
After substituting Equations (12), (16) and (17) into Equation (10), we get:
or,
This means that the uncertainty of the distribution on the chosen columns cannot be increased when we have previous information on the distribution of the jth column.
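This statement can be checked numerically. In the sketch below (invented joint distribution, our own notation), the conditional Gibbs–Shannon entropy, averaged with the marginal weights of the conditioning column, never exceeds the entropy of the unconditioned marginal:

```python
import numpy as np

def shannon(p):
    """Gibbs-Shannon entropy, -sum p ln p, ignoring zero entries."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(2)
p_joint = rng.random((4, 5))
p_joint /= p_joint.sum()

p_j = p_joint.sum(axis=1)              # marginal of the conditioning column
p_k = p_joint.sum(axis=0)              # marginal of the other column
p_cond = p_joint / p_j[:, None]        # conditional distributions, one per row

# Average conditional entropy: sum_j p_j * H(conditional distribution given j).
H_cond = sum(p_j[i] * shannon(p_cond[i]) for i in range(len(p_j)))
print(H_cond <= shannon(p_k) + 1e-12)  # True: prior information cannot increase uncertainty
```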
From Equations (13) and (19), we then write:
and by iteration we get the Khinchin–Shannon inequality for the Gibbs–Shannon entropy measure:
The usual meaning given to Equation (21) is that the minimum of the information to be obtained from the analysis of the joint probabilities of a set of t columns is given by the sum of the information associated with the t columns when these are considered as independent [1,2,10]. This is also seen as an aspect of the synergy [12,13] of the distribution of probabilities of occurrence.
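A direct numerical check of the iterated inequality, in the spirit of Equation (21), is sketched below for an invented joint distribution over t = 3 columns (our own notation): the Gibbs–Shannon entropy of the joint distribution never exceeds the sum of the entropies of the one-column marginals.

```python
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(3)
p_joint = rng.random((4, 4, 4))        # t = 3 columns, W = 4 values each
p_joint /= p_joint.sum()

# One-column marginals, obtained by summing the joint distribution over the other axes.
marginals = [p_joint.sum(axis=tuple(a for a in range(3) if a != k)) for k in range(3)]

H_joint = shannon(p_joint)
H_sum = sum(shannon(pk) for pk in marginals)
print(H_joint <= H_sum + 1e-12)        # Khinchin-Shannon: holds for any joint distribution
```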
4. The Assumption of Concavity and the Synergy of Sharma–Mittal (SM) Entropy Measures. The GKS Inequalities
We shall now use the assumption of concavity given by Equation (10) on the Sharma–Mittal (SM) entropy measures:
where,
and r, s are dimensionless parameters.
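A commonly used two-parameter form of the Sharma–Mittal entropy is assumed in the numerical sketch below, purely for illustration; whether its normalization conventions coincide exactly with those of Equations (22) and (23) is an assumption.

```python
import numpy as np

def sharma_mittal(p, r, s):
    """A common Sharma-Mittal parametrization (assumed here):
    H_{r,s}(p) = [ (sum_k p_k**s)**((1-r)/(1-s)) - 1 ] / (1 - r),
    with r and s dimensionless parameters."""
    p = np.asarray(p, dtype=float)
    chi = np.sum(p ** s)                   # sum of the s-powers of the probabilities
    return (chi ** ((1.0 - r) / (1.0 - s)) - 1.0) / (1.0 - r)

p = np.array([0.5, 0.25, 0.125, 0.125])
print(sharma_mittal(p, r=0.9, s=0.7))
# Near r = s = 1 the value approaches the Gibbs-Shannon entropy -sum p ln p:
print(sharma_mittal(p, r=0.999, s=0.999), -np.sum(p * np.log(p)))
```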
Analogously to Equation (12), we also introduce the “conditional entropy measure”:
where
and the corresponding symbol stands for the escort probability:
The inverse transformations are given by:
with the corresponding normalization factor.
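The escort construction and its inverse can be sketched as follows, assuming the standard definition in which the escort probabilities are the normalized s-powers of the probabilities of occurrence (our notation):

```python
import numpy as np

def escort(p, s):
    """Escort probabilities of order s: normalized s-powers of p (assumed definition)."""
    w = np.asarray(p, dtype=float) ** s
    return w / w.sum()

def escort_inverse(P, s):
    """Inverse transformation: normalized (1/s)-powers of the escort probabilities."""
    w = np.asarray(P, dtype=float) ** (1.0 / s)
    return w / w.sum()

p = np.array([0.5, 0.25, 0.125, 0.125])
P = escort(p, s=2.0)
print(P)                                         # the escort distribution of order 2
print(np.allclose(escort_inverse(P, s=2.0), p))  # True: the transformation is invertible
```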
A range of variation for the parameters r, s of the Sharma–Mittal entropies, Equation (22), should be derived from a requirement of strict concavity. In order to do so, let us remember that for each set of t columns (an m × t subarray) there are m rows of t values each (t-sequences). We now denote these t-sequences by:
A sufficient requirement for strict concavity is the negative definiteness of the quadratic form associated with the Hessian matrix [14], whose elements are given by:
We then consider the m submatrices along the diagonal of the Hessian matrix. Their determinants should be alternately negative or positive according to whether their order is odd or even [15], respectively:
We then choose:
and we have from Equations (34)–(36):
This completes the proof.
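For given values of r and s, the concavity requirement can be probed numerically. The sketch below (an illustration, not a proof, and based on the parametrization assumed in the earlier sketch) evaluates the Hessian of the entropy restricted to the probability simplex by finite differences and tests the alternating-sign pattern of the leading principal minors:

```python
import numpy as np

def sharma_mittal(p, r, s):
    chi = np.sum(np.asarray(p, dtype=float) ** s)
    return (chi ** ((1.0 - r) / (1.0 - s)) - 1.0) / (1.0 - r)

def hessian_on_simplex(p, r, s, h=1e-4):
    """Finite-difference Hessian in the free coordinates p_1, ..., p_{W-1},
    with p_W = 1 - (sum of the others); an illustrative probe only."""
    x0 = np.asarray(p[:-1], dtype=float)
    f = lambda x: sharma_mittal(np.append(x, 1.0 - x.sum()), r, s)
    d = len(x0)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * h, np.eye(d)[j] * h
            H[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                       - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4.0 * h * h)
    return H

p = np.array([0.4, 0.3, 0.2, 0.1])
H = hessian_on_simplex(p, r=0.9, s=0.7)
minors = [np.linalg.det(H[:k, :k]) for k in range(1, H.shape[0] + 1)]
# Strict concavity at this point requires odd-order minors to be negative
# and even-order minors to be positive; this holds for the values chosen here.
print(all((m < 0) if k % 2 else (m > 0) for k, m in enumerate(minors, start=1)))
```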
From Bayes’ law, Equation (6), and from Equations (8) and (22)–(25), we can write:
We are now ready to use the concavity assumption, Equation (10), for deriving the GKS inequalities. In order to do so, we make the correspondences:
With the correspondences above, Equation (10) turns into:
An additional piece of information should be taken into consideration before we derive the GKS inequalities:
On each column of a block, there will be values of the associated variable such that:
and
After multiplying inequalities (48) and (49) by the corresponding factors and summing over the respective sets of values, we get:
and
From Equations (48) and (49), any sum over these values can be partitioned into sums over the two sets of values just introduced:
Substituting Equation (52) into Equations (50) and (51), we have:
and
respectively.
After applying Bayes’ law, Equation (6), to the first term on the left-hand side of Equations (53) and (54), we get:
where
and the corresponding bounds hold according to Equations (48) and (49), respectively.
After taking the s-power in Equations (55) and (56) and summing over the remaining index, we have:
We now write the concavity assumption, Equation (10), as:
where
and
Equations (59) and (60) are now written as:
where
and
Given the bounds satisfied by these quantities, we have trivially that:
and
The set of inequalities given by Equations (61), (64) and (69), or
and the set of inequalities given by Equations (61), (65) and (70), or
can be arranged as the chains of inequalities
and
respectively.
The inequality common to the two chains above can be written as:
From the definition of the escort probabilities, Equations (26) and (27), we can write the right-hand side of Equation (75) as:
From Equations (75) and (76) and the definition of the symbols of Equation (23), we have:
We then get, by iteration:
Equation (78) corresponds to the Generalized Khinchin–Shannon (GKS) inequalities derived here for the Sharma–Mittal entropies.
From Equations (22) and (37) we can also write the GKS inequalities as:
The same remarks made after Equation (21) also apply here to the Sharma–Mittal entropy measures as far as the aspect of synergy is concerned. We will introduce a proposal for an information measure to stress this aspect in the next section.
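Under the parametrization assumed in the earlier sketches, the combination 1 + (1 - r) H_{r,s} is multiplicative over independent columns, which is the equality case behind the interpretation recalled above: for independent columns, the joint entropy is completely determined by the single-column entropies. The sketch below verifies this factorization numerically; it illustrates the assumed parametrization only and is not a substitute for the conditions and the precise statement of Equations (37) and (78)–(79).

```python
import numpy as np

def sharma_mittal(p, r, s):
    chi = np.sum(np.asarray(p, dtype=float) ** s)
    return (chi ** ((1.0 - r) / (1.0 - s)) - 1.0) / (1.0 - r)

r, s = 0.9, 0.7
p1 = np.array([0.5, 0.3, 0.2])                 # single-column distribution
p2 = np.array([0.6, 0.25, 0.1, 0.05])          # another single-column distribution

# Joint distribution of two independent columns: the outer product of the marginals.
p_indep = np.outer(p1, p2).ravel()

lhs = 1.0 + (1.0 - r) * sharma_mittal(p_indep, r, s)
rhs = ((1.0 + (1.0 - r) * sharma_mittal(p1, r, s))
       * (1.0 + (1.0 - r) * sharma_mittal(p2, r, s)))
print(np.isclose(lhs, rhs))   # True: equality holds when the columns are independent
```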
In the appropriate limit of the parameters, we can write from Equation (79):
The Havrda–Charvat, Rényi and Landsberg–Vedral entropies are easily obtained by taking the convenient limits in Equation (24):
The Gibbs–Shannon entropy measure, Equation (11), is included in all these entropies through:
Equations (83) and (85) have been derived via l’Hôpital’s rule.
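These limiting cases can be checked numerically with the parametrization assumed in the earlier sketches (the identification of the limits with the parameter values used below is a property of that parametrization and is stated here as an assumption): r -> s gives the Havrda–Charvat form, r -> 1 the Rényi form, r -> 2 - s the Landsberg–Vedral form, and s -> 1 the Gibbs–Shannon entropy.

```python
import numpy as np

def sharma_mittal(p, r, s):
    chi = np.sum(np.asarray(p, dtype=float) ** s)
    return (chi ** ((1.0 - r) / (1.0 - s)) - 1.0) / (1.0 - r)

p = np.array([0.5, 0.25, 0.125, 0.125])
s = 0.8
chi = np.sum(p ** s)

# r -> s : Havrda-Charvat (Tsallis) entropy.
print(np.isclose(sharma_mittal(p, s + 1e-9, s), (chi - 1.0) / (1.0 - s)))
# r -> 1 : Renyi entropy (the l'Hopital limit).
print(np.isclose(sharma_mittal(p, 1.0 + 1e-9, s), np.log(chi) / (1.0 - s)))
# r -> 2 - s : Landsberg-Vedral entropy.
print(np.isclose(sharma_mittal(p, 2.0 - s + 1e-9, s), (1.0 / chi - 1.0) / (s - 1.0)))
# s -> 1 (together with r -> 1): Gibbs-Shannon entropy.
print(np.isclose(sharma_mittal(p, 1.0 - 1e-9, 1.0 - 1e-9), -np.sum(p * np.log(p))))
```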
In the corresponding limits, we write from Equations (82)–(84):
As a last result of this section, we note that Equation (79) could also be derived from Equation (75), since the latter can also be written as:
where we have used Equations (22)–(25).
After comparing Equations (42) and (92), we get the result of Equation (79) again.
5. An Information Measure Proposal Associated with Sharma–Mittal Entropy Measures
We are looking for a proposal of an information measure which allows a clear interpretation of the emergence of synergy in a probability distribution and which is supported by the usual idea of entropy as a measure of uncertainty.
For the Sharma–Mittal set of entropy measures, the proposal for the associated information measure would be:
where the quantities involved are given by Equations (22) and (23). We then have, from Equation (93):
From the GKS inequalities, Equation (79), and from Equation (94), we get:
The meaning of Equation (95) is that the minimum of the information associated with t columns of probabilities of occurrence is given by the sum of the information associated with each column. This corresponds to the expression of the synergy of the distribution of probabilities of occurrence which we have derived in the previous section.
The inequalities (95), in the corresponding limits of the parameters, are written as:
It seems worthwhile to derive yet another result which unveils once more the fundamental aspect of the synergy of the distribution of probabilities of occurrence. From Equation (93), we have:
and we then write, from the GKS inequalities, Equation (78):
Equation (100) corresponds to another result which originates from the synergy of the distribution of probabilities of occurrence. It can be stated as follows: the minimum of the rate of information increase with decreasing entropy in the probability distributions of sets of t columns is given by the product of the rates of information increase pertaining to each of the t columns.