1. Introduction
The following notation will be used in this paper: Real scalar variables, whether mathematical variables or random variables, will be denoted by lower-case letters, such as $x,y$, etc.; real vector/matrix variables, mathematical and random, will be denoted by capital letters, such as $X,Y$, etc. Complex variables will be written with a tilde, such as $\tilde{x},\tilde{X}$, etc. Scalar constants will be denoted by $a,b$, etc., and vector/matrix constants by $A,B$, etc. No tilde will be used on constants. If $X$ is a square matrix, then its determinant will be denoted by $|X|$ or $\det(X)$, whether the elements of $X$ are real or complex. The transpose of $A$ is written as $A'$ and the complex conjugate transpose as $A^{*}$. The absolute value of the determinant will be written as $|\det(X)|$. For example, if $\det(X)=a+ib$, where $a$ and $b$ are real scalars and $i=\sqrt{-1}$, then the absolute value is $|\det(X)|=+(a^{2}+b^{2})^{\frac{1}{2}}$. If $X=(x_{ij})$ is a real $p\times q$ matrix, then the wedge product of the differentials is written as $\mathrm{d}X=\wedge_{i=1}^{p}\wedge_{j=1}^{q}\mathrm{d}x_{ij}$, where, for two real scalar variables $x$ and $y$ with differentials $\mathrm{d}x$ and $\mathrm{d}y$, the wedge product is defined as $\mathrm{d}x\wedge\mathrm{d}y=-\mathrm{d}y\wedge\mathrm{d}x$ so that $\mathrm{d}x\wedge\mathrm{d}x=0$. If $\tilde{X}$ in the complex domain is a $p\times q$ matrix, then we can write $\tilde{X}=X_{1}+iX_{2}$, where $X_{1}$ and $X_{2}$ are real; then, we define $\mathrm{d}\tilde{X}=\mathrm{d}X_{1}\wedge\mathrm{d}X_{2}$. If $f(X)$ is a real-valued scalar function of $X$, where $X$ may be a scalar real variable $x$, a scalar complex variable $\tilde{x}$, a vector/matrix real variable $X$, or a vector/matrix complex variable $\tilde{X}$, such that $f(X)\ge 0$ for all $X$ and $\int_{X}f(X)\,\mathrm{d}X=1$, then $f(X)$ will be called a statistical density.
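For instance, for the one-to-one transformation $u=x+y$, $v=x-y$, we have $\mathrm{d}u=\mathrm{d}x+\mathrm{d}y$ and $\mathrm{d}v=\mathrm{d}x-\mathrm{d}y$, so that
$$\mathrm{d}u\wedge\mathrm{d}v=(\mathrm{d}x+\mathrm{d}y)\wedge(\mathrm{d}x-\mathrm{d}y)=-\mathrm{d}x\wedge\mathrm{d}y+\mathrm{d}y\wedge\mathrm{d}x=-2\,\mathrm{d}x\wedge\mathrm{d}y,$$
by using $\mathrm{d}x\wedge\mathrm{d}x=0=\mathrm{d}y\wedge\mathrm{d}y$; the coefficient $-2$ is the determinant of the Jacobian matrix of the transformation, which illustrates how Jacobians enter automatically through wedge products.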
In many disciplines, especially in physics, communication theory, engineering, and statistics, one popular method of deriving statistical distributions is the optimization of an appropriate measure of entropy under appropriate constraints. For a real scalar random variable $x$, [1] introduced a measure of entropy or a measure of uncertainty:
$$S(f)=-c\int_{-\infty}^{\infty}f(x)\ln f(x)\,\mathrm{d}x,\tag{1}$$
where $c$ is a constant. The corresponding measure for the discrete case is $S=-c\sum_{i=1}^{k}p_{i}\ln p_{i}$ for $p_{i}>0$, $i=1,\ldots,k$, or $P=(p_{1},\ldots,p_{k})$, $p_{1}+\cdots+p_{k}=1$, which is a discrete probability law. By optimizing $S(f)$, several authors have derived exponential, Gaussian, and other distributions under the constraints in terms of moments of $x$, such as $E[x]$ fixed over all functional $f$, meaning that the first moment is given, where $E[\cdot]$ indicates the expected value of $[\cdot]$. This constraint will produce an exponential density. If $E[x]$ and $E[x^{2}]$ are fixed, meaning that the first two moments are fixed, then one has a Gaussian density, etc. The basic entropy measure in (1) has been generalized by various authors. One such generalized entropy is the Havrda–Charvát entropy [2] for the real scalar variable $x$, which is given by
$$H_{\alpha}(f)=\frac{\int_{-\infty}^{\infty}[f(x)]^{\alpha}\,\mathrm{d}x-1}{2^{1-\alpha}-1},\quad \alpha\neq 1,\tag{2}$$
where $f(x)$ is a density. The original $H_{\alpha}$ is for the discrete case, and the corresponding continuous case is given in (2). Various properties, characterizations, and applications of the Shannon entropy and various $\alpha$-generalized entropies were discussed by [3]. A modified version of (2) was introduced by Tsallis [4], and it is known in the literature as Tsallis’ entropy, which is the following:
$$T_{q}(f)=\frac{\int_{-\infty}^{\infty}[f(x)]^{q}\,\mathrm{d}x-1}{1-q},\quad q\neq 1.\tag{3}$$
Observe that when $\alpha\to 1$ in (2) and $q\to 1$ in (3), both of these generalized entropies in the real scalar case reduce to the Shannon entropy of (1). Tsallis developed the whole area of non-extensive statistical mechanics by deriving Tsallis’ statistics by optimizing (3) under the constraint that the first moment is fixed in an escort density, $[f(x)]^{q}/\int_{-\infty}^{\infty}[f(x)]^{q}\,\mathrm{d}x$. Hundreds of papers have been published on Tsallis’ statistics.
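As a quick numerical illustration of these limits (our sketch, with the unit exponential density as an arbitrary test case), (3) tends to (1) with $c=1$, while (2) tends to (1) with $c=1/\ln 2$ because of the $2^{1-\alpha}-1$ normalization:

```python
import numpy as np
from scipy.integrate import quad

# Illustrative test density (our choice): the unit exponential f(x) = exp(-x), x > 0,
# whose Shannon entropy (1) with c = 1 equals 1.
f = lambda x: np.exp(-x)

shannon = quad(lambda x: -f(x) * np.log(f(x)), 0, np.inf)[0]

def havrda_charvat(alpha):                     # Equation (2)
    return (quad(lambda x: f(x)**alpha, 0, np.inf)[0] - 1) / (2**(1 - alpha) - 1)

def tsallis(q):                                # Equation (3)
    return (quad(lambda x: f(x)**q, 0, np.inf)[0] - 1) / (1 - q)

print(shannon)                  # 1.0
print(tsallis(1.001))           # ~1.0, i.e., (3) -> (1) with c = 1
print(havrda_charvat(1.001))    # ~1/ln 2 = 1.4427, i.e., (2) -> (1) with c = 1/ln 2
```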
In early 2000, the second author introduced a generalized entropy of the following form:
$$M_{\alpha}(f)=\frac{\int_{X}[f(X)]^{1+\frac{a-\alpha}{\eta}}\,\mathrm{d}X-1}{\frac{\alpha-a}{\eta}},\quad \alpha\neq a,\ \eta>0,\tag{4}$$
where $f(X)$ is a statistical density and the integral is over the domain of $X$, where $X$ may be a real scalar $x$, a complex scalar $\tilde{x}$, a real vector/matrix $X$, or a complex vector/matrix $\tilde{X}$; $a$ is a fixed real scalar anchoring point, $\alpha$ is a real scalar parameter, and $\eta$ is a real scalar constant so that the deviation of $\alpha$ from $a$ is measured in $\eta$ units. In the real scalar case, we can see that when $\alpha\to a$, then (4) goes to the Shannon entropy in (1). Therefore, for vector/matrix variables in the real and complex domains, one has a generalization of the Shannon entropy in (4). If (3) is optimized under the constraint that the first moment $E[x]$ in $f(x)$ is fixed, then it does not lead directly to Tsallis’ statistics. One must optimize (3) in the escort density mentioned above under the restriction that the first moment in the escort density is fixed; then, one obtains Tsallis’ statistics. If (4) is used, then one can derive various real and complex scalar, vector, or matrix-variate distributions directly from $f(X)$ by imposing moment-like restrictions on $f(X)$. A particular case of (4) for $a=1$, $\eta=1$, introduced by the second author, was applied by [5] in time-series analysis, fractional calculus, and other areas. The researchers in [6] used a particular case of (4) in record values and ordered random variables and derived some properties, including characterization theorems. The authors of [7] discussed the analytical properties of the classical Mittag–Leffler function as derived as the solution of the simplest fractional differential equation governing relaxation processes. The authors of [8] studied the complexity of the ultraslow diffusion process using both the classical Shannon entropy and its general case with the inverse Mittag–Leffler function in conjunction with the structural derivative.
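A numerical sketch similar to the one above (again with the unit exponential as the test density, and with the particular case $a=1$, $\eta=1$ mentioned in the text) illustrates that (4) tends to the Shannon value as $\alpha\to a$:

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x)     # illustrative test density (our choice), Shannon value 1
a, eta = 1.0, 1.0            # the particular case a = 1, eta = 1

def mathai(alpha):           # Equation (4) for a real scalar x
    eps = (alpha - a) / eta
    return (quad(lambda x: f(x)**(1 - eps), 0, np.inf)[0] - 1) / eps

for alpha in (1.5, 1.1, 1.01, 1.001):
    print(alpha, mathai(alpha))   # tends to the Shannon value 1.0 as alpha -> a
```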
In the present article, the term “entropy” is used as a mathematical measure of uncertainty or information characterized by some basic axioms, as illustrated by [3]. Thus, it is a functional resulting from a set of axioms, that is, a function that can be interpreted in terms of a statistical density in the continuous case and in terms of multinomial probabilities in the discrete case. A general discussion of “entropy” is not attempted here because, as per Von Neumann, “whoever uses the term ‘entropy’ in a discussion always wins since no one knows what entropy really is, so in a debate, one always has the advantage”. An overview of the various entropic functional forms used so far in the literature is available from [9], along with their historical backgrounds and an account of the numbers of citations of these various functional forms. Hence, no detailed discussion of the various entropic functional forms is attempted in the present paper. The concept of entropy is applied in general physics, information theory, chaos theory, time series, computer science, data mining, statistics, engineering, mathematical linguistics, stochastic processes, etc. An account of the entropic universe was given by [10], along with answers to the following questions: how the different concepts of entropy arose, what the mathematical definition of each entropy is, how the entropies are related to each other, which entropy is appropriate in which areas of application, and what their impacts on the scientific community are. Hence, the present article does not attempt to repeat the answers to these questions. The present paper is about one entropy measure on a real scalar variable, its generalizations to vector/matrix variables in the real and complex domains, and an illustration of how this entropy can be optimized under various constraints to derive various statistical densities in the scalar, vector, and matrix variables in the real and complex domains. Because the entropy measure to be considered in the present article does not contain derivatives, the method of the calculus of variation is used for optimization so that the resulting Euler equations are simple. Mathematical variables and random variables are treated in the same way so that double notations for random variables are avoided. In order to avoid having too many symbols and the resulting confusion, scalar variables are denoted by lower-case letters and vector/matrix variables are denoted by capital letters so that the presentation is concise, consistent, and reader-friendly.
Entropy as an Expected Value
The Shannon entropy $S(f)$ can be looked upon as an expected value of $-c\ln f(X)$, that is, $S(f)=E[-c\ln f(X)]$. In Mathai’s entropy (4), one can write the numerator as $\int_{X}f(X)\{[f(X)]^{\frac{a-\alpha}{\eta}}-1\}\,\mathrm{d}X$, which is the expected value of $[f(X)]^{\frac{a-\alpha}{\eta}}-1$. Then, (4) is the following expected value:
$$M_{\alpha}(f)=E\left[\frac{[f(X)]^{\frac{a-\alpha}{\eta}}-1}{\frac{\alpha-a}{\eta}}\right].\tag{5}$$
The quantity inside the expected value operator goes to $-\ln f(X)$ when $\alpha\to a$, which is the same as the Shannon case for $c=1$. Therefore, the quantity inside the expectation operator is an approximation to $-\ln f(X)$.
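In detail, writing $\varepsilon=\frac{\alpha-a}{\eta}$, one has
$$\frac{[f(X)]^{\frac{a-\alpha}{\eta}}-1}{\frac{\alpha-a}{\eta}}=\frac{e^{-\varepsilon\ln f(X)}-1}{\varepsilon}=-\ln f(X)+\frac{\varepsilon}{2}[\ln f(X)]^{2}-\cdots,$$
so the quantity inside the expectation in (5) tends to $-\ln f(X)$ at the rate $O(\varepsilon)$ as $\alpha\to a$.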
2. Optimization of Mathai’s Entropy for the Real Scalar Case
Let $x$ be a real scalar variable and let $f(x)$ be a density function, that is, $f(x)\ge 0$ for all $x$ and $\int_{-\infty}^{\infty}f(x)\,\mathrm{d}x=1$. Consider the optimization of (4) under the following moment-like constraints:
$$\int_{x}x^{\gamma\frac{(a-\alpha)}{\eta}}f(x)\,\mathrm{d}x=\text{fixed}\quad\text{and}\quad\int_{x}x^{\gamma\frac{(a-\alpha)}{\eta}+\delta}f(x)\,\mathrm{d}x=\text{fixed}$$
over all possible densities $f$. Then, if we use the calculus of variation for the optimization of (4), the Euler equation is the following:
$$\frac{\partial}{\partial f}\left[f^{1+\frac{a-\alpha}{\eta}}-\lambda_{1}x^{\gamma\frac{(a-\alpha)}{\eta}}f+\lambda_{2}x^{\gamma\frac{(a-\alpha)}{\eta}+\delta}f\right]=0\ \Rightarrow\ f_{1}(x)=c_{1}\,x^{\gamma}\left[1-b\frac{(a-\alpha)}{\eta}x^{\delta}\right]^{\frac{\eta}{a-\alpha}},\tag{6}$$
where $\lambda_{1}$ and $\lambda_{2}$ are Lagrangian multipliers and $\lambda_{2}/\lambda_{1}$ is taken as $b\frac{(a-\alpha)}{\eta}$ for convenience for $b>0$; $a$ is a fixed real scalar constant, $\alpha<a$, and $c_{1}$ is the normalizing constant. For $\alpha>a$, $f_{1}$ changes into
$$f_{2}(x)=c_{2}\,x^{\gamma}\left[1+b\frac{(\alpha-a)}{\eta}x^{\delta}\right]^{-\frac{\eta}{\alpha-a}}$$
for $\alpha>a$, $b>0$. When $\alpha\to a$, both $f_{1}$ and $f_{2}$ go to
$$f_{3}(x)=c_{3}\,x^{\gamma}e^{-bx^{\delta}}$$
for $b>0$, $\delta>0$.
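As a consistency check (a symbolic sketch, written for the $\alpha>a$ branch with $\varepsilon=(\alpha-a)/\eta>0$; the multipliers $\lambda_{1},\lambda_{2}$ below are the values implied by (6)), one can verify that the pathway form satisfies the Euler equation:

```python
import sympy as sp

x, c2, b, g, d, eps = sp.symbols('x c2 b gamma delta epsilon', positive=True)

# Candidate solution, written with eps = (alpha - a)/eta, so (a - alpha)/eta = -eps:
# f2 = c2 x^gamma [1 + b eps x^delta]^(-1/eps)
f2 = c2 * x**g * (1 + b * eps * x**d)**(-1 / eps)

lam1 = (1 - eps) * c2**(-eps)     # Lagrangian multiplier lambda_1 implied by (6)
lam2 = -b * eps * lam1            # lambda_2/lambda_1 = b(a - alpha)/eta = -b*eps

# Left side of the Euler equation:
# (1 - eps) f^(-eps) - lam1 x^(gamma(a-alpha)/eta) + lam2 x^(gamma(a-alpha)/eta + delta)
euler = (1 - eps) * f2**(-eps) - lam1 * x**(-g * eps) + lam2 * x**(-g * eps + d)
print(sp.simplify(sp.powsimp(euler, force=True)))   # prints 0
```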
Observe that all three functions $f_{1},f_{2},f_{3}$ can be reached through the pathway parameter $\alpha$. From $f_{1}$, one can go to $f_{2}$ and $f_{3}$. Similarly, from $f_{2}$, one can obtain $f_{1}$ and $f_{3}$. Hence, $f_{1}$ or $f_{2}$ is Mathai’s pathway model for the real scalar positive variable $x$ as a mathematical model or as a statistical model. The model $f_{1}$ is a generalized type-1 beta model, $f_{2}$ is a generalized type-2 beta model, and $f_{3}$ is a generalized gamma model. For $\gamma=0$ and $\delta=2$, $f_{3}$ is a real scalar Gaussian model. For $\gamma=2$ and $\delta=2$, $f_{3}$ is a Maxwell–Boltzmann density for $x\ge 0$, and for $\gamma=1$ and $\delta=2$, $f_{3}$ is the Rayleigh density for the real scalar positive variable case. If a location parameter is desired, then $x$ is replaced by $x-m$ in all of the above models, where $m$ is the relocation parameter. For $\gamma=0$, $\delta=1$, $a=1$, and $\eta=1$, $f_{1}$ is Tsallis’ statistics of non-extensive statistical mechanics; see Tsallis (1988) [4]. Hundreds of articles have been published on Tsallis’ statistics. For $\gamma=0$ and $\delta=1$, $f_{2}$ and $f_{3}$, but not $f_{1}$, provide the superstatistics of statistical mechanics. Several articles have been published on superstatistics.
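The pathway transition can also be seen numerically (an illustrative sketch with our parameter choices $b=1$, $\gamma=2$, $\delta=2$, $a=1$, $\eta=1$, the Maxwell–Boltzmann-type case, and with the normalizing constants computed by quadrature rather than in closed form):

```python
import math
import numpy as np
from scipy.integrate import quad

b, g, d, a, eta = 1.0, 2.0, 2.0, 1.0, 1.0   # illustrative choices (ours)

def kernel(x, alpha):
    # x^g [1 - b((a - alpha)/eta) x^d]^(eta/(a - alpha)); the f3 kernel in the limit alpha = a
    eps = (alpha - a) / eta
    if eps == 0.0:
        return x**g * math.exp(-b * x**d)
    base = 1.0 + b * eps * x**d
    return x**g * base**(-1.0 / eps) if base > 0.0 else 0.0

def density(alpha):
    # the support ends where the bracket vanishes when alpha < a (the type-1 beta case)
    upper = (eta / (b * (a - alpha)))**(1.0 / d) if alpha < a else np.inf
    c = 1.0 / quad(lambda x: kernel(x, alpha), 0.0, upper)[0]   # numerical normalizing constant
    return lambda x: c * kernel(x, alpha)

for alpha in (0.5, 1.5, 1.01, 1.0):
    print(alpha, density(alpha)(1.3))   # f1 (alpha < a) and f2 (alpha > a) approach f3 as alpha -> a
```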
Fermi–Dirac and Bose–Einstein densities are also available from the same procedure. In this case, the second factor $x^{\gamma\frac{(a-\alpha)}{\eta}+\delta}$ in the constraint is replaced by $e^{x}$, and the Lagrangian multipliers are taken as $\lambda_{1}$ and $\lambda_{2}=\mp b\lambda_{1}$ so that the second factor in Equation (6) becomes $[1\pm be^{x}]^{\frac{\eta}{a-\alpha}}$ for $b>0$, with the normalizing constant chosen to create a density function. Now, take $\gamma=0$ and $\frac{\eta}{a-\alpha}=-1$. Then, for the plus sign and for some constant $d$ with $b=e^{d}$, this gives the Fermi–Dirac density $f(x)=c\,[e^{x+d}+1]^{-1}$, $x\ge 0$, and for the minus sign and $b=e^{d}$, $d>0$, this gives the Bose–Einstein density $f(x)=c\,[e^{x+d}-1]^{-1}$, $x>0$.
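As a quick check of the normalizing constants of these two densities (our sketch; $d=0.5$ is an arbitrary illustrative choice), the standard integrals $\int_{0}^{\infty}[e^{x+d}+1]^{-1}\,\mathrm{d}x=\ln(1+e^{-d})$ and $\int_{0}^{\infty}[e^{x+d}-1]^{-1}\,\mathrm{d}x=-\ln(1-e^{-d})$, $d>0$, can be verified numerically:

```python
import numpy as np
from scipy.integrate import quad

d = 0.5  # illustrative constant

fd = quad(lambda x: 1.0 / (np.exp(x + d) + 1.0), 0, np.inf)[0]
be = quad(lambda x: 1.0 / (np.exp(x + d) - 1.0), 0, np.inf)[0]

print(fd, np.log(1 + np.exp(-d)))    # Fermi-Dirac: matches ln(1 + e^{-d})
print(be, -np.log(1 - np.exp(-d)))   # Bose-Einstein: matches -ln(1 - e^{-d})
```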
In model-building situations, if $f_{3}$ is the generalized gamma model, the Maxwell–Boltzmann model ($\gamma=2$, $\delta=2$), the Rayleigh model ($\gamma=1$, $\delta=2$), or the Gaussian model ($\gamma=0$, $\delta=2$) and $f_{3}$ is the stable or ideal situation in a physical system, then $f_{1}$ and $f_{2}$ provide the unstable or chaotic neighborhoods, and through the pathway parameter $\alpha$, one can model the stable situation, the unstable neighborhoods, and the transitional stages in a data analysis situation. This is the pathway idea of Mathai.
4. Real Matrix-Variate Case
Let $X=(x_{ij})$ be a real $p\times q$, $q\ge p$, matrix of rank $p$ with distinct real scalar variables $x_{ij}$ as elements. Let $A$ be a $p\times p$ constant positive definite matrix and let $B$ be a $q\times q$ constant positive definite matrix. Let $u=\mathrm{tr}(AXBX')$. This $u$ is an important quantity in the statistical literature. Hence, we will impose restrictions in terms of moments of $u$. Consider the optimization of Mathai’s entropy in (4) over all densities $f(X)$, where $X$ is a $p\times q$ matrix, as defined above, subject to the constraints
$$\int_{X}u^{\gamma\frac{(a-\alpha)}{\eta}}f(X)\,\mathrm{d}X=\text{fixed}\quad\text{and}\quad\int_{X}u^{\gamma\frac{(a-\alpha)}{\eta}+\delta}f(X)\,\mathrm{d}X=\text{fixed}$$
over all possible densities $f(X)$. Then, proceeding as in Section 3, we end up with the following densities, where we use the same notations of $c_{1},c_{2},c_{3}$ in order to avoid having too many symbols: For $\alpha<a$,
$$f_{1}(X)=c_{1}[\mathrm{tr}(AXBX')]^{\gamma}\left[1-b\frac{(a-\alpha)}{\eta}[\mathrm{tr}(AXBX')]^{\delta}\right]^{\frac{\eta}{a-\alpha}},\quad b>0,\ 1-b\frac{(a-\alpha)}{\eta}[\mathrm{tr}(AXBX')]^{\delta}>0.\tag{20}$$
For $\alpha>a$,
$$f_{2}(X)=c_{2}[\mathrm{tr}(AXBX')]^{\gamma}\left[1+b\frac{(\alpha-a)}{\eta}[\mathrm{tr}(AXBX')]^{\delta}\right]^{-\frac{\eta}{\alpha-a}},\quad b>0,\tag{21}$$
and for $\alpha\to a$,
$$f_{3}(X)=c_{3}[\mathrm{tr}(AXBX')]^{\gamma}e^{-b[\mathrm{tr}(AXBX')]^{\delta}},\quad b>0,\ \delta>0.\tag{22}$$
For evaluating the normalizing constants, we use the following transformations: $Y=A^{\frac{1}{2}}XB^{\frac{1}{2}}\Rightarrow \mathrm{d}Y=|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\mathrm{d}X$, $u=\mathrm{tr}(AXBX')=\mathrm{tr}(YY')=$ the sum of squares of all the $pq$ elements in $Y$, and, hence, $u=Z'Z$, where $Z$ is a $pq\times 1$ vector. Then, from Lemma 2,
$$\int_{Z}g(Z'Z)\,\mathrm{d}Z=\frac{\pi^{\frac{pq}{2}}}{\Gamma(\frac{pq}{2})}\int_{0}^{\infty}s^{\frac{pq}{2}-1}g(s)\,\mathrm{d}s,\quad s=Z'Z,$$
for an arbitrary real-valued function $g$ for which the integrals exist.
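For instance, in the simplest case $pq=2$, this reduction can be checked directly with polar coordinates:
$$\int_{\mathbb{R}^{2}}g(z_{1}^{2}+z_{2}^{2})\,\mathrm{d}z_{1}\wedge\mathrm{d}z_{2}=\int_{0}^{2\pi}\int_{0}^{\infty}g(r^{2})\,r\,\mathrm{d}r\,\mathrm{d}\theta=\pi\int_{0}^{\infty}g(s)\,\mathrm{d}s,\quad s=r^{2},$$
which agrees with the factor $\pi^{pq/2}/\Gamma(pq/2)=\pi/\Gamma(1)=\pi$.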
Then, for $\alpha<a$, we evaluate the $s$-integral by using a real scalar type-1 beta integral; for $\alpha>a$, we evaluate the $s$-integral by using a real scalar type-2 beta integral; for $\alpha\to a$, the $s$-integral is evaluated by using a real scalar gamma integral. Then, the normalizing constants are the following:
$$c_{1}=|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\,\frac{\Gamma(\frac{pq}{2})}{\pi^{\frac{pq}{2}}}\,\frac{\delta\left[b\frac{(a-\alpha)}{\eta}\right]^{\frac{\gamma+\frac{pq}{2}}{\delta}}\Gamma\left(\frac{\gamma+\frac{pq}{2}}{\delta}+\frac{\eta}{a-\alpha}+1\right)}{\Gamma\left(\frac{\gamma+\frac{pq}{2}}{\delta}\right)\Gamma\left(\frac{\eta}{a-\alpha}+1\right)},\tag{23}$$
$$c_{2}=|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\,\frac{\Gamma(\frac{pq}{2})}{\pi^{\frac{pq}{2}}}\,\frac{\delta\left[b\frac{(\alpha-a)}{\eta}\right]^{\frac{\gamma+\frac{pq}{2}}{\delta}}\Gamma\left(\frac{\eta}{\alpha-a}\right)}{\Gamma\left(\frac{\gamma+\frac{pq}{2}}{\delta}\right)\Gamma\left(\frac{\eta}{\alpha-a}-\frac{\gamma+\frac{pq}{2}}{\delta}\right)},\tag{24}$$
$$c_{3}=|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\,\frac{\Gamma(\frac{pq}{2})}{\pi^{\frac{pq}{2}}}\,\frac{\delta\,b^{\frac{\gamma+\frac{pq}{2}}{\delta}}}{\Gamma\left(\frac{\gamma+\frac{pq}{2}}{\delta}\right)},\tag{25}$$
where, in (23), the conditions are $\alpha<a$, $b>0$, $\delta>0$, $\gamma+\frac{pq}{2}>0$; in (24), the conditions are $\alpha>a$, $b>0$, $\delta>0$, $\gamma+\frac{pq}{2}>0$, $\frac{\eta}{\alpha-a}-\frac{\gamma+\frac{pq}{2}}{\delta}>0$; in (25), the conditions are $b>0$, $\delta>0$, $\gamma+\frac{pq}{2}>0$.
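As a sanity check on (25) (a Monte Carlo sketch under the illustrative choices $p=2$, $q=3$, $\gamma=1$, $\delta=1$, $b=1$, and $A=I_{p}$, $B=I_{q}$, so that $|A|=|B|=1$):

```python
import numpy as np
from scipy.special import gamma as G

p, q = 2, 3                          # illustrative dimensions
n = p * q
b, gam, delt = 1.0, 1.0, 1.0         # illustrative parameters; A = I_p, B = I_q
rho = (gam + n / 2) / delt

c3 = delt * b**rho * G(n / 2) / (np.pi**(n / 2) * G(rho))   # Equation (25) with |A| = |B| = 1

# Monte Carlo integral of f3 over the p x q real matrices, with a standard normal proposal
rng = np.random.default_rng(1)
X = rng.standard_normal((500_000, n))
u = (X**2).sum(axis=1)                                      # tr(AXBX') = tr(XX') here
f3 = c3 * u**gam * np.exp(-b * u**delt)
phi = (2 * np.pi)**(-n / 2) * np.exp(-u / 2)                # proposal density
print(np.mean(f3 / phi))                                    # ~= 1.0 if (25) normalizes f3
```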
Observe that (21) and (22) are available from (20). Similarly, (20) and (22) are available from (21). In other words, all of the densities in (20)–(22) are available through the pathway parameter $\alpha$. Note that, for suitable choices of $\gamma$ and $\delta$, (22) can be taken as a multivariate version of the Maxwell–Boltzmann density coming from a rectangular matrix-variate real random variable. Similarly, for other choices of $\gamma$ and $\delta$, one can take (22) as a version of the multivariate real Rayleigh density coming from a rectangular matrix-variate real random variable. For $\gamma=0$, $\delta=1$, and $\alpha\to a$, (20) is a real rectangular matrix-variate Gaussian density. One can consider (20) as a generalized real multivariate type-1 beta density, (21) as a generalized real multivariate type-2 beta density, and (22) as the corresponding gamma density. For appropriate parameter values, the model in (20) is a suitable model for reliability analysis for a real multivariate situation. As observed in Section 3, one can see that $u=\mathrm{tr}(AXBX')$ has an H-function distribution for a general $\delta$. In addition, for $\delta=1$ and $\alpha<a$, $u$ is, up to a scale factor, real scalar type-1 beta distributed with the parameters $(\gamma+\frac{pq}{2},\ \frac{\eta}{a-\alpha}+1)$; for $\delta=1$ and $\alpha>a$, $u$ is, up to a scale factor, real scalar type-2 beta distributed with the parameters $(\gamma+\frac{pq}{2},\ \frac{\eta}{\alpha-a}-\gamma-\frac{pq}{2})$; for $\delta=1$ and $\alpha\to a$, $u$ is real scalar gamma distributed with shape parameter $\gamma+\frac{pq}{2}$ and scale parameter $\frac{1}{b}$.
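The last statement can be illustrated by simulation (our sketch for the case $\gamma=0$, $\delta=1$, $b=1$, $A=I_{p}$, $B=I_{q}$, in which (22) is the rectangular matrix-variate Gaussian density and $u$ should be gamma distributed with shape $\frac{pq}{2}$ and scale $\frac{1}{b}$):

```python
import numpy as np
from scipy import stats

p, q, b = 2, 3, 1.0                  # illustrative choices; A = I_p, B = I_q
n = p * q
# For gamma = 0 and delta = 1, (22) is proportional to exp(-b tr(XX')), so the pq
# elements of X are i.i.d. N(0, 1/(2b)) and u = tr(XX') is the sum of their squares.
rng = np.random.default_rng(2)
X = rng.normal(scale=np.sqrt(1 / (2 * b)), size=(200_000, n))
u = (X**2).sum(axis=1)

# Claimed law of u for alpha -> a and delta = 1: gamma with shape pq/2 and scale 1/b
print(stats.kstest(u, stats.gamma(a=n / 2, scale=1 / b).cdf).statistic)   # near 0
```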
Note 3. If a location parameter matrix $M$ is to be introduced, then replace $X$ with $X-M$ everywhere. If $p\ge q$ and if $X$ is of rank $q$, then one can consider $u=\mathrm{tr}(BX'AX)$. Then, parallel results hold for all of the results in Section 4 by interchanging $A$ with $B$ and $p$ with $q$.