Normal-Power-Logistic Distribution Properties and
https://doi.org/10.1007/s41096-022-00143-4
RESEARCH ARTICLE
Abstract
The applications of the normal distribution in the literature are diverse. The new modified univariate normal power distribution is adequate for modelling bimodal data. Many data sets that would otherwise be modelled by the normal distribution cannot be, because they are bimodal whereas the normal distribution is unimodal. In this paper, a new extension of the normal linear model, called the normal-Power generalized linear model and derived from the T-Power{Logistic} framework, is presented. Statistical properties of the distribution and the proposed model, such as quantiles, median, mode, robust skewness, robust kurtosis and moments, were derived. The maximum likelihood estimation method was used to obtain the unknown model parameters. Three real data sets were analyzed to demonstrate the flexibility and usefulness of the proposed model. The new model would be very useful as an alternative in cases where skewed or bimodal response variables are not well fitted by the normal linear model.
1 Introduction
In probability and statistics, the power function and normal distributions are very useful in their individual applications. Not many authors have thought to combine these two distributions. The normal distribution does not have a shape parameter,
Copyright ©2021 by authors, all rights reserved. Authors agree that this article remains permanently
open access under the terms of the Creative Commons Attribution License 4.0 International License.
* Matthew I. Ekum
matekum@yahoo.com
1
Department of Mathematical Sciences, College of Basic Sciences, Lagos State University
of Science and Technology, Ikorodu, Lagos, Nigeria
2
Department of Statistics, Faculty of Science, University of Lagos, Akoka, Lagos, Nigeria
Journal of the Indian Society for Probability and Statistics
but the power function distribution has; while the power function distribution does not have a location parameter, the normal distribution has. Both are flexible, so combining them will produce a more flexible distribution. The power function distribution is the inverse of the Pareto distribution (Dallas 1976). The power function distribution is a special model that can be formed from or related to the uniform, Weibull and Kumaraswamy distributions. It is considered one of the simplest and handiest lifetime distributions.
Meniconi and Barry (1996) proposed the two-parameter power function distribution as a simple alternative to the exponential distribution when it comes to modelling failure data related to mortality rate and component failures. It is a special case of the beta distribution, and one may see the importance of the distribution in statistical tests such as the likelihood ratio test. The normal distribution, on the other hand, has been combined with other distributions to form more flexible distributions, such as the exponentiated-Normal (Gupta et al. 1998), Beta-Normal distribution (Eugene and Lee 2002), Gamma-Normal (GN) distribution (Zografos and Balakrishnan 2009), and Kumaraswamy-Normal distribution (Cordeiro and de Castro 2011). Estimation of the power function parameters has been done by various authors, such as Zaka and Akhter (2013).
Many classical distributions have been extensively used for modelling real data in many areas. However, in many situations, there is a clear need for extended forms of these distributions to improve their flexibility and goodness of fit. For that reason, families of continuous distributions are developed by introducing one or more additional shape parameter(s) to the baseline distribution or by combining two or more distributions to produce new ones. Akarawak et al. (2013) described such new distributions as convoluted distributions. Some authors in recent years have developed frameworks for combining these distributions to form new ones. A good example is the T-R{Y} framework (Aljarrah et al. 2014). Since then, many authors have used it to develop flexible lifetime distributions that are hazard weighted functions of the baseline distributions. The Weibull-Normal distribution (Alzaatreh et al. 2014) was one of the first to combine the normal distribution with another distribution using the T-R{Y} framework. The Weibull power function distribution (Tahir et al. 2016) combines the power function and Weibull distributions, using the Weibull distribution as the baseline distribution.
The authors who used the T-R{Y} framework in 2016 include Alzaatreh et al. (2016) and Almagambetova et al. (2016). Okorie et al. (2017) proposed the
modified power function distribution. Famoye et al. (2018) developed the Weibull-
Normal{log-logistics} distribution with the normal distribution as the baseline.
Zubair et al. (2018) also used the framework to develop a new convoluted distri-
bution. Other convoluted distributions developed using the framework include the
reduced beta skewed laplace distribution (Arowolo et al. 2019); Odd Lomax-expo-
nential{log-logistic} distribution (Ogunsanya et al. 2019); exponentiated-exponen-
tial-Dagum{Lomax} distribution (Ekum et al. 2020a); Lomax-Cauchy{uniform}
(Amalare et al. 2020); and Rayleigh-Cauchy{uniform} (Ogunsanya et al. 2021).
The simplicity and usefulness of the power function distribution compelled
the researchers to explore its further extensions, generalizations, and applications
in different areas of science (Arshad et al. 2020; Ekum et al. 2020b). Recently,
Gamma-Power{log-logistic} distribution was proposed by Ekum et al. (2021) and
demonstrated its usefulness in modelling skewed data. None of these studies has combined the normal and power function distributions, especially with the power function distribution as the baseline, except the normal-power{logistic} distribution (NPLD) proposed in the work of Ekum et al. (2021). More so, many properties of the NPLD have not been defined and studied, and it has not been developed into a generalized linear model for predicting relationships in regression applications.
Predicting oil spillage is of major interest to researchers in the field of Geoscience and geological statistics. In Nigeria, oil spillage is a major problem that has devastated the ecosystem and biodiversity of the Niger Delta region. The quantity of oil spilled may be estimated using the estimated spilled volume. The estimated spill volume of crude oil may be determined by the duration of clean-up (Whanda et al. 2016; Deinkuro et al. 2021). Also, researchers may want to know if they can predict their Research Gate score using their citations and research items. These are emerging issues of interest to researchers, especially those in academia (Jordan 2015; O’Brien 2019). More so, the COVID-19 mortality rate per population and its linear effect on the economic wellbeing of Nigerians is also worth studying. This is because the GDP per capita can be affected by COVID-19 mortality. The COVID-19 factor is also an extra burden on the wellbeing of the people (Pak et al. 2020; Iluno et al. 2021).
In the literature, there are some modifications of the normal distribution that produce multimodality (Kundu 2017), with multiple modes and fewer parameters. The modification of the normal distribution developed by Kundu (2017) is a bivariate family of distributions, while the one developed here is a univariate family. More so, Kundu (2017) did not extend the distribution to a generalized linear model. The motivation of this work is the modelling of response variables in regression that have bimodal features. Other authors, such as Famoye et al. (2018) and Kundu (2017), have developed distributions that are bimodal, but none has extended them to regression modelling. More so, real-life quantities like the crude oil spill volume, number of citations on Research Gate and GDP per Capita are real variables whose maximum values can be estimated, so they are bounded below by zero (non-negative) and above by a real value, rather than infinity. Thus, a distribution with bounded support [0, 𝜆] is necessary, where 𝜆 > 0 is a real upper bound (Ekum et al. 2020b).
Thus, in this study, the aim is to adopt a novel univariate continuous probability distribution called the normal-power-logistic distribution (NPLD), which was derived from the T-Power{logistic} family proposed and studied by Ekum et al. (2021), and to extend it into a generalized linear model in order to solve real regression problems where the dependent variables are bimodal and skewed with a known maximum value. The model has four parameters, two from the normal distribution and two from the power function distribution, one of which is a shape parameter and the other an upper bound parameter that controls the extremes of the distribution. The scope covers different characterizations, properties, the regression model, and parameter estimation of the NPLD model. The method of Maximum Likelihood Estimation (MLE) was used to estimate the model parameters. The importance of the new model was demonstrated empirically using three real-life datasets. The proposed model would be very useful in engineering, medicine, and all fields of
life, where the dependent variable of interest to be predicted has bimodal features. It
is expected to perform well when normal distribution fails to fit the data of interest.
In this section, the theory and application of the proposed scheme are considered.
The following definitions will be very useful in characterising the proposed model.
Definition 2 : The pdf of the T-power{logistic} family is derived by taking the first derivative of $F_X(x)$ with respect to x, and it is given by
\[ f_X(x) = \frac{k\lambda^k}{x(\lambda^k - x^k)}\, f_T\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right)\right]; \quad 0 \le x < \lambda \tag{2} \]
\[ h_X(x) = \frac{f_X(x)}{1 - F_X(x)} \tag{4} \]
Definition 9 : Let R be a non-negative random variable with pdf fR (x), and let E(Rk )
denote the kth moment of R, then
where E(X k ) is the kth moment of the random variable, X; [1 − FY (.)] is the survival
function of the random variable Y, and T is the quantile values random variable T
with respect to fT (x).
The proposed model is a generalized linear model that takes the form
g(𝜇i ) = 𝛽0 + 𝛽1 x1i + 𝛽2 x2i + ... + 𝛽p xpi
where g(𝜇i) is the link function, and the right-hand side is the linear predictor. Six goodness-of-fit criteria are used to compare the flexibility of the proposed model with other known models: the log-likelihood (LogL), Akaike Information Criterion (AIC), Kolmogorov-Smirnov statistic (D), Anderson-Darling statistic (A), Cramer-von Mises statistic (𝜔) and Chi-square statistic (𝜒2). See Chen and Balakrishnan (1995) for detailed information on A and 𝜔. The lower the value of a criterion (with the log-likelihood reported as −LogL), the better the performance of the model. Also, to show the relationship between the observed dependent variable y and the predicted dependent variable ŷ, the coefficient of correlation is used: a model performs well if the correlation coefficient is high. It is assumed that the dependent variable y has a normal-power distribution.
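Two of the criteria listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names `aic` and `ks_statistic` are hypothetical, the AIC is taken in its usual form 2p − 2 logL, and D is the usual maximum gap between the empirical cdf and a fitted cdf supplied by the user.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion, 2p - 2 logL (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance D between the empirical cdf of `data`
    and a fitted cdf (any callable mapping x to a probability)."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Gap to the empirical cdf just after x and just before x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical usage: a four-parameter model with logL = -120.5
print(aic(-120.5, 4))
# KS distance of three points from the uniform cdf on (0, 1)
print(ks_statistic([0.25, 0.5, 0.75], lambda x: x))
```

Any fitted cdf, for instance the NPLD cdf, can be passed as the `cdf` argument.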
Recall the cdf of the T-power{logistic} family defined by Ekum et al. (2021), given in Definition (1) as
\[ F_X(x) = F_T\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right)\right] \]
where $F_T[t]$ is the cdf of the random variable T. So, T can follow any known distribution. If T follows a normal distribution with parameters 𝜇 and 𝜎, then the pdf of T is given by
\[ f_T(t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^2\right]; \quad -\infty \le t \le \infty \]
Therefore
\[ t = \ln\left(\frac{x^k}{\lambda^k - x^k}\right) \]
\[ F_X(x) = \frac{1}{2}\left\{1 + \operatorname{erf}\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma\sqrt{2}}\right]\right\}; \quad \mu, \sigma, k, \lambda > 0; \; 0 < x < \lambda \tag{8} \]
where the error function, erf(.), is given by
\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt \]
[Figure: Density of NPLD, f(x) plotted against x on (0, 10), for sigma=4.5, mu=0, lambda=10 and k = 0.6, 1.0, 1.6 and 4.6]
Fig. 1 Probability Density Function with different parameter values showing bimodal features
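As a rough illustration of the cdf in Equation (8) and the corresponding density, the following Python sketch evaluates both with the standard library only. The function names `npld_cdf` and `npld_pdf` are hypothetical, and the parameter values in the loop are simply those printed in the legend of Fig. 1; this is a sketch under those assumptions, not the authors' implementation.

```python
import math


def npld_cdf(x, mu, sigma, k, lam):
    """cdf of the NPLD as in Eq. (8), for 0 < x < lam."""
    w = math.log(x**k / (lam**k - x**k))
    return 0.5 * (1.0 + math.erf((w - mu) / (sigma * math.sqrt(2.0))))


def npld_pdf(x, mu, sigma, k, lam):
    """pdf of the NPLD: normal density of w times the Jacobian dw/dx."""
    w = math.log(x**k / (lam**k - x**k))
    jacobian = k * lam**k / (x * (lam**k - x**k))
    normal = math.exp(-0.5 * ((w - mu) / sigma) ** 2) / math.sqrt(2.0 * math.pi * sigma**2)
    return jacobian * normal


# Parameter values taken from the legend of Fig. 1
for k in (0.6, 1.0, 1.6, 4.6):
    print(k, npld_pdf(5.0, 0.0, 4.5, k, 10.0))
```

Evaluating `npld_pdf` on a grid of x values in (0, 10) for each k gives curves of the kind described in the caption of Fig. 1.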
2.3.2 Useful Transformation
Lemma 2.1 If X ∼ NPLD(𝜇, 𝜎, k, 𝜆), then the random variable $W = \ln\left(\frac{X^k}{\lambda^k - X^k}\right)$ follows a normal distribution with parameters 𝜇 and 𝜎, and the pdf of W is given by
\[ f(w) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{w-\mu}{\sigma}\right)^2\right] \]
Differentiating w with respect to x, and making dx the subject of the equation gives
\[ dx = \frac{x(\lambda^k - x^k)}{k\lambda^k}\,dw. \tag{12} \]
Now, changing the support from x to that of w, we have
\[ 0 \le x \le \lambda \;\Rightarrow\; -\infty \le w \le \infty \tag{13} \]
It follows from the inverse transformation that
\[ f_W(w) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{w-\mu}{\sigma}\right)^2\right], \quad -\infty \le w \le \infty \tag{14} \]
is the pdf of the normal distribution with parameters 𝜇 and 𝜎. Equation (14) completes the proof. ◻
Lemma 2.1 shows that the pdf of NPLD with parameters (𝜇, 𝜎, k, 𝜆) is a proper pdf. No further proof is needed.
\[ S_X(x) = \frac{1}{2}\left\{1 - \operatorname{erf}\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma\sqrt{2}}\right]\right\}, \tag{15} \]
2.3.5 Quantile Function
Theorem 2.2 Let X be a random variable that follows NPLD with cdf $F_X(x)$, then the inverse function of the cdf, which is the quantile function, exists, and it is given by
\[ Q_X(p) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(p)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(p)]}}\right\}^{1/k} \]
\[ x = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(p)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(p)]}}\right\}^{1/k} \tag{20} \]
Equation (20) is the inverse function of the cdf of X, and it can be written as
\[ Q_X(p) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(p)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(p)]}}\right\}^{1/k} \tag{21} \]
where $Q_X(p)$ is the quantile function of NPLD; $\Phi^{-1}(p)$ is the inverse function of the cdf of the standard normal distribution, and p is a probability value generated uniformly, that is, P ∼ U(0, 1). Thus, Equation (21) completes the proof. ◻
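The quantile function in Equation (21) lends itself to inverse-transform sampling. The Python sketch below is an illustration only: `npld_quantile` and `npld_sample` are hypothetical names, the standard normal quantile is taken from `statistics.NormalDist`, and the parameter values in the usage line are arbitrary.

```python
import math
import random
from statistics import NormalDist


def npld_quantile(p, mu, sigma, k, lam):
    """Quantile function of the NPLD as in Eq. (21), for 0 < p < 1."""
    t = mu + sigma * NormalDist().inv_cdf(p)
    # e^t / (1 + e^t), written as 1 / (1 + e^-t)
    return lam * (1.0 / (1.0 + math.exp(-t))) ** (1.0 / k)


def npld_sample(n, mu, sigma, k, lam, seed=0):
    """Draw n NPLD variates by passing P ~ U(0, 1) through the quantile function."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        p = rng.random()
        if p > 0.0:  # the normal quantile is undefined at p = 0
            out.append(npld_quantile(p, mu, sigma, k, lam))
    return out


# Arbitrary illustrative parameter values
print(npld_sample(5, mu=0.5, sigma=0.5, k=0.5, lam=2.0))
```

Every generated value lies strictly between 0 and 𝜆, and applying the transformation of Lemma 2.1 to the sample should give approximately normal values.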
2.3.6 Measures of Partition
The quantile function can be used to derive all the measures of partition, such as,
median, quartile, octile, decile and percentile.
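The median of NPLD is
\[ Q_X(0.5) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(0.5)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(0.5)]}}\right\}^{1/k} \tag{22} \]
The 1st quartile of NPLD, which is the same as the 25th percentile, is given by
\[ Q_X(0.25) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(0.25)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(0.25)]}}\right\}^{1/k} \tag{24} \]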
The 3rd quartile of NPLD, which is the same as the 75th percentile is given by
\[ Q_X(0.75) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(0.75)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(0.75)]}}\right\}^{1/k} \tag{26} \]
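Theorem 2.3 Let X be a random variable that follows NPLD with quantile function $Q_X(p)$, then the skewness is robust, because it is a resistant measure, which is not affected by the extreme value, 𝜆.
Proof Recall the median (Q2), 1st quartile (Q1) and 3rd quartile (Q3) of NPLD given by
\[ Q_2 = \lambda\left\{\frac{e^{\mu}}{1 + e^{\mu}}\right\}^{1/k}, \]
\[ Q_1 = \lambda\left\{\frac{e^{[\mu-0.68\sigma]}}{1 + e^{[\mu-0.68\sigma]}}\right\}^{1/k} \]
and
\[ Q_3 = \lambda\left\{\frac{e^{[\mu+0.68\sigma]}}{1 + e^{[\mu+0.68\sigma]}}\right\}^{1/k} \]
respectively.
Substituting the values of Q2, Q1 and Q3 into (28) gives
\[ S_k = \frac{\lambda\left\{\frac{e^{[\mu+0.68\sigma]}}{1+e^{[\mu+0.68\sigma]}}\right\}^{1/k} + \lambda\left\{\frac{e^{[\mu-0.68\sigma]}}{1+e^{[\mu-0.68\sigma]}}\right\}^{1/k} - 2\lambda\left\{\frac{e^{\mu}}{1+e^{\mu}}\right\}^{1/k}}{\lambda\left\{\frac{e^{[\mu+0.68\sigma]}}{1+e^{[\mu+0.68\sigma]}}\right\}^{1/k} - \lambda\left\{\frac{e^{[\mu-0.68\sigma]}}{1+e^{[\mu-0.68\sigma]}}\right\}^{1/k}} \tag{29} \]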
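Theorem 2.4 Let X be a random variable that follows NPLD with quantile function $Q_X(p)$, then the kurtosis is robust, because it is a resistant measure, which is not affected by the extreme value, 𝜆.
\[ Q_X(p) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(p)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(p)]}}\right\}^{1/k}. \]
\[ E_1 = Q_X(1/8) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(1/8)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(1/8)]}}\right\}^{1/k} \]
\[ E_2 = Q_X(2/8) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(2/8)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(2/8)]}}\right\}^{1/k} = Q_1. \tag{33} \]
\[ E_3 = Q_X(3/8) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(3/8)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(3/8)]}}\right\}^{1/k} \]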
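\[ E_5 = Q_X(5/8) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(5/8)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(5/8)]}}\right\}^{1/k} \]
\[ E_6 = Q_X(6/8) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(6/8)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(6/8)]}}\right\}^{1/k} = Q_3. \tag{36} \]
\[ E_7 = Q_X(7/8) = \lambda\left\{\frac{e^{[\mu+\sigma\Phi^{-1}(7/8)]}}{1 + e^{[\mu+\sigma\Phi^{-1}(7/8)]}}\right\}^{1/k} \]
Substituting the values of E1, E2, E3, E5, E6 and E7 into (31) gives
\[ K_u = \frac{\lambda A - \lambda B + \lambda C - \lambda D}{\lambda\left\{\frac{e^{[\mu+0.68\sigma]}}{1+e^{[\mu+0.68\sigma]}}\right\}^{1/k} - \lambda\left\{\frac{e^{[\mu-0.68\sigma]}}{1+e^{[\mu-0.68\sigma]}}\right\}^{1/k}}. \tag{38} \]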
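The claim that the quantile-based skewness and kurtosis do not depend on 𝜆 can be checked numerically. The sketch below is illustrative only. The skewness follows the quartile form shown in Equation (29). For the kurtosis, the octile-based (Moors) form [(E7 − E5) + (E3 − E1)] / (E6 − E2) is assumed here, since the defining equation (31) is not reproduced in this text. The exact normal quantile is used instead of the rounded value 0.68, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def npld_quantile(p, mu, sigma, k, lam):
    """Quantile function of the NPLD as in Eq. (21)."""
    t = mu + sigma * NormalDist().inv_cdf(p)
    return lam * (1.0 / (1.0 + math.exp(-t))) ** (1.0 / k)


def robust_skewness(mu, sigma, k, lam):
    """Quartile-based skewness in the form of Eq. (29)."""
    q1, q2, q3 = (npld_quantile(p, mu, sigma, k, lam) for p in (0.25, 0.5, 0.75))
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


def robust_kurtosis(mu, sigma, k, lam):
    """Octile-based kurtosis (Moors form, an assumption in this sketch)."""
    e = {i: npld_quantile(i / 8.0, mu, sigma, k, lam) for i in (1, 2, 3, 5, 6, 7)}
    return ((e[7] - e[5]) + (e[3] - e[1])) / (e[6] - e[2])


# The same mu, sigma and k with three different upper bounds
for lam in (2.0, 10.0, 100.0):
    print(lam, robust_skewness(0.5, 1.0, 2.0, lam), robust_kurtosis(0.5, 1.0, 2.0, lam))
```

Because 𝜆 multiplies every quantile, it cancels between numerator and denominator, so the printed values should agree across the three rows up to rounding.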
2.3.10 Mode of NPLD
Theorem 2.5 Let X be a random variable that follows NPLD with pdf fX (x), a dif-
ferentiable function, then the mode is not unique and possibly bimodal for some
parameter values.
\[ f_X(x) = \frac{k\lambda^k}{x(\lambda^k - x^k)\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu\right]^2\right\} \]
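The mode can be derived by differentiating the pdf, equating to zero, and solving for x.
Using the product rule
\[ \frac{df_X(x)}{dx} = u\frac{dv}{dx} + v\frac{du}{dx} \tag{40} \]
Let
\[ u = \frac{k\lambda^k}{x(\lambda^k - x^k)} \tag{41} \]
and
\[ v = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu\right]^2\right\} \tag{42} \]
Differentiating u with respect to x gives
\[ \frac{du}{dx} = \frac{(k+1)x^k - kx^k}{x(\lambda^k - x^k)^2} \tag{43} \]
Differentiating v with respect to x gives
\[ \frac{dv}{dx} = -\frac{k\lambda^k}{x(\lambda^k - x^k)\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2}\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma}\right]^2\right\} \times \left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma^2}\right] \tag{44} \]
Inserting (41), (42), (43) and (44) into (40) and equating to zero gives
\[ \sigma^2(k+1)x^{k+1} - k\lambda^{2k}\sigma^2 x - k^2\lambda^{2k}\ln\left(\frac{x^k}{\lambda^k - x^k}\right) + \mu k^2\lambda^{2k} = 0 \tag{45} \]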
Theorem 2.6 Let X be a random variable that follows NPLD with parameters
𝜇, 𝜎, k, 𝜆, the pdf of X, fX (x), is a weighted pdf of power function distribution with
parameters k and 𝜆, that is,
fX (x) = ΨfR (x) (47)
where fR (x) is the pdf of power function distribution, and Ψ is the weight.
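Proof Recall the pdf of NPLD given in (9). Given the following series expansions
\[ \exp(y) = \sum_{i=0}^{\infty} \frac{y^i}{i!}, \tag{48} \]
\[ (y + a)^n = \sum_{j=0}^{\infty} \binom{n}{j} a^{n-j} y^j, \tag{49} \]
\[ \left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu\right]^{2i} = \sum_{j=0}^{\infty} \binom{2i}{j} (-\mu)^{2i-j} \left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right)\right]^j, \tag{50} \]
\[ \ln(y) = \sum_{l=0}^{\infty} \frac{(-1)^{l-1}(y-1)^l}{l} \tag{51} \]
\[ \ln\left(\frac{x^k}{\lambda^k - x^k}\right) = \sum_{l=0}^{\infty} \frac{(-1)^{l-1}\left[\left(\frac{x^k}{\lambda^k - x^k}\right) - 1\right]^l}{l}, \tag{52} \]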
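\[ (y + a)^n = \sum_{m=0}^{\infty} \binom{n}{m} a^{n-m} y^m, \tag{53} \]
\[ \left[\left(\frac{x^k}{\lambda^k - x^k}\right) - 1\right]^{lj} = \sum_{m=0}^{\infty} \binom{lj}{m} (-1)^{lj-m} \left[\left(\frac{x^k}{\lambda^k - x^k}\right)\right]^m, \tag{54} \]
\[ (y + a)^n = \sum_{s=0}^{\infty} \binom{n}{s} a^{n-s} y^s, \tag{55} \]
\[ (\lambda^k - x^k)^{-(m+1)} = \sum_{s=0}^{\infty} \binom{-(m+1)}{s} (\lambda^k)^{-(m+1)-s} (-x^k)^s. \tag{56} \]
\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{i=j=l=m=s=0}^{\infty} (-1)^{3i-2j+2lj-m+s} \frac{1}{(m+s)} \binom{lj}{m}\binom{2i}{j}\binom{-(m+1)}{s} \frac{\mu^{2i-j}}{2^i l^j \sigma^{2i} i!} \cdot \frac{k(m+s)x^{k(m+s)-1}}{\lambda^{k(m+s)}}. \tag{57} \]
\[ f_X(x) = \Psi\, \frac{k(m+s)x^{k(m+s)-1}}{\lambda^{k(m+s)}}. \tag{58} \]
If m = s = 0, then
\[ \Psi = \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{i=j=l=0}^{\infty} (-1)^{3i-2j+2lj} \binom{2i}{j} \frac{\mu^{2i-j}}{2^i l^j \sigma^{2i} i!}, \qquad f_X(x) = \Psi\, \frac{kx^{k-1}}{\lambda^k}. \tag{59} \]
where $f_R(x) = kx^{k-1}/\lambda^k$ is the pdf of the power function distribution. Hence, Equation (59) completes the proof. ◻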
2.3.12 Moment of NPLD
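Let X be a continuous random variable with pdf $f_X(x)$; the rth moment is given by
\[ E(X^r) = \int x^r f_X(x)\,dx \tag{60} \]
\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{i=j=l=m=s=0}^{\infty} (-1)^{3i-2j+2lj-m+s} \frac{1}{(m+s)} \binom{lj}{m}\binom{2i}{j}\binom{-(m+1)}{s} \frac{\mu^{2i-j}}{2^i l^j \sigma^{2i} i!} \cdot \frac{k(m+s)x^{k(m+s)-1}}{\lambda^{k(m+s)}}. \]
\[ E(X^r) = \int x^r f_X(x)\,dx \tag{61} \]
Note that
\[ \int \sum = \sum \int \tag{62} \]
So that
\[ E(X^r) = \sum_{i=j=l=m=s=0}^{\infty} \frac{(-1)^{3i-2j+2lj-m+s}}{2^i l^j i!} \binom{lj}{m}\binom{2i}{j}\binom{-(m+1)}{s} \times \frac{1}{\sqrt{2\pi\sigma^2}} \left(\frac{\mu^{2i-j} k}{\lambda^{k(m+s)} \sigma^{2i}}\right) \left(\frac{\lambda^{r+k}}{r+k}\right). \tag{63} \]
Let
\[ \psi_{i,j,l,m,s} = \sum_{i=j=l=m=s=0}^{\infty} \frac{(-1)^{3i-2j+2lj-m+s}}{2^i l^j i!} \binom{lj}{m}\binom{2i}{j} \times \binom{-(m+1)}{s}. \tag{64} \]
So that
\[ E(X^r) = \psi_{i,j,l,m,s} \frac{1}{\sqrt{2\pi\sigma^2}} \left(\frac{\mu^{2i-j} k}{\sigma^{2i}}\right) \frac{\lambda^{r+k-k(m+s)}}{r+k}. \tag{65} \]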
2.3.13 Mean of NPLD
\[ E(X) = \frac{k\lambda^{k+1}}{(k+1)\sqrt{2\pi\sigma^2}}. \tag{67} \]
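The likelihood function of a random sample $x_1, \ldots, x_n$ from NPLD is
\[ L = \frac{(k\lambda^k)^n}{(2\pi\sigma^2)^{n/2}} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right) - \mu\right]^2\right\} \times \prod_{i=1}^{n} \frac{1}{x_i(\lambda^k - x_i^k)} \]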
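The maximum likelihood estimates of the parameters of the NPLD are obtained by differentiating 𝓁 partially with respect to 𝜇, 𝜎 and k, equating the results to zero and solving for each parameter.
\[ \frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} \left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right) - \mu\right] \tag{69} \]
\[ \frac{\partial \ell}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} \left\{\left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right) - \mu\right]^2\right\} \tag{70} \]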
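\[ \frac{\partial \ell}{\partial k} = \frac{n}{k} - n\ln\lambda - \frac{\lambda^k}{\sigma^2} \sum_{i=1}^{n} \left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right) - \mu\right] \left(\frac{\ln x_i - \ln\lambda}{\lambda^k - x_i^k}\right) - \sum_{i=1}^{n} \left(\frac{\lambda^k \ln\lambda - x_i^k \ln x_i}{\lambda^k - x_i^k}\right) \tag{71} \]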
The equation obtained by setting the partial derivative of 𝓁 with respect to k to zero is not in closed form, and the value of the parameter k is found using Newton’s numerical procedure provided by the R package (R Development Core Team 2009). The parameter 𝜆 cannot be estimated using the MLE method because it depends on X; thus, it is estimated from the data using
\[ \hat{\lambda} = \max(x_i) + \epsilon; \quad \forall x \in X \tag{75} \]
where 𝜖 > 0 is a very small positive number less than 1 chosen by the user.
It should be noted that the maximum likelihood estimators of the parameters 𝜇 and 𝜎 are in closed form and will always exist provided the values of parameters k and 𝜆 are known. The value of parameter 𝜆 cannot be determined by the maximum likelihood estimation method because it is an upper bound, so it is estimated from the data by equation (75). The estimator of parameter k is not in closed form, and a numerical optimization method is used to obtain it. The initial value of k used in the numerical optimization is found by first assuming that the random sample is from the power function distribution, whose moment estimate of k is given by $k = \frac{\bar{x}}{\lambda - \bar{x}}$, $\bar{x} < \lambda$, where $\bar{x}$ is the sample mean (Ekum et al. 2020b), estimated from data.
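The estimation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R procedure: 𝜆 is set by Equation (75) with a hypothetical default 𝜖 = 0.01, the starting value of k is the moment estimate, 𝜇 and 𝜎 are taken as the mean and standard deviation of the transformed values w for a given k, and a simple grid search around the starting value stands in for Newton's method. All function names and the data are hypothetical.

```python
import math


def lambda_hat(x, eps=0.01):
    """Upper-bound estimate as in Eq. (75): max(x_i) + eps, with 0 < eps < 1."""
    return max(x) + eps


def k_moment(x, lam):
    """Moment estimate of k under a power function distribution."""
    xbar = sum(x) / len(x)
    return xbar / (lam - xbar)


def transform(x, k, lam):
    """w_i = ln(x_i^k / (lam^k - x_i^k)), computed through r = x / lam."""
    return [k * math.log(xi / lam) - math.log1p(-((xi / lam) ** k)) for xi in x]


def mu_sigma_hat(x, k, lam):
    """Closed-form estimates of mu and sigma for given k and lam."""
    w = transform(x, k, lam)
    mu = sum(w) / len(w)
    sigma = math.sqrt(sum((wi - mu) ** 2 for wi in w) / len(w))
    return mu, sigma


def profile_loglik(x, k, lam):
    """Log-likelihood in k with mu and sigma replaced by their closed forms."""
    n = len(x)
    _, sigma = mu_sigma_hat(x, k, lam)
    jac = n * math.log(k) - sum(math.log(xi) + math.log1p(-((xi / lam) ** k)) for xi in x)
    return jac - 0.5 * n * math.log(2.0 * math.pi * sigma**2) - 0.5 * n


def fit_npld(x, eps=0.01):
    """Grid search for k around the moment estimate (a stand-in for Newton's method)."""
    lam = lambda_hat(x, eps)
    k0 = k_moment(x, lam)
    grid = [k0 * 2.0 ** (j / 10.0) for j in range(-30, 31)]
    k_hat = max(grid, key=lambda k: profile_loglik(x, k, lam))
    mu, sigma = mu_sigma_hat(x, k_hat, lam)
    return {"mu": mu, "sigma": sigma, "k": k_hat, "lambda": lam}


# Hypothetical positive data, for illustration only
data = [0.42, 1.31, 0.77, 1.88, 0.15, 1.62, 0.93, 1.75, 0.28, 1.49, 0.61, 1.12]
print(fit_npld(data))
```

The grid is deliberately crude; a derivative-based optimizer started at the moment estimate would play the same role.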
In a case where the parameter estimated using Newton approximation is not optimal,
a new relationship is derived by EM algorithm.
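Let
\[ k_{\iota+1} = \arg\max_{k} \int f(x|I;\Omega_\iota) \ln f(x, I;\Omega)\,dx \tag{76} \]
\[ k_{\iota+1} = \arg\max_{k} \int f(x|I;k) \ln f(x, I;\mu, \sigma, k, \lambda)\,dx \tag{77} \]
The pdfs of the normal distribution and NPLD are
\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] \]
and
\[ f_X(x) = \frac{k\lambda^k}{x(\lambda^k - x^k)\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu\right]^2\right\} \]
respectively.
Substituting the pdfs of the normal distribution and NPLD into Equation (77) gives
\[ k_{\iota+1} = \arg\max_{k} \int_0^{\infty} \frac{1}{\sqrt{2\pi S^2}}\, e^{-\frac{1}{2}\left(\frac{x-\bar{x}}{S}\right)^2} \times \ln\left\{\frac{k\hat{x}_{(n)}^k}{x(\hat{x}_{(n)}^k - x^k)\sqrt{2\pi S^2}}\, e^{\left\{-\frac{1}{2S^2}\left[\ln\left(\frac{x^k}{\hat{x}_{(n)}^k - x^k}\right) - \bar{x}\right]^2\right\}}\right\} dx. \tag{78} \]
where 𝜇, 𝜎 and 𝜆 are known, such that $\hat{\mu} = \bar{x}$, $\hat{\sigma} = S$, and $\hat{\lambda} = \sup(x) = \hat{x}_{(n)}$, where $\bar{x}$ and S are the sample mean and sample standard deviation of $\ln\left(\frac{x}{\lambda - x}\right)$. Note that $\hat{x}_{(n)} - x > 0$, ∀ x ∈ X. Note that $k_1$ is the initial value of k, assumed as suggested, that is, $k_1 = \frac{\bar{x}}{\lambda - \bar{x}}$, $\bar{x} < \lambda$, so that $k_{\iota+1}$ is the new estimate of k and it is optimal.
Now that the optimal value of k is known, we can estimate the values of 𝜇 and 𝜎 using equations (72) and (73) respectively.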
B = Q∗(1−𝛼) SΘ (79)
where Q∗p is the standard quantile function of NPLD, Φ−1 (p) is the inverse function
of the cdf of standard normal distribution known as the quantile function, and p is
a probability value uniformly generated. Note that 𝜆 > 0 is a regulator parameter in
this case. Its value is adjusted to determine how large the error bound should be. In
this research, 𝜆 is taken as 2 to accommodate the population parameter. So, the level
of significance, 𝛼 and 𝜆 are always chosen. The values of 𝜆 can be 1, 2 or 3 depend-
ing on how large you want the error bound to be.
Thus, the 100(1 − 𝛼)% confidence interval for parameter Θ is given by
\[ \Theta = \hat{\Theta} \pm Q^{*}_{(1-\alpha)} S_{\Theta} \tag{81} \]
where $\hat{\Theta}$ is the point estimate of Θ.
The simulation study is presented to show the performance of the maximum likelihood estimators and their consistency. The procedure involves generating n uniform random probabilities, p, and passing them through the quantile function defined in equation (21) to obtain NPLD random variates for the sample sizes n = 50, 100, 200 and 300, each replicated 1000 times. The parameter values are set as k = 𝜎 = 𝜇 = 0.5, k = 𝜎 = 𝜇 = 1, and k = 𝜎 = 𝜇 = 2, for a fixed 𝜆 = 2. The actual values, mean estimates, standard errors, and 95% confidence intervals are presented in Tables 1, 2 and 3. Tables 1, 2 and 3 show that the standard error decreases as the sample size increases, which suggests that the MLEs are consistent.
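A scaled-down version of such a simulation can be sketched in Python. This is an illustration under simplifying assumptions, not a reproduction of Tables 1 to 3: only 𝜇 is estimated, with k and 𝜆 treated as known, the number of replications is reduced to 200, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def npld_quantile(p, mu, sigma, k, lam):
    """Quantile function of the NPLD as in Eq. (21)."""
    t = mu + sigma * NormalDist().inv_cdf(p)
    return lam * (1.0 / (1.0 + math.exp(-t))) ** (1.0 / k)


def mu_hat(x, k, lam):
    """Estimate of mu for known k and lam: the mean of the transformed values w."""
    return mean([k * math.log(xi / lam) - math.log1p(-((xi / lam) ** k)) for xi in x])


def simulate(n, reps, mu, sigma, k, lam, seed=0):
    """Return the mean and standard deviation of mu_hat over `reps` samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Uniform probabilities kept strictly inside (0, 1)
        x = [npld_quantile(rng.uniform(1e-12, 1.0 - 1e-12), mu, sigma, k, lam)
             for _ in range(n)]
        estimates.append(mu_hat(x, k, lam))
    return mean(estimates), stdev(estimates)


# One of the parameter settings from the text: k = sigma = mu = 1, lambda = 2
for n in (50, 100, 200, 300):
    print(n, simulate(n, 200, 1.0, 1.0, 1.0, 2.0))
```

The second value in each printed pair, the spread of the estimates across replications, is expected to shrink as n grows.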
Assume that the dependent random variable Y of interest in our linear model follows an NPLD given the independent variable(s) X. The linear regression model is called the NPLD Generalized Linear Model (NPGLM).
Given the linear model in matrix form
Y = XB + e (82)
where Y is an n-dimensional vector called the dependent vector for all n observations; X is the set of k independent variables packed into an n × (k + 1) matrix called the
Table 1 Simulation Study showing Mean estimates, standard error, and confidence interval of the MLE
for k = 𝜎 = 𝜇 = 0.5
n Parameters Actual values Mean Standard error Confidence interval
Table 2 Simulation Study showing Mean estimates, standard error, and confidence interval of the MLE
for k = 𝜎 = 𝜇 = 1
n Parameters Actual values Mean Standard error Confidence interval
design matrix; B is a (k + 1)-dimensional vector called the slope vector; e is the error term packed into an n-dimensional vector called the error vector.
The conditions to use the NPGLM to fit the model are given thus:
Table 3 Simulation Study showing Mean estimates, standard error, and confidence interval of the MLE
for k = 𝜎 = 𝜇 = 2
n Parameters Actual values Mean Standard error Confidence interval
where parameter 𝜆 is an upper bound. The pdf f(y) is not free from parameter (𝜆),
and hence, might be difficult to express as a member of the exponential family.
However, a simple transformation can be done with the data that follows a NPLD
to a normal distribution as proved in Lemma (2.1).
Recall the transformed pdf
\[ f(w) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{w-\mu}{\sigma}\right)^2\right] \tag{84} \]
Since (86) can be written in exponential class, we can directly derive the joint suf-
ficient statistics from it. So, the joint sufficient statistics for 𝜇 and 𝜎 are w and w2
respectively. Thus, w and w2 can give all information concerning parameters 𝜇 and 𝜎
respectively.
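\[ \log L = -n\log(\sqrt{2\pi}) - n\log\sigma - \sum_{i=1}^{n} \frac{w_i^2}{2\sigma^2} - \sum_{i=1}^{n} \frac{w_i x'_{ij} b_j}{\sigma^2} - \sum_{i=1}^{n} \frac{(x'_{ij} b_j)^2}{2\sigma^2} \tag{89} \]
where j = 0, 1, ..., p, $E(w_i) = \mu_i = x'_{ij} b_j$ and $w_i = \log\left(\frac{y_i^k}{\lambda^k - y_i^k}\right)$. Note that p is the num-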
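Then
\[ \log L = -n\log(\sqrt{2\pi}) - n\log\sigma - \sum_{i=1}^{n} \frac{\left[\log\left(\frac{y_i^k}{\lambda^k - y_i^k}\right)\right]^2}{2\sigma^2} - \sum_{i=1}^{n} \frac{\log\left(\frac{y_i^k}{\lambda^k - y_i^k}\right) x'_{ij} b_j}{\sigma^2} - \sum_{i=1}^{n} \frac{(x'_{ij} b_j)^2}{2\sigma^2} \tag{90} \]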
(p × 1) matrix. Note that $W = \log\left(\frac{Y^k}{\lambda^k - Y^k}\right)$, where the value of lambda can be approximated from the data using the nth order statistic or simply $\hat{\lambda} = \max(y_i) + \sigma_{\bar{y}}$ ∀ i, where $\sigma_{\bar{y}}$ is the standard error of y computed from the data. An approximation for k can also be derived from data using $\hat{k} = \frac{\bar{y}}{\lambda - \bar{y}}$, $\bar{y} < \lambda$, where $\bar{y}$ is the sample mean, derived from Ekum et al. (2020b).
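The regression step can be illustrated with a short Python sketch for a single covariate. It is a sketch under assumptions rather than the authors' fitting routine: the response is transformed to w, the coefficients are obtained by ordinary least squares on w (which maximizes a normal likelihood for w with mean b0 + b1 x), and predictions are mapped back to (0, 𝜆) through the quantile-type transform evaluated at the fitted linear predictor. The function names and data are hypothetical.

```python
import math


def default_lambda(y):
    """max(y_i) plus the standard error of y, as suggested in the text."""
    n = len(y)
    ybar = sum(y) / n
    sd = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
    return max(y) + sd / math.sqrt(n)


def default_k(y, lam):
    """Moment-type approximation for k: ybar / (lam - ybar)."""
    ybar = sum(y) / len(y)
    return ybar / (lam - ybar)


def to_w(y, k, lam):
    """w_i = log(y_i^k / (lam^k - y_i^k)), the transformed response."""
    return [k * math.log(yi / lam) - math.log1p(-((yi / lam) ** k)) for yi in y]


def fit_line(x, w):
    """Ordinary least squares for w = b0 + b1 * x (one covariate)."""
    n = len(x)
    xbar, wbar = sum(x) / n, sum(w) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxw = sum((xi - xbar) * (wi - wbar) for xi, wi in zip(x, w))
    b1 = sxw / sxx
    return wbar - b1 * xbar, b1


def predict_y(x, b0, b1, k, lam):
    """Map the linear predictor back to the (0, lam) scale."""
    return [lam * (1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))) ** (1.0 / k) for xi in x]


# Hypothetical data: covariate x and bounded positive response y
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.8, 1.1, 2.9, 3.2, 5.4, 6.1]
lam = default_lambda(y)
k = default_k(y, lam)
b0, b1 = fit_line(x, to_w(y, k, lam))
print(b0, b1, predict_y(x, b0, b1, k, lam))
```

With several covariates, the same idea applies with a general least-squares solver in place of `fit_line`.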
3 Results
3.1 Application
In this section, applications to three real data sets are provided to illustrate the uses and importance of the NPLD. Three competing models are used to fit the data of interest: the NPLD, Normal and Gamma GLMs.
The data on the estimated spilled volume (ESV) were collected from 7th January 2011 to 27th December 2019, from the Shell Nigeria website (www.shell.com.ng/sustainability/environment/oil-spills.html).
Figure 2 shows that the oil spill data is bimodal with positive skewness (1.1302)
and kurtosis (3.3977).
The estimated spill volume of crude oil can be determined by the Duration of Clean-
up (DOC). If the duration of clean-up is known, the spill volume can be estimated
from an appropriate model. Thus, the dependent variable is ESV and the independ-
ent variable is the DOC.
[Fig. 2: frequency plot of the oil spill data; vertical axis: Frequency]
Table 4 shows the model parameters estimated, their standard errors and their
corresponding P-values.
Table 5 shows that the NPLD regression model outperforms the other regression
models using all the selection criteria.
Total Research Gate (TRG) score data is a cross-sectional data collected from
Research Gate page of 100 selected researchers in the field of Mathematical Science
Table 4 Generalised linear model parameter estimates for Oil Spill model
Distribution 𝛽̂0 SE𝛽̂0 P-Value 𝛽̂1 SE𝛽̂1 P-Value
as of 15th May 2021. The data includes TRG score, Total Research Interest (TRI),
Citations, Recommendations, Reads and Research Items (RI). The independent vari-
ables are citations and RI (Fig. 3).
Figure 3 shows that the TRG score data is bimodal with positive skewness of
0.1595 and kurtosis of 1.9747.
The TRG score can be predicted by Citations and Research Items. If citations and
research items increased, the TRG score will also increase. Thus, the dependent var-
iable is TRG score, while the independent variables are citations and research items.
Table 6 shows the model parameters estimated using MLE, their standard error
and their corresponding P-values. The fitted NPLD regression model shows that the
estimates 𝛽0 and 𝛽1 are significant at 5% level of error. This is also true for gamma
and normal regression models.
Table 7 shows that the NPLD regression model outperforms the other regression
models using all the goodness-of-fit criteria.
The data used here are daily data collected from the World Health Organisation (WHO) from 1st June 2020 to 31st December 2020, spanning 214 daily observations, used by Iluno et al. (2021). The independent variable is a measure of COVID-19, termed COVID-19 Mortality per 1 million persons in the population (CMP), while the dependent
[Fig. 3: plot of the TRG score data; horizontal axis: TRG Score per Author]
Table 6 Generalised linear model parameter estimates for TRG Score model
Distribution 𝛽̂1 SE𝛽̂1 P-Value 𝛽̂2 SE𝛽̂2 P-Value
Table 7 Generalised linear model goodness-of-fit criteria for TRG Score model
Distribution −LogL AIC D A 𝜔 𝜒2
variable is the GDP per capita per COVID-19 laboratory-confirmed cases (RGDPC).
The CMP is a proxy to measure COVID-19 mortality, while RGDPC is a proxy to
measure the economic wellbeing of a country.
Figure 4 shows that the RGDPC data has a positive skewness of 2.317554 and
kurtosis of 7.896267. This data is highly skewed and very peaked (leptokurtic).
The RGDPC can be predicted by the CMP. If COVID-19 Mortality per Population is high, it can affect the GDP per Capita of a country negatively. Thus, the dependent variable is RGDPC and the independent variable is the CMP. Three competing distributions are used to fit the GLM. The parameter estimates of the three competing models are presented in Table 8, and their performance when fitted to the RGDPC data is presented in Table 9.
Table 8 shows the model parameters estimated, their standard errors and their corresponding P-values.
Table 9 shows that the NPLD regression model outperforms the other regression models using all the selection criteria.
4 Conclusions
This study developed a novel NPLD model, using the T-Power{logistic} family of distributions. The cdf, pdf, survival function, hazard rate, cumulative hazard function, reverse hazard function, useful transformation, quantile functions, mode, robust skewness, robust kurtosis, series expansion and moment are derived. The maximum likelihood estimators of the parameters of the distribution and of its generalized regression model were derived. The NPLD regression model was applied to three real-life data sets, namely, Estimated Spill Volume (ESV) of crude oil in the Niger Delta
[Fig. 4: frequency plot of the RGDPC data; vertical axis: Frequency]
area of Nigeria, Total Research Gate (TRG) score of some selected researchers on Research Gate, and GDP per Capita per COVID-19 cases (RGDPC); and its performance compared favourably with the normal and Gamma regression models.
The goodness-of-fit statistics showed that the NPLD regression model outperforms the other regression models on all the selection criteria for the oil spill model, as well as for the TRG score model and the RGDPC model. Hence, the NPLD regression model can be used effectively to analyze and model the crude oil spill volume data, TRG score data, RGDPC and other related data when the normal model is not a good fit.
This research therefore recommends that
• the NPLD model should be used to estimate the spill volume of crude oil and the
total ResearchGate score.
Acknowledgements I am very grateful to my PhD supervisors Prof. Muminu Adamu, who is also the
pioneer HOD, Department of Statistics, University of Lagos and Dr. Eno Akarawak for their supervi-
sory role during my Ph.D. work. I appreciate Prof. Felix Famoye of Central Michigan University, United
States for mentoring me during the beginning of this work in the University of Lagos and for the materi-
als he gave me. I am also grateful to the reviewers for their appropriate and constructive suggestions to
improve this work.
Declaration
Conflict of interest The authors declare that they have no conflict of interest.
References
Akagbue HI, Adamu MO, Anake TA (2017) Solutions of chi-square quantile differential equation. In:
Proceedings of the world congress on engineering and computer science, San Francisco, USA
Akarawak EEE, Adeleke IA, Okafor RO (2013) The Weibull–Rayleigh distribution and its properties. J
Eng Res 18(1):61–72
Akarawak EEE, Adeleke IA, Okafor RO (2017) The Gamma–Rayleigh distribution and applications to
survival data. Nigerian J Basic Appl Sci 25(2):130–142
Aljarrah MA, Lee C, Famoye F (2014) On generating T-X family of distributions using quantile func-
tions. J Stat Distrib Appl 1(2):1–17. https://doi.org/10.1890/13-1452.1
Almagambetova A, Zakiyeva N, Alzaatreh A, Pya N (2016) On logistic-normal distribution. Department
of Mathematics, Nazarbayev University, Kazakhstan
Alzaatreh A, Famoye F, Lee C (2014) The gamma-normal distribution: properties and application. Com-
put Stat Data Anal 69:67–80
13
Journal of the Indian Society for Probability and Statistics
Alzaatreh A, Lee C, Famoye F, Ghosh I (2016) The generalized Cauchy family of distributions with
applications. J Stat Distrib Appl, vol 3, No. 12. https://doi.org/10.1186/s40488-016-0050-3
Amalare AA, Ogunsanya AS, Ekum MI, Owolabi TO (2020) Lomax–CauchyUniform distribution: prop-
erties and application to exceedances of flood peaks of Wheaton River. Benin J Stat 3:128–141
Arowolo OT, Nurudeen TS, Akinyemi JA, Ogunsanya AS, Ekum MI (2019) Reduced beta skewed
Laplace distribution with application to failure-time of electrical component data. Ann Stat Theory
Appl (ASTA) 1:31–41
Arshad MA, Iqbal MZ, Ahmadm M (2020) Exponentiated power function distribution: properties and
applications. J Stat Theory Appl 19(2):297–313
Chen G, Balakrishnan N (1995) A general purpose approximate goodness-of-fit test. J Qual Technol
27:154–161
Cordeiro GM, de Castro M (2011) A new family of generalized distributions. J Stat Comput Simul
81:883–898
Dallas AC (1976) Characterization of Pareto and power function distribution. Ann Math Stat 28:491–497
Deinkuro NS, Knapp CW, Raimi MO, Nanalok NH (2021) Oil Spills in the Niger Delta Region, Nige-
ria: Environmental Fate of Toxic Volatile Organics. Res Square. https://doi.org/10.21203/rs.3.rs-
654453/v1
Ekum MI, Adamu MO, Akarawak EEE (2020a) T-Dagum: A way of generalizing dagum distribution
using Lomax quantile function. J Prob Stat. https://doi.org/10.1155/2020/1641207
Ekum MI, Adeleke IA, Akarawak EEE (2020b) Lambda upper bound distribution: some properties and
applications. Benin J Stat 3:12–40
Ekum MI, Adamu MO, Akarawak EEE (2021) A class of power function distributions: its properties and
applications. Unilag J Math Appl 1(1):35–59
Eugene N, Lee C (2002) Famoye F Beta-normal distribution and its applications. Commun Stat Theory
Methods 31:497–512
Famoye F, Akarawak E, Ekum M (2018) Weibull-normal distribution and its applications. J Stat Theory
Appl 17(4):719–727. https://doi.org/10.2991/jsta.2018.17.4.12
Gupta RC, Gupta PL, Gupta RD (1998) Modeling failure time data by Lehmann alternatives. Commun
Stat Theory Methods 27:887–904
Iluno C, Taylor J, Akinmoladun O, Ekum Aderele OM (2021) Modelling the effect of Covid-19 mortality
on the economy of Nigeria. Res Glob 3(2021):100050
Jordan K (2015) Exploring the ResearchGate score as an academic metric: reflections and implications
for practice. In: Quantifying and Analysing Scholarly Communication on the Web (ASCW-15),
Oxford
Kundu D (2017). Multivariate geometric skew-normal distribution. Stat J Theor Appl Stat 51(6)
Meniconi M, Barry DM (1996) The power function distribution: a useful and simple distribution to
assess electrical component reliability. Micreoelectronics Reliab 36:1207–1212
O’Brien K (2019) ResearchGate. J Med Library Assoc JMLA 107(2):284–285. https://doi.org/10.5195/
jmla.2019.643
Ogunsanya AS, Sanni OO, Yahya WB (2019) Exploring some properties of odd Lomax-exponential dis-
tribution. Ann Stat Theory Appl(ASTA) 1:21–30
Ogunsanya AS, Akarawak EEE, Ekum MI (2021) On some properties of Rayleigh–Cauchy distribution. J
Stat Manage Syst. https://doi.org/10.1080/09720510.2020.1822499
Okorie IE, Akpanta AC, Ohakwe J, Chikezie DC (2017) The modified Power function distribution.
Cogent Math 4:1319592. https://doi.org/10.1080/23311835.2017.1319592
Pak A, Adegboye OA, Adekunle AI, Rahman KM, McBryde ES, Eisen DP (2020) Economic conse-
quences of the COVID-19 outbreak: the need for epidemic preparedness. Front Public Health 8.
https://doi.org/10.3389/fpubh.2020.00241
Tahir MH, Alizadeh M, Mansoor M, Cordeiro GM, Zubair M (2016) The Weibull-power function distri-
bution with applications. Hacettepe J Math Stat 45(1):245–265
Whanda S, Adekola O, Adamu B, Yahaya S, Pandey P (2016) Geo-spatial analysis of oil spill distribution
and susceptibility in the Niger Delta Region of Nigeria. J Geogr Inf Syst 8:438–456. https://doi.org/
10.4236/jgis.2016.84037
Zaka A, Akhter AS (2013) Methods for estimating the parameters of power function distribution. Pak J
Stat Oper Res 9:213–224
Zografos K, Balakrishnan N (2009) On families of beta and generalized gamma-generated distributions
and associated inference. Stat Methodol 6:344–362
13
Journal of the Indian Society for Probability and Statistics
Zubair M, Alzaatreh A, Cordeiro GM, Tahir MH, Mansoor M (2018) On generalized classes of exponen-
tial distribution using T-X family framework. Filomat 32(4):1259–1272