Journal of the Indian Society for Probability and Statistics

https://doi.org/10.1007/s41096-022-00143-4

RESEARCH ARTICLE

Normal-Power-Logistic Distribution: Properties and Application in Generalized Linear Model

Matthew I. Ekum1 · Muminu O. Adamu2 · Eno E. E. Akarawak2

Accepted: 21 October 2022


© The Indian Society for Probability and Statistics (ISPS) 2022

Abstract
The normal distribution is widely applied in the literature, but it is unimodal, so many data sets that would otherwise be modelled by it cannot be, because they are bimodal. The modified univariate normal-power distribution is a new distribution that is adequate for modelling such bimodal data. In this paper, a new extension of the normal linear model, called the normal-Power generalized linear model and derived from the T-Power{Logistic} framework, is presented. Statistical properties of the distribution and the proposed model were derived, including the quantiles, median, mode, robust skewness, robust kurtosis and moments. The maximum likelihood estimation method was used to obtain the unknown model parameters. Three real data sets were analyzed to demonstrate the flexibility and usefulness of the proposed model. The new model would be very useful as an alternative in cases where skewed or bimodal response variables are not well fitted by the normal linear model.

Keywords Bimodal · Exponential family · Generalized linear model · Normal-power{logistic} · Quantile function

1 Introduction

In probability and statistics, the power function and normal distributions are each very useful in their own applications, yet few authors have considered combining the two. The normal distribution does not have a shape parameter,

Copyright ©2021 by authors, all rights reserved. Authors agree that this article remains permanently
open access under the terms of the Creative Commons Attribution License 4.0 International License.

* Matthew I. Ekum
matekum@yahoo.com
1
Department of Mathematical Sciences, College of Basic Sciences, Lagos State University
of Science and Technology, Ikorodu, Lagos, Nigeria
2
Department of Statistics, Faculty of Science, University of Lagos, Akoka, Lagos, Nigeria


but the power function distribution does, while the power function distribution does not have a location parameter but the normal distribution does. Both are flexible, so combining them should produce a more flexible distribution. The power function distribution is the inverse of the Pareto distribution (Dallas 1976). It is a special model that can be formed from, or related to, the uniform, Weibull and Kumaraswamy distributions, and it is considered one of the simplest and handiest lifetime distributions.
Meniconi and Barry (1996) proposed the two-parameter power function distribution as a simple alternative to the exponential distribution for modelling failure data related to mortality rates and component failures. It is a special case of the beta distribution, and one may cite the importance of the distribution in statistical tests such as the likelihood ratio test. The normal distribution, on the other hand, has been combined with other distributions to form more flexible distributions, such as the exponentiated-Normal (Gupta et al. 1998), Beta-Normal (Eugene and Lee 2002), Gamma-Normal (GN) (Zografos and Balakrishnan 2009) and Kumaraswamy-Normal (Cordeiro and de Castro 2011) distributions. Estimation of the power function parameters has been carried out by various authors, such as Zaka and Akhter (2013).
Many classical distributions have been extensively used for modelling real data in many areas. However, in many situations there is a clear need for extended forms of these distributions to improve their flexibility and goodness of fit. For that reason, families of continuous distributions are developed by introducing one or more additional shape parameter(s) to the baseline distribution or by combining two or more distributions to produce new ones. Akarawak et al. (2013) described such new distributions as convoluted distributions. In recent years, some authors have developed frameworks for combining these distributions to form new ones. A good example is the T-R{Y} framework (Aljarrah et al. 2014). Since then, many authors have used it to develop flexible lifetime distributions that are hazard-weighted functions of the baseline distributions. The Weibull-Normal distribution (Alzaatreh et al. 2014) was one of the first in which the normal distribution was combined with another distribution using the T-R{Y} framework. The Weibull power function distribution (Tahir et al. 2016) combines the power function and Weibull distributions, using the Weibull distribution as the baseline distribution.
The authors who used the T-R{Y} framework in 2016 include Alzaatreh et al. (2016) and Almagambetova et al. (2016). Okorie et al. (2017) proposed the modified power function distribution. Famoye et al. (2018) developed the Weibull-Normal{log-logistics} distribution with the normal distribution as the baseline. Zubair et al. (2018) also used the framework to develop a new convoluted distribution. Other convoluted distributions developed using the framework include the reduced beta skewed Laplace distribution (Arowolo et al. 2019); the Odd Lomax-exponential{log-logistic} distribution (Ogunsanya et al. 2019); the exponentiated-exponential-Dagum{Lomax} distribution (Ekum et al. 2020a); the Lomax-Cauchy{uniform} distribution (Amalare et al. 2020); and the Rayleigh-Cauchy{uniform} distribution (Ogunsanya et al. 2021).
The simplicity and usefulness of the power function distribution compelled researchers to explore its further extensions, generalizations, and applications in different areas of science (Arshad et al. 2020; Ekum et al. 2020b). Recently, the Gamma-Power{log-logistic} distribution was proposed by Ekum et al. (2021), who demonstrated its usefulness in modelling skewed data. None of these studies has combined the normal and power function distributions, especially with the power function distribution as the baseline, except the normal-power{logistic} distribution (NPLD) proposed in the work of Ekum et al. (2021). Moreover, many properties of the NPLD have not been defined and studied, and it has not been developed into a generalized linear model for predicting relationships in regression applications.
Predicting oil spillage is of major interest to researchers in the field of geoscience and geological statistics. In Nigeria, oil spillage is a major problem that has devastated the ecosystem and biodiversity of the Niger Delta region. The quantity of oil spilled may be estimated using the estimated spill volume, and the estimated spill volume of crude oil may be determined by the duration of clean-up (Whanda et al. 2016; Deinkuro et al. 2021). Also, researchers may want to know whether they can predict their ResearchGate score using their citations and research items. These are emerging issues of interest to researchers, especially those in academia (Jordan 2015; O'Brien 2019). Moreover, the COVID-19 mortality rate per population and its linear effect on the economic wellbeing of Nigerians is also worth studying, because GDP per capita can be affected by COVID-19 mortality. The COVID-19 factor is also an extra burden on the wellbeing of the people (Pak et al. 2020; Iluno et al. 2021).
In the literature, there are some modifications of the normal distribution that produce multimodality, that is, multiple modes with fewer parameters (Kundu 2017). The modification of the normal distribution developed by Kundu (2017) is a bivariate family of distributions, while the one developed here is a univariate family. Moreover, Kundu (2017) did not extend their distribution to a generalized linear model. The motivation of this work is the modelling of dependent variables in regression that have bimodal features. Other authors, such as Famoye et al. (2018) and Kundu (2017), have developed distributions that are bimodal, but none has extended them to regression modelling. Moreover, real-life quantities such as the crude oil spill volume, the number of citations on ResearchGate and GDP per capita are real variables whose maximum values can be estimated, so they are bounded below by zero (non-negative) and above by a real value, rather than by infinity. Thus, a distribution with bounded support [0, 𝜆], where 𝜆 > 0 is a real upper bound, is necessary (Ekum et al. 2020b).
Thus, in this study, the aim is to adopt a novel univariate continuous probability distribution called the normal-power-logistic distribution (NPLD), which was derived from the T-Power{logistic} family proposed and studied by Ekum et al. (2021), and to extend it into a generalized linear model in order to solve real regression problems where the dependent variables are bimodal and skewed with a known maximum value. The model has four parameters: two from the normal distribution and two from the power function distribution, of which one is a shape parameter and the other is an upper bound parameter that controls the extremes of the distribution. The scope covers different characterizations, properties, the regression model, and parameter estimation of the NPLD model. The method of Maximum Likelihood Estimation (MLE) was used to estimate the model parameters. The importance of the new model was demonstrated empirically using three real-life datasets. The proposed model would be very useful in engineering, medicine, and all fields of life where the dependent variable of interest to be predicted has bimodal features. It is expected to perform well when the normal distribution fails to fit the data of interest.

2 Materials and Methods

In this section, the theory and application of the proposed scheme are considered.

2.1 The Method of Generating the T‑R{Y} Family of Distributions

The method of generating the T-R{Y} family of distributions is considered. The T-R{Y} framework is a general approach for defining $W[F(x)]$ (a non-decreasing differentiable function) using the quantile function of a random variable $Y$ in the T-X framework. Let $T$, $R$ and $Y$ be three random variables with cdfs $F_T(x) = P(T \le x)$, $F_R(x) = P(R \le x)$ and $F_Y(x) = P(Y \le x)$ respectively, with corresponding pdfs $f_T(x)$, $f_R(x)$ and $f_Y(x)$. Also, $Q_T(x)$, $Q_R(x)$ and $Q_Y(x)$ are their corresponding quantile functions. It is assumed that $T$ is supported on the interval $(a, b)$ and $Y$ is supported on the interval $(c, d)$ such that $b > a$ and $d > c$ are real numbers.

2.2 Important Operational Definition of Terms

The following definitions will be very useful in characterising the proposed model.

Definition 1: The cdf of the T-power{logistic} family of distributions proposed by Ekum et al. (2021) is given by

\[ F_X(x) = F_T\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right)\right], \quad x < \lambda \tag{1} \]

Definition 2: The pdf of the T-power{logistic} family is derived by taking the first derivative of $F_X(x)$ with respect to $x$ and it is given by

\[ f_X(x) = \frac{k\lambda^k}{x(\lambda^k - x^k)}\, f_T\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right)\right], \quad 0 \le x < \lambda \tag{2} \]

Other definitions follow from Definitions 1 and 2.

Definition 3: The survival function of a distribution from the T-power{logistic} family is given by

\[ S_X(x) = 1 - F_X(x) \tag{3} \]

Definition 4: The hazard function of a distribution from the T-power{logistic} family is given by

\[ h_X(x) = \frac{f_X(x)}{1 - F_X(x)} \tag{4} \]

Definition 5: The cumulative hazard function of a distribution from the T-power{logistic} family is given by

\[ H_X(x) = -\log[1 - F_X(x)] \tag{5} \]

Definition 6: The reverse hazard function of a distribution from the T-power{logistic} family is given by

\[ \tau_X(x) = \frac{f_X(x)}{F_X(x)} \tag{6} \]

Definition 7: The quantile function of the T-power{logistic} family is the inverse function of its cdf and it is given by

\[ Q_X(p) = Q_R\{F_Y[Q_T(p)]\} \tag{7} \]

where $Q_T(p) = F_T^{-1}(p)$. The quantile function is used in the Monte Carlo method to simulate random variates of a distribution, and it is used to determine measures of partition. Several ways of approximating the quantile function when it is not in closed form are available in the literature, of which quantile mechanics is one such approach (Akagbue et al. 2017).

Definition 8: The T-power{logistic} family of distributions is derived from the T-R{Y} family proposed by Aljarrah et al. (2014) and Alzaatreh et al. (2014). The relationships among $T$, $R$, and $Y$ are as follows: (i) $X = Q_R[F_Y(T)]$ in distribution, (ii) $Q_X(p) = Q_R\{F_Y[Q_T(p)]\}$, (iii) if $T = Y$ in distribution, then $X = R$ in distribution, and (iv) if $Y = R$ in distribution, then $X = T$ in distribution.

Definition 9: Let $R$ be a non-negative random variable with pdf $f_R(x)$, and let $E(R^k)$ denote the kth moment of $R$, then

\[ E(X^k) \le E(R^k) \cdot E\{[1 - F_Y(T)]\} \]

where $E(X^k)$ is the kth moment of the random variable $X$; $[1 - F_Y(\cdot)]$ is the survival function of the random variable $Y$, and $T$ is the random variable with pdf $f_T(x)$.

2.3 Normal‑Power function {logistic} Model

The proposed model is a generalized linear model that takes the form

\[ g(\mu_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} \]

where $g(\mu_i)$ is the link function, and the right-hand side is the linear predictor. Six goodness-of-fit criteria are used to compare the flexibility of the proposed model with other known models: the log-likelihood (LogL), Akaike Information Criterion (AIC), Kolmogorov-Smirnov statistic (D), Anderson-Darling statistic (A), Cramer-von Mises statistic ($\omega$) and Chi-square statistic ($\chi^2$). See Chen and Balakrishnan (1995) for detailed information on A and $\omega$. For AIC, D, A, $\omega$ and $\chi^2$, the lower the value, the better the performance of the model; for LogL, a higher value indicates a better fit. Also, the coefficient of correlation between the observed dependent variable $y$ and the predicted dependent variable $\hat{y}$ is used to show their relationship: a model performs well if this correlation coefficient is high. It is assumed that the dependent variable $y$ has a normal-power distribution.

2.3.1 Cumulative Distribution and Probability Density Functions of NPLD

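Recall the cdf of the T-power{logistic} family defined by Ekum et al. (2021), given in Definition 1 as

\[ F_X(x) = F_T\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right)\right] \]

where $F_T(t)$ is the cdf of the random variable $T$, so $T$ can follow any known distribution. If $T$ follows a normal distribution with parameters $\mu$ and $\sigma$, then the pdf of $T$ is given by

\[ f_T(t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^2\right], \quad -\infty < t < \infty \]

and the cdf of $T$ is given by

\[ F_T(t) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{t-\mu}{\sigma\sqrt{2}}\right)\right] = \Phi\left(\frac{t-\mu}{\sigma}\right) \]

Therefore, setting

\[ t = \ln\left(\frac{x^k}{\lambda^k - x^k}\right) \]

and putting this value of $t$ into $F_T(t)$ gives

\[ F_X(x) = \Phi\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma}\right] \]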

Equivalently, in terms of the error function,

\[ F_X(x) = \frac{1}{2}\left\{1 + \operatorname{erf}\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma\sqrt{2}}\right]\right\}; \quad \mu, \sigma, k, \lambda > 0; \; 0 < x < \lambda \tag{8} \]

where the error function, $\operatorname{erf}(\cdot)$, is given by

\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt \]

Equation (8) is the cdf of the Normal-Power function {logistic} distribution (NPLD).


The corresponding pdf of NPLD is given by taking the first derivative of $F_X(x)$ with respect to $x$ and it is given by

\[ f_X(x) = \frac{k\lambda^k}{x(\lambda^k - x^k)\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu\right]^2\right\}; \quad \mu, \sigma, k, \lambda > 0; \; 0 < x < \lambda \tag{9} \]

where $\mu$ is a location parameter, $k$ is a shape parameter, $\sigma$ is a scale parameter, and $\lambda$ doubles as a scale and upper bound parameter. A random variable $X$ that follows a NPLD is written as $X \sim NPLD(\mu, \sigma, k, \lambda)$.
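The cdf and pdf above can be evaluated directly. The following is a minimal sketch, assuming only the Python standard library; the function names `npld_cdf` and `npld_pdf` are illustrative and do not come from the paper. It also compares a finite-difference slope of the cdf with the pdf, which is one quick way to check that the two expressions agree.

```python
import math
from statistics import NormalDist


def npld_cdf(x, mu, sigma, k, lam):
    """CDF of NPLD for 0 < x < lam: Phi((w - mu) / sigma) with w = ln(x^k / (lam^k - x^k))."""
    w = math.log(x**k / (lam**k - x**k))
    return NormalDist(mu, sigma).cdf(w)


def npld_pdf(x, mu, sigma, k, lam):
    """PDF of NPLD for 0 < x < lam: Jacobian of w(x) times the normal pdf at w."""
    w = math.log(x**k / (lam**k - x**k))
    jacobian = k * lam**k / (x * (lam**k - x**k))
    return jacobian * NormalDist(mu, sigma).pdf(w)


if __name__ == "__main__":
    params = dict(mu=0.0, sigma=1.0, k=1.6, lam=10.0)  # illustrative values
    x, h = 4.0, 1e-5
    # A central difference of the cdf should be close to the pdf at x.
    slope = (npld_cdf(x + h, **params) - npld_cdf(x - h, **params)) / (2 * h)
    print(slope, npld_pdf(x, **params))
```

Working through the transformed value `w` keeps both functions a thin wrapper around the normal distribution, which mirrors how the family is constructed.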
Figure 1 is the pdf plot of NPLD, which shows that, for some parameter values, NPLD can be bimodal, skewed and of varying kurtosis.

[Figure 1: Density of NPLD, f(x) against x on (0, 10), for sigma = 4.5, mu = 0, lambda = 10 and k = 0.6, 1.0, 1.6, 4.6]

Fig. 1  Probability Density Function with different parameter values showing bimodal features


2.3.2 Useful Transformation

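Lemma 2.1 If $X \sim NPLD(\mu, \sigma, k, \lambda)$, then the random variable $W = \ln\left(\frac{X^k}{\lambda^k - X^k}\right)$ follows a normal distribution with parameters $\mu$ and $\sigma$, and the pdf of $W$ is given by

\[ f(w) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{w-\mu}{\sigma}\right)^2\right] \]

Proof Recall the pdf of NPLD in (9). We want to show that the random variable $W$ follows a normal distribution with parameters $\mu$ and $\sigma$. Integrating the pdf over its support gives

\[ \int_0^\lambda f_X(x)\,dx = \int_0^\lambda \frac{k\lambda^k}{x(\lambda^k - x^k)\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu\right]^2\right\} dx. \tag{10} \]

By change of variable, let

\[ w = \ln\left(\frac{x^k}{\lambda^k - x^k}\right). \tag{11} \]

Differentiating $w$ with respect to $x$, and making $dx$ the subject of the equation gives

\[ dx = \frac{x(\lambda^k - x^k)}{k\lambda^k}\, dw. \tag{12} \]

Now, changing the support from that of $x$ to that of $w$, we have

\[ 0 \le x \le \lambda \;\Rightarrow\; -\infty \le w \le \infty \tag{13} \]

It follows from the inverse transformation that

\[ f_W(w) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{w-\mu}{\sigma}\right)^2\right], \quad -\infty \le w \le \infty \tag{14} \]

is the pdf of the normal distribution with parameters $\mu$ and $\sigma$. Equation (14) completes the proof.  ◻

Lemma 2.1 also shows that the pdf of NPLD with parameters $(\mu, \sigma, k, \lambda)$ is a proper pdf, since the normal pdf in (14) integrates to one. No further proof is needed.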
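The claim that the NPLD pdf is proper can also be checked numerically. The sketch below is an illustration only, assuming the Python standard library and an ordinary midpoint rule (the helper names and the parameter values are not from the paper); it integrates the pdf in (9) over $(0, \lambda)$ and should return a value close to one.

```python
import math
from statistics import NormalDist


def npld_pdf(x, mu, sigma, k, lam):
    """PDF of NPLD as in equation (9), for 0 < x < lam."""
    w = math.log(x**k / (lam**k - x**k))
    return k * lam**k / (x * (lam**k - x**k)) * NormalDist(mu, sigma).pdf(w)


def integrate_pdf(mu, sigma, k, lam, n=20000):
    """Midpoint-rule approximation of the integral of the pdf over (0, lam)."""
    h = lam / n
    # Midpoints (i + 0.5) * h never touch the end points 0 and lam.
    return h * sum(npld_pdf((i + 0.5) * h, mu, sigma, k, lam) for i in range(n))


if __name__ == "__main__":
    print(integrate_pdf(mu=0.0, sigma=1.0, k=1.6, lam=10.0))
```

For large $\sigma$ the density piles up very close to the end points, so a uniform grid in $x$ becomes inaccurate there; integrating in the transformed variable $w$ of (11) would be the safer choice in that case.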

2.3.3 Survival and Related Functions of NPLD

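The survival function of NPLD is given by

\[ S_X(x) = \frac{1}{2}\left\{1 - \operatorname{erf}\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma\sqrt{2}}\right]\right\}, \tag{15} \]

The hazard function of NPLD is given by

\[ h_X(x) = \frac{2k\lambda^k \exp\left\{-\frac{1}{2\sigma^2}\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu\right]^2\right\}}{x(\lambda^k - x^k)\sqrt{2\pi\sigma^2}\left\{1 - \operatorname{erf}\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma\sqrt{2}}\right]\right\}}, \tag{16} \]

The cumulative hazard function of NPLD is given by

\[ H_X(x) = -\ln\left\{\frac{1}{2} - \frac{1}{2}\operatorname{erf}\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma\sqrt{2}}\right]\right\} \tag{17} \]

The reverse hazard function of NPLD is given by

\[ \tau_X(x) = \frac{2k\lambda^k \exp\left\{-\frac{1}{2\sigma^2}\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu\right]^2\right\}}{x(\lambda^k - x^k)\sqrt{2\pi\sigma^2}\left\{1 + \operatorname{erf}\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma\sqrt{2}}\right]\right\}}. \tag{18} \]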

2.3.4 Quantile Function and Measures of Partition of NPLD

2.3.5 Quantile Function

Theorem 2.2 Let $X$ be a random variable that follows NPLD with cdf $F_X(x)$, then the inverse function of the cdf, which is the quantile function, exists, and it is given by

\[ Q_X(p) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(p)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(p)]}}\right\}^{1/k} \]

Proof Recall the cdf of NPLD given by

\[ F_X(x) = \Phi\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma}\right]. \]

Since $F_X(x)$ is a probability value, letting $p = F_X(x)$ gives

\[ p = \Phi\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma}\right]. \tag{19} \]

Solving for $x$ gives

\[ x = \left(\frac{\lambda^k e^{[\mu + \sigma\Phi^{-1}(p)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(p)]}}\right)^{1/k} \tag{20} \]

Equation (20) is the inverse function of the cdf of $X$, and it can be written as

\[ Q_X(p) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(p)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(p)]}}\right\}^{1/k} \tag{21} \]

where $Q_X(p)$ is the quantile function of NPLD; $\Phi^{-1}(p)$ is the inverse function of the cdf of the standard normal distribution, and $p$ is a probability value uniformly generated, that is, $P \sim U(0, 1)$. Thus, Equation (21) completes the proof.  ◻
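Because the quantile function is available in closed form, NPLD variates can be drawn by inverse-transform sampling. The following is a minimal sketch under that reading of Equation (21), using only the Python standard library; the names `npld_quantile`, `npld_cdf` and `npld_sample`, the seed and the parameter values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def npld_quantile(p, mu, sigma, k, lam):
    """Quantile function of NPLD for 0 < p < 1."""
    z = mu + sigma * NormalDist().inv_cdf(p)
    # e^z / (1 + e^z) is written as 1 / (1 + e^(-z)).
    return lam * (1.0 / (1.0 + math.exp(-z))) ** (1.0 / k)


def npld_cdf(x, mu, sigma, k, lam):
    """CDF of NPLD for 0 < x < lam, used here only for a round-trip check."""
    w = math.log(x**k / (lam**k - x**k))
    return NormalDist(mu, sigma).cdf(w)


def npld_sample(n, mu, sigma, k, lam, seed=0):
    """Draw n NPLD variates by feeding uniform probabilities to the quantile function."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        p = rng.random()
        if 0.0 < p < 1.0:  # inv_cdf needs p strictly inside (0, 1)
            out.append(npld_quantile(p, mu, sigma, k, lam))
    return out


if __name__ == "__main__":
    x = npld_quantile(0.3, mu=0.5, sigma=1.0, k=2.0, lam=2.0)
    print(x, npld_cdf(x, mu=0.5, sigma=1.0, k=2.0, lam=2.0))  # second value is close to 0.3
    print(npld_sample(5, mu=0.5, sigma=1.0, k=2.0, lam=2.0))
```

Every generated value lies strictly between 0 and `lam`, which matches the bounded support of the distribution.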

2.3.6 Measures of Partition

The quantile function can be used to derive all the measures of partition, such as the median, quartiles, octiles, deciles and percentiles.
The median of NPLD is

\[ Q_X(0.5) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(0.5)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(0.5)]}}\right\}^{1/k} \tag{22} \]

But $\Phi^{-1}(0.5) = 0$, so that

\[ Q_X(0.5) = \lambda\left\{\frac{e^{\mu}}{1 + e^{\mu}}\right\}^{1/k} \tag{23} \]

The 1st quartile of NPLD, which is the same as the 25th percentile, is given by

\[ Q_X(0.25) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(0.25)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(0.25)]}}\right\}^{1/k} \tag{24} \]

But $\Phi^{-1}(0.25) = -0.6745$, taken here as $-0.68$, so that

\[ Q_X(0.25) = \lambda\left\{\frac{e^{[\mu - 0.68\sigma]}}{1 + e^{[\mu - 0.68\sigma]}}\right\}^{1/k} \tag{25} \]

The 3rd quartile of NPLD, which is the same as the 75th percentile, is given by

\[ Q_X(0.75) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(0.75)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(0.75)]}}\right\}^{1/k} \tag{26} \]

But $\Phi^{-1}(0.75) = 0.6745$, taken here as $0.68$, so that

\[ Q_X(0.75) = \lambda\left\{\frac{e^{[\mu + 0.68\sigma]}}{1 + e^{[\mu + 0.68\sigma]}}\right\}^{1/k} \tag{27} \]

2.3.7 Skewness and Kurtosis of NPLD

2.3.8 Robust Measure of Skewness

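By definition, the robust coefficient of skewness based on quantiles proposed by Bowley (1920) is given by

\[ S_k = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} \tag{28} \]

where $Q_i$ is the ith quartile.

Theorem 2.3 Let $X$ be a random variable that follows NPLD with quantile function $Q_X(p)$, then the skewness is robust, because it is a resistant measure, which is not affected by the extreme value, $\lambda$.

Proof Recall the median ($Q_2$), 1st quartile ($Q_1$) and 3rd quartile ($Q_3$) of NPLD given by

\[ Q_2 = \lambda\left\{\frac{e^{\mu}}{1 + e^{\mu}}\right\}^{1/k}, \quad Q_1 = \lambda\left\{\frac{e^{[\mu - 0.68\sigma]}}{1 + e^{[\mu - 0.68\sigma]}}\right\}^{1/k} \quad \text{and} \quad Q_3 = \lambda\left\{\frac{e^{[\mu + 0.68\sigma]}}{1 + e^{[\mu + 0.68\sigma]}}\right\}^{1/k} \]

respectively.
Substituting the values of $Q_2$, $Q_1$ and $Q_3$ into (28) gives

\[ S_k = \frac{\lambda\left\{\frac{e^{[\mu + 0.68\sigma]}}{1 + e^{[\mu + 0.68\sigma]}}\right\}^{1/k} + \lambda\left\{\frac{e^{[\mu - 0.68\sigma]}}{1 + e^{[\mu - 0.68\sigma]}}\right\}^{1/k} - 2\lambda\left\{\frac{e^{\mu}}{1 + e^{\mu}}\right\}^{1/k}}{\lambda\left\{\frac{e^{[\mu + 0.68\sigma]}}{1 + e^{[\mu + 0.68\sigma]}}\right\}^{1/k} - \lambda\left\{\frac{e^{[\mu - 0.68\sigma]}}{1 + e^{[\mu - 0.68\sigma]}}\right\}^{1/k}} \tag{29} \]

Factorising out $\lambda$ gives

\[ S_k = \frac{\left\{\frac{e^{[\mu + 0.68\sigma]}}{1 + e^{[\mu + 0.68\sigma]}}\right\}^{1/k} + \left\{\frac{e^{[\mu - 0.68\sigma]}}{1 + e^{[\mu - 0.68\sigma]}}\right\}^{1/k} - 2\left\{\frac{e^{\mu}}{1 + e^{\mu}}\right\}^{1/k}}{\left\{\frac{e^{[\mu + 0.68\sigma]}}{1 + e^{[\mu + 0.68\sigma]}}\right\}^{1/k} - \left\{\frac{e^{[\mu - 0.68\sigma]}}{1 + e^{[\mu - 0.68\sigma]}}\right\}^{1/k}} \tag{30} \]

Equation (30) completes the proof, showing that it is free of $\lambda$.  ◻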

2.3.9 Robust Measure of Kurtosis

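By definition, the robust coefficient of kurtosis based on quantiles proposed by Moors (1988) is given by

\[ K_u = \frac{(E_7 - E_5) + (E_3 - E_1)}{E_6 - E_2} \tag{31} \]

where $E_i$ is the ith octile.

Theorem 2.4 Let $X$ be a random variable that follows NPLD with quantile function $Q_X(p)$, then the kurtosis is robust, because it is a resistant measure, which is not affected by the extreme value, $\lambda$.

Proof Recall the quantile function of NPLD given by

\[ Q_X(p) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(p)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(p)]}}\right\}^{1/k}. \]

The 1st octile ($E_1$) is derived thus

\[ E_1 = Q_X(1/8) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(1/8)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(1/8)]}}\right\}^{1/k} = \lambda\left\{\frac{e^{[\mu - 1.15\sigma]}}{1 + e^{[\mu - 1.15\sigma]}}\right\}^{1/k} \tag{32} \]

The 2nd octile ($E_2$) is derived thus

\[ E_2 = Q_X(2/8) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(2/8)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(2/8)]}}\right\}^{1/k} = Q_1. \tag{33} \]

The 3rd octile ($E_3$) is derived thus

\[ E_3 = Q_X(3/8) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(3/8)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(3/8)]}}\right\}^{1/k} = \lambda\left\{\frac{e^{[\mu - 0.32\sigma]}}{1 + e^{[\mu - 0.32\sigma]}}\right\}^{1/k}. \tag{34} \]

The 5th octile ($E_5$) is derived thus

\[ E_5 = Q_X(5/8) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(5/8)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(5/8)]}}\right\}^{1/k} = \lambda\left\{\frac{e^{[\mu + 0.32\sigma]}}{1 + e^{[\mu + 0.32\sigma]}}\right\}^{1/k}. \tag{35} \]

The 6th octile ($E_6$) is derived thus

\[ E_6 = Q_X(6/8) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(6/8)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(6/8)]}}\right\}^{1/k} = Q_3. \tag{36} \]

The 7th octile ($E_7$) is derived thus

\[ E_7 = Q_X(7/8) = \lambda\left\{\frac{e^{[\mu + \sigma\Phi^{-1}(7/8)]}}{1 + e^{[\mu + \sigma\Phi^{-1}(7/8)]}}\right\}^{1/k} = \lambda\left\{\frac{e^{[\mu + 1.15\sigma]}}{1 + e^{[\mu + 1.15\sigma]}}\right\}^{1/k}. \tag{37} \]

Substituting the values of $E_1$, $E_2$, $E_3$, $E_5$, $E_6$ and $E_7$ into (31) gives

\[ K_u = \frac{\lambda A - \lambda B + \lambda C - \lambda D}{\lambda\left\{\frac{e^{[\mu + 0.68\sigma]}}{1 + e^{[\mu + 0.68\sigma]}}\right\}^{1/k} - \lambda\left\{\frac{e^{[\mu - 0.68\sigma]}}{1 + e^{[\mu - 0.68\sigma]}}\right\}^{1/k}}. \tag{38} \]

where

\[ A = \left\{\frac{e^{[\mu + 1.15\sigma]}}{1 + e^{[\mu + 1.15\sigma]}}\right\}^{1/k}; \quad B = \left\{\frac{e^{[\mu + 0.32\sigma]}}{1 + e^{[\mu + 0.32\sigma]}}\right\}^{1/k}; \quad C = \left\{\frac{e^{[\mu - 0.32\sigma]}}{1 + e^{[\mu - 0.32\sigma]}}\right\}^{1/k}; \quad D = \left\{\frac{e^{[\mu - 1.15\sigma]}}{1 + e^{[\mu - 1.15\sigma]}}\right\}^{1/k} \]

Factorising out $\lambda$ gives

\[ K_u = \frac{A - B + C - D}{\left\{\frac{e^{[\mu + 0.68\sigma]}}{1 + e^{[\mu + 0.68\sigma]}}\right\}^{1/k} - \left\{\frac{e^{[\mu - 0.68\sigma]}}{1 + e^{[\mu - 0.68\sigma]}}\right\}^{1/k}}. \tag{39} \]

Equation (39) completes the proof, showing that it is free of $\lambda$.  ◻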
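The invariance of both robust measures to $\lambda$ can be illustrated numerically. The sketch below is an illustration only, assuming the Python standard library and the quantile function (21); the function names and parameter values are hypothetical. It evaluates Bowley's skewness (28) and Moors' kurtosis (31) from the quartiles and octiles, and the values for two different upper bounds should agree up to rounding.

```python
import math
from statistics import NormalDist


def npld_quantile(p, mu, sigma, k, lam):
    """Quantile function of NPLD for 0 < p < 1."""
    z = mu + sigma * NormalDist().inv_cdf(p)
    return lam * (1.0 / (1.0 + math.exp(-z))) ** (1.0 / k)


def bowley_skewness(mu, sigma, k, lam):
    """Bowley's quartile-based skewness, (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = (npld_quantile(p, mu, sigma, k, lam) for p in (0.25, 0.5, 0.75))
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


def moors_kurtosis(mu, sigma, k, lam):
    """Moors' octile-based kurtosis, ((E7 - E5) + (E3 - E1)) / (E6 - E2)."""
    e = [npld_quantile(i / 8.0, mu, sigma, k, lam) for i in range(1, 8)]
    # e[0] is E1, e[1] is E2, ..., e[6] is E7.
    return ((e[6] - e[4]) + (e[2] - e[0])) / (e[5] - e[1])


if __name__ == "__main__":
    for lam in (2.0, 50.0):  # two different upper bounds
        print(lam, bowley_skewness(0.5, 1.0, 2.0, lam), moors_kurtosis(0.5, 1.0, 2.0, lam))
```

The sketch calls the exact normal quantiles rather than the rounded constants 0.68, 1.15 and 0.32, so its output can differ slightly from values computed with those constants.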

2.3.10 Mode of NPLD

Theorem 2.5 Let X be a random variable that follows NPLD with pdf fX (x), a dif-
ferentiable function, then the mode is not unique and possibly bimodal for some
parameter values.

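Proof Recall the pdf of NPLD given by

\[ f_X(x) = \frac{k\lambda^k}{x(\lambda^k - x^k)\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu\right]^2\right\} \]

The mode can be derived by differentiating the pdf, equating to zero, and solving for $x$. Writing $f_X(x) = uv$ and using the product rule,

\[ \frac{df_X(x)}{dx} = u\frac{dv}{dx} + v\frac{du}{dx} \tag{40} \]

Let

\[ u = \frac{k\lambda^k}{x(\lambda^k - x^k)} \tag{41} \]

and

\[ v = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu\right]^2\right\} \tag{42} \]

Differentiating $u$ with respect to $x$ gives

\[ \frac{du}{dx} = \frac{k\lambda^k\left[(k+1)x^k - \lambda^k\right]}{x^2(\lambda^k - x^k)^2} \tag{43} \]

Differentiating $v$ with respect to $x$ gives

\[ \frac{dv}{dx} = -\frac{k\lambda^k}{x(\lambda^k - x^k)\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2}\left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma}\right]^2\right\} \times \left[\frac{\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu}{\sigma^2}\right] \tag{44} \]

Inserting (41), (42), (43) and (44) into (40), equating to zero and cancelling the common positive factor $k\lambda^k v / [\sigma^2 x^2(\lambda^k - x^k)^2]$ gives

\[ \sigma^2(k+1)x^k - \sigma^2\lambda^k - k\lambda^k \ln\left(\frac{x^k}{\lambda^k - x^k}\right) + \mu k\lambda^k = 0 \tag{45} \]

The solution to (45) is the mode of NPLD.

Now, assume that $\sigma = k = \lambda = 1$ and $\mu = 0$; (45) becomes

\[ 2x - 1 - \ln\left(\frac{x}{1 - x}\right) = 0; \quad 0 < x < 1 \tag{46} \]

Equation (45) is transcendental in $x$, so the mode has no closed form and the number of solutions in $(0, \lambda)$ depends on the parameter values. In the case (46), $x = 1/2$ is the only solution, because the slope of $\ln[x/(1-x)]$, namely $1/[x(1-x)] \ge 4$, always exceeds the slope 2 of $2x - 1$, and the density is unimodal. If instead $\sigma^2 > 2$ with $k = \lambda = 1$ and $\mu = 0$, (45) becomes $\sigma^2(2x - 1) = \ln[x/(1-x)]$, which has three solutions in $(0, 1)$: $x = 1/2$, which is now a local minimum, and two modes placed symmetrically about it. Hence the mode of NPLD is not unique in general, and the density is bimodal for some parameter values; the shape parameter $k$ and the scale parameter $\sigma$ together determine which case occurs. However, some of these peaks might not be visible graphically, because they can lie very close to the end points of the support.  ◻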
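The modes can also be located numerically without solving the mode equation. The sketch below is an illustration, not the authors' procedure: it assumes the Python standard library and searches a grid in the transformed variable $w = \ln[x^k/(\lambda^k - x^k)]$, using the fact that $x$ increases with $w$, so local maxima of the density in $x$ and in $w$ coincide. Writing $p = x^k/\lambda^k = 1/(1 + e^{-w})$ in the pdf (9) gives $\ln f_X = \text{const} + \frac{1}{k}\ln(1 + e^{-w}) + \ln(1 + e^{w}) - (w - \mu)^2/(2\sigma^2)$, which is the quantity scanned below. The function names, grid range and step are assumptions.

```python
import math


def log_density_shape(w, mu, sigma, k):
    """log f_X up to an additive constant, as a function of w = ln(x^k / (lam^k - x^k))."""
    return (math.log1p(math.exp(-w)) / k
            + math.log1p(math.exp(w))
            - (w - mu) ** 2 / (2.0 * sigma**2))


def npld_modes(mu, sigma, k, lam, w_max=60.0, step=0.01):
    """Return the x-locations of the local maxima found on a grid of w values."""
    n = int(round(2.0 * w_max / step))
    ws = [-w_max + i * step for i in range(n + 1)]
    g = [log_density_shape(w, mu, sigma, k) for w in ws]
    modes = []
    for i in range(1, n):
        if g[i - 1] < g[i] >= g[i + 1]:  # interior local maximum on the grid
            p = 1.0 / (1.0 + math.exp(-ws[i]))
            modes.append(lam * p ** (1.0 / k))  # map w back to x
    return modes


if __name__ == "__main__":
    # One of the Figure 1 settings: sigma = 4.5, mu = 0, k = 1.0, lambda = 10.
    print(npld_modes(mu=0.0, sigma=4.5, k=1.0, lam=10.0))
```

For the Figure 1 setting used in the example, the search reports two local maxima, one near each end of the support. Searching in $w$ rather than $x$ matters here because those maxima sit extremely close to 0 and $\lambda$, where a uniform grid in $x$ would miss them.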

2.3.11 Series Expansion of NPLD

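Theorem 2.6 Let $X$ be a random variable that follows NPLD with parameters $\mu, \sigma, k, \lambda$; the pdf of $X$, $f_X(x)$, is a weighted pdf of the power function distribution with parameters $k$ and $\lambda$, that is,

\[ f_X(x) = \Psi f_R(x) \tag{47} \]

where $f_R(x)$ is the pdf of the power function distribution, and $\Psi$ is the weight.

Proof Recall the pdf of NPLD given in (9). Given the following series expansions

\[ \exp(y) = \sum_{i=0}^{\infty} \frac{y^i}{i!}, \tag{48} \]

\[ (y + a)^n = \sum_{j=0}^{\infty} \binom{n}{j} a^{n-j} y^j, \tag{49} \]

\[ \left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right) - \mu\right]^{2i} = \sum_{j=0}^{\infty} \binom{2i}{j} (-\mu)^{2i-j} \left[\ln\left(\frac{x^k}{\lambda^k - x^k}\right)\right]^j, \tag{50} \]

\[ \ln(y) = \sum_{l=1}^{\infty} \frac{(-1)^{l-1}(y - 1)^l}{l}, \tag{51} \]

\[ \ln\left(\frac{x^k}{\lambda^k - x^k}\right) = \sum_{l=1}^{\infty} \frac{(-1)^{l-1}}{l}\left[\left(\frac{x^k}{\lambda^k - x^k}\right) - 1\right]^l, \tag{52} \]

\[ (y + a)^n = \sum_{m=0}^{\infty} \binom{n}{m} a^{n-m} y^m, \tag{53} \]

\[ \left[\left(\frac{x^k}{\lambda^k - x^k}\right) - 1\right]^{lj} = \sum_{m=0}^{\infty} \binom{lj}{m} (-1)^{lj-m} \left[\left(\frac{x^k}{\lambda^k - x^k}\right)\right]^m, \tag{54} \]

\[ (y + a)^n = \sum_{s=0}^{\infty} \binom{n}{s} a^{n-s} y^s, \tag{55} \]

\[ (\lambda^k - x^k)^{-(m+1)} = \sum_{s=0}^{\infty} \binom{-(m+1)}{s} (\lambda^k)^{-(m+1)-s} (-x^k)^s. \tag{56} \]

Inserting (48)–(56) into the pdf of NPLD in (9) gives

\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{i=j=l=m=s=0}^{\infty} (-1)^{3i-2j+2lj-m+s} \frac{1}{(m+s)} \binom{lj}{m}\binom{2i}{j}\binom{-(m+1)}{s} \frac{\mu^{2i-j}}{2^i l^j \sigma^{2i} i!} \cdot \frac{k(m+s)x^{k(m+s)-1}}{\lambda^{k(m+s)}}. \tag{57} \]

Equation (57) is the series expansion form of the NPLD pdf.

Now, let

\[ \Psi = \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{i=j=l=m=s=0}^{\infty} (-1)^{3i-2j+2lj-m+s} \frac{1}{(m+s)} \binom{lj}{m}\binom{2i}{j}\binom{-(m+1)}{s} \frac{\mu^{2i-j}}{2^i l^j \sigma^{2i} i!}, \]

so that

\[ f_X(x) = \Psi\, \frac{k(m+s)x^{k(m+s)-1}}{\lambda^{k(m+s)}}. \tag{58} \]

If $m = s = 0$, then

\[ \Psi = \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{i=j=l=0}^{\infty} (-1)^{3i-2j+2lj} \binom{2i}{j} \frac{\mu^{2i-j}}{2^i l^j \sigma^{2i} i!}, \qquad f_X(x) = \Psi\, \frac{kx^{k-1}}{\lambda^k}, \tag{59} \]

where $f_R(x) = kx^{k-1}/\lambda^k$ is the pdf of the power function distribution. Hence, Equation (59) completes the proof.  ◻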


2.3.12 Moment of NPLD

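Let $X$ be a continuous random variable with pdf $f_X(x)$; the rth moment is given by

\[ E(X^r) = \int x^r f_X(x)\,dx \tag{60} \]

Recall the series expansion form of the NPLD pdf given as

\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{i=j=l=m=s=0}^{\infty} (-1)^{3i-2j+2lj-m+s} \frac{1}{(m+s)} \binom{lj}{m}\binom{2i}{j}\binom{-(m+1)}{s} \frac{\mu^{2i-j}}{2^i l^j \sigma^{2i} i!} \cdot \frac{k(m+s)x^{k(m+s)-1}}{\lambda^{k(m+s)}}. \]

Inserting $f_X(x)$ into Equation (60) gives

\[ E(X^r) = \int_0^{\lambda} x^r f_X(x)\,dx \tag{61} \]

Note that

\[ \int \sum = \sum \int \tag{62} \]

So that

\[ E(X^r) = \sum_{i=j=l=m=s=0}^{\infty} \frac{(-1)^{3i-2j+2lj-m+s}}{2^i l^j i!} \binom{lj}{m}\binom{2i}{j}\binom{-(m+1)}{s} \times \frac{1}{\sqrt{2\pi\sigma^2}} \frac{\mu^{2i-j}}{\sigma^{2i}} \frac{k}{\lambda^{k(m+s)}} \left(\frac{\lambda^{r+k}}{r+k}\right). \tag{63} \]

Let

\[ \psi_{i,j,l,m,s} = \sum_{i=j=l=m=s=0}^{\infty} \frac{(-1)^{3i-2j+2lj-m+s}}{2^i l^j i!} \binom{lj}{m}\binom{2i}{j}\binom{-(m+1)}{s}. \tag{64} \]

So that

\[ E(X^r) = \psi_{i,j,l,m,s} \frac{1}{\sqrt{2\pi\sigma^2}} \frac{\mu^{2i-j}}{\sigma^{2i}} \left(\frac{k\lambda^{r+k-k(m+s)}}{r+k}\right). \tag{65} \]

Equation (65) is the rth moment of NPLD.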


2.3.13 Mean of NPLD

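If $r = 1$, then (65) becomes

\[ E(X) = \psi_{i,j,l,m,s} \frac{1}{\sqrt{2\pi\sigma^2}} \frac{\mu^{2i-j}}{\sigma^{2i}} \left(\frac{k\lambda^{1+k-k(m+s)}}{k+1}\right). \tag{66} \]

Note that if $i = j = l = m = s = 0$, then Equation (66) becomes

\[ E(X) = \frac{k\lambda^{k+1}}{(k+1)\sqrt{2\pi\sigma^2}}. \tag{67} \]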
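As a numerical cross-check on moment calculations, the rth moment can also be approximated by direct integration instead of the series form. The sketch below uses a different technique from the one in the text: it relies on Lemma 2.1, by which $X = \lambda\{e^{W}/(1 + e^{W})\}^{1/k}$ with $W$ normal with parameters $\mu$ and $\sigma$, and applies a plain midpoint rule in $w$. It assumes only the Python standard library; the function name, grid size and integration width are illustrative choices.

```python
import math
from statistics import NormalDist


def npld_moment(r, mu, sigma, k, lam, n=4000, width=10.0):
    """Approximate E(X^r) by the midpoint rule over w in mu +/- width * sigma."""
    normal = NormalDist(mu, sigma)
    lo = mu - width * sigma
    h = 2.0 * width * sigma / n
    total = 0.0
    for i in range(n):
        w = lo + (i + 0.5) * h
        # Value of x that corresponds to w under the transformation of Lemma 2.1.
        x = lam * (1.0 / (1.0 + math.exp(-w))) ** (1.0 / k)
        total += x**r * normal.pdf(w)
    return total * h


if __name__ == "__main__":
    mean = npld_moment(1, mu=0.5, sigma=1.0, k=2.0, lam=2.0)
    second = npld_moment(2, mu=0.5, sigma=1.0, k=2.0, lam=2.0)
    print(mean, second - mean**2)  # mean and variance
```

When $k = 1$ and $\mu = 0$ the density is symmetric about $\lambda/2$, so the computed mean should be close to $\lambda/2$, which gives a simple sanity check.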

2.4 Maximum Likelihood Estimation (MLE) of NPLD

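The likelihood function of NPLD is given by

\[ L(\mu, \sigma, k, \lambda) = \frac{k^n\lambda^{kn}}{(2\pi\sigma^2)^{n/2}} \exp\left\{-\sum_{i=1}^{n} \frac{1}{2\sigma^2}\left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right) - \mu\right]^2\right\} \times \prod_{i=1}^{n} \frac{1}{x_i(\lambda^k - x_i^k)} \]

Taking the log gives

\[ \ell = n\ln k + kn\ln\lambda - \frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left\{\left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right) - \mu\right]^2\right\} - \sum_{i=1}^{n}\ln x_i - \sum_{i=1}^{n}\ln(\lambda^k - x_i^k) \tag{68} \]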

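The maximum likelihood estimates of the parameters of the NPLD are obtained by differentiating $\ell$ partially with respect to $\mu$, $\sigma$ and $k$, equating the results to zero and solving for each parameter.

\[ \frac{\partial\ell}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right) - \mu\right] \tag{69} \]

\[ \frac{\partial\ell}{\partial\sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}\left\{\left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right) - \mu\right]^2\right\} \tag{70} \]

\[ \frac{\partial\ell}{\partial k} = \frac{n}{k} + n\ln\lambda - \frac{\lambda^k}{\sigma^2}\sum_{i=1}^{n}\left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right) - \mu\right]\left(\frac{\ln x_i - \ln\lambda}{\lambda^k - x_i^k}\right) - \sum_{i=1}^{n}\left(\frac{\lambda^k\ln\lambda - x_i^k\ln x_i}{\lambda^k - x_i^k}\right) \tag{71} \]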

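Equating (69) to zero gives

\[ \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}\left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right)\right]. \tag{72} \]

Equating (70) to zero gives

\[ \hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\{\left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right) - \mu\right]^2\right\}} \tag{73} \]

Equating (71) to zero gives

\[ \frac{n}{k} = -n\ln\lambda + \frac{\lambda^k}{\sigma^2}\sum_{i=1}^{n}\left[\ln\left(\frac{x_i^k}{\lambda^k - x_i^k}\right) - \mu\right]\left(\frac{\ln x_i - \ln\lambda}{\lambda^k - x_i^k}\right) + \sum_{i=1}^{n}\left(\frac{\lambda^k\ln\lambda - x_i^k\ln x_i}{\lambda^k - x_i^k}\right) \tag{74} \]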

The equation obtained by setting the partial derivative of $\ell$ with respect to $k$ to zero is not in closed form, and the value of the parameter $k$ is found using Newton's numerical procedure provided in R (R Development Core Team 2009). The parameter $\lambda$ cannot be estimated using the MLE method because it is an upper bound that depends on $X$; thus, it is estimated from the data using

\[ \hat{\lambda} = \max(x_i) + \epsilon; \quad \forall x_i \in X \tag{75} \]

where $\epsilon > 0$ is a very small positive number less than 1 chosen by the user.
It should be noted that the maximum likelihood estimators of the parameters $\mu$ and $\sigma$ are in closed form and will always exist provided the values of the parameters $k$ and $\lambda$ are known. The estimator of $k$ is not in closed form, so a numerical optimization method is used to estimate it. The initial value of $k$ used in the numerical optimization is found by first assuming that the random sample is from the power function distribution and estimating $k$ from that distribution. The moment estimate of the parameter $k$ is given by

\[ k = \frac{\bar{x}}{\lambda - \bar{x}}, \quad \bar{x} < \lambda, \]

where $\bar{x}$ is the sample mean (Ekum et al. 2020b), estimated from data.
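The estimation steps can be sketched in code. The block below is a minimal illustration under stated assumptions, not the authors' R procedure: it uses only the Python standard library, fixes $\hat{\lambda} = \max(x_i) + \epsilon$, and replaces Newton's method with a coarse grid search over $k$. For each candidate $k$, $\mu$ and $\sigma$ are set to the sample mean and the root-mean-square deviation of the transformed values $w_i = \ln[x_i^k/(\lambda^k - x_i^k)]$, which is what equating (69) and (70) to zero implies. The function names, the grid and the toy data are hypothetical.

```python
import math


def transform(xs, k, lam):
    """w_i = ln(x_i^k / (lam^k - x_i^k)) for each observation."""
    return [math.log(x**k / (lam**k - x**k)) for x in xs]


def log_likelihood(xs, mu, sigma, k, lam):
    """Log-likelihood of NPLD as written in equation (68)."""
    n = len(xs)
    ws = transform(xs, k, lam)
    return (n * math.log(k) + k * n * math.log(lam)
            - 0.5 * n * math.log(2.0 * math.pi * sigma**2)
            - sum((w - mu) ** 2 for w in ws) / (2.0 * sigma**2)
            - sum(math.log(x) for x in xs)
            - sum(math.log(lam**k - x**k) for x in xs))


def closed_form_mu_sigma(xs, k, lam):
    """Estimates of mu and sigma for fixed k and lam."""
    ws = transform(xs, k, lam)
    mu = sum(ws) / len(ws)
    sigma = math.sqrt(sum((w - mu) ** 2 for w in ws) / len(ws))
    return mu, sigma


def fit_npld(xs, eps=1e-3, k_grid=None):
    """Grid search over k; returns (loglik, mu, sigma, k, lam) with the largest loglik."""
    lam = max(xs) + eps  # upper bound estimated from the data
    if k_grid is None:
        k_grid = [0.05 * j for j in range(1, 201)]  # 0.05, 0.10, ..., 10.0
    best = None
    for k in k_grid:
        mu, sigma = closed_form_mu_sigma(xs, k, lam)
        ll = log_likelihood(xs, mu, sigma, k, lam)
        if best is None or ll > best[0]:
            best = (ll, mu, sigma, k, lam)
    return best


if __name__ == "__main__":
    data = [0.3, 0.7, 1.1, 1.4, 1.8, 0.9, 1.2, 0.5]  # toy data, not from the paper
    print(fit_npld(data))
```

A grid search is slower than Newton's method but needs no derivatives and cannot diverge; its result could serve as a starting value for a derivative-based optimizer.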


2.5 Numerical Optimization of Parameter k

In a case where the parameter estimated using Newton approximation is not optimal,
a new relationship is derived by EM algorithm.
Let

∫ (76)
k𝜄+1 = arg max f (x|I;Ω𝜄 )lnf (x, I;Ω)dx
k

where Ω is the parameter space of NPLD, so that we have

∫ (77)
k𝜄+1 = arg max f (x|I;k)lnf (x, I;𝜇, 𝜎, k, 𝜆)dx
k

Recall the pdfs of normal distribution and NPLD as


� �2
1 −1 x−𝜇
f (x) = √ e 2 𝜎

2𝜋𝜎 2
and
� � � � �2 �
k𝜆k 1 xk
fX (x) = √ exp − 2 ln k −𝜇
x(𝜆k − xk ) 2𝜋𝜎 2 2𝜎 𝜆 − xk

respectively.
Substituting the pdfs of normal distribution and NPLD into Equation (77) gives
∞ � �2

∫0
1 − 21 x−̄x
k𝜄+1 = arg max √ e S
k
2𝜋S2
⎧ ⎧ � � � �2 ⎫ ⎫
⎪ 1 ⎪
⎪ k ⎨− 2S2 ln x̂ k −xk −̄x ⎬ ⎪
xk
(78)
⎪ k̂x(n) ⎪ (n) ⎪⎪
× ln⎨ √ e⎩ ⎭ dx.

⎪ x(̂x(n) − x ) 2𝜋S
k k 2 ⎪
⎪ ⎪
⎩ ⎭

where 𝜇, 𝜎 and 𝜆 are known, such that $\hat{\mu} = \bar{x}$, $\hat{\sigma} = S$, and
$\hat{\lambda} = \sup(x) = \hat{x}_{(n)}$, where $\bar{x}$ and S are the sample mean and
sample standard deviation of $\ln\left(\frac{x}{\lambda - x}\right)$. Note that
$\hat{x}_{(n)} - x > 0, \forall\, x \in X$. Note also that $k_{1}$ is the initial value of k
assumed as suggested, that is, $k_{1} = \bar{x}/(\lambda - \bar{x})$, $\bar{x} < \lambda$,
so that $k_{\iota+1}$ is the new estimate of k and it is optimal.
Now that the optimal value of k is known, we can estimate the values of 𝜇 and 𝜎
using equations (72) and (73) respectively.
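As an illustration of estimating k numerically, the sketch below uses a plain grid search over the profile log-likelihood rather than the EM update described above; the swap is deliberate, to keep the example short. It assumes that 𝜆 is known, that for a given k the estimates of 𝜇 and 𝜎 are the mean and standard deviation of w = ln(x^k/(𝜆^k − x^k)), and that data can be simulated by inverting that transformation. The function names, grid limits and simulated parameter values are hypothetical.

```python
import math
import random
from statistics import fmean

def profile_loglik(k, xs, lam):
    """NPLD log-likelihood at k with mu and sigma profiled out (lam assumed known)."""
    lam_k = lam ** k
    # Transformed values w_i = ln(x_i^k / (lam^k - x_i^k)), normal under the NPLD.
    ws = [math.log(x ** k / (lam_k - x ** k)) for x in xs]
    mu = fmean(ws)
    var = fmean([(w - mu) ** 2 for w in ws])
    # Sum of the log Jacobian factor ln[k * lam^k / (x * (lam^k - x^k))].
    log_jac = sum(math.log(k * lam_k / (x * (lam_k - x ** k))) for x in xs)
    n = len(xs)
    return log_jac - 0.5 * n * math.log(2.0 * math.pi * var) - 0.5 * n

def estimate_k(xs, lam, grid):
    """Return the grid value of k with the largest profile log-likelihood."""
    return max(grid, key=lambda k: profile_loglik(k, xs, lam))

# Simulated sample with hypothetical true values k = 1, mu = 0, sigma = 1, lam = 2.
rng = random.Random(1)
lam, k_true = 2.0, 1.0
ws = [rng.gauss(0.0, 1.0) for _ in range(500)]
xs = [lam * (math.exp(w) / (1.0 + math.exp(w))) ** (1.0 / k_true) for w in ws]

x_bar = fmean(xs)
k_init = x_bar / (lam - x_bar)  # moment-based starting value
# Candidate values from 0.2 to 4 times the starting value (arbitrary choice).
grid = [k_init * (0.2 + 0.02 * i) for i in range(191)]
k_hat = estimate_k(xs, lam, grid)
print(k_init, k_hat)
```

A grid search is crude but has no convergence issues; a finer grid or a one-dimensional optimizer could be substituted once the likelihood surface has been inspected.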

2.6 Error Bound and Confidence Interval for NPLD

The error bound for estimating a generic parameter Θ of NPLD is given by


$$B = Q^{*}_{(1-\alpha)}\, S_{\Theta} \tag{79}$$

where 𝛼 is the level of significance, Θ is the parameter to be estimated, $Q^{*}_{p}$ is the
standard quantile function of NPLD with p = 1 − 𝛼; p ∈ [0, 1], and $S_{\Theta}$ is the
standard error of Θ, that is, the square root of the variance of the estimate of Θ.
The standard quantile function of NPLD is derived from the quantile function of
NPLD when k = 𝜎 = 1 and 𝜇 = 0, and it is given by

$$Q^{*}_{p} = \lambda\left(\frac{e^{\Phi^{-1}(p)}}{1 + e^{\Phi^{-1}(p)}}\right) \tag{80}$$


where $Q^{*}_{p}$ is the standard quantile function of NPLD, $\Phi^{-1}(p)$ is the quantile
function (the inverse of the cdf) of the standard normal distribution, and p is
a probability value, uniformly generated. Note that 𝜆 > 0 acts as a regulator parameter in
this case: its value is adjusted to determine how large the error bound should be. In
this research, 𝜆 is taken as 2 to accommodate the population parameter. Thus, both the level
of significance 𝛼 and 𝜆 are chosen by the analyst. The value of 𝜆 can be 1, 2 or 3,
depending on how large the error bound is required to be.
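As a numerical check on the error bound, the sketch below evaluates the standard quantile with 𝜆 = 2 and 𝛼 = 0.05 and applies it to the first row of Table 1 (k at n = 50, mean 0.5503, standard error 0.0503). It assumes the logistic form of the standard quantile, 𝜆·e^z/(1 + e^z) with z = Φ⁻¹(p); the function names are hypothetical.

```python
import math
from statistics import NormalDist

def standard_quantile(p, lam):
    """Standard NPLD quantile (k = sigma = 1, mu = 0): lam * logistic(Phi^{-1}(p))."""
    z = NormalDist().inv_cdf(p)
    return lam * math.exp(z) / (1.0 + math.exp(z))

def error_bound(std_error, alpha=0.05, lam=2.0):
    """Error bound B = Q*_(1 - alpha) * S for a parameter with standard error S."""
    return standard_quantile(1.0 - alpha, lam) * std_error

q = standard_quantile(0.95, 2.0)
bound = error_bound(0.0503)
# Adding and subtracting the bound from the point estimate gives the interval.
low, high = 0.5503 - bound, 0.5503 + bound
print(round(q, 4), round(low, 4), round(high, 4))
```

With these inputs the quantile is about 1.6764, and the interval agrees with the tabulated (0.4660, 0.6346) to four decimal places.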
Thus, the 100(1 − 𝛼)% confidence interval for parameter Θ is given by

$$\Theta = \hat{\Theta} \pm Q^{*}_{(1-\alpha)}\, S_{\Theta} \tag{81}$$

where $\hat{\Theta}$ is the point estimate of Θ.

2.7 Simulation Study of NPLD

The simulation study is presented to show the performance of the maximum likelihood
estimators and their consistency. The procedure involves generating n uniform random
probabilities p and passing them through the quantile function of NPLD defined in
equation (21) to obtain NPLD random variates, for the sample sizes n = 50, 100, 200
and 300, each replicated 1000 times. The parameter values are set as
k = 𝜎 = 𝜇 = 0.5, k = 𝜎 = 𝜇 = 1, and k = 𝜎 = 𝜇 = 2, for a fixed 𝜆 = 2. The actual
values, mean estimates, standard errors, and 95% confidence intervals are presented
in Tables 1, 2 and 3. The tables show that the standard error
decreases as the sample size increases, which indicates that the MLEs are consistent.
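The sampling step can be sketched by inverse-transform sampling. Equation (21) is not reproduced in this section, so the quantile function below is an assumption, obtained by inverting the transformation w = ln(x^k/(𝜆^k − x^k)) with w normally distributed; the function names and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def npld_quantile(p, mu, sigma, k, lam):
    """Assumed NPLD quantile: invert w = ln(x^k / (lam^k - x^k)), w ~ N(mu, sigma^2)."""
    w = mu + sigma * NormalDist().inv_cdf(p)
    return lam * (math.exp(w) / (1.0 + math.exp(w))) ** (1.0 / k)

def simulate_npld(n, mu, sigma, k, lam, rng):
    """Draw n variates by passing uniform probabilities through the quantile function."""
    # The tiny offsets keep p strictly inside (0, 1), where inv_cdf is defined.
    return [npld_quantile(rng.uniform(1e-12, 1.0 - 1e-12), mu, sigma, k, lam)
            for _ in range(n)]

rng = random.Random(2022)
mu, sigma, k, lam = 1.0, 1.0, 1.0, 2.0
sample = simulate_npld(300, mu, sigma, k, lam, rng)

# With k and lam treated as known, mu and sigma are taken here to be the sample
# mean and standard deviation of the transformed values (an assumption).
w = [math.log(x ** k / (lam ** k - x ** k)) for x in sample]
print(fmean(w), stdev(w))
```

Repeating the last three lines over many replications and sample sizes would give mean estimates and standard errors of the kind tabulated in the simulation study.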

2.8 Generalized Linear Regression Model for NPLD (NPGLM)

Assume that the dependent random variable Y of interest in our linear model follows
a NPLD given the independent variable(s) X. The resulting linear regression model is
called the NPLD Generalized Linear Model (NPGLM).
Given the linear model in matrix form

$$Y = XB + e \tag{82}$$

where Y is an n-dimensional vector called the dependent vector for all n observations;
X is the set of k independent variables packed into an (n × (k + 1)) matrix called the
design matrix; B is a (k + 1)-dimensional vector called the slope vector; e is the error
term packed into an n-dimensional vector called the error vector.

Table 1  Simulation study showing mean estimates, standard errors, and confidence intervals of the MLE for k = 𝜎 = 𝜇 = 0.5

| n | Parameter | Actual value | Mean | Standard error | Confidence interval |
| --- | --- | --- | --- | --- | --- |
| 50 | k | 0.5 | 0.5503 | 0.0503 | (0.4660, 0.6346) |
| 50 | 𝜎 | 0.5 | 0.5005 | 0.1005 | (0.3320, 0.6690) |
| 50 | 𝜇 | 0.5 | 0.5488 | 0.1488 | (0.2994, 0.7982) |
| 100 | k | 0.5 | 0.5323 | 0.0203 | (0.4983, 0.5663) |
| 100 | 𝜎 | 0.5 | 0.5004 | 0.1001 | (0.3326, 0.6682) |
| 100 | 𝜇 | 0.5 | 0.5286 | 0.1178 | (0.3311, 0.7261) |
| 200 | k | 0.5 | 0.5104 | 0.0102 | (0.4933, 0.5275) |
| 200 | 𝜎 | 0.5 | 0.5002 | 0.0055 | (0.491, 0.5094) |
| 200 | 𝜇 | 0.5 | 0.5078 | 0.0468 | (0.4293, 0.5863) |
| 300 | k | 0.5 | 0.5003 | 0.0100 | (0.4835, 0.5171) |
| 300 | 𝜎 | 0.5 | 0.4998 | 0.0014 | (0.4975, 0.5021) |
| 300 | 𝜇 | 0.5 | 0.5008 | 0.0088 | (0.486, 0.5156) |

Table 2  Simulation study showing mean estimates, standard errors, and confidence intervals of the MLE for k = 𝜎 = 𝜇 = 1

| n | Parameter | Actual value | Mean | Standard error | Confidence interval |
| --- | --- | --- | --- | --- | --- |
| 50 | k | 1 | 1.0046 | 0.0158 | (0.9781, 1.0311) |
| 50 | 𝜎 | 1 | 1.0019 | 0.0981 | (0.8374, 1.1664) |
| 50 | 𝜇 | 1 | 1.4142 | 0.3144 | (0.8871, 1.9413) |
| 100 | k | 1 | 1.0053 | 0.0141 | (0.9817, 1.0289) |
| 100 | 𝜎 | 1 | 1.0014 | 0.0942 | (0.8435, 1.1593) |
| 100 | 𝜇 | 1 | 1.4018 | 0.2980 | (0.9022, 1.9014) |
| 200 | k | 1 | 1.0030 | 0.0032 | (0.9976, 1.0084) |
| 200 | 𝜎 | 1 | 1.0012 | 0.0902 | (0.8500, 1.1524) |
| 200 | 𝜇 | 1 | 1.3154 | 0.2100 | (0.9634, 1.6674) |
| 300 | k | 1 | 1.0028 | 0.0028 | (0.9981, 1.0075) |
| 300 | 𝜎 | 1 | 1.0011 | 0.0682 | (0.8868, 1.1154) |
| 300 | 𝜇 | 1 | 1.4158 | 0.1158 | (1.2217, 1.6099) |

2.8.1 Conditions for NPGLM

The conditions to use the NPGLM to fit the model are given thus:

• Y must be a continuous random variable

• Y must be a positive real number strictly greater than zero but strictly less than 𝜆
(upper bound for Y)

• Y must follow NPLD

• NPLD must be a member of the exponential family

Table 3  Simulation study showing mean estimates, standard errors, and confidence intervals of the MLE for k = 𝜎 = 𝜇 = 2

| n | Parameter | Actual value | Mean | Standard error | Confidence interval |
| --- | --- | --- | --- | --- | --- |
| 50 | k | 2 | 1.9986 | 0.1017 | (1.8281, 2.1691) |
| 50 | 𝜎 | 2 | 2.0035 | 0.2965 | (1.5065, 2.5005) |
| 50 | 𝜇 | 2 | 2.1415 | 0.4451 | (1.3953, 2.8877) |
| 100 | k | 2 | 1.9987 | 0.0325 | (1.9442, 2.0532) |
| 100 | 𝜎 | 2 | 2.0031 | 0.0964 | (1.8415, 2.1647) |
| 100 | 𝜇 | 2 | 2.0544 | 0.3547 | (1.4598, 2.6490) |
| 200 | k | 2 | 1.9992 | 0.0064 | (1.9885, 2.0099) |
| 200 | 𝜎 | 2 | 2.0026 | 0.0163 | (1.9753, 2.0299) |
| 200 | 𝜇 | 2 | 2.0066 | 0.0178 | (1.9768, 2.0364) |
| 300 | k | 2 | 2.0001 | 0.0060 | (1.9900, 2.0102) |
| 300 | 𝜎 | 2 | 2.0011 | 0.0112 | (1.9823, 2.0199) |
| 300 | 𝜇 | 2 | 2.0018 | 0.0129 | (1.9802, 2.0234) |

2.8.2 Exponential Class of NPLD

An exponential family or class is a parametric set of probability distributions that
has a certain form. This special form is chosen for mathematical convenience, based
on some useful algebraic properties, as well as for generality (Akarawak et al. 2017).
It is assumed that each component of Y follows a distribution in the exponential
family of the form

$$f_Y(y;\theta,\phi) = c(T(y),\phi)\exp\left\{\frac{\theta T(y) - b(\theta)}{a(\phi)}\right\};\quad a(\phi) > 0, \tag{83}$$

where a(𝜙) is a function of a known parameter 𝜙 only, b(𝜃) is a function of a canonical
parameter 𝜃, c(T(y), 𝜙) is a function of y and 𝜙 only, and T(y) is a function of
y, known as the sufficient statistic for Y.
Assume that Y is a random variable that follows NPLD. Recall the pdf of the
NPLD with parameters 𝜇, 𝜎, k, 𝜆 given by

$$f_X(x) = \frac{k\lambda^{k}}{x(\lambda^{k} - x^{k})\sqrt{2\pi\sigma^{2}}} \exp\left\{-\frac{1}{2\sigma^{2}}\left[\ln\left(\frac{x^{k}}{\lambda^{k} - x^{k}}\right) - \mu\right]^{2}\right\}$$

where the parameter 𝜆 is an upper bound. The pdf is not free of the bound parameter 𝜆
and hence might be difficult to express as a member of the exponential family.
However, data that follow a NPLD can be transformed to a normal distribution by a
simple transformation, as proved in Lemma (2.1).
Recall the transformed pdf

$$f(w) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left\{-\frac{1}{2}\left(\frac{w-\mu}{\sigma}\right)^{2}\right\} \tag{84}$$

Taking the log of (84) gives

$$\log f(w) = -\log(\sqrt{2\pi}) - \log\sigma - \frac{w^{2}}{2\sigma^{2}} + \frac{w\mu}{\sigma^{2}} - \frac{\mu^{2}}{2\sigma^{2}} \tag{85}$$

Taking the exponential of (85) gives

$$f(w) = \exp\left\{-\log(\sqrt{2\pi}) - \log\sigma - \frac{w^{2}}{2\sigma^{2}}\right\} \exp\left\{\frac{w\mu - \mu^{2}/2}{\sigma^{2}}\right\} \tag{86}$$

Comparing (86) with (83) gives $\theta T(y) = w\mu$, $T(y) = w$, $\theta = \mu$,
$b(\theta) = \mu^{2}/2$, $a(\phi) = \phi$, $\phi = \sigma^{2}$ and

$$c(T(y),\phi) = \exp\left\{-\log(\sqrt{2\pi}) - \log\sigma - \frac{w^{2}}{2\sigma^{2}}\right\},$$

where w is a function of y, k, 𝜆 given by

$$w = \log\left(\frac{y^{k}}{\lambda^{k} - y^{k}}\right).$$

Since (86) can be written in exponential class form, we can directly derive the joint
sufficient statistics from it. So, the joint sufficient statistics for 𝜇 and 𝜎 are
$w$ and $w^{2}$ respectively. Thus, $w$ and $w^{2}$ carry all the information concerning
the parameters 𝜇 and 𝜎 respectively.

2.8.3 Maximum Likelihood Estimation of the Parameters of NPLD Regression Model

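The log-likelihood of the NPLD, written in terms of the transformed values $w_i$, is

$$\log L = -n\log(\sqrt{2\pi}) - n\log\sigma - \sum_{i=1}^{n}\frac{w_i^{2}}{2\sigma^{2}} + \sum_{i=1}^{n}\frac{w_i\mu}{\sigma^{2}} - \frac{n\mu^{2}}{2\sigma^{2}} \tag{87}$$

The link function is given by

$$g(\mu_i) = \mu_i = (XB)_i = b_0 + b_1 x_{1i} + \dots + b_p x_{pi} \tag{88}$$

So that

$$\log L = -n\log(\sqrt{2\pi}) - n\log\sigma - \sum_{i=1}^{n}\frac{w_i^{2}}{2\sigma^{2}} + \sum_{i=1}^{n}\frac{w_i x_{ij} b_j}{\sigma^{2}} - \sum_{i=1}^{n}\frac{(x_{ij} b_j)^{2}}{2\sigma^{2}} \tag{89}$$

where j = 0, 1, ..., p, $E(w_i) = \mu_i = x_{ij} b_j$ (with summation over j understood) and
$w_i = \log\left(\frac{y_i^{k}}{\lambda^{k} - y_i^{k}}\right)$. Note that p is the number of
independent variables, and 𝛼, k and 𝜆 are known parameters.

Then

$$\log L = -n\log(\sqrt{2\pi}) - n\log\sigma - \sum_{i=1}^{n}\frac{\left[\log\left(\frac{y_i^{k}}{\lambda^{k} - y_i^{k}}\right)\right]^{2}}{2\sigma^{2}} + \sum_{i=1}^{n}\frac{\log\left(\frac{y_i^{k}}{\lambda^{k} - y_i^{k}}\right) x_{ij} b_j}{\sigma^{2}} - \sum_{i=1}^{n}\frac{(x_{ij} b_j)^{2}}{2\sigma^{2}} \tag{90}$$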

The MLE of $b_j$ is in closed form and it is given by

$$\hat{B} = (X'X)^{-1} X' \left[\log\left(\frac{y_i^{k}}{\lambda^{k} - y_i^{k}}\right)\right];\quad \lambda > y\ \forall\ y \in Y,\ k > 0. \tag{91}$$

where $\hat{B} = [\hat{b}_0, \hat{b}_1, ..., \hat{b}_p]'$ and $X = [1, X_1, X_2, ..., X_p]$ is an
(n × (p + 1)) matrix whose first column of ones carries the intercept $b_0$, so that $X'$ is a
((p + 1) × n) matrix, W is an (n × 1) matrix, and $X'W$ is a ((p + 1) × 1) matrix. Thus,
$\hat{B}$ is a ((p + 1) × 1) matrix. Note that $W = \log\left(\frac{Y^{k}}{\lambda^{k} - Y^{k}}\right)$,
where the value of 𝜆 can be approximated from the data using the nth order statistic or simply
$\hat{\lambda} = \max(y_i) + \sigma_{\bar{y}}\ \forall\ i$, where $\sigma_{\bar{y}}$ is the
standard error of y computed from the data. An approximation for k can
also be obtained from the data using $\hat{k} = \bar{y}/(\lambda - \bar{y})$, $\bar{y} < \lambda$,
where $\bar{y}$ is the sample mean, derived from Ekum et al. (2020b).
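The closed-form estimate amounts to ordinary least squares on the transformed response. The sketch below illustrates this for a single covariate; the data values are made up for illustration, the function names are hypothetical, and the standard error of y is taken as s/√n, which is an assumption about how it is computed.

```python
import math
from statistics import fmean, stdev

def fit_npglm_simple(x, y, k, lam):
    """Least-squares fit of w = log(y^k / (lam^k - y^k)) on one covariate x.

    Returns (b0, b1), i.e. (X'X)^{-1} X'W for the design matrix X = [1, x].
    """
    w = [math.log(yi ** k / (lam ** k - yi ** k)) for yi in y]
    x_bar, w_bar = fmean(x), fmean(w)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxw = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w))
    b1 = sxw / sxx
    b0 = w_bar - b1 * x_bar
    return b0, b1

def back_transform(x_new, b0, b1, k, lam):
    """Map the linear predictor back to the response scale (inverse of w)."""
    eta = b0 + b1 * x_new
    return lam * (math.exp(eta) / (1.0 + math.exp(eta))) ** (1.0 / k)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.9, 1.4, 1.6, 2.3, 2.4, 3.1]

lam_hat = max(y) + stdev(y) / math.sqrt(len(y))  # max(y_i) plus assumed standard error
k_hat = fmean(y) / (lam_hat - fmean(y))          # moment-based approximation of k
b0, b1 = fit_npglm_simple(x, y, k_hat, lam_hat)
print(lam_hat, k_hat, b0, b1, back_transform(3.5, b0, b1, k_hat, lam_hat))
```

Because the back-transformation is bounded by 𝜆, fitted values on the response scale always stay inside (0, 𝜆), which matches the condition that Y lies strictly below its upper bound.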

3 Results

3.1 Application

In this section, applications to three real data sets are provided to illustrate the uses
and importance of the NPLD. Three competing models are used to fit each data set
of interest: the NPLD, Normal and Gamma GLMs.

3.1.1 Application 1: Estimated Spill Volume (ESV) of Crude Oil in Nigeria

The data on the estimated spilled volume (ESV) were collected from 7th January 2011
to 27th December 2019 from the Shell Nigeria website
(www.shell.com.ng/sustainability/environment/oil-spills.html).
Figure 2 shows that the oil spill data is bimodal with positive skewness (1.1302)
and kurtosis (3.3977).

3.1.2 Fitting the Models to Oil Spill Data

The estimated spill volume of crude oil can be determined by the Duration of Clean-
up (DOC). If the duration of clean-up is known, the spill volume can be estimated
from an appropriate model. Thus, the dependent variable is ESV and the independ-
ent variable is the DOC.

Fig. 2  Histogram showing ESV of Crude Oil in Nigeria (panel title: Crude Oil Spillage; x-axis: Estimated Spill Volume (in bbl); y-axis: Frequency)

Table 4 shows the model parameters estimated, their standard errors and their
corresponding P-values.
Table 5 shows that the NPLD regression model outperforms the other regression
models using all the selection criteria.
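The AIC column of Table 5 can be reproduced from the −LogL column through AIC = 2p + 2(−LogL). The parameter count p = 3 used below is not stated in the text; it is inferred from the tabulated values and should be read as an assumption, as should the function name.

```python
def aic(neg_loglik, n_params):
    """Akaike information criterion from a negative log-likelihood."""
    return 2.0 * n_params + 2.0 * neg_loglik

# -LogL values copied from Table 5; p = 3 is an inferred parameter count.
neg_loglik = {"NPLD": 93.97682, "Normal": 663.5405, "Gamma": 401.7506}
aic_values = {name: aic(value, 3) for name, value in neg_loglik.items()}
best = min(aic_values, key=aic_values.get)
print(aic_values, best)
```

The smallest AIC identifies the preferred model, which here is the NPLD regression model.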

3.1.3 Application 2: Total Research Gate Score

Table 4  Generalised linear model parameter estimates for Oil Spill model

| Distribution | 𝛽̂0 | SE(𝛽̂0) | P-Value | 𝛽̂1 | SE(𝛽̂1) | P-Value |
| --- | --- | --- | --- | --- | --- | --- |
| NPLD | 1.463495 | 0.000920 | 0.000000 | 0.007670 | 0.000002 | 0.000000 |
| Normal | -87.909300 | 60.111900 | 0.147000 | 4.101300 | 0.807100 | 0.000002 |
| Gamma | 0.019900 | 0.004512 | 0.000029 | -0.000071 | 0.000016 | 0.000042 |

Table 5  Generalised linear model goodness-of-fit criteria for ESV model

| Distribution | −LogL | AIC | D | A | 𝜔 | 𝜒2 |
| --- | --- | --- | --- | --- | --- | --- |
| NPLD | 93.97682 | 193.9536 | 0.2809 | 6.0198 | 3.4428 | 43.060 |
| Normal | 663.5405 | 1333.0810 | 0.4157 | 6.0410 | 4.0625 | 1558.443 |
| Gamma | 401.7506 | 809.5012 | 0.6966 | 10.6512 | 8.1603 | 370.830 |

Total Research Gate (TRG) score data is cross-sectional data collected from the
Research Gate pages of 100 selected researchers in the field of Mathematical Science
as of 15th May 2021. The data include TRG score, Total Research Interest (TRI),
Citations, Recommendations, Reads and Research Items (RI). The independent variables
are citations and RI (Fig. 3).
Figure 3 shows that the TRG score data is bimodal with positive skewness of
0.1595 and kurtosis of 1.9747.

3.1.4 Fitting the Models to Research Gate Data

The TRG score can be predicted by Citations and Research Items. If citations and
research items increase, the TRG score is also expected to increase. Thus, the dependent
variable is TRG score, while the independent variables are citations and research items.
Table 6 shows the model parameters estimated using MLE, their standard errors
and their corresponding P-values. The fitted NPLD regression model shows that the
estimates 𝛽1 and 𝛽2 are significant at the 5% level. This is also true for the gamma
and normal regression models.
Table 7 shows that the NPLD regression model outperforms the other regression
models using all the goodness-of-fit criteria.

3.1.5 Application 3: Gross Domestics Product per Capita per COVID‑19 Cases

Fig. 3  Histogram showing TRG Score of some selected researchers (panel title: Total Research Gate Score; x-axis: TRG Score per Author; y-axis: Frequency)

Table 6  Generalised linear model parameter estimates for TRG Score model

| Distribution | 𝛽̂1 | SE(𝛽̂1) | P-Value | 𝛽̂2 | SE(𝛽̂2) | P-Value |
| --- | --- | --- | --- | --- | --- | --- |
| NPLD | 0.00212 | 0.00000 | 0.00015 | 0.05554 | 0.00000 | 0.00000 |
| Normal | 0.00117 | 0.00036 | 0.00151 | 0.04910 | 0.00857 | 0.00000 |
| Gamma | -0.00002 | 0.00000 | 0.00022 | -0.00003 | 0.00000 | 0.00030 |

Table 7  Generalised linear model goodness-of-fit criteria for TRG Score model

| Distribution | −LogL | AIC | D | A | 𝜔 | 𝜒2 |
| --- | --- | --- | --- | --- | --- | --- |
| NPLD | 280.5727 | 565.1455 | 0.2145 | 9.0244 | 1.5532 | 10.5850 |
| Normal | 379.0210 | 762.0421 | 0.2305 | 10.5501 | 1.9402 | 94.1980 |
| Gamma | 344.9956 | 693.9912 | 0.3247 | 17.9003 | 3.6227 | 58.6690 |

The data used here are daily data collected from the World Health Organisation (WHO)
from 1st June 2020 to 31st December 2020, spanning 214 daily observations, used by Iluno
et al. (2021). The independent variable is a measure of COVID-19, termed COVID-19
Mortality per 1 million persons in the population (CMP), while the dependent
variable is the GDP per capita per COVID-19 laboratory-confirmed cases (RGDPC).
The CMP is a proxy to measure COVID-19 mortality, while RGDPC is a proxy to
measure the economic wellbeing of a country.
Figure 4 shows that the RGDPC data has a positive skewness of 2.317554 and
kurtosis of 7.896267. This data is highly skewed and very peaked (leptokurtic).

3.1.6 Fitting the Models to COVID‑19 Data

The RGDPC can be predicted by the CMP. If COVID-19 mortality per population is
high, it can affect the GDP per capita of a country negatively. Thus, the dependent
variable is RGDPC and the independent variable is the CMP. Three competing
distributions are used to fit the GLM. The parameter estimates of the three competing
models fitted to the RGDPC data are presented in Table 8, and their goodness-of-fit
criteria in Table 9.
Table 8 shows the model parameters estimated, their standard errors and their
corresponding P-values.
Table 9 shows that the NPLD regression model outperforms the other regression
models using all the selection criteria.

4 Conclusions

Fig. 4  Histogram showing RGDPC of Nigeria (panel title: GDP per Capita per COVID-19 Cases; x-axis: RGDPC; y-axis: Frequency)

Table 8  Generalised linear model parameter estimates for RGDPC model

| Distribution | 𝛽̂0 | SE(𝛽̂0) | P-Value | 𝛽̂1 | SE(𝛽̂1) | P-Value |
| --- | --- | --- | --- | --- | --- | --- |
| NPLD | 3.01647 | 0.00080 | 0.00000 | −0.53761 | 0.00058 | 0.00000 |
| Normal | 0.45155 | 0.00890 | 0.00000 | −0.06810 | 0.00184 | 0.00000 |
| Gamma | −1.92291 | 0.06327 | 0.00000 | 2.44210 | 0.02002 | 0.00000 |

This study developed a novel NPLD model, using the T-Power{logistic} family of
distributions. The cdf, pdf, survival function, hazard rate, cumulative hazard
function, reverse hazard function, useful transformation, quantile functions, mode, robust
skewness, robust kurtosis, series expansion and moment are derived. The maximum
likelihood estimators of the parameters of the distribution and of its generalized
regression model were derived. The NPLD regression model was applied to three
real-life data sets, namely, Estimated Spill Volume (ESV) of crude oil in the Niger Delta
area of Nigeria, Total Research Gate (TRG) score of some selected researchers on
Research Gate, and GDP per Capita per COVID-19 cases (RGDPC), and its performance
compared favourably with that of the normal and Gamma regression models.
The goodness-of-fit statistics showed that the NPLD regression model outperforms
the other regression models using all the selection criteria for the ESV model, as
well as for the TRG score model and the RGDPC model. Hence, the NPLD regression
model can be used effectively to analyze and model the crude oil spill volume data,
TRG score data, RGDPC data and other related data when the normal distribution is
not a good fit.
Table 9  Generalised linear model goodness-of-fit criteria for RGDPC model

| Distribution | −LogL | AIC | D | A | 𝜔 | 𝜒2 |
| --- | --- | --- | --- | --- | --- | --- |
| NPLD | 280.5727 | 565.1455 | 0.2145 | 9.0244 | 1.5532 | 10.5850 |
| Normal | 379.0210 | 762.0421 | 0.2305 | 10.5501 | 1.9402 | 94.1980 |
| Gamma | 344.9956 | 693.9912 | 0.3247 | 17.9003 | 3.6227 | 58.6690 |

This research therefore recommends that

• The NPLD model should be used to estimate the spill volume of crude oil and the total
research gate score.
• The convoluted distribution NPLD should be used when the normal distribution is not a
good fit to emerging data of interest.
• Based on the applications, clean-up of spilled oil should be carried out immediately
and completed in record time, because the duration of clean-up can be used
to estimate the spilled volume of crude oil.
• Researchers should increase the research items they upload to research gate and
write quality papers to increase their citations, in order to increase their total
research gate score.
• COVID-19 mortality should be reduced by providing medical response to infected
individuals, because it can affect the economic wellbeing of the nation.

Acknowledgements I am very grateful to my PhD supervisors, Prof. Muminu Adamu, who is also the
pioneer HOD, Department of Statistics, University of Lagos, and Dr. Eno Akarawak, for their
supervisory role during my Ph.D. work. I appreciate Prof. Felix Famoye of Central Michigan
University, United States, for mentoring me at the beginning of this work at the University of
Lagos and for the materials he gave me. I am also grateful to the reviewers for their appropriate
and constructive suggestions to improve this work.

Funding There is no funding for this research.

Declaration

Conflict of interest The authors declare that they have no conflict of interest.

References
Akagbue HI, Adamu MO, Anake TA (2017) Solutions of chi-square quantile differential equation. In:
Proceedings of the world congress on engineering and computer science, San Francisco, USA
Akarawak EEE, Adeleke IA, Okafor RO (2013) The Weibull–Rayleigh distribution and its properties. J
Eng Res 18(1):61–72
Akarawak EEE, Adeleke IA, Okafor RO (2017) The Gamma–Rayleigh distribution and applications to
survival data. Nigerian J Basic Appl Sci 25(2):130–142
Aljarrah MA, Lee C, Famoye F (2014) On generating T-X family of distributions using quantile func-
tions. J Stat Distrib Appl 1(2):1–17. https://doi.org/10.1890/13-1452.1
Almagambetova A, Zakiyeva N, Alzaatreh A, Pya N (2016) On logistic-normal distribution. Department
of Mathematics, Nazarbayev University, Kazakhstan
Alzaatreh A, Famoye F, Lee C (2014) The gamma-normal distribution: properties and application. Com-
put Stat Data Anal 69:67–80


Alzaatreh A, Lee C, Famoye F, Ghosh I (2016) The generalized Cauchy family of distributions with
applications. J Stat Distrib Appl 3:12. https://doi.org/10.1186/s40488-016-0050-3
Amalare AA, Ogunsanya AS, Ekum MI, Owolabi TO (2020) Lomax–CauchyUniform distribution: prop-
erties and application to exceedances of flood peaks of Wheaton River. Benin J Stat 3:128–141
Arowolo OT, Nurudeen TS, Akinyemi JA, Ogunsanya AS, Ekum MI (2019) Reduced beta skewed
Laplace distribution with application to failure-time of electrical component data. Ann Stat Theory
Appl (ASTA) 1:31–41
Arshad MA, Iqbal MZ, Ahmad M (2020) Exponentiated power function distribution: properties and
applications. J Stat Theory Appl 19(2):297–313
Chen G, Balakrishnan N (1995) A general purpose approximate goodness-of-fit test. J Qual Technol
27:154–161
Cordeiro GM, de Castro M (2011) A new family of generalized distributions. J Stat Comput Simul
81:883–898
Dallas AC (1976) Characterization of Pareto and power function distribution. Ann Math Stat 28:491–497
Deinkuro NS, Knapp CW, Raimi MO, Nanalok NH (2021) Oil Spills in the Niger Delta Region, Nige-
ria: Environmental Fate of Toxic Volatile Organics. Res Square.
https://doi.org/10.21203/rs.3.rs-654453/v1
Ekum MI, Adamu MO, Akarawak EEE (2020a) T-Dagum: A way of generalizing dagum distribution
using Lomax quantile function. J Prob Stat. https://doi.org/10.1155/2020/1641207
Ekum MI, Adeleke IA, Akarawak EEE (2020b) Lambda upper bound distribution: some properties and
applications. Benin J Stat 3:12–40
Ekum MI, Adamu MO, Akarawak EEE (2021) A class of power function distributions: its properties and
applications. Unilag J Math Appl 1(1):35–59
Eugene N, Lee C, Famoye F (2002) Beta-normal distribution and its applications. Commun Stat Theory
Methods 31:497–512
Famoye F, Akarawak E, Ekum M (2018) Weibull-normal distribution and its applications. J Stat Theory
Appl 17(4):719–727. https://doi.org/10.2991/jsta.2018.17.4.12
Gupta RC, Gupta PL, Gupta RD (1998) Modeling failure time data by Lehmann alternatives. Commun
Stat Theory Methods 27:887–904
Iluno C, Taylor J, Akinmoladun O, Ekum Aderele OM (2021) Modelling the effect of Covid-19 mortality
on the economy of Nigeria. Res Glob 3(2021):100050
Jordan K (2015) Exploring the ResearchGate score as an academic metric: reflections and implications
for practice. In: Quantifying and Analysing Scholarly Communication on the Web (ASCW-15),
Oxford
Kundu D (2017) Multivariate geometric skew-normal distribution. Stat J Theor Appl Stat 51(6)
Meniconi M, Barry DM (1996) The power function distribution: a useful and simple distribution to
assess electrical component reliability. Microelectronics Reliab 36:1207–1212
O’Brien K (2019) ResearchGate. J Med Library Assoc JMLA 107(2):284–285.
https://doi.org/10.5195/jmla.2019.643
Ogunsanya AS, Sanni OO, Yahya WB (2019) Exploring some properties of odd Lomax-exponential dis-
tribution. Ann Stat Theory Appl (ASTA) 1:21–30
Ogunsanya AS, Akarawak EEE, Ekum MI (2021) On some properties of Rayleigh–Cauchy distribution. J
Stat Manage Syst. https://doi.org/10.1080/09720510.2020.1822499
Okorie IE, Akpanta AC, Ohakwe J, Chikezie DC (2017) The modified Power function distribution.
Cogent Math 4:1319592. https://doi.org/10.1080/23311835.2017.1319592
Pak A, Adegboye OA, Adekunle AI, Rahman KM, McBryde ES, Eisen DP (2020) Economic conse-
quences of the COVID-19 outbreak: the need for epidemic preparedness. Front Public Health 8.
https://doi.org/10.3389/fpubh.2020.00241
Tahir MH, Alizadeh M, Mansoor M, Cordeiro GM, Zubair M (2016) The Weibull-power function distri-
bution with applications. Hacettepe J Math Stat 45(1):245–265
Whanda S, Adekola O, Adamu B, Yahaya S, Pandey P (2016) Geo-spatial analysis of oil spill distribution
and susceptibility in the Niger Delta Region of Nigeria. J Geogr Inf Syst 8:438–456.
https://doi.org/10.4236/jgis.2016.84037
Zaka A, Akhter AS (2013) Methods for estimating the parameters of power function distribution. Pak J
Stat Oper Res 9:213–224
Zografos K, Balakrishnan N (2009) On families of beta and generalized gamma-generated distributions
and associated inference. Stat Methodol 6:344–362


Zubair M, Alzaatreh A, Cordeiro GM, Tahir MH, Mansoor M (2018) On generalized classes of exponen-
tial distribution using T-X family framework. Filomat 32(4):1259–1272

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and
applicable law.
