Compusoft, 2 (1), 18-24


COMPUSOFT, An international journal of advanced computer technology, 1 (2), Dec-2012 (Volume-I, Issue-II)

ISSN:2320-0790

A Study of Statistical Properties & Model Validation for Exponential Extension Model
Dr. Ashwini Kumar Srivastava
Department of Computer Application,
Shivharsh Kisan P.G. College, Basti, U.P., India
ashwini.skpg@gmail.com
Abstract: In this paper, we study the statistical properties of the Exponential Extension model and then check the validity of the proposed model for different real data sets. We use two main techniques that are easy to understand and implement and that are intuitive and graphical: the Kolmogorov–Smirnov (K-S) test, for which we plot the empirical distribution function against the fitted distribution function, and the Q-Q plot test. These plots are used to investigate whether an assumed model adequately fits a set of data. We also compare the p-values obtained by the K-S test for these data sets, in order to identify the real data sets that are most suitable for parameter estimation of the Exponential Extension model.
Keywords: Exponential Extension model, probability density function (pdf), cumulative distribution function (cdf), model validation, quantile-quantile (Q-Q) test, goodness-of-fit test

I. INTRODUCTION

Exponential models play a central role in analyses of lifetime or survival data, in part because of their convenient statistical theory, their important 'lack of memory' property and their constant hazard rates. In circumstances where the one-parameter family of exponential distributions is not sufficiently broad, a number of wider families such as the gamma, Weibull and lognormal models are in common use [1]. Adding parameters to a well-established family of models is a time-honored device for obtaining more flexible new families of models.

In recent times, Haghighi and Sadeghi [2] and Nadarajah and Haghighi [3] introduced the Exponential Extension model by adding a parameter to the exponential model. The two parameters of the Exponential Extension model are a shape parameter and a scale parameter. This family always has a decreasing probability density function, like an exponential model, but it allows for increasing, decreasing and constant hazard rates, like a Weibull model or an Exponentiated Exponential model [4, 5, 6]. The Exponential Extension model has explicit expressions for the reliability function and the failure rate (hazard) function.

II. THE STATISTICAL PROPERTIES OF EXPONENTIAL EXTENSION MODEL

The two-parameter Exponential Extension model has one shape and one scale parameter [7]. The random variable $X$ follows the Exponential Extension model with shape parameter $\alpha > 0$ and scale parameter $\lambda > 0$ if it has the following cumulative distribution function (cdf):

$$F(x; \alpha, \lambda) = 1 - \exp\left\{1 - (1 + \lambda x)^{\alpha}\right\} \tag{2.1}$$

where $x > 0$, $\alpha > 0$, $\lambda > 0$.

The probability density function (pdf) can be written as

$$f(x; \alpha, \lambda) = \alpha \lambda (1 + \lambda x)^{\alpha - 1} \exp\left\{1 - (1 + \lambda x)^{\alpha}\right\} \tag{2.2}$$

where $x > 0$, $\alpha > 0$, $\lambda > 0$, and the model will be denoted by $X \sim EE(\alpha, \lambda)$. The R functions dexpo.ext( ) and pexpo.ext( ) given in [8] can be used for the computation of the pdf and cdf, respectively. Some typical EE density functions for different values of $\alpha$ and for $\lambda = 1$ are depicted in Figure 1.
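As an illustration of equations (2.1) and (2.2), the following is a minimal Python sketch of the cdf and pdf. It is not the reliaR implementation of dexpo.ext( ) and pexpo.ext( ); the function names `ee_cdf` and `ee_pdf` and the argument names `alpha` and `lam` are assumptions made here for illustration only.

```python
import math


def ee_cdf(x, alpha, lam):
    """cdf (2.1): F(x) = 1 - exp{1 - (1 + lam*x)^alpha}; taken as 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(1.0 - (1.0 + lam * x) ** alpha)


def ee_pdf(x, alpha, lam):
    """pdf (2.2): f(x) = alpha*lam*(1 + lam*x)^(alpha-1) * exp{1 - (1 + lam*x)^alpha}."""
    if x < 0:
        return 0.0
    base = 1.0 + lam * x
    return alpha * lam * base ** (alpha - 1.0) * math.exp(1.0 - base ** alpha)


if __name__ == "__main__":
    # With alpha = 1 the model reduces to the exponential model with rate lam.
    print(ee_cdf(2.0, 1.0, 0.5), 1.0 - math.exp(-1.0))
    # Density values for lam = 1 and a few shape values, as in Figure 1.
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, [round(ee_pdf(x, alpha, 1.0), 4) for x in (0.0, 0.5, 1.0)])
```

Setting `alpha = 1` gives a quick sanity check, since the cdf then coincides with that of the one-parameter exponential model.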

Fig 1. The PDF of EE model for $\lambda = 1$ and different values of $\alpha$.

The reliability/survival function is

$$R(x; \alpha, \lambda) = \exp\left\{1 - (1 + \lambda x)^{\alpha}\right\} \tag{2.3}$$

where $x > 0$, $\alpha > 0$, $\lambda > 0$. The associated R function sexpo.ext( ) given in [8] computes the reliability function.

The hazard function is

$$h(x; \alpha, \lambda) = \alpha \lambda (1 + \lambda x)^{\alpha - 1} \tag{2.4}$$

where $x > 0$, $\alpha > 0$, $\lambda > 0$.

The hazard rate function in equation (2.4) exhibits the following shapes:
1. if $\alpha < 1$ then $h(x)$ is monotonically decreasing with $h(0) = \alpha\lambda$ and $h(x) \to 0$ as $x \to \infty$.
2. if $\alpha > 1$ then $h(x)$ is monotonically increasing with $h(0) = \alpha\lambda$ and $h(x) \to \infty$ as $x \to \infty$.
3. if $\alpha = 1$ then $h(x) = \lambda$ for all $x$.

Some typical Exponential Extension model hazard functions for different values of $\alpha$ and for $\lambda = 1$ are depicted in Figure 2. The associated R function hexpo.ext( ) is given in [8].

The quantile function is

$$x_q = \frac{1}{\lambda}\left[\left\{1 - \log(1 - q)\right\}^{1/\alpha} - 1\right]; \quad 0 < q < 1. \tag{2.5}$$

For the computation of quantiles, the R function qexpo.ext( ) is given in [8].

The median is

$$\mathrm{Median}(x) = \frac{1}{\lambda}\left[\left\{1 - \log(0.5)\right\}^{1/\alpha} - 1\right] \tag{2.6}$$

Let $U$ be a uniform (0, 1) random variable and $F(\cdot)$ a cdf for which $F^{-1}(\cdot)$ exists. Then $F^{-1}(u)$ is a draw from the distribution $F(\cdot)$. Therefore, a random deviate can be generated from $EE(\alpha, \lambda)$ by

$$x = \frac{1}{\lambda}\left[\left\{1 - \log(1 - u)\right\}^{1/\alpha} - 1\right]; \quad 0 < u < 1 \tag{2.7}$$

where $u$ has the uniform distribution, i.e. the U(0, 1) distribution. The R function rexpo.ext( ), given in [8], generates random deviates from $EE(\alpha, \lambda)$.
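The formulas (2.3) to (2.7) can be sketched in Python as follows. This is an illustrative sketch only, not the reliaR functions sexpo.ext( ), hexpo.ext( ), qexpo.ext( ) and rexpo.ext( ); all function and argument names below are assumptions introduced here.

```python
import math
import random


def ee_survival(x, alpha, lam):
    """Reliability function (2.3): R(x) = exp{1 - (1 + lam*x)^alpha}."""
    return math.exp(1.0 - (1.0 + lam * x) ** alpha)


def ee_hazard(x, alpha, lam):
    """Hazard function (2.4): h(x) = alpha*lam*(1 + lam*x)^(alpha - 1)."""
    return alpha * lam * (1.0 + lam * x) ** (alpha - 1.0)


def ee_quantile(q, alpha, lam):
    """Quantile function (2.5), valid for 0 <= q < 1."""
    return ((1.0 - math.log(1.0 - q)) ** (1.0 / alpha) - 1.0) / lam


def ee_median(alpha, lam):
    """Median (2.6): the quantile at q = 0.5."""
    return ee_quantile(0.5, alpha, lam)


def ee_random(n, alpha, lam, rng=None):
    """n random deviates via the inverse transform (2.7), with u ~ U(0, 1)."""
    rng = rng or random.Random()
    return [ee_quantile(rng.random(), alpha, lam) for _ in range(n)]


if __name__ == "__main__":
    # Hazard shapes for lam = 1: decreasing (alpha < 1), constant (alpha = 1),
    # increasing (alpha > 1).
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, [round(ee_hazard(x, alpha, 1.0), 4) for x in (0.0, 1.0, 5.0)])
    # The survival function evaluated at the median should be 0.5.
    print(ee_survival(ee_median(2.0, 0.5), 2.0, 0.5))
    print(ee_random(3, 2.0, 0.5, random.Random(1)))
```

Passing a seeded `random.Random` instance makes the generated deviates reproducible; the seed value used above is arbitrary.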

Fig 2. The hazard function of EE model for $\lambda = 1$ and different values of $\alpha$.

III. COMPUTATION OF MAXIMUM LIKELIHOOD ESTIMATION

To obtain maximum likelihood estimators of the parameters $(\alpha, \lambda)$, let $x_1, \ldots, x_n$ be the observations of a sample from a distribution with cumulative distribution function (2.1), and let $x_{(1)}, \ldots, x_{(n)}$ be the corresponding order statistics. The log-likelihood function of the parameters, $\log L(\alpha, \lambda)$, is given by

$$\log L(\alpha, \lambda) = n \log \alpha + n \log \lambda + (\alpha - 1) \sum_{i=1}^{n} \log(1 + \lambda x_i) + n - \sum_{i=1}^{n} (1 + \lambda x_i)^{\alpha} \tag{3.1}$$

Therefore, to obtain the MLEs of $\alpha$ and $\lambda$ [9], we can maximize (3.1) directly with respect to $\alpha$ and $\lambda$, or we can solve the following two non-linear equations using the Newton-Raphson method. We have,

$$\frac{\partial \log L}{\partial \alpha} = \frac{n}{\alpha} + \sum_{i=1}^{n} \log(1 + \lambda x_i)\left\{1 - (1 + \lambda x_i)^{\alpha}\right\} = 0 \tag{3.2}$$

and,

$$\frac{\partial \log L}{\partial \lambda} = \frac{n}{\lambda} + (\alpha - 1) \sum_{i=1}^{n} \frac{x_i}{1 + \lambda x_i} - \alpha \sum_{i=1}^{n} x_i (1 + \lambda x_i)^{\alpha - 1} = 0 \tag{3.3}$$

IV. DATA ANALYSIS

In this section we present five real data sets for illustration of the proposed methodology. These are:

Data Set 1: The following data set includes the time intervals (in days) of successive earthquakes in the last century in Iran. The data are taken from the International Institute of Earthquake Engineering and Seismology [2].

284, 246, 139, 2280, 95, 308, 355, 607, 11, 563, 553

Data Set 2: The following data represent the number of million revolutions before failure for each of the 23 ball bearings in a life test [9].

17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.80, 51.84,
51.96, 54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12,
93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40

Data Set 3: The following data are the failure times of 50 items given by Aarset [10].

0.1, 0.2, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 6.0, 7.0, 11.0,
12.0, 18.0, 18.0, 18.0, 18.0, 18.0, 21.0, 32.0, 36.0,
40.0, 45.0, 45.0, 47.0, 50.0, 55.0, 60.0, 63.0, 63.0,
67.0, 67.0, 67.0, 67.0, 72.0, 75.0, 79.0, 82.0, 82.0,
83.0, 84.0, 84.0, 84.0, 85.0, 85.0, 85.0, 85.0, 85.0,
86.0, 86.0

Data Set 4: The data represent 46 repair times (in hours) for an airborne communication transceiver, Chhikara and Folks [11]. The data are as follows:

0.2, 0.3, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.7, 0.7, 0.7, 0.8,
0.8, 1.0, 1.0, 1.0, 1.0, 1.1, 1.3, 1.5, 1.5, 1.5, 1.5, 2.0,
2.0, 2.2, 2.5, 2.7, 3.0, 3.0, 3.3, 3.3, 4.0, 4.0, 4.5, 4.7,
5.0, 5.4, 5.4, 7.0, 7.5, 8.8, 9.0, 10.3, 22.0, 24.5

Data Set 5: This data set is from McCool (1974), giving the fatigue life in hours of ten bearings of a certain type [12]. These data are as follows:

152.7, 172.0, 172.5, 173.3, 193.0, 204.7, 216.5, 234.9,
262.6, 422.6

A. Maximum Likelihood (ML) Estimation

To obtain the maximum likelihood estimates (MLEs) and their standard errors, we started the iterative procedure by maximizing the log-likelihood function given in (3.1) directly, with initial guesses $\alpha = 1.0$ and $\lambda = 0.5$, far away from the solution [13]. We used the optim( ) function in R with the Newton-Raphson method option [14, 15]. The number of iterations after which the iterative process stopped depends on the data set used [16]. Table 1 shows the ML estimates and standard errors (SE) of the parameters alpha and lambda, together with the number of iterations and the log-likelihood value.

V. MODEL VALIDATION

Most statistical methods assume an underlying model in the derivation of their results. However, when we presume that the data follow a specific model, we are making an assumption. If such a model does not hold, then the conclusions from such analysis may be invalid. Although hazard plotting and the other graphical methods can guide the choice of the parametric distribution, one cannot of course be sure that the proper model has been selected. Hence model validation is still necessary to check whether we have achieved the goal of choosing the right model [17]. In this paper we outline some of the methods used to check model appropriateness.

A. Kolmogorov–Smirnov Test

The Kolmogorov–Smirnov test (K-S test) is a nonparametric test for the equality of continuous probability distributions that can be used to compare a sample with a reference probability model. The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution [18].

The Empirical Distribution Function (EDF)

An estimate of $F(x) = P[X \le x]$ is the proportion of sample points that fall in the interval $(-\infty, x]$. This estimate is called the empirical distribution function (EDF). The EDF of an observed sample $x_1, x_2, \ldots, x_n$ is defined by

$$F_n(x) = \begin{cases} 0 & \text{for } x < X_{1:n} \\ \dfrac{i}{n} & \text{for } X_{i:n} \le x < X_{i+1:n};\ i = 1, \ldots, n-1 \\ 1 & \text{for } x \ge X_{n:n} \end{cases}$$

where $x_{1:n}, x_{2:n}, \ldots, x_{n:n}$ is the ordered sample.

The Kolmogorov–Smirnov (K-S) test is a nonparametric goodness-of-fit test and is used to determine whether the underlying probability distribution of the sample, estimated by $F_n(x)$, differs from a hypothesized distribution $F_0(x)$.

Kolmogorov-Smirnov (K-S) distance

The K-S distance between two distribution functions is defined as

$$D_n^{+} = \max_{1 \le i \le n}\left\{F_n(x_{i:n}) - F_0(x_{i:n})\right\}, \quad \text{and} \quad D_n^{-} = \max_{1 \le i \le n}\left\{F_0(x_{i:n}) - F_n(x_{i-1:n})\right\},$$

where $F_0(x_{i:n})$ is the cumulative distribution function evaluated at the $i$-th ordered observation, $F_n(x)$ is the EDF, and $F_n(x_{0:n}) = 0$. To perform the two-sided goodness-of-fit test $H_0 : F(x) = F_0(x)$ for all $x$, where $F_0$ is a completely specified continuous distribution function, against the alternative $H_1 : F(x) \ne F_0(x)$ for some $x$, the K-S statistic is

$$D_n = \max\left\{D_n^{+}, D_n^{-}\right\}$$

The distribution of the K-S statistic does not depend on $F_0$ as long as $F_0$ is continuous.

To study the goodness-of-fit of the Exponential Extension model, we compute the Kolmogorov-Smirnov statistic between the empirical distribution function and the fitted distribution function when the parameters are obtained by the method of maximum likelihood. We use the ks.expo.ext( ) function in R given in [8] to perform the test. We plot the empirical distribution function and the fitted distribution function for the proposed data sets in Figures 3-7, and the results of the K-S test are shown in Table 2.

Table 2. D and its corresponding p-value using the K-S test

Data Set    D-value    p-value
1           0.1862     0.77620
2           0.2885     0.04348
3           0.1915     0.05108
4           0.1309     0.40930
5           0.4853     0.01084

Fig 3. The graph for empirical distribution function and fitted distribution function for data set-1.

Fig 4. The graph for empirical distribution function and fitted distribution function for data set-2.
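The two steps described above, fitting the EE model by maximizing the log-likelihood (3.1) and then measuring the K-S distance between the EDF and the fitted cdf, can be sketched in Python as follows. This is a rough sketch under stated assumptions: the optimizer is a simple derivative-free compass search on the log-parameters, swapped in here for illustration and not the R optim( ) call used in the paper, and all function names are hypothetical. It does not compute p-values, and its output is not claimed to reproduce Table 2.

```python
import math


def ee_cdf(x, alpha, lam):
    """cdf (2.1) of the Exponential Extension model."""
    return 1.0 - math.exp(1.0 - (1.0 + lam * x) ** alpha)


def ee_loglik(data, alpha, lam):
    """Log-likelihood (3.1) for a complete sample; -inf on numeric overflow."""
    n = len(data)
    try:
        s_log = sum(math.log1p(lam * x) for x in data)
        s_pow = sum((1.0 + lam * x) ** alpha for x in data)
    except OverflowError:
        return float("-inf")
    return n * math.log(alpha) + n * math.log(lam) + (alpha - 1.0) * s_log + n - s_pow


def fit_ee(data, alpha0=1.0, lam0=0.5, step=0.5, tol=1e-8, max_iter=10000):
    """Maximize (3.1) by compass search on (log alpha, log lam).

    Assumption: a crude stand-in optimizer; the step is halved whenever no
    axis move improves the log-likelihood.
    """
    p = [math.log(alpha0), math.log(lam0)]
    best = ee_loglik(data, alpha0, lam0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for k in (0, 1):
            for delta in (step, -step):
                q = list(p)
                q[k] += delta
                value = ee_loglik(data, math.exp(q[0]), math.exp(q[1]))
                if value > best:
                    p, best, improved = q, value, True
        if not improved:
            step /= 2.0
    return math.exp(p[0]), math.exp(p[1]), best


def ks_distance(data, cdf):
    """D_n = max(D_n^+, D_n^-) computed over the ordered sample."""
    xs = sorted(data)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


if __name__ == "__main__":
    data_set_1 = [284, 246, 139, 2280, 95, 308, 355, 607, 11, 563, 553]
    alpha_hat, lam_hat, loglik = fit_ee(data_set_1)
    d_n = ks_distance(data_set_1, lambda x: ee_cdf(x, alpha_hat, lam_hat))
    print(alpha_hat, lam_hat, loglik, d_n)
```

The search works on the logarithms of the parameters so that both estimates stay positive without explicit constraints; the starting values are the initial guesses quoted in Section IV.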

A high p-value indicates that a data set is compatible with the EE model, and in this analysis data set-1 and data set-4 have high p-values. Therefore, from the above results and Figures 3-7, it is clear that the estimated EE model provides a very good fit to data set-1 and data set-4.

Fig 5. The graph for empirical distribution function and fitted distribution function for data set-3.

Fig 6. The graph for empirical distribution function and fitted distribution function for data set-4.

Fig 7. The graph for empirical distribution function and fitted distribution function for data set-5.

B. The Q-Q Plots Test

The Q-Q plot test is used to investigate whether an assumed model adequately fits a set of data. It helps the analyst to assess how well a given theoretical distribution fits the data. Let $x_1, x_2, \ldots, x_n$ be a sample from a given population with cdf $F(x)$. Let $x_{1:n}, x_{2:n}, \ldots, x_{n:n}$ be the corresponding order statistics and $p_{1:n}, p_{2:n}, \ldots, p_{n:n}$ be the plotting positions. Define the plotting positions by [19, 20],

$$p_{i:n} = \frac{i - 0.5}{n}; \quad i = 1, 2, \ldots, n.$$

Finally, let $\hat{F}(x)$ be an estimate of $F(x)$ based on $x = (x_1, x_2, \ldots, x_n)$. Thus, $\hat{F}^{-1}(p_{i:n})$ is the estimated quantile corresponding to the $i$-th order statistic $x_{i:n}$. Similarly, $\hat{F}(x_{i:n})$ is the estimated probability corresponding to $x_{i:n}$. The scatter plot of the points

$$\hat{F}^{-1}(p_{i:n}) \text{ versus } x_{i:n}, \quad i = 1, 2, \ldots, n,$$

is called a Q-Q plot. Thus, the Q-Q plots show the estimated versus the observed quantiles. If the model fits the data well, the pattern of points on the Q-Q plot will exhibit a 45-degree straight line. Note that all the points of a Q-Q plot are inside the square

$$\left[\hat{F}^{-1}(p_{1:n}),\ \hat{F}^{-1}(p_{n:n})\right] \times \left[x_{1:n},\ x_{n:n}\right].$$

We use the R function qq.expo.ext( ) given in [8] to perform the proposed test. We draw Quantile-Quantile (Q-Q) plots, using the MLEs as estimates, for the different data sets in Figures 8-12.
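The construction of the Q-Q plot points can be sketched in Python as follows. This is an illustration only and not the reliaR function qq.expo.ext( ); the function names are hypothetical, and the parameter values in the usage example are arbitrary placeholders rather than the MLEs reported in the paper.

```python
import math


def ee_quantile(q, alpha, lam):
    """Quantile function (2.5) of the Exponential Extension model."""
    return ((1.0 - math.log(1.0 - q)) ** (1.0 / alpha) - 1.0) / lam


def qq_points(data, alpha, lam):
    """Pairs (estimated quantile, observed order statistic) for a Q-Q plot.

    Uses the plotting positions p_{i:n} = (i - 0.5) / n, i = 1, ..., n.
    """
    xs = sorted(data)
    n = len(xs)
    return [
        (ee_quantile((i - 0.5) / n, alpha, lam), x)
        for i, x in enumerate(xs, start=1)
    ]


if __name__ == "__main__":
    data_set_5 = [152.7, 172.0, 172.5, 173.3, 193.0,
                  204.7, 216.5, 234.9, 262.6, 422.6]
    # Placeholder parameter values, chosen only to make the example run.
    for fitted, observed in qq_points(data_set_5, alpha=1.0, lam=0.005):
        print(round(fitted, 2), observed)
```

Plotting the first coordinate against the second, together with the 45-degree line, gives the kind of display described above; points close to the line indicate agreement between fitted and observed quantiles.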

Fig 8. Quantile-Quantile (Q-Q) plot using MLEs as estimate for data set-1.

Fig 9. Quantile-Quantile (Q-Q) plot using MLEs as estimate for data set-2.

Fig 10. Quantile-Quantile (Q-Q) plot using MLEs as estimate for data set-3.

Fig 11. Quantile-Quantile (Q-Q) plot using MLEs as estimate for data set-4.

Fig 12. Quantile-Quantile (Q-Q) plot using MLEs as estimate for data set-5.

Thus, as can be seen from the straight-line pattern in Figures 8-12, the EE model fits the data very well for data set-1 and data set-4.

VI. CONCLUSION

An attempt has been made to incorporate the Exponential Extension model for software reliability data. We have presented statistical tools for empirical modeling of the data in general. These tools are developed in the R language and environment for model analysis, model validation and estimation of parameters using the method of maximum likelihood. To check the validity of the model, we have plotted the empirical distribution function and the fitted distribution function for the different data sets, and we have compared the p-values obtained by the K-S test for these data sets in order to identify the real data sets for which the Exponential Extension model is a good fit. We have also discussed the Quantile-Quantile (Q-Q) plots for model validation. Thus, from both techniques of model validation used for the EE model on the different data sets, the Exponential Extension model fits the data very well only for data set-1 and data set-4.

VII. ACKNOWLEDGEMENT

The author is thankful to Dr. Vijay Kumar, Associate Professor in Department of Mathematics and Statistics in DDU Gorakhpur University, Gorakhpur, the editor and the referees for their valuable suggestions, which improved the paper to a great extent.

VIII. REFERENCES

[1] Murthy, D.N.P., Xie, M. and Jiang, R. (2003). Weibull Models, Wiley, New York.
[2] Haghighi, F. and Sadeghi, S. (2009). An Exponential Extension, URL: http://hal.inria.fr/docs/00/38/67/55/PDF/p186.pdf
[3] Nadarajah, S. and Haghighi, F. (2009). An extension of the exponential distribution, Statistics, doi: 10.1080/02331881003678678.
[4] Gupta, R.D. and Kundu, D. (2001). Exponentiated exponential family: an alternative to gamma and Weibull distributions. Biometrical Journal, 43, 117-130.
[5] Srivastava, A.K. and Kumar, V. (2011). Analysis of Software Reliability Data using Exponential Power Model. International Journal of Advanced Computer Science and Applications, Vol. 2, No. 2, February 2011, 38-45.
[6] Kumar, R., Srivastava, A.K. and Kumar, V. (2012). Analysis of Gumbel Model for Software Reliability Using Bayesian Paradigm, International Journal of Advanced Research in Artificial Intelligence, Vol. 1 (9), 39-45.
[7] Kumar, V. (2010). Bayesian analysis of exponential extension model, J. Nat. Acad. Math., Vol. 24, 109-128.
[8] Kumar, V. and Ligges, U. (2011). reliaR: A package for some probability distributions. http://cran.r-project.org/web/packages/reliaR/index.html.
[9] Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data, 2nd ed., John Wiley and Sons, New York.
[10] Aarset, M. V. (1987). How to identify bathtub hazard rate. IEEE Transactions Reliability, 36, 106-108.
[11] Chhikara, R. S. and Folks, J. L. (1977). The inverse Gaussian distribution as a lifetime model. Technometrics, 19, 461–468.
[12] McCool, J. I. (1974). Inferential Techniques for Weibull Populations, Aerospace Research Laboratories Report ARL TR74-0180, Wright-Patterson Air Force Base, Dayton, Ohio.
[13] Ihaka, R. and Gentleman, R.R. (1996). R: A language for data analysis and graphics, Journal of Computational and Graphical Statistics, 5, 299–314.
[14] Venables, W. N., Smith, D. M. and R Development Core Team (2010). An Introduction to R, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-12-7, http://www.r-project.org.
[15] Srivastava, A.K. and Kumar, V. (2011). Markov Chain Monte Carlo methods for Bayesian inference of the Chen model, International Journal of Computer Information Systems, Vol. 2 (2), 07-14.
[16] Eastman, J. and Bain, L.J. (1973). A property of maximum likelihood estimators in the presence of location-scale nuisance parameters, Communications in Statistics, 2, 23-28.
[17] Srivastava, A.K. and Kumar, V. (2011). Software reliability data analysis with Marshall-Olkin Extended Weibull model using MCMC method for non-informative set of priors, International Journal of Computer Applications, Vol. 18(4), 31-39.
[18] Hazewinkel, Michiel, ed. (2001). "Kolmogorov-Smirnov test", Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4.
[19] Evans, M., Hastings, N. and Peacock, B. (2000). Statistical Distributions, 3rd ed., Wiley, New York.
[20] Thode, H. C. (2002). Testing For Normality. CRC Press, ISBN: 0-8247-9613-6.

AUTHORS PROFILE

ASHWINI KUMAR SRIVASTAVA received his M.Sc in Mathematics from D.D.U. Gorakhpur University, MCA (Hons.) from U.P. Technical University, M.Phil in Computer Science from Alagappa University and Ph.D. in Computer Science from D.D.U. Gorakhpur University, Gorakhpur. He is currently working as Assistant Professor in the Department of Computer Application in Shivharsh Kisan P.G. College, Basti, U.P. He has 8 years of teaching experience as well as 4 years of research experience. His main research interests are Software Reliability, Artificial Neural Networks, Bayesian methodology and Data Warehousing.
