1. Introduction
Chiang Mai in northern Thailand is a region of high mountains and forests. During the rainy season, which lasts from May to October, excess runoff often floods villages at the foot of the mountains. Between August and September, rainfall ranges from “heavy” to “very heavy”, which can cause flash flooding and riverbank overflows. However, predicting heavy precipitation and flooding is challenging because rainfall is highly variable. Daily rainfall data from the region typically comprise zero and positive values that can be fitted with a zero-inflated gamma (ZIG) distribution. To help anticipate future catastrophic events, it is therefore useful to measure the central tendency of rainfall in specific places using a statistical parameter such as the mean of the distribution. Accordingly, the mean of a ZIG distribution can be used to analyze rainfall data series when forecasting future precipitation amounts.
The ZIG distribution can be used to fit data that contain both zero and positive values: the positive values follow a gamma distribution, while the number of zero values follows a binomial distribution. The ZIG distribution has been used in several fields to analyze right-skewed data with a large clump of observations at zero. In meteorology, Kaewprasert et al. [1] used the mean of a ZIG distribution to analyze mixed zero and non-zero rainfall data. In medicine, Wang et al. [2] used the mean of a ZIG distribution to analyze data on the HIV status of children recorded as non-responses (zero) and positive responses.
The two main types of statistical inference are parameter estimation and hypothesis testing. The most widely used interval estimation technique for a parameter is the confidence interval, a range bounded by lower and upper limits that is expected to contain the parameter at a stated confidence level. The confidence interval for the mean of a ZIG distribution has been the focus of several studies. Muralidharan and Kale [3] estimated the confidence interval for the mean of a ZIG distribution and applied it to real rainfall data. Ren et al. [4] introduced simultaneous confidence intervals for the difference between the means of ZIG distributions. Kaewprasert et al. [1] expanded on this by comparing the difference between the means of several ZIG distributions. Wang et al. [2] produced confidence interval estimates for the mean of a gamma distribution with zero values.
The mean is frequently utilized in practice to gauge statistical significance in many domains. The common mean is of interest when establishing inference for more than one population when independent samples are collected from them. The process of building the confidence interval for the common mean of several distributions has been studied by many scholars. For example, Yan [5] established confidence interval estimation for the common mean of several gamma populations by using fiducial inference and the method of variance estimates recovery (MOVER). Maneerat and Niwitpong [6] estimated the confidence interval for the common mean of several delta-lognormal distributions. As previously indicated, although there have been numerous estimations of the confidence interval for the common mean of several gamma and delta-lognormal distributions, the common mean of several ZIG distributions has not yet been the subject of a study on statistical inference.
The motivation for the study was to examine previously reported confidence interval estimation methods for the common mean of several gamma distributions and extend them to estimate the confidence interval for the common mean of several ZIG distributions. Thus, we chose estimation methods that have been used for the confidence interval for the common mean and the common coefficient of variation of several delta-lognormal distributions, as follows. Maneerat and Niwitpong [6] proposed using the fiducial generalized confidence interval (GCI) and the highest posterior density (HPD) interval based on the Jeffreys rule prior to estimate the confidence interval for the common mean of several delta-lognormal distributions. Yosboonruang et al. [7] proposed using the fiducial GCI and a Bayesian approach based on the uniform prior to estimate the confidence interval for the common coefficient of variation of several delta-lognormal distributions.
Herein, we explored several confidence interval estimation methods for the common mean of several ZIG distributions using the fiducial GCI approach and Bayesian and HPD methods based on the Jeffreys rule or uniform prior. We used them to calculate the 95% confidence interval for the common mean of three daily rainfall datasets (Chomthong, Mae Taeng, and Doi Saket) in Chiang Mai, Thailand.
The remainder of this paper is organized as follows. Section 2 provides the methodologies to estimate the confidence interval for the common mean of several ZIG distributions. Section 3 reports the numerical computations using the methods in a Monte Carlo simulation study. Section 4 presents the empirical application of the proposed confidence interval estimation methods using data on daily rainfall collected from three rain stations in Chiang Mai, Thailand, in September 2020 and 2021. Finally, a discussion and conclusions are offered in Section 5 and Section 6, respectively.
2. Methods
Let
;
;
be random variables of size
from
k ZIG distributions denoted as
. This distribution has three parameters: shape parameter
, rate parameter
, and the proportion of zero values
. For
k populations of observations, the distribution function of
is given by
where
is a gamma distribution function, which can be denoted as
when
;
;
. For
, the zero observations follow a binomial distribution denoted as
. Furthermore,
and
represent the numbers of zero and non-zero values, respectively, where
. The population mean of
is given by
Krishnamoorthy et al. [8] and Krishnamoorthy and Wang [9] used the cube-root approximation
, thereby ensuring that the
are approximately normally distributed, which is denoted as
with mean and variance of
and
, respectively. It is possible to represent
and
in terms of
and
, respectively, as follows:
By resolving the equations in the
and
sets above, we, respectively, arrive at
Thus, the mean of a ZIG distribution is .
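For orientation, the quantities described in this section can be written as in the sketch below. The notation is an assumption made here for illustration ($\alpha_i$ for the shape, $\beta_i$ for the rate, $\delta_i$ for the proportion of zeros, $G$ for the gamma distribution function) and may differ from the authors' symbols. The cube-root moments are the standard Wilson-Hilferty approximation for a gamma variable, and the last line inverts them.

```latex
% Assumed notation: alpha_i = shape, beta_i = rate, delta_i = P(X_ij = 0)
\begin{align*}
H(x_{ij};\alpha_i,\beta_i,\delta_i) &=
  \begin{cases}
    \delta_i, & x_{ij} = 0,\\
    \delta_i + (1-\delta_i)\,G(x_{ij};\alpha_i,\beta_i), & x_{ij} > 0,
  \end{cases}\\
E(X_{ij}) &= (1-\delta_i)\,\frac{\alpha_i}{\beta_i},\\
Y_{ij} = X_{ij}^{1/3} &\;\overset{\text{approx.}}{\sim}\; N(\mu_i,\sigma_i^2)
  \quad\text{for } X_{ij} > 0,\\
\mu_i &= \left(\frac{\alpha_i}{\beta_i}\right)^{1/3}\!\left(1-\frac{1}{9\alpha_i}\right),
\qquad
\sigma_i^2 = \frac{1}{9\,\alpha_i^{1/3}\,\beta_i^{2/3}},\\
\alpha_i &= \frac{1}{9}\left\{\left(1+\frac{\mu_i^2}{2\sigma_i^2}\right)
  +\left[\left(1+\frac{\mu_i^2}{2\sigma_i^2}\right)^{2}-1\right]^{1/2}\right\},
\qquad
\beta_i = \frac{1}{27\,\alpha_i^{1/2}\,\sigma_i^{3}}.
\end{align*}
```

The inversion follows from $\mu_i^2/\sigma_i^2 = 9\alpha_i - 2 + 1/(9\alpha_i)$, which is a quadratic in $9\alpha_i$. Substituting the result into the expression for $\sigma_i^2$ gives $\beta_i$.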
The unbiased estimators for
,
, and
are
,
, and
, respectively; then
where
and
.
Using the result of Aitchison [10], Vännman [11] showed that the minimum variance unbiased estimator of the variance of
can be derived as
According to Yan [5] and Maneerat and Niwitpong [6], the common mean of several ZIG distributions can be defined as
where
. The confidence interval for the common mean of several ZIG distributions can be estimated by using the suggested methods listed below.
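The idea of a common mean can be illustrated with a minimal Python sketch. It assumes that the common mean is estimated as an inverse-variance weighted average of the individual means, and it simplifies each variance to the plain sample variance divided by the sample size. This is not the Aitchison-based estimator referred to above. The function names, parameter values, and sample sizes are hypothetical.

```python
import random
import statistics

def sample_zig(n, shape, rate, p_zero, rng):
    """Draw n values: 0 with probability p_zero, otherwise Gamma(shape, rate)."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    return [0.0 if rng.random() < p_zero else rng.gammavariate(shape, 1.0 / rate)
            for _ in range(n)]

def zig_mean(shape, rate, p_zero):
    """Population mean of a ZIG variable: (1 - p_zero) * shape / rate."""
    return (1.0 - p_zero) * shape / rate

def common_mean(samples):
    """Inverse-variance weighted average of the per-sample means.

    Simplification (an assumption of this sketch): each mean is the plain
    sample mean and its variance is estimated by s^2 / n.
    """
    means = [statistics.fmean(x) for x in samples]
    weights = [len(x) / statistics.variance(x) for x in samples]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

rng = random.Random(2023)
# Three hypothetical samples that share one true mean, (1 - 0.2) * 2 / 0.5.
true_mean = zig_mean(2.0, 0.5, 0.2)
data = [sample_zig(n, 2.0, 0.5, 0.2, rng) for n in (30, 50, 100)]
print(true_mean, common_mean(data))
```

Because the weights are positive, the pooled estimate always lies between the smallest and largest individual sample means.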
2.1. The Fiducial GCI Method
Fisher [12] was the first to propose the fiducial approach, and Hannig [13] later investigated it further and provided some general results. A fiducial interval can be built from a generalized pivotal quantity (GPQ), which is used in generalized inference and can be viewed as a product of the fiducial framework. Hannig et al. [14] proposed such a framework, in the form of a fiducial GPQ, that shows the connection between the distribution and the parameter.
Suppose ; ; is a random sample from , where is the parameter of interest. Therefore, the GPQ can only be a function of . This is called the fiducial GPQ, which satisfies the following two conditions:
For each , the conditional distribution of is free of the nuisance parameter.
For the observed value of at , .
According to Krishnamoorthy et al. [8], this approach is based on the observation that
approximates a gamma distribution. Let
and
, respectively, represent the observed sample mean and variance based on the
that have been cube-root transformed. This makes it possible to obtain the respective fiducial GPQs for
and
as follows:
where
,
,
, and
. In addition, the respective fiducial GPQs for
and
have the following forms:
Similarly, the fiducial GPQ for
can be written as [15]
Meanwhile, the fiducial GPQ for
is given by
Subsequently, the fiducial GPQ for the estimated variance of
is given by
Therefore, we can estimate the confidence interval for the common mean of
k ZIG distributions (
) using its fiducial GPQ as follows:
where
.
Thus, the
fiducial GCI for
becomes
where
denotes the
percentiles of
. This process is specified in Algorithm 1.
Algorithm 1 The fiducial GCI method
(1) Generate ; ; .
(2) Generate and independently.
(3) Compute fiducial GPQs , , and .
(4) Compute and , leading to .
(5) Repeat steps (1)–(4) 2000 times.
(6) Compute the 95% fiducial GCI for using Equation (14).
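The simulation loop in Algorithm 1 can be sketched for a single sample as follows. The sketch rests on several assumptions: the normal-theory GPQs for the mean and variance of the cube-root-transformed positive values, the Wilson-Hilferty inversion for the gamma shape and scale (scale = 1/rate), and an equal mixture of two beta distributions as the fiducial quantity for the proportion of zeros. The weighting step that pools several samples into a common mean is omitted, and all names and parameter values are hypothetical.

```python
import math
import random
import statistics

def gpq_zig_mean(x, draws=2000, seed=1):
    """Simulate GPQ draws for the mean of one ZIG sample (illustrative sketch)."""
    rng = random.Random(seed)
    pos = [v ** (1.0 / 3.0) for v in x if v > 0]     # cube roots of positive values
    n1, n0 = len(pos), len(x) - len(pos)
    ybar, s2 = statistics.fmean(pos), statistics.variance(pos)
    out = []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        u = rng.gammavariate((n1 - 1) / 2.0, 2.0)    # chi-square with n1 - 1 df
        g_var = (n1 - 1) * s2 / u                    # GPQ for sigma^2
        g_mu = ybar - z * math.sqrt(g_var / n1)      # GPQ for mu
        r = g_mu ** 2 / g_var
        g_shape = ((1 + 0.5 * r) + math.sqrt((1 + 0.5 * r) ** 2 - 1)) / 9.0
        g_scale = 27.0 * math.sqrt(g_shape) * g_var ** 1.5
        # Assumed fiducial quantity for the zero proportion: 50/50 beta mixture.
        if n0 > 0 and rng.random() < 0.5:
            g_delta = rng.betavariate(n0, n1 + 1)
        else:
            g_delta = rng.betavariate(n0 + 1, n1)
        out.append((1.0 - g_delta) * g_shape * g_scale)
    return out

def percentile_interval(draws, level=0.95):
    """Equal-tailed interval read off the sorted draws."""
    s = sorted(draws)
    k = len(s) - 1
    return s[math.floor((1 - level) / 2 * k)], s[math.ceil((1 + level) / 2 * k)]

rng = random.Random(7)
# Hypothetical data: 20% zeros, positive part Gamma(shape=2, scale=2); true mean 3.2.
x = [0.0 if rng.random() < 0.2 else rng.gammavariate(2.0, 2.0) for _ in range(200)]
lo, hi = percentile_interval(gpq_zig_mean(x))
print(lo, hi)
```

Reading the 2.5th and 97.5th percentiles off the simulated draws mirrors step (6). Extending the sketch to a common mean would require an additional GPQ for the variance of each estimated mean.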
2.2. The Bayesian Methods
Suppose that
;
;
, then
. Likelihood function
for
and the prior distribution, which is used to explain conditional probability, make up the Bayesian statistical approach. Therefore, the likelihood function of
k normally distributed samples is given by
For the ZIG distribution, the joint likelihood function of
k individual samples is given by
The common mean for several ZIG distributions can be estimated using the Bayesian approach based on a variety of priors, two of which are derived in the following subsections.
2.2.1. The Jeffreys Rule Prior
Introduced by Harvey and Van Der Merwe [16], the Jeffreys rule prior can be written as
Combining this prior with the likelihood functions in Equations (16) and (17) yields the posterior distribution of
as follows:
where
and
. Subsequently, the respective marginal posterior distributions of
,
, and
are obtained as
In addition, the respective Bayesian derivations for
and
based on the Jeffreys rule prior have the following forms:
and
Thus, the Bayesian estimation for
based on the Jeffreys rule prior is given by
Meanwhile, the Bayesian estimation for the variance of
based on the Jeffreys rule prior is given by
Therefore, we can construct the Bayesian credible interval for the common mean of several ZIG distributions based on the Jeffreys rule prior as
where
.
Thus, the
Bayesian credible interval for
based on the Jeffreys rule prior is
where
denotes the
percentiles of
.
2.2.2. The Uniform Prior
Because a uniform prior assigns a constant prior probability, Bolstad and Curran [17] presented the uniform priors of
,
, and
. Subsequently, the posterior distribution of
based on the uniform prior becomes
where
and
. Consequently, the respective marginal posterior distributions of
,
, and
can be obtained as
In addition, the respective Bayesian derivations for
and
based on the uniform prior have the following forms:
and
Thus, the Bayesian estimate for
based on the uniform prior is given by
Subsequently, the Bayesian estimate for the variance of
based on the uniform prior is given by
Therefore, we can construct the Bayesian credible interval for the common mean of several ZIG distributions based on the uniform prior as
where
.
Thus, the
Bayesian credible interval for
based on the uniform prior can be written as
where
denotes the
th percentiles of
.
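The posterior simulation behind an equal-tailed credible interval can be sketched for a single sample in Python. The sketch assumes flat priors on the mean and variance of the cube-root-transformed positive values and a Uniform(0, 1) prior on the proportion of zeros. Under these assumptions the variance has an inverse-gamma posterior with shape $(n_1 - 3)/2$, where $n_1$ is the number of positive values, and the zero proportion has a Beta posterior. The paper's own posterior forms may differ. All names and parameter values are hypothetical, and the pooling into a common mean is omitted.

```python
import math
import random
import statistics

def posterior_zig_mean(x, draws=2000, seed=1):
    """Posterior draws of a ZIG mean under flat priors (illustrative sketch)."""
    rng = random.Random(seed)
    y = [v ** (1.0 / 3.0) for v in x if v > 0]       # cube roots of positive values
    n1, n0 = len(y), len(x) - len(y)
    ybar, s2 = statistics.fmean(y), statistics.variance(y)
    shape_post = (n1 - 3) / 2.0                      # requires n1 > 3
    rate_post = (n1 - 1) * s2 / 2.0
    out = []
    for _ in range(draws):
        # Inverse-gamma draw: rate divided by a Gamma(shape, 1) variate.
        sigma2 = rate_post / rng.gammavariate(shape_post, 1.0)
        mu = rng.gauss(ybar, math.sqrt(sigma2 / n1))
        delta = rng.betavariate(n0 + 1, n1 + 1)      # Beta posterior, uniform prior
        r = mu * mu / sigma2
        g_shape = ((1 + 0.5 * r) + math.sqrt((1 + 0.5 * r) ** 2 - 1)) / 9.0
        g_scale = 27.0 * math.sqrt(g_shape) * sigma2 ** 1.5   # scale = 1 / rate
        out.append((1.0 - delta) * g_shape * g_scale)
    return out

def equal_tailed(draws, level=0.95):
    """Equal-tailed credible interval from sorted posterior draws."""
    s = sorted(draws)
    k = len(s) - 1
    return s[math.floor((1 - level) / 2 * k)], s[math.ceil((1 + level) / 2 * k)]

rng = random.Random(7)
# Hypothetical data: 20% zeros, positive part Gamma(shape=2, scale=2); true mean 3.2.
x = [0.0 if rng.random() < 0.2 else rng.gammavariate(2.0, 2.0) for _ in range(200)]
lo, hi = equal_tailed(posterior_zig_mean(x))
print(lo, hi)
```

Switching to another prior would change only the three posterior draws inside the loop. The mapping from those draws to the mean stays the same.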
2.3. The HPD Interval
As described in the previous section, the Bayesian statistical approach combines the prior distribution, which is used to define the conditional probability, with likelihood function
, where
. Therefore, the posterior distribution of
is given by
When posterior distribution
is not symmetric, the HPD interval introduced by Box and Tiao [18] can be used; its defining property is that the probability density of every point inside the interval is greater than that of every point outside it. Consequently, region
W in the parameter space of
is known as the HPD region of the content
. The HPD region satisfies the following two conditions:
Similar to the studies of Maneerat and Niwitpong [6], Yosboonruang et al. [7], Chen and Shao [19], and Noyan and Pham-Gia [20], we applied the HPDinterval function in the R software suite for Step (6) in Algorithm 2 to compute the HPD intervals based on the Jeffreys rule or uniform prior for
as follows:
and
Algorithm 2 The Bayesian credible interval based on the Jeffreys rule or uniform prior
(1) Generate ; ; .
(2) Compute and .
(3) Generate , , and as given in Equation (19) and , , and as given in Equation (27) based on the Jeffreys rule or uniform prior, respectively.
(4) Compute and to obtain as given in Equation (24), and and to obtain as given in Equation (32), respectively.
(5) Repeat steps (1)–(4) 2000 times.
(6) Compute the 95% Bayesian credible interval based on the Jeffreys rule or uniform prior for as given in Equations (25) and (33), respectively.
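The difference between an equal-tailed credible interval and an HPD interval can be sketched directly from simulated draws. The helper below is a stand-in written for illustration, not the R routine used in the study. It assumes a unimodal posterior and returns the shortest window that spans roughly the requested share of the sorted draws. The skewed test draws are hypothetical.

```python
import math
import random

def hpd_interval(draws, prob=0.95):
    """Shortest interval spanning about prob of the sorted draws (unimodal case)."""
    s = sorted(draws)
    n = len(s)
    gap = max(1, min(n - 1, int(prob * n)))          # index distance between endpoints
    start = min(range(n - gap), key=lambda i: s[i + gap] - s[i])
    return s[start], s[start + gap]

def equal_tailed(draws, prob=0.95):
    """Interval that cuts equal probability from each tail of the draws."""
    s = sorted(draws)
    k = len(s) - 1
    return s[math.floor((1 - prob) / 2 * k)], s[math.ceil((1 + prob) / 2 * k)]

rng = random.Random(11)
# Hypothetical right-skewed posterior draws.
skewed = [rng.gammavariate(2.0, 1.5) for _ in range(2000)]
print(hpd_interval(skewed), equal_tailed(skewed))
```

For a right-skewed posterior the HPD interval shifts toward the mode and is no longer than the equal-tailed interval. This is why the two can give different lengths for the same set of draws.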
4. Empirical Application of the Confidence Interval Estimation Methods with Real Data
Daily rainfall data for September 2020 and 2021 from the Chomthong, Mae Taeng, and Doi Saket districts in Chiang Mai, Thailand, were supplied by the Upper Northern Region Irrigation Hydrology Center [21]. Table 5 lists the daily rainfall data from the three areas, while Figure 7 and Figure 8 present histograms of the rainfall observations and Q-Q plots of the positive rainfall data against gamma distributions, respectively. We analyzed the daily rainfall in these areas by applying the estimation methods for the confidence interval for the common mean of three ZIG distributions. The rainfall data were separated into zero and non-zero observations so that the best-fitting distribution could be determined for the positive values only. The lowest Akaike information criterion (AIC) and Bayesian information criterion (BIC) values in Table 6 and Table 7, respectively, confirm that the gamma distribution was the best fit for all three non-zero rainfall datasets.
The summary statistics computed for the rainfall datasets from Chomthong, Mae Taeng, and Doi Saket in Chiang Mai, Thailand are reported in Table 8. The estimated confidence interval for the common mean of the three rainfall datasets was
mm/day. Table 9 summarizes the computed 95% confidence intervals for the common mean of the three rainfall datasets using the proposed methods. The length of the confidence interval estimated via the fiducial GCI was the shortest, which supports the simulation results for
in the previous section. Thus, we recommend the fiducial GCI for estimating the confidence interval for the common mean of several ZIG distributions.
5. Discussion
Yan [5] first reported estimation of the confidence interval for the common mean of several gamma distributions. Meanwhile, Maneerat and Niwitpong [6] proposed estimation methods for the confidence interval for the common mean of several delta-lognormal distributions (lognormal distributions with zero observations) using the fiducial GCI and the HPD interval based on the Jeffreys rule prior. In this study, we extended these ideas to construct confidence intervals for the common mean of several ZIG distributions. Specifically, we proposed several approaches based on the fiducial GCI and on Bayesian and HPD methods with the Jeffreys rule or uniform prior. The best-performing confidence interval was selected as the one with a coverage probability close to or greater than the nominal confidence level of 0.95 and the shortest expected length.
The results of the comparative simulation study show that the coverage probabilities of the fiducial GCI and of the Bayesian and HPD intervals based on the Jeffreys rule or uniform prior were close to or greater than the nominal confidence level of 0.95 under most circumstances, and that the expected lengths of the fiducial GCI were the shortest under most circumstances. As the sample sizes increased, the coverage probabilities of all of the proposed methods improved but were still under the nominal confidence level of 0.95, and their expected lengths became shorter. The expected lengths of all of the proposed methods became longer when the shape parameter was increased and shorter when the proportion of zero values was increased. However, the coverage probabilities of the fiducial GCI were lower than the nominal confidence level of 0.95 in some cases, and in those cases the HPD interval based on the Jeffreys rule or uniform prior outperformed the fiducial GCI. Overall, the fiducial GCI and the HPD interval based on the Jeffreys rule or uniform prior performed the best in the simulation study because they fulfilled the requirements for both criteria.
Although Kaewprasert et al. [1] claimed that Bayesian and HPD methods are the most effective for estimating the mean and the difference between the means of ZIG distributions, our findings for the data and scenarios used in this study do not support this for the common mean, because the intervals obtained with the Bayesian and HPD methods were wider than those obtained with the fiducial GCI. According to our results, the fiducial GCI consistently supplied the smallest expected length and a suitable coverage probability for both
and
. However, in certain instances, the HPD interval based on the Jeffreys rule prior produced results that were consistent with those of Kaewprasert et al. [1].
In addition, we calculated the confidence interval for the common mean of three rainfall datasets from Chiang Mai, Thailand using the proposed methods. We found that the fiducial GCI once again performed the best in this empirical scenario. Our approach may be useful for estimating the rainfall in September, as this information could be important for residents in the hilly and forested regions of places such as Chiang Mai who want to avoid flooding and landslides.
6. Conclusions
We constructed estimators for the confidence interval for the common mean of several ZIG distributions using the fiducial GCI and Bayesian and HPD methods based on the Jeffreys rule or uniform prior. The coverage probability and expected length were used to assess how well they performed in various scenarios. According to the findings from the simulation study, the coverage probabilities of the fiducial GCI were greater than the nominal confidence level of 0.95, and its expected lengths were the shortest in almost all cases for and . The efficacies of the proposed methods were tested using real daily rainfall datasets from Chomthong, Mae Taeng, and Doi Saket in Chiang Mai, Thailand. Once again, the fiducial GCI outperformed the other methods by providing the shortest confidence interval, which agrees with the simulation study results. Therefore, the fiducial GCI is recommended for estimating the confidence interval for the common mean of several ZIG distributions, while the HPD interval based on the Jeffreys rule or uniform prior could also be used in some scenarios.