Abstract
Negative binomial regression is a powerful technique for modeling count data, particularly overdispersed counts. However, estimating the parameters of high-dimensional sparse models is challenging because optimizing both the mean parameters and the dispersion parameter of the negative binomial distribution is complex. To address this issue, the authors propose an approach built on two majorize-minimize (MM) iterations, one for estimating the dispersion parameter and the other for estimating the mean parameters. This two-step scheme improves the convergence speed and stability of the algorithm. The authors also use a group penalty for variable selection, which enhances the accuracy and efficiency of the algorithm. The proposed method provides an explicit solution, simplifies the iteration process, and maintains good stability while ensuring convergence. Furthermore, the authors apply the proposed algorithm to the zero-inflated model and demonstrate its promising predictive performance on specific data sets. The research has important implications for count data modeling and analysis in fields such as data mining, machine learning, and bioinformatics.
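The alternating structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the update formulas, so the sketch assumes an NB2 model with a log link and a group lasso penalty, updates the mean parameters with one MM step based on a global quadratic majorizer (which has an explicit group soft-thresholding solution), and swaps in a safeguarded golden-section search for the dispersion parameter in place of an MM update. All function names (`nb_nll`, `mm_step_beta`, `update_theta`, `fit`, `simulate`) and the toy data are hypothetical.

```python
import math
import random


def nb_nll(beta, theta, X, y):
    """Negative NB2 log-likelihood, log link: mu_i = exp(x_i . beta)."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        log_denom = math.log(theta + math.exp(eta))
        total -= (math.lgamma(yi + theta) - math.lgamma(theta) - math.lgamma(yi + 1.0)
                  + theta * (math.log(theta) - log_denom) + yi * (eta - log_denom))
    return total


def group_penalty(beta, groups, lam):
    """Assumed penalty (group lasso): lam * sum_g sqrt(|g|) * ||beta_g||_2."""
    return lam * sum(math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g))
                     for g in groups)


def mm_step_beta(beta, theta, X, y, groups, lam):
    """One MM step for beta: minimize a quadratic majorizer plus the group penalty."""
    # The curvature in eta_i is theta*mu*(y+theta)/(theta+mu)^2 <= (y+theta)/4,
    # so L bounds the largest Hessian eigenvalue (Frobenius-norm bound on X).
    L = max((yi + theta) / 4.0 for yi in y) * sum(v * v for xi in X for v in xi)
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
        resid = theta * (yi - mu) / (theta + mu)  # d loglik / d eta_i
        for j, v in enumerate(xi):
            grad[j] -= resid * v
    z = [b - g / L for b, g in zip(beta, grad)]
    new_beta = list(z)  # coordinates outside every group (the intercept) stay unpenalized
    for g in groups:
        norm = math.sqrt(sum(z[j] ** 2 for j in g))
        thresh = lam * math.sqrt(len(g)) / L
        shrink = 1.0 - thresh / norm if norm > thresh else 0.0
        for j in g:
            new_beta[j] = shrink * z[j]  # explicit group soft-thresholding solution
    return new_beta


def update_theta(beta, theta, X, y, iters=30):
    """Golden-section search on log(theta); a stand-in, not an MM update."""
    lo = max(math.log(theta) - 2.0, math.log(1e-3))
    hi = min(math.log(theta) + 2.0, math.log(1e3))
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a, b = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
        if nb_nll(beta, math.exp(a), X, y) < nb_nll(beta, math.exp(b), X, y):
            hi = b
        else:
            lo = a
    cand = math.exp(0.5 * (lo + hi))
    # Safeguard: keep the old value unless the candidate does not increase the NLL.
    return cand if nb_nll(beta, cand, X, y) <= nb_nll(beta, theta, X, y) else theta


def fit(X, y, groups, lam, n_iter=100):
    """Alternate the beta step and the theta step; record the penalized objective."""
    beta, theta, history = [0.0] * len(X[0]), 1.0, []
    for _ in range(n_iter):
        beta = mm_step_beta(beta, theta, X, y, groups, lam)
        theta = update_theta(beta, theta, X, y)
        history.append(nb_nll(beta, theta, X, y) + group_penalty(beta, groups, lam))
    return beta, theta, history


def simulate(n, seed=0):
    """Toy data: intercept plus two groups of two covariates; the second group is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(4)]
        mu = math.exp(0.3 + 0.5 * xi[1] - 0.4 * xi[2])
        rate = rng.gammavariate(2.0, mu / 2.0)  # gamma mixing gives NB with theta = 2
        limit, count, prod = math.exp(-rate), 0, rng.random()
        while prod > limit:  # Knuth's Poisson sampler
            count += 1
            prod *= rng.random()
        X.append(xi)
        y.append(count)
    return X, y


if __name__ == "__main__":
    X, y = simulate(60)
    beta, theta, history = fit(X, y, groups=[[1, 2], [3, 4]], lam=2.0)
    print(beta, theta, history[-1])
```

Because the quadratic bound holds globally and the dispersion step is safeguarded, the recorded objective should not increase from one iteration to the next. The Frobenius-norm bound keeps the sketch short but makes each step conservative; a tighter curvature bound would be the natural refinement.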
Ethics declarations
JIN Baisuo is an editorial member for Journal of Systems Science & Complexity and was not involved in the editorial review or the decision to publish this article. All authors declare that there are no competing interests.
Additional information
This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 72111530199, 12231017 and 72293573, and in part by the Natural Science Foundation of Anhui Province of China under Grant No. 2108085J02.
This paper was recommended for publication by Editor TANG Liansheng.
About this article
Cite this article
Li, M., Jin, B. Advanced Algorithm for Parameters Estimation of Negative Binomial Distribution with High Dimensional Sparse Group Structure. J Syst Sci Complex 37, 2173–2195 (2024). https://doi.org/10.1007/s11424-024-3202-4
DOI: https://doi.org/10.1007/s11424-024-3202-4