Abstract
It is important to identify the most appropriate statistical model for citation data, both to maximise the potential of future analyses and to shed light on the processes that may drive citations. This article assesses stopped sum models and some variants and compares them, using the Akaike Information Criterion (AIC), with two previously used models: the discretised lognormal and the negative binomial. Based upon data from 20 Scopus categories, some of the stopped sum variant models had lower AIC values than the discretised lognormal models, which otherwise fit best by this criterion. However, very large standard errors were returned for some of these variant models, indicating that the estimates are imprecise and the approach impractical. Hence, although the stopped sum variant models show some promise for citation analysis, they are only recommended when they fit better than the alternatives and have manageable standard errors. Nevertheless, their good fit to citation data provides evidence that two different but related processes may drive citations.
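Model comparison by AIC uses AIC = 2k − 2 ln L̂, where k is the number of fitted parameters and L̂ the maximised likelihood; the model with the lower AIC is preferred. The following is a minimal Python sketch of such a comparison on a small set of hypothetical citation counts. It is not the authors' code (the article's references point to R), and it omits the discretised lognormal and the proposed variants. It fits a Poisson, a negative binomial and a Neyman type A distribution. The Neyman type A is a Poisson stopped sum of Poissons: a Poisson number of "clusters", each contributing a Poisson number of citations, and it is used here only as one example of a stopped sum model. All function names are hypothetical. The fitting relies on the fact that, for both two-parameter models, the maximum likelihood estimate of the mean equals the sample mean. This reduces each fit to a one-dimensional golden-section search, which assumes the profile likelihood is unimodal within the chosen search range.

```python
import math


def poisson_loglik(data, mu):
    """Log-likelihood of a Poisson model with mean mu."""
    return sum(x * math.log(mu) - mu - math.lgamma(x + 1) for x in data)


def negbin_loglik(data, mu, r):
    """Log-likelihood of a negative binomial with mean mu and size (dispersion) r."""
    return sum(
        math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu))
        for x in data
    )


def neyman_a_pmf(max_x, lam, phi):
    """P(X = 0..max_x) for the Neyman type A distribution: a Poisson(lam) number
    of clusters, each adding a Poisson(phi) number of citations.
    Recurrence derived from the pgf G(s) = exp(lam * (exp(phi * (s - 1)) - 1))."""
    w = [1.0]  # w[j] = phi**j / j!, built iteratively to avoid overflow
    for j in range(1, max_x + 1):
        w.append(w[-1] * phi / j)
    c = lam * phi * math.exp(-phi)
    p = [math.exp(-lam * (1.0 - math.exp(-phi)))]
    for x in range(max_x):
        p.append(c / (x + 1) * sum(w[j] * p[x - j] for j in range(x + 1)))
    return p


def neyman_a_loglik(data, lam, phi):
    p = neyman_a_pmf(max(data), lam, phi)
    return sum(math.log(p[x]) for x in data)


def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximiser of f on [lo, hi] (assumes unimodality)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c  # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d  # maximum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2


def fit_aic(data):
    """AIC = 2k - 2 log L at the maximum likelihood estimates, for each model."""
    mean = sum(data) / len(data)

    # For NB and Neyman type A the ML estimate of the mean equals the sample
    # mean, so each fit reduces to a 1-D profile over a log-scale parameter.
    def nb(t):
        return negbin_loglik(data, mean, math.exp(t))

    def nta(t):
        return neyman_a_loglik(data, mean / math.exp(t), math.exp(t))

    return {
        "poisson": 2 * 1 - 2 * poisson_loglik(data, mean),
        "negbin": 2 * 2 - 2 * nb(golden_max(nb, -5.0, 10.0)),
        "neyman_a": 2 * 2 - 2 * nta(golden_max(nta, -4.0, 3.0)),
    }


# Hypothetical, overdispersed citation counts (not data from the article).
citations = [0] * 30 + [1] * 10 + [2] * 6 + [3] * 4 + [5] * 3 + [8] * 3 + [12] * 2 + [20] * 2
for model, value in sorted(fit_aic(citations).items(), key=lambda kv: kv[1]):
    print(f"{model:9s} AIC = {value:.1f}")  # lowest (preferred) first
```

The printout lists the models from the lowest (preferred) AIC upward. Standard errors, which the abstract notes can be very large for some variants, would additionally require the curvature of the log-likelihood at the optimum and are not computed in this sketch.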



About this article
Cite this article
Low, W.J., Wilson, P. & Thelwall, M. Stopped sum models and proposed variants for citation data. Scientometrics 107, 369–384 (2016). https://doi.org/10.1007/s11192-016-1847-z