Stopped sum models and proposed variants for citation data

Low, Wan Jing; Wilson, Paul; Thelwall, Mike

doi:10.1007/s11192-016-1847-z

Published: 30 January 2016

Scientometrics, volume 107, pages 369–384 (2016)

Abstract

It is important to identify the most appropriate statistical model for citation data in order to maximise the potential of future analyses as well as to shed light on the processes that may drive citations. This article assesses stopped sum models and some variants and compares them with two previously used models, the discretised lognormal and negative binomial, using the Akaike Information Criterion (AIC). Based upon data from 20 Scopus categories, some of the stopped sum variant models had lower AIC values than the discretised lognormal models, which were otherwise the best (with respect to AIC). However, very large standard errors were returned for some of these variant models, indicating the imprecision of the estimates and the impracticality of the approach. Hence, although the stopped sum variant models show some promise for citation analysis, they are only recommended when they fit better than the alternatives and have manageable standard errors. Nevertheless, their good fit to citation data gives evidence that two different, but related, processes may drive citations.
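The kind of AIC comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the Neyman type A distribution (a Poisson-stopped sum of Poisson variables, one of the models listed in Table 4) against a plain Poisson model. The citation counts are made up, the function names are hypothetical, and the coarse grid search is only a crude stand-in for proper maximum likelihood estimation, so the resulting AIC is approximate.

```python
import math

def neyman_type_a_pmf(lam, phi, y_max):
    """P(Y = 0..y_max) for Y = X_1 + ... + X_N,
    with N ~ Poisson(lam) and each X_i ~ Poisson(phi)."""
    # weights[j] = phi**j / j!, built iteratively to avoid large factorials
    weights = [1.0]
    for j in range(1, y_max + 1):
        weights.append(weights[-1] * phi / j)
    # P(Y = 0) = exp(-lam * (1 - exp(-phi)))
    pmf = [math.exp(-lam * (1.0 - math.exp(-phi)))]
    scale = lam * phi * math.exp(-phi)
    # Panjer-type recursion for a compound Poisson distribution
    for y in range(1, y_max + 1):
        total = sum(weights[j] * pmf[y - 1 - j] for j in range(y))
        pmf.append(scale * total / y)
    return pmf

def poisson_loglik(counts, mu):
    """Log-likelihood of the counts under Poisson(mu)."""
    return sum(-mu + y * math.log(mu) - math.lgamma(y + 1) for y in counts)

def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik

def fit_neyman_grid(counts):
    """Approximate fit by coarse grid search (assumption: illustration only)."""
    y_max = max(counts)
    best = (-math.inf, None, None)
    for i in range(1, 21):          # lam = 0.2, 0.4, ..., 4.0
        lam = 0.2 * i
        for j in range(1, 17):      # phi = 0.5, 1.0, ..., 8.0
            phi = 0.5 * j
            pmf = neyman_type_a_pmf(lam, phi, y_max)
            ll = sum(math.log(pmf[y]) for y in counts)
            if ll > best[0]:
                best = (ll, lam, phi)
    return best

# Hypothetical, heavily skewed citation counts (not data from the article)
counts = [0] * 50 + [1] * 10 + [2] * 8 + [3] * 7 + [5] * 6 + [8] * 5 + [12] * 4 + [20] * 3

mu_hat = sum(counts) / len(counts)            # Poisson MLE is the sample mean
aic_poisson = aic(poisson_loglik(counts, mu_hat), 1)

ll_nta, lam_hat, phi_hat = fit_neyman_grid(counts)
aic_nta = aic(ll_nta, 2)

print("Poisson AIC:", round(aic_poisson, 1))
print("Neyman type A AIC (grid approximation):", round(aic_nta, 1))
```

On skewed counts with many zeros, such as these, the two-parameter stopped sum model would be expected to give the lower AIC despite the penalty for its extra parameter. As the abstract cautions, a lower AIC alone is not sufficient: the standard errors of the estimates, which this sketch does not compute, would also need to be checked.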

Author information

Authors and Affiliations

Statistical Cybermetrics Research Group, School of Mathematics and Computer Science, University of Wolverhampton, Wulfruna Street, Wolverhampton, WV1 1LY, UK
Wan Jing Low, Paul Wilson & Mike Thelwall

Corresponding author

Correspondence to Wan Jing Low.

Appendix

Tables 3, 4, 5 and 6.

Table 3. AIC for all subjects for each stopped sum variant model, compared with the discretised lognormal and negative binomial models

Table 4. AIC for all subjects for the Poisson, Neyman type A, Polya-Aeppli, PIG, ZIP and ZINB models, compared with the discretised lognormal and negative binomial models

Table 5. Estimated parameters of the negative binomial model with the SVA models

Table 6. Estimated parameters of the negative binomial model with the modified SVB models

About this article

Cite this article

Low, W.J., Wilson, P. & Thelwall, M. Stopped sum models and proposed variants for citation data. Scientometrics 107, 369–384 (2016). https://doi.org/10.1007/s11192-016-1847-z

Received: 26 June 2015
Published: 30 January 2016
Issue Date: May 2016
DOI: https://doi.org/10.1007/s11192-016-1847-z
