Abstract
Open source software (OSS) development communities are typically highly specialised and, at the same time, experience high turnover. The combination of specialisation and turnover can leave parts of a system implemented in a certain programming language unmaintainable if knowledge of that language disappears together with the retiring developers.
Inspired by measures of linguistic diversity from the study of natural languages, we propose a method to quantify the risk of not having maintainers for code implemented in a certain programming language. To illustrate our approach, we studied risks associated with different languages in Emacs, and found examples of low risk due to high popularity (e.g., C, Emacs Lisp); low risk due to similarity with popular languages (e.g., C++, Java, Python); or high risk due to both low popularity and low similarity with popular languages (e.g., Lex). Our results show that methods from the social sciences can be successfully applied in the study of information systems, and open numerous avenues for future research.
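As an illustration of the kind of linguistic diversity measure the approach draws on, the following is a minimal sketch of two of Greenberg's classic indices: the unweighted index A (the probability that two randomly drawn individuals do not share a language) and the similarity-weighted index B. It is not the risk measure proposed in the paper. The function names, the developer shares, and the similarity value are hypothetical and chosen only for illustration.

```python
from itertools import product


def greenberg_a(shares):
    """Unweighted diversity: 1 - sum of squared population shares."""
    return 1.0 - sum(p * p for p in shares.values())


def greenberg_b(shares, similarity):
    """Similarity-weighted diversity: 1 - sum over pairs of p_i * p_j * r_ij.

    r_ij is 1.0 for identical languages, the given similarity for listed
    pairs, and 0.0 for pairs that are not listed (an assumption of this sketch).
    """
    total = 0.0
    for a, b in product(shares, repeat=2):
        r = 1.0 if a == b else similarity.get(frozenset((a, b)), 0.0)
        total += shares[a] * shares[b] * r
    return 1.0 - total


# Hypothetical shares of developers per language (must sum to 1).
shares = {"C": 0.5, "Emacs Lisp": 0.4, "Lex": 0.1}
# Hypothetical pairwise similarity in [0, 1]; not taken from the paper.
similarity = {frozenset(("C", "Lex")): 0.2}

index_a = greenberg_a(shares)
index_b = greenberg_b(shares, similarity)
print(f"A = {index_a:.2f}, B = {index_b:.2f}")
```

With these made-up inputs, B is lower than A because treating C and Lex as partly similar reduces the effective diversity. The same intuition underlies the abstract's observation that a language similar to a popular one carries less risk.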
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Vasilescu, B., Serebrenik, A., van den Brand, M.G.J. (2013). The Babel of Software Development: Linguistic Diversity in Open Source. In: Jatowt, A., et al. Social Informatics. SocInfo 2013. Lecture Notes in Computer Science, vol 8238. Springer, Cham. https://doi.org/10.1007/978-3-319-03260-3_34
DOI: https://doi.org/10.1007/978-3-319-03260-3_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03259-7
Online ISBN: 978-3-319-03260-3
eBook Packages: Computer Science (R0)