Statistical Debugging Using Latent Topic Models

Andrzejewski, David; Mulhern, Anne; Liblit, Ben; Zhu, Xiaojin

doi:10.1007/978-3-540-74958-5_5

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4701))

Included in the following conference series:

European Conference on Machine Learning

6184 Accesses
35 Citations

Abstract

Statistical debugging uses machine learning to model program failures and help identify root causes of bugs. We approach this task using a novel Delta-Latent-Dirichlet-Allocation model. We model execution traces attributed to failed runs of a program as being generated by two types of latent topics: normal usage topics and bug topics. Execution traces attributed to successful runs of the same program, however, are modeled by usage topics only. Joint modeling of both kinds of traces allows us to identify weak bug topics that would otherwise remain undetected. We perform model inference with collapsed Gibbs sampling. In quantitative evaluations on four real programs, our model produces bug topics highly correlated to the true bugs, as measured by the Rand index. Qualitative evaluation by domain experts suggests that our model outperforms existing statistical methods for bug cause identification, and may help support other software tasks not addressed by earlier models.

This research was supported in part by AFOSR Grant FA9550-07-1-0210, NSF Grant CCF-0621487, and NLM Training Grant 5T15LM07359.

Download to read the full chapter text

Chapter PDF

Latent Dirichlet Allocation (LDA) Based on Automated Bug Severity Prediction Model

Studying software logging using topic models

Article 30 January 2018

Predicting Bug-Fix Time: Using Standard Versus Topic-Based Text Categorization Techniques

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

Arumuga Nainar, P., Chen, T., Rosin, J., Liblit, B.: Statistical debugging using compound boolean predicates. In: Elbaum, S. (ed.) International Symposium on Software Testing and Analysis, July 9–12, 2007, London, United Kingdom (2007)
Google Scholar
Dickinson, W., Leon, D., Podgurski, A.: Finding failures by cluster analysis of execution profiles. In: Proceedings of the 23rd International Conference on Software Engeneering (ICSE-01), pp. 339–348. IEEE Computer Society, Los Alamitos (2001)
Chapter Google Scholar
Hangal, S., Lam, M.S.: Tracking down software bugs using automatic anomaly detection. In: ICSE 2002: Proceedings of the 24th International Conference on Software Engineering, pp. 291–301. ACM Press, New York (2002)
Chapter Google Scholar
Jones, J.A., Harrold, M.J.: Empirical evaluation of the Tarantula automatic fault-localization technique. In: ASE 2005: Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering, pp. 273–282. ACM Press, New York (2005)
Chapter Google Scholar
Liblit, B., Naik, M., Zheng, A.X., Aiken, A., Jordan, M.I.: Scalable statistical bug isolation. In: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, June 12–15 2005, Chicago, Illinois (2005)
Google Scholar
Liu, C., Yan, X., Fei, L., Han, J., Midkiff, S.P.: SOBER: statistical model-based bug localization. In: Wermelinger, M., Gall, H. (eds.) ESEC/SIGSOFT FSE, pp. 286–295. ACM, New York (2005)
Google Scholar
Zheng, A.X., Jordan, M.I., Liblit, B., Aiken, A.: Statistical debugging of sampled programs. In: Thrun, S., Saul, L., Schölkopf, B. (eds.) NIPS 16, MIT Press, Cambridge, MA (2004)
Google Scholar
Zheng, A.X., Jordan, M.I., Liblit, B., Naik, M., Aiken, A.: Statistical debugging: Simultaneous identification of multiple bugs. In: ICML (2006)
Google Scholar
Hofmann, T.: Probabilistic latent semantic analysis. In: Proc. of Uncertainty in Artificial Intelligence, UAI 1999, Stockholm (1999)
Google Scholar
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022 (2003)
Article MATH Google Scholar
Griffiths, T., Steyvers, M.: Finding scientific topics. Proceedings of the National Academy of Sciences 101(suppl. 1), 5228–5235 (2004)
Article Google Scholar
Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society Press, Los Alamitos (2005)
Google Scholar
Liblit, B.: Cooperative Bug Isolation: Winning Thesis of the 2005 ACM Doctoral Dissertation Competition. LNCS, vol. 4440. Springer, Heidelberg (2007)
MATH Google Scholar
Liblit, B.: The Cooperative Bug Isolation Project, http://www.cs.wisc.edu/cbi/
Kass, R., Raftery, A.: Bayes factors. Journal of the American Statistical Association 90, 773–795 (1995)
Article MATH Google Scholar
EXIF Tag Parsing Library, http://libexif.sf.net/
Do, H., Elbaum, S., Rothermel, G.: Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empirical Software Engineering: An International Journal 10(4), 405–435 (2005)
Article Google Scholar
Rothermel, G., Elbaum, S., Kinneer, A., Do, H.: Software-artifact intrastructure repository (September 2006), http://sir.unl.edu/portal/
Schleimer, S., Wilkerson, D.S., Aiken, A.: Winnowing: local algorithms for document fingerprinting. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data 2003, San Diego, California, June 09–12, 2003, pp. 76–85. ACM Press, New York (2003)
Chapter Google Scholar
Rand, W.M.: Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66, 846–850 (1971)
Article Google Scholar
Lal, A., Lim, J., Polishchuk, M., Liblit, B.: Path optimization in programs and its application to debugging. In: Sestoft, P. (ed.) 15th European Symposium on Programming, Vienna, Austria, pp. 246–263. Springer, Heidelberg (2006)
Google Scholar
Blei, D.M., Griffiths, T.L., Jordan, M.I., Tenenbaum, J.B.: Hierarchical topic models and the nested Chinese restaurant process. In: NIPS 16 (2003)
Google Scholar
Pang, B., Lee, L.: A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the Association for Computational Linguistics, pp. 271–278 (2004)
Google Scholar

Download references

Author information

Authors and Affiliations

Computer Sciences Department, University of Wisconsin, Madison WI 53706, USA
David Andrzejewski, Anne Mulhern, Ben Liblit & Xiaojin Zhu

Authors

David Andrzejewski
View author publications
You can also search for this author in PubMed Google Scholar
Anne Mulhern
View author publications
You can also search for this author in PubMed Google Scholar
Ben Liblit
View author publications
You can also search for this author in PubMed Google Scholar
Xiaojin Zhu
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Joost N. Kok Jacek Koronacki Raomon Lopez de Mantaras Stan Matwin Dunja Mladenič Andrzej Skowron

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Andrzejewski, D., Mulhern, A., Liblit, B., Zhu, X. (2007). Statistical Debugging Using Latent Topic Models. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. ECML 2007. Lecture Notes in Computer Science(), vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_5

Download citation

DOI: https://doi.org/10.1007/978-3-540-74958-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Statistical Debugging Using Latent Topic Models

Abstract

Chapter PDF

Similar content being viewed by others

Latent Dirichlet Allocation (LDA) Based on Automated Bug Severity Prediction Model

Studying software logging using topic models

Predicting Bug-Fix Time: Using Standard Versus Topic-Based Text Categorization Techniques

Keywords

References

Author information

Authors and Affiliations

Editor information

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Statistical Debugging Using Latent Topic Models

Abstract

Chapter PDF

Similar content being viewed by others

Latent Dirichlet Allocation (LDA) Based on Automated Bug Severity Prediction Model

Studying software logging using topic models

Predicting Bug-Fix Time: Using Standard Versus Topic-Based Text Categorization Techniques

Keywords

References

Author information

Authors and Affiliations

Editor information

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.