Predicting Bad Commits

Finding bugs by learning their socio-organizational patterns

Christian Graber, Verifyter, San Jose CA, USA (christian.graber@verifyter.com)


Daniel Hansson, Verifyter, Lund, Sweden (daniel.hansson@verifyter.com)
Adam Tornhill, Empear, Malmö, Sweden (adam.tornhill@empear.com)

Abstract—This paper explores the feasibility of predicting bad commits before they happen and how this
capability can be used in the context of CI and regression testing. Using standard machine learning techniques,
we demonstrate that it is possible to achieve 34% precision in bug prediction on a commercial ASIC IP project;
that means roughly 1 in 3 predictions was correct. Key to achieving this outcome is the combination of PinDown,
Code Maat and feature engineering. PinDown is an automatic debugger of regression test failures. Code Maat is a
free tool for mining and analyzing data from version control systems.

Keywords—debugging; bug prediction; machine learning; continuous integration.

I. INTRODUCTION
Continuous Integration (CI) systems are now commonplace in the hardware design flow. CI systems make
sure code updates compile and pass a small smoke test suite before they are committed to revision control.
Frequent updates are encouraged to reduce the risk of integration failures, but for larger design teams this can
lead to significant use of farm resources. Furthermore, the coverage of the smoke test suite is typically low to
average, which increases the chance of failures post-integration. This paper proposes a method to lessen both the
burden on farm resources and the probability of post-integration failures, by flagging bad commits before they
even enter the CI stage and by running regression test suites customized to risk profiles.

II. FEATURE SELECTION

In this paper we use supervised machine learning techniques [1]. The anonymized training data stems
from a large-team commercial ASIC project that used PinDown [2] to automatically find regression bugs. A
regression bug is a commit that breaks one or more tests. Raw training data was extracted from the Perforce
revision control system for a period of 8 months. The labels are the bugs found by PinDown: if a commit was
identified as a bug by PinDown the label is True, otherwise it is False. These labels are very accurate; the false
positive rate in this project is below 1%, and PinDown label accuracy typically exceeds 99% in real-life projects.
Altogether the data set has 36,793 commits containing 93 bugs, which makes it highly imbalanced. Because of its
small size we opted to try traditional machine learning techniques first. In this context the performance of
machine learning depends largely on the quality of the features, so feature engineering played a central role in
achieving our results. We have partnered with Empear, who have open-sourced their feature extraction tool
Code Maat [3]. Code Maat looks at the evolution of the code base by blending technical, social and organizational
information to find patterns. In addition to the features produced by Code Maat, Verifyter has added a set of
proprietary features.
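As a minimal sketch of how such labels could be assembled, assuming the PinDown results are available as a plain list of bad commit identifiers and the Perforce history as a CSV of commits (both file names and formats are illustrative assumptions, not actual tool output):

    import pandas as pd

    # Hypothetical inputs: a CSV export of 8 months of Perforce commits and a
    # plain-text list of the commit IDs that PinDown flagged as test-breaking.
    commits = pd.read_csv("commits.csv")              # one row per commit, column "commit_id"
    with open("pindown_bad_commits.txt") as f:
        bad_commits = {line.strip() for line in f if line.strip()}

    # Label a commit True if PinDown identified it as a bug, False otherwise.
    commits["label"] = commits["commit_id"].isin(bad_commits)

    print(commits["label"].value_counts())            # expect roughly 93 True vs. 36,700 False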
In this paper we focus on the Code Maat features, while the Verifyter features are kept proprietary. By default,
Code Maat analyzes the number of authors per module; the authors analysis is based on the idea that the more
developers work on a module, the larger the communication challenges. Logical coupling refers to modules that
tend to change together: logically coupled modules have a hidden, implicit dependency such that a change to one
of them leads to a predictable change in the other. Code churn is related to post-release defects, as modules with
higher churn tend to have more defects. Code Maat supports several different aspects of code churn: absolute
churn, churn by author and churn by entity.
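To make these metrics concrete, the following sketch computes author counts, revision counts and absolute churn per file from a flattened change log. It only illustrates the kind of data involved and is not Code Maat's implementation; the file name and column names are assumptions.

    import pandas as pd

    # Hypothetical flattened change log: one row per (commit, file) with the
    # author and the number of added/deleted lines for that file change.
    log = pd.read_csv("change_log.csv")   # columns: commit_id, file, author, added, deleted

    per_file = log.groupby("file").agg(
        n_authors=("author", "nunique"),   # more authors -> larger communication challenges
        n_revs=("commit_id", "nunique"),   # how often the file is changed
        added=("added", "sum"),
        deleted=("deleted", "sum"),
    )
    # Absolute churn: total number of lines added plus deleted.
    per_file["abs_churn"] = per_file["added"] + per_file["deleted"]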


Figure 1: Treemap of Complexity

Code complexity is another strong feature used in our training. The tool used to analyze code complexity is
CLOC [8]. See Figure 1. The illustration itself was done with R.
Here is how to interpret Figure 1:

• Every rectangle represents one file

• The size of the rectangle corresponds to number of source code lines

• The color of the rectangle corresponds to number of comment lines

• Rectangles are grouped by language


That means dark rectangles have very few comments while bright green ones are very well commented.
Since PinDown was configured to find bugs at commit level while the features were collected at file level, the
features had to be rolled up to commit level by calculating their averages (_avg), maximums (_max) and sums
(_tot) across the files in each commit (a minimal sketch of this roll-up is given after Table 1). Table 1 lists all 51
features that were used; Code Maat features are mentioned by name, while the other features are PinDown
proprietary:
Feature           Description
author_revs_avg   Number of commits per author per file
author_revs_max   Number of commits per author per file
author_revs_tot   Number of commits per author per file
n_authors_avg     Total number of authors per file
n_authors_max     Total number of authors per file
n_authors_tot     Total number of authors per file
n_revs_avg        Total number of commits per file
n_revs_max        Total number of commits per file
n_revs_tot        Total number of commits per file
f1-f4             proprietary
soc_avg           Sum of coupling
soc_max           Sum of coupling
soc_tot           Sum of coupling
f5-f18            proprietary
cpx_total_avg     Total complexity per file
cpx_total_max     Total complexity per file
cpx_total_tot     Total complexity per file
cpx_mean_avg      Mean complexity per file
cpx_mean_max      Mean complexity per file
cpx_mean_tot      Mean complexity per file
cpx_sd_avg        Complexity standard deviation per file
cpx_sd_max        Complexity standard deviation per file
cpx_sd_tot        Complexity standard deviation per file
cpx_max_avg       Maximum complexity per file
cpx_max_max       Maximum complexity per file
cpx_max_tot       Maximum complexity per file
f19-f27           proprietary
Table 1: List of Features
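A minimal sketch of the roll-up from file level to commit level, assuming the file-level features have already been joined onto one row per (commit, file) in a pandas DataFrame (file name and column names are illustrative):

    import pandas as pd

    # One row per (commit, file) with the file-level features already attached.
    per_commit_file = pd.read_csv("commit_file_features.csv")

    feature_cols = ["n_authors", "n_revs", "soc", "cpx_total", "cpx_mean"]  # subset for brevity

    # Roll file-level features up to commit level as averages, maximums and sums.
    rolled = per_commit_file.groupby("commit_id")[feature_cols].agg(["mean", "max", "sum"])

    # Rename the aggregated columns to the _avg/_max/_tot convention of Table 1.
    suffix = {"mean": "avg", "max": "max", "sum": "tot"}
    rolled.columns = [f"{col}_{suffix[agg]}" for col, agg in rolled.columns]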

III. BALANCING THE DATA SET


In this data set only about 0.25% of the commits are faulty, which makes the classes highly imbalanced. For
training a machine learning classifier we need a fairly balanced set, so we opted for the Synthetic Minority
Over-sampling Technique (SMOTE), a common oversampling technique in data analysis [4].

Figure 2: Imbalanced Data

Figure 3: Balanced Data

Dimensionality reduction with Principal Component Analysis (PCA) was used to reduce the data set to two
dimensions for visualization [5]. With that, the effect of balancing the data set can be visualized as in Figures 2
and 3. Figure 2 shows the imbalanced set, where the majority of commits are good (red). Figure 3 visualizes how
the number of bad commits (blue) was increased by SMOTE to produce a 50% split between good and bad
commits. This balanced set was then used for training.
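A minimal sketch of the balancing and visualization steps, using the SMOTE implementation from the imbalanced-learn package and PCA from scikit-learn. The feature matrix below is a random placeholder with the same shape as the paper's data set (36,793 commits, 51 features, 93 bad); in practice X and y come from the roll-up and the PinDown labels.

    import numpy as np
    import matplotlib.pyplot as plt
    from imblearn.over_sampling import SMOTE
    from sklearn.decomposition import PCA

    # Placeholder data with the shape reported in the paper.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(36793, 51))
    y = np.zeros(36793, dtype=int)
    y[:93] = 1                                  # 93 bad commits

    # Oversample the minority class (bad commits) to a 50/50 split.
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)

    # Project to two dimensions purely for visualization (Figures 2 and 3).
    X_2d = PCA(n_components=2).fit_transform(X_bal)
    plt.scatter(X_2d[y_bal == 0, 0], X_2d[y_bal == 0, 1], s=2, c="red", label="good")
    plt.scatter(X_2d[y_bal == 1, 0], X_2d[y_bal == 1, 1], s=2, c="blue", label="bad")
    plt.legend()
    plt.show()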

IV. TRAINING THE CLASSIFIER


After some exploration of machine learning techniques we settled on XGBoost. XGBoost stands for eXtreme
Gradient Boosting and is an optimized, distributed gradient boosting library designed to be highly efficient,
flexible and portable [6]. Every classifier has its own set of parameters, called hyper-parameters, which need to
be tuned to achieve good results. For hyper-parameter exploration we used the popular SciKit-Learn data science
library [7].
Table 2 lists the best hyper-parameters found, showing only parameters that deviate from the XGBoost defaults
(a training sketch follows the table):
Hyper-parameter Value
max_depth 3
learning_rate 0.1
n_estimators 600
Table 2: Hyper-Parameters
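A sketch of the training and hyper-parameter exploration, continuing from the balanced set above and using the scikit-learn compatible wrapper that ships with XGBoost. The search grid and the scoring choice are illustrative assumptions; the paper only reports the winning values of Table 2.

    from xgboost import XGBClassifier
    from sklearn.model_selection import GridSearchCV

    # The hyper-parameters of Table 2; everything else stays at the XGBoost defaults.
    clf = XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=600)
    clf.fit(X_bal, y_bal)

    # Hyper-parameter exploration with scikit-learn (illustrative grid).
    grid = GridSearchCV(
        XGBClassifier(),
        param_grid={
            "max_depth": [3, 5, 7],
            "learning_rate": [0.01, 0.1, 0.3],
            "n_estimators": [200, 600, 1000],
        },
        scoring="precision",
        cv=3,
    )
    grid.fit(X_bal, y_bal)
    print(grid.best_params_)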

V. VALIDATING THE CLASSIFIER


Finally it was time to evaluate the classifier on the validation set, which was randomly selected from the
original data. Since both this selection and the generation of SMOTE data are random and can produce quite
different outcomes, we ran the validation many times to generate summary statistics. Table 3 shows the summary
statistics for the default threshold of 0.5 (the threshold defines at what predicted probability a commit is
considered a bug):

Metric Value
Precision Mean 0.344
Precision Standard Error 0.0035
Recall Mean 0.198
Recall Standard Error 0.0024
Table 3: Classifier Metrics Summary Statistics

Figure 4: AUC Curve of Sample Validation Set

The AUC curve in Figure 4 shows how precision relates to recall for different thresholds. Varying the
threshold allows trading off precision against recall; our focus was to prioritize precision over recall.
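A sketch of the validation step: the validation set is held out from the original, imbalanced data before balancing, and the decision threshold is swept to trade precision against recall (continuing from the placeholder X and y defined above).

    from imblearn.over_sampling import SMOTE
    from xgboost import XGBClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score, recall_score, precision_recall_curve

    # Hold out a randomly selected validation set from the original, imbalanced data.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, stratify=y)

    # Balance only the training portion with SMOTE and train as in the previous section.
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
    clf = XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=600).fit(X_bal, y_bal)

    proba = clf.predict_proba(X_val)[:, 1]          # predicted probability of being a bug
    pred = proba >= 0.5                             # default threshold of 0.5
    print(precision_score(y_val, pred), recall_score(y_val, pred))

    # Precision vs. recall over all thresholds, as plotted in Figure 4.
    precision, recall, thresholds = precision_recall_curve(y_val, proba)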

VI. OUTCOMES
The classifier trained in the previous section can then be used to produce a risk score for each commit, i.e. the
risk that the commit is faulty. In the context of regression testing this can be used to find faulty commits within a
range of commits. We have tested this idea in live projects, where the range consists of all the commits between
two subsequent regression runs. Sorting this range in chronological order is illustrated in Figure 5. This would
allow, for example, running a test suite with higher coverage when the aggregate risk level is high, and vice versa.

Figure 5: Commits in Range Sorted Chronologically

Sorting all the commits in the range by risk level makes it possible to give immediate feedback to the
committers of high-risk commits before any simulations are run. This is illustrated in Figure 6. We have verified
the accuracy of these risk levels by running regression tests on the range and having PinDown identify the faulty
commits whenever the regression failed.
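A sketch of scoring and ranking the commits in a range, with clf being the classifier trained above. The commit identifiers and feature matrix below are placeholders, and the aggregate-risk line is one possible way to combine per-commit risks under an independence assumption, consistent with the probability quoted in Section VII.

    import numpy as np
    import pandas as pd

    # Placeholders for the commits accumulated since the last regression run.
    commit_ids = [f"change_{i}" for i in range(25)]
    X_range = np.random.default_rng(1).normal(size=(25, 51))

    # Risk per commit: predicted probability that the commit is faulty.
    risk = clf.predict_proba(X_range)[:, 1]

    # Immediate feedback: commits sorted by risk, highest first (Figure 6).
    ranked = pd.DataFrame({"commit": commit_ids, "risk": risk}).sort_values(
        "risk", ascending=False
    )

    # Aggregate risk of the range, assuming independent commits: the probability
    # that at least one commit in the range is faulty.
    range_risk = 1.0 - np.prod(1.0 - risk)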

Figure 6: Commits in Range Sorted by Risk Level

The outcome of this for a live commercial project is shown in Figure 7. We gradually increased the classifier
precision to determine at what point the results become useful, and found empirically that a precision of 30% or
above starts being very useful.

Figure 7: Sample Outcomes

VII. APPLICATIONS
Risk profiles are very useful in the context of CI and regression testing. Figure 8 illustrates four different
application scenarios. The CI staging and debug scenarios are already supported by PinDown today.

Figure 8: Applications of Machine Learning

• CI staging: A commit is first staged for CI testing. Before any tests are run, immediate feedback is provided
to the committer.
• CI testing: Depending on the risk profile of all staged commits, a larger or smaller test suite can be
launched, allowing farm resources to be optimized.
• Bucketing: This is an application that does not use risk profiles. Instead, error signatures are used to
find the best-matching bucket with ML.
• Debug: Risk profiles allow optimizing the search for faulty commits by prioritizing the search in high-risk
areas.

The user interface for immediate feedback in the CI staging phase is simply a list of staged commits ordered
by risk profile. In Figure 9 six commits are predicted faulty, which at roughly 34% precision translates to a 92%
chance that at least one of them is actually faulty (1 − (1 − 0.34)^6 ≈ 0.92).

Figure 9: Staged Commits Risk Profile

VIII. CONCLUSION
The ML techniques demonstrated in this paper produced a classifier with a precision of about 34% on average
on a real-life commercial project. We found this level of precision to have very useful applications in the context
of CI and regression testing.
Going forward we expect to collect significantly more data to improve the classifier metrics. Another direction
for further exploration would be semantic analysis of source code and extraction of design patterns; in contrast,
the features described in this paper are language-agnostic. Semantic analysis and pattern detection would allow
for a better representation of, for example, code complexity. However, such techniques are significantly more
complex to implement, while the methods laid out in this paper are very straightforward to use.

REFERENCES
[1] Supervised learning, Wikipedia: https://en.wikipedia.org/wiki/Supervised_learning
[2] Verifyter PinDown automatic debugger: http://verifyter.com/technology/debug
[3] Code Maat on GitHub: https://github.com/adamtornhill/code-maat
[4] SMOTE (oversampling and undersampling in data analysis), Wikipedia: https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis
[5] Principal component analysis, Wikipedia: https://en.wikipedia.org/wiki/Principal_component_analysis
[6] XGBoost documentation: https://xgboost.readthedocs.io/en/latest/
[7] SciKit-Learn data science library: http://scikit-learn.org/
[8] CLOC: http://cloc.sourceforge.net/
[9] OpenCores TV80: https://opencores.org/projects/tv80
