Mining SW Data
María Gómez
What?
Why?
How?
What is Mining Software Repositories (MSR)?
Software Data → Mining → Actionable Information
What is Mining Software Repositories (MSR)?
Main goals:
Examples:
• Version control systems (CVS, SVN, Git, Mercurial)
• Bug repositories (Bugzilla, JIRA)
• Mailing lists (e-mails, wiki pages)
• Development collaboration sites (StackOverflow)
What to mine?
Examples:
• Code bases (SourceForge, GoogleCode)
• Project ecosystems (GitHub)
What to mine?
Examples:
• Crash reports
• Field logs
• Execution traces
What to mine?
Other Repositories
Examples:
• App Stores (Google Play Store, Apple App Store)
• Contain mobile apps and user feedback (reviews, ratings)
What to mine?
Historical Repositories, Runtime Repositories, Code Repositories, Other Repositories
Cross-link the repositories!
Why MSR?
• Project Manager
• Developers
• Designers
• Testers
• Usability engineers
• Engineers
MSR
• Post-release maintenance
Applications of MSR
• New bug report
• Mark duplicate
• New change
• Suggest APIs
Repositories → Actionable Information
MSR Process
Repositories → Actionable Information
Data Extraction
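As a minimal sketch of the extraction step, the snippet below turns version-control history into structured records. It assumes the log text was exported beforehand with `git log --pretty=format:'%h|%an|%aI|%s'` (abbreviated hash, author name, ISO 8601 author date, subject); the `Commit` class, the `parse_git_log` helper and the sample lines are hypothetical and only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Commit:
    sha: str
    author: str
    date: datetime
    subject: str


def parse_git_log(text):
    """Parse lines of the assumed form 'hash|author|iso-date|subject'."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split at most 3 times so a '|' inside the subject is preserved.
        sha, author, date, subject = line.split("|", 3)
        commits.append(Commit(sha, author, datetime.fromisoformat(date), subject))
    return commits


# Hypothetical sample output, not taken from a real project.
sample_log = """a1b2c3d|Alice|2024-01-05T10:00:00+00:00|Fix bug #42: null check in parser
e4f5a6b|Bob|2024-01-06T12:30:00+00:00|Add export feature"""

commits = parse_git_log(sample_log)
for c in commits:
    print(c.sha, c.author, c.date.date(), c.subject)
```

The sketch breaks if an author name itself contains `|`; a real extraction would need a separator that cannot occur in the data.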
Repositories → Actionable Information
Data Analysis
• Quantitative vs qualitative
• Quantitative: regression models
• Qualitative: grounded theory
Example:
What factors contribute to delays on bug fixing time most?
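A rough quantitative sketch of how such a question could be approached: correlate each candidate factor with the fixing time, rank the factors, and fit a least-squares line for the strongest one. The factor names and all numbers below are hypothetical toy data, and correlation alone does not establish that a factor causes delays.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Hypothetical data: days needed to fix five bugs, plus two candidate factors.
fix_days = [2, 4, 6, 8, 10]
factors = {
    "num_comments": [1, 2, 3, 4, 5],
    "reporter_experience_years": [3, 1, 4, 1, 5],
}

# Rank factors by the strength of their linear relationship with fixing time.
ranking = sorted(factors,
                 key=lambda name: abs(pearson(factors[name], fix_days)),
                 reverse=True)
slope, intercept = fit_line(factors[ranking[0]], fix_days)
print(ranking, slope, intercept)
```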
Types of Empirical Analysis
Grounded theory
• Classification
• Clustering
Data mining techniques
Association Rules and Frequent Patterns
• Find frequent patterns in a database
• Itemset: set of items
• Support of itemsets
• Confidence of rules
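The support and confidence measures above can be sketched in a few lines. In this example each transaction is the set of files changed together in one commit; the file names and the helper names `support` and `confidence` are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # rule never applies
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


# Hypothetical transactions: files changed together in four commits.
transactions = [
    frozenset({"parser.c", "parser.h"}),
    frozenset({"parser.c", "parser.h", "lexer.c"}),
    frozenset({"parser.c", "lexer.c"}),
    frozenset({"README"}),
]

s = support({"parser.c", "parser.h"}, transactions)
c = confidence({"parser.c"}, {"parser.h"}, transactions)
print(s, c)
```

Here the itemset {parser.c, parser.h} appears in 2 of 4 transactions, and parser.h appears in 2 of the 3 transactions that contain parser.c.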
• Supervised learning
• R
http://www.r-project.org/
Free software for statistical computing and graphics
• Weka
http://www.cs.waikato.ac.nz/ml/weka/
Open-source tool containing a collection of machine learning and data mining algorithms.
MSR Process
Repositories → Actionable Information
Data Synthesis
• Developer feedback
• Bug prediction
• Quality assurance
• Architecture analysis
• ………
What can we learn from software data?
When do changes induce fixes? Jacek Śliwerski, Thomas Zimmermann and Andreas Zeller. (MSR ’05)
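Studies of this kind first need to cross-link the version history with the bug database. A common heuristic is to look for bug identifiers in commit messages and keep only those that exist in the bug repository. The sketch below illustrates that idea under stated assumptions; it is not the cited authors' implementation, and the regular expression, helper names and sample data are hypothetical.

```python
import re

# Assumed convention: a keyword such as "bug", "issue" or "fix(es/ed)"
# followed by an optional '#' and a number.
BUG_REF = re.compile(r"\b(?:bug|issue|fix(?:es|ed)?)\b\s*#?\s*(\d+)",
                     re.IGNORECASE)


def referenced_bugs(message):
    """Return the set of bug numbers mentioned in a commit message."""
    return {int(num) for num in BUG_REF.findall(message)}


def link_commits(commit_messages, known_bug_ids):
    """Map commit hash -> bug ids that are both mentioned and known."""
    links = {}
    for sha, message in commit_messages.items():
        bugs = referenced_bugs(message) & known_bug_ids
        if bugs:
            links[sha] = bugs
    return links


# Hypothetical sample data.
commit_messages = {
    "a1b2c3d": "Fix bug #42: null check in parser",
    "e4f5a6b": "Refactor build scripts",
    "0c9d8e7": "Fixes 1234 and issue #7",
}
known_bug_ids = {7, 42}

links = link_commits(commit_messages, known_bug_ids)
print(links)
```

Checking the numbers against the bug database filters out references such as 1234 that do not correspond to a known report.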
Can we predict bugs? (2)
How Long will it Take to Fix This Bug? C. Weiß, R. Premraj, T. Zimmermann, A. Zeller. (MSR ’07)
Can we identify duplicate bug reports?
Search-Based Duplicate Defect Detection: An Industrial Experience. Amoui, M., Kaushik, N., Al-Dabbagh, A., Tahvildari, L., Li, S., & Liu, W. (MSR ’13)
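One simple way to flag candidate duplicates is to compare the text of a new report with existing ones. The sketch below uses cosine similarity over word counts; this is a generic textual-similarity technique swapped in for illustration, not the method of the cited paper. The threshold of 0.5, the bug ids and the report texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def likely_duplicates(new_report, existing, threshold=0.5):
    """Return (score, bug_id) pairs at or above the threshold, best first."""
    scored = [(cosine_similarity(new_report, text), bug_id)
              for bug_id, text in existing.items()]
    return sorted((pair for pair in scored if pair[0] >= threshold),
                  reverse=True)


# Hypothetical bug reports.
existing = {
    101: "App crashes when opening settings page",
    102: "Add dark mode to the editor",
}
candidates = likely_duplicates("Crash when opening the settings page", existing)
print(candidates)
```

Without stemming, "crash" and "crashes" count as different words, so a practical detector would normalize terms and weight rare words more heavily.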
Change Propagation
How does a change in one source code entity propagate to other entities?
Predicting Change Propagation in Software Systems. Ahmed E. Hassan and Richard C. Holt (ICSM ’04)
Classify Changes as Buggy or Clean
• Can we warn developers that there is a bug in a change?
Automatic Identification of Bug-Introducing Changes. Kim, S., Zimmermann, T., Pan, K., & Whitehead Jr., E. J. (ASE ’06)
Classification of security bug reports
• Interpret themes
Mining questions about software energy consumption. Pinto, G., Castor, F., & Liu, Y. D. (MSR ’14)
API change and fault proneness
• Relationship between success of Android apps and Android API instability
• API change and fault proneness impact success
API change and fault proneness: a threat to the success of Android apps. M. Linares-Vásquez et al. (FSE ’13)
Recommending and Localizing Change Requests for Mobile Apps based on User Reviews
• Automatic classification of user reviews from Google Play store
Recommending and Localizing Change Requests for Mobile Apps based on User Reviews. F. Palomba et al. (ICSE ’17)
MSR in Practice