2018 25th Asia-Pacific Software Engineering Conference (APSEC)

Improving Bug Localization with Character-level Convolutional Neural Network and Recurrent Neural Network
Yan Xiao, Jacky Keung
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
Email: yanxiao6-c@my.cityu.edu.hk, Jacky.Keung@cityu.edu.hk

978-1-7281-1970-0/18/$31.00 ©2018 IEEE
DOI 10.1109/APSEC.2018.00097

Abstract—Background: Automated bug localization across large numbers of source
files is a crucial task in software engineering. However, the different
representations of bug reports and source files limit the accuracy of existing
bug localization techniques. Aims: We propose a novel deep learning-based model
that improves the accuracy of bug localization by expressing bug reports and
source files at the character level and analyzing them with a language model.
Method: The proposed model is composed of two main parts: a character-level
convolutional neural network (CNN) and a recurrent neural network (RNN) language
model. Both bug reports and source files are expressed at the character level
and then input into a CNN, whose output is given to an RNN encoder-decoder
architecture. Results: The results of preliminary experiments show that the
proposed model achieves comparable or even higher accuracy than the existing
machine translation-based bug localization technique. Conclusion: The proposed
model is capable of automatically localizing buggy files for bug reports and
achieves better accuracy by analyzing them at the character level, where both
bug reports and source code can be expressed.

Index Terms—bug localization, convolutional neural network, recurrent neural
network, deep learning

I. INTRODUCTION AND MOTIVATION

Automatically localizing buggy files for bug reports remains a significant task
in software project teams, especially those involving hundreds of thousands of
source files. It is painstaking for developers to search all source files when
fixing a bug. Automated bug localization techniques have thus been proposed to
rank the source files and recommend the most relevant files to developers.
However, bug reports are written in natural languages while source files are
written in code tokens. The different expressions between them have been
empirically demonstrated to be responsible for the low accuracy of the existing
bug localization techniques [1], [2], [3], [4].

Ye et al. [2] tried to bridge the lexical gap by adding the semantic similarity
between bug reports and source files to their previously proposed
learning-to-rank model [1]. Xiao et al. [3] transformed bug reports and source
files into word vectors using word embedding techniques to preserve the
semantics, and extracted features from the word vectors using an enhanced CNN.
To distinguish bug reports and source files, Xiao et al. [4] proposed
BugTranslator, a machine translation-based bug localization technique. However,
all the existing techniques regard both bug reports and source files as natural
languages.

Code tokens in source files resemble English words in natural languages, but
there are obvious differences. Some code tokens, especially class or method
names, are not actual words that are commonly used in natural languages.
Although they are very important in bug localization, most existing studies
regarded them as unknown words [2], [3], [4]. However, both bug reports and
source files are composed of characters. They share the same representation at
the character level.

Therefore, this paper proposes a bug localization technique based on a
character-level language model. The proposed model first obtains the character
embeddings of the preprocessed bug reports and source files. Two CNNs with
multiple filters are then applied to extract features from the vectors of bug
reports and source files, respectively, and their outputs are fed into the
subsequent RNN encoder-decoder architecture. The experiments on three
open-source Java projects show the feasibility and effectiveness of the
proposed model.

II. THE PROPOSED MODEL

This section describes the proposed model, an overview of which is shown in
Figure 1.

[Figure 1: two parallel pipelines. A bug report ("Bug 360872 Remove GtkCombo and
friends.") and source code ("Gtk combo box remove text") each pass through
k-dimensional character embeddings, a convolutional layer with multiple filters,
and max pooling. The bug report features feed a chain of LSTM cells in the
encoder (labeled he0), which produces the context vector c; the source code
features feed a chain of LSTM cells in the decoder (labeled hd0), which outputs
a score.]
Fig. 1. The overview of the proposed model.

A. Data Preprocessing

We first combine the summary and description of each bug report into a new
document. Revised term frequency-user focused inverse document frequency
(TF-IDuF) is then applied to filter out some common words in the new bug
reports for the purpose of redundancy reduction [3]. We also extract two types
of Abstract Syntax Tree nodes from each source file, as in [4].

B. Character-level CNN

After bug reports and source code are preprocessed, convolution is applied to
each word of both separately. Each character in a word is transformed into a
k-dimensional character embedding, and the embeddings are then convolved by
multiple filters with different sizes (2 × k and 3 × k in Figure 1). The
subsequent max-pooling layer summarizes the features, and its outputs are given
to the encoder and decoder, respectively.

C. RNN Encoder-Decoder

1) Encoder: The output features extracted from a word in bug reports by the
character-level CNN are given to one long
short-term memory (LSTM) cell. For example, the feature
vectors of the fourth word in a bug report are fed into the fourth LSTM in the
encoder, as shown in Figure 1. The context vector c is the final state of the
encoder; it summarizes the features of the bug report and serves as one part of
the input to the decoder.

2) Decoder: Similar to the encoder, the features extracted from each word in
source code by the character-level CNN are one part of the input to each LSTM
cell. In addition, the context vector is concatenated with the output features
of each word before being fed into each LSTM.

III. THE PRELIMINARY RESULTS

To validate the feasibility and effectiveness of the proposed model, we conduct
several preliminary experiments on the before-fix versions of three open-source
Java projects¹, similar to [4]. We use 3656, 2632, and 2817 bug reports for
Project Eclipse UI, JDT, and SWT, respectively. Mean average precision (MAP)
and mean reciprocal rank (MRR) are used to evaluate the performance of the
proposed model and the competitor BugTranslator [4].

TABLE I
RESULTS OF TWO MODELS

Project      Metric   BugTranslator   Proposed model
Eclipse UI   MAP      0.36            0.35
             MRR      0.42            0.40
JDT          MAP      0.34            0.35
             MRR      0.41            0.42
SWT          MAP      0.34            0.37
             MRR      0.40            0.42

The preliminary results are shown in Table I. The MAP and MRR values of the
proposed model are better than those of BugTranslator in Project JDT and SWT.
The performance of BugTranslator is limited because it analyzes bug reports and
source files at the word level, where many out-of-vocabulary words exist.
However, BugTranslator uses an attention mechanism to put more emphasis on the
words in buggy files that are related to those in bug reports. This is
especially important when there are large numbers of source files. In Project
Eclipse UI, the number of source files is 6228, about five times the number in
Project SWT. The performance of BugTranslator is thus better than that of our
proposed model in Project Eclipse UI.

¹ https://github.com/yanxiao6/BugLocalization-dataset

IV. CONCLUSIONS AND FUTURE WORK

In this paper, bug reports and source files are analyzed at the character level
instead of the word level to suppress the effect of their different expressions
on the accuracy of bug localization. The number of unknown words is also
reduced. The proposed model applies a character-level CNN to extract features
from bug reports and source files, and its output is fed into the subsequent
RNN encoder-decoder. The preliminary results indicate the feasibility and
effectiveness of the proposed model.

We intend to enhance the proposed model with attention mechanisms and to
fine-tune it. In the future, we will conduct experiments on more projects to
assess the general performance of the proposed model.

V. ACKNOWLEDGEMENT

This work is supported in part by the General Research Fund of the Research
Grants Council of Hong Kong (No. 11208017) and the research funds of City
University of Hong Kong (No. 9678149 and 7005028), and the Research Support
Fund by Intel.

REFERENCES

[1] X. Ye, R. Bunescu, and C. Liu, “Learning to rank relevant files for bug
    reports using domain knowledge,” in Proceedings of the 22nd ACM SIGSOFT
    International Symposium on Foundations of Software Engineering. ACM, 2014,
    pp. 689–699.
[2] X. Ye, H. Shen, X. Ma, R. Bunescu, and C. Liu, “From word embeddings to
    document similarities for improved information retrieval in software
    engineering,” in Proceedings of the 38th International Conference on
    Software Engineering. ACM, 2016, pp. 404–415.
[3] Y. Xiao, J. Keung, Q. Mi, and K. E. Bennin, “Improving bug localization
    with an enhanced convolutional neural network,” in 2017 24th Asia-Pacific
    Software Engineering Conference (APSEC). IEEE, 2017, pp. 338–347.
[4] Y. Xiao, J. Keung, K. E. Bennin, and Q. Mi, “Machine translation-based bug
    localization technique for bridging lexical gap,” Information and Software
    Technology, 2018.
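
The character-level CNN of Section II-B can be illustrated with a small sketch. The paper does not publish its implementation, so everything below is an assumption made for illustration: the embeddings and filters are random rather than learned, there is no bias term or activation function, the embedding size K = 8 and the four filters per width are arbitrary, and the helper names are hypothetical. The sketch only shows the data flow: one k-dimensional vector per character, filters of size 2 × k and 3 × k slid over the character sequence, and max pooling that leaves one value per filter for each word.

```python
import random


def make_char_embeddings(alphabet, k, seed=0):
    """Map each character to a random k-dimensional vector.

    Random values stand in for learned embeddings (assumption).
    """
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(k)] for ch in alphabet}


def make_filters(width, k, count, seed):
    """Create `count` random filters, each of shape width x k (assumption)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(width)]
        for _ in range(count)
    ]


def conv_max_pool(char_vectors, conv_filter):
    """Slide one width x k filter over the characters, then max-pool."""
    width = len(conv_filter)
    k = len(conv_filter[0])
    # Pad words shorter than the filter with zero vectors so that at least
    # one window exists.
    padded = char_vectors + [[0.0] * k] * max(0, width - len(char_vectors))
    responses = []
    for start in range(len(padded) - width + 1):
        window = padded[start:start + width]
        responses.append(
            sum(
                w * x
                for f_row, x_row in zip(conv_filter, window)
                for w, x in zip(f_row, x_row)
            )
        )
    return max(responses)


def word_features(word, embeddings, filter_banks):
    """Return one max-pooled value per filter, over all filter widths."""
    k = len(next(iter(embeddings.values())))
    unknown = [0.0] * k  # characters outside the alphabet map to zeros
    char_vectors = [embeddings.get(ch, unknown) for ch in word]
    return [conv_max_pool(char_vectors, f) for bank in filter_banks for f in bank]


K = 8  # embedding size, chosen arbitrarily for this sketch
embeddings = make_char_embeddings("abcdefghijklmnopqrstuvwxyz0123456789", K)
filter_banks = [make_filters(2, K, 4, seed=1), make_filters(3, K, 4, seed=2)]
features = word_features("gtkcombo", embeddings, filter_banks)
print(len(features))  # 8: four filters of width 2 plus four of width 3
```

Because the features are built from characters, a token such as "gtkcombo" still receives a feature vector even if it never appears in a word vocabulary, which is the motivation given in Section I. In the model described above, each such per-word vector would be the input to one LSTM cell.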

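
The two metrics used in Section III can be sketched as follows. The paper does not give formulas, so this uses the standard information-retrieval definitions as an assumption: average precision is the mean of the precision values at each rank where a buggy file appears, reciprocal rank is the inverse of the rank of the first buggy file, and MAP and MRR average these over all bug reports. The function names and the file names in the example are hypothetical.

```python
def average_precision(ranked_files, buggy_files):
    """Mean of precision@rank over the ranks where a buggy file appears."""
    hits = 0
    precision_sum = 0.0
    for rank, path in enumerate(ranked_files, start=1):
        if path in buggy_files:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(buggy_files) if buggy_files else 0.0


def reciprocal_rank(ranked_files, buggy_files):
    """Inverse rank of the first buggy file, or 0.0 if none is ranked."""
    for rank, path in enumerate(ranked_files, start=1):
        if path in buggy_files:
            return 1.0 / rank
    return 0.0


def evaluate(queries):
    """Return (MAP, MRR) over (ranked_files, buggy_files) pairs."""
    n = len(queries)
    mean_ap = sum(average_precision(r, b) for r, b in queries) / n
    mean_rr = sum(reciprocal_rank(r, b) for r, b in queries) / n
    return mean_ap, mean_rr


# Two hypothetical bug reports with made-up file names.
queries = [
    (["A.java", "B.java", "C.java"], {"B.java"}),
    (["D.java", "E.java", "F.java"], {"D.java", "F.java"}),
]
map_score, mrr_score = evaluate(queries)
print(round(map_score, 4), mrr_score)  # 0.6667 0.75
```

For the first report the only buggy file is ranked second, so both its average precision and reciprocal rank are 0.5. For the second report the buggy files are ranked first and third, giving an average precision of (1/1 + 2/3) / 2 and a reciprocal rank of 1.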
