Combining Sentence Similarity Measures to Identify Paraphrases
Abstract
Paraphrase identification is the task of verifying whether two sentences are semantically equivalent. It is applied in many natural language tasks, such as text summarization, information retrieval, text categorization, and machine translation. In general, paraphrase identification methods perform three steps. First, they represent the sentences as vectors, using bags of words or syntactic information about the words present in the sentences. Next, this representation is used to compute different similarities between the two sentences. In the third step, these similarities are given as input to a machine learning algorithm that classifies the two sentences as a paraphrase or not. However, two important problems in the area of paraphrase identification are not handled: (i) the meaning problem, in which two sentences share the same meaning but are composed of different words; and (ii) the word order problem, in which the order of the words in a sentence may change the meaning of the text. This paper proposes a paraphrase identification system that represents each pair of sentences as a combination of different similarity measures. These measures extract the lexical, syntactic, and semantic components of the sentences, encompassed in a graph. The proposed method was benchmarked using the Microsoft Research Paraphrase Corpus, which is the publicly available standard dataset for the task. Different machine learning algorithms were applied to classify a sentence pair as a paraphrase or not. The results show that the proposed method outperforms state-of-the-art systems.
© 2017 Elsevier Ltd. All rights reserved.
1. Introduction
The degree of similarity between phrases is measured by sentence similarity, or short-text similarity, methods. These methods should also address the measurement of sentences with partial information, for example when one sentence is split into two or more short texts, or when phrases contain two or more sentences. One specific task derived from sentence similarity is Paraphrase Identification (PI). This task aims to verify whether two sentences
This paper has been recommended for acceptance by Pascale Fung.
* Corresponding author.
E-mail addresses: rafael.mello@ufrpe.br, rafaelflmello@gmail.com (R. Ferreira), gdcc@cin.ufpe.br (G.D.C. Cavalcanti), fred@cin.ufpe.br
(F. Freitas), rdl@cin.ufpe.br (R.D. Lins), steven.simske@hp.com (S.J. Simske), marcelo.riss@hp.com (M. Riss).
http://dx.doi.org/10.1016/j.csl.2017.07.002
0885-2308/© 2017 Elsevier Ltd. All rights reserved.
are semantically equivalent (Das and Smith, 2009). Automatic text summarization (Ferreira et al., 2013), information retrieval (Yu et al., 2009), image retrieval (Coelho et al., 2004), text categorization (Liu and Guo, 2005), and machine translation (Papineni et al., 2002) are examples of applications that rely on, or may benefit from, sentence similarity and PI methods.
The literature reports several efforts to address this problem by extracting syntactic information from sentences (Islam and Inkpen, 2008; Oliva et al., 2011) or by representing sentences as bag-of-words vectors (Mihalcea et al., 2006; Qiu et al., 2006). Sentences are modeled in such a way as to allow similarity methods to compute different measures of the degree of similarity between words. In general, a PI method feeds these similarities as input to machine learning algorithms in order to identify paraphrases. However, two important problems are not handled by traditional sentence similarity approaches:
The Meaning Problem (Choudhary and Bhattacharyya, 2002): It stems from the lack of semantic analysis in previously proposed sentence similarity measures. Essentially, the problem is to measure the similarity between the meanings of sentences (Choudhary and Bhattacharyya, 2002). The measures that claim to deal with it only apply methods such as latent semantic indexing (Deerwester et al., 1990), corpus-based methods (Li et al., 2003), and WordNet similarity measures (Mihalcea et al., 2006). These techniques, however, find the semantic similarity of the words in a sentence, not the similarity between two complete sentences. Thus, the evaluation of the degree of meaning similarity between two sentences remains an open problem. For example, the sentences "Peter is a handsome boy" and "Peter is a good-looking lad" share a similar meaning, if the context they appear in does not change much.
The Word Order Problem (Zhou et al., 2010): In many cases, a different word order implies a divergent sentence meaning (Zhou et al., 2010). For example, "A loves B" and "B loves A" are two sentences with completely different meanings. Therefore, dealing with this problem certainly enhances the final measure of sentence similarity.
This paper proposes a paraphrase identification system that combines lexical, syntactic, and semantic similarity measures. Since traditional methods rely only on lexical and syntactic measures, we believe that the addition of semantic role annotation analysis (Marquez et al., 2008) is a promising alternative to address the meaning and word order problems. These three measures were previously applied to the sentence similarity problem with good results (Ferreira et al., 2014b). The same authors improved their sentence similarity measure by using a similarity matrix that penalizes the measure based on sentence size (Ferreira et al., 2014a). This penalization is important because large sentences could otherwise be considered similar to small sentences even when they contain more information. To the best of our knowledge, these measures had never been used to identify paraphrases. Thus, the main novelty of this paper is the application of different machine learning algorithms to combine sentence similarity measures in order to identify paraphrases. In addition, it introduces the concept of Basic Unit into the sentence similarity measures proposed in previous papers.
The proposed system is composed of three steps:
1. Sentence Representation: This step performs the lexical, syntactic, and semantic analyses and encapsulates the outputs in a text file (for the lexical analysis) and two RDF1 files (for the syntactic and semantic analyses).
2. Similarity Analysis: It measures the similarity of each pair of sentences using the output of the previous step.
3. Paraphrase Classification: The last step applies a machine learning algorithm, using the sentence similarity measures from the second step, to identify whether the pair of sentences is a paraphrase or not.
In order to evaluate the proposed system, a series of experiments was performed using the Microsoft Research Paraphrase Corpus (MSRP) (Dolan et al., 2004), which is the standard dataset for this problem. The proposed approach was compared using four measures: accuracy, precision, recall, and F-measure (Achananuparp et al., 2008). The experimental study validated the principal hypothesis of this work, showing that the combination of lexical, syntactic, and semantic aspects of a sentence pair achieves better results for the PI task than state-of-the-art methods. In addition, it also validated that the sentence representation proposed in (Ferreira et al., 2014b) achieves good performance for the PI task.
1 Resource Description Framework.
The rest of this paper is organized as follows. Section 2 presents the most relevant differences between the proposed method and the state-of-the-art related work. Section 3 explains the proposed sentence representation, the similarity measures, and the paraphrase identification process. The benchmarking of the proposed method against the best similar methods is presented in Section 4. The paper ends by drawing conclusions and discussing lines for further work in Section 5.
2. Related work

This section gives an overview of previous methods for paraphrase identification (Androutsopoulos and Malakasiotis, 2010). The proposed methods can be divided into: (i) threshold-based methods, which empirically identify a threshold on sentence similarity values that divides sentence pairs into two groups (paraphrase or not); and (ii) machine learning methods, which apply machine learning to different features (usually similarity values) to identify paraphrases.
The threshold-based methods always carry out a sentence similarity step before PI, which returns as output a similarity value between 0 and 1. A second step then finds a threshold that classifies the sentence pair as paraphrase or not. The state-of-the-art threshold-based methods are described below.
Mihalcea et al. (2006) represent sentences as bag-of-words vectors and compute a similarity measure that works as follows: for each word in the first sentence (the main sentence), it tries to identify the word in the second sentence that has the highest semantic similarity according to one of several word-to-word similarity measures. Then, the process is repeated using the second sentence as the main sentence. Finally, the resulting similarity scores are combined using the arithmetic average. This method uses a threshold of 0.5 to identify paraphrases; thus, sentence pairs with a similarity value higher than 0.5 are tagged as paraphrases.
Oliva and collaborators (Oliva et al., 2011) propose the SyMSS method, which assesses the influence of the syntactic structure of the two compared sentences on the similarity calculation. They represent the sentences as syntactic dependency trees, based on the idea that the meaning of a sentence is made up of the meanings of its individual words and the syntactic connections among them. Using WordNet, semantic information is obtained through a process that finds the main phrases composing the sentence. They then applied different thresholds, between 0 and 1 in steps of 0.05, to identify which one maximizes PI accuracy. The best results were obtained using a threshold of 0.6.
Islam and Inkpen (2008) presented an approach to measure the similarity of two texts that makes use of semantic and syntactic information. They combine three different similarities to perform the PI task. First, they use the entire sentence as a string to calculate a string similarity, which is obtained by applying the longest common subsequence measure (Kondrak, 2005). Then, they use a bag-of-words representation to compute a semantic word similarity, which is measured by a corpus-based measure (Islam and Inkpen, 2006). The last similarity uses syntactic information to evaluate a word order similarity. The final similarity is calculated by combining the string similarity, the semantic similarity, and the common-word order similarity. As in (Oliva et al., 2011), they tested different thresholds between 0 and 1, but with a step of 0.1. Like (Oliva et al., 2011), they concluded that the best threshold was 0.6.
Das and Smith (2009) proposed a probabilistic model (Smith and Eisner, 2006) that incorporates syntax, semantics (using WordNet), and hidden loose alignments between the trees of two sentences to perform sentence similarity. Applying these features, they estimate similarity as a posterior probability in a classifier. If the posterior probability exceeds 0.5, the pair is labeled as a paraphrase.
A different way to approach PI relies on machine learning algorithms. In this context, the proposed methods use different kinds of features, such as similarities between sentences and sentence dependency relations, to classify a sentence pair as paraphrase or not. Heilman and Smith (2010), Qiu et al. (2006), and Wan et al. (2006) proposed machine learning based methods to deal with the PI process. It is important to notice that all of these papers use a supervised approach.
Heilman and Smith (2010) proposed a tree edit method, which encapsulates the syntactic relations among the tokens of a sentence, to identify paraphrases. The main idea is to transform the tree created for the first sentence into that of the second using nine operations, such as insert and delete. The authors train a logistic regression classification model to seek a short sequence of tree edits that transforms one tree into another. They found 33 edit sequences that classify sentence pairs as paraphrases or not.
A supervised two-phase framework based on sentence dissimilarity to identify paraphrases was introduced by Qiu and colleagues (Qiu et al., 2006). They represent sentences using semantic triples extracted from PropBank (Palmer et al., 2005). In the first phase, the system calculates the semantic similarity among the sentences' tokens to find related words and pair them up in a greedy manner. The second phase is responsible for identifying whether extra information (unpaired tokens) exists in the sentences and whether the effect of its removal is significant. To this end, a Support Vector Machine (SVM) classifier is applied to a wide set of features of the unpaired tokens, including internal counts of numeric expressions, named entities, words, semantic roles, whether they are similar to other tuples in the same sentence, and contextual features such as source/target sentence length and paired token count. The SVM classifies sentence pairs as paraphrases or not.
Wan et al. (2006) created an approach that uses 17 different features to identify paraphrases. The features are divided into: (i) n-gram overlap features, (ii) dependency relation overlap features, (iii) dependency tree-edit distance features, and (iv) surface features. The authors apply four different machine learning algorithms (naive Bayes, C4.5 decision tree, support vector machine, and K-nearest neighbor) using the 17 features to label a pair of sentences as paraphrase or not.
Recently, studies have proposed the application of Convolutional Neural Network (CNN) techniques to the problem of paraphrase identification (Yin and Schütze, 2015; Yin et al., 2015). The main idea is to use word embeddings to represent the sentences and then apply CNN algorithms to classify sentence pairs as paraphrases or not.
The work by Ferreira and collaborators (Ferreira et al., 2014b; Ferreira et al., 2014a) proposes an approach that applies lexical, syntactic, and semantic analysis to represent sentences. It then proposes a word matching algorithm, using statistical and WordNet measures, to compute the similarity between sentences. These approaches achieve good results in sentence similarity contexts; however, they had not yet been used to identify paraphrases.
Although the methods in this paper take advantage of ideas similar to those of (Ferreira et al., 2014b) and (Ferreira et al., 2014a) to compare sentences, the approach here proposes a new similarity matrix together with a new algorithm. Besides that, the similarity measure relies on a size penalization coefficient to reduce the similarity for sentences of different sizes. Thus, the approach proposed in this paper combines: (i) a three-layer sentence representation, which encompasses three different levels of sentence information (previous works do not combine these three levels; they usually provide only one or two analyses); and (ii) a similarity measure that encapsulates a matrix considering the similarities among all words in the sentences and a size penalization coefficient to deal with sentences of different sizes. This paper proposes the application of different machine learning algorithms to identify paraphrases using the proposed similarities as features.
Table 1 presents a summary of the systems reviewed and of the proposed system. It lists: (i) whether the system uses a Threshold (T), Machine Learning (ML), or a Convolutional Neural Network (CNN); (ii) the sentence representation used; (iii) the syntactic (Syntactic Relations (SR) and Word Order (WO)) and semantic (corpus-based and WordNet measures, and Semantic Role Annotation (SRA)) aspects used; and (iv) the features on which the system relies to perform PI.

Table 1. Features' comparison among related work and the proposal (columns: System, Method, Representation, Syntactic, Semantic, Features).
It is important to notice that some papers have recently been proposed to deal with the identification of paraphrases in Twitter (Xu et al., 2015). However, the papers described in this section focus on news sentences; therefore, they were not compared with the works on Twitter.
3. The proposed method

Given a dataset G = {a1, a2, ..., an}, where ai = (si1, si2) is a pair of sentences, a paraphrase identification system aims to verify whether the two sentences, si1 and si2, are semantically equivalent.

Fig. 1 shows the proposed paraphrase identification system, which is composed of three modules: Sentence Representation, Similarity Analysis, and Classification.

Fig. 1. Architecture of the proposed paraphrase identification system. G is a dataset containing pairs of sentences; sij denotes one sentence of a pair; r1 and r2 are the triples containing the lexical, syntactic, and semantic representations of each sentence; and F is the feature vector used to classify the pair of sentences as paraphrase or not.
Sentence Representation. This module represents each sentence (sij) using three different sets of features: lexical, syntactic, and semantic. The output of this module is a pair of triples (r1 and r2), where ri is the representation of one sentence using the lexical, syntactic, and semantic analyses. The lexical representation consists of a bag-of-words vector, and the syntactic and semantic representations are Resource Description Framework (RDF) graphs. More details about these representations are presented in Section 3.1.
Similarity Analysis. This module provides different methods to measure similarities between sentences using the triples (r1 and r2) provided by the previous module. These similarities are used as features (F) to identify paraphrases in the classification module. Thus, F = {sim1, sim2, ..., simn} is the feature vector used to identify paraphrases, where n is the number of different similarity measures used and simi is the similarity value between r1 and r2 according to the i-th measure. Details about the similarity methods are provided in Section 3.2.
Classification. This module receives the output (F) of the previous module as input to machine learning techniques in order to classify the pair of sentences as paraphrase or not. The classification process is supervised; in other words, the system requires a dataset containing sentence pairs labeled as paraphrases or not.
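To make the data flow concrete, the following is a minimal sketch of this pipeline in Python; the Representation container, the measure list, and the helper names are our own illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Representation:
    """r_i: the three-layer view of one sentence."""
    lexical: List[str]                       # token list (bag of words)
    syntactic: List[Tuple[str, str, str]]    # RDF-style (vertex, edge, vertex) triples
    semantic: List[Tuple[str, str, str]]

Measure = Callable[[Representation, Representation], float]

def feature_vector(r1: Representation, r2: Representation,
                   measures: List[Measure]) -> List[float]:
    """F = {sim_1, ..., sim_n}: one similarity value per measure."""
    return [sim(r1, r2) for sim in measures]
```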
The following sections describe the Sentence Representation and Similarity Analysis modules.
3.1. The system's sentence representation
This section explains the sentence representation used for calculating the similarity measures, encompassing three layers: lexical, syntactic, and semantic. A single sentence is taken as input to build the representation. The output is a text file and two RDF files (W3C, 2004) that contain the lexical, syntactic, and semantic layers, respectively. Each layer is detailed as follows.
Lexical layer. This layer comprises three steps:

1. Lexical analysis: This step splits the sentence into a list of tokens, including punctuation.
2. Stopword removal: It rules out words with little representative value for the document, e.g., articles and pronouns, as well as punctuation. This work uses the stopword list proposed by Dolamic and Savoy (2010).
3. Lemmatization: This step applies a lemmatization preprocessing service, which translates tokens into their basic form. For instance, plural words are made singular, and all verb tenses and persons are exchanged for the verb infinitive. Lemmatization in this system is carried out by the Stanford CoreNLP tool2.
Fig. 2 depicts the operations accomplished in this layer for the sentence "The judge declared the defendant guilty", displaying the output of each step. The output of this layer is a text file containing the list of tokens. This layer helps to improve performance on simple text processing tasks. Although it does not convey much information about the sentence, it is widely employed in traditional text mining tasks such as information retrieval and summarization.
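A minimal sketch of the three lexical steps, substituting NLTK for the tools the paper actually uses (Stanford CoreNLP for lemmatization and the Dolamic and Savoy stoplist):

```python
import nltk                              # requires punkt, stopwords, wordnet data
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

def lexical_layer(sentence: str) -> list:
    # Step 1. Lexical analysis: split into tokens (punctuation included).
    tokens = nltk.word_tokenize(sentence)
    # Step 2. Stopword removal: drop stopwords and punctuation.
    stop = set(stopwords.words("english"))
    tokens = [t for t in tokens if t.lower() not in stop and t.isalnum()]
    # Step 3. Lemmatization: reduce tokens to their basic form
    # (pass pos="v" to map verb tenses to the infinitive).
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t.lower()) for t in tokens]

print(lexical_layer("The judge declared the defendant guilty."))
```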
Syntactic layer. This layer comprises two steps:

1. Syntactic analysis: This step uses the output of a dependency tree, built following (de Marneffe and Manning, 2008), to extract relations such as subject, direct object, and adverbial modifier, among others. In addition, preposition and conjunction relations are also extracted in this step.
2. Graph creation: Next, a directed graph stores the entities with their relations. The vertices are the elements obtained from the shallow (lexical) layer, while the edges denote the relations described in the previous step.
Fig. 3 shows the syntactic layer for the sentence "The judge declared the defendant guilty". The edges usually have one direction, following the direction of the syntactic relations. This is not always the case, however; the model also accommodates bi-directed edges, usually corresponding to conjunction relations. One should notice that all vertices in the example are listed in the output of the previous layer.
The syntactic analysis step is important as it represents an order relation among the tokens of a sentence. It describes the possible or acceptable syntactic structures of the language, and it decomposes the text into syntactic units in order to "understand" the way in which the syntactic elements are arranged in a sentence. Such relations can be used in applications such as automatic text summarization, text categorization, and information retrieval. The process of creating the dependency tree may extract wrong relations; however, Section 4 shows that this layer is important for identifying paraphrases.

Fig. 2. Lexical layer for the sentence "The judge declared the defendant guilty".

Fig. 3. Syntactic layer for "The judge declared the defendant guilty".
2 http://nlp.stanford.edu/software/corenlp.shtml.
Fig. 4. Semantic layer for the sentence "The judge declared the defendant guilty".
The RDF format was chosen to store the graph because: (i) it is a standard model for data interchange on the web; (ii) it provides a simple and clean format; (iii) inferences are easily drawn from RDF triples; and (iv) there are several freely available tools to handle RDF.
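As an illustration, here is a sketch of the graph-creation step using the rdflib library (our assumption; the paper does not name its RDF tooling), with a hypothetical namespace and hand-picked relations rather than actual parser output:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/sentence#")  # hypothetical namespace

g = Graph()
# (vertex, edge, vertex) triples for "The judge declared the defendant guilty";
# the relation names are illustrative, not CoreNLP output.
for head, relation, dependent in [
    ("declared", "subject", "judge"),
    ("declared", "direct_object", "defendant"),
    ("defendant", "modifier", "guilty"),
]:
    g.add((EX[head], EX[relation], EX[dependent]))

print(g.serialize(format="turtle"))
```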
Semantic layer. This layer comprises two steps:

1. Sense identification: Sense identification is of paramount importance to this type of representation, since different words can denote the same meaning, particularly verbs. For instance, "affirm", "argue", "claim", and "declare" are words that can be associated with the sense of "statement".
2. Role annotation: Differently from the syntactic layer, role annotation identifies the semantic function of each entity. For instance, in the same example sentence, judge is the speaker of the action declared. Thus, the interpretation of the action is additionally identified, not only its syntactic relation.
This layer deals with the meaning problem, receiving the output of the sense identification step as its input. The general meaning of the main entities of a sentence, not just the written words, is identified in this step. In turn, role annotation extracts discourse information, as it lays out the order of the actions, the actors, etc., thereby dealing with the word order problem. Such information is relevant, for instance, for extraction and summarization tasks. For both sense identification and role annotation, the proposed method extracts relations from FrameNet3 using the Semafor toolkit4. Once again, it is important to notice that these NLP procedures cannot identify all semantic relations. Despite that, the relations found are enough to improve the proposed paraphrase identification method.
Fig. 4 presents a semantic layer example. Two different types of relations are identified in the figure: sense relations, e.g., the triple guilty-sense-verdict, and role annotation relations, e.g., judge-speaker-declare. The semantic layer uses an RDF graph representation, like the syntactic layer.
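Under the same assumptions as the syntactic sketch above (rdflib and a hypothetical namespace), the semantic layer of the example sentence would store both kinds of relations as RDF triples:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/sentence#")  # hypothetical namespace
g = Graph()
# Triples from Fig. 4: a sense relation and a role-annotation relation.
for subject, relation, obj in [
    ("guilty", "sense", "verdict"),
    ("judge", "speaker", "declare"),
]:
    g.add((EX[subject], EX[relation], EX[obj]))
```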
3 framenet.icsi.berkeley.edu.
4 www.ark.cs.cmu.edu/SEMAFOR.
3.2. Similarity analysis

As mentioned before, the process of paraphrase identification aims to determine whether two sentences share essentially the same meaning. This paper proposes three sentence similarity measures, based on the representation detailed in the previous section, in order to identify paraphrases. The measures proposed here assess the degree of sentence similarity based on the three-layer representation of sentences presented in Section 3.1. Before detailing the proposed measures, the concept of Basic Unit (BU) should be presented.
The BU is the minimal unit of the proposed sentence similarity algorithm. In the lexical layer, it is a single word; thus, the similarity between two BUs is the similarity between two words. The similarity measure is divided into two steps:
- Similarity Matrix Value (SMV): It measures the similarities among the sentences' BUs using word-to-word similarity measures.
- Size Penalization Coefficient (SPC): It decreases the similarity when the sentences analyzed do not have the same number of BUs.
The first step calculates the similarity matrix values as follows. Let A = {a1, a2, ..., an} and B = {b1, b2, ..., bm} be two sentences, such that ai is a BU of sentence A, bj is a BU of sentence B, n is the number of tokens of sentence A, and m is the number of tokens of sentence B. The calculation of the similarity is presented in Algorithm 1.
The algorithm receives the sets of BUs of sentences A and B as input. It then creates a matrix of dimension m × n, the dimensions of the input BU sets. The variables total_similarity and iteration are initialized to 0; total_similarity accumulates the similarity values at each step, while iteration is used to transform total_similarity into a value between 0 and 1 (lines 1-3). The second step is the calculation of the similarity of each pair (ai, bj), where ai and bj are the tokens of sentences A and B, respectively; the matrix stores the calculated similarities (lines 4-8). The last part of the algorithm is divided into three steps. First, it adds to total_similarity the highest similarity value in the matrix (line 10). Then, it removes from the matrix the row and the column that contain this highest similarity (lines 11 and 12). To conclude, it updates the iteration value (line 13). The output is the division of total_similarity by iteration (line 15).
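Algorithm 1 itself is not reproduced here; the following Python sketch is reconstructed from the description above (a greedy best-match over the similarity matrix), with word_sim standing for any of the word-to-word measures described next and non-empty sentences assumed:

```python
from typing import Callable, List

def smv(A: List[str], B: List[str],
        word_sim: Callable[[str, str], float]) -> float:
    """Similarity Matrix Value: greedily match the most similar BU pairs."""
    matrix = [[word_sim(a, b) for a in A] for b in B]    # lines 4-8
    total_similarity, iteration = 0.0, 0                 # lines 1-3
    while matrix and matrix[0]:
        # Line 10: locate the highest remaining similarity.
        i, j = max(((i, j) for i in range(len(matrix))
                           for j in range(len(matrix[0]))),
                   key=lambda ij: matrix[ij[0]][ij[1]])
        total_similarity += matrix[i][j]
        # Lines 11-12: remove the matched row and column.
        del matrix[i]
        for row in matrix:
            del row[j]
        iteration += 1                                   # line 13
    return total_similarity / iteration                  # line 15
```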
To compute the similarities between tokens, the system uses three different measures:

- The Levenshtein metric (Lev) (Miller et al., 2009) calculates the minimum number of insertions, deletions, or substitutions of a single character needed to transform one string into another.
- The Resnik measure (Res) (Miller, 1995) attempts to quantify how much information content is common to two concepts. The information content is based on the lowest common subsumer (LCS) of the two concepts.
- The Lin measure (Lin) (Miller, 1995) is the ratio of the information content of the LCS used in the Resnik measure to the information content of each of the concepts.
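A sketch of the three word-to-word measures using NLTK and its WordNet interface (an assumption; the paper does not name a library). Resnik and Lin are computed over the first noun synset of each word with the Brown information-content file, and the Levenshtein distance is normalized into a similarity:

```python
from nltk import edit_distance               # requires wordnet, wordnet_ic data
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")

def lev_sim(w1: str, w2: str) -> float:
    # Levenshtein distance normalized to a similarity in [0, 1].
    return 1.0 - edit_distance(w1, w2) / max(len(w1), len(w2))

def res_sim(w1: str, w2: str) -> float:
    s1, s2 = wn.synsets(w1, "n")[0], wn.synsets(w2, "n")[0]
    return s1.res_similarity(s2, brown_ic)   # IC of the lowest common subsumer

def lin_sim(w1: str, w2: str) -> float:
    s1, s2 = wn.synsets(w1, "n")[0], wn.synsets(w2, "n")[0]
    return s1.lin_similarity(s2, brown_ic)   # 2*IC(LCS) / (IC(s1) + IC(s2))
```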
Fig. 5. Example of similarity between triples, where Sim is the similarity between two tokens or two edges, TotalSimilarity is the total similarity of one triple, u and v are edges, and a1, a2, b1, and b2 are the tokens associated with the nodes of the graph.
After the calculation of total_similarity, the system computes the Size Penalization Coefficient (SPC), which lowers the similarity between sentences with different numbers of tokens. The SPC is proportional to the total similarity. Eq. (1) shows how the SPC is calculated. It is important to notice that, for sentences with the same number of tokens, the SPC is equal to zero.
$$SPC = \begin{cases} \dfrac{|n-m| \cdot SMV}{n} & \text{if } n > m \\[6pt] \dfrac{|n-m| \cdot SMV}{m} & \text{otherwise} \end{cases} \qquad (1)$$
where n and m are the numbers of tokens in sentence 1 and sentence 2, respectively, and SMV is the total similarity found in the SMV step.
For the syntactic and semantic layers, the process follows the same idea as the lexical one; however, the BU is represented as a triple (vertex, edge, vertex). In the syntactic layer, the similarity is measured by the arithmetic mean of each vertex/edge/vertex matching, as presented in Fig. 5.
In the semantic layer, the sense edges, detailed in Section 3.1, connect the words present in the sentence with their senses. Therefore, it is important to measure whether two sentences contain related words and senses; hence, the measure is calculated using the pair (vertex, edge) as the BU. Fig. 6 shows the similarity calculation.
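A short sketch of these two BU similarities, following Figs. 5 and 6 (word_sim is again any of the word-to-word measures above):

```python
def triple_sim(t1, t2, word_sim) -> float:
    # Syntactic BU: (vertex, edge, vertex); mean of the three component matches.
    (a1, u, a2), (b1, v, b2) = t1, t2
    return (word_sim(a1, b1) + word_sim(u, v) + word_sim(a2, b2)) / 3.0

def pair_sim(p1, p2, word_sim) -> float:
    # Semantic BU: (vertex, edge); mean of the token and sense-edge matches.
    (a1, u), (b1, v) = p1, p2
    return (word_sim(a1, b1) + word_sim(u, v)) / 2.0
```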
It is important to notice that the system produces nine different combinations of similarity measures. They are the product of combining the sentence representation layers (Lexical, Syntactic, and Semantic) with the word-to-word similarity measures. Therefore, the similarities are: Lexical-Levenshtein, Syntactic-Levenshtein, Semantic-Levenshtein, Lexical-Resnik, Syntactic-Resnik, Semantic-Resnik, Lexical-Lin, Syntactic-Lin, and Semantic-Lin. These combinations are used as features to identify paraphrases.
Fig. 6. Example of similarity between pairs (vertex, edge), where Sim is the similarity between two tokens or two edges, TotalSimilarity is the total similarity of one pair, u and v are edges, and a1 and b1 are the tokens associated with the nodes of the graph.
An example illustrates the process. Let Sentence1 = {A, B, C, D} and Sentence2 = {D, R, S}, where {A, B, C, D, R, S} are BUs. The first step is to create the 4 × 3 matrix containing the similarities among all BUs (Table 2).

Tables 3 and 4 represent two iterations of lines 10-13 of Algorithm 1. In the first iteration, total_similarity receives the value 1. Then, 0.6 is added in the second iteration; at this point, total_similarity = 1.6. The last iteration adds 0.3 to total_similarity (= 1.9), removes row 1 and column 2, and stops the process. The SMV is 0.64: total_similarity divided by iteration (1.9/3).

Table 2. Step 1: create the similarity matrix (columns: A, B, C, D).

Table 3. Step 2: the matrix after removing row 1 and column 4 (columns: A, B, C).

Table 4. Step 3: the matrix after removing row 2 and column 2.

      A     C
R    0.2   0.3
The system calculates the final similarity as presented in Eq. (2). In the example, SPC = 0.17 and final_similarity = 0.47.
$$\mathit{final\_similarity} = SMV - SPC \qquad (2)$$
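Putting the pieces together, a sketch of Eqs. (1) and (2) that reuses the smv() sketch above; on the worked example it yields final_similarity ≈ 0.47:

```python
def final_similarity(A, B, word_sim) -> float:
    n, m = len(A), len(B)
    s = smv(A, B, word_sim)                    # SMV from Algorithm 1
    spc = 0.0
    if n != m:                                 # Eq. (1): SPC is zero for equal sizes
        spc = abs(n - m) * s / (n if n > m else m)
    return s - spc                             # Eq. (2)

# Worked example: SMV = 1.9/3, giving final_similarity ~ 0.47.
```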
4. Experiments
The experimental study conducted here aimed at evaluating the proposed method and comparing it with state-of-the-art methods. Moreover, the effectiveness of the proposed representation, which combines lexical, syntactic, and semantic aspects of a sentence pair, is also evaluated. This section is organized as follows: Section 4.1 presents the dataset and the metrics used to evaluate the proposed approach; Section 4.2 describes the sets of features; and Sections 4.3 and 4.4 present the results and the discussion.
4.1. Dataset and evaluation metrics

The Microsoft Research Paraphrase Corpus (MSRP) (Dolan et al., 2004) consists of 5801 pairs of sentences, 4076 training pairs and 1725 test pairs, collected from thousands of news sources on the web over a period of 18 months. The corpus was labeled by two human annotators, who determined whether each pair of sentences constitutes a paraphrase or not.
The following evaluation metrics were used: (i) accuracy, the proportion of correctly predicted sentence pairs among all pairs; (ii) precision, the proportion of correctly predicted paraphrase pairs among all predicted paraphrase pairs; (iii) recall, the proportion of correctly predicted paraphrase pairs among all actual paraphrase pairs; and (iv) F-measure, the uniform harmonic mean of precision and recall (Achananuparp et al., 2008).
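For concreteness, a sketch of the four metrics using scikit-learn as a stand-in (the paper computes them via WEKA), where y_true and y_pred are binary paraphrase labels:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def evaluate(y_true, y_pred) -> dict:
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),  # over predicted paraphrases
        "recall":    recall_score(y_true, y_pred),     # over actual paraphrases
        "f_measure": f1_score(y_true, y_pred),         # harmonic mean of P and R
    }
```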
4.2. Features

The proposed method was evaluated using nine different combinations of similarities as features (detailed in Section 3.2). These combinations pair the sentence representation layers (Lexical, Syntactic, and Semantic) with the word-to-word similarity measures, as presented in Table 5.

Table 5. System features: the similarity combinations (columns: Abbreviation, Sentence layer, Similarity between words).
Initially, four different subsets of these measures were used to perform the classification:

- Nine Features: the feature vector composed of the whole set of similarities shown in Table 5.
- Levenshtein Features: the similarities using the Levenshtein measure to identify the similarity between tokens.
- Resnik Features: the similarities using the Resnik measure to identify the similarity between tokens.
- Lin Features: the similarities using the Lin measure to identify the similarity between tokens.
A feature selection algorithm proposed by Hall [12] is used to extract a relevant subset of features. Its strategy for selecting the best set of features is based on the correlation between features: it eliminates features with high correlation values by considering the individual predictive ability of each feature along with the degree of redundancy between them. The Nine Features subset was used as input to this algorithm, and the output was the following subset of features: Lexical-Lev, Syntactic-Lev, Lexical-Res, and Semantic-Res. It is important to highlight that each layer representation was selected at least once, which indicates the importance of all layers in the sentence representation. The subset containing the features Lexical-Lev, Syntactic-Lev, Lexical-Res, and Semantic-Res is henceforth abbreviated as Selected Features; a simplified sketch of such a correlation-based filter is shown below.
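A simplified correlation-based filter in the spirit of Hall's method (a sketch, not Hall's exact CFS algorithm, which scores whole feature subsets by merit): features are kept greedily unless they are highly correlated with an already-kept feature.

```python
import numpy as np

def drop_redundant(X: np.ndarray, names: list, threshold: float = 0.9) -> list:
    """Keep a feature only if its absolute correlation with every
    already-kept feature is below the threshold. X: one column per feature."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]
```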
4.3. Results

Different machine learning algorithms were applied in order to find which one best fits the proposed PI approach. These algorithms were executed using the WEKA Data Mining Software (Witten and Frank, 2000). A selection of machine learning techniques from different families was tried (Fernandez-Delgado et al., 2014). The algorithms that reached the best results were: Bayesian Network, RBF Network, the C4.5 decision tree classifier, and the SMO support vector machine with a polynomial kernel. All algorithms were used in their default configuration.
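A sketch of this evaluation loop with scikit-learn stand-ins for two of the WEKA classifiers (DecisionTreeClassifier for C4.5 and SVC for SMO; Bayesian and RBF networks have no direct scikit-learn equivalent and are omitted):

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def compare_classifiers(X_train, y_train):
    classifiers = {
        "C4.5-like tree": DecisionTreeClassifier(),
        "SMO-like SVM": SVC(kernel="poly"),   # polynomial kernel, as in the paper
    }
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X_train, y_train, cv=10)  # 10-fold CV
        print(f"{name}: mean accuracy {scores.mean():.3f}")
```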
Table 6 presents the results of the proposed method using the different feature sets and the classifiers cited above, applied to the MSRP test data. The results are reported on the test dataset to make the analysis easier; however, the models were also trained and evaluated on the training dataset using 10-fold cross-validation, and those results followed the same order as the ones presented in Table 6.
The results of the proposed approach are compared to the best result of the similarity measure proposed in (Ferreira et al., 2014b) and to two baselines proposed by Mihalcea et al. (2006): (i) a Random Baseline, which makes a random decision between true (paraphrase) and false (not paraphrase) for each candidate pair; and (ii) a Vector-based Baseline, which uses a cosine similarity measure, as traditionally used in information retrieval, with TF-IDF weighting to identify paraphrases.
Using the whole set of features, the results reached 70.89% accuracy and 80.2% F-measure. However, the Levenshtein Features obtained slightly better results, 71.13% accuracy and 80.2% F-measure, using only three features. This indicates that some features are redundant or, in some cases, lead the algorithm to incorrect results.
Table 6
Results of the proposed approach applied to the test data using different feature sets.

Features               Algorithm          Accuracy  Precision  Recall  F-Measure
Nine features          Bayesian network   66.37     79.70      66.30   72.40
Nine features          RBF network        70.14     73.10      87.30   79.50
Nine features          C4.5               70.89     75.30      83.60   79.30
Nine features          SMO                70.08     71.70      90.90   80.20
Levenshtein features   Bayesian network   68.92     78.00      74.30   76.10
Levenshtein features   RBF network        70.60     72.90      88.90   80.10
Levenshtein features   C4.5               71.13     75.50      83.90   79.40
Levenshtein features   SMO                70.14     71.60      91.20   80.20
Resnik features        Bayesian network   66.60     78.90      68.00   73.00
Resnik features        RBF network        69.62     72.10      88.70   79.50
Resnik features        C4.5               68.23     69.20      94.00   79.70
Resnik features        SMO                66.49     66.50      100.0   79.90
Lin features           Bayesian network   67.07     77.50      71.10   74.20
Lin features           RBF network        69.73     72.10      89.00   79.60
Lin features           C4.5               68.17     73.30      82.10   77.40
Lin features           SMO                66.49     66.50      100.0   79.90
Selected features      Bayesian network   75.13     82.00      80.20   81.10
Selected features      RBF network        74.08     73.40      95.60   83.10
Selected features      C4.5               70.26     73.50      86.40   79.40
Selected features      SMO                69.73     71.80      89.70   79.80
Ferreira et al. (2014b)                   70.55     74.50      84.70   79.30
Random baseline                           51.30     68.30      50.00   57.80
Vector-based baseline                     65.40     71.60      79.50   75.30
For this reason, the feature selection algorithm was used to indicate the features relevant to this task. As presented in Section 4.2, the Selected Features group contains the features Lexical-Lev, Syntactic-Lev, Lexical-Res, and Semantic-Res. It achieved the best results of the experiments: 75.13% accuracy, 82% precision, 95.6% recall, and 83.1% F-measure.
The Levenshtein Features obtained better results than the Resnik and Lin Features. This happens because the dataset contains many proper nouns, such as people and place names, and the Resnik and Lin features cannot deal with them, since they are based on the WordNet dictionary, which does not contain these kinds of words. In addition, the dataset, in general, relies on similar words in sentence pairs, which explains the good performance of the Levenshtein features. As presented in Section 3.2, the Levenshtein measure calculates the minimum number of operations needed to transform one string into another; in other words, the use of similar words improves the accuracy of this measure. If this experiment were performed on a dataset containing different words, the Resnik and Lin features would probably achieve better results than the Levenshtein features.
In terms of accuracy and F-measure, all combinations achieved better results than the Random Baseline, and only two combinations achieved worse results than the Vector-based Baseline. The best result was 9.73 (accuracy) and 7.80 (F-measure) percentage points better than the Vector-based Baseline. Moreover, the best result using the similarity measure proposed in (Ferreira et al., 2014b) applied to the PI task was also better than the baselines. This confirms the hypothesis that the sentence representation used here achieves good results regardless of the similarity algorithm used: by dealing with the meaning and word order problems, this sentence representation increases the performance of the similarity measures and, in turn, of the PI methods that use them.
The experiments also show that the proposed method obtained better results than (Ferreira et al., 2014b) in all evaluation metrics, in terms of percentage points: 4.58 in accuracy, 7.50 in precision, 10.90 in recall, and 3.80 in F-measure. This confirms the hypothesis that the proposed sentence similarity algorithm improves on the similarity measure proposed by (Ferreira et al., 2014b) for the PI task.
The classifiers RBF Network and C4.5 achieved the best accuracy on almost every feature set, except for the Selected Features, where the Bayesian Network achieved a better result. In terms of F-measure, the RBF Network and SMO algorithms were better than the others on four sets; once again, the Bayesian Network using the Selected Features achieved a good result, losing only to the RBF Network. In general, the RBF Network obtained the best results.
Table 7. Paraphrase systems comparison (columns: System, Accuracy, Precision, Recall, F-Measure).
The two combinations that achieved the best results, (Selected Features, Bayesian Network) and (Selected Features, RBF Network), were selected and compared with state-of-the-art results (Table 7). The best results for each metric are highlighted.
Compared with state-of-the-art systems, the proposed method achieves the best precision and the second best F-measure and recall. The methods proposed by Yin et al. (Yin and Schütze, 2015; Yin et al., 2015) did not report results for precision and recall. Once again, it is important to highlight that none of the related systems deals with the meaning and word order problems simultaneously, as our method does.
The fact that the proposed system achieves better precision is important because, in general, true positive paraphrases are crucial for many applications that use PI results, for example Question Answering (Marsi and Krahmer, 2005) or Machine Translation Evaluation (Bannard and Callison-Burch, 2005).
Furthermore, the selected features used to classify paraphrases combine statistical (Lexical-Lev and Syntactic-Lev) and dictionary-based (Lexical-Res and Semantic-Res) measures to compare similarities between words. The previous works presented in Section 2 and Table 7 indicate that systems based on statistical measures tend to achieve better results in terms of accuracy and precision, while dictionary-based measures improve the recall of the systems.
As the methods of Das and Smith (2009) and Mihalcea et al. (2006) are based on statistical and dictionary-based measures, respectively, this explains why they achieve 79.57% precision and 97.7% recall, respectively. The proposed method, however, achieves a better F-measure (83.1%) by combining these two kinds of measures.
4.4. Discussion

The proposed method did not achieve better results across all metrics, but its main contribution is to address the meaning and word order problems. Some examples follow to show the benefits of the proposed method with respect to these problems. As mentioned before, the meaning problem happens when different words are used to describe the same entities. Sentences S1 and S2 present a pair of paraphrases that was not identified by other methods. They contain three differing unit matches: (i) approved -> passed; (ii) legislation -> bill; and (iii) this morning -> today.
S1: "The House Government Reform Committee rapidly approved the legislation this morning."
S2: "The House Government Reform Committee passed the bill today."
The word order problem arises when the information in a sentence comes in a different structure. Sentences S3 and S4 use the same words, but in a different order, which could lead to problems for other algorithms.
S3: "Atlantic Coast will continue its operations as a Delta Connections carrier."
S4: "It will continue its regional service for Delta Air Lines DAL.N , Atlantic Coast said."
To conclude the discussion, we present the situations in which the proposed method tends to erroneously predict whether a pair of sentences is a paraphrase. The two main situations found were:

Negation: It happens when one of the sentences denies the other. For example, sentence S6 is the negation of sentence S5:
S5: "Another said its members would continue to call the more than 50 million phone numbers on the Federal Trade Commission's list."
S6: "Meantime, the Direct Marketing Association said its members should not call the nearly 51 million numbers on the list."
Statement Sentences: It happens when the sentences are statements and the speaker is described differently. For example:
S7: "It still remains to be seen whether the revenue recovery will be short or long lived," he said.
S8: "It remains to be seen whether the revenue recovery will be short- or long-lived," said James Sprayregen, UAL bankruptcy attorney, in court.
This confirms the hypothesis put forward in (Mihalcea et al., 2006), (Islam and Inkpen, 2008), and (Oliva et al., 2011) that sentence similarity measures are an important step in paraphrase recognition, but not always a sufficient one: it often happens that portions of both sentences share a high degree of word overlap even when the pair is not a paraphrase.
5. Conclusion
This paper proposed three new sentence similarity measures and a new method to identify paraphrases. The sentence similarity measures integrate lexical, syntactic, and semantic analysis, aiming to improve the results by incorporating different levels of information about the sentence. These similarities deal with two major state-of-the-art problems: meaning and word order. Another contribution of this work is the evaluation of different machine learning algorithms applying the proposed similarities as features to classify sentence pairs as paraphrases or not.
The proposed method was evaluated using the Microsoft Research Paraphrase Corpus and widely accepted evaluation metrics: accuracy, precision, recall, and F-measure. The method achieves the best precision and F-measure and the second best accuracy and recall when compared to state-of-the-art systems. In addition, a detailed experimental study of different similarity measures applied to paraphrase identification was presented.
There are new developments of this work already in progress, including: (i) improving the proposed method to deal with paraphrase identification on Twitter; (ii) creating mechanisms to deal with the negation and statement sentence problems; and (iii) applying the proposed method to a textual entailment task.
Acknowledgments
The research results reported in this paper have been partly funded by an R&D project between Hewlett-Packard do Brazil and UFPE originated from tax exemption (IPI, Law no. 8.248 of 1991 and later updates).
References
Achananuparp, P., Hu, X., Shen, X., 2008. The evaluation of sentence similarity measures. In: Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery. Springer-Verlag, Berlin, Heidelberg, pp. 305-316.
Androutsopoulos, I., Malakasiotis, P., 2010. A survey of paraphrasing and textual entailment methods. J. Artif. Intell. Res. 38 (1), 135-187.
Bannard, C.J., Callison-Burch, C., 2005. Paraphrasing with bilingual parallel corpora. In: Knight, K., Ng, H.T., Oflazer, K. (Eds.), ACL. The Association for Computational Linguistics, Ann Arbor, Michigan.
Choudhary, B., Bhattacharyya, P., 2002. Text clustering using semantics. In: Proceedings of the World Wide Web Conference 2002.
Coelho, T.A.S., Calado, P., Souza, L.V., Ribeiro-Neto, B.A., Muntz, R.R., 2004. Image retrieval using multiple evidence ranking. IEEE Trans. Knowl. Data Eng. 16 (4), 408-417.
Das, D., Smith, N.A., 2009. Paraphrase identification as probabilistic quasi-synchronous recognition. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1. Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 468-476.
Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A., 1990. Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41 (6), 391-407.
Dolamic, L., Savoy, J., 2010. When stopword lists make the difference. J. Assoc. Inf. Sci. Technol. 61 (1), 200-203. doi:10.1002/asi.v61:1.
Dolan, B., Quirk, C., Brockett, C., 2004. Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources. In: Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA. doi:10.3115/1220355.1220406.
Fernandez-Delgado, M., Cernadas, E., Barro, S., Amorim, D., 2014. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15 (1), 3133-3181.
Ferreira, R., Lins, R.D., Freitas, F., Avila, B., Simske, S.J., Riss, M., 2014a. A new sentence similarity assessment measure based on a three-layer sentence representation. In: Proceedings of the ACM Symposium on Document Engineering.
Ferreira, R., Lins, R.D., Freitas, F., Avila, B., Simske, S.J., Riss, M., 2014b. A new sentence similarity method based on a three-layer sentence representation. In: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pp. 110-117.
Ferreira, R., de Souza Cabral, L., Lins, R.D., de França Silva, G., Freitas, F., Cavalcanti, G.D.C., Lima, R., Simske, S.J., Favaro, L., 2013. Assessing sentence scoring techniques for extractive text summarization. Expert Syst. Appl. 40 (14), 5755-5764.
Heilman, M., Smith, N.A., 2010. Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. In: Proceedings of Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 1011-1019.
Islam, A., Inkpen, D., 2006. Second order co-occurrence PMI for determining the semantic similarity of words. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2006), pp. 1033-1038.
Islam, A., Inkpen, D., 2008. Semantic text similarity using corpus-based word similarity and string similarity. ACM Trans. Knowl. Discov. Data 2 (2), 1011-1025.
Kondrak, G., 2005. N-gram similarity and distance. In: Consens, M., Navarro, G. (Eds.), String Processing and Information Retrieval. Lecture Notes in Computer Science, vol. 3772. Springer, Berlin Heidelberg, Buenos Aires, Argentina, pp. 115-126.
Li, Y., Bandar, Z.A., McLean, D., 2003. An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. Knowl. Data Eng. 15 (4), 871-882. doi:10.1109/TKDE.2003.1209005.
Liu, T., Guo, J., 2005. Text similarity computing based on standard deviation. In: Proceedings of the 2005 International Conference on Advances in Intelligent Computing - Volume Part I. Springer-Verlag, Berlin, Heidelberg, pp. 456-464.
de Marneffe, M.-C., Manning, C.D., 2008. The Stanford typed dependencies representation. In: Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation. Association for Computational Linguistics, Manchester, United Kingdom, pp. 1-8.
Marquez, L., Carreras, X., Litkowski, K.C., Stevenson, S., 2008. Semantic role labeling: an introduction to the special issue. Comput. Linguist. 34 (2), 145-159.
Marsi, E., Krahmer, E., 2005. Explorations in sentence fusion. In: Proceedings of the 10th European Workshop on Natural Language Generation, pp. 109-117.
Mihalcea, R., Corley, C., Strapparava, C., 2006. Corpus-based and knowledge-based measures of text semantic similarity. In: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1. AAAI Press, Boston, Massachusetts, pp. 775-780.
Miller, F.P., Vandome, A.F., McBrewster, J., 2009. Levenshtein Distance: Information Theory, Computer Science, String (Computer Science), String Metric, Damerau-Levenshtein Distance, Spell Checker, Hamming Distance.
Miller, G.A., 1995. WordNet: a lexical database for English. Commun. ACM 38, 39-41.
Oliva, J., Serrano, J.I., del Castillo, M.D., Iglesias, A., 2011. SyMSS: a syntax-based measure for short-text semantic similarity. Data Knowl. Eng. 70 (4), 390-405. doi:10.1016/j.datak.2011.01.002.
Palmer, M., Gildea, D., Kingsbury, P., 2005. The Proposition Bank: an annotated corpus of semantic roles. Comput. Linguist. 31 (1), 71-106.
Papineni, K., Roukos, S., Ward, T., Zhu, W.-J., 2002. BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 311-318.
Qiu, L., Kan, M.-Y., Chua, T.-S., 2006. Paraphrase recognition via dissimilarity significance classification. In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 18-26.
Smith, D.A., Eisner, J., 2006. Quasi-synchronous grammars: alignment by soft projection of syntactic dependencies. In: Proceedings of the Workshop on Statistical Machine Translation. Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 23-30.
W3C, 2004. Resource Description Framework. http://www.w3.org/RDF/. Last accessed June 2015.
Wan, S., Dras, M., Dale, R., Paris, C., 2006. Using dependency-based features to take the "para-farce" out of paraphrase. In: Proceedings of the Australasian Language Technology Workshop 2006, Sydney, Australia, pp. 131-138.
Witten, I.H., Frank, E., 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Xu, W., Callison-Burch, C., Dolan, W.B., 2015. SemEval-2015 Task 1: paraphrase and semantic similarity in Twitter (PIT). In: Proceedings of SemEval.
Yin, W., Schütze, H., 2015. Convolutional neural network for paraphrase identification. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 901-911.
Yin, W., Schütze, H., Xiang, B., Zhou, B., 2015. ABCNN: attention-based convolutional neural network for modeling sentence pairs. Trans. Assoc. Comput. Linguist. 259-272.
Yu, L.-C., Wu, C.-H., Jang, F.-L., 2009. Psychiatric document retrieval using a discourse-aware model. Artif. Intell. 173 (7-8), 817-829.
Zhou, F., Zhang, F., Yang, B., 2010. Graph-based text representation model and its realization. In: Proceedings of the International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE 2010), pp. 1-8. doi:10.1109/NLPKE.2010.5587861.