Semantic Textual Similarity in Replika

Denis Fedorenko
Research Engineer, Luka Inc.
Plan
• Task definition

• Baseline model

• Model improvements

• Conclusion and future work


Semantic Textual Similarity
• The task is to measure the meaning similarity of two texts

• Find a model

M: (text1, text2) → ℝ
Toy STS model
• How many common words do the two texts share? (Here J is the Jaccard index over word sets: shared words divided by total distinct words)

• Example:

J("I have a funny dog", "I have a cat") = 3/6 = 0.5

Toy STS model
• More examples:

J("I have a dog", "I have a cat") = 3/5 = 0.6
J("I have a dog", "I have a puppy") = 3/5 = 0.6
J("I have a funny dog", "My puppy is very nice") = 0

• This model is very sensitive to synonyms and paraphrases

• How can we overcome this issue? (A minimal sketch of this word-overlap model follows below)
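A minimal sketch of the word-overlap toy model, assuming lowercasing and whitespace tokenization; the helper name jaccard_similarity is my own:

```python
def jaccard_similarity(text1: str, text2: str) -> float:
    """Toy STS model: shared words divided by total distinct words."""
    words1, words2 = set(text1.lower().split()), set(text2.lower().split())
    if not (words1 | words2):
        return 0.0
    return len(words1 & words2) / len(words1 | words2)

print(jaccard_similarity("I have a funny dog", "I have a cat"))          # 0.5
print(jaccard_similarity("I have a funny dog", "My puppy is very nice"))  # 0.0
```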


STS framework
• Find a model (text-to-vector):

E: text → ℝⁿ

• Such that:

M: (E(text1), E(text2)) → ℝ

where M is a similarity function (e.g. cosine)
or some trainable model (e.g. logistic regression, neural network)
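A minimal sketch of this framework, assuming numpy; the encode function below is a hypothetical bag-of-words stand-in for any text-to-vector model E, and cosine plays the role of M:

```python
import numpy as np

def cosine_similarity(v1: np.ndarray, v2: np.ndarray) -> float:
    """Similarity function M over two text vectors."""
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def encode(text: str, vocab: dict) -> np.ndarray:
    """Hypothetical bag-of-words encoder E: text -> R^n (n = vocabulary size)."""
    v = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            v[vocab[word]] += 1.0
    return v

vocab = {w: i for i, w in enumerate("i have a funny dog cat puppy".split())}
v1, v2 = encode("I have a dog", vocab), encode("I have a cat", vocab)
print(cosine_similarity(v1, v2))  # 0.75
```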
STS in Replika
• The task is to determine whether two utterances are semantically equivalent

• Find a model

M: (utterance1, utterance2) → {0, 1}

• A particular case of STS

What is "equivalence"?
• Paraphrases

• Utterances that have the same set of possible


answers

• Ultimately, equivalence should be determined by


product requirements
Example: scripts
(figure omitted: user phrase constraint, user phrase template, result: matched phrase)

Example: Replika-QA
(figure omitted: user phrase constraint, user phrase templates, result: matched phrase)
STS evaluation
• On holdout testsets:

• Classification metrics (precision, recall, AUC)

• Information retrieval metrics (average precision, recall@N)

• In the wild:

• User feedback (upvotes and downvotes) in the scripts and Replika-QA
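A small sketch of these metrics, assuming scikit-learn; the labels and scores below are toy values, not real evaluation data:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0])                # toy ground-truth labels
y_score = np.array([0.9, 0.4, 0.7, 0.2, 0.6, 0.1])   # toy model scores
y_pred = (y_score > 0.5).astype(int)

print(precision_score(y_true, y_pred))   # classification metrics
print(recall_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))

def recall_at_n(relevant: set, ranked: list, n: int) -> float:
    """Information-retrieval recall@N for a single query."""
    return len(relevant & set(ranked[:n])) / max(len(relevant), 1)

print(recall_at_n({"a", "b"}, ["b", "c", "a", "d"], n=2))  # 0.5
```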
Metrics
(figure omitted)
Plan
• Task definition

• Baseline model

• Model improvements

• Conclusion and future work


Baseline STS model
• Two-class logistic regression classifier over text vectors produced by the context encoder of the retrieval-based dialog model (DM)

f(x) = sigmoid(W · |v1 - v2|) ∈ (0, 1)
where v1 = DM.Encoder(utter1), v2 = DM.Encoder(utter2)
Decision rule: 1 if f(x) > 0.5 else 0
(a sketch of this classifier follows below)

• Trainset: 3900 text pairs obtained by different high-recall heuristics and marked by assessors

• Testset: 400 text pairs
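A minimal sketch of the baseline classifier, assuming scikit-learn and numpy; dm_encode is a hypothetical stand-in for DM.Encoder, and the two training pairs are toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dm_encode(utterance: str) -> np.ndarray:
    """Hypothetical stand-in for DM.Encoder (the dialog model's context encoder)."""
    rng = np.random.default_rng(abs(hash(utterance)) % (2 ** 32))
    return rng.standard_normal(64)

def pair_features(u1: str, u2: str) -> np.ndarray:
    v1, v2 = dm_encode(u1), dm_encode(u2)
    return np.abs(v1 - v2)  # |v1 - v2|

pairs = [("how are you", "how are you doing"), ("i have a dog", "what time is it")]
labels = [1, 0]  # 1 = semantically equivalent, 0 = not

X = np.stack([pair_features(u1, u2) for u1, u2 in pairs])
clf = LogisticRegression().fit(X, labels)     # f(x) = sigmoid(W . |v1 - v2|)
print(clf.predict_proba(X)[:, 1] > 0.5)       # decision rule: f(x) > 0.5
```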


Retrieval-based dialog model
Basic QA-LSTM: Tan et al. (2015)

Dialog text encoder
• During training, similar contexts often have similar or even coinciding answers

• As a result, similar texts are encoded into similar vectors

• Hence the encoders can be successfully used for further text analysis (classification, clustering)
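A minimal sketch of a QA-LSTM-style dual encoder in the spirit of Tan et al. (2015), assuming PyTorch; the sizes, names and max-pooling choice are illustrative, not the exact Replika model:

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Bidirectional LSTM encoder with max-pooling over time."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, token_ids):                 # (batch, seq_len)
        out, _ = self.lstm(self.emb(token_ids))   # (batch, seq_len, 2 * hidden)
        return out.max(dim=1).values              # (batch, 2 * hidden) text vector

context_encoder, response_encoder = TextEncoder(), TextEncoder()
contexts = torch.randint(0, 10000, (4, 12))       # dummy token-id batches
responses = torch.randint(0, 10000, (4, 12))
scores = nn.functional.cosine_similarity(context_encoder(contexts),
                                          response_encoder(responses))
print(scores.shape)                               # one matching score per pair
```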
Plan
• Task definition

• Baseline model

• Model improvements

• Conclusion and future work


Possible improvements
• Enlarge the datasets

• Search for a better classification model


Dataset extraction pipeline

User logs → Extract matches → Utterance pairs → Preprocess → Preprocessed utterance pairs → Mark (Amazon Mechanical Turk crowdsourcing) & Split → Trainset / Testset
Matches extraction
• Extract matches of the baseline model from the logs. The false positives obtained will help improve precision

• Use a different algorithm (e.g. skip-thought, Kiros et al. (2015)) to extract novel text pairs from the logs. The false negatives obtained (according to the baseline model) will help improve recall
Matches preprocessing
• Remove text pair duplicates

• Remove too short/long text pairs (outliers)

• Remove pairs with coinciding texts (trivial samples)

• Remove too noisy text pairs, e.g. with many out-of-vocabulary words (non-informative samples and noise)

• Remove pairs with highly dissimilar texts (to fight the curse of dimensionality), i.e. those with (see the sketch below)

cosine(DM.Encoder(text1), DM.Encoder(text2)) < threshold
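A sketch of these filters, reusing the hypothetical dm_encode stand-in from above; the length, out-of-vocabulary and similarity thresholds are illustrative, not the values used in practice:

```python
import numpy as np

def cosine(v1: np.ndarray, v2: np.ndarray) -> float:
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9))

def keep_pair(t1: str, t2: str, dm_encode, vocab,
              min_len=2, max_len=30, max_oov=0.5, sim_threshold=0.3) -> bool:
    """Return True if the (t1, t2) pair survives the preprocessing filters."""
    if t1 == t2:                                     # coinciding texts: trivial sample
        return False
    for text in (t1, t2):
        words = text.lower().split()
        if not (min_len <= len(words) <= max_len):   # too short/long: outlier
            return False
        oov = sum(w not in vocab for w in words) / len(words)
        if oov > max_oov:                            # too many OOV words: noise
            return False
    # highly dissimilar texts are dropped to fight the curse of dimensionality
    return cosine(dm_encode(t1), dm_encode(t2)) >= sim_threshold

# Duplicate pairs can be removed upstream, e.g. by keeping a set of
# frozenset({t1, t2}) keys before applying keep_pair.
```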


Dataset extraction results
• Trainset: 17556 text pairs

• Testsets:

• Scripts testset: 1035 text pairs (measures quality on scripts)

• Common testset: 1162 text pairs (measures average quality)

• Errors (false positives) testset: 555 text pairs (measures the model's specificity)
Scripts testset, Common testset, Errors (false positives) testset
(example figures omitted)

The errors testset covers 7 different error types: we can investigate what kinds of errors the model makes


Possible improvements
• Enlarge the datasets

• Search for a better classification model


Classification pipeline

Text pair → Vectorize → Vector pair → Extract features → Feature vector → Classify → Result: 0|1
(the classifier is trained on the trainset)

We can vary these components!
Pipeline components
• Vectorizers:
  • Dialog context encoder
  • Dialog response encoder
• Features:
  • |v1 - v2|
  • v1 * v2
  • [|v1 - v2|, v1 * v2]
• Classifiers:
  • Logistic regression
  • SVM
  • Random forest
  • ...
• Trainsets:
  • Marked user logs
  • External:
    • Quora (~400k)
    • SemEval/SICK (~20k)
  • Combination of all above
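A sketch of sweeping these components, assuming scikit-learn; the encoders, data and the train/validation split are hypothetical placeholders:

```python
import numpy as np
from itertools import product
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

FEATURES = {
    "abs_diff": lambda v1, v2: np.abs(v1 - v2),
    "product": lambda v1, v2: v1 * v2,
    "both": lambda v1, v2: np.concatenate([np.abs(v1 - v2), v1 * v2]),
}
CLASSIFIERS = {
    "logreg": lambda: LogisticRegression(max_iter=1000),
    "svm": lambda: LinearSVC(),
    "random_forest": lambda: RandomForestClassifier(n_estimators=100),
}

def auc_on_validation(encoder, feature_fn, make_clf,
                      train_pairs, train_labels, val_pairs, val_labels):
    """Fit one (vectorizer, features, classifier) combination and score it by AUC."""
    featurize = lambda pairs: np.stack([feature_fn(encoder(a), encoder(b)) for a, b in pairs])
    clf = make_clf().fit(featurize(train_pairs), train_labels)
    X_val = featurize(val_pairs)
    scores = (clf.decision_function(X_val) if hasattr(clf, "decision_function")
              else clf.predict_proba(X_val)[:, 1])
    return roc_auc_score(val_labels, scores)

# for feat_name, clf_name in product(FEATURES, CLASSIFIERS):
#     auc = auc_on_validation(dm_context_encoder, FEATURES[feat_name], CLASSIFIERS[clf_name],
#                             train_pairs, train_labels, val_pairs, val_labels)
```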
Model selection
(figure omitted: candidate models compared by FPR, less is better, and by AUC, more is better)

Select top candidate models by AUC, tune them on the validation set and select the best model by FPR
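A small sketch of this two-stage selection; the candidate list and its metric values are purely hypothetical:

```python
# Hypothetical candidates with precomputed validation metrics.
candidates = [
    {"name": "logreg_abs_diff", "auc": 0.91, "fpr": 0.12},
    {"name": "linear_svm_both", "auc": 0.93, "fpr": 0.08},
    {"name": "random_forest_product", "auc": 0.88, "fpr": 0.15},
]

top_by_auc = sorted(candidates, key=lambda c: c["auc"], reverse=True)[:2]  # shortlist by AUC
best = min(top_by_auc, key=lambda c: c["fpr"])                             # pick lowest FPR
print(best["name"])
```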
Model selection results

• Best configuration:

• Dialog context encoder

• Marked user logs dataset only

• [|v1 - v2|, v1 * v2] feature vector

• Linear SVM
Model selection discussion
• Quality gain is not as high as it could be

• Classification model quality is limited by the quality of the underlying vectorizer (dialog model)

• We can try to fine-tune the already-trained dialog model on STS data to solve the target task directly
Transfer learning
(diagram: the already trained retrieval-based dialog model encodes a context and a response with a context encoder and a response encoder into a context vector and a response vector, scores them with cosine similarity and is trained with a retrieval loss)
Transfer learning
(diagram: the trained context encoder weights are copied into a context encoder shared between Text1 and Text2; the cosine similarity of the two text vectors goes through a sigmoid into a binary loss, and the shared weights are updated on the STS data)
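A minimal fine-tuning sketch, assuming PyTorch and that context_encoder is the already-trained dialog model's context encoder (an nn.Module shared between the two texts); the sigmoid scaling factor and optimizer settings are illustrative:

```python
import torch
import torch.nn as nn

def sts_fine_tune_step(context_encoder, optimizer, ids1, ids2, labels):
    """One training step: shared encoder -> cosine -> sigmoid -> binary loss."""
    v1 = context_encoder(ids1)                      # Text1 vector
    v2 = context_encoder(ids2)                      # Text2 vector (same weights)
    cos = nn.functional.cosine_similarity(v1, v2)   # in [-1, 1]
    logits = 5.0 * cos                              # scale before the sigmoid (illustrative)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                # update the shared encoder weights
    return loss.item()

# Usage with the TextEncoder sketch above as a stand-in encoder:
# encoder = TextEncoder()
# optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
# loss = sts_fine_tune_step(encoder, optimizer,
#                           torch.randint(0, 10000, (8, 12)),
#                           torch.randint(0, 10000, (8, 12)),
#                           torch.randint(0, 2, (8,)))
```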
Transfer learning results
(results figure omitted)

Trainset: user logs + SemEval/SICK
Transfer learning discussion
• It is not a trivial approach in itself

• Need to carefully tune the optimizer, its parameters and the model itself (e.g. by adding dropout, batch normalization etc.)

• Need more data (much more than 20000 samples)

Conclusion
• Semantic textual similarity is an open problem in natural language processing (Cer et al. (2017))

• The definition of similarity is very important and should be determined by the target product requirements

• A correct evaluation methodology is also very important and should be designed according to the target application

• Text representation (text-to-vector) is a crucial step

Future work
• Datasets:

• Enlarge the user logs trainset to 100000 samples and more

• Incorporate high-quality external datasets (like the novel ParaNMT-50M, Wieting et al. (2017))

• Model:

• Incorporate more features: linguistic, pairwise word similarities etc. (Maharjan et al. (2017))

• Incorporate "hard" negative training samples (Wieting et al. (2017))

• Mostly focus on end-to-end training and transfer learning

References
• Kiros et al. (2015). Skip-Thought Vectors

• Cer et al. (2017). SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Cross-lingual Focused Evaluation

• Wieting et al. (2017). Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations

• Maharjan et al. (2017). DT Team at SemEval-2017 Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output

• Tan et al. (2015). LSTM-based Deep Learning Models for Non-factoid Answer Selection
