Semantic Text Similarity

Introduction

The machine learning model we built predicts a score that reflects how closely two sentences are related
in meaning rather than in surface appearance. In other words, it measures how similar two texts are in
terms of the concepts, ideas, or information they convey. Comparing texts for semantic similarity
involves understanding the context and meaning of the words and sentences rather than just looking for
exact word matches or character-level similarities. It requires a deeper understanding of the content,
including synonyms, paraphrases, and related concepts.
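To illustrate why surface matching falls short, the sketch below computes a purely lexical score: the Jaccard overlap of lower-cased word sets. This baseline is our own choice for comparison and is not part of the model described in this document. A paraphrase that shares almost no words scores low even though its meaning is close, which is the gap a semantic model is meant to close.

```python
import re


def word_set(text: str) -> set[str]:
    """Lower-case the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard_similarity(text1: str, text2: str) -> float:
    """Lexical baseline (not the BERT model): |shared words| / |all distinct words|."""
    a, b = word_set(text1), word_set(text2)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    paraphrase = jaccard_similarity("The cat sat on the mat.", "A kitten rested on the rug.")
    unrelated = jaccard_similarity("The cat sat on the mat.", "Stock prices fell on Monday.")
    # prints "paraphrase: 0.22, unrelated: 0.11"
    print(f"paraphrase: {paraphrase:.2f}, unrelated: {unrelated:.2f}")
```

Both scores are low and barely separated, even though the first pair means nearly the same thing.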

Problem Statement:

Develop an algorithm/model to measure the Semantic Textual Similarity (STS) between two given
sentences and provide a similarity score ranging from 0 (highly dissimilar) to 1 (highly similar). The STS
model should assess the degree of semantic equivalence between the sentences, allowing for more
accurate comparisons of their meaning and context, regardless of surface-level variations in wording or
structure. The objective is to enable applications to quantify the level of similarity between pairs of
sentences for various natural language processing (NLP) tasks, such as information retrieval, paraphrase
identification, and question answering.

Core Approach:

Considering the complexity of the problem statement, we chose BERT, which offers efficient pre-trained
transformers that make it easy to build our own model. Specifically, we used 'bert-base-uncased' for its
ability to capture complex contextual dependencies and semantic meaning within sentences.

Prior to the above step, we applied several pre-processing techniques using regular expressions and the
string replace method. We then applied the lemmatizer from the NLTK module, which reduces inflected word
forms to their base form (lemma) and thereby helped reduce the burden on our model.
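The pre-processing steps above can be sketched as follows. The document does not list the exact regular expressions or replacements used, so the rules here (expanding a few contractions, stripping non-letters, collapsing whitespace) are illustrative assumptions. The lemmatizer is passed in as a parameter so the cleaning logic runs without NLTK; in practice it would be NLTK's `WordNetLemmatizer`, which needs the WordNet data downloaded once (e.g. `nltk.download("wordnet")`; some NLTK versions also ask for `omw-1.4`).

```python
import re

# Illustrative rules only: the project's actual patterns are not given in the text.
CONTRACTIONS = {"n't": " not", "'re": " are", "'ll": " will"}


def clean_text(text: str) -> str:
    """Lower-case, expand a few contractions, drop non-letters, squeeze spaces."""
    text = text.lower()
    for old, new in CONTRACTIONS.items():
        text = text.replace(old, new)          # plain str.replace step
    text = re.sub(r"[^a-z\s]", " ", text)      # regex: keep only letters and whitespace
    return re.sub(r"\s+", " ", text).strip()   # regex: collapse runs of whitespace


def preprocess(text: str, lemmatizer=None) -> str:
    """Clean the text, then lemmatize word by word if a lemmatizer is given.

    `lemmatizer` is any object with a `.lemmatize(word)` method, for example
    nltk.stem.WordNetLemmatizer().
    """
    words = clean_text(text).split()
    if lemmatizer is not None:
        words = [lemmatizer.lemmatize(word) for word in words]
    return " ".join(words)
```

With NLTK available, `preprocess(text, WordNetLemmatizer())` (imported from `nltk.stem`) maps plural nouns such as "cats" to "cat". Note that `WordNetLemmatizer.lemmatize` treats every word as a noun unless a part-of-speech tag is passed.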

Using the BERT model specified above, we tokenized the texts into 3000 parts with the BERT tokenizer and
converted them into PyTorch tensors.
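A minimal sketch of the tokenization and encoding step with the Hugging Face `transformers` library, assuming `torch` and `transformers` are installed. The document does not say how the per-token vectors were combined into one sentence vector; mean pooling over the attention mask is our assumption (taking the `[CLS]` vector is a common alternative). Since `bert-base-uncased` has only 512 position embeddings, inputs longer than 512 tokens must be truncated or split; this sketch truncates.

```python
import torch

MODEL_NAME = "bert-base-uncased"


def load_bert(name: str = MODEL_NAME):
    """Download (first call only) and return the tokenizer and encoder."""
    from transformers import AutoModel, AutoTokenizer  # imported lazily

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    model.eval()  # disable dropout so embeddings are deterministic
    return tokenizer, model


def mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Assumed pooling: average token vectors, ignoring padding (mask == 0)."""
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq, 1)
    summed = (hidden * mask).sum(dim=1)                    # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)               # avoid division by zero
    return summed / counts


def encode(texts: list[str], tokenizer, model, max_length: int = 512) -> torch.Tensor:
    """Tokenize a batch into PyTorch tensors and return one vector per text."""
    batch = tokenizer(
        texts,
        padding=True,          # pad to the longest text in the batch
        truncation=True,       # cut inputs that exceed max_length tokens
        max_length=max_length,
        return_tensors="pt",   # return PyTorch tensors
    )
    with torch.no_grad():
        outputs = model(**batch)
    return mean_pool(outputs.last_hidden_state, batch["attention_mask"])
```

For example, `tokenizer, model = load_bert()` followed by `encode(["first sentence", "second sentence"], tokenizer, model)` returns a tensor of shape `(2, 768)`, one 768-dimensional vector per input.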

Finally, we used the cosine similarity function from scikit-learn to measure the semantic similarity
between the two encoded sentences.
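The cosine of the angle between two embedding vectors is cos(u, v) = (u · v) / (‖u‖ ‖v‖), which lies in [-1, 1]. Because the problem statement asks for a score in [0, 1], some mapping is needed; the document does not say which one was used, so the sketch below clips negative values to 0 as an assumption (rescaling with (cos + 1) / 2 is another option). It is written in plain Python for clarity; scikit-learn's `sklearn.metrics.pairwise.cosine_similarity(X, Y)` computes the same quantity for every pair of rows in two 2-D arrays.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = (u . v) / (|u| |v|); returns 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_score(u: Sequence[float], v: Sequence[float]) -> float:
    """Map cosine similarity into [0, 1] by clipping (an assumed convention)."""
    return min(1.0, max(0.0, cosine_similarity(u, v)))
```

PyTorch tensors can be passed after converting them with `.tolist()`. With scikit-learn and two 1-D NumPy arrays `a` and `b`, `cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]` gives the unclipped value.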

Deployment Journey:

To be transparent, I had never deployed an API to the cloud before, so I spent a full day researching
free deployment options and narrowed them down to AWS Lambda and Streamlit (Heroku and Azure require a
credit card, and my CIBIL score is low).

Because of the heavy dependencies in our project, I then ran into issues on AWS Lambda, which left
Streamlit as the remaining option. I modified my API as required and deployed it on Streamlit from
GitHub.
