Semantic Search

The document outlines the implementation of a semantic search engine using BERT and Anaconda, detailing steps for setting up the environment, loading pre-trained models, and performing searches using nearest neighbors and cosine similarity. It provides a structured approach to building a semantic search system for various applications, including research paper retrieval. Key tools discussed include FAISS, Haystack, OpenAI API, and Elasticsearch, with a focus on using Sentence-BERT for generating contextual embeddings.

Uploaded by

venkat Mohan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
39 views9 pages

Semantic Search

The document outlines the implementation of a semantic search engine using BERT and Anaconda, detailing steps for setting up the environment, loading pre-trained models, and performing searches using nearest neighbors and cosine similarity. It provides a structured approach to building a semantic search system for various applications, including research paper retrieval. Key tools discussed include FAISS, Haystack, OpenAI API, and Elasticsearch, with a focus on using Sentence-BERT for generating contextual embeddings.

Uploaded by

venkat Mohan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 9

Semantic Search: Which One Should You Use?

Tool            Best For                        Key Strength
FAISS           Large-scale similarity search   Efficient and optimized for vectors
Haystack        Document retrieval & QA         Uses BM25 & Transformers for NLP
OpenAI API      Chatbots, AI search             High-quality embeddings
Elasticsearch   Enterprise-level search         Hybrid search (BM25 + vectors)
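
Since FAISS is the option listed here (and in the Next Steps below) for large-scale search, the following is a minimal sketch of how it could pair with the SBERT embeddings used in the steps that follow. It assumes faiss-cpu is installed (pip install faiss-cpu); the index type and document list are illustrative, not part of the original steps.

python
# Minimal FAISS sketch (assumes: pip install faiss-cpu sentence-transformers)
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-mean-tokens")
documents = [
    "Artificial Intelligence is changing the world.",
    "Machine learning makes predictions using data.",
    "Deep learning is a branch of machine learning."
]

# Encode and L2-normalize so inner product equals cosine similarity
vectors = model.encode(documents, convert_to_numpy=True).astype("float32")
faiss.normalize_L2(vectors)

index = faiss.IndexFlatIP(vectors.shape[1])  # exact search; approximate indexes exist for very large corpora
index.add(vectors)

query = model.encode(["How does AI impact industries?"], convert_to_numpy=True).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 3)  # top-3 results

for score, idx in zip(scores[0], ids[0]):
    print(f"{documents[idx]} (Score: {score:.3f})")
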
Step-by-Step Implementation of Semantic Search Using BERT in Anaconda

BERT (Bidirectional Encoder Representations from Transformers) is a powerful NLP model that understands the context and meaning of text. We will use Sentence-BERT (SBERT), which fine-tunes BERT for efficient semantic search.

Step 1: Set Up Anaconda Environment

Open Anaconda Prompt and create a new environment.

bash
conda create --name semantic_bert python=3.9 -y
conda activate semantic_bert

Install Required Packages

bash
pip install torch torchvision torchaudio
pip install sentence-transformers scikit-learn numpy

 torch → Required for deep learning models.
 sentence-transformers → Pre-trained BERT models for text embeddings.
 scikit-learn → Efficient nearest neighbor search.
 numpy → Handles numerical operations.

Step 2: Load a Pre-Trained BERT Model

We use Sentence-BERT (SBERT) to generate contextual embeddings.

python
from sentence_transformers import SentenceTransformer

# Load a pre-trained BERT model
model = SentenceTransformer("bert-base-nli-mean-tokens")

# Example documents for semantic search
documents = [
    "Artificial Intelligence is changing the world.",
    "Machine learning makes predictions using data.",
    "Deep learning is a branch of machine learning.",
    "Natural language processing helps chatbots understand text.",
    "Computer vision enables machines to interpret images."
]

# Convert documents into vector embeddings
document_vectors = model.encode(documents, convert_to_numpy=True)

print("Vector Shape:", document_vectors.shape)  # Example: (5, 768)

✅ This converts text into high-dimensional vectors (BERT embeddings).

Step 3: Build a Semantic Search Engine Using Nearest Neighbors

Now, we use Scikit-learn’s Nearest Neighbors search to find the most similar documents.

python
from sklearn.neighbors import NearestNeighbors

# Initialize Nearest Neighbors model with cosine similarity
nn_model = NearestNeighbors(n_neighbors=3, metric="cosine")

# Fit model on document embeddings
nn_model.fit(document_vectors)

✅ The model is now trained to retrieve semantically similar text.

Step 4: Perform a Semantic Search Query

Now, we take a user query, convert it to a BERT vector, and find similar results.

python
# Define a user query
query = "How does AI impact industries?"

# Convert query into a vector
query_vector = model.encode([query], convert_to_numpy=True)

# Search for top 3 most similar documents
distances, indices = nn_model.kneighbors(query_vector)

# Display results
print("\nTop Search Results:")
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. {documents[idx]} (Score: {1 - distances[0][i]})")

✅ The most contextually similar documents are returned.

Step 5: Improve Search Results Using Cosine Similarity

We can manually compute cosine similarity for better ranking.

python
from sklearn.metrics.pairwise import cosine_similarity

# Compute cosine similarity between query and all document vectors
cosine_similarities = cosine_similarity(query_vector, document_vectors)

# Get top 3 results
top_indices = cosine_similarities.argsort()[0][-3:][::-1]

print("\nTop Search Results (Using Cosine Similarity):")
for i, idx in enumerate(top_indices):
    print(f"{i+1}. {documents[idx]} (Score: {cosine_similarities[0][idx]})")

✅ This ensures better-ranked search results.

Step 6: Save and Load the Search Model

To avoid recomputation, save and reload the model.

Save the Model

python
import pickle

# Save embeddings and model
with open("semantic_bert_model.pkl", "wb") as f:
    pickle.dump((documents, document_vectors, nn_model), f)

Load the Model

python
# Load saved model
with open("semantic_bert_model.pkl", "rb") as f:
    documents, document_vectors, nn_model = pickle.load(f)

✅ This allows reusing the semantic search model without retraining.

Final Thoughts

Feature         Implementation
Search Type     Semantic (context-based)
Model Used      bert-base-nli-mean-tokens (SBERT)
Search Method   Nearest Neighbors + Cosine Similarity
Performance     Accurate for contextual search
Scalability     Works on large datasets

Next Steps 🚀

 Try larger BERT models (roberta-base-nli-stsb-mean-tokens, all-mpnet-base-v2).
 Implement FAISS for large-scale search (see the sketch after the tool comparison above).
 Deploy as a REST API with Flask or FastAPI (a minimal FastAPI sketch follows below).

Would you like help with integrating this into a web app? 🚀
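
As a concrete starting point for that, here is a minimal FastAPI sketch that exposes the SBERT + NearestNeighbors search from Steps 2–4 as a /search endpoint. The file name app.py, the endpoint path, and the uvicorn command are illustrative assumptions, not part of the original steps.

python
# Minimal FastAPI sketch (assumes: pip install fastapi uvicorn)
from fastapi import FastAPI
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

app = FastAPI()

documents = [
    "Artificial Intelligence is changing the world.",
    "Machine learning makes predictions using data.",
    "Deep learning is a branch of machine learning."
]

# Build the same SBERT + NearestNeighbors search used in Steps 2-4
model = SentenceTransformer("bert-base-nli-mean-tokens")
document_vectors = model.encode(documents, convert_to_numpy=True)
nn_model = NearestNeighbors(n_neighbors=3, metric="cosine").fit(document_vectors)

@app.get("/search")
def search(q: str):
    # Encode the query and return the top matches with cosine-similarity scores
    query_vector = model.encode([q], convert_to_numpy=True)
    distances, indices = nn_model.kneighbors(query_vector)
    return [
        {"text": documents[idx], "score": float(1 - distances[0][i])}
        for i, idx in enumerate(indices[0])
    ]

# Run with: uvicorn app:app --reload   (assuming this file is saved as app.py)
# Then query: http://127.0.0.1:8000/search?q=How%20does%20AI%20impact%20industries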

You said:
another interesting sample for semantic search using above

ChatGPT said:

Semantic Search Example: Finding Similar Research Papers Using BERT

Let's build a semantic search system that allows users to search for research papers based on their abstract or keywords. This is useful for academic search engines, knowledge management systems, and AI-powered document retrieval.

📌 Step 1: Install and Set Up Anaconda Environment

If you haven’t already set up an environment, run:

bash
conda create --name semantic_research python=3.9 -y
conda activate semantic_research

Install Dependencies

bash
pip install torch torchvision torchaudio
pip install sentence-transformers scikit-learn numpy pandas

📌 Step 2: Load Research Paper Data

We'll use a sample dataset of research paper titles and abstracts.

python
import pandas as pd

# Sample dataset of research papers (title + abstract)
data = [
    {"title": "Deep Learning for Computer Vision",
     "abstract": "Deep learning techniques such as CNNs are widely used for image recognition and object detection."},

    {"title": "Natural Language Processing for Chatbots",
     "abstract": "NLP models, including transformers, help chatbots understand and generate human-like text."},

    {"title": "Quantum Computing and Cryptography",
     "abstract": "Quantum algorithms promise breakthroughs in encryption and secure communications."},

    {"title": "Reinforcement Learning in Robotics",
     "abstract": "Robotic control and automation benefit from reinforcement learning techniques like Q-learning."},

    {"title": "Neural Networks for Financial Forecasting",
     "abstract": "Predictive models using deep neural networks improve stock market analysis and risk assessment."}
]

# Convert dataset into DataFrame
df = pd.DataFrame(data)
df["text"] = df["title"] + ". " + df["abstract"]  # Combine title & abstract

print(df.head())

✅ This dataset simulates research papers, where semantic search will help find the most relevant paper.

📌 Step 3: Generate Semantic Embeddings with BERT

We convert titles and abstracts into vector representations using SBERT.

python
from sentence_transformers import SentenceTransformer

# Load BERT-based model
model = SentenceTransformer("bert-base-nli-mean-tokens")

# Convert research papers to embeddings
document_vectors = model.encode(df["text"].tolist(), convert_to_numpy=True)

print("Vector Shape:", document_vectors.shape)  # Example: (5, 768)

✅ Now, each research paper is represented as a high-dimensional vector.

📌 Step 4: Build the Semantic Search Model

We use Nearest Neighbors search to find similar research papers.

python
from sklearn.neighbors import NearestNeighbors

# Initialize Nearest Neighbors model with cosine similarity
nn_model = NearestNeighbors(n_neighbors=3, metric="cosine")

# Fit the model on research paper embeddings
nn_model.fit(document_vectors)

✅ The model is now trained for semantic search on research papers.

📌 Step 5: Search for Similar Research Papers

Now, let's enter a query (e.g., "AI for chatbots") and retrieve the most relevant papers.

python
# Define a research query
query = "Conversational AI using deep learning"

# Convert query to vector
query_vector = model.encode([query], convert_to_numpy=True)

# Perform search
distances, indices = nn_model.kneighbors(query_vector)

# Display search results
print("\nTop 3 Relevant Research Papers:")
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. {df.iloc[idx]['title']} (Score: {1 - distances[0][i]})")

✅ This will return the most contextually similar research papers based on meaning rather than just keywords.

📌 Step 6: Save and Load the Semantic Search Model

To reuse the model without recomputation:

Save the Model

python
import pickle

# Save model, document vectors, and dataset
with open("semantic_research_model.pkl", "wb") as f:
    pickle.dump((df, document_vectors, nn_model), f)

Load the Model

python
# Load saved model
with open("semantic_research_model.pkl", "rb") as f:
    df, document_vectors, nn_model = pickle.load(f)

✅ This allows us to quickly load and perform searches without recomputing embeddings.

📌 Step 7: Try More Complex Queries

Now, let's test some realistic academic queries.

python
queries = [
    "How can we use AI in finance?",
    "Quantum encryption methods",
    "Reinforcement learning for automation"
]

# Process each query
for query in queries:
    query_vector = model.encode([query], convert_to_numpy=True)
    distances, indices = nn_model.kneighbors(query_vector)

    print(f"\n🔎 **Query:** {query}")
    for i, idx in enumerate(indices[0]):
        print(f"{i+1}. {df.iloc[idx]['title']} (Score: {1 - distances[0][i]})")

✅ This tests semantic retrieval on diverse AI-related topics.

📌 Final Thoughts

Feature         Implementation
Use Case        Research Paper Search
Model Used      bert-base-nli-mean-tokens (SBERT)
Search Type     Semantic (meaning-based)
Performance     Fast & Contextually Relevant
Scalability     Works on Large Paper Repositories
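
The earlier Next Steps suggest trying a stronger SBERT model. As a hedged sketch of what that swap looks like for this research-paper example, only the model name passed to SentenceTransformer changes (all-mpnet-base-v2 is one of the general-purpose sentence-transformers models); the paper texts below simply inline the title + abstract strings from Step 2, and the rest of the encode/fit/search flow is unchanged.

python
# Sketch: swapping in a stronger SBERT model (all-mpnet-base-v2) for the same pipeline
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

papers = [
    "Deep Learning for Computer Vision. Deep learning techniques such as CNNs are widely used for image recognition and object detection.",
    "Natural Language Processing for Chatbots. NLP models, including transformers, help chatbots understand and generate human-like text.",
    "Quantum Computing and Cryptography. Quantum algorithms promise breakthroughs in encryption and secure communications."
]

# Only the model name changes; the encode/fit/search flow stays the same
model = SentenceTransformer("all-mpnet-base-v2")
paper_vectors = model.encode(papers, convert_to_numpy=True)
nn_model = NearestNeighbors(n_neighbors=3, metric="cosine").fit(paper_vectors)

query_vector = model.encode(["Conversational AI using deep learning"], convert_to_numpy=True)
distances, indices = nn_model.kneighbors(query_vector)
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. {papers[idx]} (Score: {1 - distances[0][i]:.3f})")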
