Synopsis

The document outlines a project synopsis for a Semantic Search Engine developed as a web application using Streamlit, allowing users to upload PDF files and query their content through natural language. Key features include PDF parsing, text chunking, semantic search using OpenAI embeddings, and dynamic response generation with a live user interface. Future enhancements may include support for additional file formats, summarization, and user access control.

Uploaded by

himanshukangar108

Jai Parkash Mukand Lal Innovative

Engineering & Technology Institute

Affiliated to Kurukshetra University, Kurukshetra

Submitted In Partial Fulfilment of the Requirements

For the award of the

“Degree Of Bachelor of Technology in Computer Science & Engg. (AIML)”

PROJECT-II SYNOPSIS

Semantic Search Engine

Submitted By:
Himanshu Kangar (8521220)

Manish (821229)

Under the Support and Guidance of

Mr. Sumit Kumar Mahana

[AP, CSE]

Department of Computer Science and Engineering

(AIML)
Semantic Search Engine
Overview
The Semantic Search Engine is a Streamlit-based web application that enables users to upload PDF files and query their content using natural language. It integrates LangChain, OpenAI embeddings, and a Chroma vector database to perform semantic similarity searches and generate intelligent responses using an LLM.

Key Features

- PDF Upload & Parsing: Users can upload PDFs, which are parsed and loaded using PyPDFLoader.

- Text Chunking: Documents are split into overlapping chunks using RecursiveCharacterTextSplitter for optimal embedding and retrieval.

- Embeddings: Chunks are converted to vector representations using OpenAIEmbeddings (text-embedding-3-large).

- Vector Store: Embeddings are stored in a persistent Chroma vector store for efficient retrieval.

- Semantic Search: User queries are converted to vectors, and relevant document chunks are retrieved using similarity search (k=1).

- LLM Response: Retrieved chunks and user prompts are passed to ChatOpenAI (gpt-4o-mini) using a ChatPromptTemplate for dynamic response generation.

- Live UI: Built with Streamlit, the app features real-time file uploads, prompt inputs, and streaming answers.
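
The chunking and similarity-search steps listed above can be illustrated with a minimal sketch that uses only the Python standard library. It does not call LangChain, OpenAI, or Chroma: toy_embed is a hypothetical bag-of-words stand-in for a real embedding model, split_into_chunks is a simplified fixed-size splitter rather than RecursiveCharacterTextSplitter, and all function names and default sizes are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character chunks of chunk_size that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query (k=1 as in the synopsis)."""
    query_vec = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]
```

With k=1 only the single best-matching chunk reaches the LLM, so an answer that spans two chunks may be incomplete; the overlap between neighbouring chunks partly compensates for this.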

Error Handling and Logging

- Uses try-except-finally blocks to manage errors during file processing.

- Ensures temporary files are deleted post-processing.

- Debug logs are printed to the console for traceability.
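
The error-handling pattern described above might look like the following sketch. The function name process_upload, the logger name, and the handler callback are hypothetical; in the real app the handler would be the PDF loading and indexing step, which is not reproduced here.

```python
import logging
import os
import tempfile

# Hypothetical logger name; the app is assumed to configure console output itself.
logger = logging.getLogger("semantic_search")


def process_upload(data: bytes, handler):
    """Write uploaded bytes to a temporary file, run handler(path) on it,
    and delete the file afterwards whether or not the handler succeeded."""
    tmp_path = None
    result = None
    try:
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
            tmp.write(data)
            tmp_path = tmp.name
        logger.debug("Processing temporary file %s", tmp_path)
        result = handler(tmp_path)
    except Exception:
        # Log the traceback and fall through; the caller receives None.
        logger.exception("Failed to process uploaded file")
    finally:
        if tmp_path and os.path.exists(tmp_path):
            os.remove(tmp_path)
            logger.debug("Deleted temporary file %s", tmp_path)
    return result
```

Returning None on failure keeps the UI responsive, at the cost of hiding the specific error from the caller; re-raising after logging is an equally valid choice.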

Deployment Instructions

1. Install Dependencies:

pip install streamlit langchain openai chromadb

2. Set OpenAI API Key:

export OPENAI_API_KEY=your_key_here

3. Run App:

streamlit run app.py
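
Because the app depends on the OPENAI_API_KEY environment variable from step 2, a start-up check along these lines can fail fast with a clear message. This is a sketch only; the helper name require_api_key is hypothetical and is not taken from the project code.

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the app")
    return key
```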

Performance Considerations

- Embedding generation is resource-intensive for large documents.

- Chunk size and retrieval parameter k should be tuned for performance.

- Chroma's persistent store avoids recomputing embeddings across sessions.
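
The benefit of persistence noted above can be shown with a small disk-backed cache. This is a conceptual stand-in, not Chroma's actual storage mechanism: the EmbeddingCache class, its JSON file format, and the embed_fn parameter are all assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path


class EmbeddingCache:
    """Cache embeddings on disk, keyed by a hash of the chunk text."""

    def __init__(self, path, embed_fn):
        self.path = Path(path)
        self.embed_fn = embed_fn  # any callable mapping str -> list of floats
        # Reload vectors computed in earlier sessions, if the file exists.
        self.store = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, chunk: str) -> list:
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in self.store:
            # Only unseen chunks trigger the (expensive) embedding call.
            self.store[key] = self.embed_fn(chunk)
            self.path.write_text(json.dumps(self.store))
        return self.store[key]
```

A second session that constructs the cache from the same file returns stored vectors without calling embed_fn again, which is the saving the persistent store provides for unchanged documents.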

Conclusion & Future Scope

Conclusion

This project showcases the seamless integration of document processing, vector search, and AI interaction. With minimal user input, the app delivers accurate, context-aware answers from uploaded documents, demonstrating real-world potential in research, legal, and enterprise settings.

Future Scope

- Support additional file formats (DOCX, TXT).

- Add summarization, filtering, and metadata tagging.

- Enable asynchronous uploads and parallel processing.

- Implement user roles, access control, and analytics.

References

- LangChain Documentation: https://docs.langchain.com/

- OpenAI Embeddings: https://platform.openai.com/docs/guides/embeddings

- Chroma Vector DB: https://docs.trychroma.com/

- Streamlit Docs: https://docs.streamlit.io/
