Synopsis

The document outlines a project synopsis for a Semantic Search Engine developed as a web application using Streamlit, allowing users to upload PDF files and query their content through natural language. Key features include PDF parsing, text chunking, semantic search using OpenAI embeddings, and dynamic response generation with a live user interface. Future enhancements may include support for additional file formats, summarization, and user access control.
Copyright © All Rights Reserved

Jai Parkash Mukand Lal Innovative Engineering & Technology Institute


Affiliated to Kurukshetra University, Kurukshetra

Submitted In Partial Fulfilment of the Requirements

For the award of the

“Degree Of Bachelor of Technology in Computer Science & Engg. (AIML)”

PROJECT-II SYNOPSIS

ON

Semantic Search Engine


Submitted By:
Himanshu Kangar (8521220)

Manish (821229)

Under the Support and Guidance of

Mr. Sumit Kumar Mahana

[AP, CSE]

Department of Computer Science and Engineering (AIML)

Semantic Search Engine
Overview
The Semantic Search Engine is a Streamlit-based web application that enables users to upload PDF files and query their content using natural language. It integrates LangChain, OpenAI embeddings, and a Chroma vector database to perform semantic similarity searches, then uses an LLM (large language model) to generate answers from the retrieved passages.
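
The retrieve-then-generate flow described above can be illustrated with a dependency-free sketch. It is not the project's code: a word-count vector stands in for OpenAIEmbeddings (text-embedding-3-large), a plain Python list stands in for the Chroma store, and the hypothetical helpers `embed`, `top_k` and `build_prompt` only mimic the roles of the LangChain components. The prompt wording is also an assumption, and the sketch stops before the LLM call.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k chunks most similar to the query (k=1 mirrors the synopsis)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(context: List[str], question: str) -> str:
    """Hypothetical prompt layout; the project uses a ChatPromptTemplate instead."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


chunks = [
    "Invoices are due within 30 days of receipt.",
    "The warranty covers manufacturing defects for two years.",
    "Support is available by email on weekdays.",
]
best = top_k("How long does the warranty last?", chunks, k=1)
print(best)  # ['The warranty covers manufacturing defects for two years.']
print(build_prompt(best, "How long does the warranty last?"))
```

In the real application the similarity ranking happens inside Chroma over dense OpenAI vectors, and the assembled prompt is sent to gpt-4o-mini rather than printed.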

Key Features

- PDF Upload & Parsing: Users can upload PDFs, which are parsed and loaded using PyPDFLoader.

- Text Chunking: Documents are split into overlapping chunks using RecursiveCharacterTextSplitter, so that each chunk is small enough to embed and retrieve precisely while the overlap preserves context across chunk boundaries.

- Embeddings: Chunks are converted to vector representations using OpenAIEmbeddings (text-embedding-3-large).

- Vector Store: Embeddings are stored in a persistent Chroma vector store for efficient retrieval.

- Semantic Search: User queries are converted to vectors, and the most relevant document chunk is retrieved by similarity search (k=1, i.e. only the single closest chunk).

- LLM Response: The retrieved chunks and the user prompt are passed to ChatOpenAI (gpt-4o-mini) through a ChatPromptTemplate for dynamic response generation.

- Live UI: Built with Streamlit, the app supports real-time file uploads, prompt input, and streaming answers.
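
The chunking step can be pictured with a simplified splitter. This is a sketch under stated assumptions, not LangChain's RecursiveCharacterTextSplitter: it cuts fixed-size character windows that share `chunk_overlap` characters, whereas the LangChain splitter first tries separators such as paragraph breaks, newlines, and spaces, so its chunk boundaries will differ. The function name `split_with_overlap` and the sizes used below are hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


pieces = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(pieces)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side. In the LangChain splitter, `chunk_size` and `chunk_overlap` play the same roles; the project's actual values are not stated in this synopsis.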

Error Handling and Logging

- Uses try-except-finally blocks to manage errors during file processing.

- Ensures temporary files are deleted post-processing.

- Debug logs are printed to the console for traceability.
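
The cleanup guarantee above relies on the `finally` clause. The following is a minimal sketch, assuming the uploaded PDF arrives as bytes (as from a Streamlit file uploader) and must be written to disk because the parser expects a file path. `process_upload` and the `parse` callback are hypothetical names, with `parse` standing in for a call such as `PyPDFLoader(path).load()`; returning `None` on failure is also an assumption, since the app may instead report the error in the UI.

```python
import logging
import os
import tempfile
from typing import Callable, List, Optional

logger = logging.getLogger("semantic_search")


def process_upload(data: bytes, parse: Callable[[str], List]) -> Optional[List]:
    """Write uploaded bytes to a temporary .pdf file, parse it, then always delete it.

    `parse` is a hypothetical stand-in for something like PyPDFLoader(path).load().
    """
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        logger.debug("wrote %d bytes to %s", len(data), path)
        return parse(path)
    except Exception:
        # Log the full traceback to the console instead of crashing the app.
        logger.exception("failed to process upload")
        return None
    finally:
        # Runs after success and after failure, so no temporary file is left behind.
        if os.path.exists(path):
            os.remove(path)
            logger.debug("deleted %s", path)


# Usage with toy parsers.
seen_paths: List[str] = []


def fake_parse(path: str) -> List[bytes]:
    """Toy parser: records the path and returns the raw file contents."""
    seen_paths.append(path)
    with open(path, "rb") as fh:
        return [fh.read()]


def failing_parse(path: str) -> List[bytes]:
    """Toy parser that always fails, to exercise the except branch."""
    seen_paths.append(path)
    raise ValueError("not a valid PDF")


logging.basicConfig(level=logging.DEBUG)
print(process_upload(b"%PDF-1.4 demo", fake_parse))  # [b'%PDF-1.4 demo']
print(process_upload(b"garbage", failing_parse))  # None
print(all(not os.path.exists(p) for p in seen_paths))  # True
```

Because the `finally` clause runs whether `parse` returns or raises, the temporary file is removed in both cases, and `logger.exception` records the traceback for the console debug output.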

Deployment Instructions

1. Install Dependencies:

pip install streamlit langchain langchain-community langchain-openai chromadb pypdf

2. Set OpenAI API Key:

export OPENAI_API_KEY=your_key_here

3. Run App:

streamlit run app.py
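
Since both the embeddings and the chat model need `OPENAI_API_KEY`, the app can check for it at startup instead of failing on the first API call. This is a minimal sketch; `require_openai_key` is a hypothetical helper, and the optional mapping argument exists only to make it easy to test without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY, or raise a clear error before any API call is attempted."""
    source = os.environ if env is None else env
    key = source.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running `streamlit run app.py`."
        )
    return key


try:
    require_openai_key({})  # empty mapping simulates a missing variable
except RuntimeError as err:
    print(err)

print(require_openai_key({"OPENAI_API_KEY": " sk-demo "}))  # sk-demo
```

In a Streamlit app the `RuntimeError` could be caught and shown with `st.error(...)` followed by `st.stop()`, so the page halts cleanly instead of showing a stack trace.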


Performance Considerations

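- Embedding generation is slow and costly for large documents, because every chunk must be sent to the OpenAI embeddings API.

- Chunk size, chunk overlap, and the retrieval parameter k should be tuned together: larger chunks or a higher k give the LLM more context but increase prompt size, latency, and cost.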

- Chroma's persistent store avoids recomputing embeddings across sessions.
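
The benefit of persistence can be illustrated with a small cache that keys each embedding by a hash of its chunk text and saves it to disk. This sketches the general idea only; it is not how Chroma stores data internally, and `EmbeddingCache` and `fake_embed` are hypothetical stand-ins (the latter for an OpenAI embeddings call).

```python
import hashlib
import json
import os
import tempfile
from typing import Callable, Dict, List


class EmbeddingCache:
    """Persist embeddings in a JSON file keyed by the SHA-256 hash of each chunk's text."""

    def __init__(self, path: str, embed_fn: Callable[[str], List[float]]) -> None:
        self.path = path
        self.embed_fn = embed_fn  # hypothetical stand-in for an embeddings API call
        self.calls = 0  # how many times embed_fn actually ran in this session
        self._store: Dict[str, List[float]] = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self._store = json.load(fh)

    def get(self, text: str) -> List[float]:
        """Return the cached vector for text, computing it only on a cache miss."""
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self.embed_fn(text)
            self.calls += 1
        return self._store[key]

    def save(self) -> None:
        """Write all cached vectors to disk so a later session can reuse them."""
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self._store, fh)


def fake_embed(text: str) -> List[float]:
    """Toy 2-D 'embedding': character count and space count."""
    return [float(len(text)), float(text.count(" "))]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "embeddings.json")
    first = EmbeddingCache(cache_file, fake_embed)
    first.get("hello world")
    first.get("hello world")  # second lookup is a cache hit
    first.save()
    second = EmbeddingCache(cache_file, fake_embed)  # simulates a new session
    vec = second.get("hello world")  # loaded from disk, no embed_fn call

print(first.calls, second.calls, vec)  # 1 0 [11.0, 1.0]
```

The second instance plays the role of a new session: it loads the saved file, so the repeated chunk costs no new embedding call.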

Conclusion & Future Scope


Conclusion

This project showcases the integration of document processing, vector search, and AI interaction. With minimal user input, the app delivers context-aware answers grounded in the uploaded documents, demonstrating real-world potential in research, legal, and enterprise settings.

Future Scope

- Support additional file formats (DOCX, TXT).

- Add summarization, filtering, and metadata tagging.

- Enable asynchronous uploads and parallel processing.

- Implement user roles, access control, and analytics.

References

- LangChain Documentation: https://docs.langchain.com/

- OpenAI Embeddings: https://platform.openai.com/docs/guides/embeddings

- Chroma Vector DB: https://docs.trychroma.com/

- Streamlit Docs: https://docs.streamlit.io/
