Sahil Pahuja End Term Seminar Report 1
SAHIL PAHUJA
(01216404523)
Under the supervision of
Dr. SONAM
(ASSISTANT PROFESSOR)
This is to certify that I, Sahil Pahuja, Enrollment No. 01216404523 of MCA(SE) 4th semester,
USICT, GGSIPU Delhi, take full responsibility for the content of this dissertation, the source
code, and the relevant modules, and am accountable for the work done by me.
I hereby declare that the dissertation/internship report, any other reports of the coursework,
the source code, and the relevant modules are not plagiarized from any source, directly or
indirectly. If any plagiarism is found, I shall be responsible for it.
Date: 13 / 03 / 2025
New Delhi
Signature:
This is to certify that Sahil Pahuja (Enrollment No. 01216404523) has undertaken an internship at
Carnot Research Private Limited from 3rd February 2025 to 3rd August 2025 under the
supervision and guidance of Mr. Pranav Kanire, Senior Developer.
I hereby confirm that the performance and conduct of Sahil Pahuja during the internship period
have been satisfactory.
This is to certify that the work embodied in this Internship Report titled “AI-Driven
Multimodal Journey Planning and Document Intelligence: An Industry Internship
Experience” being submitted in partial fulfillment of the requirements for the award of the
degree of Master of Computer Applications (Software Engineering), is original and has been
carried out by Sahil Pahuja (Enrollment No. 01216404523) under my supervision and guidance.
It is further certified that this Internship work has not been submitted in full or in part to this
university or any other university for the award of any other degree or diploma to the best of my
knowledge and belief.
Dr. Sonam
Assistant Professor
USICT, GGSIPU
INDEX
ABSTRACT
TOOLS & TECHNOLOGIES
1. Python Ecosystem
Core Python and Its Versatility
Deep Learning and Machine Learning Frameworks
Specialized Libraries for Speech and Audio Processing
Integration and Application
2. SQL Database Management
Role and Benefits of SQL
Database Design and Query Optimization
3. Git & GitHub
Git – The Distributed Version Control System
GitHub – Cloud-Based Collaboration Platform
Integration in the Project Workflow
Collaborative Best Practices
4. Large Language Models (LLMs)
Key Characteristics and Applications
Fine-Tuning for Domain-Specific Use
5. Retrieval-Augmented Generation (RAG)
Concept and Functionality
Benefits and Use Cases
6. LangChain
Concept and Functionality
FUTURE SCOPE
CONCLUSION
REFERENCES
ABSTRACT
This internship report details the comprehensive design, development, and evaluation of
innovative AI-driven systems addressing significant challenges in urban mobility and speech
processing. The first system is an intelligent transport information platform that integrates
real-time data from multiple transport APIs with a robust SQL-based database. By leveraging
advanced techniques such as Retrieval-Augmented Generation (RAG) and fine-tuned Large
Language Models (LLMs), this platform delivers dynamic transit information, accurate route
recommendations, and timely updates on schedules, fares, and disruptions, ultimately enhancing
commuter experience and accessibility.
The second system focuses on advanced speaker diarization aimed at improving transcription
accuracy in multi-speaker environments. Utilizing state-of-the-art speech recognition models
including OpenAI’s Whisper and speaker embedding techniques from SpeechBrain, this solution
employs sophisticated clustering algorithms like K-Means alongside robust audio processing
tools (Librosa, PyDub, and SciPy) to effectively differentiate between overlapping voices.
Detailed processes such as feature extraction, noise reduction, and voice activity detection ensure
the generation of clear and structured transcripts, which are crucial for applications like meeting
transcriptions, interviews, and broadcast media.
Throughout this report, the integration of cutting-edge technologies is emphasized. Python serves
as the backbone for backend development, supported by its rich ecosystem of libraries such as
TensorFlow and PyTorch for machine learning and deep learning applications. Additionally, the
report touches on the importance of modern development tools and practices, including version
control with Git and GitHub, which streamline collaborative software engineering efforts. The
synthesis of these technologies not only enhances operational efficiency and accuracy but also
demonstrates the transformative impact of AI-driven solutions in addressing real-world
challenges in both transport automation and speech processing.
Overall, this comprehensive analysis underscores the significant advancements achieved during
the internship, providing valuable insights into the deployment of scalable, real-time AI
applications that are poised to redefine modern urban mobility and communication systems.
TOOLS & TECHNOLOGIES
1. Python Ecosystem
Python stands as the backbone of modern AI and software development, thanks to its readability,
extensive libraries, and a supportive community. In this project, Python is not only used as a
programming language but also as an ecosystem enriched by various specialized libraries and
frameworks that streamline the development process. Below is an overview of these key
components:
Python’s simplicity and versatility make it the preferred language for rapid prototyping and
production-level applications alike. Its clear syntax and dynamic typing allow developers to
write efficient code quickly while maintaining readability. Moreover, Python's extensive standard
library supports many basic operations without the need for external dependencies, which is
especially beneficial in data processing and integration tasks.
● TensorFlow:
TensorFlow is an open-source deep learning framework developed by Google. It offers
an extensive range of tools for building and training machine learning models. In our
context, TensorFlow is used to design neural networks that underpin complex AI
functionalities such as image recognition and predictive analytics. Its flexible architecture
enables deployment across various platforms, from desktops to mobile devices.
● PyTorch:
Developed by Facebook’s AI Research lab, PyTorch is another leading deep learning
framework known for its dynamic computation graph, which is particularly advantageous
during model experimentation and debugging. PyTorch’s intuitive interface and robust
community support have made it a popular choice for research and production alike. In
this project, PyTorch is leveraged for tasks requiring rapid model iteration and seamless
integration with custom machine learning workflows.
● OpenAI Whisper:
OpenAI Whisper is a state-of-the-art automatic speech recognition (ASR) model that
converts spoken language into text with high accuracy. Its robust design allows it to
handle diverse accents and noisy backgrounds, making it ideal for applications like
speaker diarization. Whisper plays a central role in transcribing audio data accurately,
which is critical for downstream natural language processing tasks.
● SpeechBrain:
SpeechBrain is an open-source toolkit designed for various speech processing tasks,
including speaker recognition, speech enhancement, and more. It offers pre-trained
models and flexible modules that simplify the process of integrating advanced audio
processing functionalities. In our system, SpeechBrain supports the extraction of speaker
embeddings, which are then used in clustering algorithms to differentiate between voices.
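The clustering step mentioned above can be illustrated with a small, self-contained sketch. This is not the internship's actual pipeline: the "embeddings" below are synthetic two-dimensional stand-ins for the fixed-length speaker vectors SpeechBrain would produce (typically 192-dimensional), and the K-Means routine is a minimal hand-rolled version of what a library implementation provides.

```python
# Illustrative sketch: grouping per-segment speaker embeddings with K-Means.
# Real embeddings would come from a SpeechBrain model; these are synthetic.
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-Means: returns a cluster label for each point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: label each point with its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return labels

# Two synthetic "speakers": embeddings clustered in different regions.
speaker_a = [[1.0 + d, 1.0] for d in (0.0, 0.1, -0.1)]
speaker_b = [[5.0 + d, 5.0] for d in (0.0, 0.1, -0.1)]
segments = speaker_a + speaker_b
labels = kmeans(segments, k=2)
```

Segments that end up with the same label are attributed to the same speaker, which is exactly how the diarization stage turns raw embeddings into "Speaker 1 / Speaker 2" transcript sections.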
The Python ecosystem's modularity allows for seamless integration of these libraries into a
unified framework. By combining TensorFlow and PyTorch for model training with specialized
audio processing libraries, developers can build sophisticated AI-driven applications. This
ecosystem not only supports rapid prototyping but also ensures that systems are scalable and
maintainable in production environments. The flexibility to switch or integrate additional
libraries as project requirements evolve is one of Python’s strongest assets.
The overall strategy is to utilize Python as the glue that binds various AI components
together—from data ingestion and preprocessing to model training and deployment. This
approach ensures that the system remains agile, allowing for continuous improvements and
updates as new tools and techniques emerge in the field.
2. SQL Database Management
Structured Query Language (SQL) is critical for managing and retrieving organized data
efficiently. In our project, SQL is used to maintain a robust, relational database that stores
historical transit information, real-time updates, and user query logs.
SQL databases provide a systematic and efficient way to store data in structured tables, which
can be easily queried using standardized commands. This structure ensures data integrity,
reduces redundancy, and enables rapid access to large datasets—essential for applications
requiring real-time responses. The SQL-based system supports efficient data retrieval, which is
vital when the AI-driven models need to pull contextual information for tasks such as route
planning and schedule updates.
Proper database design is crucial to handle the large volumes of data generated by transport APIs
and user interactions. The design involves creating well-indexed tables that support fast queries
and ensure scalability. Query optimization techniques, such as caching frequently accessed data
and using stored procedures, are implemented to minimize latency and improve the overall
responsiveness of the system.
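The schema and indexing ideas above can be sketched with SQLite, Python's built-in relational engine. The production system's actual SQL dialect, table layout, and column names are not given in this report, so everything below is illustrative.

```python
# Illustrative schema for schedule lookups (table/column names are assumptions).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedules (
        route_id  TEXT NOT NULL,
        stop_name TEXT NOT NULL,
        departure TEXT NOT NULL   -- HH:MM, 24-hour clock
    );
    -- Indexing the columns used in lookups keeps queries fast as data grows.
    CREATE INDEX idx_route_stop ON schedules (route_id, stop_name, departure);
""")
conn.executemany(
    "INSERT INTO schedules VALUES (?, ?, ?)",
    [("blue", "Dwarka", "09:05"), ("blue", "Dwarka", "09:15"),
     ("blue", "Dwarka", "08:55"), ("yellow", "Saket", "09:10")],
)

# The next two departures for a given route and stop after a given time.
rows = conn.execute(
    """SELECT departure FROM schedules
       WHERE route_id = ? AND stop_name = ? AND departure > ?
       ORDER BY departure LIMIT 2""",
    ("blue", "Dwarka", "09:00"),
).fetchall()
next_departures = [r[0] for r in rows]
```

Queries of this shape are what the RAG retriever issues when a user asks for upcoming departures, so keeping them index-backed directly affects response latency.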
3. Git & GitHub
Git is a powerful distributed version control system that allows multiple developers to work on
the same codebase simultaneously. It tracks changes meticulously, enabling developers to revert
to previous versions when necessary, and supports branching and merging to facilitate parallel
development. Git’s ability to handle distributed workflows makes it ideal for both individual
developers and large teams.
GitHub builds on Git’s functionalities by providing a web-based interface for repository hosting,
issue tracking, and collaborative code reviews. It facilitates seamless integration with continuous
integration and deployment (CI/CD) pipelines, ensuring that code changes are automatically
tested and deployed. GitHub’s pull request mechanism allows team members to propose changes
and review code collaboratively, fostering an environment of code quality and collective
ownership.
In this project, Git and GitHub are used extensively to manage the entire lifecycle of code
development—from initial prototyping to final deployment. Each feature or bug fix is developed
in isolated branches, which are then reviewed and merged into the main branch only after
rigorous testing. This practice not only helps in maintaining code integrity but also makes it
easier to track progress and pinpoint issues. The transparency and accountability provided by
GitHub’s issue tracking and commit history play a crucial role in ensuring that the project
remains on schedule and meets quality standards.
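The branch-per-feature workflow described above can be sketched in a throwaway repository. Branch and file names here are illustrative, not taken from the project.

```shell
# Sketch of the feature-branch workflow (names are illustrative).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name  "demo"
git config user.email "demo@example.com"
git commit --allow-empty -qm "Initial commit"

# Each feature is developed on its own isolated branch...
git switch -qc feature/transit-schedules
echo "def fetch_schedules(): ..." > schedules.py
git add schedules.py
git commit -qm "Add schedule-fetching stub"

# ...and merged into main only after review and testing.
git switch -q main
git merge -q --no-edit feature/transit-schedules
```

On GitHub, the `merge` step would instead happen through a pull request, giving reviewers a place to comment before the change lands on `main`.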
The use of Git and GitHub promotes several best practices such as code reviews, regular
commits, and detailed documentation of changes. These practices are essential for maintaining
high-quality code, especially in complex projects involving multiple technologies. Additionally,
GitHub’s collaboration tools make it easier for new team members to onboard and understand the
project structure quickly.
4. Large Language Models (LLMs)
LLMs are trained on vast amounts of textual data, which enables them to generate coherent and
contextually relevant responses. They are capable of performing a wide range of tasks—from
answering questions and summarizing content to facilitating conversational interactions in
chatbots. Their adaptability makes them ideal for applications in customer service, content
generation, and real-time query processing.
5. Retrieval-Augmented Generation (RAG)
RAG combines traditional language modeling with real-time data retrieval techniques. When a
query is made, the model searches through external databases and knowledge repositories to
fetch relevant information before generating a final response. This hybrid approach minimizes
the risk of hallucination (i.e., generating factually incorrect information) and significantly
improves the accuracy and reliability of AI responses.
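The retrieve-then-generate flow can be shown with a toy example. The internship system retrieves from a real knowledge base; here the "retriever" is simple keyword-overlap scoring over three in-memory documents, and the final prompt is only built, not sent to a model.

```python
# Toy RAG flow: score documents against the query, then ground the prompt.
corpus = [
    "The Blue Line runs between Dwarka and Noida with trains every 5 minutes.",
    "Metro fares range from Rs 10 to Rs 60 depending on distance travelled.",
    "The Yellow Line is partially suspended this weekend for maintenance.",
]

def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, documents):
    """Grounding step: retrieved context is prepended to the question."""
    context = "\n".join(documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

query = "How much are the metro fares"
context = retrieve(query, corpus)
prompt = build_prompt(query, context)  # this is what would be sent to the LLM
```

Because the model only sees text that was actually retrieved, its answer is anchored to the knowledge base rather than to whatever it memorized during training, which is the hallucination-reduction mechanism described above.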
The primary benefit of RAG is its ability to provide contextually rich and factually correct
information by grounding AI-generated content in external data. This is particularly valuable in
applications where precision is crucial, such as real-time transit information and customer
support systems. By leveraging RAG, our system ensures that users receive timely, accurate, and
comprehensive responses to their queries, thereby enhancing overall operational efficiency.
6. LangChain
LangChain is an open-source framework for building LLM-powered applications. It provides
modular components that connect language models to external data sources, tools, and memory,
making it well suited for orchestrating RAG pipelines. Its key building blocks include:
● Prompt Templates: Predefined and dynamically formatted prompts to maintain consistent input
for the LLMs.
● LLM Wrappers: Unified interfaces to interact with models like OpenAI GPT, Cohere, or
Anthropic.
● Retrievers: Components that pull relevant chunks from external documents or vector stores.
● Tools and Agents: Used to execute actions like search, computation, or API calling based on
LLM decisions.
In our system, LangChain serves as the backbone for orchestrating a complex flow where a user
query is parsed, relevant transport or document data is retrieved, and a response is generated using
an LLM enhanced by contextual information.
LangChain plays a pivotal role in enhancing the retrieval-augmented generation (RAG) pipeline:
● It connects the retriever (which fetches route, schedule, and fare data from a SQL-based
backend or external API) with the LLM, ensuring grounded, real-time answers.
● Using memory components, the system retains previous user queries to offer conversational
continuity and context-aware suggestions.
● Through agent-based architectures, the assistant can trigger external APIs or databases
conditionally—e.g., fetching live metro status or travel time predictions based on user input.
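LangChain's actual classes are not reproduced here; the plain-Python sketch below mimics the chain pattern the bullets describe (retriever, then prompt template, then LLM wrapper), with the model call stubbed out so the flow is visible end to end. All names and the in-memory "knowledge base" are illustrative.

```python
# Chain pattern sketch: retriever -> prompt template -> (stubbed) LLM.
def retriever(query):
    """Stand-in for a retriever backed by SQL or a vector store."""
    knowledge = {"metro": "Blue Line: next train in 4 minutes, fare Rs 30."}
    return [v for k, v in knowledge.items() if k in query.lower()]

PROMPT_TEMPLATE = "Context:\n{context}\n\nUser question: {question}\nAnswer:"

def stub_llm(prompt):
    """Stand-in for an LLM wrapper; a real chain would call a hosted model."""
    return "Based on live data: " + prompt.split("Context:\n")[1].split("\n")[0]

def chain(question):
    docs = retriever(question)
    prompt = PROMPT_TEMPLATE.format(context="\n".join(docs), question=question)
    return stub_llm(prompt)

answer = chain("When is the next metro train?")
```

Swapping the stubs for LangChain's retriever, prompt-template, and model classes turns this skeleton into the kind of grounded, real-time assistant the bullets above describe.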
Overall, LangChain acts as a middleware layer that enhances the flexibility, reusability, and
maintainability of LLM-powered systems, making it an indispensable tool in the development of
intelligent transport assistants and document analysis solutions.
FUTURE SCOPE
Throughout the internship, practical challenges were met with innovative solutions, illustrating
how modern software engineering practices can drive tangible improvements. The integration of
various technologies not only enhanced operational efficiency but also ensured scalability and
adaptability in real-world applications. This period of intensive learning and development
reinforced the value of cross-disciplinary collaboration, where theoretical knowledge was
effectively translated into practical, deployable systems.
The experience gained during this internship lays a robust foundation for future advancements.
The methodologies refined during this project—ranging from fine-tuning deep learning models
to managing complex data workflows—will serve as a cornerstone for further innovation in
AI-driven applications. As urban and communication needs continue to evolve, the skills and
insights developed here will be instrumental in driving continuous improvement and
transformative change in the field.
REFERENCES