
AI-Driven Multimodal Journey Planning and Document Intelligence: An Industry Internship Experience

(IT 764: Seminar and Presentation Report)
submitted in partial fulfillment of the requirements
for the award of the degree of

MASTER OF COMPUTER APPLICATIONS


(SOFTWARE ENGINEERING)
Submitted by

SAHIL PAHUJA
(01216404523)
Under the supervision of

Dr. SONAM
(ASSISTANT PROFESSOR)

UNIVERSITY SCHOOL OF INFORMATION, COMMUNICATION & TECHNOLOGY
GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY
Dwarka, Sector - 16 C, New Delhi - 110078
May/June 2025
DECLARATION

This is to certify that I, Sahil Pahuja, Enrollment No. 01216404523, MCA (SE) 4th semester, USICT, GGSIPU, Delhi, take full responsibility for the content of this dissertation, the source code, and the relevant modules, and am accountable for the work done by me.

I hereby declare that the dissertation / internship report and any other reports of the course work, the source code, and the relevant modules are not plagiarized from any source, directly or indirectly. If any plagiarism is found, I shall be responsible for the same.

I hereby further certify that:


1)​ The work contained in the dissertation is original and has been done by me.
2)​ The work has not been submitted to any other Institution for any other
degree/diploma/certificate in this university/other university or any organization, etc. of
India or abroad.
3)​ I have followed the standard guidelines in writing the dissertation.
4)​ I hereby declare that I will not upload or publish this dissertation in any of the online or
offline forums.
5)​ Whenever I have used the materials (data, theoretical analysis and text) from other
sources, I have given due credits to them in the text of the reports and given their details
in the references.

Date: 13 / 03 / 2025
New Delhi

Signature:

Name: Sahil Pahuja


Enrollment No: 01216404523
Phone No. 8586800215
Email ID: sahilpahuja123456@gmail.com
MCA 4th Sem

USICT, GGSIPU, New Delhi


CERTIFICATE FROM THE COMPANY

This is to certify that Sahil Pahuja (Enrollment No. 01216404523) has undertaken internship at
Carnot Research Private Limited from 3rd February 2025 to 3rd August 2025 under the
supervision and guidance of Mr. Pranav Kanire, Senior Developer.

I hereby confirm that the performance and conduct of Sahil Pahuja during the internship period
have been satisfactory.

Col (Dr) Amit Oberoi
CEO
Carnot Research Private Limited

(Seal of the Company)
CERTIFICATE FROM THE INSTITUTE

This is to certify that the work embodied in this Internship Report titled “AI-Driven
Multimodal Journey Planning and Document Intelligence: An Industry Internship
Experience” being submitted in partial fulfillment of the requirements for the award of the
degree of Master of Computer Applications (Software Engineering), is original and has been
carried out by Sahil Pahuja (Enrollment No. 01216404523) under my supervision and guidance.

It is further certified that this Internship work has not been submitted in full or in part to this
university or any other university for the award of any other degree or diploma to the best of my
knowledge and belief.

Dr. Sonam​
Assistant Professor​
USICT, GGSIPU
INDEX
ABSTRACT
TOOLS & TECHNOLOGIES
    1. Python Ecosystem
        Core Python and Its Versatility
        Deep Learning and Machine Learning Frameworks
        Specialized Libraries for Speech and Audio Processing
        Integration and Application
    2. SQL Database Management
        Role and Benefits of SQL
        Database Design and Query Optimization
    3. Git & GitHub
        Git – The Distributed Version Control System
        GitHub – Cloud-Based Collaboration Platform
        Integration in the Project Workflow
        Collaborative Best Practices
    4. Large Language Models (LLMs)
        Key Characteristics and Applications
        Fine-Tuning for Domain-Specific Use
    5. Retrieval-Augmented Generation (RAG)
        Concept and Functionality
        Benefits and Use Cases
    6. LangChain
        Concept and Functionality
        Modular Design and Components
        Integration in the Journey Planning System
FUTURE SCOPE
CONCLUSION
REFERENCES
ABSTRACT

This internship report details the comprehensive design, development, and evaluation of
innovative AI-driven systems addressing significant challenges in urban mobility and speech
processing. The first system is an intelligent transport information platform that integrates
real-time data from multiple transport APIs with a robust SQL-based database. By leveraging
advanced techniques such as Retrieval-Augmented Generation (RAG) and fine-tuned Large
Language Models (LLMs), this platform delivers dynamic transit information, accurate route
recommendations, and timely updates on schedules, fares, and disruptions, ultimately enhancing
commuter experience and accessibility.

The second system focuses on advanced speaker diarization aimed at improving transcription
accuracy in multi-speaker environments. Utilizing state-of-the-art speech recognition models
including OpenAI’s Whisper and speaker embedding techniques from SpeechBrain, this solution
employs sophisticated clustering algorithms like K-Means alongside robust audio processing
tools (Librosa, PyDub, and SciPy) to effectively differentiate between overlapping voices.
Detailed processes such as feature extraction, noise reduction, and voice activity detection ensure
the generation of clear and structured transcripts, which are crucial for applications like meeting
transcriptions, interviews, and broadcast media.

Throughout this report, the integration of cutting-edge technologies is emphasized. Python serves
as the backbone for backend development, supported by its rich ecosystem of libraries such as
TensorFlow and PyTorch for machine learning and deep learning applications. Additionally, the
report touches on the importance of modern development tools and practices, including version
control with Git and GitHub, which streamline collaborative software engineering efforts. The
synthesis of these technologies not only enhances operational efficiency and accuracy but also
demonstrates the transformative impact of AI-driven solutions in addressing real-world
challenges in both transport automation and speech processing.

Overall, this comprehensive analysis underscores the significant advancements achieved during
the internship, providing valuable insights into the deployment of scalable, real-time AI
applications that are poised to redefine modern urban mobility and communication systems.
TOOLS & TECHNOLOGIES

1. Python Ecosystem
Python stands as the backbone of modern AI and software development, thanks to its readability,
extensive libraries, and a supportive community. In this project, Python is not only used as a
programming language but also as an ecosystem enriched by various specialized libraries and
frameworks that streamline the development process. Below is an overview of these key
components:

Core Python and Its Versatility

Python’s simplicity and versatility make it the preferred language for rapid prototyping and
production-level applications alike. Its clear syntax and dynamic typing allow developers to
write efficient code quickly while maintaining readability. Moreover, Python's extensive standard
library supports many basic operations without the need for external dependencies, which is
especially beneficial in data processing and integration tasks.

Deep Learning and Machine Learning Frameworks

●​ TensorFlow:
TensorFlow is an open-source deep learning framework developed by Google. It offers
an extensive range of tools for building and training machine learning models. In our
context, TensorFlow is used to design neural networks that underpin complex AI
functionalities such as image recognition and predictive analytics. Its flexible architecture
enables deployment across various platforms, from desktops to mobile devices.

●​ PyTorch:
Developed by Facebook’s AI Research lab, PyTorch is another leading deep learning
framework known for its dynamic computation graph, which is particularly advantageous
during model experimentation and debugging. PyTorch’s intuitive interface and robust
community support have made it a popular choice for research and production alike. In
this project, PyTorch is leveraged for tasks requiring rapid model iteration and seamless
integration with custom machine learning workflows.
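
As an illustration of the dynamic-graph workflow mentioned above, here is a minimal PyTorch sketch of a small classifier and a single training step; the architecture, layer sizes, and data are placeholders, since the report does not specify the actual models used in the project.

    import torch
    import torch.nn as nn

    class SmallClassifier(nn.Module):
        """Toy feed-forward network; layer sizes are illustrative only."""
        def __init__(self, in_dim=40, hidden=64, n_classes=2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, n_classes),
            )

        def forward(self, x):
            return self.net(x)

    model = SmallClassifier()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # One training step on random tensors, just to show the define-by-run loop.
    x = torch.randn(8, 40)
    y = torch.randint(0, 2, (8,))
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"loss after one step: {loss.item():.4f}")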

Specialized Libraries for Speech and Audio Processing

●​ OpenAI Whisper:
OpenAI Whisper is a state-of-the-art automatic speech recognition (ASR) model that
converts spoken language into text with high accuracy. Its robust design allows it to
handle diverse accents and noisy backgrounds, making it ideal for applications like
speaker diarization. Whisper plays a central role in transcribing audio data accurately,
which is critical for downstream natural language processing tasks.

●​ SpeechBrain:
SpeechBrain is an open-source toolkit designed for various speech processing tasks,
including speaker recognition, speech enhancement, and more. It offers pre-trained
models and flexible modules that simplify the process of integrating advanced audio
processing functionalities. In our system, SpeechBrain supports the extraction of speaker
embeddings, which are then used in clustering algorithms to differentiate between voices.

●	Librosa, PyDub, and SciPy:
These libraries collectively empower developers to perform detailed audio signal
processing. Librosa is widely used for music and audio analysis, offering a range of tools
for feature extraction such as Mel-frequency cepstral coefficients (MFCCs). PyDub
simplifies the handling of audio file formats and conversions, while SciPy provides
essential scientific computing tools to perform signal processing tasks like filtering and
Fourier analysis. Together, they create a robust pipeline for preparing audio data for
analysis and model training.
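
A minimal sketch of how these three libraries fit together in the pre-processing stage is shown below; the file names, sampling rate, and filter settings are illustrative assumptions rather than the project's exact configuration.

    import librosa
    from pydub import AudioSegment
    from scipy.signal import butter, filtfilt

    # Convert an arbitrary input recording to 16 kHz mono WAV with PyDub.
    AudioSegment.from_file("meeting.mp3").set_frame_rate(16000).set_channels(1) \
        .export("meeting.wav", format="wav")

    # Load the audio with Librosa.
    y, sr = librosa.load("meeting.wav", sr=16000)

    # Simple SciPy high-pass filter to suppress low-frequency hum.
    b, a = butter(4, 80, btype="highpass", fs=sr)
    y_clean = filtfilt(b, a, y)

    # MFCC features of the cleaned signal, used downstream for speaker
    # embedding extraction and clustering.
    mfccs = librosa.feature.mfcc(y=y_clean, sr=sr, n_mfcc=13)
    print(mfccs.shape)  # (13, number_of_frames)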

Integration and Application

The Python ecosystem's modularity allows for seamless integration of these libraries into a
unified framework. By combining TensorFlow and PyTorch for model training with specialized
audio processing libraries, developers can build sophisticated AI-driven applications. This
ecosystem not only supports rapid prototyping but also ensures that systems are scalable and
maintainable in production environments. The flexibility to switch or integrate additional
libraries as project requirements evolve is one of Python’s strongest assets.

The overall strategy is to utilize Python as the glue that binds various AI components
together—from data ingestion and preprocessing to model training and deployment. This
approach ensures that the system remains agile, allowing for continuous improvements and
updates as new tools and techniques emerge in the field.
2. SQL Database Management
Structured Query Language (SQL) is critical for managing and retrieving organized data
efficiently. In our project, SQL is used to maintain a robust, relational database that stores
historical transit information, real-time updates, and user query logs.

Role and Benefits of SQL

SQL databases provide a systematic and efficient way to store data in structured tables, which
can be easily queried using standardized commands. This structure ensures data integrity,
reduces redundancy, and enables rapid access to large datasets—essential for applications
requiring real-time responses. The SQL-based system supports efficient data retrieval, which is
vital when the AI-driven models need to pull contextual information for tasks such as route
planning and schedule updates.

Database Design and Query Optimization

Proper database design is crucial to handle the large volumes of data generated by transport APIs
and user interactions. The design involves creating well-indexed tables that support fast queries
and ensure scalability. Query optimization techniques, such as caching frequently accessed data
and using stored procedures, are implemented to minimize latency and improve the overall
responsiveness of the system.
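
The sketch below illustrates the idea of an indexed schedule table and a parameterized lookup, using Python's built-in sqlite3 module; the table and column names are hypothetical, as the report does not list the actual schema.

    import sqlite3

    conn = sqlite3.connect("transit.db")
    cur = conn.cursor()

    # Hypothetical schedule table for route/stop departures and fares.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS schedules (
            id INTEGER PRIMARY KEY,
            route_id TEXT NOT NULL,
            stop_name TEXT NOT NULL,
            departure_time TEXT NOT NULL,
            fare REAL
        )
    """)

    # Composite index on the columns used by the most frequent lookup.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS idx_route_stop
        ON schedules (route_id, stop_name)
    """)

    # Parameterized query: next five departures for a given route and stop.
    cur.execute(
        "SELECT departure_time, fare FROM schedules "
        "WHERE route_id = ? AND stop_name = ? "
        "ORDER BY departure_time LIMIT 5",
        ("BLUE", "Dwarka Sector 21"),
    )
    print(cur.fetchall())
    conn.close()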

3. Git & GitHub
Version control is an indispensable part of modern software engineering, and Git combined with
GitHub forms the cornerstone of collaborative development practices in this project.

Git – The Distributed Version Control System

Git is a powerful distributed version control system that allows multiple developers to work on
the same codebase simultaneously. It tracks changes meticulously, enabling developers to revert
to previous versions when necessary, and supports branching and merging to facilitate parallel
development. Git’s ability to handle distributed workflows makes it ideal for both individual
developers and large teams.

GitHub – Cloud-Based Collaboration Platform

GitHub builds on Git’s functionalities by providing a web-based interface for repository hosting,
issue tracking, and collaborative code reviews. It facilitates seamless integration with continuous
integration and deployment (CI/CD) pipelines, ensuring that code changes are automatically
tested and deployed. GitHub’s pull request mechanism allows team members to propose changes
and review code collaboratively, fostering an environment of code quality and collective
ownership.

Integration in the Project Workflow

In this project, Git and GitHub are used extensively to manage the entire lifecycle of code
development—from initial prototyping to final deployment. Each feature or bug fix is developed
in isolated branches, which are then reviewed and merged into the main branch only after
rigorous testing. This practice not only helps in maintaining code integrity but also makes it
easier to track progress and pinpoint issues. The transparency and accountability provided by
GitHub’s issue tracking and commit history play a crucial role in ensuring that the project
remains on schedule and meets quality standards.

Collaborative Best Practices

The use of Git and GitHub promotes several best practices such as code reviews, regular
commits, and detailed documentation of changes. These practices are essential for maintaining
high-quality code, especially in complex projects involving multiple technologies. Additionally,
GitHub’s collaboration tools make it easier for new team members to onboard and understand the
project structure quickly.

4. Large Language Models (LLMs)
Large Language Models (LLMs) are sophisticated AI models designed to understand, generate,
and translate human-like text. In our project, LLMs serve as the backbone for natural language
processing tasks, including query understanding and dynamic response generation.

Key Characteristics and Applications

LLMs are trained on vast amounts of textual data, which enables them to generate coherent and
contextually relevant responses. They are capable of performing a wide range of tasks—from
answering questions and summarizing content to facilitating conversational interactions in
chatbots. Their adaptability makes them ideal for applications in customer service, content
generation, and real-time query processing.

Fine-Tuning for Domain-Specific Use

To enhance performance in specific domains, LLMs can be fine-tuned on specialized datasets. This customization improves their accuracy and contextual understanding, enabling them to handle industry-specific jargon and scenarios. In our project, fine-tuning ensures that the model not only understands the nuances of urban mobility and transit information but also provides precise, context-aware responses that improve user experience.
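
As a rough illustration of what such fine-tuning can look like in code, the sketch below trains a small causal language model on a handful of transit-domain question-answer strings using the Hugging Face transformers and datasets libraries; the choice of libraries, base model, and data format are assumptions made for illustration, not the project's actual fine-tuning setup.

    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    # Tiny placeholder corpus of transit-domain Q&A strings.
    samples = [
        {"text": "Q: How frequent are Blue Line trains at peak hours? "
                 "A: Roughly every 5 minutes."},
        {"text": "Q: Which line serves Dwarka Sector 21? "
                 "A: The Blue Line and the Airport Express."},
    ]

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def tokenize(example):
        return tokenizer(example["text"], truncation=True, max_length=128)

    dataset = Dataset.from_list(samples).map(tokenize, remove_columns=["text"])
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="transit-finetune",
                               num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=dataset,
        data_collator=collator,
    )
    trainer.train()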

5. Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is an advanced framework that enhances the
performance of LLMs by integrating external knowledge sources during response generation.

Concept and Functionality

RAG combines traditional language modeling with real-time data retrieval techniques. When a
query is made, the model searches through external databases and knowledge repositories to
fetch relevant information before generating a final response. This hybrid approach minimizes
the risk of hallucination (i.e., generating factually incorrect information) and significantly
improves the accuracy and reliability of AI responses.
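
To make the retrieve-then-generate idea concrete, here is a deliberately simplified sketch that uses TF-IDF similarity for retrieval and only assembles the grounded prompt; in the actual platform the retrieval layer is backed by the transit database and the prompt is passed to an LLM. The example documents are invented.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = [
        "Blue Line trains run every 5 minutes during peak hours.",
        "The fare from Dwarka Sector 21 to Rajiv Chowk is Rs 50.",
        "Airport Express services are suspended this Sunday for maintenance.",
    ]

    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(documents)

    def retrieve(query, k=2):
        """Return the k documents most similar to the query."""
        scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
        top = scores.argsort()[::-1][:k]
        return [documents[i] for i in top]

    def build_prompt(query):
        context = "\n".join(retrieve(query))
        # In the real pipeline this prompt is sent to the LLM for generation.
        return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

    print(build_prompt("How much is the fare from Dwarka to Rajiv Chowk?"))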

Benefits and Use Cases

The primary benefit of RAG is its ability to provide contextually rich and factually correct
information by grounding AI-generated content in external data. This is particularly valuable in
applications where precision is crucial, such as real-time transit information and customer
support systems. By leveraging RAG, our system ensures that users receive timely, accurate, and
comprehensive responses to their queries, thereby enhancing overall operational efficiency.

6. LangChain
LangChain is an open-source framework for building applications powered by Large Language Models (LLMs), used in this project to orchestrate retrieval, prompting, memory, and tool invocation.

Concept and Functionality

LangChain is a powerful open-source framework designed to simplify the development of applications powered by Large Language Models (LLMs). It abstracts away the complexity of integrating LLMs with various data sources, tools, and user interfaces by providing modular components for chaining together language model operations, memory, prompts, and retrieval systems. In this project, LangChain is used to create structured workflows that combine document retrieval, context-aware generation, and external tool invocation, streamlining the development of intelligent assistants and planners.

Modular Design and Components

LangChain’s architecture is based on the concept of “chains”—pipelines composed of various interconnected modules:

●​ Prompt Templates: Predefined and dynamically formatted prompts to maintain consistent input
for the LLMs.​

●​ LLM Wrappers: Unified interfaces to interact with models like OpenAI GPT, Cohere, or
Anthropic.​

●​ Retrievers: Components that pull relevant chunks from external documents or vector stores.​

●​ Tools and Agents: Used to execute actions like search, computation, or API calling based on
LLM decisions.​

In our system, LangChain serves as the backbone for orchestrating a complex flow where a user
query is parsed, relevant transport or document data is retrieved, and a response is generated using
an LLM enhanced by contextual information.
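
A condensed sketch of such a chain is shown below, written in the LCEL (pipe) style; exact imports and package names vary between LangChain releases, and the retriever here is a stub standing in for the SQL/API-backed lookup.

    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI  # assumes the langchain-openai package

    prompt = ChatPromptTemplate.from_template(
        "Answer using only the context below.\n"
        "Context: {context}\n"
        "Question: {question}"
    )
    llm = ChatOpenAI(model="gpt-4o-mini")  # model name is a placeholder

    def fetch_transit_context(question: str) -> str:
        """Stub retriever; the real system queries the SQL backend and transport APIs."""
        return "The Blue Line runs from Dwarka Sector 21 to Noida Electronic City."

    chain = prompt | llm | StrOutputParser()

    question = "Which metro line serves Dwarka Sector 21?"
    print(chain.invoke({"context": fetch_transit_context(question),
                        "question": question}))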

Integration in the Journey Planning System

LangChain plays a pivotal role in enhancing the retrieval-augmented generation (RAG) pipeline:

●​ It connects the retriever (which fetches route, schedule, and fare data from a SQL-based
backend or external API) with the LLM, ensuring grounded, real-time answers.​

●​ Using memory components, the system retains previous user queries to offer conversational
continuity and context-aware suggestions.​

●​ Through agent-based architectures, the assistant can trigger external APIs or databases
conditionally—e.g., fetching live metro status or travel time predictions based on user input.​

Overall, LangChain acts as a middleware layer that enhances the flexibility, reusability, and
maintainability of LLM-powered systems, making it an indispensable tool in the development of
intelligent transport assistants and document analysis solutions.
FUTURE SCOPE

●	Enhanced Real-Time Data Integration: The current system can be expanded by integrating additional real-time data sources, such as traffic patterns, weather conditions, and public transport schedules. This would enhance the accuracy and reliability of urban mobility solutions.

●	Improved Speech Diarization Models: Future iterations can incorporate more advanced deep learning models to enhance the accuracy of speaker identification and separation, even in noisy environments. This would make the speech processing system more robust and effective for real-world applications.

●	Multilingual Speech Processing: Expanding the system to support multiple languages would make it more inclusive and globally applicable. Incorporating multilingual LLMs and fine-tuning RAG models for language-specific contexts could significantly broaden its usability.

●	Predictive Analytics for Urban Mobility: By incorporating machine learning algorithms, the mobility solution could predict future transit delays, traffic congestion, and optimal routes. This would improve the accuracy of travel recommendations.

●	Automated Error Correction in Speech Transcription: Introducing post-processing techniques using NLP models can help detect and correct transcription errors automatically, enhancing the overall accuracy of the speech processing system.

●	Integration with IoT Devices: The system could be integrated with IoT-based mobility devices (e.g., GPS trackers, smart traffic signals) to provide more accurate and real-time updates, further enhancing urban mobility solutions.
CONCLUSION

The internship provided a unique opportunity to integrate a diverse range of technologies—from the dynamic Python ecosystem and robust SQL databases to collaborative tools like Git & GitHub—with advanced AI methodologies such as Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). This hands-on experience allowed for the development of systems that deliver real-time, context-aware transit updates and precise, structured audio transcriptions, directly addressing the challenges of urban mobility and speech processing.

Throughout the internship, practical challenges were met with innovative solutions, illustrating
how modern software engineering practices can drive tangible improvements. The integration of
various technologies not only enhanced operational efficiency but also ensured scalability and
adaptability in real-world applications. This period of intensive learning and development
reinforced the value of cross-disciplinary collaboration, where theoretical knowledge was
effectively translated into practical, deployable systems.

The experience gained during this internship lays a robust foundation for future advancements.
The methodologies refined during this project—ranging from fine-tuning deep learning models
to managing complex data workflows—will serve as a cornerstone for further innovation in
AI-driven applications. As urban and communication needs continue to evolve, the skills and
insights developed here will be instrumental in driving continuous improvement and
transformative change in the field.
REFERENCES

1.	Python Documentation | https://docs.python.org/
2.	Large Language Models | https://www.ibm.com/think/topics/large-language-models/
3.	Git Documentation | https://git-scm.com/docs/git
4.	GitHub Docs | About Git and GitHub | https://docs.github.com/en/get-started/start-your-journey/about-github-and-git
5.	Structured Query Language (MySQL Documentation) | https://dev.mysql.com/doc/
6.	LangChain | https://www.langchain.com/
7.	Retrieval-Augmented Generation | https://research.ibm.com/blog/retrieval-augmented-generation-RA/
