
VisionFitTrack: AI Powered Real-Time Workout

Detection & Reps Counter


A project report submitted in partial fulfilment of the requirements for the degree of

MASTER OF COMPUTER APPLICATION (MCA)


OF
TEZPUR UNIVERSITY

2025

Submitted by

ARPAN NEOG (CSM23044)

Guided by
SOMIR SAIKIA
Head of R&D Team,
Vantage Circle
Guwahati, Assam

&
Internal Guide
Dr. TRIBIKRAM PRADHAN
Assistant Professor,
Department of Computer Science & Engineering
Tezpur University,
Assam

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


TEZPUR UNIVERSITY
TEZPUR — 784028
ASSAM
Department of Computer Science & Engineering, Tezpur University

Certificate by the Head of the Department

The project entitled “VisionFitTrack: AI Powered Real Time Workout Detection
and Reps Count”, submitted by Arpan Neog (CSM23044) in partial fulfilment of the
requirements for the Major Project of Master of Computer Application (MCA) at Tezpur
University, has been examined. He has undertaken the internship program at Vantage Circle,
Guwahati, under the supervision of Mr. Somir Saikia (Head of Research).

Date:
Place: Tezpur, Assam

Dr. Sarat Saharia
Professor & Head
Department of CSE
Tezpur University

i
Department of Computer Science & Engineering, Tezpur University

Certificate by Internal Guide

This is to certify that the project entitled “VisionFitTrack: AI Powered Real Time
Workout Detection and Reps Count” submitted by Arpan Neog (CSM23044) is carried
out by him under my supervision and guidance for partial fulfillment of the requirements and the
regulations for the award of the degree Master of Computer Application (MCA) during session 2023-
2025 at Tezpur University. To the best of my knowledge, the matter embodied in the project
report has not been submitted to any other university/institute for the award of any Degree or
Diploma.

Date:
Place: Tezpur, Assam

Dr. Tribikram Pradhan
Assistant Professor
Department of CSE
Tezpur University

ii
Department of Computer Science & Engineering, Tezpur University

Certificate by Internal Examiner

This is to certify that the project entitled “VisionFitTrack: AI Powered Real Time
Workout Detection and Reps Count”, submitted by Arpan Neog (CSM23044) to
Tezpur University in partial fulfillment of the requirements for the major project of
Master of Computer Application (MCA), is a bona fide record of the project work
carried out by him during the spring semester.

Date:
Place: Tezpur, Assam Internal Examiner

iii
Department of Computer Science & Engineering, Tezpur University

Certificate by External Examiner

This is to certify that the project entitled “VisionFitTrack: AI Powered Real Time
Workout Detection and Reps Count”, submitted by Arpan Neog (CSM23044) to
Tezpur University in partial fulfillment of the requirements for the major project of
Master of Computer Application (MCA), is a bona fide record of the project work
carried out by him during the spring semester.

Date:
Place: Tezpur, Assam External Examiner

iv
Department of Computer Science & Engineering, Tezpur University

Declaration

I affirm that the project work entitled “VisionFitTrack: AI Powered Real Time
Workout Detection and Reps Count”, submitted to the Department of Computer
Science & Engineering at Tezpur University, was authored solely by me and has not been
presented to any other institution for the purpose of obtaining any other degree.

Place : Tezpur Arpan Neog


Date : (CSM23044)

v
About the Company

Vantage Circle was established around 2010–2011 as a bootstrapped startup in Guwahati,


Assam, by Partha Neog, Anjan Pathak, and Subhash Basumatary. Drawing on their
experiences in large firms, the founders set out with a vision to transform workplace culture
and enhance employee engagement. The company initially launched as a platform offering
employee perks and discounts but quickly evolved into a comprehensive, cloud-based
employee engagement solution.
Over the years, Vantage Circle secured prominent clients such as Infosys, Wipro, Deloitte,
and JP Morgan, establishing itself as a leading player in India’s HR technology space. By
2017, the company achieved profitability without external venture capital funding, relying
solely on initial seed investments from friends and family. Notable milestones include being
featured in Forbes India’s “Nine Startups with Promise” in 2018 and securing a $20 million
contract in 2019 to support a U.S.-based enterprise with approximately 90,000 employees.
As of 2023, Vantage Circle serves over 700 organizations across more than 70 countries,
positively impacting the workplace experience of around 1.8 million employees. The
company has expanded its global footprint with new offices, including one in Calgary,
Canada, and is planning further expansion into the Latin American market.

Key Products:
 Vantage Rewards – Peer & manager recognition platform

 Vantage Perks – Corporate deals & cashback offers

 Vantage Pulse – Employee feedback & sentiment tool

 Vantage Fit – Wellness platform with fitness challenges & AI tracking

vi
Acknowledgement

At the outset, I express my sincere gratitude to Mr. Partha Neog, CEO of Vantage Circle,
for providing me with the opportunity to intern at such an innovative and forward-thinking
organization. His visionary leadership and the company’s commitment to cutting-edge
research in AI and wellness technology created an enriching environment for learning and
growth.

I am especially thankful to Mr. Somir Saikia, Head of Research at Vantage Circle, for his
invaluable mentorship, technical guidance, and consistent encouragement throughout the
internship. His insightful feedback and strategic direction played a key role in shaping the
objectives and execution of my project. I would also like to thank the entire research and
development team at Vantage Circle for their support, collaboration, and willingness to share
knowledge. Working alongside such experienced professionals has been a rewarding
experience that has significantly contributed to my academic and professional development.

Lastly, I extend my heartfelt thanks to Dr. Tribikram Pradhan, Assistant Professor,


Department of Computer Science and Engineering, Tezpur University, for being my project
guide and for his constant support and supervision during this internship. I am also grateful to
Dr. Sarat Saharia, Head of the Department, for facilitating this internship as part of my
MCA final semester project. Their guidance has been instrumental in the successful
completion of this work.

Arpan Neog (CSM23044)

vii
Abstract

VisionFitTrack is an innovative solution designed to bring artificial intelligence and


computer vision to personal fitness tracking. It addresses the challenges faced by
individuals in monitoring their exercise routines and progress, especially for those
seeking accurate, real-time feedback without expensive equipment or wearables. The
system leverages advanced pose estimation and machine learning models to detect,
classify, and count exercise repetitions using a standard webcam, making fitness
analytics accessible to a broad audience. VisionFitTrack features a user-friendly web
interface, interactive progress visualization, and privacy-focused design, with all video
processing performed locally in the browser. The solution was developed with
continuous user feedback, ensuring adaptability and ease of use for people of varying
technical backgrounds. The platform supports multiple exercise types, including push-
ups, pull-ups, bicep curls, shoulder presses, squats, and deadlifts, and provides
automatic tracking of repetitions, sets, and workout duration. VisionFitTrack’s modular
architecture allows for easy integration into diverse environments, from home workouts
to gym settings, and its robust backend ensures secure data management and
personalized user experiences. The project represents a significant step forward in
democratizing fitness technology, offering a scalable and effective tool for anyone
aiming to improve their health and performance through data-driven insights.

viii
Contents

Cover

Certificate by the Head of the Department i
Certificate by Internal Guide ii
Certificate by Internal Examiner iii
Certificate by External Examiner iv
Declaration v
About the Company vi
Acknowledgement vii
Abstract viii
Contents ix
1. Introduction 1
1.1 Background 1
1.2 Task 2
1.3 Challenges 2
1.4 Approach Used 3
2. Initial System Study 6
2.1 Components 6
2.1.1 Pose Estimation (MediaPipe Pose) 6
2.1.2 Exercise Classification Model (TensorFlow.js) 6
2.1.3 Rule-Based Logic and State Machines 6
2.1.4 Progress Tracking and Visualization 7
2.1.5 User Interface and Experience 7
2.1.6 Privacy and Local Processing 7
2.1.7 User Authentication and Data Management 7
2.1.8 Extensibility and Modularity 8
3. Feasibility Analysis 9
3.1 Technical Feasibility 9
3.2 Economical Feasibility 10
3.3 Data and Model Availability 10
3.4 Use Cases 10
4. System Analysis 12
4.1 Data Collection and Annotation 12
4.2 Pose Estimation Module 12
4.3 Exercise Classification Model 13
4.4 Rule-Based Logic and State Machines 13
4.5 Progress Tracking and Visualization 13
4.6 Web Application Frontend 14
4.7 User Authentication and Data Management 14
5. Software Requirement Specifications 15
5.1 General Description 15
5.1.1 Product Perspective 15
5.1.2 Product Functions 16

xi
Chapter 1

Introduction

1.1 Background

Fitness tracking and exercise monitoring have become increasingly important as more
individuals seek to improve their health and performance through data-driven insights.
Traditional fitness tracking solutions often rely on wearable devices or manual input, which can
be inconvenient, expensive, or inaccessible to many users. Wearables may require regular
charging, can be uncomfortable during certain exercises, and often come with privacy concerns
due to data being sent to external servers. Manual tracking, on the other hand, is prone to human
error and can disrupt the flow of a workout.

VisionFitTrack addresses these limitations by leveraging advances in computer vision and


artificial intelligence to provide a seamless, camera-based fitness tracking experience. By using
only a standard webcam, VisionFitTrack eliminates the need for additional hardware, making
fitness analytics accessible to a wider audience. The project builds on the latest research in pose
estimation and exercise recognition, offering a privacy-friendly, browser-based solution that
processes all video data locally. This ensures that sensitive user information never leaves their
device, addressing major concern in digital health applications. The system is designed to be
intuitive and user-friendly, catering to both fitness enthusiasts and beginners, and is adaptable for
use in homes, gyms, and rehabilitation centers.

1
1.2 Task

The primary task of VisionFitTrack is to detect, classify, and count exercise repetitions in
real time using video input from a webcam. The system is designed to recognize multiple
exercise types—such as push-ups, pull-ups, bicep curls, shoulder presses, squats, and deadlifts—
by analyzing the user’s body pose and movement patterns. The application must provide accurate
feedback, track workout statistics, and visualize progress, all while ensuring user privacy and
ease of use.

The input to the system is a live video stream, which is processed frame by frame to extract key
body points using pose estimation. These keypoints are then analyzed to determine the type of
exercise being performed, the current phase of the movement, and whether a valid repetition has
occurred. The output includes:
- Detected exercise type (e.g., push-up, squat)
- Repetition count for each exercise
- Set count and workout duration
- Real-time feedback on form and progress
- Visual progress charts and workout summaries
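
The outputs listed above can be grouped into a single per-frame record; a minimal sketch in Python (the field names here are illustrative assumptions, not the project's actual data model, which lives in the browser):

```python
from dataclasses import dataclass

@dataclass
class FrameResult:
    """Hypothetical container for the outputs listed above; the real
    application keeps this state client-side rather than in Python."""
    exercise: str            # detected exercise type, e.g. "push-up"
    reps: int = 0            # repetition count for the current exercise
    sets: int = 0            # completed set count
    duration_s: float = 0.0  # elapsed workout time in seconds
    feedback: str = ""       # real-time form/progress message

result = FrameResult(exercise="squat", reps=12, sets=2,
                     duration_s=95.0, feedback="Good depth")
print(result)
```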

The system is also designed to support user authentication, allowing individuals to save their
workout history and track long-term progress. The user interface provides clear instructions, real-
time statistics, and interactive charts to enhance user engagement and motivation.

1.3 Challenges

Several challenges arise in developing a robust, real-time fitness tracking system


using computer vision:
• Variability in user appearance, clothing, lighting, and background can affect pose detection
accuracy. The system must be robust to different body types, camera angles, and
environmental conditions.
• Real-time processing requires efficient algorithms to ensure smooth user experience
without lag. The application must balance computational complexity with responsiveness,
especially on devices with limited processing power.
• Distinguishing between similar exercises (e.g., push-ups vs. planks, squats vs. deadlifts)
demands precise analysis of joint angles, movement cycles, and temporal patterns.
Misclassification can lead to inaccurate feedback and user frustration.
• Ensuring privacy by processing all video data locally, without transmitting sensitive
information to external servers. This requires careful optimization of browser-based
machine learning models and pose estimation pipelines.
• Providing a user interface that is intuitive for users of all technical backgrounds, including
those new to fitness technology. The system must offer clear instructions, easy navigation,
and accessible visualizations.
• Handling edge cases such as partial occlusion, fast movements, or users stepping out of the
camera frame, which can disrupt pose detection and rep counting.
• Supporting extensibility for new exercises, custom workout routines, and integration with
other health platforms.

1.4 Approach Used

VisionFitTrack employs a hybrid approach combining rule-based logic and machine


learning for exercise detection and repetition counting:
• MediaPipe Pose is used to extract 17 key body points from the webcam video stream in
real time, providing a robust foundation for pose analysis. The pose estimation model is
optimized for speed and accuracy, enabling smooth tracking even on consumer-grade
hardware.
• A TensorFlow.js model, trained on labeled exercise data, classifies the user’s current
exercise based on normalized keypoint coordinates. The model outputs probabilities for
each supported exercise class, allowing the system to handle ambiguous or transitional
movements.

3
• Rule-based algorithms analyze joint angles (e.g., elbow, knee, hip) and movement
direction to refine exercise classification and accurately count repetitions and sets. State
machines are used to track the phases of each exercise, applying hysteresis and smoothing to
reduce false positives.
• All processing is performed in the browser, ensuring privacy and responsiveness. The
system does not require any server-side video processing, making it suitable for privacy-
conscious users and environments with limited internet connectivity.
• The modular architecture allows for easy extension to new exercises and integration with
additional features, such as progress visualization, user authentication, and personalized
feedback. The codebase is organized into clear modules for pose detection, exercise logic,
progress tracking, and user interface.
• Continuous user feedback was incorporated during development, allowing the system to be
refined for usability, accuracy, and robustness. The application is designed to be accessible
to users with varying levels of fitness and technical expertise.
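
The joint-angle analysis in the rule-based step can be illustrated with a small, self-contained function (a sketch only; the production logic runs in the browser in JavaScript):

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b, in degrees, formed by points a-b-c.

    Each point is an (x, y) keypoint, e.g. shoulder-elbow-wrist for the
    elbow angle used in push-up and bicep-curl tracking.
    """
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang

# A straight arm gives 180 degrees; a fully bent arm approaches 0.
print(joint_angle((0, 0), (1, 0), (2, 0)))   # collinear points -> 180.0
print(joint_angle((0, 0), (1, 0), (1, 1)))   # right angle -> 90.0
```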

This approach combines the strengths of modern machine learning with interpretable, rule-
based logic, resulting in a system that is both accurate and adaptable to real-world
conditions. VisionFitTrack sets a foundation for future research and development in AI-
powered fitness tracking, with potential applications in personal training, rehabilitation, and
health monitoring.

5
Chapter 2

Initial System Study

VisionFitTrack integrates several advanced technologies and components to deliver a


robust, real-time fitness tracking experience. This section provides an overview of the main
components and their roles in the system.

2.1 Components

2.1.1 Pose Estimation (MediaPipe Pose)

Pose estimation is the foundation of VisionFitTrack. MediaPipe Pose is used to extract 17


key body points (joints) from the user's webcam video stream in real time. These keypoints
include shoulders, elbows, wrists, hips, knees, ankles, and more. MediaPipe Pose is optimized
for speed and accuracy, making it suitable for use on consumer-grade hardware and in diverse
environments with varying lighting and backgrounds.

2.1.2 Exercise Classification Model (TensorFlow.js)

A machine learning model built with TensorFlow.js is used to classify the user's current
exercise based on the normalized coordinates of the detected keypoints. The model is trained on
labeled exercise data and outputs probabilities for each supported exercise class (e.g., push-up,
squat, deadlift). Running the model in the browser ensures privacy and enables real-time
inference without the need for server-side processing.

2.1.3 Rule-Based Logic and State Machines


In addition to the ML model, VisionFitTrack uses rule-based algorithms to analyze joint
angles and movement patterns. State machines track the phases of each exercise, applying logic
to determine when a valid repetition or set has occurred. Hysteresis and smoothing techniques
are used to reduce false positives and ensure stable, accurate rep counting.

2.1.4 Progress Tracking and Visualization

Workout data, including repetitions, sets, and duration, is stored and visualized for the
user. Chart.js is used to render interactive progress charts, allowing users to monitor their
performance and improvements over time. This visual feedback motivates users and helps them
set and achieve fitness goals.

2.1.5 User Interface and Experience

The frontend is built with HTML5, CSS3, JavaScript, and Bootstrap 5, providing a
responsive and user-friendly interface. The UI displays real-time feedback, workout statistics,
and progress charts. Clear instructions and intuitive controls make the system accessible to users
of all fitness and technical backgrounds.

2.1.6 Privacy and Local Processing

All video and pose data are processed locally in the user's browser. No video data is sent
to external servers, ensuring user privacy and data security. This approach also reduces latency
and makes the system usable even with limited or no internet connectivity.

2.1.7 User Authentication and Data Management

The backend, built with Flask and SQLite, manages user authentication and stores
workout history. Users can register, log in, and view their personalized progress. The database
schema supports multiple users, each with their own secure workout records.
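
A backend along these lines might define its tables as follows. This is an illustrative sketch using Python's built-in sqlite3 module; the table and column names are assumptions, not the project's actual schema:

```python
import sqlite3

# In-memory database for demonstration; the real app uses a file-backed DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL
);
CREATE TABLE workouts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    exercise TEXT NOT NULL,
    reps INTEGER NOT NULL,
    sets INTEGER NOT NULL,
    duration_s REAL NOT NULL,
    performed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
""")
conn.execute("INSERT INTO users (username, password_hash) VALUES (?, ?)",
             ("demo", "not-a-real-hash"))
conn.execute(
    "INSERT INTO workouts (user_id, exercise, reps, sets, duration_s) "
    "VALUES (1, 'push-up', 20, 2, 120.0)")
row = conn.execute(
    "SELECT exercise, reps FROM workouts WHERE user_id = 1").fetchone()
print(row)  # -> ('push-up', 20)
```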

7
2.1.8 Extensibility and Modularity

VisionFitTrack is designed with a modular architecture, making it easy to add new


exercises, features, or integrations. The codebase is organized into clear modules for pose
detection, exercise logic, progress tracking, and user interface, supporting future research
and development.

8
Chapter 3

Feasibility Analysis

The implementation of VisionFitTrack as a browser-based, AI-powered fitness tracking


system requires careful consideration of technical, economic, and practical factors. This section
analyzes the feasibility of the system from multiple perspectives to ensure its viability and
sustainability.

3.1 Technical Feasibility

VisionFitTrack leverages modern web technologies and efficient machine learning


models to deliver real-time exercise detection and tracking. The technical requirements for
running the application are minimal and widely met by most consumer devices:
• A standard webcam for video input (integrated or external)
• A device with a modern web browser (Chrome, Firefox, Edge, Safari)
• Sufficient CPU for real-time pose estimation and inference (no dedicated GPU required, but
performance improves with better hardware)
• Screen width of at least 300 pixels for optimal UI display
• An internet connection is required only for initial loading and user authentication; all video
processing is performed locally

Most laptops, desktops, and even some tablets and smartphones meet these requirements, making
VisionFitTrack technically feasible for a broad user base. The use of browser-based technologies
ensures cross-platform compatibility and ease of deployment without the need for software
installation.

9
3.2 Economical Feasibility

VisionFitTrack is designed to be cost-effective for both users and implementers:


• No need for specialized hardware or wearables; only a webcam and a modern browser are
required
• The application is open-source and can be deployed on standard web servers, reducing
licensing and infrastructure costs
• Users do not incur additional expenses for subscriptions or proprietary devices
• Maintenance and updates can be managed centrally, further reducing operational costs

The economic model supports wide adoption in individual, educational, and institutional settings,
making the solution accessible and sustainable.

3.3 Data and Model Availability

The effectiveness of VisionFitTrack depends on the availability of high-quality pose


estimation models and exercise classification datasets:
• MediaPipe Pose is an open-source, well-maintained library for real-time pose estimation
• The exercise classification model can be trained or fine-tuned using publicly available datasets
or custom data collected from users
• The modular design allows for easy integration of new models or data sources as needed

If new exercises or movement types are to be supported, additional labeled data can be collected
and incorporated into the system, ensuring ongoing relevance and accuracy.

3.4 Use Cases

VisionFitTrack supports a variety of practical use cases:


• Real-time exercise tracking and feedback for home workouts, gyms, and rehabilitation centers
• Progress monitoring and motivation through interactive charts and statistics

10
• Integration with virtual coaching or remote fitness programs
• Educational use in teaching exercise form and technique
• Research and development in human movement analysis and sports science

The system’s flexibility and extensibility make it suitable for a wide range of fitness and health
applications, supporting both individual users and organizations.

Figure: Use case diagram

11
Chapter 4

System Analysis

The system analysis for VisionFitTrack examines the core components, data flow, and
operational logic that enable real-time, privacy-focused fitness tracking using computer vision
and AI. The analysis covers the following aspects:

4.1 Data Collection and Annotation

VisionFitTrack’s accuracy and robustness depend on the quality and diversity of its
training data. Data collection involves recording videos of users performing supported exercises
(push-ups, pull-ups, bicep curls, shoulder presses, squats, deadlifts) in various environments,
lighting conditions, and camera angles. Each frame is annotated with the exercise type, phase
(e.g., up/down for push-ups), and repetition boundaries. This annotated dataset is used to train
and validate the exercise classification model, ensuring it generalizes well to real-world
scenarios. Data augmentation techniques, such as mirroring, scaling, and rotation, are applied to
increase dataset diversity and model robustness.
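
The mirroring augmentation mentioned above amounts to flipping x-coordinates about the frame centre and swapping left/right joint labels; a sketch (the keypoint ordering and pair indices are hypothetical examples, not the project's exact layout):

```python
# Indices of left/right keypoint pairs to swap after a horizontal flip.
# These pairs are illustrative, e.g. shoulders, elbows, wrists.
LEFT_RIGHT_PAIRS = [(1, 2), (3, 4), (5, 6)]

def mirror_pose(keypoints):
    """Mirror a pose horizontally. `keypoints` is a list of (x, y)
    coordinates normalized to [0, 1]."""
    flipped = [(1.0 - x, y) for x, y in keypoints]
    for l, r in LEFT_RIGHT_PAIRS:
        flipped[l], flipped[r] = flipped[r], flipped[l]
    return flipped

# A pose with the left arm raised (lower y = higher on screen).
pose = [(0.5, 0.1),
        (0.25, 0.3), (0.75, 0.3),       # shoulders
        (0.25, 0.125), (0.75, 0.5),     # elbows: left arm raised
        (0.25, 0.0625), (0.75, 0.7)]    # wrists
print(mirror_pose(pose)[4])  # -> (0.75, 0.125): the raised arm swapped sides
```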

4.2 Pose Estimation Module

The pose estimation module is the foundation of VisionFitTrack. It uses MediaPipe Pose,
a state-of-the-art library for real-time human pose detection, to extract 17 key points (joints)
from each video frame. These keypoints include the nose, eyes, ears, shoulders, elbows, wrists,
hips, knees, and ankles. The module is optimized for speed and accuracy, enabling smooth
tracking even on consumer-grade hardware. The extracted keypoints are normalized relative to
the video frame size and serve as input for downstream exercise classification and repetition
counting. The pose estimation runs entirely in the browser, ensuring privacy and low latency.
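
The normalization and flattening step can be sketched as follows (an illustrative Python port of the browser-side preprocessing; the real pipeline runs in JavaScript):

```python
def to_model_input(keypoints, frame_w, frame_h):
    """Normalize pixel keypoints to [0, 1] and flatten them into the
    1-D feature vector shape a classifier would consume."""
    flat = []
    for x, y in keypoints:
        flat.extend([x / frame_w, y / frame_h])
    return flat

# Two keypoints on a 640x480 frame become a length-4 feature vector.
print(to_model_input([(320, 240), (640, 0)], 640, 480))  # [0.5, 0.5, 1.0, 0.0]
```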

12
4.3 Exercise Classification Model

The exercise classification model is implemented using TensorFlow.js and runs directly
in the user’s browser. It is a lightweight neural network trained on the annotated pose data. The
model takes a flattened array of normalized key point coordinates as input and outputs a
probability distribution over the supported exercise classes. The model is designed to be
efficient, allowing real-time inference without significant computational overhead. It is
periodically retrained or fine-tuned as new data becomes available, enabling the system to adapt
to new exercises or user populations.
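
One common way to handle ambiguous or transitional movements is to smooth the per-frame class probabilities over a short window before committing to a label. The sketch below illustrates the idea; the window size and confidence threshold are assumptions chosen for illustration:

```python
from collections import deque

class ExerciseSmoother:
    """Average recent probability vectors and only report a label when
    the averaged confidence clears a threshold; reduces label flicker
    during transitions between exercises."""

    def __init__(self, classes, window=5, min_conf=0.6):
        self.classes = classes
        self.window = deque(maxlen=window)
        self.min_conf = min_conf

    def update(self, probs):
        self.window.append(probs)
        n = len(self.window)
        avg = [sum(p[i] for p in self.window) / n
               for i in range(len(self.classes))]
        best = max(range(len(avg)), key=avg.__getitem__)
        if avg[best] < self.min_conf:
            return None  # ambiguous frame: report no exercise
        return self.classes[best]

s = ExerciseSmoother(["push-up", "squat"])
print(s.update([0.55, 0.45]))  # avg 0.55 is below threshold -> None
print(s.update([0.90, 0.10]))  # avg [0.725, 0.275] -> "push-up"
```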

4.4 Rule-Based Logic and State Machines

While the ML model provides exercise classification, VisionFitTrack also employs rule-
based logic to analyze joint angles and movement direction for more precise repetition and set
counting. For each exercise, specific joint angles (e.g., elbow for push-ups, knee for squats) are
monitored to detect the start and end of a repetition. State machines track the user’s movement
through different phases (e.g., down, up, rest), applying hysteresis and smoothing to avoid false
positives from jitter or noise. This hybrid approach ensures both flexibility and interpretability,
allowing for easy debugging and extension to new exercises.
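
The hysteresis-based phase tracking described here can be sketched as a small state machine. The angle thresholds below are illustrative values for a push-up elbow angle, not the project's tuned cut-offs; the gap between the two thresholds is the hysteresis band that absorbs jitter:

```python
class RepCounter:
    """Two-threshold (hysteresis) state machine for one exercise.

    A repetition is counted on each complete up -> down -> up cycle
    of the monitored joint angle.
    """

    def __init__(self, down_below=90.0, up_above=160.0):
        self.down_below = down_below  # enter "down" phase below this angle
        self.up_above = up_above      # return to "up" phase above this angle
        self.phase = "up"
        self.reps = 0

    def update(self, angle):
        if self.phase == "up" and angle < self.down_below:
            self.phase = "down"
        elif self.phase == "down" and angle > self.up_above:
            self.phase = "up"
            self.reps += 1  # a full down -> up cycle completed
        return self.reps

counter = RepCounter()
for angle in [170, 150, 85, 120, 150, 165, 80, 170]:  # two push-ups
    counter.update(angle)
print(counter.reps)  # -> 2
```

Angles in the 90-160 band never change the phase, so small pose-estimation noise around either threshold cannot produce spurious counts.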

4.5 Progress Tracking and Visualization


All workout data—including exercise type, repetitions, sets, and duration—is stored in a
local or server-side database. Chart.js is used to render interactive progress charts, such as line
graphs of reps/sets over time, bar charts for exercise distribution, and session summaries. Users
can view their workout history, analyze trends, and set goals. The visualization module is
designed to be intuitive and motivating, helping users track improvements and stay engaged with
their fitness journey.

13
4.6 Web Application Frontend

The frontend is built with HTML5, CSS3, JavaScript, and Bootstrap 5, ensuring a
modern, responsive, and accessible user interface. The UI displays real-time feedback, exercise
stats, and progress charts. Key features include:
- Live video feed with pose skeleton overlay
- Real-time display of detected exercise, rep/set count, and timer
- Interactive charts and workout summaries
- User authentication and profile management
- Responsive design for desktops, laptops, and tablets
The frontend is optimized for usability, with clear instructions and intuitive controls for users of
all backgrounds.

4.7 User Authentication and Data Management

User authentication and data management are handled by a Flask backend and SQLite
database. Users can register, log in, and securely store their workout history. The database
schema supports multiple users, each with their own records of exercises, reps, sets, and session
notes. Authentication is managed using Flask-Login, ensuring secure sessions and data privacy.
The backend exposes RESTful APIs for saving and retrieving workout data, enabling seamless
integration with the front end. Data management features include:
- Secure registration and login
- Password hashing and session management
- Workout history retrieval and visualization
- Support for future integration with external health platforms
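To illustrate the password-hashing requirement, the sketch below uses only the Python standard library (PBKDF2 with a per-user salt). The function names are hypothetical; the actual application delegates hashing to Flask/Werkzeug helpers.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100_000):
    """Return 'iterations$salt$digest' for storage; a fresh salt per user."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"

def verify_password(password, stored):
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)
```

The constant-time comparison (`hmac.compare_digest`) avoids leaking information through timing, which is the same property the Werkzeug helpers provide.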

Chapter 5

Software requirements and specifications

5.1 General Description

VisionFitTrack is an AI-powered, browser-based fitness tracking web application that uses computer vision to analyse users’ physical exercises in real time. Designed with a strong focus on accessibility and user privacy, it operates entirely within the user's browser using webcam input. The system processes video locally, so no personal footage is uploaded to external servers, preserving privacy while minimizing latency.
VisionFitTrack identifies various exercise types, counts repetitions and sets, and visualizes performance metrics to guide users through personalized fitness routines. It supports users in maintaining consistent fitness habits with intuitive feedback and historical workout tracking.

5.1.1 Product Perspective


VisionFitTrack functions as a standalone application and does not rely on
third-party software for core features. It integrates several components into a
unified system:
- Pose Estimation using MediaPipe.
- Exercise Classification through a machine learning model.
- Repetition Counting using rule-based logic and ML outputs.
- Data Visualization for user progress.
Its modular design allows future upgrades such as new exercises, enhanced
analytics, or platform integrations without disrupting the existing structure.

5.1.2 Product Functions

Key functionalities include:


- Real-Time Exercise Detection: Identifies and classifies exercises from
webcam video feed using pose landmarks.
- Repetition & Set Counting: Automatically detects repetitions and groups
them into sets for each exercise.
- Progress Visualization: Displays interactive charts showing daily, weekly,
and monthly activity trends.
- User Authentication: Provides secure login and account management to
track user-specific data.
- Cross-Device Compatibility: Ensures a responsive interface that adapts
seamlessly across desktops, laptops, and tablets.

5.1.3 User Characteristics

The system is built for a diverse user base:


Target Users: Fitness enthusiasts, beginners, personal trainers, and researchers.
Skills Required: Basic computer skills and access to a webcam-enabled device.

5.1.4 General Constraints

- Requires a modern web browser with support for JavaScript and WebRTC.
- Needs webcam access for pose detection.
- All processing is local to protect user privacy.
- Internet access is only needed for initial app loading, updates, and authentication, not for video analysis.

5.1.5 Assumptions and Dependencies

Users are assumed to have devices equipped with webcams and stable internet access.

The application relies on:
- MediaPipe Pose for real-time landmark detection.
- TensorFlow.js for in-browser ML inference.
- A Flask + SQLite backend for user authentication and data management.

5.2 Functional Requirements

5.2.1 Real-Time Exercise Detection

The system must capture live video input, extract pose landmarks, and
classify the type of exercise in progress using an ML model.

5.2.2 Automatic Repetition and Set Counting

The application must count each repetition accurately and increment set
count based on predefined rules (e.g., full-body extension and return).

5.2.3 Progress Visualization

Must present session summaries through dynamic charts and statistics, helping users track:
- Reps and sets per session
- Duration of workouts
- Historical trends

5.2.4 User Authentication and History

Users must be able to register, log in, and access personalized workout
histories including dates, exercise types, and performance metrics.

5.2.5 Responsive Design

The interface must scale smoothly across different devices and screen
resolutions, ensuring consistent functionality.

5.3 Non-Functional Requirements

5.3.1 Privacy and Local Processing

All sensitive operations (e.g., video processing, pose detection) must be performed locally. No raw video data should leave the user’s device.

5.3.2 User-Friendly Interface


The UI should be:
- Clean and intuitive
- Accessible to users with varying tech proficiency
- Consistent in layout and navigation

5.3.3 Performance

- Must deliver real-time feedback with latency under 200 ms per frame.
- Must maintain high detection accuracy (>90%) under good lighting and camera conditions.

5.3.4 Security

All user data must be encrypted during transmission and stored securely, with password hashing and input sanitization to prevent injection attacks.

5.4 External Interface Requirements

5.4.1 Webcam Access

The app must request webcam permission and utilize its feed to perform
pose estimation in real time.

5.4.2 Browser Compatibility

The system must run smoothly on major browsers:


- Google Chrome
- Mozilla Firefox
- Microsoft Edge
- Safari

5.4.3 Server and Database

A lightweight Flask backend and SQLite database must:


- Handle user sessions and authentication
- Store user-specific workout history securely

5.5 Performance Requirements

Processing Speed: Frame processing latency must not exceed 200 ms on standard consumer devices.

Accuracy Goals:
- Exercise classification accuracy ≥ 90%
- Repetition count accuracy ≥ 95% in ideal conditions

5.6 Design Constraints

5.6.1 Modularity and Extensibility

The system must:


- Be designed using modular components (e.g., pose estimation, classifier,
counter).
- Allow easy integration of new exercises, additional analytics, or wearable
device support.
- Be maintainable for long-term feature expansion.

5.7 Future Scope

VisionFitTrack has a roadmap for future enhancements, including:


- Expanded Exercise Library: Add support for more exercises and custom
workout routines.
- Wearable Device Integration: Sync with smartwatches or fitness bands to
enhance data tracking.
- Advanced Analytics: Provide detailed biomechanical feedback,
performance trends, and suggestions.
- Mobile App Versions: Develop native apps for iOS and Android for greater accessibility and offline tracking.

Chapter 6

System Design

Introduction

VisionFitTrack is designed with a modular, layered architecture to ensure scalability, maintainability, and privacy. The system is divided into three main layers: Presentation (Frontend), Application Logic, and Data (Backend & Database).

- Presentation Layer (Frontend):


Built with HTML5, CSS3, JavaScript, and Bootstrap 5, this layer handles all user interactions,
displays real-time feedback, and visualizes progress using Chart.js. It provides a responsive and
intuitive interface for users on various devices.

- Application Logic Layer:


This layer integrates MediaPipe Pose for real-time pose estimation and TensorFlow.js for
exercise classification. Rule-based logic and state machines are used for accurate repetition and
set counting. The logic is implemented in modular JavaScript files, ensuring separation of
concerns and ease of extension.

- Data Layer (Backend & Database):


The backend, developed with Flask and SQLite, manages user authentication, workout history,
and data storage. RESTful APIs facilitate secure communication between the frontend and
backend.

1. Sequence Diagram
Summary:
Depicts the time-ordered sequence of interactions between system components and the user:
- Actors: User
- Objects: Frontend, Backend, Classifier, Database
Flow:
Starts with video upload, proceeds to feature extraction, classification, repetition counting, and
ends with result display.

Fig: Sequence Diagram

2. Class Diagram

Summary:
Defines the object-oriented design of VisionFitTrack with core classes:
 User: Manages profile (userID, name, email).
 ExerciseSession: Tracks workouts (sessionID, start/end time).
 ExerciseClassifier: Classifies exercises from pose data.
 RepetitionCounter: Counts exercise repetitions.
Relationships:
 A User has multiple ExerciseSessions.
 Each session uses ExerciseClassifier and RepetitionCounter.
Purpose: Ensures modular, maintainable design with AI components integrated for classification
and rep tracking.

Fig: Class diagram


3. Workflow Diagram

Summary:
Describes the overall workflow of the VisionFitTrack system:
 Starts with user authentication
 Proceeds to video upload and processing
 Continues with AI-driven classification and repetition counting
 Ends with feedback/report generation
Purpose:
Provides a step-by-step operational blueprint of the system, integrating functional and technical
components.

Fig: Workflow diagram


4. Level 0 DFD (Data Flow Diagram)

Summary:
This top-level DFD provides a high-level overview of the system:
 Processes:
- VisionFitTrack System
 External Entities:
- User
 Data Stores/Flows:
- Flow of user inputs (video uploads, profile info) into the system
- Outputs such as detected exercises and repetition counts back to the user
Purpose:
Gives a simplified bird’s-eye view of the system’s functionality and its interaction with the user.

Fig: 0 level DFD

5. Level 1 DFD

Summary:
Provides more detail than Level 0 by breaking down the main process into sub-processes:
 Processes:
- 1.1: Register/Login
- 1.2: Upload Video
- 1.3: Detect Exercise
- 1.4: Count Repetitions
- 1.5: Display Results
 Data Stores:
- User Database
- Exercise Database
Purpose:
Clarifies internal system workings and how different submodules interact with each other and the
database.

Fig: 1st Level DFD

6. Level 2 DFD
Summary:
Drills down further into the exercise detection process (1.3) from Level 1:
 Processes:
o Pose landmark extraction using MediaPipe
o Preprocessing and feeding data into the CNN model
o Outputting classification results
Purpose:
Captures detailed data flow for the AI-based exercise detection component, highlighting how
data is transformed and used internally.

Fig: 2nd Level DFD

7. ER Diagram
Summary:
The Entity-Relationship (ER) Diagram models the backend database structure:
 Entities: User, Exercise, Workout session
 Relationships:
o A User can have many Sessions.
o A Session can include multiple Exercises.
o Each Exercise has several Repetitions.
Purpose:
This diagram defines how data is structured and interlinked in the database, ensuring normalized
and efficient data storage for the fitness tracking platform.

Figure: ER diagram
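To make the ER structure concrete, here is a minimal SQLite schema one could derive from it. Table and column names are illustrative, not the application's actual schema; repetitions are stored here as a count on each exercise row rather than as a separate entity, a common denormalization for this kind of logging.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user (
    user_id  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    email    TEXT UNIQUE NOT NULL,
    pw_hash  TEXT NOT NULL
);
CREATE TABLE workout_session (
    session_id INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES user(user_id),
    started_at TEXT,
    ended_at   TEXT
);
CREATE TABLE exercise (
    exercise_id INTEGER PRIMARY KEY,
    session_id  INTEGER NOT NULL REFERENCES workout_session(session_id),
    name        TEXT NOT NULL,   -- e.g. 'push-up', 'squat'
    reps        INTEGER NOT NULL,
    sets        INTEGER NOT NULL
);
"""

# An in-memory database is enough to validate the schema.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

A user can then own many sessions, and each session many exercise records, matching the one-to-many relationships in the diagram.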

Chapter 7

System Implementation

The implementation of VisionFitTrack was carried out in a series of well-defined phases, each
building on the previous to ensure a robust, scalable, and user-friendly fitness tracking
application. Below is an expanded, step-by-step breakdown of the implementation process:

1. Data Collection and Annotation:

- Videos of users performing each supported exercise (push-ups, pull-ups, bicep curls, shoulder
presses, squats, deadlifts) were collected in diverse environments and lighting conditions.
- Each video frame was annotated with exercise type, movement phase, and repetition
boundaries, creating a high-quality dataset for model training.
- Data augmentation techniques (mirroring, scaling, rotation) were applied to increase dataset
diversity and model robustness.
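At the keypoint level, the mirroring augmentation can be sketched as below. This is a simplified illustration with x normalized to [0, 1]; a faithful pose mirror would also swap left/right landmark indices, which is omitted here, and the real pipeline additionally applies scaling and rotation.

```python
def mirror_keypoints(landmarks):
    """Horizontally mirror pose keypoints.

    landmarks: list of (x, y, z, visibility) tuples with x in [0, 1].
    Only the x coordinate is reflected; y, z, and visibility are unchanged.
    """
    return [(1.0 - x, y, z, vis) for (x, y, z, vis) in landmarks]
```

Mirroring is an involution, so applying it twice recovers the original keypoints, which makes it easy to sanity-check.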

2. Model Training and Evaluation:

- A self-annotated dataset of over 200,000 exercise images was curated and labelled across 6
exercise classes. Pose key points (33 landmarks with x, y, z, and visibility values) were
extracted using MediaPipe Pose, resulting in 132-dimensional feature vectors per image.
- These pose vectors were normalized using training-set statistics and processed in chunks to
optimize memory usage.
- An initial custom CNN model was trained using these features, followed by a fine-tuned
MobileNetV2 model via transfer learning. The base model's convolutional layers were frozen,
and custom classification layers were added with dropout regularization.
- The model was trained with categorical cross-entropy loss and the Adam optimizer, and monitored using metrics such as accuracy, precision, and recall. Training included EarlyStopping, ModelCheckpoint, and ReduceLROnPlateau callbacks to prevent overfitting and adapt learning rates dynamically.
- The best-performing model was converted to TensorFlow.js format, enabling real-time, in-
browser inference for pose-based exercise recognition in fitness applications.

Figure: Confusion matrix of the model

Figure: Accuracy and Loss of the model
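The 132-dimensional feature construction described in step 2 can be sketched as follows; the function names and the epsilon used to guard the normalization are illustrative, not the project's exact code.

```python
import numpy as np

def landmarks_to_vector(landmarks):
    """Flatten 33 (x, y, z, visibility) landmarks into a (132,) feature vector."""
    vec = np.asarray(landmarks, dtype=np.float32).reshape(-1)
    assert vec.shape == (132,), "expected 33 landmarks x 4 values"
    return vec

def normalize(batch, mean, std):
    """Standardize a (N, 132) batch with training-set statistics.

    std is clipped from below to avoid division by zero for constant features.
    """
    return (batch - mean) / np.maximum(std, 1e-6)
```

At inference time the same training-set mean and std must be reused, which is why they are stored alongside the model rather than recomputed per batch.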

3. Integration of Pose Estimation:

- MediaPipe Pose was integrated into the frontend using JavaScript, enabling real-time extraction
of 33 key body points from the webcam video stream.
- The pose estimation pipeline was optimized for speed and accuracy, ensuring smooth
performance on consumer hardware.
- Extracted keypoints were normalized and formatted as input for the exercise classification
model.

Figure: Classification report of the model

4. Implementation of Rule-Based Logic:

- Custom algorithms were developed to analyze joint angles and movement direction for each
exercise type.
- State machines were implemented to track exercise phases (e.g., up/down for push-ups),
applying hysteresis and smoothing to reduce false positives and ensure stable repetition counting.
- Rule-based logic was combined with model predictions to improve robustness, especially in
ambiguous or transitional movements.
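The joint-angle computation at the heart of this rule-based logic can be sketched as follows: the angle at the middle joint b formed by three landmarks a, b, c (e.g. shoulder, elbow, wrist), computed from 2D coordinates. The function name is illustrative.

```python
import math

def joint_angle(a, b, c):
    """Angle ABC in degrees, from (x, y) landmark coordinates.

    Takes the orientation of each limb segment relative to the x-axis
    and returns the absolute difference, folded into [0, 180].
    """
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) -
        math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360.0 - ang if ang > 180.0 else ang
```

A fully extended arm yields an angle near 180°, and a tight curl a small angle, which is what the per-exercise thresholds in the state machines are compared against.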

5. Frontend Development:

- The user interface was built using HTML5, CSS3, JavaScript, and Bootstrap 5, ensuring a
modern, responsive, and accessible experience.
- Real-time feedback was provided through overlays (skeleton drawing), live statistics (reps,
sets, timer), and interactive controls.
- Chart.js was integrated to visualize workout progress, trends, and session summaries.
- The UI was tested and refined for usability, accessibility, and cross-device compatibility.

6. Backend and Database Integration:

- A Flask backend was developed to handle user authentication, session management, and secure
storage of workout history.
- SQLite was chosen for lightweight, local data storage, supporting multiple users and efficient
queries.
- RESTful APIs were implemented for communication between the frontend and backend,
enabling seamless data exchange.

7. User Authentication:

- Secure registration and login were implemented using Flask-Login, with password hashing and
session management.
- Only authenticated users could save and view their workout history, ensuring data privacy
and security.
- User management features were designed for extensibility and future integration with
external platforms.

8. Progress Tracking and Visualization:

- After each workout, exercise data (reps, sets, duration, notes) was saved to the database for
authenticated users.
- Chart.js was used to render interactive charts, allowing users to track their progress over time
and analyze trends.
- The progress module was designed to be extensible for future analytics and personalized
feedback features.

9. Testing and Debugging:

- Unit tests were written for core logic modules (pose extraction, exercise detection, repetition
counting).
- Integration testing ensured smooth interaction between frontend, backend, and database
components.
- User acceptance testing was conducted with real users to gather feedback, identify edge
cases, and refine the interface.
- Performance testing validated real-time responsiveness and accuracy across devices and
browsers.

Chapter 8

System Testing

A comprehensive and multi-layered testing strategy was employed to ensure the reliability,
accuracy, and usability of VisionFitTrack. The following testing methodologies were applied
throughout the development lifecycle:

8.1 Black Box Testing:

- The system was tested as a whole, focusing on input-output behavior without knowledge of
internal code structure.
- Multiple users performed supported exercises (push-ups, pull-ups, bicep curls, shoulder
presses, squats, deadlifts) in different environments, backgrounds, and lighting conditions.
- The accuracy of exercise detection, classification, and repetition counting was validated by
comparing system outputs to manual counts.
- Edge cases, such as partial occlusion, fast movements, and users stepping out of the camera
frame, were specifically tested to ensure robustness.

8.2 White Box Testing:

- The internal logic for pose extraction, joint angle calculation, state transitions, and rule-based
algorithms was thoroughly tested.
- Unit tests were written for core modules (e.g., poseDetection.js, exercises.js) to verify correct
calculation of joint angles, state machine transitions, and rep/set counting logic.
- Code coverage analysis was performed to ensure all critical paths and edge cases were tested.
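As a hypothetical illustration of such a unit test, shown in Python for brevity with an inline transition function (the project's actual tests target the JavaScript modules named above):

```python
import unittest

def phase_after(state, angle, down=90, up=160):
    """Pure transition function: next phase for one smoothed angle sample."""
    if state == "up" and angle < down:
        return "down"
    if state == "down" and angle > up:
        return "up"
    return state

class TestPhaseTransitions(unittest.TestCase):
    def test_full_cycle(self):
        state = "up"
        for angle in (170, 80, 120, 165):  # down past 90, back up past 160
            state = phase_after(state, angle)
        self.assertEqual(state, "up")

    def test_dead_band_is_stable(self):
        # Angles between the two thresholds must never change the phase.
        self.assertEqual(phase_after("up", 120), "up")
        self.assertEqual(phase_after("down", 120), "down")
```

Keeping the transition logic a pure function of (state, angle) is what makes it unit-testable without a webcam or a browser; the tests would be run with a standard runner such as `python -m unittest`.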

8.3 Integration Testing:

- The interaction between frontend, backend, and database components was tested to ensure
seamless data flow and correct API responses.
- Scenarios included user registration, login, workout data saving, and retrieval of progress
history.
- Error handling and recovery from network or server failures were validated.

8.4 Beta Testing:

- The application was released to a group of real users for hands-on evaluation.
- User feedback was collected on usability, interface clarity, responsiveness, and overall
experience.
- Issues identified during beta testing (e.g., confusing UI elements, missed reps, slow feedback)
were addressed and the system refined accordingly.

8.5 Progress Visualization Testing:

- The accuracy and clarity of progress charts and workout summaries (using Chart.js) were
validated.
- Test data was used to ensure correct rendering of line graphs, bar charts, and session
summaries.
- The ability to filter, sort, and interpret progress data was tested for usability.

8.6 Responsiveness Testing:

- The user interface was tested on a range of devices (desktops, laptops, tablets) and screen
sizes to ensure a consistent and accessible experience.
- Cross-browser compatibility was verified for Chrome, Firefox, Edge, and Safari.
- The UI was evaluated for accessibility, including color contrast, font size, and keyboard
navigation.

8.7 Performance Testing:

- The system was tested for real-time responsiveness, with latency measured for pose
estimation, exercise classification, and UI updates.
- The application was profiled to identify and resolve performance bottlenecks, ensuring
smooth operation on consumer hardware.

8.8 Acceptance Testing:

- The final system was validated against all functional and non-functional requirements
specified in the SRS.
- Test cases were derived from user stories and requirements to ensure complete coverage.
- The system was deemed ready for deployment only after passing all acceptance criteria.

This rigorous testing approach ensured that VisionFitTrack is robust, accurate, user-friendly, and
ready for real-world use.

Chapter 9

Conclusion

VisionFitTrack demonstrates the potential of browser-based AI for fitness tracking. By


combining real-time computer vision, machine learning, and interactive data visualization, it
offers a seamless and private way for users to monitor their workouts and progress. The modular,
privacy-focused design enables real-time exercise detection, progress tracking, and user
engagement without the need for specialized hardware. The system is extensible, cost-effective,
and suitable for a wide range of fitness and health applications, laying a strong foundation for
future enhancements and research.

Chapter 10

Bibliography

- MediaPipe: https://mediapipe.dev/
- TensorFlow.js: https://www.tensorflow.org/js
- Flask: https://flask.palletsprojects.com/
- Chart.js: https://www.chartjs.org/
- Bootstrap: https://getbootstrap.com/

