Flipkart Grid 6

Uploaded by

Vishal Bokhare
FLIPKART GRID 6.0
TEAM NAME : MaRS IITR
College/University: Indian Institute of Technology Roorkee

Frontend repository link: https://github.com/PiyushM001/flipkart-grid
Backend repository link: https://github.com/PiyushM001/flipkart-grid-backend
Web application link: https://flipkrt-grid-mars.netlify.app/
PS: We have used the free version of Render's web hosting service, which has a
limitation: if the app stays inactive for more than 10 minutes, the service shuts
down automatically and requires a manual restart. As a result, the hosted application
may not work during evaluation the following day unless we restart the hosting
service manually. Upgrading to the paid version resolves this issue. Detection
accuracy is already high, and processing speed can be increased further by using a
paid version of the API.

Tech stack:
Languages and frontend: Python, JavaScript (React.js), HTML, CSS.

Libraries and deep learning frameworks: YOLO v11, TensorFlow, PyTorch, pandas,
Matplotlib, Roboflow, the Gemini LLM API, and other libraries for custom training and
image enhancement (converting the uploaded image to HD).

Database: MongoDB
Database schema:

const mongoose = require("mongoose");

// Details extracted from an image of packaged products.
// Every field is stored as a string.
const productSchema = new mongoose.Schema({
  timestamp: { type: String },
  product_name: { type: String },
  brand: { type: String },
  MRP: { type: String },
  expiry_date: { type: String },
  product_count: { type: String },
  is_expired: { type: String },
  expected_life_span: { type: String },
});

// Freshness assessment for a single fruit or vegetable.
const fruitSchema = new mongoose.Schema({
  name: { type: String, required: true },
  freshness_index: { type: Number, required: true },
  expected_life_span: { type: Number, required: true },
  timestamp: { type: String, required: true },
});
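
As a rough illustration of how a record with the productSchema fields might be assembled before saving, the sketch below builds a plain object from one detection result. The function name buildProductRecord, the input shape, the "Yes"/"No" values, and the choice to express expected_life_span as whole days until expiry are all assumptions made for this example; the schema itself only fixes the field names and their String type.

```javascript
// Hypothetical helper (not from the repository): builds a plain object with the
// productSchema fields. Values are stringified because the schema uses String.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function buildProductRecord(detection, now = new Date()) {
  // Assumption: detection.expiryDate is an ISO date such as "2025-03-31".
  const expiry = new Date(detection.expiryDate + "T00:00:00Z");
  const daysLeft = Math.ceil((expiry.getTime() - now.getTime()) / MS_PER_DAY);
  // A product is treated as expired only once its expiry day has fully passed.
  const isExpired = daysLeft < 0;

  return {
    timestamp: now.toISOString(),
    product_name: detection.productName,
    brand: detection.brand,
    MRP: String(detection.mrp),
    expiry_date: detection.expiryDate,
    product_count: String(detection.count),
    is_expired: isExpired ? "Yes" : "No",
    expected_life_span: isExpired ? "0" : String(daysLeft),
  };
}
```

An object like this could then be passed to a model compiled from productSchema; storing the numbers as Number and the flag as Boolean would make later queries simpler, but that would be a change to the schema shown above.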

Input source:
Images can be processed either by capturing them in real-time using an external device
or by uploading stored images from a device. This flexibility allows for seamless
handling of both on-the-spot analysis and processing of previously saved files.

Solution:
To address the outlined requirements, we have utilized a fine-tuned version of the
Gemini API model, a large language model (LLM) with integrated multimodal
capabilities and using prompt engineering to get desired results. Before selecting
Gemini API, we rigorously tested other models and APIs, including OpenAI’s API, as
well as vision-based architectures like VGG16, ResNet50, VOLO V8, YOLO V11,
Paddle OCR, and SAM 2 segmentation models. While these models performed well in
specific areas, Gemini API emerged as the most reliable and efficient solution due to its
superior accuracy, multimodal integration,wider data analysis and ability to handle
complex real-world scenarios seamlessly. Below is a detailed explanation of how the
architecture and logic are applied to each function.We also added the option to view
history so that all the billing details can be saved in our Database.
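
To make the prompt-engineering step more concrete, here is a minimal sketch of one common pattern: ask the model for a fixed JSON shape, then parse the reply defensively. The prompt wording, the "NA" convention, and both function names are assumptions for illustration, not the team's actual prompt, and the call to the Gemini API itself is deliberately left out.

```javascript
// Hypothetical prompt builder: requests a fixed JSON shape so the reply can be
// parsed mechanically. The keys mirror the product fields stored in the database.
function buildProductPrompt() {
  return [
    "You are inspecting a photo of packaged products.",
    "Reply with JSON only, using exactly these keys:",
    '{"product_name": string, "brand": string, "MRP": string,',
    ' "expiry_date": string, "product_count": string}',
    'Use "NA" for any value that is not visible in the image.',
  ].join("\n");
}

// Parses the text between the first "{" and the last "}" of a model reply,
// which tolerates code fences or extra prose around the JSON.
// Returns null when no parsable object is found.
function parseModelReply(replyText) {
  const start = replyText.indexOf("{");
  const end = replyText.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(replyText.slice(start, end + 1));
  } catch (err) {
    return null;
  }
}
```

Returning null rather than throwing lets the caller decide whether to retry the request or show an error in the UI.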

Dataset for fine-tuning the LLM:

For packaged products:
https://drive.google.com/drive/folders/1m-bnfTrdwG57g9aY-RmNJhDn8K8mj88b?usp=sharing

For fruits and vegetables:
https://universe.roboflow.com/college-74jj5/freshness-fruits-and-vegetables/dataset/7/images

A. Brand Detection

Vision-Language Integration: Combines computer vision and natural language
processing for real-time brand recognition from live camera feeds.

Convolutional Neural Networks (CNNs): Fine-tuned CNN layers process image data
for logo and text recognition.

High-Speed Parallel Processing: Optimized GPU acceleration ensures rapid
identification, supporting detection of multiple shipments in a single frame.

Custom Object Detection Models: Pre-trained on large datasets of brand logos and
fine-tuned with specific shipping scenarios to achieve a 95%+ identification rate.

B. Expiry Date Detection

Regex and Vision-Language Fusion: The LLM employs a regex-based approach
integrated with OCR (Optical Character Recognition) for extracting expiry dates across
multiple formats.

Context-Aware Processing: Leverages natural language understanding to identify
variations such as "Expiry Date," "Best Before," and "Use By."

Training on Multiformat Data: Model fine-tuned with real-world data containing diverse
date formats (e.g., MM/YY, DD-MM-YYYY) for high robustness.
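
The regex side of this step can be sketched as follows. This is a standalone illustration that runs on text already produced by OCR, not the team's actual pipeline; the label list, the accepted separators, the function name extractExpiry, and the rule that two-digit years mean 20YY are assumptions. It covers only the two formats named above (DD-MM-YYYY and MM/YY, plus MM/YYYY).

```javascript
// Labels that may precede an expiry date on packaging.
const LABEL = "(?:expiry\\s*date|exp(?:iry)?\\.?|best\\s*before|use\\s*by)";
// DD-MM-YYYY style is tried first, then MM/YY or MM/YYYY.
const DATE = "(\\d{1,2}[-/.]\\d{1,2}[-/.]\\d{4}|\\d{1,2}[-/]\\d{2}(?:\\d{2})?)";
const EXPIRY_RE = new RegExp(LABEL + "\\s*[:\\-]?\\s*" + DATE, "i");

// Returns { day, month, year } for the first labelled date in the text,
// or null when no labelled date with a valid month is found.
function extractExpiry(ocrText) {
  const match = EXPIRY_RE.exec(ocrText);
  if (!match) return null;
  const parts = match[1].split(/[-/.]/).map(Number);
  let day = null, month, year;
  if (parts.length === 3) {
    [day, month, year] = parts;
  } else {
    [month, year] = parts;
    if (year < 100) year += 2000; // assumption: two-digit years are 20YY
  }
  if (month < 1 || month > 12) return null; // reject impossible months
  return { day, month, year };
}
```

Because the pattern requires a label before the date, a manufacturing date printed on the same pack (for example after "MFG") is skipped. Other layouts, such as month names or DD-MM-YY, would need additional patterns.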

C. Item Counting

Object Detection and Tracking Algorithms: Uses YOLO-based architecture for
real-time item detection and counting.

Boundary Detection Models: Advanced algorithms segment overlapping or partially
occluded items.
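
Once a detector has produced boxes, counting reduces to filtering and de-duplicating them. The sketch below assumes each detection looks like { label, confidence, box: [x1, y1, x2, y2] }, which is a hypothetical shape chosen for this example, and uses a simple greedy non-maximum suppression so that two heavily overlapping boxes of the same class count as one item. The thresholds are illustrative defaults, not values from the project.

```javascript
// Intersection over union of two boxes given as [x1, y1, x2, y2].
function iou(a, b) {
  const x1 = Math.max(a[0], b[0]);
  const y1 = Math.max(a[1], b[1]);
  const x2 = Math.min(a[2], b[2]);
  const y2 = Math.min(a[3], b[3]);
  const inter = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  const union = areaA + areaB - inter;
  return union > 0 ? inter / union : 0;
}

// Counts items per label after dropping weak detections and duplicates.
function countItems(detections, minConfidence = 0.5, iouThreshold = 0.5) {
  const sorted = detections
    .filter((d) => d.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
  const kept = [];
  for (const det of sorted) {
    // A box is a duplicate if it overlaps a stronger box of the same label.
    const duplicate = kept.some(
      (k) => k.label === det.label && iou(k.box, det.box) > iouThreshold
    );
    if (!duplicate) kept.push(det);
  }
  const counts = {};
  for (const det of kept) counts[det.label] = (counts[det.label] || 0) + 1;
  return counts;
}
```

A lower iouThreshold merges boxes more aggressively, which helps against double counting but can undercount items that genuinely overlap, so the value would need tuning on real shelf images.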

D. Freshness Detection

Multifactor Assessment Models: Integrates computer vision and domain-specific data
for analyzing freshness based on multiple parameters like texture, color, and surface
changes.

Freshness using LLMs: The freshness index detection system uses a sophisticated
deep learning architecture primarily based on Convolutional Neural Networks (CNNs)
and Vision Transformers. By analyzing high-resolution images of fruits and vegetables,
the model extracts critical visual features like color gradients, surface texture, bruise
detection, and microscopic changes. The system is trained on extensive datasets
containing images annotated with precise age, ripeness stages, and environmental
conditions. The neural network learns to recognize subtle indicators of degradation,
enabling it to predict the remaining shelf life, freshness percentage, and optimal storage
conditions.

Real-World Training Data: Fine-tuned with real-world images of perishable goods to
ensure practical relevance.

E. Data storage:

We maintain a database of previously processed images and their extracted data.
This history can be viewed by clicking the View History button in the web
application.

SAMPLE OUTPUT IN UI:


Results for Sample Image 1:

Practical Usage Links: Video Link
