A Project Report on
Image Analysis using Convolutional Neural Network
2023-2024
Declaration
We hereby declare that the project work presented in this report entitled “Image
Analysis using Convolutional Neural Network” in partial fulfillment of the
requirement for the award of the degree of Bachelor of Technology in
“Computer Science (AIML)” submitted to A.P.J. Abdul Kalam Technical
University, Lucknow, is based on our own work carried out at Department of
Applied Computational Science and Engineering (ACSE), G.L. Bajaj Institute of
Technology & Management, Greater Noida. The work contained in this report is
true and original to the best of our knowledge, and the project work reported
herein has not been submitted by us for the award of any other degree or diploma
at any other institute, college, or university.
Signature:
Name: Prateek Joshi
Roll No: 2001921530037
Signature:
Name: Aman Kumar
Roll No: 2001921530008
Signature:
Name: Rahul Kumar
Roll No: 2001921530041
Certificate
This is to certify that the Project report entitled “Image Analysis using
Convolutional Neural Network” done by Prateek Joshi (2001921530037), Aman
Kumar (2001921530008) and Rahul Kumar (2001921530041) is an original
work carried out by them in Department of Applied Computational Science and
Engineering (ACSE), G.L. Bajaj Institute of Technology & Management, Greater
Noida under my supervision. The matter embodied in this project work has not
been submitted earlier for the award of any degree or diploma to the best of my
knowledge and belief.
Supervisor
Acknowledgement
The merciful guidance bestowed upon us by the Almighty helped us see this
project through to a successful end. We humbly pray with sincere hearts for His
guidance to continue forever.
We thank our project guide, Ms. Hina Gupta, who gave us guidance and direction
during this project. Her versatile knowledge helped us through the critical
moments over the span of this project.
We pay special thanks to our Dean of the Department, Dr. Naresh Kumar, and Head
of Department, Dr. Mayank Singh, who have always been present to support and
help us in every possible way during this project.
We also take this opportunity to express our gratitude to all those people who
have been directly or indirectly involved with us during the completion of the project.
We want to thank our friends, who have always encouraged us during this project.
Last but not least, thanks to all the faculty of the CSE department who provided
valuable suggestions during the period of the project.
Rahul Kumar(2001921530041)
Prateek Joshi(2001921530037)
Aman Kumar(2001921530008)
Abstract
Convolutional Neural Networks (CNNs) have become a cornerstone of modern
image analysis. This report explores their practical applications in diverse
domains such as medical imaging, autonomous vehicles, facial recognition, and
anomaly detection. Techniques for enhancing CNN performance, including data
augmentation, transfer learning, and the use of advanced optimization algorithms,
are reviewed. Additionally, challenges such as overfitting, the need for large
labeled datasets, and computational requirements are discussed, along with
potential solutions like regularization techniques and unsupervised learning.
Table of Contents
Declaration .......................................................................... (ii)
Certificate ........................................................................... (iii)
Acknowledgement ............................................................. (iv)
Abstract ............................................................................... (v)
Table of Contents ............................................................... (vi)
List of Figures ..................................................................... (vii)
List of Tables ...................................................................... (viii)
Detail of published/communicated research paper/patent ... (ix)
List of Figures
Chapter 1
Introduction
Maintaining a healthy body in modern society requires careful monitoring of calorie intake
to balance calorie expenditure. A healthy Body Mass Index (BMI) ranges between 18.5 and
24.9; a value of 25 or above indicates overweight, and a BMI over 30 signifies obesity.
Achieving and maintaining a healthy weight necessitates vigilance regarding calorie
consumption.
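For reference, BMI is body weight in kilograms divided by the square of height in metres. A minimal sketch, using the standard WHO cut-offs (the function names are illustrative, not part of the project code):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    return weight_kg / (height_m ** 2)

def bmi_category(value: float) -> str:
    """Classify a BMI value using the standard WHO cut-offs."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "healthy"
    if value < 30:
        return "overweight"
    return "obese"
```

For example, a 70 kg person who is 1.75 m tall has a BMI of about 22.9, which falls in the healthy range.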
Traditional methods for calorie estimation are manual and often cumbersome, making them
less practical for everyday use. This project presents a novel approach to calorie
estimation, using Convolutional Neural Networks (CNNs) to estimate the calorie content
of food from images. This technique is particularly
valuable in the medical field, where accurate calorie counting is essential for dietary
management.
Our model leverages TensorFlow, a robust framework for implementing machine learning
methods, to classify and calculate the calorie content of various food items, including fruits
and vegetables. Users simply capture an image of their food, which serves as the input for
our trained CNN model to detect the food object and calculate its calorie value.
To enhance the user experience, we integrate Google's Generative AI (Gemini) and prompt
engineering techniques. After the image is analyzed by the machine learning model, these
tools generate detailed nutritional information for the recognized food.
This system provides a seamless, automated solution for dietary monitoring, promoting
better health management through precise and convenient calorie counting. Existing
approaches, such as manual tracking or database lookups, are prone to errors,
time-consuming, and often lacking in accuracy.
These methods can lead to inaccuracies in dietary records, making it difficult for
users to manage their health effectively and track their nutritional intake
comprehensively.
1.3 Objectives
Implement an intuitive user interface for users to capture food images and view
the resulting nutritional information.
Enhance the efficiency and reliability of nutritional tracking while minimizing the
manual effort required from users.
1.4 Scope
The project will focus on developing a standalone food recognition system that can
classify food items from images and estimate their nutritional content.
The system will support the identification of multiple food items and real-time image
processing.
Integration with existing health and fitness applications or databases may be
considered in future iterations.
1.5 Functional Requirements
1.5.1 User Registration: Allow users to create profiles and store personal dietary
preferences.
1.5.2 Image Capture: Implement a feature for users to capture images of their meals.
1.5.3 Food Detection: Develop a food detection algorithm to locate and identify
food items within captured images.
1.5.5 Dietary Tracking: Automatically log the nutritional values of consumed foods.
1.5.6 User Management: Provide functionalities for users to update their profiles
and manage their dietary records.
1.6 Non-Functional Requirements
1.6.1 Accuracy: The system must achieve high accuracy in food detection and
calorie estimation.
1.6.2 Speed: Ensure real-time processing of food images for seamless nutritional
tracking.
1.6.3 Security: Implement measures to protect sensitive user data, such as personal
dietary records.
1.6.4 Scalability: Design the system to handle a large number of users and
images without degraded performance.
1.6.5 Usability: Create an intuitive and user-friendly interface for users to interact
with the system.
1.6.6 Reliability: Ensure the system's stability and robustness under various
operating conditions.
1.6.7 Interpretability: CNNs are often referred to as "black-box" models because
of their complex architectures and high dimensionality, making it challenging to
interpret how they arrive at their predictions.
1.7 Problem Identification and Feasibility
Traditional methods of estimating calorie intake often rely on manual tracking or generic
food databases. These approaches do not provide detailed nutritional information and
require significant effort from users to input data and calculate their intake. This issue
is particularly critical for individuals managing specific dietary needs or medical
conditions, where precise nutritional information is vital.
Technical Feasibility: CNN-based models are well suited to identifying food
items and calculating their nutritional values, and generative AI techniques can
generate detailed nutritional information for the identified food items. These
technologies are well-established and readily available.
User Acceptance: The system aims to offer an intuitive and user-friendly interface,
making it accessible to users of varying technical backgrounds.
Integration with Existing Health Apps: The system can integrate with existing
health and fitness applications, allowing users to seamlessly incorporate the tool into
their daily routines.
Training and Support: Comprehensive training materials and technical support will
be provided to help users effectively use the system.
Economic Feasibility: Initial investments in development and infrastructure
are necessary. However, the benefits, such as accurate nutritional tracking, improved
health outcomes, and reduced effort for users, justify these costs.
Return on Investment (ROI): The system's ROI will be measured in terms of improved
user health, reduced healthcare costs due to better dietary management, and increased user
engagement.
Data Privacy and Security: The system will comply with data privacy regulations,
ensuring the secure handling and storage of user data. Encryption, access controls,
and secure storage practices will be employed.
Ethical Considerations: Ethical considerations, such as the potential for bias in the
AI model and user privacy concerns, will be addressed through transparent policies
and responsible development practices.
1.8 Constraints
Hardware Requirements: The system requires capable devices,
such as cameras with adequate resolution and processing power for real-time food
recognition.
Lighting Conditions: The performance of the food recognition algorithm may be
affected by lighting conditions, requiring adequate illumination for
accurate results.
Food Variability: Variations in food presentation (e.g., different plating styles)
may impact the system's performance.
1.9 Assumptions
The system assumes cooperative users who willingly participate in the image capture
process.
Chapter 2
Literature Survey
Roth et al. [12] In their work on improving existing CAD systems, the researchers
applied Convolutional Neural Networks (CNNs) to enhance the detection accuracy of
colonic polyps on CT colonography, sclerotic spine metastases on body CT, and
enlarged lymph nodes on body CT. They utilized previously developed candidate
detectors and generated 2D patches from three orthogonal directions, supplemented by
up to 100 randomly rotated views. This method, referred to as "2.5D" views, serves as
a decompositional image representation of the original 3D data. The CNNs processed
these 2.5D views, and their predictions were aggregated to boost accuracy further. The
integration of CNNs led to a significant improvement in sensitivity for lesion detection,
ranging from 13% to 34% across all three CAD systems. This substantial enhancement
demonstrates the generalizability and scalability of the CNN-based approach.
Achieving such improvements had been nearly unattainable with traditional non-deep
learning classifiers, such as committees of support vector machines (SVMs). This
indicates that CNNs provide a more powerful tool for enhancing the performance of
CAD systems in medical imaging.
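The "2.5D" decomposition described above can be sketched as follows. This is an illustrative NumPy version only: the array layout, patch size, and the assumption that the candidate voxel lies safely in-bounds are ours, not details of Roth et al.'s implementation.

```python
import numpy as np

def orthogonal_patches(volume: np.ndarray, center: tuple, size: int = 32) -> np.ndarray:
    """Extract axial, coronal, and sagittal 2D patches of shape (size, size)
    centered on a candidate voxel, stacked like the three channels of an
    RGB image so they can feed a standard 2D CNN."""
    z, y, x = center
    h = size // 2
    axial    = volume[z, y - h:y + h, x - h:x + h]   # fixed z-slice
    coronal  = volume[z - h:z + h, y, x - h:x + h]   # fixed y-slice
    sagittal = volume[z - h:z + h, y - h:y + h, x]   # fixed x-slice
    return np.stack([axial, coronal, sagittal], axis=-1)
```

Randomly rotated views around the candidate, as in the paper, would then be generated by resampling the volume before extracting these three planes.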
Dou et al. [13] The researchers focused on detecting cerebral microbleeds from
susceptibility-weighted MRI scans using a two-stage approach involving 3D
Convolutional Neural Networks (CNNs). Unlike traditional methods, they replaced the
candidate detection stage with a CNN, streamlining the detection process. Their results
indicated that the 3D CNN significantly outperformed various classical methods and
2D CNN approaches previously documented in the literature. The authors validated
this by re-implementing, training, and testing these alternative methods on the same
dataset, demonstrating the superior effectiveness of their 3D CNN model.
Sirinukunwattana et al. [14] The researchers worked on detecting and classifying nuclei
in histopathological images using a specialized Convolutional Neural Network (CNN).
Their CNN processes small image patches and, rather than simply determining if the
central pixel is part of a cell nucleus, it generates an output characterized by a high
peak near the center of each nucleus and flat responses elsewhere. This spatially
constrained approach, combined with the fusion of overlapping patches during the
testing phase, results in superior performance compared to previous techniques. These
earlier methods included both other CNN-based approaches and classical feature-based
methods.
Van Grinsven et al. [17] The researchers aimed to improve and accelerate CNN
training for medical image analysis by dynamically selecting misclassified negative
samples during training. CNN training involves many iterations (epochs) to optimize
network parameters, with each epoch using a randomly selected subset of training data
for parameter updates via back-propagation. In medical classification tasks, which
often involve distinguishing between normal and pathological cases, the normal class is
usually over-represented. Additionally, many normal samples are highly correlated due
to repetitive patterns in the images, making only a small fraction of these samples
informative. Uniformly treating all data during training wastes time on non-informative
samples, prolonging the process. By identifying and focusing on informative normal
samples, the researchers increased the efficiency of the CNN learning process and
reduced training time.
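The selection step at the heart of this idea can be sketched in a few lines. This is a simplified illustration, not van Grinsven et al.'s exact procedure: we assume the current model's scores on the pool of negative samples are available as an array, and the function name and top-k policy are ours.

```python
import numpy as np

def select_informative_negatives(neg_scores: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k hardest negatives: those the current model
    assigns the highest probability of being positive (i.e. misclassified
    or near-misclassified). Training on these, rather than on uniformly
    sampled negatives, skips the many redundant easy samples."""
    return np.argsort(neg_scores)[::-1][:k]
```

Each epoch, the pool would be re-scored with the updated network so the selection tracks what the model currently finds difficult.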
Shin et al. [14] evaluated transfer learning from pre-trained networks and found that it
improved detection performance, with the benefits becoming more pronounced as the size
of the training sets decreased. Shin et al. reported that the GoogLeNet architecture
achieved state-of-the-art detection of mediastinal lymph nodes, outperforming other
shallower architectures.
A study by Setio et al. [11] led to the creation of a challenge for
pulmonary nodule detection, organized alongside the IEEE ISBI conference. This
challenge utilizes the publicly available LIDC/IDRI dataset, allowing the system
described in their study to be directly compared with alternative approaches. More
information about the challenge can be found at http://luna16.grand-challenge.org/.
Several challenges exist in this area. Firstly, securing funding for constructing data sets
is difficult. Secondly, high-quality annotation of medical imaging data requires scarce
and expensive medical expertise. Thirdly, privacy concerns make sharing medical data
more challenging compared to natural images. Fourthly, the wide range of applications
in medical imaging necessitates the collection of diverse data sets. Despite these
challenges, there is rapid progress in data collection and sharing. Numerous public data
sets have been released and are routinely used in experimental validation. Notable
examples include VISCERAL and The Cancer Imaging Archive
(http://www.visceral.eu/ and http://www.cancerimagingarchive.net/). Roth et al. [12]
and Shin et al. [14] have analyzed a dataset of enlarged lymph nodes on CT scans,
which they have made publicly available on The Cancer Imaging Archive [22]. The
same group has also released a pancreas dataset online [38].
In other words, are the very deep residual networks, like the one with 152 layers that
excelled in the ILSVRC 2015 classification task, also effective for medical tasks? This
question highlights the potential transferability of network architectures across
different domains, particularly from image classification tasks like ILSVRC to medical
image analysis tasks. While the success of a network architecture in one domain doesn't
guarantee its effectiveness in another, exploring whether these deep residual networks
can achieve good results in medical tasks could lead to valuable insights and
advancements in medical image analysis.
In other words, obtaining accurate ground-truth data for medical tasks presents a
significant challenge due to its scarcity and the complexity involved in its collection.
While crowdsourcing has shown potential in annotating large datasets for general
images, its application in the biomedical field demands a more nuanced approach due
to the need for precise annotations and a deeper understanding of the medical tasks at
hand. This challenge underscores the importance of developing specialized methods for
annotating medical data accurately and efficiently.
As noted by Brosch et al. [21] and Dou et al. [13], much of the analysis in current works is
carried out in two dimensions (2D), prompting discussions about whether transitioning
to three-dimensional (3D) analysis could lead to significant performance
improvements. Various approaches to data augmentation, including the use of 2.5D
techniques, have been explored. For instance, Roth et al. utilized axial, coronal, and
sagittal images centered on a voxel in a colonic polyp or lymph node candidate,
feeding them into the cudaconvnet CNN. This CNN incorporates three channels
typically used to represent the red, green, and blue color channels of a natural light
image. Explicitly employing 3D convolutional neural networks (CNNs) has been
demonstrated in the works of Brosch et al. and Dou et al., marking a departure from
traditional 2D approaches and potentially offering advantages in capturing spatial
information across multiple dimensions.
2.1 Literature Survey:
Fig.2.1
One reviewed document applies CNNs to automated plankton image classification, with
a focus on biological groups. By applying filtering thresholds on classification
probabilities and grouping classes into taxonomically meaningful groups, the
precision rate for nonrare biological groups reached 90.7%. The document also
discusses the importance of the training set, the impact of filtering on
classification statistics, and the potential for future developments in automated
plankton image analysis using CNNs.
The document delves into the utilization of deep convolutional neural networks
(CNNs) in medical image analysis, particularly in computer-aided detection
(CAD) systems for diseases like breast cancer, lung nodules, and prostate cancer.
The abstract emphasizes the significance of CNNs in enhancing diagnostic
accuracy and efficiency in medical imaging. It discusses the training strategies for
CNNs, including transfer learning and fine-tuning, to address the challenges of
limited labeled data. The study showcases applications of CNNs in various
medical imaging tasks and highlights the potential of deep learning methods in
revolutionizing healthcare practices.
Fig.2.2
The document surveys the results achieved in disease detection and diagnosis through CNN architectures and
transfer learning techniques. The study suggests future directions for integrating
deep learning tools in precision medicine, radiomics, and clinical decision support
systems to improve patient outcomes and streamline healthcare processes.
Fig.2.3
The document discusses the use of CNN-based image analysis for malaria
diagnosis, highlighting the inefficiency of traditional methods and the potential of
machine learning. The abstract introduces a new 16-layer CNN model with a
97.37% accuracy in classifying infected and uninfected red blood cells. Transfer
learning, with 91.99% accuracy, is also compared.
The CNN architecture, data preprocessing, and model training process are
detailed. Results show the CNN model's superior performance, attributing it to
both architecture and training data volume. The conclusion suggests deep
learning's potential to enhance malaria diagnosis efficiency and accuracy.
Acknowledgments and references are included, acknowledging funding sources
and previous studies on deep learning for genomics.
Fig.2.4
The document describes a model whose accuracy improves with an
increasing number of analyzed images. This approach mimics the training process
for a physician but leverages the vast dataset to potentially achieve higher
accuracy levels than human counterparts. The model's ability to learn from a wide
array of images underscores its potential for heightened diagnostic precision in
medical imaging.
Fig.2.5
Fig.2.6
Chapter 3
Proposed Work
Evaluate alternative model configurations and architectures to improve performance.
Conduct hyperparameter tuning experiments to find the optimal learning rate, batch
size, and other parameters for improved training efficiency and accuracy.
Experiment with fine-tuning additional layers of the pre-trained model or using
different pre-trained models for feature extraction.
Implement methods for interpreting model predictions and understanding the
features that drive them.
Visualize feature maps, activation patterns, and class activation maps to gain insights
into what parts of the input images are influencing the model's predictions.
Integrate the model into existing platforms or systems relevant to fruit and vegetable
recognition applications.
Evaluate robustness across varying lighting conditions and image qualities.
Benchmark the model against state-of-the-art approaches in fruit and vegetable
recognition to assess its competitiveness and identify areas for further improvement.
Monitor model performance and reliability in production environments.
Implement strategies for model retraining and updating to adapt to changes in data
distribution over time.
Foster collaboration with domain experts, researchers, and stakeholders in relevant
fields.
Share insights, findings, and best practices through publications, presentations, and
community engagement.
Explore emerging trends and technologies in AI, such as federated learning and
meta-learning, for recognition tasks.
Foster a culture of innovation and continuous learning within the project team.
3.2.2.1 Python: Python is the primary programming language used for developing the
project due to its simplicity, readability, and extensive support for data manipulation
and machine learning.
3.2.2.2 NumPy: NumPy is the fundamental package for scientific computing
with Python, providing support for powerful numerical operations and array
manipulation.
3.2.2.3 Pandas: Pandas is a Python library offering data structures and tools for
working with structured data, such as data frames.
3.2.2.4 Scikit-learn: Scikit-learn is a machine learning library for
Python, providing simple and efficient tools for data mining, data analysis, and
dimensionality reduction.
3.2.2.6 Keras: Keras is a high-level neural networks API that runs on top of
TensorFlow, allowing for easy and fast prototyping of deep learning models.
3.2.2.7 Google Colab: Google Colab provides free cloud-based
resources for running Python code, especially for machine learning and deep
learning tasks.
3.2.2.8 Google Generative AI: Google Generative AI is a collection of APIs
and models developed by Google for various generative tasks, such as text
generation. The project also relies on
access to GPU and TPU resources for accelerating deep learning tasks.
3.2.4.1 Kaggle API: The Kaggle API is used to download datasets directly
into the development environment.
3.3 Design
3.3.1 Project Structure:
The project follows a structured directory layout, with separate directories for data,
code, and outputs.
The fruit_vegetable_dataset directory contains the dataset used for training and
evaluation.
Code files are organized into logical units, such as data preprocessing, model
definition, training, and prediction.
Output files, such as trained model weights and predicted labels, are saved in
a dedicated output directory.
Data preprocessing steps include loading images, resizing them to a uniform size,
and converting them into numerical arrays for input to the machine learning model.
Data augmentation techniques are applied to increase the diversity of the training
data and improve model generalization.
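The preprocessing and augmentation steps above can be sketched in plain NumPy. In the project these are handled by Keras utilities such as ImageDataGenerator; the target size of 224, nearest-neighbour resizing, and jitter ranges below are illustrative assumptions.

```python
import numpy as np

def preprocess(image: np.ndarray, target: int = 224) -> np.ndarray:
    """Nearest-neighbour resize to (target, target) and scale pixels to [0, 1],
    producing the numerical array fed to the model."""
    h, w = image.shape[:2]
    rows = np.arange(target) * h // target   # source row for each output row
    cols = np.arange(target) * w // target   # source column for each output column
    resized = image[rows][:, cols]
    return resized.astype("float32") / 255.0

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Label-preserving transforms: random horizontal flip plus a small
    brightness jitter, to diversify the training data."""
    if rng.random() < 0.5:
        image = image[:, ::-1]               # mirror left-right
    return np.clip(image * rng.uniform(0.8, 1.2), 0.0, 1.0)
```

Each training image would pass through `preprocess` once and `augment` afresh on every epoch, so the model rarely sees the exact same array twice.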
Additional dense layers are added on top of the pre-trained model to capture
high-level, task-specific features.
The model is compiled with an appropriate loss function, optimizer, and evaluation
metrics.
The dataset is split into training, validation, and testing sets to train and evaluate the
model.
The model is trained using the training set, with performance monitored on the
validation set.
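The transfer-learning setup described here (a pre-trained base with dense layers added on top, then compiled) might look like the following Keras sketch. The MobileNetV2 backbone, the class count of 36, and the layer sizes are assumptions for illustration; `weights=None` keeps the sketch self-contained, where the real project would load `weights="imagenet"`.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 36  # assumed number of fruit/vegetable categories

def build_model(input_shape=(224, 224, 3)) -> tf.keras.Model:
    """Frozen convolutional backbone plus a small trainable dense head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=None)
    base.trainable = False  # freeze the pre-trained feature extractor
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),              # high-level features
        layers.Dense(NUM_CLASSES, activation="softmax"),   # class probabilities
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then call `model.fit(train_data, validation_data=val_data, epochs=...)`, with the validation split used to monitor for overfitting.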
Model performance metrics, such as accuracy, are computed on the testing set to
evaluate generalization.
Predictions are generated for test images using the trained model, and the predicted
labels are saved for review.
Content generation APIs, such as Google Generative AI, are utilized to generate
nutritional descriptions for recognized items from engineered prompts.
The user interface may also include features for visualizing model performance
metrics and predictions.
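The prediction and prompt-generation steps might be sketched as follows. The label list, prompt wording, and helper names are illustrative, and the actual Gemini call is shown only as a comment rather than executed.

```python
import numpy as np

CLASS_NAMES = ["apple", "banana", "carrot"]  # shortened example label list

def top_label(probs: np.ndarray) -> str:
    """Map the model's softmax output vector to its predicted class name."""
    return CLASS_NAMES[int(np.argmax(probs))]

def nutrition_prompt(label: str) -> str:
    """Engineered prompt handed to the generative model for the recognized item."""
    return (f"Give the typical calorie content per 100 g of {label}, "
            f"plus its key nutrients, in two short sentences.")

# The prompt would then be sent via the google-generativeai client, e.g.:
# response = genai.GenerativeModel("gemini-pro").generate_content(nutrition_prompt(label))
```

The CNN thus handles recognition while the generative model turns the predicted label into readable nutritional content for the user.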
Chapter 4
Methodology
4.1 Introduction:
Fig.3.1
Fig.3.2
Fig.3.3
Fig.3.4
Fig.3.5
Create well-structured and visually appealing documents using tools such as Jupyter Notebook.
Chapter 5
Result and Discussion
Advanced Architectures: Experiment with more powerful backbone networks such as
EfficientNet or ResNet.
Larger Dataset: Increase the number of images per class to provide the model with
more varied training examples, and encourage users to contribute images
and help in annotating them to continuously grow and diversify the dataset.
Mobile Application: Develop a mobile application that uses the trained
model for real-time fruit and vegetable recognition through the device's
camera, with support for offline use where there is limited
connectivity.
Nutritional Information: Integrate information that provides detailed insights
into the health benefits, calorie content, and other nutritional values of the
recognized fruits and vegetables.
Recipe Suggestions: Recommend recipes based on the recognized
fruits and vegetables, helping users make healthy and creative meals.
Shopping Lists: Allow users to add recognized items directly to a shopping list,
with options for nutritional comparison.
Educational Use: Apply the system as a learning aid in educational settings.
Smart Kitchen Integration: Keep track of available produce and suggest what to
buy next or what recipes to make.
5.1.6 Research and Development: Explore techniques for interpreting the
model, allowing users to understand how and why certain predictions are
made, and investigate approaches that let the model
continuously learn from new data, improving its performance over time.
Fig.5.1
Fig.5.2
Chapter 6
Conclusion, Limitation and Future Scope
6.1 Conclusion
The project successfully developed a deep learning model capable of recognizing and
classifying fruits and vegetables from images, striking a balance between model
performance and computational efficiency. The use of data augmentation and a
well-organized dataset, including training, validation, and test splits, was crucial
for training a robust model.
The model covers many categories of fruits and vegetables and demonstrated strong
performance metrics.
Image Processing Pipeline: A systematic pipeline was implemented for image loading,
preprocessing, and prediction.
Prediction and Output Generation: The model was capable of making accurate predictions on
test images. An example workflow was provided to demonstrate the process of using the
trained model end to end.
Enhanced Nutritional Lookup: By integrating the model with Google Gemini AI, we added a
layer of informative and descriptive content about the identified fruits and vegetables,
enriching the user experience.
Natural Language Processing: Using Google Gemini AI's generative capabilities, the
application produces natural-language descriptions of each recognized fruit or
vegetable, including calorie information. This integration bridges the gap between image
recognition and contextual understanding, making the app more informative and engaging.
Nutrition Lookup: By integrating the model with an application that provides nutritional
information, users can easily obtain detailed insights about various fruits and vegetables.
Educational Tool: The project can serve as an educational tool for learning about different
fruits and vegetables, their nutritional content, and their health benefits.
Retail and Agriculture: The technology can be extended to commercial applications in the
retail and agricultural sectors, enhancing inventory management and crop monitoring.
The project lays a solid foundation for future enhancements, including the exploration of
more advanced architectures, expansion of the dataset, and integration with mobile and IoT
devices. By continually improving the model and expanding its application scope, this
project can evolve into a valuable tool for a wide range of users.
In conclusion, this project showcases the potential of deep learning and generative AI in
practical image recognition tasks. It demonstrates the feasibility of creating an efficient and
accurate model for fruit and vegetable classification, enhanced by the contextual intelligence
provided by Google Gemini AI. This combination paves the way for further innovations and
applications in image-based dietary management.
References
1. TensorFlow Documentation. Retrieved from https://www.tensorflow.org/
2. Keras Documentation. Chollet, F. et al. (2023). Keras: The Python Deep Learning
library. Retrieved from https://keras.io/
3. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018).
MobileNetV2: Inverted Residuals and Linear Bottlenecks.
4. ImageDataGenerator Documentation. Retrieved from
https://keras.io/api/preprocessing/image/
5. Fruit and Vegetable Image Recognition dataset. Retrieved from
https://www.kaggle.com/datasets/kritikseth/fruit-and-vegetable-image-recognition
6. Google Gemini AI. Retrieved from https://cloud.google.com/gemini-ai
7. Python Data Analysis Library (Pandas). Retrieved from https://pandas.pydata.org/
8. NumPy Documentation. Harris, C. R., Millman, K. J., van der Walt, S. J., et al.
(2020). Array programming with NumPy.
9. Perez, F., & Granger, B. E. (2007). IPython: A System for Interactive Scientific
Computing. Retrieved from https://ipython.org/
11. A. A. A. Setio et al., “Pulmonary nodule detection in CT images: False positive
reduction using multi-view convolutional networks,” IEEE Trans. Med. Imag., vol. 35,
no. 5, pp. 1160–1169, May 2016.
12. H. R. Roth et al., “Improving computer-aided detection using convolutional neural
networks and random view aggregation,” IEEE Trans. Med. Imag., vol. 35, no. 5,
May 2016.
13. M. Anthimopoulos et al., “Lung pattern classification for interstitial lung
diseases using a deep convolutional neural network,” IEEE Trans. Med. Imag.,
vol. 35, no. 5, pp. 1207–1216, May 2016.
14. H.-C. Shin et al., “Deep convolutional neural networks for computer-aided
detection: CNN architectures, dataset characteristics and transfer learning,” IEEE
Trans. Med. Imag., vol. 35, no. 5, pp. 1285–1298, May 2016.
15. IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1262–1272, May 2016.
16. A. Depeursinge et al., Comput. Med. Imag. Graph., vol. 36, no. 3, pp. 227–238,
2012.
17. M. van Grinsven, B. van Ginneken, C. Hoyng, T. Theelen, and C. Sánchez, “Fast
convolutional neural network training using selective data sampling: Application to
hemorrhage detection in color fundus images,” IEEE Trans. Med. Imag., vol. 35, no. 5,
May 2016.
18. N. Tajbakhsh et al., “Convolutional neural networks for medical image analysis:
Full training or fine tuning?,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1299–1312,
May 2016.
19. “AggNet: Deep learning from crowds for mitosis detection in breast cancer
histology images,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1313–1321, May
2016.
20. “Unsupervised deep learning applied to breast density segmentation and
mammographic risk scoring,” IEEE Trans. Med. Imag., vol. 35, no. 5, May 2016.
21. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proc. IEEE CVPR, 2016.
22. [37] H. R. Roth et al., “A new 2.5D representation for lymph node detection in CT,”
Cancer Imag. Arch., 2015. https://doi.org/10.7937/K9/TCIA.2015.AQI
23. [38] H. R. Roth et al., “Data from pancreas-CT,” Cancer Imag. Arch., 2016.