Medical Plant Identification Project Report
in
By
Ayush Guleria 191202
I hereby declare that the work presented in this report entitled “Medicinal
Plants Detection Using ML & DL” in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology in
Computer Science and Engineering/Information Technology submitted in
the department of Computer Science & Engineering and Information
Technology, Jaypee University of Information Technology, Waknaghat, is an
authentic record of my own work carried out over a period from January 2023
to May 2023 under the supervision of Dr. Ekta Gandotra (Associate
Professor).
The matter embodied in the report has not been submitted for the award of any
other degree or diploma.
Student Signature
Ayush Guleria, 191202.
This is to certify that the above statement made by the candidate is true to the
best of my knowledge.
Supervisor Signature
Supervisor Name - Dr. Ekta Gandotra
Designation - Associate Professor
Department name - CSE
Dated:
Plagiarism Certificate
ACKNOWLEDGEMENT
Finally, I must acknowledge with due respect the constant support and
patience of my parents.
AYUSH GULERIA
191202
Table Of Content
List of Abbreviations
2 DL Deep learning
9 NIRS Near-infrared spectroscopy
10 AI Artificial intelligence
List Of Figures
3 1.3 Methodology
11 2.2 base_model.summary
12 2.3 model.summary
17 2.8 Deployed web application
Abstract
to categorize many medicinal plants according to the shape, texture, and color
of their leaves.
In conclusion, ML and DL approaches have shown tremendous promise for the
detection and classification of medicinal plants based on their
morphological and chemical properties. These techniques can analyze large
datasets and extract features that are difficult for the human eye to see,
which makes them well suited to feature extraction and image recognition.
Keywords: Medicinal plants, traditional medicine, natural remedies, bioactive
compounds, sustainable healthcare, machine learning (ML), deep learning
(DL), morphological, chemical characteristics, image recognition, feature
extraction, classification algorithms, convolutional neural networks (CNNs),
leaf morphology, leaf texture, leaf color, environmental conditions, growth
conditions.
Chapter-1
INTRODUCTION
1.1 Introduction
One of the key advantages of using ML and DL algorithms for plant
classification is their ability to process large amounts of data quickly and
accurately. This capability allows for the identification of many different plant
species, which may not be feasible with traditional methods. Additionally,
these algorithms can identify subtle differences in plant morphology that may
be difficult for the human eye to detect, leading to a more accurate
classification of plant species.
Several studies have been conducted to validate the effectiveness of ML and
DL algorithms for medicinal plant classification. For instance, a study
conducted in China used a convolutional neural network (CNN) to classify
seven different medicinal plants based on their leaf images. The model
achieved an accuracy of 97.54%, demonstrating the potential of this approach
for medicinal plant identification.
Similarly, a study conducted in India used a combination of texture and shape
features to classify 12 different medicinal plants. The model achieved an
accuracy of 95.83%, which is comparable to the accuracy achieved by experts
in traditional plant classification.
Another study conducted in Brazil used a machine learning approach to
classify four different species of medicinal plants. The model achieved an
accuracy of 92.5%, demonstrating the potential of this approach for plant
identification in regions with high biodiversity.
The use of ML and DL algorithms for medicinal plant classification has
several applications. For example, it can be used to detect adulteration and
substitution of medicinal plants, which is a significant concern in the herbal
medicine industry. By accurately identifying plant species, this approach can
ensure the quality and safety of herbal medicines and prevent harmful
consequences resulting from the use of incorrect plant species.
Moreover, ML and DL algorithms can be used to identify new plant species
with potential medicinal properties. This approach involves training models on
a dataset of known medicinal plants and using them to classify unknown
plants. The identified plants can then be further studied for their potential
medicinal properties, leading to the discovery of new natural medicines.
In conclusion, the use of ML and DL algorithms for medicinal plant
classification has the potential to revolutionize herbal medicine by improving
the accuracy and efficiency of plant identification.
This approach is more efficient than traditional methods, can identify subtle
differences in plant morphology, and can be used to detect adulteration and
substitution of medicinal plants.
Moreover, it can be used to identify new plant species with potential medicinal
properties, leading to the discovery of new natural medicines. As such, the
future scope of using ML and DL techniques for medicinal plant detection by
classifying leaves using ResNet 50 is promising.
Researchers can continue to refine the models, integrate other modalities,
develop portable and user-friendly applications, expand the use
Fig 1.1 : Random leaves
Fig 1.2 : Example of Medicinal Plants leaves
1.2 Problem Statement
harvesting practices. Additionally, the system can aid in biodiversity
conservation efforts by enabling experts to accurately identify and classify
different types of medicinal plants in natural habitats.
1.3 Objectives
weights can be fine-tuned to improve the accuracy of the medicinal plant
classification task.
After training and validating the model, it can be used to classify new images
of medicinal plant leaves. A user-friendly web application can be developed
for this purpose, where users can easily upload images of plant leaves and get
instant results. The web application can be developed using popular web
frameworks like Flask or Django, along with front-end libraries like React or
Vue.
In conclusion, the development of an automated and accurate method for
identifying medicinal plant species based on their leaf images is an essential
step towards sustainable harvesting practices. By utilizing deep learning
models, pre-processing techniques, and data augmentation, we can develop a
model that is robust and efficient in identifying various medicinal plant
species. The use of transfer learning techniques can also improve the
performance of the model. With the development of a user-friendly web
application, experts in traditional medicine, agriculture, and biodiversity
conservation can easily and quickly identify medicinal plants.
1.4 Methodology
Detecting medicinal plants using machine learning and deep learning can be a
challenging task, but it can also be an effective way to identify and classify
different types of plants based on their leaves. Here's a possible methodology
for using ML and DL to detect medicinal plants by classifying their leaves
using ResNet50.
1. Data collection: The first step in developing a deep learning model for
identifying medicinal plant species based on their leaf images is to
collect a dataset of images of leaves from various plant species. The
dataset should include annotations indicating the species of each plant.
The images can be obtained from a variety of sources, including
botanical gardens, herbaria, and online databases.
5. Feature extraction: After splitting the dataset, the ResNet 50 model can
be used to extract features from the images in the training and
validation sets.
7. Model evaluation: Once the ResNet 50 model has been trained and
fine-tuned, its performance needs to be evaluated using the validation
set. Metrics including accuracy, precision, recall, and F1 score can be
used to assess the model's performance.
can be developed to allow users to upload images of plant leaves and
obtain instant classification results.
1.5 Organization
Project planning:
Data collection:
Data preprocessing:
Model development:
Application development:
System validation:
Documentation and reporting:
Chapter-2
LITERATURE SURVEY
images for better performance. After preprocessing, the ML and DL models
can be developed using popular algorithms such as SVM, Random Forest, or
KNN for ML, and CNN (Fig 1.4 shows the CNN architecture) for DL (using
ResNet 50 here). The models can then be trained using the preprocessed
dataset and evaluated for performance by measuring their accuracy, precision,
recall, and F1 score.
The comparison of the performance of the ML and DL models can help
determine which model is more suitable for this problem. Once a suitable
model is selected, a web or mobile application can be developed to allow users
to take pictures of medicinal plants and classify them using the trained model.
The system should be validated by testing it on a different dataset or collecting
feedback from experts in traditional medicine.
Documentation and reporting are also important aspects of the project work,
including dataset collection, preprocessing, model development, and
application development. The findings and the performance of the developed
system should be reported for future reference and improvements. In
summary, an automatic plant recognition system can be a valuable tool for
identifying rare and important medicinal plants, and can potentially save lives
in critical situations.[1]
The separation of therapeutic plants from other non-edible plants is crucial in
the realms of botany and the food industry. However, conventional techniques
for identifying medicinal plants are difficult, time-consuming, and require
skilled specialists. To address this problem, an autonomous real-time
vision-based system has been presented to identify commonly used medicinal
herbs with similar leaves. The system uses an improved convolutional neural
network (CNN) built from a convolutional block and a classifier block. The
classifier consists of Global Average Pooling (GAP), dense, dropout, and
softmax layers. This design improves the model's speed and accuracy while
reducing the number of parameters compared to earlier studies. The proposed
CNN model (Fig 1.5 shows images related to the first activation layer of the
CNN model) recognises medicinal plant photos at three image resolutions,
64 × 64, 128 × 128, and 256 × 256 pixels, with overall accuracy rates of
99.66%, 99.32%, and 99.45%, respectively. As a result, combining image
processing with the suggested CNN algorithm is a productive replacement for
conventional approaches.
To verify the efficacy of the developed approach, further work will extend
the model to the classification of additional species of medicinal plants. A
smartphone application for the real-time identification of medicinal plants
will also be created using the model. This is
especially crucial in light of the rising acceptance and demand for both
artisanal and commercial uses and applications of medicinal plants. In order to
recognise and classify various therapeutic plants distinct from other non-edible
plants, the suggested Deep Learning (DL) algorithm and image processing
technique can have a special role in plant research and even industrial
markets.[2]
Fig 1.5 : Images related to the first activation layer used in [2].
Plants have played an essential role in human survival since the dawn of
civilization, providing nourishment, shelter, and medicine. Among the most
intriguing aspects of plant utilization is herbal medicine, a practice that has
been passed down through generations of indigenous peoples. For centuries,
these remedies have been identified by experienced clinicians who rely on
their senses and extensive knowledge of plant properties. However, recent
technological advances have provided an evidence-based approach to herb
identification, making it easier for individuals who are not familiar with these
practices.
Various techniques are available for herb identification, each with its own
advantages and limitations. One promising method is based on spectral
analysis, a non-invasive technique that can distinguish plant species based on
their unique spectral signatures. Hyperspectral imaging (HSI) and
near-infrared spectroscopy (NIRS) are advanced instruments used in spectral
analysis to capture the spectral signature of a plant. This approach is rapid and
non-destructive, but it requires expensive equipment and trained personnel,
which can be a barrier for many people.
Another promising technique is based on computer vision and machine
learning algorithms, which use artificial intelligence (AI) to analyze plant
images and identify them based on their unique characteristics. Deep learning
(DL) algorithms have been particularly successful in identifying herbs with
high accuracy rates. However, this approach requires a large amount of
high-quality training data, which can be challenging to obtain.
Traditional morphological and chemical analysis can also be used for herb
identification. Although these methods have been used for centuries, they can
be time-consuming and require expertise in sample preparation and data
interpretation. Nevertheless, they remain an important tool in the herbal
identification toolkit, especially when used in combination with modern
analytical techniques.
In conclusion, identifying herbs is a crucial aspect of herbal medicine and
natural product research. Modern analytical techniques, such as spectral
analysis and computer vision (Fig 1.6 shows the flowchart of proposed
system), have shown great potential in providing a rapid and reliable method
for herb identification. However, their success depends on the availability of
high-quality training data and specialized equipment. Therefore, a
combination of these modern techniques with traditional methods may provide
the most effective and comprehensive approach for herb identification. As
herbal medicine gains popularity in mainstream healthcare, the need for
accurate and efficient herb identification will continue to grow.[3]
Fig 1.6 : Flowchart of proposed system used in [3]
Artificial intelligence has emerged as a valuable tool for data analysis and
knowledge discovery, particularly in large data systems, by uncovering
complex and hidden patterns. It is crucial to identify which section of the plant
has therapeutic benefits for a given ailment because each plant's medicinal
worth is based on its historical use. Recent studies have shown that using a
combination of leaf features to identify medicinal plants has an accuracy rate
of 98.05 percent, demonstrating the viability of this strategy.
A statistical analysis of leaf characteristics has been conducted to identify the
key features that aid in plant identification, and form was found to be a crucial
factor. This promising approach has the potential to aid individuals in
identifying medicinal plants automatically, as well as in conservation and
utilization efforts. The development of an artificial intelligence system for
plant recognition is essential to achieving these objectives, as it can process
large amounts of data efficiently and accurately. Moreover, the proposed
system's (Fig 1.7 shows the steps of the proposed method) accuracy will
undoubtedly improve as more data is collected and analyzed. In addition, the
use of advanced image processing techniques and machine learning algorithms
can aid in identifying complex features and patterns in plant images, resulting
in a more reliable and efficient recognition system (Fig 1.8 shows the process
of medical plant recognition).
Fig 1.8 : Process of medical plant recognition used in [4]
Chapter-3
SYSTEM DEVELOPMENT
identifying medicinal plant species based on their leaf images. The
methodology can be used to facilitate the identification of medicinal plant
species, which can be used in the development of new medicines and
treatments. Additionally, the methodology can be used to automate the
identification process, which can save time and resources for researchers and
botanists. Furthermore, the methodology can be extended to other fields such
as agriculture, where plant species identification is crucial.
Limitations and future research: Although the results of the methodology were
promising, there were limitations such as the limited size of the dataset and the
potential for bias in the annotations. Future research could expand the dataset,
include other plant parts for classification, and explore the use of other deep
learning architectures for image classification. Additionally, the effect of
transfer learning on model performance could be further investigated.
Overall, the analysis of the methodology for medicinal plants detection using
ResNet 50 for leaf classification suggests that it is a promising approach for
identifying medicinal plant species based on their leaf images, which can have
significant implications for the development of new medicines and treatments.
The methodology can be further improved by addressing its limitations and
exploring the potential of transfer learning and other deep learning
architectures.
Here are the general steps you can follow to design the medicinal plants
detection system:
2. Data Preprocessing: This step involves preparing the data for use in the
machine learning algorithm by applying various preprocessing
techniques. Firstly, the images should be resized to a uniform size to
ensure consistency in the input size of the images. Secondly,
normalizing the pixel values can be helpful to make sure that each
pixel has a similar scale and distribution. Additionally, augmenting the
data by rotating or flipping the images can be beneficial to increase the
size of the dataset and add diversity to the data. Other techniques such
as cropping, blurring, and adjusting brightness and contrast can also be
applied to enhance the quality of the images and improve the
performance of the machine learning model (Fig 1.9 shows the
visualization of images after preprocessing). It is important to carefully
choose the appropriate preprocessing techniques based on the
characteristics of the dataset and the requirements of the machine
learning algorithm.
This can significantly reduce the time and resources required for
training a deep learning model from scratch.
5. Training the Model: Splitting the dataset into training, validation, and
testing sets is a crucial step in ensuring the effectiveness of the trained
model. The training set is used to train the ResNet 50 model on the
medicinal plant leaves images, while the validation set is used to
monitor the training process and prevent overfitting. Hyperparameters
are adjusted based on the performance on the validation set to optimize
the model's accuracy. Once the model is trained, it is evaluated on the
testing set to measure its performance and generalization ability. This
ensures that the model is not only accurate on the images it was trained
on, but also on unseen images. The testing set is typically a completely
independent set of images that were not used in the training or
validation process.
By splitting the dataset into different sets, we can assess the model's
performance in different stages of the training process, and ensure that
the model is robust and can accurately classify medicinal plant species
based on their leaf images.
6. Deployment: Once the model is trained and validated, it can be
deployed for use in various applications such as mobile apps or web
services. The user can input an image of a medicinal plant leaf through
an interface and the model will predict the species of the plant with
high accuracy. This can be a useful tool for researchers, botanists, and
other professionals who need to identify medicinal plants for various
purposes such as drug discovery or conservation efforts. Additionally,
the model can be used to automate the identification process, which
can save time and resources. However, it is important to note that the
model's performance may vary depending on the quality of the input
image and the level of similarity between different plant species.
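The dataset split described in step 5 can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name, the file names, the fixed seed, and the 80/10/10 ratio are assumptions made here. In practice a per-species (stratified) split may be preferable so that every class appears in all three sets.

```python
import random

def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle items and split them into train/validation/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])  # the remainder becomes the test set

# Hypothetical file names standing in for leaf images.
paths = [f"leaf_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(paths)
print(len(train_set), len(val_set), len(test_set))
```

Keeping the test set completely separate from the data used for training and hyperparameter tuning is what makes the final evaluation a fair estimate of performance on unseen images.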
Algorithm: ResNet 50
ResNet 50 is a deep neural network architecture that has been widely adopted
in computer vision applications, including image classification. The name
"ResNet" is derived from "Residual Network", which refers to the use of
residual connections or skip connections that improve training performance.
ResNet 50 has 50 weighted layers: an initial convolutional layer followed by a
max pooling layer, four stages of residual blocks, and a global average
pooling layer followed by a single fully connected layer. The key innovation
in ResNet 50 is the residual connection, which lets data bypass a block of
layers and be added directly to that block's output. This lessens the
vanishing gradient problem, where gradients get smaller as they move through
many layers in deep neural networks.
The ResNet 50 architecture (Fig 2.1 shows the ResNet 50 Architecture) can be
divided into several stages. The first stage has a single convolutional
layer followed by a max pooling layer. It is followed by four stages, each
consisting of multiple residual blocks whose convolutional layers use an
increasing number of filters. The final fully connected layer performs the
classification task. Within each residual block, the residual connection adds
the block's input to its output and then applies a nonlinearity. This helps
the model learn useful features at each layer and mitigates the vanishing
gradient problem.
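As a brief illustration of why the residual connection helps, a residual block can be written in the standard form below. The symbols $x$ (block input), $y$ (block output), $\mathcal{F}$ (the stacked layers inside the block), $W_i$ (their weights) and $\mathcal{L}$ (the loss) are notation introduced here for illustration; the nonlinearity applied after the addition is omitted for simplicity.

```latex
y = \mathcal{F}(x, \{W_i\}) + x
\qquad\Longrightarrow\qquad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}
    \left( \frac{\partial \mathcal{F}}{\partial x} + I \right)
```

The identity term $I$ gives the gradient a direct path back to earlier layers even when $\partial \mathcal{F} / \partial x$ is small, which is the sense in which the vanishing gradient problem is lessened.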
The ResNet 50 architecture is trained using mini-batch gradient descent, a
variant of stochastic gradient descent. During training (Fig 2.2 shows the
base_model.summary before training), the network's weights are updated
using backpropagation, which calculates the gradient of the loss function with
respect to the network weights. The learning rate and other hyperparameters
are typically tuned using grid search or other methods to optimize model
performance.
Once trained, the ResNet 50 model (Fig 2.3 shows the model.summary while
creating our model) can be used for image classification tasks, including
medicinal plant classification based on leaf images. The model takes an input
image and produces a probability distribution over the different plant species
in the dataset. The highest probability class is selected as the predicted class
for the input image.
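The step from raw network outputs to a predicted species can be sketched in plain Python. This is an illustrative example only: the class names and scores are hypothetical, and a real pipeline would obtain the probabilities directly from the trained model's softmax output.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(logits, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical class names and scores, for illustration only.
names = ["species_a", "species_b", "species_c"]
label, confidence = predict_class([1.2, 3.4, 0.3], names)
print(label, round(confidence, 3))
```

Reporting the probability alongside the label is useful in practice, since a low value signals that the input image may be ambiguous or outside the species covered by the dataset.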
1. Input Pre-processing
2. Cfg[0] blocks
3. Cfg[1] blocks
4. Cfg[2] blocks
5. Cfg[3] blocks
6. Fully-connected layer
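The stage list above can be cross-checked with a small count. The sketch assumes the standard ResNet-50 configuration of 3, 4, 6 and 3 bottleneck blocks per stage, with three convolutions per block; the report itself does not state these values, and the variable names are illustrative. Pooling layers and shortcut projections are not counted, following the usual convention.

```python
# Assumed standard ResNet-50 block configuration (not stated in this report).
cfg = [3, 4, 6, 3]          # bottleneck blocks in Cfg[0] .. Cfg[3]
CONVS_PER_BLOCK = 3         # 1x1, 3x3 and 1x1 convolutions in each bottleneck

stem_layers = 1             # initial 7x7 convolution applied after input pre-processing
stage_layers = sum(n * CONVS_PER_BLOCK for n in cfg)
fc_layers = 1               # final fully-connected classifier

total = stem_layers + stage_layers + fc_layers
print(total)  # 50
```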
Fig 2.1 : ResNet 50 Architecture [5]
Fig 2.2 : base_model.summary
Fig 2.3 : model.summary
Model Development:
2. The primary step for creating such a model involves the collection of a
large dataset of medicinal plant leaves, whereby each leaf is labeled
with its corresponding plant species. The dataset must comprise a
diverse range of images with various backgrounds, lighting conditions,
and angles to ensure that the model can generalize well and perform
efficiently on unseen data.
architecture, which is very good at solving the vanishing gradients
issue.
6. After the model has been trained, new medicinal plant leaves can be
classified by feeding them into the model and then evaluating the
results. Additionally, the model can be improved via transfer learning
to improve its performance on fresh datasets containing various classes
of plants.
Steps for model development for medicinal plants detection using ML and DL
by classifying leaves using ResNet50:
1. Import libraries: Import the necessary libraries such as TensorFlow,
Keras, Pandas, NumPy, and Matplotlib.
2. Load data: Load the preprocessed and augmented dataset into memory
using a utility such as the ImageDataGenerator class from Keras.
3. Split data: Create training, validation, and testing sets from the dataset.
80% for training, 10% for validation, and 10% for testing would be a
typical proportion.
5. Train the model: Use Keras' fit method (fit_generator in older Keras
releases, where it is now deprecated) to train the model on the training
dataset for a specified number of epochs.
8. Save the model: Save the trained and fine-tuned model in a format like
.h5 for later use.
9. Predictions: Use the trained model to make predictions on new images
of medicinal plant leaves by loading the saved model and using the
predict method from Keras.
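The steps above could be sketched with the TensorFlow Keras API roughly as follows. This is a hedged outline under stated assumptions, not the code used in this project (the appendix code is not reproduced here): the input size, dropout rate, patience, epoch count and file name are illustrative choices, labels are assumed to be one-hot encoded, and input batches are assumed to have already been passed through the ResNet50 preprocessing function. It uses `model.fit` rather than the older `fit_generator` method.

```python
IMG_SIZE = (224, 224)   # input size commonly used with ResNet50 (assumption)

def build_model(num_classes, img_size=IMG_SIZE):
    """Build a ResNet50-based classifier with a frozen ImageNet backbone."""
    import tensorflow as tf  # imported lazily so the constants load without TensorFlow

    base_model = tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False, input_shape=img_size + (3,))
    base_model.trainable = False  # freeze pre-trained weights for the first phase

    inputs = tf.keras.Input(shape=img_size + (3,))
    x = base_model(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.3)(x)   # illustrative dropout rate
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",  # assumes one-hot labels
                  metrics=["accuracy"])
    return model

def train_and_save(model, train_data, val_data, epochs=20, path="model.h5"):
    """Train with early stopping on validation loss, then save the model."""
    import tensorflow as tf

    stopper = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True)
    history = model.fit(train_data, validation_data=val_data,
                        epochs=epochs, callbacks=[stopper])
    model.save(path)
    return history
```

Freezing the backbone first and unfreezing some of its layers later with a lower learning rate is one common way to carry out the fine-tuning step; whether that schedule was followed here is not stated in the report.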
Computational Method:
The computational method for medicinal plants detection using machine
learning and deep learning techniques by classifying leaves using ResNet 50 is
a complex process that involves several important steps.
4. Model training: The ResNet 50 model is trained using the training
set. During training, the model learns to identify the correct plant
species by extracting relevant features from the input images. The
ResNet 50 design lessens the problem of vanishing gradients, which
makes it possible to train very deep neural networks.
5. Model evaluation: After training, the model's performance is examined
on the test set. Calculating metrics like accuracy,
precision, recall, and F1 score is necessary to determine whether the
model is capable of correctly classifying medicinal plant leaves into
the appropriate species.
6. Model optimization: The model can then be further tuned via transfer
learning based on the evaluation findings by the project team. In order
to increase a model's accuracy and generalization performance,
especially when working with fresh datasets containing various classes
of plants, this entails employing a pre-trained model and retraining it
on a new dataset.
In the context of traditional medicine, the computational method can aid in the
identification of medicinal plants and their potential uses, thereby contributing
to the preservation and promotion of traditional knowledge. In the field of
drug discovery, the identification and classification of medicinal plant species
can assist in the discovery of new drugs with therapeutic properties. In
agriculture, the computational method can help in the monitoring and
management of medicinal plant populations, thereby contributing to the
conservation of biodiversity.
Chapter-4
PERFORMANCE ANALYSIS
Precision, recall, and F1 score can all be used alongside accuracy to measure
performance. Precision is the percentage of correct predictions for the target
class among all positive predictions, whereas recall is the percentage of
correct predictions for the target class among all actual positive
instances. The F1 score is the harmonic mean of precision and recall.
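As a concrete illustration, the sketch below computes these metrics with the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP and FN are the true positive, false positive and false negative counts for one target class. The counts are hypothetical and are not results from this project.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from counts for one target class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts: 8 correct detections, 2 false alarms, 4 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Because the F1 score is a harmonic mean, it stays low whenever either precision or recall is low, which makes it a stricter summary than a plain average of the two.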
Fig 2.4 : Training Accuracy
Fig 2.5 : Classification Report
to a particular class and were classified correctly or incorrectly. We can
determine different performance indicators, including accuracy,
precision, recall, and F1 score, from the analysis of the confusion
matrix, which can aid in further system optimization.
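A confusion matrix of this kind can be tallied in a few lines. The example below is a minimal sketch with hypothetical labels (the plant names are placeholders, not classes confirmed by this report); it counts each (actual, predicted) pair and derives per-class recall and overall accuracy from those counts.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(y_true, y_pred))

def per_class_recall(y_true, y_pred):
    """Fraction of each actual class that was predicted correctly."""
    counts = confusion_counts(y_true, y_pred)
    totals = Counter(y_true)
    return {c: counts[(c, c)] / totals[c] for c in totals}

# Hypothetical labels for illustration only.
actual    = ["neem", "neem", "tulsi", "tulsi", "tulsi", "mint"]
predicted = ["neem", "tulsi", "tulsi", "tulsi", "mint", "mint"]
matrix = confusion_counts(actual, predicted)
recalls = per_class_recall(actual, predicted)
accuracy = sum(matrix[(c, c)] for c in set(actual)) / len(actual)
print(recalls, accuracy)
```

Off-diagonal entries such as `("neem", "tulsi")` show which species are being confused with each other, which is the information needed to decide where more training images or different features would help.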
Fig 2.6 : Graph(Loss) for visualization
Fig 2.7 : Showing Early stopping of model.
Limitations
● The model performs well on the training data, reaching 93% accuracy, but
does not perform well on the testing data.
● The classification report (Fig 2.5 : Classification Report) shows that the
model fails to identify some classes of medicinal plants.
● The gap between training and testing performance indicates that the model
is currently overfitting.
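One common guard against the overfitting described above is patience-based early stopping, which Fig 2.7 refers to. The exact rule used in this project is not stated in the report, so the following is a minimal sketch under an assumed patience of three epochs; the function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; if that never happens, the index of
    the last epoch is returned.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0   # improvement: reset the counter
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1

# Hypothetical validation losses: improvement stalls after epoch 2.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.64, 0.70]
print(early_stopping_epoch(losses))
```

Early stopping limits how far the model can drift toward memorising the training set, but it does not replace remedies such as more data, stronger augmentation, or regularisation.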
Chapter-5
CONCLUSIONS
5.1 Conclusions
This approach has opened up new opportunities in the field of medicinal plant
detection, where accurately identifying plants is essential. By training a
ResNet 50 model on a large dataset of medicinal plant leaves, the model can
learn to accurately classify different plants based on their unique leaf
characteristics. This can be useful in the pharmaceutical industry for drug
discovery and in traditional medicine for identifying the correct plants for
treatment.
Although this strategy has great potential, there are still some obstacles to be
overcome. The necessity for high-quality plant leaf photos and a complete
library of different medicinal plant species is one of the major issues.
Additionally, further research is needed to optimize the ML and DL techniques
used in medicinal plant detection to increase their accuracy and efficiency.
artificially increase the size of the dataset by creating additional images
through various image processing techniques.
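A minimal sketch of such augmentation is shown below, using plain Python lists as a stand-in for image arrays. The helper names and the tiny example image are hypothetical; a real pipeline would apply equivalent flips and rotations through an image library or the training framework's augmentation utilities.

```python
def flip_horizontal(image):
    """Mirror an image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Rotate an image (list of pixel rows) by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def augment(image):
    """Return the original image plus two simple variants."""
    return [image, flip_horizontal(image), rotate_90_clockwise(image)]

# Tiny hypothetical 2x3 grayscale image for illustration.
img = [[1, 2, 3],
       [4, 5, 6]]
variants = augment(img)
print(len(variants))
```

Each labelled leaf image yields several training samples this way, which enlarges the dataset without collecting new photographs.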
5.2 Future Scope
The use of machine learning (ML) and deep learning (DL) techniques for
medicinal plant detection by classifying their leaves using ResNet 50 has
immense potential for development in the future. Here are some potential
areas of growth:
bioprospecting. By exploring these new areas of application,
researchers can uncover new uses for medicinal plants and expand
their potential benefits.
Overall, the future scope of using ML and DL techniques for medicinal plant
detection by classifying leaves using ResNet 50 is vast and promising, with
potential applications across a range of industries and fields. By addressing the
challenges and concerns, researchers can harness the full potential of these
techniques and revolutionize the field of medicinal plant research.
REFERENCES
[1] B. Dudi and V. Rajesh, “Medicinal plant recognition based on CNN and
Machine Learning,” International Journal of Advanced Trends in Computer
Science and Engineering, vol. 8(4), pp. 999–1003, (2019). Available at:
https://doi.org/10.30534/ijatcse/2019/03842019.
[6] Dataset. Available at: https://data.mendeley.com/datasets/nnytj2v3n5/1.
DOI: 10.17632/nnytj2v3n5.1.
APPENDICES
Code: