Formatted Review

Aayushi Sonkusare
Dept. of Computer Science and Engineering
Yeshwantrao Chavan College of Engineering, Nagpur

Sahil Dhanvij
Dept. of Computer Science and Engineering
Yeshwantrao Chavan College of Engineering, Nagpur
Abstract— Millions of species of birds and animals inhabit this planet. They live in different habitats, exhibit morphological differences, and vary in physical features in ways that are often indistinguishable to the human eye. Given this plethora of species, different bird and animal species can be identified using fine-grained image recognition techniques. Fine-grained recognition is widely used to precisely identify visually similar species with subtle morphological differences, using feature extraction and deep learning techniques. With the rapid growth of machine learning, deep learning, and computer vision, accurate species identification has become possible. This, in turn, enables comprehensive studies of the ecological statistics of biodiversity as well as of endangered species.
I. INTRODUCTION
The earth is home to about 10,000 bird species and an estimated 8.7 million animal species. Of these 8.7 million, about 1.2 million have been formally described; the rest remain undocumented, primarily in regions like rainforests and deep oceans. Among vertebrates, there are around 5,500 mammal species, 32,000 fish species, and many reptile and amphibian species. These numbers reflect the planet's incredible biodiversity and the growing complexity of identifying these species. Evolution is a gradual process and continues at a slow pace in birds and animals. Some species grow in number, whereas others become endangered as time passes. Every species has its own habitat, unique metabolic and catabolic bodily requirements, favorable climatic conditions, morphological differences, and an array of physical features. The roughly 10,000 bird species worldwide are categorized into 29 orders, such as Passeriformes (songbirds), Falconiformes (birds of prey), and Psittaciformes (parrots), and exhibit remarkable adaptations.
Migratory species cover tens of thousands of kilometers annually. Certain species, such as the Kiwi in New Zealand or the Hoatzin in South America, are restricted to specific regions by unique ecological conditions; such species are called endemic species. Birds also showcase wide habitat diversity, thriving in varied environments including tropical rainforests (e.g., toucans), wetlands (e.g., flamingos), deserts (e.g., sandgrouse), and urban areas (e.g., pigeons). They vary enormously in size, from the tiny Bee Hummingbird (weighing 2 grams) to the massive Ostrich, which can grow over 2.7 meters tall.
Mammals display unique characteristics such as live birth and parental care, with examples ranging from large land mammals like elephants to aquatic mammals like whales. Fish are found in freshwater and marine environments. Reptiles include snakes, lizards, and crocodiles, and exhibit fascinating survival adaptations like camouflage and venom production. Frogs, salamanders, and caecilians are amphibians; they often serve as indicators of environmental health due to their sensitivity to ecological changes. Certain areas, like the Amazon Rainforest, are home to thousands of bird species and animals like jaguars and pink river dolphins, while the African Savannah is known for its iconic "Big Five" animals: lion, leopard, elephant, rhino, and buffalo.
Even though these species are found in every corner of the globe, their conservation and protection is a challenging task, as many face threats from habitat destruction, climate change, poaching, and pollution. Conservation initiatives like protected areas, biodiversity monitoring, and community-based efforts aim to preserve this vast variety for future generations. Beyond this, every organism is an important link in the food chain; each has a unique role that contributes to a balanced, developed, and enriched ecosystem. In summary, the Earth's fauna reflects an extraordinary tapestry of life shaped by millions of years of evolution, adapting to every conceivable habitat and environmental condition. The human eye, however, is prone to inaccuracy and error when distinguishing similar species with subtle morphological differences and high inter-class and intra-class variation.
II. OBJECTIVE
i) To understand, analyze, and examine the latest techniques and recent studies focused on image recognition using fine graining.
ii) To conduct a comparative study of the different models used in the reviewed research and their impact on the accuracy of image recognition.
iii) To identify species with subtle morphological differences and thereby contribute to ecological studies.
The paper titled "Advanced Techniques for Fine-Grained Image Classification Using Zero-Shot Learning" published in
2024 discusses innovative methodologies aimed at enhancing the classification of fine-grained image categories without relying
heavily on extensive labelled datasets. It focuses on identifying animal and bird species which are less in number with a small
dataset size. This approach is particularly beneficial in situations where gathering large amounts of labelled examples is
impractical, such as with rare species or novel objects. The dataset uses seen and unseen classes. Seen Classes are used to train
the model with a sufficient number of labelled samples. They help the model learn the relationships between visual features and
semantic attributes. Unseen Classes are the categories that lack labelled samples during training. The paper employs a new
model known as ZIC-LDM (Zero-shot Image Classification with Learnable Deep Metric). This model comprises two integral
components. Zero-shot learning (ZSL) is a fundamental concept in this paper, allowing models to recognize and classify unseen
categories by leveraging knowledge from previously learned categories. Knowledge transfer is done by the models as they
utilize semantic information, such as attributes or textual descriptions, to bridge the gap between known and unknown classes.
For instance, if a model has learned to identify categories like "sparrow" and "eagle," it might be able to classify a new
category, like "hawk," based on shared descriptors or attributes.
Advanced feature extraction methods are deployed to create rich representations of images. This includes exploring various
neural network architectures that are capable of capturing fine details essential for distinguishing between similar categories.
The methodology includes a comprehensive evaluation framework, assessing model performance across various benchmarks.
This involves implementing sophisticated metrics to ensure that the zero-shot models maintain high accuracy levels when
classifying unseen categories or fine-grained distinctions. By focusing on knowledge transfer, optimal data usage, and robust
evaluation methodologies, the authors enhance the capabilities of models to generalize beyond their training sets, ultimately
contributing to more effective image recognition solutions in diverse applications.
The paper titled "A Performance Comparison and Enhancement of Animal Species Detection in Images with Various R-
CNN Models" published in 2021 focuses on animal species detection using various Region-based Convolutional Neural
Network (R-CNN) models to improve accuracy and speed. The reason behind this paper is to address the challenges in
detecting animal species for real-life applications like mitigating wildlife–human and wildlife–vehicle conflicts. The authors
use four R-CNN models combined with Deformable Convolutional Neural Networks (D-CNN). Three wildlife datasets were
used in this study:
The paper titled "A Fine-Grained Bird Classification Method Based on Attention and Decoupled Knowledge Distillation"
presents a fine-grained bird classification approach combining attention-guided data augmentation and decoupled knowledge
distillation (DKD). The goal is to address challenges such as high intraclass variance, low inter-class variance, and limited
training data in bird classification. They have achieved high accuracy while reducing computational requirements.
The model is trained on the CUB-200-2011 dataset, which includes 200 bird species , 11,788 images with 5994 for training and
5794 for testing. The presence of a balanced class representation with approximately 30 images per category enriches the
dataset diversity. The proposed model effectively balances high classification accuracy and low computational cost, making it
suitable for small devices. Methods like attention-guided augmentation and DKD enhance its ability to recognize fine
differences among bird species.
Methodology
1. Attention-Guided Data Augmentation: key part images and object images are extracted using attention maps, enhancing fine feature extraction by reducing background noise.
2. Fine-Grained Classification Model: DenseNet121 is used for feature extraction during training, while ShuffleNetV2 serves as the compressed student model.
3. Decoupled Knowledge Distillation (DKD): the transfer of target-class and non-target-class knowledge is separated, reducing model size by 67% while maintaining 87.6% accuracy.
Improved accuracy: 87.6% for the student model and 89.5% for the teacher model.
Lightweight design: the model is deployable on mobile devices.
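The decoupled distillation step can be illustrated with a small numerical sketch. The logits and the alpha/beta weights below are invented; this follows the general DKD formulation (splitting the distillation loss into a target-class term and a non-target-class term), not necessarily the paper's exact implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-9):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def dkd_loss(teacher_logits, student_logits, target, alpha=1.0, beta=8.0):
    """Decoupled KD sketch: the usual distillation loss is split into a
    target-class term (TCKD) and a non-target-class term (NCKD), each
    weighted independently. Hyperparameter values are illustrative."""
    pt, ps = softmax(teacher_logits), softmax(student_logits)
    # TCKD: KL over the binary split {target class, all other classes}.
    tckd = kl(np.array([pt[target], 1 - pt[target]]),
              np.array([ps[target], 1 - ps[target]]))
    # NCKD: KL over the non-target classes only, renormalized.
    mask = np.arange(len(pt)) != target
    nckd = kl(pt[mask] / pt[mask].sum(), ps[mask] / ps[mask].sum())
    return alpha * tckd + beta * nckd

teacher = np.array([4.0, 1.0, 0.5])  # confident teacher (DenseNet121 role)
student = np.array([2.0, 1.5, 0.5])  # smaller student (ShuffleNetV2 role)
loss = dkd_loss(teacher, student, target=0)
```

The loss vanishes when the student matches the teacher's distribution and grows as the two diverge, with `beta` amplifying the inter-class "dark knowledge" carried by the non-target probabilities.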
The research paper titled "Fine-Grained Image Recognition of Birds Based on Improved YOLOv5" addresses accurate bird species identification, which plays a vital role in biodiversity studies but is challenging due to high intra-species variation and subtle inter-species differences. The study tackles these complexities with a part-based methodology that separates the process into part detection and classification. The researchers trained their model on the CUB-200-2011 dataset, comprising 11,788 images across 200 bird species. By integrating Res2Net-CBAM modules into the backbone and CBAM attention mechanisms into the neck of the YOLOv5 architecture, they significantly improved the model's ability to focus on key regions of bird images, such as heads, wings, and tails.
The approach involved modifying the YOLOv5 framework to better manage the intricate details of bird recognition. The
Res2Net-CBAM modules enhanced multi-scale feature extraction in the backbone, while the CBAM mechanisms added layers
of attention for identifying critical features. Comprehensive annotations were created to define key bird parts, and various data
augmentation strategies were applied to enhance the model's ability to generalize. Experimental outcomes revealed that the
upgraded YOLOv5 achieved an accuracy rate of 86.6%, surpassing the original version by 1.2% and outperforming other state-of-the-art models. Ablation experiments further validated the contributions of the Res2Net-CBAM modules, demonstrating
their impact on improving detection and classification performance. This study underscores the importance of integrating
attention-based modules to enhance feature extraction, enabling the model to better identify fine-grained differences. The
adoption of a part-based recognition approach effectively addressed challenges related to intra-class variability and inter-class
similarity. By refining YOLOv5 for fine-grained image classification, this research offers a valuable framework for similar
tasks in computer vision, extending beyond bird recognition to other specialized domains.
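The CBAM idea the authors build on, reweighting a feature map first per channel and then per spatial location, can be sketched without a deep learning framework. The feature map and weights below are random placeholders, and the spatial branch is simplified (a pooled map instead of a learned convolution):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(feature_map, w1, w2):
    """Minimal CBAM-style attention over a (C, H, W) feature map.
    w1, w2 form the shared MLP of the channel branch; the spatial branch
    here is a simplification for illustration only."""
    # Channel attention: squeeze spatial dims with avg- and max-pooling,
    # pass both through a shared two-layer MLP, and merge.
    avg = feature_map.mean(axis=(1, 2))
    mx = feature_map.max(axis=(1, 2))
    channel_att = sigmoid(w2 @ np.maximum(w1 @ avg, 0) +
                          w2 @ np.maximum(w1 @ mx, 0))
    x = feature_map * channel_att[:, None, None]
    # Spatial attention: pool across channels to an (H, W) saliency map,
    # so regions like heads, wings, and tails can be emphasized.
    spatial_att = sigmoid(x.mean(axis=0) + x.max(axis=0))
    return x * spatial_att[None, :, :]

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8)) * 0.1   # reduction to C/4 channels
w2 = rng.standard_normal((8, 2)) * 0.1
out = cbam(fmap, w1, w2)
print(out.shape)  # (8, 4, 4): same shape, attention-reweighted
```

Because both attention maps lie in (0, 1), the module only rescales activations; the network learns which channels and locations to keep.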
The paper titled "Fine-Grained Image Analysis with Deep Learning: A Survey" focuses on recognizing subtle differences between visually similar objects, such as species or product models. While deep learning has greatly improved fine-grained image analysis (FGIA) performance, challenges like viewpoint variation, background clutter, and the identification of discriminative features remain. This survey explores recent advances, particularly in self-supervised learning, few-shot learning, and hashing techniques, which address the limitations of large-scale recognition tasks. FGIA datasets typically include labelled images from specific domains like wildlife or retail products. Deep learning methods, especially convolutional neural networks (CNNs), are used to extract features from these images, and few-shot learning and transfer learning techniques are employed to maintain performance with fewer labelled examples. Despite these advancements, challenges like robustness and interpretability remain and require further research; the future of FGIA lies in models that are not only accurate but also adaptable and interpretable for real-world applications. We found the survey's focus on few-shot and self-supervised learning particularly valuable, as these approaches reduce reliance on large labelled datasets. Additionally, hashing techniques for large-scale image retrieval and the emphasis on model interpretability are crucial for improving FGIA systems in practical applications.
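Hashing for large-scale retrieval, one of the survey's highlighted directions, reduces feature matching to cheap Hamming-distance comparisons over compact binary codes. The sketch below uses a random sign projection as a stand-in for a learned hash layer; all sizes are illustrative:

```python
import numpy as np

def binary_codes(features, projection):
    """Hash real-valued features to binary codes by the sign of a random
    projection (an LSH-style stand-in for a learned hashing network)."""
    return (features @ projection > 0).astype(np.uint8)

def hamming_search(query_code, db_codes, k=3):
    """Return indices of the k database codes closest in Hamming distance."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(1)
db_features = rng.standard_normal((100, 64))   # e.g. CNN embeddings
proj = rng.standard_normal((64, 16))           # 16-bit codes
db = binary_codes(db_features, proj)
query = binary_codes(db_features[42:43], proj)[0]
print(hamming_search(query, db))
```

Each image is stored as 16 bits instead of 64 floats, and search needs only XOR-and-count operations, which is what makes retrieval over millions of wildlife images tractable.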
The next paper we reviewed, titled "Fine-Grained Image Classification for Precise Bird and Animal Species Identification Using Deep Learning with a Channel Attention Mechanism", introduces a channel attention mechanism for species identification. The increasing demand for accurate wildlife monitoring and conservation has highlighted the importance of precise species identification. This paper presents an approach to fine-grained image classification focused on bird and animal species; by integrating deep learning techniques with a channel attention mechanism, the research aims to improve classification accuracy, particularly where species within the same family exhibit subtle visual differences. The approach leverages state-of-the-art deep learning architectures to enhance feature representation, thereby addressing the challenges posed by fine-grained image classification. The study uses a comprehensive dataset of high-resolution images of various bird and animal species, carefully curated to include a wide range of species with varying physical characteristics and poses. Each image is labelled with its corresponding species, making the dataset suitable for supervised learning, and it is divided into training, validation, and test sets to evaluate model performance accurately. The deep learning model incorporates a channel attention mechanism to capture and emphasize the most important features in fine-grained images, enabling it to focus on relevant details and better distinguish between similar species. The model is trained using techniques such as transfer learning and data augmentation to optimize performance. The review highlights the effectiveness of combining deep learning with channel attention for fine-grained species identification: the methodology significantly improves classification accuracy compared to traditional techniques, especially for species with minimal visual distinctions. We find this approach useful for applications in wildlife conservation, biodiversity monitoring, and ecological research, where precise species identification is critical.
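The data augmentation mentioned above can be sketched simply: random flips and crops expand the effective training set and make the model robust to pose and framing variation. The image sizes and the pipeline below are illustrative, not the paper's actual preprocessing:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_flip(img):
    """Horizontally flip the image with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_crop(img, size):
    """Take a random square crop of side `size` from an (H, W, C) image."""
    H, W = img.shape[:2]
    y = rng.integers(0, H - size + 1)
    x = rng.integers(0, W - size + 1)
    return img[y:y + size, x:x + size]

def augment(img, crop=28):
    return random_flip(random_crop(img, crop))

img = rng.random((32, 32, 3))          # stand-in for a wildlife photo
batch = np.stack([augment(img) for _ in range(4)])
print(batch.shape)  # (4, 28, 28, 3): four distinct views of one image
```

In practice such transforms are applied on the fly each epoch, so the network rarely sees the exact same pixels twice.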
The next paper in our review deals with bird identification using fine-grained image recognition. The paper titled "Large-scale Fine-grained Visual Categorization of Birds" by Thomas Berg and colleagues introduces an innovative tool called Birdsnap, designed to identify 500 bird species commonly found in North America. The study addresses the complex challenge of fine-grained categorization, where small visual differences make distinguishing between subcategories, like bird species, difficult. To achieve this, the authors developed a novel approach using one-vs-most classifiers, which improve upon traditional methods by excluding visually similar species from the negative set during training. This strategy helps the model focus on significant distinguishing features, resulting in more accurate and intuitive identification. The model used in Birdsnap relies on Part-based One-vs-One Features (POOFs) combined with Support Vector Machines (SVMs). This framework extracts detailed features from specific parts of the bird's anatomy to enhance classification. By employing one-vs-most classifiers, the model showed a notable boost in accuracy: it achieved a rank-1 accuracy of 66.6%
and a rank-5 accuracy of 82.4% when incorporating the spatio-temporal prior, compared to baseline models that performed
lower without these enhancements. The Birdsnap dataset itself is a significant resource, consisting of 49,829 images of 500 bird
species. Each image is annotated with bounding boxes and 17 key part locations, with additional labels for gender and maturity.
These images were sourced from Flickr and verified by Amazon Mechanical Turk to ensure correct labelling. Unlike the CUB-200 dataset, which covers bird species globally, Birdsnap focuses specifically on North American birds, making it a practical
tool for regional bird watchers and researchers. The dataset's coverage also reflects natural variations within species, including
sexual dimorphism, where males and females have different visual features. One of the standout aspects of Birdsnap is its use
of spatio-temporal priors derived from 75 million bird sightings in the eBird database. This allows the model to adjust its
classification by considering when and where a bird is likely to be found, improving accuracy through real-world context. The
visual component of Birdsnap also automatically generates images highlighting key differences between similar species,
helping users learn distinguishing characteristics for future identification.
Through this research, we find that integrating innovative machine learning techniques, comprehensive datasets, and real-
world data can significantly enhance fine-grained categorization tasks. Birdsnap demonstrates that models using tailored
classification methods and spatial-temporal insights can achieve high accuracy and usability, providing an effective and
educational tool for bird identification. This paper underlines the potential of combining domain-specific knowledge with
machine learning to solve complex recognition problems in a practical, user-friendly way.
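The spatio-temporal prior can be illustrated as simple Bayesian score fusion: multiply the appearance-based class scores by a sighting-frequency prior for the photo's place and time, then renormalize. All numbers below are invented for illustration:

```python
import numpy as np

# Illustrative scores for three candidate species (hypothetical values):
species = ["hawk", "eagle", "osprey"]
visual_scores = np.array([0.45, 0.40, 0.15])   # appearance-only posterior
sighting_prior = np.array([0.10, 0.05, 0.85])  # e.g. from eBird frequencies
                                               # at this location and date

# Bayes-style fusion: posterior ∝ appearance likelihood × sighting prior.
posterior = visual_scores * sighting_prior
posterior /= posterior.sum()
best = species[int(np.argmax(posterior))]
print(best)  # -> osprey: the prior overturns the appearance-only ranking
```

This mirrors how Birdsnap uses 75 million eBird sightings: a species that looks slightly less likely by appearance alone can still win if it is far more plausible at that place and time.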
The research paper titled “Fine-Grained Butterfly Recognition via Peer Learning Network with Distribution-Aware
Penalty Mechanism” presents an approach that significantly enhances the automatic identification of butterfly species, which
is crucial for environmental monitoring and agricultural management. Identifying butterflies is challenging due to the subtle
differences among approximately 20,000 species, making manual identification time-consuming and dependent on expert
knowledge. The methodology used in this paper involves the development of a Peer Learning Network (PLN) combined with a
Distribution-Aware Penalty Mechanism (DPM) to improve computer-based species recognition, even with imbalanced data.
The PLN-DPM model employs two ResNeSt-50 networks that collaborate to learn effectively. These networks share
knowledge and adjust their parameters to better recognize less common species, which helps reduce errors that typically arise
when there are many images for some species and few for others.
How the Peer Learning Network with Distribution-Aware Penalty Mechanism works: The PLN consists of two identical
ResNeSt-50 networks acting as peers that process the same input data but use different parameter initialization strategies. Each
network generates predictions independently, and these results are compared to identify samples that both networks classify
similarly or differently. The Distribution-Aware Penalty Mechanism is then applied, focusing on samples where both networks
make the same prediction to mitigate biases. It uses a knowledge exchange strategy to select the most informative samples and
applies an adaptive penalty to prevent learning from noisy or skewed data. This mechanism helps in reducing errors from
misclassified data and improves the overall learning process by ensuring that the networks pay attention to both common and
rare classes effectively.
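The agreement-plus-penalty idea can be given a minimal sketch, assuming a simple inverse-frequency weighting that is illustrative rather than the paper's exact rule: samples where both peers agree are penalized (they carry less new information and may encode shared bias toward head classes), while rare-class samples keep larger weight.

```python
import numpy as np

def sample_weights(preds_a, preds_b, labels, class_counts, penalty=0.5):
    """Sketch of peer learning with a distribution-aware penalty: two
    networks predict the same batch; agreed-upon samples get a penalized
    weight that shrinks further for over-represented (head) classes,
    while disagreement marks a sample as informative and keeps full
    weight. The exact weighting scheme here is hypothetical."""
    counts = class_counts[labels].astype(float)
    inv_freq = counts.max() / counts       # rare classes weigh more
    agree = preds_a == preds_b
    w = np.where(agree, penalty * inv_freq, inv_freq)
    return w / w.sum()                     # normalize over the batch

class_counts = np.array([900, 40, 10])     # long-tailed: head vs tail classes
labels  = np.array([0, 0, 1, 2])
preds_a = np.array([0, 1, 1, 2])           # peer networks with different
preds_b = np.array([0, 0, 1, 2])           # parameter initializations
w = sample_weights(preds_a, preds_b, labels, class_counts)
print(np.round(w, 3))
```

The tail-class samples dominate the normalized weights, which is the behavior the penalty mechanism is designed to produce on a long-tailed dataset like Butterfly-914.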
The dataset used in this study, Butterfly-914, comprises 72,152 images of 914 different species collected from natural and controlled sources. It is challenging due to its long-tailed distribution: a few species have many images while most have limited samples, making it difficult for models to learn uniformly.
The role of the Distribution-Aware Penalty Mechanism is to help the model spot and correct mistakes during training,
making it more capable of handling rare and complex cases. The results showed that the PLN-DPM model achieved a Top-1
accuracy of 86.2%, outperforming other leading models. This success is due to the model’s ability to learn from diverse data
and maintain balance, allowing it to identify both common and rare species accurately.
A significant practical application was demonstrated through the creation of a smartphone app. This app allows users to
take photos or upload images for butterfly identification, making the technology accessible for real-world uses like
conservation and sustainable agriculture. The successful deployment of the app shows that the PLN-DPM approach is effective
not only in experimental settings but also in practical scenarios.
In conclusion, this research shows that the Peer Learning Network with a Distribution-Aware Penalty Mechanism offers a powerful solution for fine-grained species recognition. It effectively overcomes challenges related to subtle differences and imbalanced datasets, and the model's strong performance and adaptability make it promising for wider applications in environmental monitoring, biodiversity studies, and ecological research.
The paper titled "Fine-Grained Seed Recognition" explores methods to classify seeds using advanced image recognition models. It uses a dataset built from mobile-phone and studio-based images, addressing challenges where some images lack key seed features, and offers useful insights for agricultural research. While the dataset is diverse and realistic, the incomplete seed features in some images make classification more challenging. Deep learning models including SENet, ResNet50, and DenseNet were evaluated, achieving accuracies of 95.1%, 93.2%, and 95.0%, respectively; these models perform well even with partial data. The paper demonstrates the effectiveness of deep learning for seed classification, achieving high accuracy despite dataset limitations, and highlights how advanced models and diverse datasets can improve classification systems.
IV. FUTURE SCOPE
Enhanced model architectures: development of efficient and accurate neural networks tailored for FGIR, integrating transformers, attention mechanisms, and meta-learning to handle subtle differences between species.
Cross-domain applications: FGIR techniques can expand to healthcare (e.g., identifying variations in skin lesions),
agriculture (e.g., distinguishing crop species or pests), and manufacturing (e.g., product quality control).
Real-time FGIR systems: Advances in edge computing and mobile technologies can enable real-time species identification
in remote areas, assisting researchers, conservationists, and citizen scientists.
Integration with multimodal data: Leveraging data such as bird calls, habitat conditions, and migration timing alongside
visual cues for a more comprehensive identification approach.
Large-scale dataset creation: Building diverse datasets, especially for underrepresented species, to improve FGIR model
generalization.
Zero-shot and few-shot learning: Exploring learning paradigms for identifying new species with minimal labeled data,
essential for rare and endangered species.
Ethical and responsible AI integration: Ensuring FGIR technologies address biases and are deployed responsibly to avoid
ecological mismanagement due to misidentification.
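The multimodal direction above can be sketched as late fusion of per-modality scores: independent models score the image and the bird call, and their probabilities are combined. The logits and equal weights below are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def late_fusion(logits_per_modality, weights):
    """Weighted late fusion: convert each modality's logits to
    probabilities, then take a weighted average and renormalize."""
    fused = sum(w * softmax(z) for w, z in zip(weights, logits_per_modality))
    return fused / fused.sum()

# Hypothetical per-species logits from two independent models:
visual_logits   = np.array([2.0, 1.9, 0.2])  # image model: two lookalikes
acoustic_logits = np.array([0.1, 2.5, 0.3])  # call model: distinctive song

fused = late_fusion([visual_logits, acoustic_logits], [0.5, 0.5])
print(int(np.argmax(fused)))  # -> 1: the call disambiguates the lookalikes
```

Habitat conditions or migration timing could enter the same way, as an extra score vector, which is what makes late fusion an easy first step toward multimodal identification.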
V. CONCLUSION
In conclusion, fine-grained image recognition is a powerful tool for accurately identifying subtle differences among species of
birds and animals. By leveraging advanced models, diverse datasets, and innovative techniques, it supports biodiversity
conservation, ecological research, and public awareness. Its future potential lies in real-time applications, integration with
multimodal data, and interdisciplinary collaborations, making it invaluable for addressing environmental and scientific
challenges.