WILDCATS IMAGE CLASSIFICATION
Project Report
By
JOEL RAJAN
Reg.No : 230011020548
CERTIFICATE
Certified that this is a bona fide record of the project report done by Mr. JOEL
RAJAN, Reg. No: 230011020548, in partial fulfilment of the requirements
for the award of the degree of M.Sc. Artificial Intelligence of Mahatma Gandhi
University, Kottayam, during the period 2022-2024.
Place:
Date:
Examiner 1                                    Examiner 2
(Name & Signature)                            (Name & Signature)
DECLARATION
Date: Signature
ACKNOWLEDGEMENT
Special thanks are due to Mrs. RESHBA LAL, our dedicated project coordinator,
for her exceptional organizational skills, meticulous attention to detail, and tireless
efforts in ensuring the smooth coordination and execution of this project. I am
immensely grateful to Mrs. AMBILY P K, our esteemed project guide, for her
expertise, mentorship, and invaluable guidance, which have been instrumental in
shaping the direction and outcome of this project.
Together, with the support and encouragement of these esteemed individuals and the
blessings of the Almighty, I have successfully achieved my goals and embarked on
a journey of growth, learning, and fulfilment.
JOEL RAJAN
ABSTRACT
Wildcats play a critical role in maintaining ecosystem balance, but their conservation faces
challenges due to habitat loss, climate change, and poaching. Accurate identification
and monitoring of wildcat species are essential for developing effective conservation strategies.
This project focuses on the development of a robust Wildcats Image Classification system
using machine learning techniques to automate the identification of different wildcat species
based on photographic data.
The system employs advanced image processing and deep learning algorithms, leveraging a
dataset of annotated wildcat images to train and validate the model. Key techniques, such as
convolutional neural networks (CNNs) and data augmentation, were utilized to improve the
model's ability to recognize species across varying lighting conditions, poses, and backgrounds.
The resulting model achieved [state performance metrics, e.g., "an accuracy of 90% and F1-
score of 87%"], demonstrating high reliability in classifying wildcat species with similar visual
characteristics.
This project contributes to wildlife research by providing a scalable, automated solution for
species identification, which can reduce manual effort and enable large-scale monitoring. It
also highlights the importance of curated datasets and algorithmic optimization in improving
model performance. Future enhancements include expanding the dataset, integrating real-time
image analysis from camera traps and drones, and adapting the system for use in diverse
ecological regions.
The Wildcats Image Classification system offers significant potential to assist researchers and
conservationists in tracking wildcat populations, understanding their habitats, and formulating
strategies for their protection. This work underscores the broader application of AI in
biodiversity conservation and paves the way for more efficient and innovative approaches to
wildlife management.
TABLE OF CONTENTS
1 INTRODUCTION
2 LITERATURE REVIEW
3 DATASET
4 CNN ARCHITECTURE
5 MOBILENETV2
6 MODEL ARCHITECTURE
7 APPLICATIONS OF WILDCATS IMAGE CLASSIFICATION
8 CONCLUSION
9 REFERENCES
INTRODUCTION
Wildcats are a diverse and ecologically significant group of species belonging to the family
Felidae. These species, including lions, tigers, leopards, cheetahs, and others, play a vital role
in maintaining ecosystem balance by regulating prey populations and influencing the food
chain. However, many wildcat species are increasingly threatened by habitat loss, poaching,
and human-wildlife conflict. Effective conservation efforts require accurate monitoring and
identification of these species to better understand their distribution, population dynamics, and
behaviour.
Traditionally, the identification of wildcat species from images has been performed manually
by wildlife researchers and experts. This approach is time-consuming, labour-intensive, and
subject to human error, especially when dealing with large datasets or species that exhibit visual
similarities. With the advent of modern technology, automated image classification systems
can address these challenges by providing fast and accurate species identification.
In this project, we explore the use of Convolutional Neural Networks (CNNs), a state-of-the-
art deep learning architecture, to classify images of wildcats into ten distinct species. CNNs are
particularly suited for image classification tasks due to their ability to automatically extract and
learn spatial features from images, such as textures, patterns, and shapes. By leveraging CNNs,
we aim to develop a model capable of distinguishing between wildcat species based on their
unique physical characteristics. The ten species covered by the model are:
1. Lion
2. Tiger
3. African Leopard
4. Cheetah
5. Clouded Leopard
6. Jaguar
7. Puma
8. Snow Leopard
9. Ocelot
10. Caracal
This project has significant implications for wildlife conservation and research. An accurate
wildcat image classification system can assist in monitoring species populations, studying their
behaviour, and implementing effective conservation strategies. Furthermore, the approach can
be extended to other species, contributing to broader biodiversity preservation efforts.
LITERATURE REVIEW
Wildcat classification is a challenging task due to the similarities in the physical appearance of
certain species and variations in environmental conditions within images. Advances in machine
learning and computer vision, particularly deep learning, have paved the way for automated
and accurate image classification. In this section, we review relevant studies and technologies
that have contributed to the field of image classification and their applications in wildlife
research.
Traditional image classification methods relied heavily on handcrafted features, such as color
histograms, textures, and shape descriptors. These methods, while effective in specific
contexts, struggled to generalize across diverse datasets. The advent of deep learning,
particularly Convolutional Neural Networks (CNNs), revolutionized the field by enabling
automatic feature extraction directly from raw image data.
LeCun et al. (1998) introduced the first CNN model, LeNet, which demonstrated the
effectiveness of CNNs for handwritten digit recognition. This foundation was further
developed with deeper architectures like AlexNet (Krizhevsky et al., 2012), which won the
ImageNet competition and highlighted the power of CNNs in large-scale image classification.
Other significant advancements include VGGNet (Simonyan & Zisserman, 2014), which
utilized deeper layers with small convolutional filters, and ResNet (He et al., 2016), which
introduced residual learning to address the problem of vanishing gradients in deep networks.
CNNs have been widely applied to wildlife monitoring and classification tasks. For instance,
Norouzzadeh et al. (2018) used deep learning to classify wildlife images captured by camera
traps, achieving high accuracy in identifying animal species. Similarly, Gomez Villa et al.
(2017) reviewed the use of deep learning for animal identification and highlighted the
effectiveness of CNNs in handling large-scale wildlife datasets.
A recurring challenge in these studies is that wildlife images vary widely in lighting,
angles, and poses. Data augmentation techniques, such as flipping, cropping, and brightness
adjustment, are commonly used to address these challenges.
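As a concrete illustration, the sketch below applies these three augmentations as Keras
preprocessing layers (TensorFlow/Keras is assumed here; the studies cited above do not
prescribe a specific library):

    import tensorflow as tf
    from tensorflow.keras import layers

    # Augmentation pipeline mirroring the techniques listed above:
    # random flips, crop-like zooming, and brightness adjustment.
    data_augmentation = tf.keras.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomZoom(0.1),          # approximates random cropping
        layers.RandomBrightness(0.2),    # requires TensorFlow >= 2.9
    ])

    # Applied on the fly during training:
    # augmented = data_augmentation(images, training=True)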
The success of deep learning models heavily depends on the quality and diversity of the dataset
used for training. Publicly available datasets, such as ImageNet and iNaturalist, provide a
starting point for training generic image classifiers. However, for specialized tasks like wildcat
classification, curated datasets are essential. Researchers often compile their datasets from
camera trap images, online repositories, and field studies.
A notable challenge in wildcat datasets is the presence of visually similar species. For example,
leopards and jaguars share similar coat patterns, making their distinction difficult even for
human experts. Furthermore, environmental factors such as shadows, vegetation, and varying
lighting conditions add complexity to the classification task.
While CNNs have shown great promise, there are challenges that remain in their application to
wildlife conservation.
• Class Imbalance: Many wildlife datasets are imbalanced, with some species being
underrepresented. Techniques like oversampling, synthetic data generation, and
transfer learning from pre-trained models can help mitigate this issue (see the
class-weighting sketch after this list).
• Overfitting: Small datasets can lead to overfitting, where the model performs well on
training data but poorly on unseen data. Regularization techniques, dropout layers, and
data augmentation are effective countermeasures.
• Real-World Deployment: Deploying CNN models in real-world conservation efforts
requires models to generalize well across unseen environments and conditions.
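For the class-imbalance point above, one standard mitigation is to weight the loss by
inverse class frequency. A minimal sketch, assuming scikit-learn and Keras (the label
array is a toy example):

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    # Toy integer-encoded training labels; species 0 is overrepresented.
    train_labels = np.array([0, 0, 0, 0, 1, 2, 2])
    weights = compute_class_weight(class_weight="balanced",
                                   classes=np.unique(train_labels),
                                   y=train_labels)
    class_weight = dict(enumerate(weights))

    # Underrepresented species then contribute more to the loss:
    # model.fit(..., class_weight=class_weight)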
The integration of CNNs into wildlife conservation has opened up opportunities for automated
monitoring systems. These systems can analyze thousands of images in real-time, enabling
researchers to focus on high-priority tasks such as population estimation and habitat
preservation. By combining technological advances with ecological knowledge, these
approaches have the potential to transform the field of wildlife conservation.
Conclusion
The literature indicates that CNNs are a powerful tool for image classification and have
demonstrated significant success in wildlife applications. However, challenges such as dataset
limitations, class imbalance, and environmental variability remain areas of active research.
This project builds upon these foundations, employing CNNs to address the specific challenge
of wildcat classification across ten species. By leveraging state-of-the-art architectures and
techniques, we aim to contribute to the growing body of work in wildlife conservation and
computer vision.
DATASET
Images were gathered from Google searches and downloaded using the app 'Download All
Images'. I highly recommend this app: it is very fast and returns a zip file of the
images, which can then be unzipped to a specific directory.

I have developed a custom set of tools to create datasets. The first tool creates a
dataset framework in a specified directory I call Datasets. It takes the name of the new
dataset as input, creates a directory with that name, and within that directory creates
four subdirectories: train, test, valid, and storage. The storage directory is where the
unzipped downloaded images are placed.

Downloaded images can be a chaotic mix of file names and image formats, so I wrote a
Python program called order_by_size that operates on the downloaded images within the
storage directory. It removes files with extensions other than jpg, png, or bmp and
deletes files below a user-specified image size. It then renames the files sequentially
using zero padding, converts them to jpg format, and orders the files by size, so that
the first file is the largest image, the second the next largest, and so on. You want to
start with large images: they will later be cropped to a region of interest, and the
cropped images must retain a sufficient pixel count so that features can be extracted by
the classification model.

Once the files are sequentially ordered and have jpg extensions, I use another program
called duplicate_delete. It uses file hashing to detect and delete duplicate images,
which prevents images from being shared between the train, test, and validation sets
when the files are partitioned.

A Google search returns a lot of what you want along with a lot of junk, so I wrote
another Python program called review_images. It shows each image in the storage
directory in sequence and lets you keep or delete it depending on whether it is the type
of image you want, eliminating unwanted images from the storage directory.

Then comes the hard part. To build a high-quality dataset, you should crop your images
so that the result has a high ratio of pixels in the region of interest to total pixels.
For that I use Paint Shop Pro version 9. If you examine the dataset images, you will see
that in most cases the cat takes up at least 50% of the pixels in the image.

After all that is done, I run order_by_size again with different parameters to convert
all the images to a specified size; for this dataset I used 224 × 224 × 3. The result is
a uniform, ordered, and properly pruned set of images for a specific class, such as
tigers. Another Python program, make_class, takes the new class name (tiger, for
example) as input, creates a class subdirectory in each of the train, test, and valid
directories, and then partitions the images in the storage directory into train, test,
and validation images, storing them in the corresponding class directories. Finally, I
wrote a program that creates a dataset CSV file. Making a high-quality dataset takes a
lot of work, but the tools I have built help reduce the workload.
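As an illustration of the duplicate-removal step, the sketch below detects duplicates by
hashing file contents. This is a hypothetical reimplementation, not the original
duplicate_delete tool, and the directory path is an example:

    import hashlib
    from pathlib import Path

    def delete_duplicates(storage_dir):
        # Keep the first file seen with a given content hash; delete the rest.
        seen = {}
        removed = 0
        for path in sorted(Path(storage_dir).glob("*.jpg")):
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            if digest in seen:
                path.unlink()   # identical content already kept
                removed += 1
            else:
                seen[digest] = path
        return removed

    # Example: delete_duplicates("Datasets/wildcats/storage")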
CNN ARCHITECTURE
A Convolutional Neural Network (CNN) is a specialized type of neural network designed
for tasks involving spatial data, such as image recognition, object detection, and segmentation.
The architecture of a CNN is composed of a sequence of layers that extract spatial and
hierarchical features from the input data. Below is a detailed breakdown of a typical CNN
architecture:
The input layer of a Convolutional Neural Network (CNN) is the starting point of the network.
It takes the input data (e.g., an image) and prepares it for subsequent layers. Here's an in-depth
look at its key aspects:
1. Input Data Format
The input layer accepts data in a structured, multi-dimensional array format known as a
tensor. For an image, this tensor has three dimensions: height, width, and channels,
where height and width are the image's pixel dimensions and channels is the number of
colour channels (1 for grayscale, 3 for RGB). For example, a 224 × 224 RGB image is
represented as a tensor of shape (224, 224, 3).
2. Normalization of Input
Before passing the input to the CNN, it's often normalized to improve model performance. This
helps the network learn faster and generalize better. Common normalization techniques:
• Scaling Pixel Values: Divide pixel values by 255 to bring them into the range [0, 1].
• Standardization: Subtract the mean and divide by the standard deviation to center the
data.
X_standardized = (X − μ) / σ

where μ is the mean and σ is the standard deviation of the dataset.
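Both techniques are one-liners in NumPy; a minimal sketch with a hypothetical batch of
images:

    import numpy as np

    # Hypothetical batch of raw images with pixel values in [0, 255].
    images = np.random.randint(0, 256, size=(8, 224, 224, 3)).astype("float32")

    scaled = images / 255.0                                  # values in [0, 1]
    standardized = (images - images.mean()) / images.std()   # zero mean, unit variance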
3. Reshaping the Input
The input must be reshaped to match the expected format of the CNN model, typically a 4D
tensor of shape (batch size, height, width, channels).
4. Batch Input
CNNs process inputs in batches to take advantage of parallel computation. The batch size
determines how many samples are processed together.
The Convolutional Layer is the core building block of a Convolutional Neural Network
(CNN). It is responsible for learning spatial hierarchies of features in input data, such as edges,
textures, and patterns. Below is an in-depth explanation of the convolution layer, its operation,
and how it fits into CNNs.
The convolution layer applies filters (kernels) to the input data to extract features. It slides
these filters across the input and performs an operation called the convolution operation.
S(i, j) = Σ_m Σ_n I(i + m, j + n) · K(m, n)

where I is the input, K is the filter (kernel), and S is the resulting feature map.

2.1 Filters (Kernels)
• Small matrices (e.g., 3×3, 5×5) used to detect patterns in the input.
• Each filter specializes in detecting specific features (e.g., edges, corners).
• A convolution layer has multiple filters, and each generates one feature map.
2.2 Stride
• Stride = 1: The filter moves one pixel at a time, preserving spatial resolution.
• Stride > 1: The filter skips pixels, reducing the spatial dimensions of the feature map.
2.3 Padding
• Adds extra border pixels (usually zeros) around the input to control the output size.
• Common types:
o Valid Padding: No padding, reduces spatial dimensions.
o Same Padding: Padding ensures the output has the same dimensions as the
input.
3. Output Dimensions
The spatial dimensions of the output feature map are calculated as:

Output Size = (Input Size − Kernel Size + 2 × Padding) / Stride + 1

For example, with an input of 32 × 32, a 3 × 3 kernel, stride 1, and padding 1:

Output Size = (32 − 3 + 2 × 1) / 1 + 1 = 32
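The formula is easy to encode as a small helper for checking layer dimensions; a sketch:

    def conv_output_size(input_size, kernel_size, stride=1, padding=0):
        # Spatial output size of a convolution, per the formula above.
        return (input_size - kernel_size + 2 * padding) // stride + 1

    # Reproduces the worked example: 32x32 input, 3x3 kernel, stride 1, padding 1.
    assert conv_output_size(32, 3, stride=1, padding=1) == 32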
4. Multi-Channel Convolutions
For images with multiple channels (e.g., RGB images), each filter spans all channels.
For example, a 3 × 3 filter applied to an RGB input actually has shape 3 × 3 × 3. The
convolution operation is applied across all channels, and the results are summed to form
a single feature map.
5. Hyperparameters of the Convolution Layer
• Number of Filters: Determines how many feature maps are generated. More filters
capture more features.
• Filter Size (Kernel Size): Common choices are 3 × 3, 5 × 5, or 7 × 7.
• Stride: Controls the downsampling factor.
• Padding: Determines whether spatial dimensions are preserved.
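In Keras these hyperparameters map directly onto the Conv2D layer; an illustrative
configuration (the specific values here are examples, not this project's settings):

    from tensorflow.keras import layers

    # 32 filters of size 3x3, stride 1; 'same' padding preserves spatial size.
    conv = layers.Conv2D(filters=32, kernel_size=(3, 3),
                         strides=(1, 1), padding="same",
                         activation="relu")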
The Pooling Layer reduces the spatial dimensions of feature maps while retaining their
most important information.
1. Purpose: Pooling lowers computational cost, controls overfitting, and provides a
degree of translation invariance.
2. Types of Pooling
• Max Pooling: Takes the maximum value within each window. For example, a 2 × 2 window
containing [1, 2, 3, 4] produces the output 4.
• Average Pooling: Takes the average of the values in each window.
• Global Pooling: Averages or takes the maximum over the entire spatial dimensions of
the feature map, producing a single value per feature map. It is often used in the final
stages of CNNs (e.g., Global Average Pooling in ResNet).
3. Padding: Rarely used in pooling layers but can ensure the input and output sizes
match in some architectures.
The Fully Connected (FC) Layer is a key component of a Convolutional Neural Network
(CNN) that comes at the final stages of the architecture. It connects every neuron in one layer
to every neuron in the next layer, enabling global learning and decision-making.
The fully connected layer transforms the high-level, spatially reduced features extracted by
convolutional and pooling layers into a final decision or output. It performs classification,
regression, or other tasks based on the extracted features.
2. Structure
• Input: A flattened 1D vector from the previous layer (usually feature maps).
• Weights: A weight matrix connects every input to every output neuron.
• Bias: A bias term is added to each neuron’s output.
• Activation: A non-linear activation function (e.g., ReLU, sigmoid, or softmax) is
applied.
The FC layer computes

y = f(Wx + b)

where x is the flattened input vector, W the weight matrix, b the bias vector, and f the
activation function.

3. Computation Steps
1. Input Features:
o The output of the previous convolutional/pooling layers is flattened into a 1D
vector.
o For example, if the feature map size is 7 × 7 × 512, it is flattened into a
1 × 25088 vector.
2. Matrix Multiplication:
o Multiply the flattened input vector with the weight matrix.
3. Add Bias:
o Add a bias vector to the result of the matrix multiplication.
4. Apply Activation:
o Apply a non-linear activation function to the result.
4. Output Dimensions
The number of neurons in the FC layer determines the output dimensions. Examples:
• For classification with n classes, the FC layer has n neurons.
• For binary classification, the FC layer has 1 neuron with a sigmoid activation function.
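The four steps can be written out directly in NumPy; a sketch using the 7 × 7 × 512
example above with 10 output neurons (the random values are placeholders for learned
parameters):

    import numpy as np

    x = np.random.rand(25088)        # flattened 7 x 7 x 512 feature map
    W = np.random.rand(10, 25088)    # weight matrix: 10 output neurons
    b = np.random.rand(10)           # bias vector

    z = W @ x + b                    # matrix multiplication plus bias
    z = z - z.max()                  # stabilize before exponentiating
    y = np.exp(z) / np.exp(z).sum()  # softmax activation over 10 classes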
MOBILENETV2
The task of classifying images of wildcats into ten distinct species requires an efficient and
robust model capable of extracting relevant features from images. MobileNetV2 was chosen
as the base architecture for this project due to its efficiency in terms of both computational
resources and performance. It is well-suited for mobile and embedded applications while
maintaining high accuracy on image classification tasks.
MobileNetV2 is a lightweight and efficient deep learning model designed for mobile and
resource-constrained environments. It is based on depthwise separable convolutions, which
reduce the number of parameters and computational complexity compared to traditional
convolutions while maintaining a strong feature extraction capability.
• Why MobileNetV2?
o Efficiency: MobileNetV2 strikes a balance between speed and accuracy,
making it suitable for tasks with limited computational resources, such as
mobile or embedded systems.
o Depthwise Separable Convolutions: This technique breaks down the
traditional convolution into two layers: a depthwise convolution and a pointwise
convolution. This reduces the number of operations required, making the model
more efficient without compromising accuracy.
o Linear Bottleneck: The model uses a linear bottleneck at the final layer of each
block, which allows for better feature representation while reducing
computation.
o Pre-trained Weights: MobileNetV2 was pre-trained on the ImageNet dataset,
allowing it to learn general features such as edges, textures, and basic shapes,
which can be transferred to the wildcat image classification task.
• Global Average Pooling (GAP): The feature maps extracted by MobileNetV2 are passed
through a global average pooling layer. This layer reduces the spatial dimensions (height and
width) of the feature map, producing a single vector of features per image.
• Fully Connected (Dense) Layer: A dense layer with ReLU activation is added on top
of the global average pooling. This layer learns a weighted combination of the features
extracted by MobileNetV2.
• Dropout Layer: To prevent overfitting, a dropout layer with a rate of 0.5 was included.
This layer randomly deactivates half of the neurons during training, forcing the model
to generalize better.
• Output Layer: The final layer is a softmax layer with 10 neurons (corresponding to
the 10 wildcat species). This layer converts the model’s predictions into class
probabilities, with the highest probability indicating the predicted species.
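A minimal Keras sketch of this head on top of the frozen MobileNetV2 base (the 512 units
in the dense layer are an assumption; the text does not specify its size):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                             include_top=False,
                                             weights="imagenet")
    base.trainable = False                       # freeze pre-trained weights

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),         # spatial dims -> single vector
        layers.Dense(512, activation="relu"),    # assumed size of the dense layer
        layers.Dropout(0.5),                     # deactivates half the neurons
        layers.Dense(10, activation="softmax"),  # 10 wildcat species
    ])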
The architecture includes multiple bottleneck blocks with varying configurations of
expansion factor t, output channels c, number of repeats n, and stride s. These
configurations are summarized in the table below:

Input             Operator       t    c      n    s
224 × 224 × 3     conv2d 3×3     -    32     1    2
112 × 112 × 32    bottleneck     1    16     1    1
112 × 112 × 16    bottleneck     6    24     2    2
56 × 56 × 24      bottleneck     6    32     3    2
28 × 28 × 32      bottleneck     6    64     4    2
14 × 14 × 64      bottleneck     6    96     3    1
14 × 14 × 96      bottleneck     6    160    3    2
7 × 7 × 160       bottleneck     6    320    1    1
7 × 7 × 320       conv2d 1×1     -    1280   1    1
1. Convolutional Layer:
o 1×1 convolution with 1280 output channels.
o Activation: ReLU6.
2. Global Average Pooling:
o Converts the spatial feature maps into a 1D vector (size 1280).
3. Fully Connected Layer:
o Maps the pooled features to the number of classes (e.g., 1000 for ImageNet).
o Activation: Softmax.
4. Training Configuration
• Optimizer: Adam optimizer was used due to its adaptive learning rate and efficient
convergence properties.
• Loss Function: Sparse categorical cross-entropy loss was selected because the
classification task involves multiple classes and the labels are integer-encoded.
• Batch Size: A batch size of 32 was chosen to strike a balance between computational
efficiency and memory usage.
• Epochs: The model was trained for 50 epochs, with early stopping implemented to
prevent overfitting if validation loss stagnated.
• Learning Rate: An initial learning rate of 0.001 was used, with decay after every 10
epochs to reduce the learning rate as training progresses.
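Under these settings, compilation and training might look like the following sketch (the
early-stopping patience and the decay factor of 0.5 are assumptions; train_ds and val_ds
are placeholder datasets batched at 32):

    import tensorflow as tf

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    callbacks = [
        # Stop when validation loss stagnates, as described above.
        tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                         restore_best_weights=True),
        # Decay the learning rate after every 10 epochs.
        tf.keras.callbacks.LearningRateScheduler(
            lambda epoch, lr: lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr),
    ]

    history = model.fit(train_ds, validation_data=val_ds,
                        epochs=50, callbacks=callbacks)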
5. Model Fine-tuning
After the initial training with frozen weights, the model was fine-tuned to improve its
performance further. Fine-tuning involved unfreezing the last few layers of MobileNetV2 and
training them with a lower learning rate. This allowed the model to adjust its feature extraction
capabilities to better suit the wildcat classification task.
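Continuing from the sketch above, fine-tuning could look like this (the number of
unfrozen layers and the reduced learning rate are assumptions; the text specifies only
"the last few layers" and "a lower learning rate"):

    # Unfreeze only the last few layers of the base model.
    base.trainable = True
    for layer in base.layers[:-20]:
        layer.trainable = False

    # Recompile with a lower learning rate before continuing training.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=10, callbacks=callbacks)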
MODEL ARCHITECTURE
The model is defined layer by layer as follows:
• input_shape=(224, 224, 3): specifies that the input images are 224x224 pixels with 3
channels (RGB).
• Conv2D(32, (3, 3), activation='relu'): uses 32 filters of size 3x3 with ReLU
activation.
• Conv2D(64, (3, 3), activation='relu'): increases the filters to 64, detecting more
complex features.
• The third and fourth convolutional layers both use 128 filters, refining and
extracting high-level features.
• Each convolutional layer is followed by MaxPooling2D((2, 2)), reducing the spatial
dimensions by half. This prevents overfitting, reduces computational cost, and captures
dominant features.
• Flatten(): converts the 3D feature maps into a 1D vector to prepare for the dense
layers.
• Dense(512, activation='relu'): a dense layer with 512 neurons to learn high-level
representations.
• Dense(10, activation='softmax'): the final layer predicts the probabilities for the 10
output classes.
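Putting the bullets together, the described architecture corresponds to the following
Keras sketch:

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(224, 224, 3)),        # 224x224 RGB images
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),                         # 3D feature maps -> 1D vector
        layers.Dense(512, activation="relu"),
        layers.Dense(10, activation="softmax"),   # probabilities for 10 classes
    ])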
APPLICATIONS OF WILDCATS IMAGE CLASSIFICATION
Wildcats image classification using convolutional neural networks (CNNs) has various
practical applications in wildlife conservation, research, and management. Here are some key
areas where such a project could be applied:
1. Wildlife Conservation
2. Ecological Research
3. Human-Wildlife Conflict Mitigation
• Early Warning Systems: Classify and detect wildcat species near human settlements
to trigger alerts and mitigate conflicts.
• Livestock Protection: Identify predator species to take preventive measures for
protecting livestock.
4. Education and Public Engagement
• Public Outreach: Provide educational tools for schools, zoos, and conservation centers
using real-world classification of wildcat species to raise awareness.
• Citizen Science Projects: Enable citizen scientists to contribute to wildcat monitoring
through applications that use CNN models for identification.
5. Technological Advancements
7. Zoological Applications
CONCLUSION
The Wildcats Image Classification project represents a significant step toward applying
artificial intelligence in wildlife research and conservation. Through the use of advanced image
processing techniques and state-of-the-art machine learning models, we developed a system
capable of identifying and classifying wildcat species with [state your performance metrics,
e.g., "an accuracy of 90% and precision of 88%"]. The model's ability to differentiate between
visually similar species highlights its potential as a reliable tool for automating tasks that
traditionally require extensive manual effort.
However, there are several avenues for further improvement. One of the primary challenges
faced during the project was the limited availability of high-quality, labeled datasets for certain
wildcat species. Expanding the dataset to include a broader range of images, especially from
different geographic locations and environmental conditions, would significantly enhance
model generalization. Additionally, incorporating advanced techniques such as transfer
learning or ensemble modeling could further improve classification accuracy and robustness.
REFERENCES
Keunwoo Choi, George Fazekas, and Mark Sandler, "Automatic tagging using deep
convolutional neural networks," arXiv preprint arXiv:1606.00298, 2016.

Paulo Chiliguano and Gyorgy Fazekas, "Hybrid music recommender using content-based and
social information," in 2016 IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP). IEEE, 2016, pp. 2618–2622.

Aaron Van den Oord, Sander Dieleman, and Benjamin Schrauwen, "Deep content-based music
recommendation," in Advances in Neural Information Processing Systems, 2013,
pp. 2643–2651.

Keunwoo Choi, George Fazekas, and Mark Sandler, "Explaining deep convolutional neural
networks on music classification," arXiv preprint arXiv:1607.02444, 2016.

Duyu Tang, Bing Qin, and Ting Liu, "Document modeling with gated recurrent neural
network for sentiment classification," in Proceedings of the 2015 Conference on
Empirical Methods in Natural Language Processing, 2015, pp. 1422–1432.

Zhen Zuo, Bing Shuai, Gang Wang, Xiao Liu, Xingxing Wang, Bing Wang, and Yushi Chen,
"Convolutional recurrent neural networks: Learning spatial dependencies for image
representation," in Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops, 2015, pp. 18–26.

Keunwoo Choi, George Fazekas, Mark Sandler, and Kyunghyun Cho, "Convolutional recurrent
neural networks for music classification," arXiv preprint arXiv:1609.04243, 2016.

Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio, "On the
properties of neural machine translation: Encoder-decoder approaches," arXiv preprint
arXiv:1409.1259, 2014.

Sinno Jialin Pan and Qiang Yang, "A survey on transfer learning," IEEE Transactions on
Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, "ImageNet classification with
deep convolutional neural networks," in Advances in Neural Information Processing
Systems, 2012, pp. 1097–1105.

Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson, "CNN
features off-the-shelf: An astounding baseline for recognition," in Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 806–813.

Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson, "How transferable are
features in deep neural networks?," in Advances in Neural Information Processing
Systems, 2014, pp. 3320–3328.