Academic Platform Journal of Engineering and Smart Systems (APJESS) 13(1), 17–21, 2025
https://dergipark.org.tr/en/pub/apjess
Received: 03-09-2024, Accepted: 28-12-2024
https://doi.org/10.21541/apjess.1542885
Ömer Faruk EREKEN¹, Çiğdem TARHAN*²
¹Department of Management Information Systems, Dokuz Eylul University, Türkiye, omerfarukereken@gmail.com
*²Corresponding Author, Department of Management Information Systems & BİMER, Dokuz Eylul University, Türkiye, cigdem.tarhan@deu.edu.tr
Abstract
Object detection and classification on digital images is an area of great importance in the digitalizing world. Since deep learning methods began to be applied to object detection, classification, and segmentation, the field has developed rapidly. Mask R-CNN is one of the most successful methods in the field and can be used to detect and segment many different kinds of objects. Our study focuses on the use of Mask R-CNN for weapon detection, specifically handguns. Today, there are many cameras in public areas, and detecting weapons through these cameras before a forensic incident occurs can provide great advantages. Our model achieved a mean average precision (mAP) of 0.78 in detecting handguns on test data. Our findings demonstrate the potential of deep learning for security by detecting threats in images and live videos.
Keywords: Mask R-CNN; Deep Learning; Handgun Detection; Object Detection; Instance Segmentation
1. INTRODUCTION
Modeling Objects with Artificial Intelligence Based Image Processing Techniques: Object Detection with Mask R-CNN
Aware of the weaknesses of R-CNN, R. Girshick proposed Fast R-CNN, which addressed the drawbacks of R-CNN and improved its speed and accuracy. Fast R-CNN uses single-stage training and takes as input an image and a set of object proposals. First, the whole image is passed through a CNN to obtain a feature map. Then, each object proposal is processed by a Region of Interest (RoI) pooling layer, which extracts a fixed-length feature vector from that feature map. Finally, these feature vectors are fed to a softmax classifier and a bounding box regressor [6].

… study aimed to build a model that works like a human glance, in other words, one that looks at an image and directly detects objects. Other object detection systems use multiple components and complex pipelines, which make the process slow and hard to optimize. In contrast, YOLO is rather simple: a single convolutional network predicts bounding boxes and class probabilities, and the detections are then filtered based on the model's confidence [9]. Over time, YOLO attracted a lot of interest and made significant progress in accuracy and capabilities, including instance segmentation. Many different variants have been published, and researchers continue to improve the model [10]. The YOLOv8 model was shown to outperform Mask R-CNN in complex orchard environments, achieving higher precision and recall in less time [11].
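The confidence-based filtering mentioned for YOLO [9] is commonly paired with non-maximum suppression (NMS), which removes overlapping duplicates of the same object; many Mask R-CNN implementations apply similar post-processing to their final detections. The following is a minimal, framework-free Python sketch of that step under our own assumptions: boxes are given as (x1, y1, x2, y2) corners, NMS is class-agnostic for brevity, and the function names `iou` and `filter_detections` and the default thresholds of 0.5 are hypothetical choices, not the exact procedure of [9] or [19]. The `iou` helper computes the same Intersection over Union measure that is used when evaluating detections.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes: List[Box], scores: List[float],
                      score_threshold: float = 0.5,
                      iou_threshold: float = 0.5) -> List[int]:
    """Return indices of detections kept after confidence filtering and greedy NMS.

    Hypothetical helper for illustration; thresholds are assumed defaults.
    """
    # 1) Drop detections whose confidence is below the threshold.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # 2) Visit the remaining detections from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # 3) Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept
```

For example, with `boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (0, 0, 5, 5)]` and `scores = [0.9, 0.8, 0.7, 0.3]`, the call `filter_detections(boxes, scores)` returns `[0, 2]`: the fourth box falls below the confidence threshold, and the second box overlaps the first with an IoU of about 0.82, so it is suppressed as a duplicate.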
2. LITERATURE REVIEW
In a recent study called "Weapon detection system for surveillance and security", YOLOv5 is used for weapon detection and Mask R-CNN for instance segmentation. Before model training, various data augmentation and preprocessing methods are applied. The study achieves 90.66% detection accuracy (DC) and 88.74% mean intersection over union (mIoU) [16].

All the studies described above focus on detecting weapons in ordinary camera images, but there are also studies that focus on finding concealed weapons, which is equally critical for detecting forensic incidents. One of these studies attempts to detect pistols in thermal images using deep learning. The authors evaluated several deep learning algorithms for classification and segmentation. The best result for detecting pistols was achieved by a VGG19-based convolutional neural network with an F1 score of 84%, while for the second module, which consisted of classification and bounding box detection, YOLOv3 achieved the highest mean average precision of 95% [17].

Another area where deep learning is used for weapon detection is X-ray imaging. One of the studies we examined introduced an anchor-free convolutional neural network (CNN) approach to detect weapons in X-ray baggage images. By eliminating the need for preset anchor box sizes, and thus reducing computational complexity, the method demonstrates robust performance in detecting knives and handguns. By comparing mainstream anchor-free and anchor-based methods, the study revealed that the anchor-free methods YOLOX, Objects as Points, and ExtremeNet perform well in weapon detection on X-ray images [18].

All in all, the use of deep learning techniques for security and surveillance is developing continuously. As seen in studies [12]–[18], the aim is to build systems that enhance public safety and prevent crimes through image processing techniques. The increasing number of studies suggests that the field will progress rapidly and that using deep learning for safety will soon become common.

Another point worth mentioning is the new, popular tools based on Large Language Models (LLMs). Even though LLMs are text-based and have no direct impact on object detection, as AI systems that understand user input, these models will undoubtedly improve how user inputs and requests are handled in image processing.

3. MATERIALS AND METHODS

The first stage of our work was to create a comprehensive dataset that includes labeled data for both classification and segmentation. To accomplish this, we used a pre-labeled dataset of 3000 handgun images. These images were collected from the internet, and each shows at least one handgun in diverse situations [12]. The only problem with this dataset was that it had only bounding box annotations, which can be used for object detection but not for instance segmentation. To solve this, we created a new dataset from it, containing 700 images (500 training, 100 validation, 100 test) annotated in COCO-style format.

Figure 3. Dataset Image Examples

The dataset used in this study contains images taken under various conditions; however, for a real-world project, the training set would need to be expanded to cover all possible conditions, including objects that resemble a gun but are not one. Since this study was carried out for educational purposes with limited resources, we focused on training a prototype model.

The original Mask R-CNN paper states that its code is available on GitHub [5]. This code is written in Python and powered by the deep learning framework Caffe2, which is now deprecated and has been merged into the PyTorch repository. In this study, we use a Mask R-CNN implementation built on Python 3, TensorFlow, and Keras, which is available in a different GitHub repository [19]. The model is based on a Feature Pyramid Network (FPN) with a ResNet101 backbone.

Data is crucial for training successful deep learning models, but obtaining a sufficient amount of data can be challenging. Transfer learning was developed to address this issue: a model is initialized with weights learned in earlier deep learning studies, so that it starts training on new data with knowledge already gained from other classes. In our study, we used the weights of a Mask R-CNN model trained on the Microsoft Common Objects in Context (COCO) dataset. Microsoft COCO is a dataset that contains images of 91 object types [20].

Evaluating Mask R-CNN differs from evaluating standard deep learning algorithms, because both object classification and segmentation predictions must be assessed. As stated in [13], popular object detection competitions have used mean average precision (mAP) as the primary evaluation metric. Briefly, average precision (AP) estimates the area under the precision-recall curve, and in multiclass detection problems mAP is the AP averaged over all classes. AP is obtained from Equation (1) by interpolating the precision-recall curve, where $P(\tilde{r})$ in Equation (2) is the precision at recall $\tilde{r}$.

$$AP = \sum_{n=0} \left(r_{n+1} - r_n\right) P_{\mathrm{interp}}\left(r_{n+1}\right) \tag{1}$$

in which:

$$P_{\mathrm{interp}}\left(r_{n+1}\right) = \max_{\tilde{r}:\, \tilde{r} \ge r_{n+1}} P(\tilde{r}) \tag{2}$$

To classify outcomes as True Positive (correct), False Negative (undetected), or False Positive (incorrect),
the confidence score and Intersection over Union (IoU) values are used. Details on evaluating a Mask R-CNN model can be found in [21], [22], and [13]. We used the functions from [19] to calculate the mAP.

To extend our handgun detection system, we used the Python OpenCV library to run real-time predictions on streaming video. This approach enabled predictions to be generated directly from the video captured by our laptop camera. If deployed on security cameras, such a system could provide efficient and prompt analysis for detecting handguns.

4. FINDINGS

Our model, trained for 25 epochs on the COCO-style formatted dataset, achieved 0.81 mAP on the training data and 0.78 mAP on the validation and test data. The instance segmentation results were satisfactory. Some examples of our predictions on the test data are shown below.

Figure 4. Model Prediction Examples

The examples above are predictions made on still images. We then ran tests on live video, and the results were also satisfactory, as seen in the screenshots below.

… surveillance. The aim of Management Information Systems (MIS) is to help people and managers make decisions using the right technologies. Using Mask R-CNN for security purposes is a great example of helping people through technology.

It is important to note that, although AI appears to bring great value to security, it may raise human rights concerns. Continuous surveillance by AI will require strict regulation to prevent it from being controlled or used by the wrong hands for improper purposes.

In this study, we aimed to demonstrate how security threats can be detected in both images and live video. In future studies, we aim to enhance our system's ability to detect a wide range of weapons in challenging environments and to generate alarm signals for security forces through real-time video analysis.

Author contributions: Ömer Faruk EREKEN: Methodology, data processing, writing-original draft preparation; Çiğdem TARHAN: Conceptualization, methodology, writing-reviewing and editing.

Conflicts of interest: The authors declare no conflicts of interest.

Ethical Statement: This article is an expanded version of the paper titled 'Modeling Objects With Artificial Intelligence Based Image Processing Techniques: Handgun Detection With MASK R-CNN' presented at the 10th International Conference on Management Information Systems (IMISC 2023), held on 18-20 October 2023.

Financial Disclosure: The authors declare that this study received no financial support.

REFERENCES
[7] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," 2016. [Online]. Available: https://arxiv.org/abs/1506.01497v3
[8] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," 2018. [Online]. Available: https://arxiv.org/abs/1703.06870v3
[9] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788.
[10] M. Hussain, "YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection," Machines, vol. 11, 2023, Art. no. 677. [Online]. Available: https://doi.org/10.3390/machines11070677
[11] R. Sapkota, D. Ahmed, and M. Karkee, "Comparing YOLOv8 and Mask R-CNN for instance segmentation in complex orchard environments," Artificial Intelligence in Agriculture, vol. 13, pp. 84–99, 2024. [Online]. Available: https://doi.org/10.1016/j.aiia.2024.07.001
[12] R. Olmos, S. Tabik, and F. Herrera, "Automatic handgun detection alarm in videos using deep learning," Neurocomputing, vol. 275, pp. 66–72, 2018. [Online]. Available: https://doi.org/10.1016/j.neucom.2017.05.012
[13] J. Salido, V. Lomas, J. Ruiz-Santaquiteria, and O. Deniz, "Automatic handgun detection with deep learning in video surveillance images," Applied Sciences, vol. 11, no. 13, p. 6085, 2021. [Online]. Available: https://doi.org/10.3390/app11136085
[14] A. A. Ahmed and M. Echi, "Hawk-eye: An AI-powered threat detector for intelligent surveillance cameras," IEEE Access, vol. 9, pp. 63283–63293, 2021.
[15] A. Goenka and K. Sitara, "Weapon Detection from Surveillance Images using Deep Learning," in 3rd International Conference for Emerging Technology (INCET), 2022, pp. 1–6. [Online]. Available: https://doi.org/10.1109/INCET54531.2022.9824281
[16] S. Khalid, A. Waqar, H. U. Ain Tahir, O. C. Edo, and I. T. Tenebe, "Weapon detection system for surveillance and security," in 2023 International Conference on IT Innovation and Knowledge Discovery (ITIKD), Manama, Bahrain, 2023, pp. 1–7. [Online]. Available: https://doi.org/10.1109/ITIKD56332.2023.10099733
[17] O. Veranyurt and C. O. Sakar, "Concealed pistol detection from thermal images with deep neural networks," Multimed Tools Appl, vol. 82, pp. 44259–44275, 2023. [Online]. Available: https://doi.org/10.1007/s11042-023-15358-1
[18] Y. Huang, X. Fu, and Y. Zeng, "Anchor-Free Weapon Detection for X-Ray Baggage Security Images," IEEE Access, vol. 10, pp. 97843–97855, 2022.
[19] W. Abdulla, "Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow," GitHub, 2017. [Online]. Available: https://github.com/matterport/Mask_RCNN
[20] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár, "Microsoft COCO: Common Objects in Context," 2015. [Online]. Available: https://arxiv.org/abs/1405.0312v3
[21] R. Padilla, S. L. Netto, and E. A. B. da Silva, "A Survey on Performance Metrics for Object-Detection Algorithms," in Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 2020.
[22] R. Padilla, W. L. Passos, T. L. B. Dias, S. L. Netto, and E. A. B. da Silva, "A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit," Electronics, vol. 10, p. 279, 2021.