A Novel Architecture For Web-Based Attack Detection Using Convolutional Neural Network
Article info
Article history: Received 13 April 2020; Revised 29 September 2020; Accepted 22 October 2020; Available online 27 October 2020
Keywords: Web attack detection; Cybersecurity; Anomaly-based detection; Deep learning; Convolutional neural network

Abstract
Unprotected Web applications are vulnerable points through which hackers can attack an organization's network. Statistics show that 42% of Web applications are exposed to threats and hackers. Hackers manipulate the requests that Web users send to Web applications in order to take control of Web servers, so Web queries must be inspected to prevent such manipulation. Web attack detection has become essential as information distribution has grown over the past decades, and anomaly-based methods built on machine learning are preferred in Web application security. This study proposes an anomaly-based Web attack detection architecture for Web applications using deep learning methods. The architecture consists of a data preprocessing step and a Convolutional Neural Network (CNN) step. The CSIC2010v2 dataset was used to demonstrate the suitability and success of the proposed CNN architecture, which detects Web attacks using anomaly-based detection. The experimental results show that the proposed CNN deep learning architecture achieves successful outcomes.
… 2013 report (The OWASP Project, 2013) by the Open Web Application Security Project (OWASP).

The literature contains many studies on Web application attack detection and prevention. However, these studies show that Web-based cyber-attack detection research is still limited because too few datasets are available. Intrusion detection and prevention are powerful techniques for detecting Web attacks. This study specifically aims to detect Web attacks with anomaly-based detection using a Convolutional Neural Network (CNN). It uses the CSIC2010v2 dataset (Torrano-Gimenez et al., 2010), which was created specifically for Web attack detection research.

There are two types of intrusion detection methods: signature-based detection and anomaly-based detection. Signature-based detection, also called misuse detection, relies on pattern matching and is used mainly for known attacks. Anomaly-based detection examines the behavior and structure of the data to detect zero-day and unknown attacks. It flags data that deviate from normal traffic (Tekerek and Bay, 2019; Tama et al., 2019).

The rest of this paper is organized as follows. Section 2 discusses related works. Section 3 presents the material and methods, including the datasets, a brief review of CNN, and the performance indicators. Section 4 presents the proposed CNN architecture. Section 5 gives the results and discussion, and Section 6 concludes the paper.

2. Related works

…ously to increase the stability of the system and make it easier to update. The authors ran experiments on the system with two concurrent deep models. They compared it with existing systems on the CSIC2010, FWAF, and HttpParams datasets. To protect Web applications against Web-based attacks, Luo et al. (29) proposed a new malicious URL detection method based on deep learning. Using the CSIC2010 dataset, they fed URLs as inputs to a composite neural network that encodes them automatically and performs detection. Jin et al. (2017) proposed Web-based attack detection using an AutoEncoder and an RNN to identify payload-based Web attacks. Their results show that both networks perform very promisingly in detecting Web attacks. Zhang et al. (2017) used a specially designed CNN to detect Web attacks. Their experiments on the CSIC2010 dataset show that the CNN performs well and achieves satisfactory detection results. Wang et al. (2018) explored deep learning methods by evaluating CNN, LSTM, and their combination. Their experiments show that deep learning methods can greatly outperform traditional methods, and they also analyze the factors that influence performance.

Machine learning models have enabled successful intrusion detection studies that classify Web application attacks. In this study, a model is proposed to prevent Web-based cyber-attacks using CNN, one of the deep learning methods.
… classes, and it is defined as in Eq. (1).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

Precision: the ratio of correctly classified attack data to all data classified as attack. A higher precision indicates a better ML model. Precision is defined as in Eq. (2).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

True Positive Rate (TPR): TPR, also called Recall, is the ratio of attack connection records correctly classified as attack to the total number of attack connection records. TPR is defined as follows.

$$\mathrm{TPR} = \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

F1-Score: the harmonic mean of Precision and Recall. F1-Score is defined as follows.

$$F1\text{-}\mathrm{Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$

True Negative Rate (TNR): the ratio of normal connection records correctly classified as normal to the total number of normal connection records. In other words, it is the number of items correctly identified as negative out of all negatives. TNR is defined as follows.

$$\mathrm{TNR} = \frac{TN}{TN + FP} \tag{5}$$

False Positive Rate (FPR): the proportion of normal connection records wrongly classified as anomalous. FPR is defined as follows.

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{6}$$

False Negative Rate (FNR): the proportion of cases with a known positive condition for which the test result is negative. Here, that is the proportion of anomalous connection records wrongly classified as normal. This rate is sometimes called the miss rate. FNR is defined as follows.

$$\mathrm{FNR} = \frac{FN}{FN + TP} \tag{7}$$

The Receiver Operating Characteristic (ROC) curve plots the TPR on the y-axis against the FPR on the x-axis for different classification thresholds. The Area Under the ROC Curve (AUC) is used together with the ROC curve as a benchmark for machine learning models. A higher AUC indicates a better model.

$$\mathrm{AUC} = \int_{0}^{1} \frac{TP}{TP + FN}\, d\!\left(\frac{FP}{TN + FP}\right) \tag{8}$$

3.2. Convolutional neural network (CNN)

CNN is a deep neural network method used mostly in image processing studies. Unlike multi-layered Artificial Neural Networks (ANN), a CNN is a special neural network in which at least one layer uses a convolution operation instead of general matrix multiplication. A CNN model consists of an input layer, convolution layers, pooling (subsampling) layers, fully connected layers, and an output layer.

Deep learning models are ANN models. However, deep learning algorithms differ structurally and numerically from classical ANNs. The main difference is that a CNN can have tens, hundreds, or even thousands of hidden layers, depending on the needs of the system. In a CNN, the linear weighted-sum operation that an ANN performs at each step is replaced by a convolution. CNN was first introduced by LeCun et al. (1990) for image processing. The first model had two basic features: spatially shared weights and spatial pooling. LeCun et al. (1998) further developed CNN algorithms and published the second CNN version, LeNet-5, in 1998. They applied the model to bank check images to improve handwritten digit classification and achieved success with a seven-level network.

A CNN has three basic layer types: the convolution layer, the pooling (sub-sampling) layer, and the fully connected layer. An example CNN architecture described by LeCun is shown in Fig. 1.

CNN learns abstract features of images through convolution and pooling. The first layers learn basic properties such as edges, curves, and colors. The last layers learn holistic structures, shapes, and parts of the objects in the image. As a result, the object in the image becomes clearer from the first layer to the last. This learning style resembles the way the human visual cortex processes images. Like humans, CNNs first see small pieces of the image and then, in the following layers, the bigger picture formed by combining these pieces.

In the CNN model, the input data are images that have been digitized and converted into matrix format. The matrix has width, height, and color (channel) dimensions. If the input image is grayscale, the matrix has a single color channel in addition to width and height (e.g., 512 × 512 × 1, where the trailing 1 denotes the gray channel). If the image is in RGB format, the matrix has three color channels (e.g., 512 × 512 × 3, where the trailing 3 denotes the red, green, and blue channels). CNN learns features from the data in this matrix format.

The basic operation a CNN uses to extract features is convolution. A smaller kernel matrix slides over the input matrix, mapping the features of the input. The kernel is shifted right and down until it has covered all image pixels. At each position, the overlapping image pixels and kernel values are multiplied element-wise and summed, and a bias value is added. The result becomes one pixel of the new feature map. The neuron representation is given in Fig. 2, and the basic convolution process is given in Fig. 3.
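The convolution and pooling operations described in this section can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation. The function names `conv2d_valid` and `max_pool2d` are hypothetical, and no padding is used. As in most deep learning libraries, the kernel is not flipped, so the operation is strictly cross-correlation. The 5 × 5 binary image and 3 × 3 kernel are example values matching the sizes discussed in Section 3.2.2.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray,
                 bias: float = 0.0, stride: int = 1) -> np.ndarray:
    """Slide an fh x fw x d kernel over an h x w x d image without padding.

    Each output pixel is the sum of the element-wise products over all d
    channels plus the bias, so one kernel yields one 2-D feature map of shape
    ((h - fh) // stride + 1, (w - fw) // stride + 1). The kernel is not flipped.
    """
    h, w, d = image.shape
    fh, fw, kd = kernel.shape
    if kd != d:
        raise ValueError("kernel depth must equal the number of image channels")
    out_h = (h - fh) // stride + 1
    out_w = (w - fw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):          # shift down
        for j in range(out_w):      # shift right
            r, c = i * stride, j * stride
            patch = image[r:r + fh, c:c + fw, :]
            out[i, j] = np.sum(patch * kernel) + bias
    return out


def max_pool2d(fmap: np.ndarray, n: int = 2, stride: int = 2) -> np.ndarray:
    """Max pooling: keep the maximum of each n x n window, shifting by `stride`.

    There are no trainable parameters, only the window size and the stride.
    """
    h, w = fmap.shape
    out_h = (h - n) // stride + 1
    out_w = (w - n) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = fmap[r:r + n, c:c + n].max()
    return out


# Example values: a 5 x 5 binary grayscale image (d = 1) and a 3 x 3 kernel.
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]], dtype=float)[:, :, None]
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]], dtype=float)[:, :, None]

feature_map = conv2d_valid(image, kernel)   # shape (5-3+1, 5-3+1) = (3, 3)
print(feature_map)
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]

pooled = max_pool2d(np.arange(16.0).reshape(4, 4))  # 2 x 2 windows, stride 2
print(pooled)
# [[ 5.  7.]
#  [13. 15.]]
```

A real CNN applies many such kernels in parallel, one output feature map per kernel, and learns the kernel values by training. The loops here are written for clarity rather than speed.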
4 computers & security 100 (2021) 102096
…fore, an accurate input size can be selected by considering criteria such as the success and determination of the neural network.

3.2.2. Convolution layer
Convolution extracts features from an input image. By learning image features over small squares of input data, it preserves the spatial relationship between pixels. In the convolution layer, new features are extracted by summing the element-wise products of the kernel and the input (image). Kernel sizes such as 2 × 2, 3 × 3, and 5 × 5 are applied, each represented by a matrix of the specified size. Each kernel applied to the input data extracts a different feature.

How kernels are selected and applied to the data coming from the input layer is very important. A kernel that is too large can cause some features in the input data to be overlooked. The kernel coefficients play the role of the weights in an ANN. They are trained in each iteration, and the kernels are updated according to the trained values.

The kernel is passed over all pixels of the input data in turn. The input values and kernel coefficients are multiplied separately for each channel of the input data. The products for each channel are summed, and the resulting value is written to the corresponding position of the output.

Convolution is a mathematical operation that takes two inputs, an image matrix and a kernel. It is illustrated in Fig. 4. For a single kernel with a stride of 1 and no padding, the output size equals the number of distinct positions the kernel can occupy on the image:

• the image matrix has dimension h × w × d,
• the kernel has dimension fh × fw × d,
• the output volume has dimension (h − fh + 1) × (w − fw + 1) × 1.

The kernel is smaller than the image. Consider, for example, a small two-dimensional 5 × 5 image with binary pixel values and a 3 × 3 matrix. The 3 × 3 matrix is the kernel. The matrix obtained by sliding it over the image and computing the dot product at each position is called the convolved feature, activation map, or feature map. In this example, the feature map is 3 × 3, since 5 − 3 + 1 = 3. The stride is the number of pixels by which the kernel slides over the original image (Mandal and Bhattacharya, 2019).

3.2.3. Pooling
Pooling decreases the size of the image while preserving its important features by reducing the height and width of the input data. This reduction speeds up the training of the neural network, although it causes some loss of features. A disadvantage of pooling is that it has no trainable parameters, only a fixed kernel size and stride. The most common pooling technique is max pooling. In max pooling, an n × n window (with n smaller than the side of the image) slides over the image, the maximum in each window is taken, and the window is shifted by the given stride. The complete process is shown in Fig. 5.

3.2.4. Fully connected layer
The last layer of a CNN is fully connected. Feature extraction, size reduction, and normalization are performed before the fully connected layer, and training is performed on the extracted features. The CNN weights are updated according to the error rate. Training continues until the desired convergence value is reached or a set number of steps is completed.

3.3. Datasets
In this study, the CSIC2010v2 HTTP dataset is used for experimentation and evaluation of the proposed model.

3.3.1. CSIC2010v2
The Spanish National Research Council (CSIC) 2010v2 HTTP dataset was developed in 2010 at the Information Security Institute. It is one of the best-known and most widely used datasets in Web security. Unlike older datasets such as KDD99 and DARPA, CSIC2010v2 targets Web attack detection specifically. It contains 104,000 normal and 119,585 malicious requests generated for an e-commerce Web site's shopping-cart application. The anomalous requests include attacks such as SQL injection, buffer overflow, information gathering, CRLF injection, XSS, and parameter tampering. CSIC2010v2 is the second, improved version of the earlier CSIC 2010 dataset. In the earlier dataset, some samples were not properly created, which led to biased results for some classification algorithms (Torrano-Gimenez et al., 2010).
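The experiments described later in this paper use a bag-of-words (BoW) representation. Each HTTP request is split into variable and variable = value tokens, and the frequencies of these tokens are counted. The following Python sketch shows one way to produce such tokens from CSIC-style requests using only the standard library. It is illustrative and rests on several assumptions:

- The helper names are hypothetical.
- Parameters are read only from the URL query string and a form-encoded body.
- The two requests are invented examples, not samples from CSIC2010v2.

The step that turns these counts into the CNN's input matrix is not shown.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def bow_tokens(request_line: str, body: str = "") -> list[str]:
    """Return 'variable' and 'variable=value' tokens for one HTTP request.

    request_line is e.g. "GET /path?a=1&b=2 HTTP/1.1"; body holds
    form-encoded POST parameters, if any. parse_qsl URL-decodes names and values.
    """
    parts = request_line.split()
    target = parts[1] if len(parts) > 1 else ""
    pairs = parse_qsl(urlsplit(target).query, keep_blank_values=True)
    pairs += parse_qsl(body, keep_blank_values=True)
    tokens = []
    for name, value in pairs:
        tokens.append(name)               # variable token
        tokens.append(f"{name}={value}")  # variable = value token
    return tokens


def bow_counts(requests) -> Counter:
    """Token frequencies over an iterable of (request_line, body) pairs."""
    counts = Counter()
    for request_line, body in requests:
        counts.update(bow_tokens(request_line, body))
    return counts


# Invented requests in the style of a shopping-cart application.
normal = ("GET /shop/item.jsp?id=3&qty=1 HTTP/1.1", "")
attack = ("GET /shop/item.jsp?id=3'+OR+'1'='1&qty=1 HTTP/1.1", "")
print(bow_counts([normal, attack]))
```

In this toy example, the variable tokens (`id`, `qty`) are shared by both requests. Only the variable = value token carrying the injected SQL fragment differs, which is the kind of signal a frequency-based representation can expose.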
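Table 1. Experimental results of the proposed CNN architecture on CSIC2010v2.

| Dataset | Epochs | F1 | Precision | Best accuracy | Average accuracy | FPR | FNR | Recall | TNR | TPR |
|---|---|---|---|---|---|---|---|---|---|---|
| CSIC2010v2 | 200 | 0.9696 | 0.9718 | 0.9644 | 0.9579 | 0.0400 | 0.0326 | 0.9674 | 0.9600 | 0.9674 |
| CSIC2010v2 | 400 | 0.9751 | 0.9743 | 0.9707 | 0.9684 | 0.0368 | 0.0240 | 0.9759 | 0.9631 | 0.9759 |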
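The indicators reported in Table 1 can be computed directly from confusion-matrix counts. The following Python sketch assumes, as in the metric definitions, that attack (anomaly) is the positive class. It uses the standard definition TNR = TN/(TN + FP), meaning correctly identified negatives out of all negatives. The class and function names and the confusion counts are hypothetical. The AUC is approximated with the trapezoidal rule over ROC points obtained by sweeping a score threshold. As a consistency check, the sketch also recomputes F1 from the 400-epoch precision and recall in Table 1.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Confusion-matrix counts with 'attack' (anomaly) as the positive class."""
    tp: int  # attack requests classified as attack
    tn: int  # normal requests classified as normal
    fp: int  # normal requests classified as attack
    fn: int  # attack requests classified as normal

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:  # also the TPR
        return self.tp / (self.tp + self.fn)

    def tnr(self) -> float:
        return self.tn / (self.tn + self.fp)

    def fpr(self) -> float:
        return self.fp / (self.fp + self.tn)

    def fnr(self) -> float:
        return self.fn / (self.fn + self.tp)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def roc_auc(scores, labels) -> float:
    """Area under the ROC curve (TPR vs. FPR) by the trapezoidal rule.

    scores: higher means 'more likely attack'; labels: 1 = attack, 0 = normal.
    Needs at least one sample of each class. Thresholds are swept from high
    to low, and tied scores are admitted together.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return auc


# Hypothetical counts, not taken from the paper's experiments.
c = ConfusionCounts(tp=90, tn=80, fp=20, fn=10)
print(c.accuracy(), c.recall(), c.tnr(), c.fpr())  # 0.85 0.9 0.8 0.2
assert math.isclose(c.tnr() + c.fpr(), 1.0)        # TNR = 1 - FPR

# F1 recomputed from the 400-epoch precision and recall in Table 1.
print(round(f1_score(0.9743, 0.9759), 4))  # 0.9751
```

TNR and FPR share the denominator TN + FP, so they always sum to 1. The same holds for TPR and FNR. Table 1 reflects this, for example 0.9600 + 0.0400 at 200 epochs.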
Table 2. Comparison of the proposed model with previous studies.

| Study | Technique | Dataset(s) | Performance metric |
|---|---|---|---|
| (Nguyen and Franke, 2012) | NaiveBayes | CSIC2010 | Accuracy: 72.78% |
| | BayesNetwork | | Accuracy: 82.79% |
| | Decision Stump | | Accuracy: 74.73% |
| | RBFNetwork | | Accuracy: 72.46% |
| | Majority Voting | | Accuracy: 81% |
| | Hedge/Boosting | | Accuracy: 82.1% |
| | A-IDS | | Accuracy: 90.52% |
| | A-ExIDS | | Accuracy: 90.98% |
| (Kozik et al., 2014) | Regular expression | CSIC2010 | Accuracy: 94.46% |
| (Epp et al., 2017) | SVM | CSIC2010v2; CSIC2012 | F1: 82%; F1: 93% |
| (Zhang et al., 2017) | CNN | CSIC2010 | Accuracy: 96.49% |
| Proposed | CNN | CSIC2010v2 | Accuracy: 97.07%; F1: 97.51% |
… way to transform HTTP requests into representations, a BoW-based architecture was used. Each HTTP request was split into variable = value pairs. The frequency of each variable and of each variable = value pair was then calculated over all anomalous and normal HTTP requests. Two experiments, with 200 and 400 epochs, were carried out on the dataset, and the results are presented in Table 1. Although the results are close to each other, slightly better results were obtained with 400 epochs. At 400 epochs, the average accuracy on CSIC2010v2 is 0.9684, and the other indicator results are presented in Table 1. The BoW-based architecture appears to represent the characters in HTTP requests well while retaining the information in the dataset.

A comparison of the proposed model with previous studies is presented in Table 2. The proposed model achieves the highest accuracy among the studies listed there.

As stated in Table 1, Fig. 11 presents the best accuracy performance graph for CSIC2010v2. At 400 epochs, the training accuracy is 1 and the best validation accuracy is 0.9707.

In the proposed architecture, two outputs were used for HTTP request classification: normal and anomaly. Binary cross-entropy is well suited to two-class classification. It combines a Sigmoid activation with a cross-entropy loss. Unlike the Softmax loss, it is independent for each class: the loss computed for each CNN output class is not affected by the other output values. Binary cross-entropy was therefore used as the loss function.

Stochastic Gradient Descent (SGD) was used as the optimization technique. Each SGD update uses only a small subset of the data, which makes it much faster than computing the full gradient. SGD is an iterative method for optimizing an objective function with suitable smoothness properties. It can be regarded as a stochastic approximation of gradient descent optimization, because it replaces the actual gradient calculated from the entire dataset with an estimate calculated from a randomly selected subset of the data.

Fig. 12 presents the training and validation loss graph for CSIC2010v2. The lowest loss value is 0.0993, also at 400 epochs.
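The following is a minimal NumPy sketch of the binary cross-entropy loss and mini-batch SGD described above. It trains only a single dense layer with two sigmoid outputs (normal, anomaly). The input is made-up two-dimensional features standing in for features the CNN would extract. The function names, learning rate, batch size, epoch count, and data are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(y_true: np.ndarray, y_prob: np.ndarray,
                         eps: float = 1e-12) -> float:
    """Mean BCE over all outputs; each sigmoid output is scored independently."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


def train_sigmoid_layer_sgd(X, Y, lr=0.5, epochs=200, batch_size=4, seed=0):
    """Mini-batch SGD on BCE for one dense layer with sigmoid outputs.

    X: (n_samples, n_features); Y: (n_samples, n_outputs) with 0/1 targets.
    Each update uses the gradient of a randomly chosen mini-batch instead of
    the gradient over the entire dataset.
    """
    rng = np.random.default_rng(seed)
    n, f = X.shape
    k = Y.shape[1]
    W = np.zeros((f, k))
    b = np.zeros(k)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            p = sigmoid(X[idx] @ W + b)
            # d(mean BCE)/d(logits) for sigmoid outputs is (p - y) / (#elements)
            grad_logits = (p - Y[idx]) / (len(idx) * k)
            W -= lr * X[idx].T @ grad_logits
            b -= lr * grad_logits.sum(axis=0)
    return W, b


# Made-up, already-extracted features: four "normal" and four "anomaly" samples.
X = np.array([[-1.0, -0.8], [-0.9, -1.1], [-1.2, -0.9], [-0.8, -1.0],
              [1.0, 0.9], [0.8, 1.1], [1.1, 1.0], [0.9, 0.8]])
Y = np.array([[1, 0]] * 4 + [[0, 1]] * 4, dtype=float)  # columns: normal, anomaly

W, b = train_sigmoid_layer_sgd(X, Y)
probs = sigmoid(X @ W + b)
print(binary_cross_entropy(Y, probs))  # below ln 2, the loss at initialisation
print(probs.argmax(axis=1))            # predicted class index per sample
```

With all weights initialised to zero, every output starts at 0.5, so the initial loss is ln 2 ≈ 0.693. Each SGD step then lowers the loss using only a 4-sample batch. In the full architecture, the same loss would be back-propagated through the convolution and pooling layers as well.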
… research is also required to automatically find the optimal variable and value parameters for the CNN architecture. The Web-based intrusion detection architecture proposed in this paper achieves good accuracy, FPR, and TPR values. According to these metric results, the proposed architecture is sufficiently successful. Methods such as CNN-based deep learning are already numerous and varied enough for Web-based attack detection, but the number and content of available datasets are very limited. According to the results, the proposed architecture can be used to protect Web applications against Web-based attacks. Future studies may include more attack datasets and consider multi-class classification techniques using the proposed CNN-based classification architecture.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Adem Tekerek: Conceptualization, Methodology, Software, Data curation, Writing - original draft, Visualization, Investigation, Validation, Writing - review & editing.

Acknowledgment

This study is supported by NVIDIA Corporation. All experimental studies were performed on a TITAN XP graphics card donated by NVIDIA. I would like to thank NVIDIA for its support.

References

Verdouw CN, Wolfert J, Beulens AJ, Rialland A. Virtualization of food supply chains with the internet of things. J. Food Eng. 2016;176:128–36.
Shankar V, Venkatesh A, Hofacker C, Naik P. Mobile marketing in the retailing environment: current insights and future research avenues. J. Interact. Mark. 2010;24(2):111–20.
Pantano E. Ubiquitous retailing innovative scenario: from the fixed point of sale to the flexible ubiquitous store. J. Technol. Manage. Innov. 2013;8(2):84–92.
Pantano E, Priporas C-V. The effect of mobile retailing on consumers' purchasing experiences: a dynamic perspective. Comput. Human Behav. 2016;61:548–55.
Friedewald M, Raabe O. Ubiquitous computing: an overview of technology impacts. Telemat. Inform. 2011;28(2):55–65.
Armbrust M, Fox A, Griffith R, Joseph AD, Katz R, Konwinski A, Lee G, Patterson D, Rabkin A, Stoica I, et al. A view of cloud computing. Commun. ACM 2010;53(4):50–8.
Dinh HT, Lee C, Niyato D, Wang P. A survey of mobile cloud computing: architecture, applications, and approaches. Wirel. Commun. Mobile Comput. 2013;13(18):1587–611.
Marston S, Li Z, Bandyopadhyay S, Zhang J, Ghalsasi A. Cloud computing: the business perspective. Decis. Support Syst. 2011;51(1):176–89.
Booth TG, Andersson K. Elimination of DoS UDP reflection amplification bandwidth attacks, protecting TCP services. In: International Conference on Future Network Systems and Security. Springer; 2015. p. 1–15.
Rabai LBA, Jouini M, Aissa AB, Mili A. A cybersecurity model in cloud computing environments. J. King Saud Univ.-Comput. Inf. Sci. 2013;25(1):63–75.
Sumra HBH, AbManan J-lB. Attacks on security goals (confidentiality, integrity, availability) in VANET: a survey. In: Vehicular Ad-Hoc Networks for Smart Cities. Springer; 2015. p. 51–61.
Cherdantseva Y, Burnap P, Blyth A, Eden P, Jones K, Soulsby H, Stoddart K. A review of cybersecurity risk assessment methods for SCADA systems. Comput. Sec. 2016;56:1–27.
Halfond WG, Viegas J, Orso A. A classification of SQL-injection attacks and countermeasures. In: Proceedings of the IEEE International Symposium on Secure Software Engineering, vol. 1. IEEE; 2006. p. 13–15.
Johari R, Sharma P. A survey on web application vulnerabilities (SQLIA, XSS) exploitation and security engine for SQL injection. In: International Conference on Communication Systems and Network Technologies (CSNT). IEEE; 2012. p. 453–8.
Kumar P, Pateriya R. A survey on SQL injection attacks, detection, and prevention techniques. In: Third International Conference on Computing Communication & Networking Technologies (ICCCNT). IEEE; 2012. p. 1–5.
Hassan MM, Nipa SS, Akter M, Haque R, Deepa FN, Rahman M, Sharif MH. Broken authentication and session management vulnerability: a case study of Web application. Int. J. Simul. Syst. Sci. Technol. 2018;19(2):6-1.
The OWASP Project. OWASP Top Ten 2013 Project; 2013. Available: https://www.owasp.org/index.php/Top_10_2013-Top_10
Torrano-Gimenez C, Perez-Villegas A, Marañón GÁ, et al. An anomaly-based approach for intrusion detection in web traffic. J. Inf. Assur. Sec. 2010;5(4):446–54.
Tekerek A, Bay OF. Design and implementation of an artificial intelligence-based web application firewall model. Neural Netw. World 2019;29(4):189–206.
Tama BA, Comuzzi M, Rhee K-H. TSE-IDS: a two-stage classifier ensemble for intelligent anomaly-based intrusion detection system. IEEE Access 2019;7:94497–507.
Nguyen HT, Torrano-Gimenez C, Alvarez G, Petrović S, Franke K. Application of the generic feature selection measure in detection of web attacks. In: Computational Intelligence in Security for Information Systems. Springer; 2011. p. 25–32.
Nguyen HT, Franke K, Petrović S. Reliability in a feature selection process for intrusion detection. In: Reliable Knowledge Discovery. Springer; 2012. p. 203–18.
Nguyen HT, Franke K. Adaptive intrusion detection system via online machine learning. In: 12th International Conference on Hybrid Intelligent Systems (HIS). IEEE; 2012. p. 271–7.
Kozik R, Choraś M, Renk R, Hołubowicz W. Modelling HTTP requests with regular expressions for detection of cyber attacks targeted at web applications. In: International Joint Conference SOCO'14-CISIS'14-ICEUTE'14; 2014. p. 527–35.
Epp N, Funk R, Cappo C, Lorenzo-Paraguay S. Anomaly-based web application firewall using HTTP-specific features and one-class SVM. In: Workshop Regional de Segurança da Informação e de Sistemas Computacionais; 2017.
Tian Z, Luo C, Qiu J, Du X, Guizani M. A distributed deep learning system for Web attack detection on edge devices. IEEE Trans. Ind. Inform. 2019.
Luo C, Su S, Sun Y, Tan Q, Han M, Tian Z. A convolution-based system for malicious URLs detection.
Jin X, Cui B, Yang J, Cheng Z. Payload-based web attack detection using deep neural networks. In: International Conference on Broadband and Wireless Computing, Communication and Applications. Cham: Springer; 2017. p. 482–8.
Zhang M, Xu B, Bai S, Lu S, Lin Z. A deep learning method to detect web attacks using a specially designed CNN. In: International Conference on Neural Information Processing. Cham: Springer; 2017. p. 828–36.
Wang J, Zhou Z, Chen J. Evaluating CNN and LSTM for web attack detection. In: Proceedings of the 2018 10th International Conference on Machine Learning and Computing; 2018. p. 283–7.
LeCun Y, et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems; 1990. p. 396–404.
LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc. IEEE 1998;86(11):2278–324.
Mandal JK, Bhattacharya D. Emerging Technology in Modelling and Graphics: Proceedings of IEM Graph 2018 (Vol. 937). Springer; 2019.
Data preprocessing in detail: learn how to create better models and predictions using data preprocessing; 2019. Available: https://developer.ibm.com/technologies/analytics/articles/data-preprocessing-in-detail

Adem Tekerek is a Doctor Instructor in the Department of Information Technology at Gazi University. He graduated from the Electronics and Computer Education Department of the Faculty of Technical Education in 2007. He completed the MSc program of the Informatics Institute in 2010 with a master's thesis on Content Management Systems. He completed the PhD program of the Informatics Institute in 2016 with a thesis on Web Application Firewall algorithms. He has published 24 papers in computer science. His research interests are data mining, machine learning, and their applications, especially in information security.