computers & security 100 (2021) 102096

Available online at www.sciencedirect.com

journal homepage: www.elsevier.com/locate/cose

A novel architecture for web-based attack detection using convolutional neural network
Adem Tekerek
Gazi University, Technology Faculty, Computer engineering Department, Ankara, Turkey

Article info

Article history:
Received 13 April 2020
Revised 29 September 2020
Accepted 22 October 2020
Available online 27 October 2020

Keywords:
Web attack detection
Cybersecurity
Anomaly-based detection
Deep learning
Convolutional neural network

Abstract

Unprotected Web applications are vulnerable entry points through which hackers attack an organization's network. Statistics show that 42% of Web applications are exposed to threats and hackers. Hackers manipulate the requests that Web users send to Web applications in order to control Web servers, so malicious Web queries must be detected to prevent such manipulation. Web attack detection has become essential to information distribution over the past decades. Anomaly methods based on machine learning are preferred in Web application security. This study proposes an anomaly-based Web attack detection architecture for Web applications using deep learning methods. The architecture consists of data preprocessing and Convolution Neural Network (CNN) steps. To demonstrate the suitability and success of the proposed CNN architecture, the CSIC2010v2 datasets were used. The proposed architecture detects Web attacks using anomaly-based detection. Based on the experimental results of the study, the proposed CNN deep learning architecture presented successful outcomes.

© 2020 Elsevier Ltd. All rights reserved.

E-mail address: atekerek@gazi.edu.tr
https://doi.org/10.1016/j.cose.2020.102096
0167-4048/© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

Web applications have become the most used applications of the Internet because of the high rate of Internet usage. Web applications enable organizations to increase their revenue and improve their business processes, such as adopting a virtualization or business platform in the supply chain (Verdouw et al., 2016). The Internet has changed business life dramatically through the 'stay connected' feature, which enables users to interact with each other anywhere and anytime (Shankar et al., 2010; Pantano, 2013; Pantano and Priporas, 2016). The Internet has contributed significantly to the development of various types of Internet-based computing institutions, such as information technologies (Friedewald and Raabe, 2011), cloud computing (Armbrust et al., 2010), and mobile cloud computing (Dinh et al., 2013). Running application services are not dependent on local computing resources; instead, various services such as software, storage, and servers are delivered to the user's computers or devices using Web applications over the Internet (Marston et al., 2011).
Availability and heavy usage make Web applications a target of cyber-attacks. Attackers try to make the Web application unavailable by sending a malicious request (Booth and Andersson, 2015). Attackers could compromise a vulnerable Web application, thus compromising the confidentiality, integrity, and availability of the organization's resources (Rabai et al., 2013; Hasbullah and AbManan, 2015; Cherdantseva et al., 2016). This might result in financial loss and unrecoverable damage to organizations. Cyber-attacks can damage Web applications in different ways: destroying specific resources, stealing data from databases, disrupting service, or gaining access to the Web application. Structured Query Language (SQL) injection (Halfond et al., 2006), cross-site scripting (XSS) (Johari and Sharma, 2012), server-side include (Kumar and Pateriya, 2012), and broken authentication (Hassan et al., 2018) are defined as Web application security risks in the Top 10 Security Vulnerabilities
2013 report (The Open Web Application Security Project, 2013) by the Open Web Application Security Project (OWASP).
In the literature, there are many studies about Web application attack detection and prevention. However, these studies show that the number of Web-based cyber-attack detection studies is still limited because of insufficient datasets. Intrusion detection and prevention are powerful techniques for detecting Web attacks. This study specifically aims to detect Web attacks using anomaly-based detection with a Convolution Neural Network (CNN). The CSIC2010v2 dataset (Torrano-Gimenez et al., 2010), which was created specifically for Web attack detection research, is used.
There are two types of intrusion detection methods: signature-based detection and anomaly-based detection. Signature-based detection is a misuse or pattern-matching method used particularly for known attacks. Anomaly-based detection is used for zero-day and unknown attack detection by examining the behavior and structure of the data. Anomaly-based detection is an intrusion detection type that detects data that differ from the standard traffic type (Tekerek and Bay, 2019; Tama et al., 2019).
This study is organized as follows: In Section 2, related works are discussed. In Section 3, material and methods are provided, which include datasets, a brief review of CNN, and indicators. In Section 4, the proposed CNN architecture is presented. In Section 5, results and discussion are given. Section 6 concludes the paper.

2. Related works

Many anomaly-based detection studies on Web applications have been proposed in the literature. Studies that apply machine learning methods to cyber-attack detection on Web applications are discussed as follows.
Torrano-Gimenez et al. (2010) proposed a study that performed static and dynamic detection to identify certain kinds of attacks. Nguyen et al. (2011, 2012) suggested a generic feature selection method and selected features using CSIC-2010 to detect Web-based cyber-attacks. The 30 selected features are used for anomaly-based cyber-attack classification in the CSIC2010 dataset. Nguyen and Franke (2012) proposed an adaptive detection model based on an ensemble of different classifiers; the reported accuracies are 72.78% for NaiveBayes, 82.79% for BayesNetwork, 74.73% for Decision Stump, 72.46% for RBFNetwork, 81% for Majority Voting, 82.1% for Hedge/Boosting, 90.52% for A-IDS, and 90.98% for A-ExIDS, which outperforms similar algorithms. Kozik et al. (2014) proposed a method that used HTTP request headers for classification to detect Web-based attacks; the detection rate of the proposed method is 94.46%. For Web-based attack detection, Epp et al. (2017) proposed an anomaly-based detection model using SVM. The authors used the CSIC2010 and CSIC2012v2 datasets and achieved a reasonable performance: the average F1 score of the proposed method is 0.93 and the average TPR is 0.95. Tian et al. (2019) designed a system to detect Web-based attacks and deployed it on edge devices. The authors used the Cloud to deal with the Internet of Things, and multiple deep models were used simultaneously to increase the stability of the system and to ease its updating. The authors ran experiments on the system with two concurrent deep models and compared it with existing systems by using the CSIC2010, FWAF, and HttpParams datasets. To protect Web applications against Web-based attacks, Luo et al. (29) proposed a new malicious URL detection method based on deep learning. The authors fed the URLs as inputs to a composite neural network for automatic encoding and detection, using the CSIC2010 dataset. Jin et al. (2017) proposed a Web-based attack detection method using AutoEncoder and RNN to identify payload-based Web attacks. The results show that both networks have a very promising performance in detecting Web attacks. Zhang et al. (2017) presented a study that used a CNN to detect Web attacks. The experimental results on the CSIC2010 dataset show that the designed CNN performs well and that the method achieves satisfactory results in detecting Web attacks. Wang et al. (2018) explored deep learning methods and evaluated CNN, LSTM, and their combination. Compared with the traditional methods, their experimental results show that deep learning methods can greatly outperform the traditional methods. They also analyzed the different factors influencing the performance.
Machine learning models have enabled successful studies on intrusion detection through the classification of Web application attacks. In this study, a model is proposed to prevent Web-based cyber-attacks by using CNN, which is one of the deep learning methods.

3. Material and method

In this section, the materials and methods used in the study are briefly explained. CNN is a deep neural network method consisting of hidden layers with convolution and subsampling functions, in addition to a nonlinear activation function. The CSIC2010v2 HTTP dataset was used to test the proposed CNN deep learning model.

3.1. Indicators

Evaluation metrics are required in scientific studies to assess classification performance. The data consist of classes labeled as normal or anomaly. In this study, the following most commonly used evaluation metrics are considered.
True Positive (TP): an outcome where the model correctly predicts the positive class.
True Negative (TN): an outcome where the model correctly predicts the negative class.
False Positive (FP): an outcome where the model incorrectly predicts the positive class.
False Negative (FN): an outcome where the model incorrectly predicts the negative class.
Accuracy: the ratio of accurately detected data to the entire test dataset. The higher the accuracy value, the more successful the ML model. Accuracy serves as a good measurement for a test dataset that contains balanced
classes and it is defined as in Eq. (1).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

Precision: the ratio of correctly classified attack data to the number of all data classified as attack. The higher the precision, the better the ML model. It is defined as in Eq. (2).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

True Positive Rate (TPR): TPR, also called Recall, is the ratio of the correctly classified attack connection records to the total number of attack connection records. TPR is defined as follows.

$$\mathrm{TPR} = \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

F1-Score: F1-Score is the harmonic mean of Precision and Recall. F1-Score is defined as follows.

$$\mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$

True Negative Rate (TNR): TNR is the ratio of normal connection records correctly classified as normal to the total number of normal connection records, that is, the number of items correctly identified as negative out of total negatives. TNR is defined as follows.

$$\mathrm{TNR} = \frac{TN}{TN + FP} \tag{5}$$

False Positive Rate (FPR): FPR is the ratio of normal connection records wrongly classified as anomaly connection records to the total number of normal connection records. FPR is defined as follows.

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{6}$$

False Negative Rate (FNR): FNR is the proportion of the individuals with a known positive condition for which the test result is negative. This rate is sometimes called the miss rate; it is the ratio of anomaly connection records wrongly classified as normal connection records.

$$\mathrm{FNR} = \frac{FN}{FN + TP} \tag{7}$$

The Receiver Operating Characteristics (ROC) curve plots the TPR on the y-axis against the FPR on the x-axis. The Area Under the ROC Curve (AUC) is the size of the area under the ROC curve and is used together with ROC as a benchmark for machine learning models. The higher the AUC, the better the machine learning model.

$$\mathrm{AUC} = \int_{0}^{1} \frac{TP}{TP + FN} \, d\!\left(\frac{FP}{TN + FP}\right) \tag{8}$$

3.2. Convolutional neural network (CNN)

CNN is a deep neural network method mostly used in image processing studies. Unlike a multi-layered Artificial Neural Network (ANN), a CNN is a special neural network in which at least one of the layers uses a convolution process instead of matrix multiplication. A CNN model consists of an input layer, convolution layer, pooling or subsampling layer, fully connected layer, and output layers.
Deep learning models are ANN models. However, deep learning algorithms differ structurally and numerically from ANNs. Their main difference is that a CNN can consist of tens, hundreds, and even thousands of hidden layers, depending on the need of the system. CNN is a deep learning model in which the linear regression process performed for each step in the ANN is replaced by a convolution process. CNN was first introduced by LeCun et al. (1990) for image processing. The first model consisted of two basic features: Spatial Shared Weights and Spatial Pooling. LeCun et al. (1998) developed CNN algorithms and published the second CNN version as LeNet-5 in 1998. They applied the model on several bank images to increase handwriting number classification success and achieved success in the first 7 levels.
CNN consists of three basic layers: the convolution layer, the pooling or sub-sampling layer, and the fully-connected layer. An exemplary CNN architecture depicted by LeCun is shown in Fig. 1.
CNN aims to learn the abstract features of images by using convolution and pooling processes. Features learned in the first layers of the CNN consist of basic properties such as edges, curves, and colors. The features learned by the last layers consist of the holistic structures, shapes, and parts of the objects in the image. Thus, the object in the image becomes clearer from the first layer to the last layers. It has been determined that this learning style has similar features to the visual cortex of humans. CNNs, like human beings, first see the small pieces of the image and, in the following layers, see the big picture formed by the combination of these pieces.
In the CNN model, the input data are images that are digitized and converted into matrix format. The matrix consists of width, height, and color dimensions. If the input image is grayscale, the matrix has width, height, and a single color channel (such as 512 × 512 × 1, where the trailing 1 represents the gray channel). If the image consists of colored data in RGB format, the color dimension has three channels (such as 512 × 512 × 3, where the trailing 3 represents the red, green, and blue channels). CNN learns the features from the data in this matrix format.
The basic process that CNN uses to determine the features is convolution. The convolution takes place by moving the kernel matrix, which is a smaller matrix, over the input matrix. Thus, the features of the input matrix are mapped. The convolution process is repeated to cover all image pixels by shifting to the right and down on the image matrix. The mathematical operation performed in the convolution process is the sum of the products of the overlapping image pixels and kernel pixels, to which the bias value is then added. Thus, the result obtained in each step represents one pixel of the new feature map. The neuron representation is given in Fig. 2 and the basic convolution process is given in Fig. 3.
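The convolution step described above (sum of the products of overlapping image and kernel pixels, plus a bias) can be illustrated with a minimal NumPy sketch. The function name `conv2d_valid`, the toy 5 × 5 input, and the all-ones 3 × 3 kernel are assumptions made for illustration only; they are not taken from the paper, and the sketch assumes a single channel, a stride of 1, and no padding.

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Hypothetical helper for illustration. As is usual in CNNs, the kernel
    is not flipped (strictly speaking this is cross-correlation).
    """
    h, w = image.shape
    fh, fw = kernel.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + fh, j:j + fw]
            # Sum of element-wise products of the overlapping pixels, plus bias.
            out[i, j] = np.sum(window * kernel) + bias
    return out

# Toy single-channel 5 x 5 input and 3 x 3 kernel (assumed values).
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))
feature_map = conv2d_valid(image, kernel, bias=1.0)
print(feature_map.shape)  # (3, 3)
```

Each entry of `feature_map` corresponds to one position of the kernel on the input, which is why a 5 × 5 input and a 3 × 3 kernel give a 3 × 3 map in this setting.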
Fig. 1 – Basic CNN architecture introduced by LeCun et al. (1998).

Fig. 2 – Simple neuron representation of CNN.

In Fig. 2, x represents the pixels in the input image, w represents the pixels in the kernel, b represents the bias value, y represents the feature map pixel, and f is the activation function.
The convolution operation is obtained by multiplying the input data matrix by the kernel data matrix and adding the bias term. An input matrix (5 × 5 × 3) and a kernel matrix (3 × 3 × 3) are given in Fig. 3. In Fig. 3, a new feature map in dimensions (3 × 3 × 3) is created by moving the kernel matrix over the input matrix. The size of the created feature map varies depending on the size of the input image, the filter size, the number of empty edges on the input (padding), and the number of steps of the kernel on the input (stride).

Fig. 3 – Basic convolution process.

3.2.1. Input layers

An ANN consists of three kinds of layers: input, hidden, and output layers. The input layer is used to feed data to the neural network for training. Although CNN is a type of deep learning, it is also an ANN. The architecture of CNN is formed of input, hidden, and output layers, respectively, as in the standard ANN structure. An image is used as input data in the first layer of CNN. The image pixels entered from the input layer are converted to numerical expressions. If the input is a colored or gray-level image, each pixel is expressed as a numerical value between 0 and 255. If the input is black and white, the pixels of the image are expressed as values of 0 or 1.
The numeric values representing the pixels of the input are stored in a matrix. The matrix properties may vary depending on the properties of the image; for example, a 16 × 16 × 1 matrix is used for a grayscale image of 16 × 16 pixels, where 16 × 16 is width × height and 1 is the grayscale dimension. This matrix can be flattened into a vector with 16 × 16 × 1 = 256 elements. If the image is colored, the matrix size is 16 × 16 × 3, and when the matrix is flattened, the vector consists of 16 × 16 × 3 = 768 elements. The colored inputs are represented directly in 3 dimensions. Each color dimension is called a channel because each input has RGB (Red, Green, Blue) channels. Large inputs increase memory usage in the training of the CNN, which slows down the calculations. However, extracting more features can yield effective results for a more successful prediction. Likewise, small inputs use less memory and speed up calculations for neural network training. To obtain the best accuracy in a neural network model, it is necessary to create the optimum network. Thus, the accuracy of the predicted outputs can be as successful as desired. Therefore, an accurate input size can be selected by considering criteria such as the success and determination of the neural network.

3.2.2. Convolution layer

Convolution extracts features from an input image and preserves the relationship between pixels by learning image features using small squares of input data. In the convolution layer, new features are extracted by applying the sum of the matrix multiplication process between the kernel and the input (image). Applied kernel sizes can vary, for example 2 × 2, 3 × 3, or 5 × 5, and each kernel is represented by a matrix of the specified size. Each kernel applied to the input data is used to extract a different feature.
How kernels are selected and applied to the data coming from the input layer is very important. A large kernel causes some of the features in the input data to be overlooked. The coefficients of the kernels represent the weights in ANN. In each iteration, these kernel coefficients are trained. According to the trained values, the kernel coefficients are updated and new kernels are obtained.
The kernel is passed over all pixels of the input data in turn. The input data coefficients and the kernel coefficients are multiplied, and this process is performed separately for each channel of the input data. The results obtained for each channel are calculated separately by the sum of the matrix multiplication process, and the obtained value is written to the corresponding point of the output data.
The multiplication process is a mathematical operation that takes two inputs, an image matrix and a kernel. The multiplication process is given in Fig. 4. The size of the output is the total number of unique kernel-sized windows that can be extracted from the image.

Fig. 4 – Image multiplies kernel matrix.

• An image matrix of dimension h ∗ w ∗ d,
• A kernel of dimension fh ∗ fw ∗ d,
• Outputs a volume of dimension (h − fh + 1) ∗ (w − fw + 1) ∗ 1

The kernel size is smaller than the image size. Consider a small two-dimensional 5 × 5 image with binary pixel values and another 3 × 3 matrix. The 3 × 3 matrix is known as a kernel, and the matrix produced by gliding the kernel over the image and computing the dot product is called the convolved feature, activation map, or feature map. Stride is the number of pixels by which the kernel slides over the original image (Mandal and Bhattacharya, 2019).

3.2.3. Pooling

The pooling operation decreases the size of the image but preserves the important features in the image. The height and width of the input data are reduced by pooling. Decreasing the height and width values accelerates the training of the neural network even if it causes some loss of features. The disadvantage of pooling is the lack of parameters to be trained: pooling has fixed kernel size and stride parameters. The most common type of pooling is max pooling. In max pooling, a window of n × n, where n is less than the side of the image, is slid over the image; the maximum in that window is determined and the window is then shifted by the given stride length. The complete process is shown in Fig. 5.

3.2.4. Fully connected layer

The last layer of CNN is fully connected. Feature extraction, size reduction, and normalization are performed before the fully connected layer. The training process is performed according to the extracted features. The CNN weights are calculated according to the error rate, and training continues until the desired convergence value is obtained or a certain number of steps is completed.

3.3. Datasets

In this study, the CSIC2010v2 HTTP dataset is used for experimentation and evaluation of the proposed model.

3.3.1. CSIC2010v2

The Spanish Research National Council (CSIC) 2010v2 HTTP dataset, which was developed in 2010 at the Information Security Institute, is one of the most famous datasets and is widely used in the field of Web security. Compared with older datasets, e.g., KDD99 and DARPA, CSIC2010v2 aims especially at Web attack detection. The dataset contains 104,000 normal and 119,585 malicious requests created on an e-commerce Web site shopping-cart application. The anomaly requests include attacks such as SQL injection, buffer overflow, information gathering, CRLF injection, XSS, and parameter tampering. CSIC2010v2 is the second, improved version of the prior dataset CSIC 2010, in which some samples were not properly created. As a result, the former dataset leads to biased results for some classification algorithms (Torrano-Gimenez et al., 2010).
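As a supplement to the layer descriptions in Sections 3.2.2 and 3.2.3, the following NumPy sketch illustrates the output-size rule (h − fh + 1) × (w − fw + 1) × 1 and max pooling with an n × n window and a given stride. The helper names `conv_output_shape` and `max_pool2d` and the toy 4 × 4 input are hypothetical and chosen only for illustration; the sketch assumes a single channel and no padding.

```python
import numpy as np

def conv_output_shape(h, w, fh, fw):
    """Output volume of a stride-1, unpadded convolution with one kernel."""
    return (h - fh + 1, w - fw + 1, 1)

def max_pool2d(x, n=2, stride=2):
    """Slide an n x n window over x and keep the maximum of each window."""
    h, w = x.shape
    out_h = (h - n) // stride + 1
    out_w = (w - n) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + n, c:c + n].max()
    return out

# Toy 4 x 4 feature map (assumed values).
x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [3, 4, 1, 5]])
pooled = max_pool2d(x, n=2, stride=2)
print(conv_output_shape(5, 5, 3, 3))  # (3, 3, 1)
print(pooled)                         # [[6 4]
                                      #  [7 9]]
```

With a 2 × 2 window and a stride of 2, the height and width are halved while the largest activation in each window is kept, and no parameters are trained in this step.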
6 computers & security 100 (2021) 102096

Fig. 5 – Max pooling process.

most important reason for selecting URL and payload portions


4. Proposed CNN architecture are the vast majority of Web attacks by manipulating URLs
and payloads. The preprocessing is to segment the URL and
In this section, anomaly HTTP request detection experiments
payloads to a sequence of words. The URL and payload por-
are presented and discussed. Malicious internal or external
tions are uniquely parsed to create the dictionary cluster. Fig. 7
anomaly behaviors threaten Web applications. To detect a ma-
shows a piece of HTTP request URL and payload.
licious threat, it is necessary to detect outlier behavior. There-
The preprocessing consists of two steps. The first step is
fore, it is important to detect any malicious anomaly behavior.
the production of dictionaries; the second step is the produc-
The proposed approach can detect more than one behavior of
tion of matrixes. In all raw data, dataset is used to produce
malicious threats; however, the ultimate goal is to detect one
dictionaries. The URL and payloads are uniquely selected and
anomaly behavior per threat: HTTP request URL, User-Agent,
split into variables and values (variable=value) by special vari-
Accept-Language, Connection, Content-Length and payload
able and values filter which has special characters like “/”, “&”,
etc. HTTP anomaly detection can be divided into two cate-
“+”, etc. The number of same payloads in different HTTP re-
gories: stream-based and payload-based. Stream data records
quests are calculated as frequency by using the Bag-of-Words
statistics of the HTTP header and traffic, so stream-based
(BoW) technique. The BoW is a simple and flexible machine
anomaly detection is used to detect traffic attacks. The pay-
learning modeling method used to extract features from text.
load refers to the data portion of an HTTP packet and payload-
BoW is the histogram of words in a text, and every word count
based is used for anomaly detection. In this study, URL and
is considered as a feature. Thus, the frequency is determined
payload parameters are selected because these parameters
how many times the URL and payloads are repeated in the
can be manipulated by internal or external threats and these
whole dataset. The frequency is used as an important feature.
parameters are composed of users’ behaviors. URL and pay-
In Fig. 8, dictionary production process steps are presented.
load parameters are important for classification as well. These
After dictionary production, as shown in Fig. 8, HTTP re-
two parameters extracted from raw data using HTTP methods
quests labeled as normal and anomaly in dataset were used
such as GET, POST, and PUT. Anomaly and normal features are
for matrix production. For each HTTP request a 2-dimensional
produced by using URL and payload via the convolution layer,
array was created in the size of dictionaries and HTTP re-
and anomaly HTTP requests are determined.
quests. A 200 × 170 × 1 matrix was created to use CNN for each
HTTP request. If the payload matches an entry in the dictio-
4.1. Preprocess dataset nary, the label is set to 1 that is represented with white pixel
in image; if it does not, it is set to 0 that is represented with
Data preprocessing is the transformation of raw data into a black pixel in image. In Fig. 9, HTTP request matrix production
trainable format. Raw data may contain incomplete, incon- process is presented.
sistent, or inaccurate data. Data preprocessing is a proven The images created as given in Fig. 9 are used CNN model
method to solve such problems found in raw data. The HTTP input data for training. An image is created for each HTTP re-
requests in the CSIC2010v2 dataset are organized in a text quest. CNN based training model is presented in Fig. 10.
form. Fig. 6 shows an example and a piece of HTTP raw data
randomly retrieved from the dataset. The data preprocessing
aims to transform each record represented by characters to a
specific numerical feature vector, which is appropriate for bi- 5. Results and discussion
nary classification. This process involves six steps as follows.
These steps are cleaning, missing values, noisy data, integra- In this study, to validate the proposed model, a system is de-
tion, reduction, and transformation (Learn how to create bet- veloped to implement in experiments. The experiments were
ter models and predictions using data preprocessing, 2019). run under an environment with an Intel Core i7–6900 k, 4
In the preprocess, firstly the header parts of the HTTP re- DDR4 total 32 GB RAM, 1 CUDA-enabled NVIDIA TITAN XP
quests are removed. Since the header portions are generally graphics card, and Ubuntu 16.04. Python 3.6.3–64 bit and the
the same in the HTTP requests dataset, they reduce the size Kares TensorFlow 1.8.0 library were used.
of the data due to reduction of unnecessary data and elimi- In Table 1, experimental results on the data from
nate the duplication of the data in datasets. The URL and pay- CSIC2010v2 dataset are presented. Train and test data is ran-
load portions of the HTTP request in datasets are selected. The domly allocated between 70% and 30%. To find an effective
computers & security 100 (2021) 102096 7

Fig. 6 – Piece of the HTTP request.

Fig. 7 – URL and Payload of HTTP Request.

Fig. 8 – Dictionary Production.

Table1 – Experimental results of CSIC201v2 datasets.

Dataset F1 Precision Best Accuracy Average Accuracy FPR FNR Recall TNR TPR Epoch
CSIC2010v2 0.9696 0.9718 0.9644 0.9579 0.0400 0.0326 0.9674 0.9600 0.9674 200
0.9751 0.9743 0.9707 0.9684 0.0368 0.0240 0.9759 0.9631 0.9759 400
8 computers & security 100 (2021) 102096

Fig. 9 – HTTP request matrix process.

Fig. 10 – CNN based training model.


computers & security 100 (2021) 102096 9

Table 2 – Comparison with previous studies on literature.

Performance
Study Technique Datasets metric
(Nguyen and NaiveBayes CSIC2010 Accuracy: 72,78%
Franke, 2012) BayesNetwork Accuracy: 82,79%
Decision Stump Accuracy: 74,73%
RBFNetwork Accuracy: 72,46%
Majority Voting Accuracy: 81%
Hedge/Boosting Accuracy: 82,1%
A-IDS A-ExIDS Accuracy: 90,52%
Accuracy: 90,98%
(Kozik et al., 2014) Regular expression CSIC2010 Accuracy: 94,46%
(Epp et al., 2017) SVM CSIC2010v2 F1:82% F1:93%
CSIC2012
(Zhang et al., 2017) CNN CSIC2010 Accuracy: 96.49%
Proposed CNN CSIC2010v2 Accuracy: 97,07% F1:
97,51%

Fig. 11 – Accuracy Performance of CSIC2010v2 datasets.

way to transform HTTP requests into representations, a BoW-based architecture was used: each HTTP request was split into variable = value pairs, and the frequency of each variable and each variable = value pair was calculated over all anomalous and normal HTTP requests. In the experimental setup, two experiments, with 200 and 400 training epochs, were carried out on the dataset, and the results are presented in Table 1. Although the results of the two experiments are close to each other, slightly better results were obtained with 400 epochs, where the average accuracy on the CSIC2010v2 dataset is 0.9684. The other indicators are presented in Table 1. The BoW-based architecture thus appears to represent the characters in HTTP requests well while retaining the information in the dataset.

A comparison of the proposed model with previous studies is presented in Table 2. According to this comparison, the proposed model achieves better accuracy.

Fig. 11 shows the accuracy curve of the best CSIC2010v2 run reported in Table 1. The training accuracy reaches 1, and the best validation accuracy is 0.9707 at 400 epochs.

The proposed architecture has two outputs for HTTP request classification: normal and anomaly. Binary cross-entropy is well suited to two-output classification, so it is used as the loss function. It combines a sigmoid activation with a cross-entropy loss. Unlike the softmax loss, it is independent for each class, meaning that the loss computed for each CNN output is not affected by the values of the other outputs. Stochastic Gradient Descent (SGD) is used as the optimization technique and was chosen for its speed. SGD is an iterative method for optimizing an objective function with suitable smoothness properties. It can be regarded as a stochastic approximation of gradient descent, since it replaces the actual gradient, calculated from the entire dataset, with an estimate calculated from a randomly selected subset of the data.

Fig. 12 shows the training and validation loss for CSIC2010v2. The lowest loss value is 0.0993, also at 400 epochs.
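
The loss function and optimizer described above can be illustrated on a toy problem. The following is a minimal sketch in plain Python, not the author's CNN training code: it fits a one-feature logistic model (standing in for the network's final sigmoid output) with mini-batch SGD on the binary cross-entropy loss. The function names, the toy data, and the hyperparameters (`lr`, `epochs`, `batch_size`) are assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(y_true, p, eps=1e-12):
    """Mean BCE for labels in {0, 1} and predicted probabilities p."""
    total = 0.0
    for y, q in zip(y_true, p):
        q = min(max(q, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(q) + (1 - y) * math.log(1.0 - q))
    return total / len(y_true)


def sgd_train(xs, ys, lr=0.5, epochs=200, batch_size=4, seed=0):
    """Fit p = sigmoid(w*x + b) with mini-batch SGD on the BCE loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # a new random subset order every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            gw = gb = 0.0
            for i in batch:
                # For sigmoid + BCE, d(loss)/d(logit) = p - y.
                err = sigmoid(w * xs[i] + b) - ys[i]
                gw += err * xs[i]
                gb += err
            # Gradient estimated from the mini-batch only, not the full set.
            w -= lr * gw / len(batch)
            b -= lr * gb / len(batch)
    return w, b


# Toy data (assumed): negative x is "normal" (0), positive x is "anomaly" (1).
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = sgd_train(xs, ys)
probs = [sigmoid(w * x + b) for x in xs]
loss = binary_cross_entropy(ys, probs)
print(f"w={w:.3f} b={b:.3f} loss={loss:.4f}")
```

Each update uses only `batch_size` samples, which is what makes a single SGD step cheaper than a step computed over the entire dataset.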

Fig. 12 – Loss Performance of CSIC2010v2 datasets.

Fig. 13 – ROC curve of CSIC2010v2 datasets.
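
As a sketch of how an ROC curve and an AUC value such as those reported for Fig. 13 can be computed from classifier scores, the plain-Python example below sweeps the decision threshold, records one (FPR, TPR) point per threshold, and integrates with the trapezoidal rule. The function names and the four-sample toy data are assumptions for illustration, not the paper's evaluation code, and the sketch assumes both classes are present in `y_true`.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Highest score first; tied scores share a single threshold.
    order = sorted(zip(scores, y_true), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        thr = order[i][0]
        while i < len(order) and order[i][0] == thr:
            if order[i][1] == 1:
                tp += 1  # anomaly correctly flagged at this threshold
            else:
                fp += 1  # normal request wrongly flagged
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy labels (1 = anomaly) and scores, assumed for illustration.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores)
print(pts, auc(pts))
```

For this toy input the area is 0.75; a classifier that ranks every anomaly above every normal request would reach the ideal value of 1.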

Performance evaluation and measurement are important issues in machine learning. The ROC curve and the area under it (AUC) are therefore used to evaluate classification performance. The ROC curve plots the True Positive Rate (TPR) on the Y-axis against the False Positive Rate (FPR) on the X-axis at various threshold settings, and the AUC represents the degree or measure of separability between the classes. The ideal value for AUC is 1. In Fig. 13, the AUC value for CSIC2010v2 is 0.9696, which means that the classification of the proposed architecture is successful for the dataset used.

The results show that, with a certain amount of training, the CNN achieves satisfactory results in detecting Web attacks, with a high detection rate and a low false positive rate.

6. Conclusion

In this study, a Web attack detection architecture based on the CNN deep learning algorithm is introduced for anomaly detection in HTTP Web traffic. The architecture was arrived at through experiments on variable and value parameters and through data analysis, and it was applied to the HTTP CSIC-2010v2 dataset. The CNN-based detection architecture was developed to automatically extract patterns in HTTP data and to classify HTTP data as normal or anomalous. It extracts and classifies features that could not be extracted in previous anomaly detection studies using conventional machine learning methods, because those methods require features to be extracted manually. The proposed method preprocesses data using a bag of words. Further
research is also required to automatically find the optimal variable and value parameters for the CNN architecture. The Web-based intrusion detection architecture proposed in this paper achieves good accuracy, FPR, and TPR values, and according to these metrics it was found to be successful. Although the number and variety of methods such as CNN-based deep learning are sufficient for Web-based attack detection, the number and content of available datasets are very limited. According to the results, the proposed architecture can be used to protect Web applications from Web-based attacks. Future studies may include more attack datasets and consider multi-class classification techniques using the proposed CNN-based classification architecture.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Adem Tekerek: Conceptualization, Methodology, Software, Data curation, Writing - original draft, Visualization, Investigation, Validation, Writing - review & editing.

Acknowledgment

This study is supported by NVIDIA Corporation. All experimental studies were performed on a TITAN XP graphics card donated by NVIDIA. I would like to thank NVIDIA for its support.

References

Verdouw CN, Wolfert J, Beulens AJ, Rialland A. Virtualization of food supply chains with the internet of things. J. Food Eng. 2016;176:128–36.
Shankar V, Venkatesh A, Hofacker C, Naik P. Mobile marketing in the retailing environment: current insights and future research avenues. J. Interact. Mark. 2010;24(2):111–20.
Pantano E. Ubiquitous retailing innovative scenario: from the fixed point of sale to the flexible ubiquitous store. J. Technol. Manage. Innov. 2013;8(2):84–92.
Pantano E, Priporas C-V. The effect of mobile retailing on consumers’ purchasing experiences: a dynamic perspective. Comput. Human Behav. 2016;61:548–55.
Friedewald M, Raabe O. Ubiquitous computing: an overview of technology impacts. Telemat. Inform. 2011;28(2):55–65.
Armbrust M, Fox A, Griffith R, Joseph AD, Katz R, Konwinski A, Lee G, Patterson D, Rabkin A, Stoica I, et al. A view of cloud computing. Commun. ACM 2010;53(4):50–8.
Dinh HT, Lee C, Niyato D, Wang P. A survey of mobile cloud computing: architecture, applications, and approaches. Wirel. Commun. Mobile Comput. 2013;13(18):1587–611.
Marston S, Li Z, Bandyopadhyay S, Zhang J, Ghalsasi A. Cloud computing - the business perspective. Decis. Support Syst. 2011;51(1):176–89.
Booth TG, Andersson K. Elimination of DoS UDP reflection amplification bandwidth attacks, protecting TCP services. In: International Conference on Future Network Systems and Security. Springer; 2015. p. 1–15.
Rabai LBA, Jouini M, Aissa AB, Mili A. A cybersecurity model in cloud computing environments. J. King Saud Univ.-Comput. Inf. Sci. 2013;25(1):63–75.
Sumra HBH, AbManan J-lB. Attacks on security goals (confidentiality, integrity, availability) in VANET: a survey. In: Vehicular Ad-Hoc Networks For Smart Cities. Springer; 2015. p. 51–61.
Cherdantseva Y, Burnap P, Blyth A, Eden P, Jones K, Soulsby H, Stoddart K. A review of cybersecurity risk assessment methods for SCADA systems. Comput. Sec. 2016;56:1–27.
Halfond WG, Viegas J, Orso A. A classification of SQL-injection attacks and countermeasures. In: Proceedings of the IEEE International Symposium on Secure Software Engineering, vol. 1. IEEE; 2006. p. 13–15.
Johari R, Sharma P. A survey on web application vulnerabilities (SQLIA, XSS) exploitation and security engine for SQL injection. In: International Conference on Communication Systems and Network Technologies (CSNT). IEEE; 2012. p. 453–8.
Kumar P, Pateriya R. A survey on SQL injection attacks, detection, and prevention techniques. In: Third International Conference on computing Communication & Networking Technologies (ICCCNT). IEEE; 2012. p. 1–5.
... & Hassan MM, Nipa SS, Akter M, Haque R, Deepa FN, Rahman M, Sharif MH. Broken authentication and session management vulnerability: a case study of Web application. Int. J. Simul. Syst. Sci. Technol. 2018;19(2) 6-1.
The OWASP Project. “OWASP Top Ten 2013 Project,” 2013. [Online]. Available: https://www.owasp.org/index.php/Top_10_2013-Top_10
Torrano-Gimenez C, Perez-Villegas A, Marañón GÁ, et al. An anomaly-based approach for intrusion detection in web traffic. J. Inf. Assur. Sec. 2010a;5(4):446–54.
Tekerek A, Bay OF. Design and implementation of an artificial intelligence-based web application firewall model. Neural Netw. World 2019;29(4):189–206.
Tama BA, Comuzzi M, Rhee K-H. TSE-IDS: a two-stage classifier ensemble for intelligent anomaly-based intrusion detection system. IEEE Access 2019;7:94497–94507.
Torrano-Gimenez C, Perez-Villegas A, Marañón GÁ, et al. An anomaly-based approach for intrusion detection in web traffic. J. Inf. Assur. Sec. 2010b;5(4):446–54.
Nguyen HT, Torrano-Gimenez C, Alvarez G, Petrović S, Franke K. Application of the generic feature selection measure in detection of web attacks. In: Computational Intelligence in Security for Information Systems. Springer; 2011. p. 25–32.
Nguyen HT, Franke K, Petrović S. Reliability in a feature selection process for intrusion detection. In: Reliable Knowledge Discovery. Springer; 2012. p. 203–18.
Nguyen HT, Franke K. Adaptive intrusion detection system via online machine learning. In: 12th International Conference on Hybrid Intelligent Systems (HIS). IEEE; 2012. p. 271–7.
Kozik R, Choraś M, Renk R, Hołubowicz W. Modelling HTTP requests with regular expressions for detection of cyber attacks targeted at web applications. In: International Joint Conference SOCO’14-CISIS’14-ICEUTE’14; 2014. p. 527–35.
Epp N, Funk R, Cappo C, Lorenzo-Paraguay S. Anomaly-based web application firewall using HTTP-specific features and one-class svm. In: Workshop Regional de Segurança da Informação e de Sistemas Computacionais; 2017.
Tian Z, Luo C, Qiu J, Du X, Guizani M. A distributed deep learning system for Web attack detection on edge devices. IEEE Trans. Ind. Inform. 2019.
Luo C, Su S, Sun Y, Tan Q, Han M, Tian Z. A convolution-based System for Malicious URLs Detection.
Jin X, Cui B, Yang J, Cheng Z. Payload-based web attack detection using deep neural networks. In: International Conference on Broadband and Wireless Computing, Communication and Applications. Cham: Springer; 2017. p. 482–8.
Zhang M, Xu B, Bai S, Lu S, Lin Z. A deep learning method to detect web attacks using a specially designed CNN. In: International Conference on Neural Information Processing. Cham: Springer; 2017. p. 828–36.
Wang J, Zhou Z, Chen J. Evaluating CNN and LSTM for web attack detection. In: Proceedings of the 2018 10th International Conference on Machine Learning and Computing; 2018. p. 283–7.
LeCun Y, et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems; 1990. p. 396–404.
LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc. IEEE 1998;86(11):2278–324.
Mandal JK, Bhattacharya D. Emerging Technology in Modelling and Graphics: Proceedings of IEM Graph 2018 (Vol. 937). Springer, 2019.
Learn how to create better models and predictions using data preprocessing, “Data preprocessing in detail”, [Online]. Available: https://developer.ibm.com/technologies/analytics/articles/data-preprocessing-in-detail, 2019.

Adem Tekerek: He is a Doctor Instructor at Gazi University, Department of Information Technology. He graduated from Faculty of Technical Education, Electronics and Computer Education Department in 2007. He graduated from MSc program of Informatics Institute in 2010. His master thesis is about Content Management Systems. He graduated from PhD program of Informatics Institute in 2016. His PhD thesis is about Web Application Firewall algorithms. He has published 24 papers on computer sciences. His research is data mining, machine learning and their applications especially on information security.
