AIML Post Mid Sem1

AI and ML Notes for reading

CS-11

Intrusion Detection – I (Scanning)

What is Scanning?
• Procedure to identify live hosts, ports, and services, discover operating system and
architecture of target system.
• Collects information using complex and aggressive reconnaissance techniques.
• Identifies vulnerabilities and threats to network:
• Missing patches
• Unnecessary services
• Weak authentication
• Weak encryption algorithms
• ….
• Targets multiple destinations, e.g. several host IP addresses or services on various
ports.
Who is Interested in Scanning?
• Three types of users have an interest in scanning:
• System administrators to audit networks by scanning the infrastructure.
• Peers looking for previous collaborators via P2P services.
• Attackers to detect the vulnerabilities of cyber infrastructures
• Attackers use scanning to:
• Collect intelligence of the computer network systems to break into the
systems via detecting vulnerable sites.
• Search for the paths to live and accessible resources in the system.
• Send a series of messages to the targeted system and learn the services and
the weakness in the structures of the infrastructure through the feedback
from these messages.
• Use the collected information to prepare for attacks
Scanning Example
• Responses to a message from a computer can reveal information about the IP
addresses, OS and architecture of the system.
• Ping sweep (ICMP ECHO) queries multiple hosts using the message of ICMP ECHO
request packets.
• Ping sweep generates a reply from the targeted system if the system is alive.
• Portscan finds ports or services that are alive and running on the targeted system
by connecting to the TCP or UDP ports of the system.
$ nmap -p 1-512 192.168.2.92
Starting Nmap 6.40 ( http://nmap.org ) at 2018-05-11 17:52 EDT
Nmap scan report for 191.239.213.197
Host is up (0.079s latency).
Not shown: 510 filtered ports
PORT STATE SERVICE
21/tcp open ftp
80/tcp open http
443/tcp open https
Scan Detection Techniques
• Scan detection techniques are classified into two groups
• Single source portscan detection
• Distributed source portscan detection
• Network traffic data includes connection information such as
• Source IPs
• Durations of the connections
• Starting and ending time of connections
• Others …
• Scan detection techniques search for similar and anomalous patterns in the traffic
data.
UC Irvine: Kitsune Network Attack Dataset
• Each row in the dataset (csv file) is a packet captured chronologically.
• Each row (feature vector) is recent (temporal) statistics which describes the
context of the packet's channel and its communicating parties.
• For each packet a behavioral snapshot is extracted of the hosts and protocols which
communicated the given packet.
• Snapshot consists of 115 traffic statistics capturing a small temporal window into:
• packet's sender in general
• traffic between the packet's sender and receiver.
• Statistics summarize all of the traffic
• originating from this packet's source MAC and IP address (denoted SrcMAC-
IP)
• originating from this packet's source IP (denoted SrcIP)
• sent between this packet's source and destination IPs (denoted Channel)
• sent between this packet's source and destination TCP/UDP Socket (denoted
Socket).
UC Irvine: Kitsune Network Attack Dataset
• 23 features are extracted from a single time window.
• These features are extracted from a total of five time-damped windows of:
• 100ms, 500ms, 1.5sec, 10sec, and 1min into the past
• totalling 115 features.
• Not every packet applies to every channel type
• there is no socket if the packet does not contain a TCP or UDP datagram
• in these cases, these features are marked zero
• The final feature vector x, which the Feature Extractor passes to the Feature Mapper
(FM), is always a member of R^n, where n = 115.
• Feature Extraction code Ref: https://github.com/ymirsky/Kitsune-py

This image illustrates a typical process flow for scan detection in network security. Here's a
breakdown of the steps involved:
1. Collection of Network Traffic Data:
• The first step is to collect relevant network traffic data. This can be done using various
methods like network packet capture tools, flow logs, or intrusion detection systems
(IDS).
• The collected data can include information about source and destination IP addresses,
port numbers, protocols used, and the content of the traffic.
2. Data Pre-processing:
• The collected network traffic data is often noisy and unstructured.
• Data pre-processing involves cleaning and transforming the data to make it suitable
for analysis. This may include:
• Filtering out irrelevant traffic
• Normalizing data formats
• Extracting relevant features
3. ML for Scan Patterns:
• Machine Learning (ML) techniques are employed to identify patterns and anomalies
that indicate scanning activities.
• ML algorithms can be trained on historical network traffic data to learn what normal
traffic looks like.
• Once trained, the ML model can be used to analyze real-time network traffic and flag
suspicious activities that may indicate a scanning attack.
4. Scan Detection:
• Based on the analysis of network traffic and the identified scan patterns, the system
can detect various types of scans, such as port scans, vulnerability scans, and
reconnaissance scans.
5. Report and Analysis:
• Once scans are detected, detailed reports are generated, providing information about
the type of scan, the source IP address, target IP addresses, and other relevant details.
• These reports can be used by security analysts to investigate the incident further, take
appropriate actions, and improve security measures.
Overall, this process enables organizations to proactively identify and respond to
scanning activities, which can be precursors to more serious attacks like exploitation
and data breaches.
===================================================================
Scan Detection Techniques
• Based on input data, Scan detection methods are classified into two categories:
• Packet-based detection: Uses packet level information
• Flow-based detection: Uses the aggregated traffic information obtained by
network tools
Packet Based Detection
• Messages and Headers transported by packets are analysed
• Content of messages generated in the application layer is different for different
protocols
• HTTP generates particular HTTP request messages.
• TCP and UDP, with TCP and UDP headers, are the two most common protocols in
the transport layer.
• IP, with the IP header, is the most common protocol in the network layer.
• Ethernet header is generated in the data link (Ethernet) layer.
DQN: Deep Q Network
• Deep Q-network (DQN) algorithm is a model-free, online, off-policy reinforcement
learning method.
• DQN agent is a value-based reinforcement learning agent that trains a critic to
estimate the expected discounted cumulative long-term reward when following the
optimal policy
• Naive DQN has 3 convolutional layers and 2 fully connected layers to estimate Q
values directly from images.
• Linear DQN has 1 fully connected layer with a learning technique.
• DQN overcomes unstable learning by using 4 techniques:
• Experience Replay
• Target Network
• Clipping Rewards
• Skipping Frames
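Three of the stabilising techniques listed above can be sketched in a few lines. This is a minimal illustration, not a full DQN: the class and function names, the discount factor and the buffer capacity are assumptions, and the next-state Q-values are assumed to come from a separate target network.

```python
import random
from collections import deque

GAMMA = 0.99  # discount factor (assumed value)


class ReplayBuffer:
    """Experience replay: fixed-size store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def clip_reward(reward):
    """Reward clipping: keep every reward inside [-1, 1]."""
    return max(-1.0, min(1.0, reward))


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Training target y = r + gamma * max_a' Q_target(s', a'), or y = r at episode end.

    next_q_values is assumed to be the output of the (periodically synced) target network.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

The critic is then regressed towards `td_target` on mini-batches drawn with `sample`.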
Packet Based Detection
• Step 1: Network traffic recorded in the ’pcap’ files is separated into small traffic files
according to the session-based partition rules.
• If two packets share the same 5-tuple (source IP, source port,
destination IP, destination port, transport protocol), then they are
categorized in the same session.
• Several separated sessions in the order of capture time are obtained from the
‘pcap’ files.
• Step 2: Separated sessions are converted into images using the image embedding
method.
• A network packet in a session consists of an Ethernet header, a TCP or UDP
header, an IP header and application messages.
• Total field length of the first three headers (except the application message)
is 54 bytes.
• Application message field is dropped during image embedding because of its
varying length.
• 54-byte packet is converted to a line of the image, with each byte
representing one pixel.
• An image of 54x54 size is created using 54 packets from a session.
• Additional images are created for sessions with more than 54 packets
• Zero-padding is used if a session has fewer than 54 packets.
• Packets are embedded in the order in which they appear in the session.
• Step 3: Each session is labelled as per the time stamp given in the log file of the raw
dataset.
• Step 4: All generated images’ pixels are normalized into [0, 1] from [0, 255].
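Steps 1, 2 and 4 above can be sketched as follows. This is a simplified illustration: the packet dictionary layout (`src_ip`, `raw`, and so on) and the function names are hypothetical, and headers shorter than 54 bytes are zero-padded as an assumption.

```python
from collections import defaultdict

HEADER_LEN = 54   # header bytes kept per packet (one image row)
IMAGE_ROWS = 54   # packets per image


def group_sessions(packets):
    """Step 1: group packets by 5-tuple, keeping capture order inside each session."""
    sessions = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["src_port"],
               pkt["dst_ip"], pkt["dst_port"], pkt["proto"])
        sessions[key].append(pkt["raw"])
    return sessions


def session_to_images(raw_packets):
    """Steps 2 and 4: build 54x54 images with pixel values normalised to [0, 1]."""
    rows = []
    for raw in raw_packets:
        # Keep only the first 54 bytes; the application message is dropped.
        header = raw[:HEADER_LEN].ljust(HEADER_LEN, b"\x00")
        rows.append([byte / 255.0 for byte in header])

    images = []
    for start in range(0, len(rows), IMAGE_ROWS):
        chunk = rows[start:start + IMAGE_ROWS]
        # Zero-pad sessions (or final chunks) with fewer than 54 packets.
        chunk += [[0.0] * HEADER_LEN for _ in range(IMAGE_ROWS - len(chunk))]
        images.append(chunk)
    return images
```

A session of 55 packets therefore yields two images, the second holding one real row followed by 53 zero rows.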
Flow Based Detection
• Flow-level intrusion detection uses traffic flow characteristics, which are extracted
from flows that usually contain numerous packets, for detecting attacks.
• Flow information includes the statistics of a flow, such as:
• Number of packets
• Flow duration
• Average packet size
• Transport protocol, etc.
• A flow is defined by a 5-tuple: source IP, source port,
transport protocol, destination IP and destination port.
• Denial of Service and Distributed Denial of Service attacks tend to transmit a large
number of packets in a short time.
• Overall architecture is similar to that of packet based detection with minor
modifications in pre-processing and RL module.

Flow Based Detection


• Step 1: Discrete and categorical features are converted into continuous features,
which can be processed by DNNs.
• Transformation of categorical features is done using one-hot encoding.
• Transformation of discrete features is done using an N-bit binary encoding
approach.
• Value of N, which ensures that N-bit binaries can encode all values of this
discrete feature, is chosen based on the maximum value of the feature.
• Step 2: max–min normalization is performed on the encoded dataset, converting all
values to [0, 1].
• Data dimension considerably increases after implementing one-hot encoding
and binary encoding.
• A single hidden layer Stacked Auto Encoder (SAE) is used with the encoded
dataset for dimension reduction.
• SAE-based approach helps in conducting dimension reduction and feature
extraction.
• Encoder is extracted as the primary structure of the RL agent.
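The encoding and normalisation in Steps 1 and 2 can be sketched as below. The function names are illustrative, and the autoencoder stage is not covered here.

```python
def one_hot(value, categories):
    """One-hot encode a categorical feature against a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]


def binary_encode(value, max_value):
    """N-bit binary encoding, where N is the number of bits needed for max_value."""
    n_bits = max(1, max_value.bit_length())
    # Most significant bit first.
    return [float((value >> i) & 1) for i in reversed(range(n_bits))]


def min_max_normalize(column):
    """Max-min normalisation of one feature column to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in column]
```

For example, a discrete feature whose maximum value is 7 needs N = 3 bits, so the value 5 becomes `[1.0, 0.0, 1.0]`.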
Flow Based Detection
• Episode’s length needs to be determined and is assumed to be fixed in the
interaction process.
• Main structure of the update module is a Fully Connected Neural Network (FCN).
• A Detection agent and a Sample agent are used to facilitate the adversarial training.
• Detection Agent performs the intrusion class’s correct prediction (action) by
achieving maximum rewards.
• Sample agent tends to counteract the detection agent for better variability.
• Sample agent chooses a class that is most likely to be misclassified and suggests it as
the class to be sampled from for the next state.
• Reward feedback of the detection agent and sample agent is the opposite.
• If the agent’s prediction is wrong, the reward feedback of the sample agent is 1
• If the agent’s prediction is correct, the reward feedback of the sample agent is −1
• Sample agent is the adversarial training agent and ensures that state-action pairs
with high classification error rates can be adequately trained.
• Interaction stage focuses on episodic tasks, hence, the length of an episode is
important and needs to be specified in advance.
• Features collected as part of the traffic are considered as the states.
• State is fed into the agent (classifier) which then outputs the prediction (action) for
the state.
• Reward is obtained by feeding the action-label pair into the reward mechanism.
• If the current state is not reaching the end of the present episode, the current state
is fed to the sample agent to obtain the next sample class.
• The state, action, reward and next state are stored in the replay buffer.
• The next state is then treated as the current state and the above process is repeated.
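The interaction stage described above can be sketched as a simple loop. This is a simplified sketch: `detect`, `choose_class` and `draw_state` are hypothetical callables standing in for the detection agent, the sample agent and the dataset sampler, and the detection reward of +1/-1 is inferred from the statement that the two agents receive opposite rewards.

```python
def rewards(predicted, label):
    """Return (detection_reward, sample_reward); the two are always opposite."""
    detection_reward = 1 if predicted == label else -1
    return detection_reward, -detection_reward


def run_episode(detect, choose_class, draw_state, first_class, episode_len):
    """Run one fixed-length episode and return the filled replay buffer."""
    replay_buffer = []
    label = first_class
    state = draw_state(label)
    for _ in range(episode_len):
        action = detect(state)                  # detection agent predicts a class
        det_reward, _ = rewards(action, label)  # reward from the action-label pair
        next_label = choose_class(state)        # sample agent picks the next class
        next_state = draw_state(next_label)
        replay_buffer.append((state, action, det_reward, next_state))
        state, label = next_state, next_label   # next state becomes current state
    return replay_buffer
```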

Flow Based Detection: Exploration Module


• Most datasets collected for intrusion detection are unbalanced.
• Most real-world traffic is normal, so normal traffic is easier to collect than malicious
traffic.
• Exploration module in RL is conducive to solving these class imbalance problems.
• In a flow-level IDS, both the 𝜀-greedy policy and Conditional Generative Adversarial
Network (CGAN) are used for the exploration.
• The 𝜀-greedy policy is applied to the sample agent, where 𝜀 controls the exploration
degree.
• Purpose of CGAN is to generate some novel attack flows for each class, which will
augment the fixed dataset.
• CGAN exploration rate 𝜎 controls the extent of the exploration.
• CGAN takes the class label and noise as the input and outputs a state that belongs to
the given class.
• Generator’s functionality is to generate simulated states that can deceive the
discriminator.
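The 𝜀-greedy policy used for exploration can be sketched as follows. The function name and the use of a list of Q-values are illustrative assumptions; the CGAN part is not shown.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

With 𝜀 = 0 the choice is purely greedy; larger values of 𝜀 make the sample agent try classes it currently rates lower.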
Scan Detection Methods
• Stealthy Probing and Intrusion Correlation Engine (SPICE)
• Statistical Packet Anomaly Detection Engine (SPADE).
• Graph-based Intrusion Detection System (GrIDS)
• Threshold Random Walks
• Expert Knowledge based Rule based Data Mining
• Logistic Regression in Horizontal and Vertical Scanning
Stealthy Scan Detection: SPICE and SPADE Using Cluster and Correlation Methods
• Stealthy port scans refer to the varieties of scan techniques that can elude
traditional IDS systems.
• Examples:
• Randomizing the scanning order of IP addresses and port sequences
• Randomizing the scanning lull
• Slowing down the scanning frequencies
• Randomizing attack source IPs and ports.
• Traditional IDS systems like SNORT or Graph-based intrusion detection system
(GrIDS) look at the occurrence of connections from source IPs within a ‘Short Time’
window.
• A ‘Long Time’ window requires searching a massive amount of normal traffic
for patterns, which becomes difficult to manage.
• Attackers tend to use slow randomized stealth scan to bypass the time window.
SPICE and SPADE Using Cluster and Correlation Methods in Stealthy Scan Detection
• An attacker gathers information in a more systematic way than a normal user.
• Some packets will be anomalous, e.g. probing port 98 (linuxconf port) on a
Windows host.
• Such anomalous packets will be saved for a longer period and grouped together to
find a pattern.
• Packets which form a sizeable group are saved and analysed. Normal traffic is timed
out quickly.
• A SPICE algorithm has two components:
• Anomaly sensor
• Correlator.
• Anomaly Sensor (SPADE) monitors the network and assigns anomalous scores to
each event.
• Events that are sufficiently anomalous are passed to correlator which groups them
together and reports scans.
• Uses an anomaly score to estimate the total information of a scan footprint based
on the conditional probability distribution of normal traffic packets.
• Traffic packets data includes source and destination addresses and ports, over days
or weeks.
• Packets with anomaly scores higher than a threshold are reported to the event
correlation engine.
• Correlation engine applies a simulated annealing algorithm to cluster the
anomalous packets and sends out reports of unusual activity (e.g. port scans).
• SPADE can also be used as pre-processor plug-in of SNORT.
• Correlation engine maintains the records of event likelihood, from which the
anomalousness of a given packet is approximated.
• The joint probability of the destination IP and port and the source IP and port is
derived, given the probability of a destination port and a source IP, based on:
• Conditional probability of P (source port | destination port)
• Conditional probability of P (destination IP | source IP, destination port)
• Correlation engine groups packets and the heuristics between events into the
architecture of a correlation graph.
• In a correlation graph, each node denotes an event (packet) and each undirected edge
denotes the correlation strength between two nodes.
• Correlation strength between packets can be calculated using heuristic functions.
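The anomaly-sensor idea can be sketched with a score of the form A(x) = -log2 P(x), where P(x) is estimated from observed traffic. This is only a sketch in the spirit of SPADE: the class name, the restriction to (destination IP, destination port) pairs and the add-one smoothing are assumptions, not the tool's actual implementation.

```python
import math
from collections import Counter


class AnomalySensor:
    """Scores packets by how rarely their (destination IP, destination port) pair occurs."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, dst_ip, dst_port):
        self.counts[(dst_ip, dst_port)] += 1
        self.total += 1

    def score(self, dst_ip, dst_port):
        # Add-one smoothing (an assumption) keeps the score finite for unseen pairs.
        p = (self.counts[(dst_ip, dst_port)] + 1) / (self.total + 1)
        return -math.log2(p)


def anomalous_events(sensor, events, threshold):
    """Events whose score exceeds the threshold would be passed to the correlator."""
    return [e for e in events if sensor.score(*e) > threshold]
```

Rare destinations, such as port 98 on a host that normally serves only port 80, receive high scores and survive the threshold.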
Coordinated Scan Detection: GrIDS Using Rule-Based Machine Learning
• Rule Sets
• Rule sets are defined by users to specify details about graph construction
• Rules are independent of other rules
• Connection data is applied to all rules
• Rule sets contain pre-conditions to filter out data which is not relevant to the
rule set, e.g. port id, source IP etc.
• If the data passes through the pre-conditions, it can be added to graph space
• Graph Aggregation
• When network activity crosses outside of department boundaries, the graphs
are passed up for further analysis
• A collection of hosts belonging to the same department can be reduced to
single node representing the whole department
• In reduction, graph attributes are kept but some sub-graph topology may be
lost.
Horizontal Scan Detection: Threshold Random Walk
• Threshold Random Walk algorithm observes whether each connection attempt from a
remote source fails or succeeds
• A successful connection drives the Random Walk upwards
• A failed connection drives the Random Walk downwards
• Benign remote sources have more precise knowledge about the targeted
hosts
• Their rate of successful connections is higher than that of scanners.
• Tracks success and failure connection attempts to
• New address
• New address to old port
• old ports at old address
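The random walk can be sketched as a sequential test on a log-likelihood ratio. The sketch follows the convention used in these notes (success moves the walk up, failure moves it down); formulations that use the inverse ratio flip the direction. The success probabilities and thresholds are assumed values chosen for illustration.

```python
import math


def trw_classify(outcomes,
                 p_success_benign=0.8,    # assumed success rate of benign sources
                 p_success_scanner=0.2,   # assumed success rate of scanners
                 upper=math.log(100),     # assumed threshold for declaring benign
                 lower=-math.log(100)):   # assumed threshold for declaring scanner
    """Classify a remote source from its sequence of connection outcomes.

    outcomes: iterable of booleans, True for a successful first-contact connection.
    """
    walk = 0.0
    for success in outcomes:
        if success:
            walk += math.log(p_success_benign / p_success_scanner)               # step up
        else:
            walk += math.log((1 - p_success_benign) / (1 - p_success_scanner))   # step down
        if walk >= upper:
            return "benign"
        if walk <= lower:
            return "scanner"
    return "pending"  # not enough evidence yet
```

With these values, four consecutive failures are enough to cross the lower threshold, while mixed outcomes leave the decision pending.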
Scan Detection: Expert Knowledge-Rule-Based Data- Mining Method
• Features of a destination IP and port accessed by source IP (4):
• Average number of distinct destination IPs
• Average number of destination ports on destination IPs
• Features of source IPs describing the behavior of the source IP (6)
• Ratio of distinct destination IPs that the source IP attempted to connect to and
that did not provide any service on the destination ports to any source.
• Features of individual destination ports (4)
• Ratio of distinct destination IPs that the source IP attempted to connect to and
that did not provide any service on the destination ports to any source.
• A rule-based learning classification algorithm RIPPER is used for classification.
• RIPPER is efficient and effective in dealing with imbalanced and nonlinear data.
• Model performed better than TRW with faster speed of detection.
Scan Detection: RIPPER Algorithm
• RIPPER (Repeated Incremental Pruning to Produce Error Reduction) is a Rule-
based classification algorithm.
• RIPPER derives a set of rules from the training set.
• Works well on datasets with imbalanced class distributions.
• Works well with noisy datasets as it uses a validation set to prevent model
overfitting.
• RIPPER principle:
• Among the records given, it identifies the majority class (which appears the
most)
• Takes this class as the default class.
• If there are 100 records, of which 80 belong to Class A and 20 to Class B, then Class
A will be the default class.
• For the other class, it tries to learn/derive various rules to detect that class.
• Consider all the classes that are available and then arrange them on the basis of
their frequency in a particular order (say increasing).
• C1,C2,C3,......,Cn
• C1 - least frequent
• Cn - most frequent
• The class with the maximum frequency (Cn) is taken as the default class.
• Rule Derivation:
• In the first instance, it derives rules for those records which belong to class C1.
• Records belonging to C1 will be considered as positive examples (+ve) and
other classes will be considered as negative examples (-ve).
• Sequential Covering Algorithm is used to generate the rules that discriminate
between +ve and -ve examples.
• Next, derive rules for C2 distinguishing it from the other classes.
• This process is repeated until left with Cn (default class).
• Ripper extracts rules from minority class to the majority class.
• Rule Growing in RIPPER Algorithm:
• Ripper uses a general-to-specific strategy for growing rules.
• It starts from an empty rule and goes on adding the best conjunct to the rule
antecedent.
• Conjuncts are evaluated using FOIL’s Information Gain, and the conjunct with the
highest gain is chosen.
• Stopping criterion for adding conjuncts: conjuncts are added until the rule no longer
covers negative (-ve) examples.
• The new rule is pruned based on its performance on the validation set.
• To identify whether a particular rule should be pruned or not, following metric is
used:
• (P-N)/(P+N)
• P = number of positive examples in the validation set covered by the rule.
• N = number of negative examples in the validation set covered by the rule.
• Whenever a conjunct is added or removed, the value of the metric is calculated for
original rule (before adding/removing) and new rule (after adding/removing).
• If the value of the new rule is better than the original rule then add/remove the
conjunct else the conjunct will not be added/removed.
• Consider a rule:
• ABCD ---> Y ,where A,B,C,D are conjuncts and Y is the class.
• First it will remove the conjunct D and measure the metric value.
• If the quality of the metric is improved the conjunct D is removed.
• If the quality does not improve then the pruning is checked for CD, BCD and
so on.
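The two metrics used above can be written out directly. This is a simplified sketch: the function names are illustrative, `coverage` is a hypothetical callback returning (P, N) on the validation set, and the pruning loop tries dropping D, then CD, then BCD and keeps the best-scoring rule.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for adding one conjunct to a rule.

    p0, n0: positives/negatives covered before adding the conjunct
    p1, n1: positives/negatives covered after adding it
    """
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


def prune_metric(p, n):
    """Pruning metric (P - N) / (P + N) measured on the validation set."""
    if p + n == 0:
        return 0.0
    return (p - n) / (p + n)


def prune_rule(conjuncts, coverage):
    """Drop trailing conjuncts while the pruning metric improves."""
    best = list(conjuncts)
    best_value = prune_metric(*coverage(best))
    for keep in range(len(conjuncts) - 1, 0, -1):
        candidate = list(conjuncts[:keep])
        value = prune_metric(*coverage(candidate))
        if value > best_value:
            best, best_value = candidate, value
    return best
```

For instance, a conjunct that narrows coverage from 10 positives and 10 negatives to 8 positives and 2 negatives has a gain of about 5.42.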
Horizontal and Vertical Scan Detection on Large Networks: Logistic Regression
• Uses the traffic traces in each event, grouped according to destination IP and ports.
• Six features for each event are used for analysis:
• percentage of traces that appear to have a payload
• percentage of flows with fewer than three packets
• ratio of flag combinations with an ACK flag set to all flows
• average number of source ports per destination IP address
• ratio of the number of unique destination IP addresses to the number of
traces
• ratio of traces with a backscatter-related flag combination such as SYN-ACK
to all traces
• Logistic regression model calculates the probability of an event containing a scan
using the above 6 features
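The probability computation can be sketched as a standard logistic function over the six features. The function name is illustrative, and any weights and bias passed in are placeholders rather than fitted coefficients from the original model.

```python
import math


def scan_probability(features, weights, bias):
    """Logistic regression: P(scan) = 1 / (1 + exp(-(bias + w . x)))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

An event would be reported as a scan when this probability exceeds a chosen cut-off such as 0.5.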

Data Sets
Example: Practical Learning
• Problem Statement
• Build a network intrusion detector, a predictive model capable of
distinguishing between bad connections, called intrusions or attacks, and
good normal connections.
• Attacks Detected: Attacks categorised into four main categories:
• #DOS: denial-of-service, e.g. SYN Flood;
• #R2L: unauthorized access from a remote machine, e.g. Guessing password;
• #U2R: unauthorized access to local superuser (root) privileges, e.g. “buffer
overflow” attacks
• #probing: surveillance and other probing, e.g. port scanning.
• Dataset Used: KDD Cup 1999 dataset
• Reference:
https://www.geeksforgeeks.org/intrusion-detection-system-using-machine-learning-algorithms/
CS-13
Intrusion Detection – II

Boosting Algorithms
What is Boosting?
• Freund and Schapire developed Boosting in 1997.
• Boosting is an ensemble modelling technique suitable for binary classification
problems.
• Boosting algorithms improve the prediction power by converting a number of weak
learners to a strong learner.
• Basic principle of Boosting algorithms:
• build the first model on the training dataset
• build a second model to rectify the errors present in the first model.
• Continue the process till the errors are minimized, and the dataset is predicted
correctly.
Types of Boosting Algorithm
• There are 3 types of Boosting algorithms:
• GradientBoost
• Xtreme GradientBoost
• AdaBoost
• All Boosting algorithms work in similar manner
• Combine multiple weak learners to reach the final output, i.e. a strong
learner.

GradientBoost Algorithm
GradientBoost
• Builds a final model from the sum of several weak learning algorithms that are
trained on the same dataset.
• First weak learner is not trained on the dataset.
• First weak learner simply returns the mean of the relevant column.
• Residual for the first weak learner’s output is calculated and used as the output
column or target column for the next weak learner’s training.
• Second weak learner is trained using the same method, and the residual is
computed.
• Computed residual is utilized as an output column for the third weak learner.
• Process continues until the residual is close to zero or a preset number of weak
learners is reached.
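The residual-fitting procedure above can be sketched for a single numeric feature. This is a toy sketch with assumptions: the weak learners are one-split regression stumps, a learning rate (not mentioned in the notes) scales each learner's contribution, and the feature must take at least two distinct values.

```python
def fit_stump(xs, residuals):
    """Weak learner: best single threshold, predicting the mean residual on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_learners=10, learning_rate=0.5):
    """Start from the target mean, then fit each new stump to the current residuals."""
    base = sum(ys) / len(ys)          # first "learner": mean of the target column
    learners = []
    preds = [base] * len(ys)
    for _ in range(n_learners):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in learners)
```

Each round shrinks the remaining residual, so the summed model moves steadily closer to the training targets.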
eXtreme GradientBoost (XGBoost)
• An extreme variation of the Gradient boosting algorithm.
• Key difference between XGBoost and Gradient Boosting is that XGBoost applies a
regularisation approach.
• Regularisation enables XGBoost to outperform a standard Gradient Boosting
algorithm.
• Works faster
• Has better accuracy
• Works better when the dataset contains both numerical and categorical
variables.

AdaBoost Algorithm
AdaBoost (Adaptive Boosting)
• Works on stagewise addition method where multiple weak learners are used to
create a strong learner.
• Creates decision stumps (one-level decision trees) as weak learners
• Influence of a stump on the final classification is known as the alpha parameter
• Value of the alpha parameter is inversely related to the error of the weak
learner
• For Gradient Boosting and XGBoost, the alpha parameter calculated is related to the
errors of the weak learner.
• AdaBoost is a supervised learning algorithm

AdaBoost Algorithm Accuracy


• Multiple combined algorithms provide superior prediction accuracy
• Decision Tree provides accuracy of 80%.
• KNN provides 75% accuracy
• Linear Regression provides 70% accuracy
• Combination of all these algorithms will provide more than 80% accuracy
AdaBoost: Working Example
• Step 9: Now this acts as new dataset, and we need to repeat all the above steps.
• Assign equal weights to all the data points.
• Find the stump that does the best job classifying the new collection of samples by
finding their Gini Index and selecting the one with the lowest Gini index.
• Calculate the “Amount of Say” and “Total error” to update the previous sample
weights.
• Normalize the new sample weights.
• Iterate through these steps until a low training error is achieved.
• Suppose, with respect to our dataset, we have constructed 3 decision trees (DT1,
DT2, DT3) in a sequential manner.
• If we send our test data now, it will pass through all the decision trees, and finally,
we will see which class has the majority, and based on that, we will do predictions
for our test dataset.
AdaBoost Algorithm
1. Initialise the dataset and assign equal weight to each data point.
2. Provide this as input to the model and identify the wrongly classified data points.
3. Increase the weight of the wrongly classified data points.
4. If the required results are obtained, go to step 5; otherwise go to step 2.
5. End
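One round of the weight update in the steps above can be sketched as follows. It uses the commonly stated AdaBoost formulas, "Amount of Say" alpha = 0.5 * ln((1 - error) / error) and an exponential re-weighting; the function name and the small clamp on the error are assumptions.

```python
import math


def adaboost_round(weights, misclassified):
    """Return (alpha, new_weights) after one weak learner.

    weights: current sample weights, summing to 1
    misclassified: list of booleans, True where the weak learner was wrong
    """
    total_error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    # Clamp to avoid division by zero for a perfect or useless learner (assumption).
    total_error = min(max(total_error, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - total_error) / total_error)  # "Amount of Say"

    # Increase the weights of wrong points, decrease the weights of correct ones.
    updated = [w * math.exp(alpha if wrong else -alpha)
               for w, wrong in zip(weights, misclassified)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]  # normalise so the weights sum to 1
```

With four equally weighted points and one mistake, the misclassified point ends up carrying half of the total weight in the next round.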
Gini Index
What is Gini Index?
• Gini Index or Gini impurity measures the degree or probability of a particular
variable being wrongly classified when it is randomly chosen.
• What is Impurity?
• If all the elements belong to a single class, then it is called a pure class.
• Degree of Gini Index varies between 0 and 1
• 0 denotes that all elements belong to a certain class or there exists only one
class (pure)
• 1 denotes that the elements are randomly distributed across various classes
(impure).
• A Gini Index of 0.5 denotes elements equally distributed across two classes.
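The Gini Index of a set of labels is 1 minus the sum of squared class proportions. A minimal sketch, with an illustrative function name:

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity: 1 - sum over classes of (class proportion)^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())
```

A pure set scores 0, two equally frequent classes score 0.5, and the value approaches 1 only as the elements spread evenly over many classes.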
Intrusion Detection System Using AdaBoost
AdaBoost Based IDS
• Statistical flow features from the protocols (TCP, UDP, IPv4, IPv6 etc.) are used based
on their potential properties.
• Statistically generated features are related to normal and suspected activities.
• AdaBoost based IDS can be developed using decision tree, naïve Bayes, and artificial
neural network.
• A decision-tree-based AdaBoost model is used, with a decision tree as the primary
classifier
AdaBoost Based IDS
• Dataset used (UNSW-NB15) contains 49 features utilized in detecting network
intrusions.
• Data were divided into training and testing sets.
• Feature selection was applied based on the correlation matrix.
• Trained an AdaBoost model that used a decision tree with maximum depth = 2 as the
classifier
• Model complexity increases when depth is increased, and the tree tends to overfit
for higher depth values.
UNSW-NB15 Dataset Overview
• Raw network packets of the UNSW-NB15 dataset were created by the IXIA
PerfectStorm tool in the Cyber Range Lab of UNSW Canberra.
• Tcpdump tool was used to capture 100 GB of the raw traffic (Pcap files).
• Dataset has nine types of attacks: Fuzzers, Analysis, Backdoors, DoS, Exploits,
Generic, Reconnaissance, Shellcode and Worms.
• Total 49 features with the class label, described in the UNSW-NB15_features.csv
file.
• Total number of records is 2,540,044, stored in the four CSV
files, namely, UNSW-NB15_1.csv, UNSW-NB15_2.csv, UNSW-NB15_3.csv and
UNSW-NB15_4.csv.
• Ground truth table is named UNSW-NB15_GT.csv and the event list file is
called UNSW-NB15_LIST_EVENTS.csv.
• A part of the dataset was configured as a training set and testing set namely,
UNSW_NB15_training-set.csv and UNSW_NB15_testing-set.csv respectively.
• Training set has 175,341 records and the testing set has 82,332 records, covering
both attack and normal traffic.
Deep Learning for Intrusion Detection System
• Autoencoders
• Recurrent Neural Networks
• Deep Neural Networks
• Deep Belief Networks
• Convolutional Neural Networks

01: Auto-Encoder
• Massive growth of network traffic data leads to a large volume of datasets.
• Labelling these datasets to identify intrusion attacks is laborious and error-prone.
• Traditional unsupervised solutions do not consider spatio-temporal correlations in
traffic data.
• A unified Autoencoder based on combining multi-scale convolutional neural
network and long short-term memory (MSCNN-LSTM-AE) for anomaly detection is
an effective solution.
• Model employs Multiscale Convolutional Neural Network Autoencoder (MSCNN-AE)
to analyze the spatial features of the dataset
• Latent space features learned by MSCNN-AE are passed to a Long Short-Term Memory
(LSTM) based Autoencoder Network to process the temporal features.
• Model further employs two Isolation Forest algorithms as error correction
mechanisms to detect false positives and false negatives to improve detection
accuracy.
• Model was trained using the NSL-KDD, UNSW-NB15, and CICDDoS2019 datasets and
outperformed the conventional unsupervised methods.

IDS based on AutoEncoder and Random Forest (RF).

• To make the model efficient in terms of computation and time, only the encoder
part of the AE is utilized, so that it works in a nonsymmetric fashion.
• Two non-symmetric AutoEncoders, with three hidden layers each, are arranged in a
stacked manner.
• Random Forest is used for classification.
• Experiments were performed for multiclass classification scenarios using KDD Cup
'99 and NSL-KDD datasets.
• Model was more efficient than a Deep Belief Network (DBN) in terms of
detection accuracy and training time.
• Model was less effective at detecting R2L and U2R attacks due to lack of data for
training the model.

02: Auto-Encoder
• An IDS using Stacked Sparse AutoEncoder (SSAE) and SVM.
• SSAE was used as the feature extraction method and SVM as a classifier.
• Both binary-class and multi-class classification problems are considered in the
experiments.
• Results on the NSL-KDD dataset showed that the proposed model outperforms
different feature selection, ML, and DL methods.
• Model achieves reasonable detection rates for U2R and R2L attacks, but they are still
lower than those for the other classes of the dataset.
03: Auto-Encoder
• An efficient two-stage model based on Stacked AutoEncoder.
• Initial stage classified the dataset into the attack and normal classes with probability
values.
• Probability scores are used as an additional feature and are input to the final
decision stage for normal and multiclass attack classification.
• Performance of the proposed model was tested using the KDD Cup'99 and UNSW-NB15
datasets.
• A different methodology was adopted for each dataset to reduce the problems caused
by class imbalance.
• Down-sampling was performed to remove repeated records from KDD Cup'99.
• Up-sampling was performed using SMOTE to balance the distribution of records in
UNSW-NB15.
• Pre-processing of the dataset significantly improves the detection rate (DR) of
attack classes with fewer training instances.
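The SMOTE up-sampling step mentioned above can be illustrated with a minimal sketch. It assumes the usual SMOTE idea of interpolating between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, the parameter `k`, and the toy records are hypothetical, and this is not the pre-processing code used in the cited work.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (itself excluded)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Toy minority class: hypothetical two-feature records of a rare attack type
minority = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote(minority, n_new=6)
```

Every synthetic point lies on a line segment between two real minority records, so the minority class grows without exact duplication.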
04: Auto-Encoder
• Applied AE in a multistage model involving a 1D convolution layer and two stacked
fully connected layers.
• In the initial unsupervised stage, two AEs were trained separately using Normal and
Attack flows to reconstruct the samples again.
• In the supervised stage, these new reconstructed samples are used to build a new
augmented dataset that is used as input to a 1D-CNN.
• Output of this convolution layer is flattened and fed to fully connected layers, and
lastly, a softmax layer classifies the dataset.
• Experiments were performed on the KDD Cup'99, UNSW-NB15, and CICIDS2017 datasets,
and the model achieved superior performance compared to other DL models.
• Drawbacks are:
• Does not show how the minority classes perform using this model
• Does not provide any information on the characteristics of the attack
01: Recurrent Neural Networks
• RNN-based IDS designed for binary and multi-class classification of the NSL-KDD
dataset.
• Model was tested using a different number of hidden nodes and learning rates.
• Results showed that different learning rates and the number of hidden nodes affect
the accuracy of the model.
• Best accuracy was obtained using 80 hidden nodes and learning rates of
0.1 and 0.5 for the binary and multi-class scenarios, respectively.
• Model performed well compared to ML algorithms and a reduced-sized RNN model.
• Main shortcoming of this model is the increase in computational processing which
results in high model training time and lower detection rate for the R2L and U2R
classes.
02: Recurrent Neural Networks
• An IDS based on RNN using GRU as the main memory together with the multilayer
perceptron and a softmax classifier.
• Model was tested using KDD Cup'99 and NSL-KDD datasets.
• Experimental results showed good detection rates compared with other
methodologies.
• Major drawback of their model is lower detection rates for minority attack classes
like U2R and R2L.
• NSL-KDD is considered the benchmark dataset, and the experimental results
showed that LSTM and Deep CNN achieved higher accuracy than the other
models.

01: Deep Neural Network


• A hybrid scalable DNN framework, called scale-hybrid-IDS-AlertNet, for intrusion
identification at both host and network level.
• Apache Spark cluster computing platform was used for implementing the scalable
platform.
• For NIDS, the model was tested using publicly available datasets like KDDCup 99,
NSL-KDD, Kyoto, UNSW-NB15, WSN-DS, and CICIDS 2017.
• Experiment results showed the superiority of the proposed model compared with
different ML algorithms.
01: Deep Belief Network
• A distributed model based on the DBN and multilayer ensemble SVM for large-scale
network intrusion detection based on Apache Spark.
• DBN was used for extracting features, which are then forwarded to the ensemble
SVM, and then finally output was predicted using a voting mechanism.
• Efficiency of the proposed method was tested for KDD CUP'99, NSL-KDD, UNSW-
NB15, and CICIDS2017 datasets.
• Proposed system has shown high performances in the detection of abnormal
behavior.
• A DL-based model DBN, which is optimized by combining the particle swarm, fish
swarm, and genetic algorithms to further improve the detection accuracy.
• Model was tested using the NSL-KDD dataset and showed a large improvement in
the detection rate of U2R and R2L class.
• Main drawback of the model is the increase in the training time of the model due to
its complex structure.
01: Convolution Neural Network
• A complex multilayer IDS model based on CNN and gcForest.
• Used a novel P-Zigzag algorithm for converting the raw data into two-dimensional
greyscale images.
• Used an improved CNN model (GoogLeNetNP) in a coarse-grained layer for initial
detection.
• In the fine-grained layer, gcForest (caXGBoost) is used to further classify the
abnormal classes into N-1 subclasses.
• Used a dataset by combining UNSW-NB15 and CIC-IDS2017 datasets.
• Results show that the model significantly improves the accuracy and detection rate
compared to the single algorithms while reducing the FAR.
02: Convolution Neural Network
• An efficient IDS system by combining CNN and bidirectional long short-term
memory (BiLSTM) in a deep hierarchy.
• Class imbalance problem is addressed by using the SMOTE to increase the minority
samples, which helps the model to fully learn the features.
• CNN was used for extracting spatial features while BiLSTM was used to extract
temporal features.
• Used NSL-KDD and UNSWNB15 datasets.
• Model achieves higher performance in terms of accuracy and detection rate.
• Detection rate of minority data classes improved slightly but is still very low
compared with other attack classes.
• Due to the complex structure, the training time is higher.
03: Convolution Neural Network
• An IDS model based on novel DL idea of Few-shot Learning (FSL).
• Trained using a small amount of balanced labeled data from the dataset.
• DNN and CNN are adopted as embedding functions in the model for extracting the
essential feature and reducing the dimension.
• Experimental results performed on NSL-KDD and UNSW-NB15 datasets showed
model efficiency in getting reasonable detection rates for minority attack classes.
• Model utilized less than 2% data for training to achieve a remarkable performance
for the considered dataset.
CS-14
Profiling Network Traffic

What is Network Traffic Analysis (NTA)?

• NTA is the process of detecting, recording and analysing communication patterns in
order to detect and respond to security incidents.
• Traffic analysis is primarily performed to find out the data type, the traffic flowing
through a network as well as data sources.
• Can also be used by attackers to discover communication patterns and break into
data sent over the network.
• Allows network administrators to collect data and monitor download/upload
speeds on the traffic that flows through the network.
• NTA tools have been commonly used to analyze and identify network security and
performance issues.
• Traffic statistics from network traffic analysis help in understanding and evaluating
network utilization, download/upload speeds, and the type, size, origin, destination
and content of data.
• Network traffic analysis is the process of analyzing network traffic.
• Example:
• You are working on a task within your company’s network
• Some hacker or someone unknown tries to access your network or tries to
install some malware onto your system.
• You would probably not know about it right away because you don’t
constantly watch the network for suspicious activity.
• This is where NTA helps:
• It monitors the network constantly.
• If it finds suspicious activity or a security threat, it tries to resolve the
simpler issues automatically.
• If it finds more serious vulnerabilities, it alerts the IT team.
How is NTA different from other Security Tools?
• Network security tools like firewalls, intrusion prevention systems (IPS), and
intrusion detection systems (IDS):
• Secure network within its perimeter
• Stop the traffic that tries to trespass the network without permission.
• NTA also secures the network from both within-perimeter threats and outside
traffic threats from:
• Cloud
• Virtual switches
• Traditional TCP/IP packets
• Firewalls handle threats that occur when installing software or exchanging files on
a network.
• There is rogue traffic that a firewall cannot stop:
• Ransomware could seem like a piece of software and pass through a firewall.
• Hackers can use camouflage mechanisms like VPN, to get around a firewall
and cause security issues.
• Multiple devices on a network share data via the cloud or IoT devices.
• Hackers are constantly training themselves to crack every security feature.
• NTA keeps track of every device connected to network - who’s using the network
and when.
• NTAs can tackle ransomware and other security threats that pass through the
firewall.
NTA Objectives
• Detect unknown threats: Traffic pattern analysis has proven to be an effective
tool for detecting unknown threats.
• Detect external IP addresses:
• Helps security administrators to configure IP address range of each server
within their local network
• Identify external IP address easily
• Detect malware communication with trusted sources: for example, malware that
connects to a Gmail account every time a web browser is opened.
• With the help of traffic pattern analysis, security administrators can monitor
the high network traffic of email hosts
• A starting point for further network inspection.
• Detect malware communications via HTTPS: Performed through traffic patterns to
identify changes
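The "detect external IP addresses" objective above can be sketched with Python's standard `ipaddress` module. The internal ranges and observed addresses below are made-up examples, assumed only for illustration. In practice they would come from the administrator's own configuration.

```python
import ipaddress

# Hypothetical internal address ranges configured by the administrator
INTERNAL_RANGES = [ipaddress.ip_network("10.0.0.0/8"),
                   ipaddress.ip_network("192.168.1.0/24")]

def is_external(address):
    """True when the address falls outside every configured internal range."""
    ip = ipaddress.ip_address(address)
    return not any(ip in net for net in INTERNAL_RANGES)

observed = ["10.1.2.3", "192.168.1.77", "203.0.113.9", "192.168.2.5"]
external = [a for a in observed if is_external(a)]
```

Here `192.168.2.5` is reported as external because it is outside the configured `/24`, which shows why the ranges need to be set accurately.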

NTA Mechanisms
• Self-Similarity:
• Use of Industrial Access Control & Security Systems for the analysis of
communication and discover attacks.
• Wireless Sensor Networks (WSN):
• Can be used in large systems such as commercial applications, where
security is vital for their applicability.
• Classifies attacks in wireless sensor networks to explore patterns and
possible countermeasures.
• Flow Analysis:
• Flow analysis is used to identify anonymity networks.
• High accuracy in identifying encrypted anonymity networks.
• Three major uses:
• Identification of anonymous networks
• Determine network traffic within encrypted
• Profiling applications
• User Intention-Based Traffic Dependence Analysis:
• Uses algorithms and frameworks that analyse user actions and network
events on a host according to their credentials.
• Can detect relationships, identify anomalies, and conduct empirical
assessments of the accuracy, security, and efficiency of algorithms.
• Traffic Anomalies Detection Algorithms:
• Detects flow outliers using statistical, similarity and pattern mining
approaches.
• Derive trajectory outliers including offline processing and online processing.
• Intrusion Detection System:
• Software applications or devices that observe a system or network for
malicious activity.
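As an illustration of the statistical approach to flow outliers mentioned above, the following minimal sketch flags flows whose byte count is far from the median. It uses a modified z-score based on the median absolute deviation (MAD). The constant 0.6745 and the cut-off 3.5 are conventional choices assumed here, and the byte counts are invented.

```python
from statistics import median

def flow_outliers(values, threshold=3.5):
    """Indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, nothing can be ranked
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical bytes-per-flow measurements; the last flow is unusually large
flow_bytes = [500, 520, 480, 510, 495, 505, 9000]
outlier_indices = flow_outliers(flow_bytes)
```

The median and MAD are used instead of the mean and standard deviation because a single large flow would otherwise inflate the very statistics used to detect it.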
NTA Solutions
• NTA can be implemented using both traditional algorithms and machine learning
solutions.
• Machine Learning algorithms help in time series and general behaviour analysis of
the network.
• Key functions of NTA are:
• Provides analytics services
• Monitors IoT devices that generate and send a lot of data across the network
• Troubleshoots different security issues
• Enhances end-to-end cloud visibility
NTA Benefits
• Monitor resource utilization and help manage resources accordingly.
• Provide insights into network operations (uptime, downtime, load etc).
• Account for all entities/devices attached to a network
• Identify and record the relationships between users, devices, and actions on the
network
• Identify underutilized resources which can be decommissioned to save cost.
• Notify the network team about observed anomalies, helping resolution and
downtime avoidance
• Add a further security layer on top of intrusion detection systems and intrusion
prevention systems
• Machine learning algorithms for NTA can detect security threats even if they’re
encrypted.
Network Metadata Gathering Techniques
• Flow data (NetFlow, IPFIX, sFlow)
• OSI Layer 2-4 telemetry, such as source, destination, protocol, bytes
sent/received.
• Good start in understanding the basic trends of network traffic.
• Not enough for advanced cyber threat detection within application-layer
context.
• Network Packet Capture Files (PCAPs):
• PCAPs are a detailed historical record of what happened on the network
• They have high storage and data-processing requirements.
• Traffic Inspection Technologies
• Extract meaningful Layer 3-7 metadata with an emphasis on Layer 7
application communications.
• Metadata can be used effectively for behavioural cyber threat detection, while
taking up only a fraction of the full PCAP volume.
• Information structured as time-series events corresponding to network
conversations:
• TCP/UDP/ICMP connections
• HTTP requests and replies
• DHCP leases
• SNMP messages
• SSH connections
• … and others.
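To make the idea of flow data concrete, the sketch below aggregates individual packets into flow records keyed by source, destination, protocol and ports, keeping the packet and byte counters listed above. The packet tuples and field order are hypothetical. Real NetFlow/IPFIX exporters do this in the switch or router.

```python
from collections import defaultdict

# Hypothetical packet records:
# (src_ip, dst_ip, protocol, src_port, dst_port, size_bytes)
packets = [
    ("10.0.0.5", "93.184.216.34", "TCP", 51000, 443, 1500),
    ("10.0.0.5", "93.184.216.34", "TCP", 51000, 443, 400),
    ("10.0.0.7", "10.0.0.1", "UDP", 5353, 53, 80),
]

def aggregate_flows(packets):
    """Group packets by their 5-tuple and count packets and bytes per flow."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for *key, size in packets:
        flow = flows[tuple(key)]
        flow["packets"] += 1
        flow["bytes"] += size
    return dict(flows)

flows = aggregate_flows(packets)
```

The result holds one record per conversation, which is far smaller than the packets themselves but no longer contains any application-layer detail.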
Key Network Metadata Items
• Host and server IP address, port number, geo-location information
• DNS and DHCP information mapping devices to IP addresses
• Web page accesses, URL and header information
• Users to systems mapping using Domain Controller log data
• Encrypted web pages
• Encryption type
• Cipher and hash
• Client/server FQDN (fully qualified domain names)
• Different objects hashes – such as JavaScript and images
Encrypted Traffic Analysis
• Encrypted Traffic Analysis collects network traffic metadata in a designated format
(IPFIX) using passive probes
• Internet Protocol Flow Information Export (IPFIX) is an accounting
technology that monitors traffic flows through a switch or router.
• IPFIX interprets the traffic to determine the client, server, protocol, and port
that is used.
• IPFIX counts the number of bytes and packets, and sends that data to an
IPFIX collector
• Attributes of the encrypted session between clients and servers are available
regardless of the client’s physical location or whether the server runs in the
cloud or dedicated data center.
• Provides insights about the traffic and allows for the identification of:
• out-of-date SSL certificates
• policy non-compliant certificates
• encryption strength
• old TLS versions that may contain faults or vulnerabilities.
• Machine Learning engine uses this data to perform behaviour analysis and anomaly
detection to identify malware and other threats.
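The certificate and TLS-version checks listed above can be sketched as a simple audit over session metadata. The record layout (`server`, `tls_version`, `cert_not_after`), the set of versions treated as old, and the reference date are all assumptions made for this example. They are not an IPFIX field list.

```python
from datetime import date

# Hypothetical per-session metadata, as a probe or collector might export it
sessions = [
    {"server": "a.example", "tls_version": "TLSv1.3", "cert_not_after": date(2030, 1, 1)},
    {"server": "b.example", "tls_version": "TLSv1.0", "cert_not_after": date(2030, 1, 1)},
    {"server": "c.example", "tls_version": "TLSv1.2", "cert_not_after": date(2020, 6, 30)},
]

OLD_TLS = {"SSLv3", "TLSv1.0", "TLSv1.1"}  # assumed local policy

def audit(session, today):
    """Return the list of policy findings for one encrypted session."""
    findings = []
    if session["tls_version"] in OLD_TLS:
        findings.append("old TLS version")
    if session["cert_not_after"] < today:
        findings.append("out-of-date certificate")
    return findings

report = {s["server"]: audit(s, today=date(2025, 1, 1)) for s in sessions}
```

None of these checks needs the payload to be decrypted. They rely only on attributes of the encrypted session.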
Network Traffic Profiling
• Network traffic profiling detects malicious traffic patterns that might otherwise be
misclassified as benign, such as communications with legitimate sites used as part of
a command-and-control mechanism
• Profiling network traffic is similar to scan characterization.
• A scan, or portscan, is a malicious behaviour in network traffic; characterizing
it, including clustering and visualization, helps network administrators to
detect scan attacks.
• Profiling uses clustering algorithms or other data-mining and machine
learning methods to group similar network connections and search for
dominant behaviours/events.
• Profiling v/s Anomaly Detection.
• Anomaly detection aims to group similar normal data and build a normal
model so that we can identify outliers.
• Profiling focuses on grouping similar network behaviours and finding the
trends that these behaviours follow.
Network Traffic Profiling Categories
• Network profiling can be categorised in two groups:
• Specific applications:
• Require access to a system capable of capturing interactions between hosts
through empirical signatures or statistical analysis.
• Examples: gaming, chatting, p2p, and suspicious traffic in FTP, HTTP, and
SMTP.
• Profiling common network behaviours
• Behaviours include communications between hosts and performance of the
hosts.
• Communication between hosts can be patterned using entropy, traffic
volume, feature distributions, and so on.
• Host performances appear in their port utilization to provide service or other
interactions.
• Host IP addresses and the associated port numbers are used for profiling, to
investigate the traffic flows.
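Entropy, mentioned above as one way to pattern host communication, can be computed over any traffic feature. The minimal sketch below compares the Shannon entropy of the destination ports used by two hosts. The port lists are invented for illustration.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical destination ports contacted by two hosts
web_client_ports = [443, 443, 443, 443, 80, 443, 443, 80]
scanner_ports = [21, 22, 23, 25, 80, 110, 143, 443]

h_web = shannon_entropy(web_client_ports)
h_scan = shannon_entropy(scanner_ports)
```

A host that concentrates on a few ports has low entropy, while one that spreads evenly over many ports, as a port scanner does, has high entropy.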
Network Traffic Profiling Challenges
• Two major challenges in network profiling:
• huge amount of network traffic flows
• difficulties in detecting patterns in the traffic data
• There could be a large number of association rules to describe the correlation
between traffic flows
• Large number of rules can hamper profiling analysis and pattern recognition.
• Clustering methods along with data-mining techniques need to extract the dominant
patterns efficiently and effectively.
• Visualization ability can strengthen the role of network traffic profiling.
Network Traffic Profiling Data Collection
• Network traffic data can be collected online or offline.
• Offline profiling is sufficient for some applications, such as traffic classification at
the application level using graphlets.
• In data pre-processing, features are selected according to a profiling objective or
analysis afterward.
• A network profiling algorithm can be:
• Signature-based classification
• Data-mining or Machine-Learning clustering method
• IP blacklist filtering.
• Supervised Machine-Learning and clustering methods are used in the network
traffic profiling or pattern learning process.
ML Algorithms for Network Traffic Profiling
• NETMINE (Association Rules Mining and Classification)
• Auto Focus (Cluster Miner)
• Shared Nearest Neighbour (SNN)
• Auto Class

Network Profiling Methods


• NETMINE (Association Rule Mining and Classification)
• Aggregation and classification of association rules from traffic flows
• Generalize association rules for analysis.
• Auto Focus
• Aggregate traffic flows into clusters over the resource consumption, along a
single feature and joint features.
• Extract significant clusters and classify their behaviour
• Characterize the dominant interactions between dimensions
• Use data mining and entropy to profile the communication patterns between
end users and services.
• Shared Nearest Neighbour (SNN)
• Discovers unexpected patterns in network traffic.
• Traffic pattern classification done using K-Means, DBSCAN, and AutoClass
over traffic statistical features.
NETMINE: Association Rules Mining and Classification
• Captures network traces and concurrently pre-processes the captured traces and
packets to reduce the sample data size.
• Aggregates similar traffic packets over a continuous sliding time window and
filters out less-correlated packets for pattern extraction.
• For a set of protocol features F = {f1, …, fn}, such as source IP address, each packet
is a subset of F and associations of these features can be presented using
association rules.
• Sliding windows have two parameters:
• Window size
• Moving step of the window
• both measured by a time unit (second)
• Window size measures the coverage of the aggregating and filtering rules in
continuous enquiries.
• Aggregating function groups the packets that share similar features, such as source
IP address.
• Filtering function removes the packets that account for less than a threshold of the
aggregated traffic flows in the sliding window.
• Pre-processed streaming packets include a large number of infrequent flows, which
convey relevant information.
• To extract rules, a solution used is to aggregate or generalize the feature values or
association rules in a hierarchical taxonomy.
• Example:
• Aggregate IP addresses into subnets and port numbers into three categorical
levels for TCP ports (System/well known, User/Registered,
Private/Dynamic)
• Items in lower levels aggregate only when their generated rules are below
the minimum support value.
• Itemsets are generated from lower-level k − 1 to higher level k in iteration k.
• Only the itemsets above the support level are used for rules generation.
• Rules are classified into groups according to the basic features in network traffic.
• For example, traffic flows can be semantically presented by rules:
• {source IP} ⇒ {destination IP}
• {destination IP} ⇒ {source IP}
• Services can be presented by the following rules:
• {destination address} ⇒ {destination port}
• {destination port} ⇒ {destination address},
• Service usage can be presented by the following rules:
• {destination port} ⇒ {source address}
• {source address} ⇒ {destination port}.
• Combination of these rules can generate other basic groups, e.g., traffic flow and
service:
• {destination address} ⇒ {destination port, source address}
• {destination port, source address} ⇒ {destination address}.
• NETMINE extracts generalized association rules that:
• provide a high level of abstraction of the network traffic
• allows the discovery of unexpected
• Extracted correlations are automatically aggregated in more general association
rules according to a frequency threshold.
• Extracted rules are classified into groups according to their relevant patterns.
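The effect of generalization on rule extraction can be sketched as follows. This is a simplified pairwise rule miner written for illustration, not the NETMINE implementation. The flows, the /24 subnet level, and the thresholds are assumptions. The three port categories follow the System/User/Dynamic split described above.

```python
import ipaddress
from collections import Counter

def port_category(port):
    # 0-1023 well known, 1024-49151 registered, 49152-65535 dynamic
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"

def subnet(ip, prefix=24):
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))

def rules(transactions, min_support, min_confidence):
    """Rules {A} => {B} between single items, kept if support and confidence pass."""
    n = len(transactions)
    item_count = Counter(item for t in transactions for item in t)
    pair_count = Counter((a, b) for t in transactions for a in t for b in t if a != b)
    found = {}
    for (a, b), c in pair_count.items():
        support, confidence = c / n, c / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            found[(a, b)] = (support, confidence)
    return found

# Hypothetical flows: (source IP, destination IP, destination port)
flows = [
    ("10.0.1.5", "192.0.2.10", 443),
    ("10.0.1.6", "192.0.2.10", 443),
    ("10.0.1.7", "192.0.2.10", 443),
    ("10.0.2.9", "192.0.2.10", 22),
    ("10.0.1.5", "192.0.2.20", 51515),
]

# Raw items versus items generalized one level up the taxonomy
raw_tx = [(("src", s), ("dst", d), ("dport", p)) for s, d, p in flows]
gen_tx = [(("src_net", subnet(s)), ("dst", d), ("dport_class", port_category(p)))
          for s, d, p in flows]

raw_rules = rules(raw_tx, min_support=0.4, min_confidence=0.7)
gen_rules = rules(gen_tx, min_support=0.4, min_confidence=0.7)
```

With raw source addresses no {source} ⇒ {destination} rule reaches the minimum support, but after aggregation into subnets the rule {10.0.1.0/24} ⇒ {192.0.2.10} appears with support 0.6 and confidence 0.75.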
Auto Focus: Clustering Multi-dimensional Traffic
• Aggregation on one or few features can generalize the network flows, e.g., using
association rule generalization in NETMINE.
• May result in selection of wrong dimensions for aggregation without any
prior knowledge
• Leads the administrator to insignificant features.
• Identifying the significant features among the traffic streams is important.
• Auto Focus automatically characterizes and clusters network traffic based on
resource consumption along with dimensions.
• Resource consumption is defined as the coverage of traffic volume in the clusters of
a network e.g. using a number of packets to calculate the traffic volume.
• Auto Focus compresses, combines, and prioritizes the clustering results into an
easily comprehensible report.
• Input data attributes:
• Source IP/port
• Destination IP/port
• Protocol
• Packet count and byte count.
• Packet counter reports number of matched packets in terms of the five features
• Byte counter accounts for the number of bytes in the packets.
• Stage 1: For a single feature, source IP addresses are listed as leaves with subnets as
nodes and roots in the hierarchical tree architecture.
• Each node, including the leaves and roots, has a counter.
• A counter value above the predefined threshold value indicates a cluster.
• Once these clusters are found, multiple one-dimensional hierarchies are
combined into a dimension overlapping structure.
• Each node in the structure has a parent from each dimension.
• Optimization methods help to prune the clustering space by focusing on
clusters that have one dimensional ancestors above the threshold, and
batching clusters.
• Stage 2: Compression algorithm traverses all clusters in the order of a specific
measure.
• Each cluster has an “estimate” counter that accounts for the maximum
“estimate” among all dimensions (here we have five features).
• For each dimension, the maximum “estimate” of a cluster corresponds to the
sum of the “estimates” of its children.
• A cluster is reported when the deviation between its “estimate” and real
traffic data is above the threshold, or when the “estimate” is replaced by real
traffic data.
• Stage 3: Over a measurement time interval, the actual change of each reported
cluster from the previous step is compared to the estimated change of that cluster.
• A cluster is reported when the difference between the actual change and
estimated change is greater than the threshold.
• Stage 4: Clusters are ranked using a measure called an unexpectedness score.
• Assuming features are independent of each other, the unexpectedness score is
defined as the deviation from a uniform model.
• Given a cluster with a real percentage of volume X% and its features having
independent real percentages of volume {Y1%, …, Yd%}, the unexpectedness of the
cluster is X% ÷ (Y1% × … × Yd%), where d is the dimension size of the cluster
and the product runs over i = 1, …, d.
• This score measures the anomaly behaviour among dimensions.
• Auto Focus was evaluated using three traces collected on three cyber
infrastructures.
• First trace collected from 31 days of data on a small network exchange point
in San Diego
• Second trace collected over 39 days of connections in a large research
institute
• Third trace composed of an 8 h trace from an OC-48 backbone link.
• Auto Focus recognized unexpected patterns in network traffic, such as a weekly
pattern, a temporary network outage, a worm epidemic, and p2p applications.
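Two ideas from the stages above can be sketched briefly: the per-node counters of a one-dimensional prefix hierarchy (Stage 1) and the unexpectedness score (Stage 4). The sketch assumes the score is the ratio of a cluster's actual traffic share to the product of its per-feature shares, that is, the share expected if the features were independent. The records, prefix lengths and threshold are hypothetical, and the compression stages are not reproduced.

```python
import ipaddress
from collections import Counter

# Hypothetical records: (source IP, destination port, packet count)
records = [
    ("10.1.1.1", 80, 40),
    ("10.1.1.2", 80, 30),
    ("10.1.2.3", 53, 10),
    ("10.2.0.9", 25, 20),
]

def prefix_counters(records, prefix_lengths=(32, 24, 16, 8)):
    """Add each record's packet count to every ancestor prefix of its source IP."""
    counters = Counter()
    for src, _port, packets in records:
        for length in prefix_lengths:
            net = ipaddress.ip_network(f"{src}/{length}", strict=False)
            counters[str(net)] += packets
    return counters

def clusters_above(counters, total, threshold):
    """Nodes whose share of the total traffic reaches the threshold."""
    return {p: c for p, c in counters.items() if c / total >= threshold}

def unexpectedness(actual_share, dimension_shares):
    """Actual share divided by the share expected under independent features."""
    expected = 1.0
    for share in dimension_shares:
        expected *= share
    return actual_share / expected

total = sum(packets for _src, _port, packets in records)
counters = prefix_counters(records)
clusters = clusters_above(counters, total, threshold=0.5)

# Cluster (source 10.1.1.0/24, destination port 80)
src_share = counters["10.1.1.0/24"] / total
port_share = sum(p for _s, port, p in records if port == 80) / total
actual = sum(p for s, port, p in records
             if port == 80 and ipaddress.ip_address(s) in ipaddress.ip_network("10.1.1.0/24")) / total
score = unexpectedness(actual, [src_share, port_share])
```

A score above 1 means the combination carries more traffic than the individual features would suggest on their own. A score near 1 means the multi-dimensional cluster adds little beyond its one-dimensional ancestors.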
Shared Nearest Neighbour (SNN) Clustering
• Uses shared nearest neighbours to define the similarity between data points.
• Example: data points A and B have neighbour sets NN(A) and NN(B).
• Similarity of A and B is defined as
• similarity(A,B) = |NN(A) ∩ NN(B)|, the number of neighbours A and B share
• Neighbourhood of a data point can be defined using k-nearest neighbour or
specified-radius area.
• SNN maintains local connections in relatively uniform regions, while it breaks links
in transition regions.
• As dimensionality increases, the distances between data points become nearly
uniform, so distance-based clustering methods cannot classify data points correctly;
SNN avoids this problem by comparing shared neighbours instead of raw distances.
• Shared Nearest Neighbour (SNN) clustering is a density-based algorithm that
identifies clusters based on the similarity of data points and their shared
neighbours.
• Based on the principle that points within the same cluster are likely to have a
similar set of nearest neighbours.
• SNN algorithm consists of the following steps:
• Computing the k-nearest neighbours for each data point.
• Constructing a shared nearest neighbour similarity matrix.
• Calculating the SNN graph by connecting points with a shared number of
nearest neighbours above a certain threshold.
• Applying a density-based clustering algorithm, such as DBSCAN, on the SNN
graph to identify clusters.
• For a given set of data points, SNN algorithm has following steps:
• Step 1: Compute the similarities between the data points and construct a similarity
matrix, which describes the links between data points by the similarity values.
• Step 2: Retain only the predefined number of nearest neighbours in the
matrix and link the shared data points into clusters.
• Step 3: Obtain the size of the SNN neighbourhood at each data point and remove all
data points except those that are in an SNN neighbourhood with a greater size than
the predefined threshold; these data points are called core data points.
• Step 4: Group the core data points within a predefined window in the same
cluster; discard the noncore data points outside of the windows of any core
data point and assign the remaining data points to the nearest clusters.
• SNN is insensitive to variations in the shapes, sizes, and densities of clusters in
a noisy data set, especially in high-dimension feature space.
• SNN clustering algorithm profiles network communications and detects dominant
behaviours.
• Research implementation collected two data sets:
• One set consisting of 850,000 connections from 1 h of data at a U.S. Army fort
• One set consisting of 7500 traffic flows from the University of Minnesota
network.
• Features Extracted
• Start time
• Flow duration
• Source/Destination IP address
• Source/Destination port
• Protocol type
• Number of packets
• Flow volume.
• Ran the SNN clustering algorithm on the two data sets and obtained a number of
interesting clusters for each set.
• Based on the clustering results, anomalous or unexpected behaviours were analysed
and identified.
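The first steps of SNN (k-nearest neighbours, shared-neighbour similarity, and the thresholded SNN graph) can be sketched in a few lines. The points, `k`, and the link threshold are hypothetical. The final density-based clustering step is omitted.

```python
def k_nearest(points, k):
    """Index set of the k nearest neighbours of every point (Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: sq_dist(p, points[j]))
        result.append(set(others[:k]))
    return result

def snn_similarity(neighbours, a, b):
    """Number of nearest neighbours that points a and b have in common."""
    return len(neighbours[a] & neighbours[b])

# Two well-separated groups of hypothetical two-feature flow records
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
nn = k_nearest(points, k=3)

# SNN graph: link two points when they share at least `min_shared` neighbours
min_shared = 2
edges = [(a, b) for a in range(len(points)) for b in range(a + 1, len(points))
         if snn_similarity(nn, a, b) >= min_shared]
```

Points in the same group share two neighbours and are linked, while points from different groups share none, so the graph splits into two components.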
K-Means, DBSCAN and Probability-Based Clustering
• AutoClass method aims to find the probability distribution of a data set to cluster
the data points.
• AutoClass algorithm, based on a Bayesian model, uses EM to build the most
probable model and estimate its parameters.
• Mixture models, including inter cluster mixture probability and intra-cluster
probability distribution functions, were estimated so that intra-cluster similarity
and interclass similarity could be calculated.
• Similarities determined the best set of parameters used for the mixture model.
• DBSCAN algorithm groups data points into clusters that have a higher density
than a threshold number (MinPts) within a window of a specified size
defined by the distance to the data point (Eps).
• With a specified threshold number Nth = 5 and clustering-window radius Eps = 1, data
points are classified into three types according to the local density around them:
core points, border points, and noise.
• A core point has more than MinPts neighbouring data points within the Eps
distance, while a border point is located in the neighbourhood of a core point
but has fewer neighbouring data points within the Eps distance.
• Noise includes all the data points that are neither core points nor border points.
• Given a core point p, any data point q within the Eps distance from p is directly
density-reachable from p.
• A data point q is density-reachable from core point p if q is within the Eps
distance of a core point that is itself directly density-reachable or
density-reachable from p.
• Two data points are density-connected if there is at least one data point from
which both are density-reachable.
• DBSCAN groups the core points within a specified Eps and MinPts into one cluster,
groups the border points within a specified neighbourhood of a core point in the
same cluster, and discards noise.
• DBSCAN algorithm steps:
• Step 1: Find all of the data points that are density-reachable from a data
point of interest p.
• Step 2: Group the detected data points in a cluster if p is a core point; if p is
noise, move to another data point of interest and return to Step 1.
• Continue above steps until all data points have been clustered.
• Evaluated the application of these three clustering algorithms in traffic
classification.
• Used two empirical packet traces:
• Publicly available Auckland IV trace from the University of Auckland
• Collected traffic trace from the University of Calgary.
• First data set consisted of TCP/IP headers of traffic connections for three days,
linking the campus network to the Internet.
• Second data set included a full payload of packets collected from 1 h of activity on
the University of Calgary’s Internet link.
• Used port numbers to determine the true classes of the connections for the first data
set
• Used known applications, such as http, p2p, smtp, and pop3, to classify the second
data set.
• Key features (attributes) of Data set:
• Number of packets
• Average packet size
• Average payload size
• Number of bytes
• Average interval of packets.
• To adjust for the imbalanced distribution of data across the classes (e.g., http
dominated the data sets):
• Selected 1000 & 2000 random samples from each class of the first & second
data set as clustering data.
• Repeated the selection 10 times for both of the data sets to generate 20 data
sets for the clustering evaluation.
• Evaluated the algorithm effectiveness by the following accuracy measure:
Accuracy = (# TP of all clusters) / (# of connections)
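The DBSCAN steps and the accuracy measure above can be sketched together. This is a minimal illustration under two assumptions that are not stated in the notes: a point counts as a core point when at least `min_pts` points (itself included) lie within `eps`, and each cluster is labelled with its majority class so that its true positives are the members of that class. The points and class names are invented.

```python
from collections import Counter

def dbscan(points, eps, min_pts):
    """Return a cluster id per point; -1 marks noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:    # j is a core point, keep expanding
                queue.extend(reachable)
    return labels

def clustering_accuracy(labels, true_classes):
    """Sum of majority-class members over all clusters, divided by all points."""
    true_positives = 0
    for cluster in set(labels) - {-1}:
        members = [t for lab, t in zip(labels, true_classes) if lab == cluster]
        true_positives += Counter(members).most_common(1)[0][1]
    return true_positives / len(labels)

# Hypothetical two-feature connections with their true application classes
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
true_classes = ["http", "http", "http", "smtp",
                "p2p", "p2p", "p2p", "p2p", "http"]

labels = dbscan(points, eps=1.5, min_pts=3)
accuracy = clustering_accuracy(labels, true_classes)
```

Noise points are counted in the denominator but never as true positives, so discarding many connections as noise lowers the accuracy.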
Other Network Traffic Profiling
• BLIND Classification (BLINC): A multi-dimensional classification for network traffic
profiling.
• A signature-based approach focused on classifying network connections
based on host behaviours associated with applications.
• Analyzes the host patterns in three levels: social, functional, and application.
• First Step: Investigated the number of hosts communicating with the
targeted host, and the community among these hosts.
• Second Step: Investigated the functional roles of a host in providing service
and usage.
• Third Step: Generated graphlets to characterize the types of applications, so
that a host can be classified according to the degree of matching between the
graphlets and the host behaviour.
• Used the traffic header packet features to conduct experiments on two data sets
collected in universities.
• BLINC resulted in more than 90% accuracy and covered more than 80% of the
traffic flows
• Subspace method over a large-scale academic network.
• Uses three types of extracted data, namely the number of bytes, the
number of packets, and the number of IP flows, each as a time series.
• Each traffic type reveals a different set of anomalies when the proposed
subspace method is applied.
• Anomalies include abnormal host behaviour, anomalous activity, and
network failures.
• Uses entropy to summarize and analyze the packet feature distribution in the
network traffic.
• Focused on origin-destination (OD) flows and showed that the entropy-based
subspace methods strengthen the accurate detection of anomalous traffic data in
clusters.
• EM-based probabilistic clustering method to aggregate packet header flows into
clusters over networks.
• Uses raw features (e.g., byte counts) and the statistical features (e.g.,
minimum/maximum packet size) of traffic flows in experiments
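To make the entropy idea mentioned above more concrete, here is a minimal sketch that computes the Shannon entropy of one packet feature within a time bin. The choice of destination port as the feature and the two sample bins are assumptions for illustration only; they are not taken from the studies summarized in these notes.

```python
import math
from collections import Counter

def feature_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of a packet feature."""
    counts = Counter(values)
    total = len(values)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical destination ports observed in two time bins.
normal_bin = [80, 80, 443, 80, 53, 443, 80, 22]
scan_bin = [21, 22, 23, 25, 53, 80, 110, 443]  # one packet per port, as in a port scan

h_normal = feature_entropy(normal_bin)
h_scan = feature_entropy(scan_bin)
print(h_normal, h_scan)
```

A single number per bin summarizes how concentrated or dispersed the feature is, so a sudden change in entropy between bins can flag traffic worth inspecting.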
CS-15
Adversarial Machine Learning Malware Detection

What is Adversarial Machine Learning?


• Adversarial Machine Learning is an ML method that aims to trick ML models by providing
deceptive input that leads to a wrong output.
• An illusion for the ML model.
• Adversarial Machine Learning includes:
• Generation of adversarial examples
• Detection of adversarial examples
• Inputs specially created to deceive classifiers

Small noise can turn Panda into Gibbon


Terminology
• Clean Image: Unmodified image
• Adversarial Example: An input (image or dataset) to a model that has been purposefully
altered to lead to erroneous model predictions.
• Adversarial Perturbation: Element of an adversarial example or picture that results in an
inaccurate prediction - normally a low-magnitude additive noise.
• Agent or Attacker: Someone who makes an adversarial example.
• Defence/Hostile Defence: A method that increases a model’s resilience. It also covers
external or internal systems that identify adversarial signals, and image processing that
counteracts the effects of input modifications that might be considered adversarial.
• Target Image: The image that the adversary manipulates.
• Target Label: Desired inaccurate label.
Adversarial Attack: White-box and Black-box
• White-box Attack: Attacker has complete access to the target model, including the model’s
architecture and its parameters.
• Black-box Attack: Attacker has no access to the model and can only observe the outputs of
the targeted model.
• White-box and Black-box attacks can be classified into two categories:
• Targeted Attacks: Attackers disrupt the input in such a way that the model predicts
a specific target class.
• Un-targeted Attacks: Attackers disrupt the inputs in such a way that the model
predicts a class, which is anything but the true class.
How Do Adversarial Attacks Work?
• There are a large variety of different adversarial attacks that can be used against machine
learning systems.
• These work against both deep learning systems and traditional machine learning models.
• An adversarial attack’s objective is to degrade the performance of a model on specific tasks,
essentially to “fool” the ML algorithm.
• Adversarial machine learning has been extensively used in:
• Image classification
• Spam detection
Poisoning Attack
• Poisoning is the contamination of the training dataset to increase errors in the output.
• User-generated training data, e.g. for content recommendation or natural language models,
is highly susceptible to it.
• Fake accounts offer many opportunities for poisoning (Facebook reportedly removes
around 7 billion fake accounts per year).
• Social media disinformation campaigns attempt to bias recommendation and moderation
algorithms, to push certain content over others.
• Data poisoning through backdoor attack aims to teach a specific behavior for inputs with a
given trigger, e.g. a small defect on images, sounds, videos or texts.
• Intrusion detection systems are often trained using collected data. An attacker may poison
this data by injecting malicious samples during operation that subsequently disrupt
retraining.
• Data poisoning techniques can be applied to text-to-image models to alter their output.
• A poisoning attack may use a “boy who cried wolf” approach, i.e. an adversary might input
data during the training phase that is falsely labelled as harmless, when it is actually
malicious.
• Example: Poisoning a Microsoft Chatbot
• In 2016, Microsoft launched “Tay,” a Twitter chat bot programmed to learn to
engage in conversation through repeated interactions with other users.
• Microsoft’s intention was that Tay would engage in “casual and playful
conversation.”
• Internet trolls noticed the system had insufficient filters and began to feed profane
and offensive tweets into Tay’s machine learning algorithm.
• The more these users engaged, the more offensive Tay’s tweets became.
• Microsoft had to shut the AI bot down just 16 hours after its launch.
• Ref: https://blogs.microsoft.com/blog/2016/03/25/learning-tays-introduction/
• Ref: https://www.cbsnews.com/news/microsoft-shuts-down-ai-chatbot-after-it-turned-into-racist-nazi/
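The "boy who cried wolf" approach described above can be sketched with a toy detector. This is only an illustration under strong assumptions: the detector is a one-feature nearest-centroid classifier, and every sample value and label name is hypothetical.

```python
def train_centroids(samples):
    """samples: list of (feature_value, label). Returns {label: mean feature value}."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

# Clean training data: low values are benign, high values are malicious.
clean = [(1, "benign"), (2, "benign"), (3, "benign"),
         (7, "malicious"), (8, "malicious"), (9, "malicious")]

# Poison: malicious-looking samples injected with a false "benign" label.
poison = [(7, "benign"), (8, "benign"), (9, "benign"), (8, "benign"), (8, "benign")]

suspicious_sample = 6
clean_prediction = predict(train_centroids(clean), suspicious_sample)
poisoned_prediction = predict(train_centroids(clean + poison), suspicious_sample)
print(clean_prediction, poisoned_prediction)
```

The mislabelled samples drag the "benign" centroid toward the malicious region, so the same suspicious sample is flagged by the clean model but accepted by the poisoned one.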
Evasion Attack
• Evasion attacks exploit the imperfection of a trained model.
• Spammers and hackers often attempt to evade detection by obfuscating the content
of spam emails and malware.
• Samples are modified to evade detection and to be classified as legitimate.
• Example: Image-based spam in which the spam content is embedded within an
attached image to evade textual analysis by anti-spam filters.
• Example: Spoofing attacks against biometric verification systems.
• Does not contaminate or corrupt training data.
• Data is manipulated to deceive trained classifiers/models.
• Evasion attacks can be black-box or white-box attacks.
• Most commonly used in intrusion and malware scenarios.
• Occurs when the ML model calculates the probability for a new sample, and often
requires trial-and-error methods.
• Approach:
• The attacker wants to investigate the algorithm of the machine learning model that is
designed to filter spam email content.
• The attacker experiments with different emails to bypass the spam filter, introducing a
new email that includes enough extraneous words to "tip" the algorithm into
classifying it as not spam instead of spam.
• The attack affects the accuracy and confidentiality of a machine learning model, leading to
the malicious output intended by the attacker.
• The attack can reveal private or sensitive information.
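The trial-and-error approach above can be sketched against a toy spam filter. The word weights, the threshold at zero, and the list of padding words are all hypothetical; the attacker function only calls `is_spam`, mimicking someone who sees the filter's verdict but not its internals.

```python
from itertools import cycle

# Hypothetical filter: positive weights suggest spam, negative weights suggest legitimate mail.
WEIGHTS = {"free": 2.0, "winner": 2.5, "prize": 1.5,
           "meeting": -1.0, "report": -1.0, "schedule": -1.0, "thanks": -0.5}

def spam_score(text):
    return sum(WEIGHTS.get(word, 0.0) for word in text.lower().split())

def is_spam(text):
    return spam_score(text) > 0

def evade(text, padding_words, max_queries=50):
    """Append benign-looking words, one filter query at a time, until the message passes."""
    candidate = text
    for _, word in zip(range(max_queries), cycle(padding_words)):
        if not is_spam(candidate):
            break
        candidate += " " + word
    return candidate

original = "free prize winner"
evasive = evade(original, ["meeting", "report", "schedule", "thanks"])
print(is_spam(original), is_spam(evasive))
print(evasive)
```

The training data is never touched: the spam payload stays in the message, and only the extra words tip the decision.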
Model Extraction Attack
• Model Extraction (Model Stealing) involves an attacker probing a black box machine
learning system in order to:
• Reconstruct the model or
• Extract the data it was trained on.
• Significant when either the training data or the model itself is sensitive and confidential.
• Example: Extracting a proprietary stock trading model, which the adversary could then use for
their own financial benefit.
• Model extraction attacks are designed to steal trade secrets, intellectual property (IPR), etc., for the
benefit of the adversary.
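As a simplified sketch of model reconstruction, the example below steals a linear scoring model through queries alone. It assumes the black box returns exact numeric scores and that the model is linear; the secret weights and the function names are hypothetical. Real systems that return only labels or rounded probabilities would need many more queries and an approximate fit.

```python
# Hidden from the attacker in a real scenario; defined here so the sketch runs.
SECRET_WEIGHTS = [0.5, -2.0, 1.5]
SECRET_BIAS = 0.25

def query_black_box(features):
    """The only interface the attacker can call: returns the model's score."""
    return SECRET_BIAS + sum(w * x for w, x in zip(SECRET_WEIGHTS, features))

def extract_linear_model(query, n_features):
    """Recover bias and weights by probing the origin and each unit vector."""
    origin = [0.0] * n_features
    bias = query(origin)
    weights = []
    for i in range(n_features):
        probe = origin.copy()
        probe[i] = 1.0
        weights.append(query(probe) - bias)
    return weights, bias

stolen_weights, stolen_bias = extract_linear_model(query_black_box, 3)
print(stolen_weights, stolen_bias)
```

With n features, n + 1 queries are enough in this idealized setting, which is why limiting or perturbing the information returned to clients is a common countermeasure.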

==================================================================

Poisoning Attack

A poisoning attack happens when an attacker deliberately contaminates the training data
of a machine learning model. This causes the model to make more mistakes when it
processes new data.

Key Points:

 What happens? The attacker adds bad data into the training set. This tricks the
model into learning incorrect patterns.
 Why is this dangerous? Models trained on user-generated data (like social media or
recommendation systems) are easy to target because anyone can contribute to the
data.
 Example:
o Attackers created fake accounts on social media to spread misinformation,
biasing algorithms that decide what content users see.
o In a backdoor attack, a small "trigger" is added to inputs, like a subtle mark in
an image. When the model sees this trigger, it behaves in a specific way
designed by the attacker.

Real-Life Example: Microsoft’s Tay Chatbot (2016)

Microsoft’s chatbot Tay was trained to learn conversations from Twitter users. Trolls started
feeding offensive language to Tay. Since the model didn't have proper filters, it quickly started
mimicking harmful content and had to be shut down within 16 hours.

Evasion Attack
An evasion attack targets a trained model by manipulating the input data. Instead of
messing with the training phase, attackers craft their input to trick the model into giving
incorrect results.

Key Points:

 What happens? The attacker slightly modifies input data to fool the model into
classifying it incorrectly.
 Why is this dangerous? It bypasses security measures like spam filters or malware
detectors.
 Examples:
o Spammers hide the spam content inside images to bypass text-based filters.
o Attackers spoof biometric systems by creating fake fingerprints or face scans.

How does it work?

An attacker uses trial and error to find weaknesses in the model. For example, they might
tweak spam emails by adding harmless words that confuse the filter and allow the email to
pass through.

Model Extraction Attack

A model extraction attack is like reverse-engineering a machine learning system. The
attacker queries the system to figure out:

 How the model works (stealing the algorithm).
 What data the model was trained on (stealing sensitive data).

Key Points:

 What happens? The attacker sends many queries to the system, analyzes the output,
and builds a similar model.
 Why is this dangerous? The stolen model can:
o Be used by competitors.
o Reveal private or sensitive training data.

Example:

An attacker probes a proprietary stock trading algorithm and uses the stolen model to make
profitable trades, taking advantage of the original owner's intellectual property.
These attacks show how machine learning systems can be exploited at various stages. By
understanding these techniques, we can design better defenses to secure AI and machine
learning models.

======================================================================
Adversarial Examples Generation Methods
• L-BFGS (Limited Memory Broyden–Fletcher–Goldfarb–Shanno)
• FGSM (Fast Gradient Sign Method)
• Iterative Fast Gradient Sign Method
• Black Box Attack Method
• Jacobian-based Saliency Map Attack (JSMA)
• Deep Fool Attack
• Carlini & Wagner Attack
• Generative Adversarial Networks (GANs)
• Zeroth-Order Optimisation Attack (ZOO)
Black-Box Attack Method
• Adversarial examples transfer well between different models
• An adversarial example designed for a model X is often also effective
against other models trained on a similar dataset.
• Attackers use the Transferability property of adversarial examples when
they do not have access to complete information about the model.
• Attacker generates adversarial examples using following steps:
• Query the target model with input Xi for i=1…n and record output Yi.
• With the training dataset (Xi, Yi), build another model (substitute model).
• Use a white-box algorithm to generate adversarial examples for the
substitute model.
• Most of these adversarial examples are expected to transfer successfully and
become adversarial examples for the target model as well.
• Ref:
http://openaccess.thecvf.com/content_ECCV_2018/papers/Arjun_Nitin_Bhagoji_Practical_Black-box_Attacks_ECCV_2018_paper.pdf
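The query, substitute, and transfer steps above can be sketched end to end on a toy problem. Everything here is an assumption for illustration: the target is a hidden two-feature linear rule, the substitute is a small logistic regression, the white-box algorithm is a single FGSM step, and the perturbation size is chosen for this toy scale rather than being "imperceptible".

```python
import math

def target_predict(x):
    """Hypothetical black-box target: the attacker sees only this 0/1 output."""
    return 1 if 2.0 * x[0] - 1.0 * x[1] + 0.5 > 0 else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def train_substitute(inputs, labels, epochs=300, lr=0.5):
    """Fit a logistic-regression substitute with batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in zip(inputs, labels):
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b

def fgsm_on_substitute(x, y, w, b, eps):
    """One FGSM step computed on the substitute only (white-box access to it)."""
    err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y  # d(loss)/d(logit)
    return [xi + eps * sign(err * wi) for xi, wi in zip(x, w)]

# Step 1: query the target on a grid of inputs and record its outputs.
grid = [i * 0.5 - 2.0 for i in range(9)]
queries = [[a, c] for a in grid for c in grid]
answers = [target_predict(x) for x in queries]

# Step 2: train the substitute on the recorded (input, output) pairs.
w_sub, b_sub = train_substitute(queries, answers)

# Step 3: craft an adversarial example against the substitute.
x_clean = [1.0, 0.0]
x_adv = fgsm_on_substitute(x_clean, target_predict(x_clean), w_sub, b_sub, eps=1.0)

# Step 4: check whether the example transfers to the target.
print(target_predict(x_clean), target_predict(x_adv))
```

The attacker never reads the target's parameters; the substitute's gradient is a stand-in that points in a similar direction, which is what makes the transfer work in this sketch.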
The figure illustrates the concept of Adversarial Machine Learning and how it can
be used to attack machine learning models.
Adversarial Machine Learning is a type of attack where the goal is to trick a machine learning
model into making incorrect predictions. This is done by adding small, carefully crafted
perturbations to the input data that are designed to be imperceptible to humans but significant
enough to mislead the model.
Here's how the image breaks down the attack:
Traditional Machine Learning (Training Phase)
 Training Data: The model is trained on a dataset of labeled examples (input data and
their corresponding correct labels).
 Deep Learning Training: The model learns to extract features from the input data and
make predictions based on those features.
 Predictive Model: The trained model is able to make accurate predictions on unseen
data.
Adversarial Attack Phase
 Input Data: An attacker takes an original input image.
 Noise: The attacker adds carefully crafted noise to the image.
 Perturbated Data: The resulting image with added noise is called the "perturbed data."
 Evading: The perturbed data is fed into the model, and the model is fooled into making
an incorrect prediction.
 Falsified Labels: The attacker can then use the model's incorrect predictions to achieve
their malicious goals.
Why is this dangerous?
Adversarial attacks can have serious consequences, especially in safety-critical applications like
autonomous vehicles or medical diagnosis. For example, an attacker could trick a self-driving
car into misidentifying a stop sign as a speed limit sign, potentially leading to a serious accident.
How can we defend against these attacks?
There are several techniques to defend against adversarial attacks, including:
 Adversarial training: Training the model on both clean and adversarial examples to
make it more robust.
 Input transformations: Applying transformations to the input data to make it more
resistant to perturbations.
 Feature squeezing: Reducing the degrees of freedom of the input data (for example, lowering
the colour bit depth or smoothing the image) to make it harder for attackers to craft effective perturbations.
 Detection methods: Developing methods to detect adversarial examples before they are
fed to the model.
It is important to note that this is an ongoing area of research, and new attack and defense
techniques are constantly being developed.
==================================================================

The figure illustrates a strategy for protecting machine learning models against
adversarial attacks. Here's a breakdown of the approach:
Adversarial Attack:
1. Input Data: An attacker starts with an original input image.
2. Noise: The attacker adds carefully crafted noise to the image, designed to be
imperceptible to humans but significant enough to mislead the model.
3. Perturbated Data: The resulting image with added noise is called the "perturbed
data."
4. Evading: The perturbed data is fed into the model, and the model is fooled into
making an incorrect prediction.
5. Falsified Labels: The attacker can then use the model's incorrect predictions to
achieve their malicious goals.
Protection Against Adversarial Attacks:
The image outlines a three-step approach to enhance the model's resilience against
adversarial attacks:
1. Data Enhancing:
 Training Data: The model is trained on a dataset of labeled examples (input data and
their corresponding correct labels).
 Perturbated Data: The training data is augmented by adding carefully crafted noise
to create "perturbed data." This helps the model learn to recognize and handle
adversarial examples during training.
2. Algorithm Training:
 Deep Learning Training: The model is trained on the enhanced dataset, which
includes both original and perturbed data. This helps the model learn to extract
features from the input data and make predictions based on those features, even in the
presence of adversarial noise.
3. Classification:
 Predictive Model: The trained model is able to make accurate predictions on unseen
data, even if it contains adversarial noise.
Benefits of this approach:
 Enhanced Robustness: By training the model on both clean and adversarial
examples, the model becomes more robust to attacks.
 Improved Accuracy: The model is better equipped to handle real-world data, which
may contain some level of noise or perturbations.
 Increased Security: This approach helps to protect the model from being exploited by
malicious attackers.
Additional Considerations:
 Ongoing Research: Adversarial machine learning is an active area of research, and
new attack and defense techniques are constantly being developed.
 Multiple Layers of Defense: It is important to implement a multi-layered defense
strategy, combining different techniques to achieve optimal protection.
By incorporating these defense mechanisms, machine learning models can be made more
resilient to adversarial attacks, ensuring their reliable and secure operation in various
applications.

===================================================================
Defensive Measures for Adversarial Attacks
• Threat Modelling: Formalize the attackers' goals and capabilities with respect to
the target system.
• Attack Simulation: Formalize the optimization problem the attacker tries to solve
according to possible attack strategies.
• Attack impact evaluation
• Countermeasure design
• Noise Detection: For evasion-based attack
• Information Laundering: Alter the information received by adversaries (for model
stealing attacks)
Defensive Distillation
• Generates a new model whose gradients are much smaller than the original
undefended model.
• If gradients are very small, techniques like FGSM or Iterative FGSM are not useful,
as the attacker would need great distortions of the input image to achieve a
sufficient change in the loss function.
• Defensive distillation introduces a new parameter T, called temperature, to the last
softmax layer of the network:

$$\mathrm{softmax}(z, T)_i = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}}$$

where $z_i$ is the logit for class $i$.

• For T = 1, this is the usual softmax function.
• A higher value of T gives a smaller gradient of the loss with respect to the input images.
• Defensive distillation works as follows:
• Train a network, called the teacher network, with a temperature T ≫ 1.
• Use the trained teacher network to generate soft-labels for each image in the
training set.
• A soft-label for an image is the set of probabilities that the model assigns to
each class.
• If the input image is a parrot, the teacher model might output soft
labels like (90% parrot, 10% papagayo).
• Train a second network (the distilled network), on the soft-labels, using
again the temperature T.
• Training with soft-labels reduces overfitting and improves out-of-sample
accuracy of the distilled network.
• At prediction time, run the distilled network with temperature T=1.
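To illustrate how the temperature shapes the soft-labels, here is a minimal sketch of a softmax that divides each logit by T before normalizing. The logit values and the choice of T = 20 are hypothetical; only the temperature-scaled softmax itself follows the description above.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by the temperature T."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]  # hypothetical teacher logits for three classes
hard = softmax_with_temperature(logits, temperature=1.0)
soft = softmax_with_temperature(logits, temperature=20.0)
print(hard)
print(soft)
```

At T = 1 almost all probability mass sits on the top class, while a large T spreads it across classes and keeps their ranking, which is what gives the distilled network richer targets than one-hot labels.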
Adversarial Training
• Adversarial machine learning itself can be used to protect a model through
adversarial training.
• Normally, a machine learning model is trained on past data or experience to
predict the outcome.
• In adversarial training, the model is additionally trained on adversarial examples.
• Training on various adversarial examples makes the model robust against
malicious perturbations in the data.
• Adversarial training is a very slow and costly process.
• Every single training example must be probed for adversarial weaknesses, and then
the model must be retrained on all those examples.
• AI researchers are working on preventing such attacks with the help of deep
learning concepts through combining parallel neural networks and generalized
neural networks.
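The probe-then-retrain loop described above can be sketched on a one-feature logistic regression. This is a toy under stated assumptions: FGSM is used to generate the adversarial counterpart of each example, and the data set, perturbation size, and hyperparameters are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    """Perturb a 1-D input in the direction that increases the logistic loss."""
    grad_x = (sigmoid(w * x + b) - y) * w
    return x + eps * sign(grad_x)

def adversarial_train(data, eps, epochs=200, lr=0.1):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Probe every training example for an adversarial counterpart...
        augmented = data + [(fgsm(x, y, w, b, eps), y) for x, y in data]
        # ...then take one batch gradient step on clean plus adversarial examples.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in augmented)
        grad_b = sum(sigmoid(w * x + b) - y for x, y in augmented)
        w -= lr * grad_w / len(augmented)
        b -= lr * grad_b / len(augmented)
    return w, b

data = [(-3.0, 0), (-2.0, 0), (-1.5, 0), (1.5, 1), (2.0, 1), (3.0, 1)]
w, b = adversarial_train(data, eps=0.5)

def predict(x):
    return 1 if w * x + b > 0 else 0

print([predict(x) for x, _ in data])
print([predict(fgsm(x, y, w, b, 0.5)) for x, y in data])
```

Each epoch regenerates the adversarial examples against the current weights, which is why the cost grows with every training example and every pass, as noted above.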
