AIML Post Mid Sem1
What is Scanning?
• Procedure to identify live hosts, open ports, and running services, and to discover the operating system and architecture of a target system.
• Collects information using complex and aggressive reconnaissance techniques.
• Identifies vulnerabilities and threats to network:
• Missing patches
• Unnecessary services
• Weak authentication
• Weak encryption algorithms
• ….
• Targets multiple destinations, e.g. several host IP addresses or services on various
ports.
Who is Interested in Scanning?
• Three types of users have an interest in scanning:
• System administrators to audit networks by scanning the infrastructure.
• Peers looking for previous collaborators via P2P services.
• Attackers to detect the vulnerabilities of cyber infrastructures
• Attackers use scanning to:
• Collect intelligence of the computer network systems to break into the
systems via detecting vulnerable sites.
• Search for the paths to live and accessible resources in the system.
• Send a series of messages to the targeted system and learn the services and
the weakness in the structures of the infrastructure through the feedback
from these messages.
• Use the collected information to prepare for attacks
Scanning Example
• Responses to a message from a computer can reveal information about the IP
addresses, OS and architecture of the system.
• Ping sweep (ICMP ECHO) queries multiple hosts by sending ICMP ECHO request packets.
• Ping sweep generates a reply from the targeted system if the system is alive.
• Portscan finds ports or services that are alive and running on the targeted system
by connecting to the TCP or UDP ports of the system.
$ nmap -p 1-512 192.168.2.92
Starting Nmap 6.40 ( http://nmap.org ) at 2018-05-11 17:52 EDT
Nmap scan report for 191.239.213.197
Host is up (0.079s latency).
Not shown: 510 filtered ports
PORT    STATE SERVICE
21/tcp  open  ftp
80/tcp  open  http
443/tcp open  https
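For illustration, a minimal Python sketch of a TCP connect-style portscan over the same port range (the target address is a placeholder; scan only hosts you are authorized to test):

import socket

def tcp_connect_scan(host, ports, timeout=0.5):
    """Attempt a TCP connection to each port; report the ones that accept."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 when the TCP handshake succeeds (port open)
            if s.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports

# Example (placeholder host): scan the first 512 TCP ports, as in the nmap command above
print(tcp_connect_scan("192.168.2.92", range(1, 513)))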
Scan Detection Techniques
• Scan detection techniques are classified into two groups
• Single source portscan detection
• Distributed source portscan detection
• Network traffic data includes connection information such as
• Source IPs
• Durations of the connections
• Starting and ending time of connections
• Others …
• Scan detection techniques search for similar and anomalous patterns in the traffic data.
UC Irvine: Kitsune Network Attack Dataset
• Each row in the dataset (csv file) is a packet captured chronologically.
• Each row (feature vector) contains recent (temporal) statistics that describe the context of the packet's channel and its communicating parties.
• For each packet a behavioral snapshot is extracted of the hosts and protocols which
communicated the given packet.
• Snapshot consists of 115 traffic statistics capturing a small temporal window into:
• packet's sender in general
• traffic between the packet's sender and receiver.
• Statistics summarize all of the traffic
• originating from this packet's source MAC and IP address (denoted SrcMAC-
IP)
• originating from this packet's source IP (denoted SrcIP)
• sent between this packet's source and destination IPs (denoted Channel)
• sent between this packet's source and destination TCP/UDP Socket (denoted
Socket).
UC Irvine: Kitsune Network Attack Dataset
• 23 features are extracted from a single time window.
• These features are extracted from a total of five time-damped windows of:
• 100ms, 500ms, 1.5sec, 10sec, and 1min into the past
• totalling 115 features.
• Not every packet applies to every channel type
• there is no socket if the packet does not contain a TCP or UDP datagram
• in these cases, these features are marked zero
• Feature Extractor passes the final feature vector (x), which is always a member of R^n with n = 115, to the Feature Mapper (FM).
• Feature Extraction code Ref: https://github.com/ymirsky/Kitsune-py
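As a rough sketch of how damped incremental statistics over a time window can be maintained per traffic stream (a simplification for illustration, not the actual Kitsune-py implementation; the decay constant lam is an assumption):

import math

class DampedStats:
    """Incremental count/mean/std of a stream with exponential time decay.

    A larger lam gives a shorter effective window, loosely mirroring the
    damped time windows (100ms ... 1min) described above.
    """
    def __init__(self, lam):
        self.lam = lam
        self.w = 0.0       # decayed weight (count)
        self.s1 = 0.0      # decayed sum of values
        self.s2 = 0.0      # decayed sum of squared values
        self.last_t = None

    def update(self, value, t):
        if self.last_t is not None:
            decay = 2 ** (-self.lam * (t - self.last_t))
            self.w, self.s1, self.s2 = decay * self.w, decay * self.s1, decay * self.s2
        self.last_t = t
        self.w += 1.0
        self.s1 += value
        self.s2 += value * value

    def stats(self):
        mean = self.s1 / self.w
        var = max(self.s2 / self.w - mean * mean, 0.0)
        return self.w, mean, math.sqrt(var)

# One such object would be kept per stream (SrcMAC-IP, SrcIP, Channel, Socket)
s = DampedStats(lam=0.1)
s.update(60, t=0.0)     # 60-byte packet at time 0
s.update(1500, t=0.5)   # 1500-byte packet at time 0.5 s
print(s.stats())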
A typical process flow for scan detection in network security involves the following steps:
1. Collection of Network Traffic Data:
The first step is to collect relevant network traffic data. This can be done using various
methods like network packet capture tools, flow logs, or intrusion detection systems
(IDS).
The collected data can include information about source and destination IP addresses,
port numbers, protocols used, and the content of the traffic.
2. Data Pre-processing:
The collected network traffic data is often noisy and unstructured.
Data pre-processing involves cleaning and transforming the data to make it suitable
for analysis. This may include:
o Filtering out irrelevant traffic
o Normalizing data formats
o Extracting relevant features
3. ML for Scan Patterns:
Machine Learning (ML) techniques are employed to identify patterns and anomalies
that indicate scanning activities.
ML algorithms can be trained on historical network traffic data to learn what normal
traffic looks like.
Once trained, the ML model can be used to analyze real-time network traffic and flag
suspicious activities that may indicate a scanning attack.
4. Scan Detection:
Based on the analysis of network traffic and the identified scan patterns, the system
can detect various types of scans, such as port scans, vulnerability scans, and
reconnaissance scans.
5. Report and Analysis:
Once scans are detected, detailed reports are generated, providing information about
the type of scan, the source IP address, target IP addresses, and other relevant details.
These reports can be used by security analysts to investigate the incident further, take
appropriate actions, and improve security measures.
Overall, this process enables organizations to proactively identify and respond to
scanning activities, which can be precursors to more serious attacks like exploitation
and data breaches.
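A minimal scikit-learn sketch of steps 2-4 above (synthetic data stands in for pre-processed flow features such as duration, packet count and distinct destination IPs):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# 1-2. In practice these features come from cleaned, normalized flow records;
#      synthetic data is used here so the sketch is self-contained.
X, y = make_classification(n_samples=2000, n_features=6, weights=[0.9], random_state=0)

# 3. Train an ML model on historical (benign vs. scan) traffic
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# 4-5. Detect scans on held-out traffic and report
print(classification_report(y_te, clf.predict(X_te), target_names=["benign", "scan"]))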
===================================================================
Scan Detection Techniques
• Based on input data, Scan detection methods are classified into two categories:
• Packet-based detection: Uses packet level information
• Flow-based detection: Uses the aggregated traffic information obtained by
network tools
Packet Based Detection
• Messages and Headers transported by packets are analysed
• Content of messages generated in the application layer is different for different
protocols
• HTTP generates particular HTTP request messages.
• TCP and UDP, with their respective headers, are the two most common protocols in the transport layer.
• IP, with the IP header, is the most common protocol in the network layer.
• Ethernet header is generated in the data-link (Ethernet) layer.
DQN: Deep Q Network
• Deep Q-network (DQN) algorithm is a model-free, online, off-policy reinforcement
learning method.
• DQN agent is a value-based reinforcement learning agent that trains a critic to
estimate the expected discounted cumulative long-term reward when following the
optimal policy
• Naive DQN has 3 convolutional layers and 2 fully connected layers to estimate Q
values directly from images.
• Linear DQN has 1 fully connected layer with a learning technique.
• DQN overcomes unstable learning by using 4 techniques:
• Experience Replay
• Target Network
• Clipping Rewards
• Skipping Frames
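A minimal PyTorch sketch of such a naive DQN Q-network (3 convolutional plus 2 fully connected layers estimating Q-values from image input; the layer sizes, 84x84 input and 6 actions are illustrative assumptions):

import torch
import torch.nn as nn

class NaiveDQN(nn.Module):
    """Q-network mapping an image observation to one Q-value per action."""
    def __init__(self, in_channels=4, n_actions=6):
        super().__init__()
        self.conv = nn.Sequential(                      # 3 convolutional layers
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.fc = nn.Sequential(                        # 2 fully connected layers
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        return self.fc(self.conv(x))

q_net = NaiveDQN()
q_values = q_net(torch.zeros(1, 4, 84, 84))   # a stack of four 84x84 frames
print(q_values.shape)                         # torch.Size([1, 6])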
Packet Based Detection
• Step 1: Network traffic recorded in the ’pcap’ files is separated into small traffic files
according to the session-based partition rules.
• If two packets share the same 5-tuple (source IP, source port, destination IP, destination port, transport protocol), they are categorized in the same session.
• Several separated sessions in the order of capture time are obtained from the
‘pcap’ files.
• Step 2: Separated sessions are converted into images using the image embedding
method.
• A network packet in a session consists of an Ethernet header, a TCP or UDP
header, an IP header and application messages.
• Total field length of the first three headers (except the application message)
is 54 bytes.
• Application message field is dropped during image embedding because of its
varying length.
• 54-byte packet is converted to a line of the image, with each byte
representing one pixel.
• An image of 54x54 size is created using 54 packets from a session.
• Additional images are created for sessions with more than 54 packets
• Zero-padding is used if a session has less than 54 packets.
• Packets are embedded in the order in which they appear in the session.
• Step 3: Each session is labelled as per the time stamp given in the log file of the raw
dataset.
• Step 4: All generated images’ pixels are normalized into [0, 1] from [0, 255].
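A minimal NumPy sketch of this image-embedding step, assuming each packet is already available as a byte string (a simplification of the pipeline described above):

import numpy as np

def session_to_image(packets, header_len=54, rows=54):
    """Embed one session as a 54x54 image: one packet (54 header bytes) per row."""
    img = np.zeros((rows, header_len), dtype=np.float32)   # zero-padding for short sessions
    for i, pkt in enumerate(packets[:rows]):               # extra packets go into further images
        hdr = pkt[:header_len]                             # drop the variable-length application payload
        img[i, :len(hdr)] = np.frombuffer(hdr, dtype=np.uint8)
    return img / 255.0                                     # normalize pixels from [0, 255] to [0, 1]

# Example with two dummy "packets"
session = [bytes(range(54)), bytes(54)]
print(session_to_image(session).shape)   # (54, 54)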
Flow Based Detection
• Flow-level intrusion detection uses traffic flow characteristics, which usually cover numerous packets and are extracted for detecting attacks.
• Flow information includes the statistics of a flow, such as:
• Number of packets
• Flow duration
• Average packet size
• Transportation protocol etc.
• A flow is defined by the 5-tuple of source IP, source port, transport protocol, destination IP and destination port.
• Denial of Service and Distributed Denial of Service attacks tend to transmit a large
number of packets in a short time.
• Overall architecture is similar to that of packet based detection with minor
modifications in pre-processing and RL module.
• In a flow-level IDS, both the 𝜀-greedy policy and a Conditional Generative Adversarial Network (CGAN) are used for exploration, as sketched below.
• Exploration module in RL is conducive to solving these class-imbalance problems.
• The 𝜀-greedy policy is applied to the sample agent, where 𝜀 controls the exploration degree.
• Purpose of CGAN is to generate novel attack flows for each class, which helps compensate for the under-represented attack classes.
• CGAN takes the class label and noise as the input and outputs a state that belongs to the given class.
• Generator’s functionality is to generate simulated states that can deceive the discriminator.
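A minimal sketch of 𝜀-greedy action selection, where 𝜀 controls the exploration degree (illustrative only; the CGAN component is not shown):

import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon explore a random action, otherwise exploit the best Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])      # exploit

print(epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.1))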
Scan Detection Methods
• Stealthy Probing and Intrusion Correlation Engine (SPICE)
• Statistical Packet Anomaly Detection Engine (SPADE).
• Graph-based Intrusion Detection System (GrIDS)
• Threshold Random Walks
• Expert Knowledge based Rule based Data Mining
• Logistic Regression in Horizontal and Vertical Scanning
Stealthy Scan Detection: SPICE and SPADE Using Cluster and Correlation Methods
• Stealthy port scans refer to the varieties of scan techniques that can elude
traditional IDS systems.
• Examples:
• Randomizing the scanning order of IP addresses and port sequences
• Randomizing the scanning lull
• Slowing down the scanning frequencies
• Randomizing attack source IPs and ports.
• Traditional IDS systems like SNORT or the Graph-based Intrusion Detection System (GrIDS) use the occurrence of connections from source IPs within a 'short time' window.
• A 'long time' window requires a massive amount of normal traffic to be searched for patterns, which becomes difficult to manage.
• Attackers tend to use slow randomized stealth scan to bypass the time window.
SPICE and SPADE Using Cluster and Correlation Methods in Stealthy Scan Detection
• Attacker is trying to gather information in a systematic way, unlike a normal user.
• Some of the packets will be anomalous, e.g. searching for port 98 (linuxconf port) on a Windows host.
• Such anomalous packets will be saved for a longer period and grouped together to
find a pattern.
• Packets which form a sizeable group are saved and analysed. Normal traffic is timed
out quickly.
• A SPICE algorithm has two components:
• Anomaly sensor
• Correlator.
• Anomaly Sensor (SPADE) monitors the network and assigns anomalous scores to
each event.
• Events that are sufficiently anomalous are passed to correlator which groups them
together and reports scans.
• Uses an anomaly score to estimate the total information of a scan footprint based
on the conditional probability distribution of normal traffic packets.
• Traffic packets data includes source and destination addresses and ports, over days
or weeks.
• Packets with anomaly scores higher than a threshold are reported to the event
correlation engine.
• Correlation engine applies a simulated annealing algorithm to cluster the anomalous packets and sends out reports of unusual activity (e.g. port scans).
• SPADE can also be used as pre-processor plug-in of SNORT.
• Correlation engine maintains the records of event likelihood, from which the
anomalousness of a given packet is approximated.
• Joint probability of a destination IP and port and a source IP and port is derived, given the probability of a destination port and a source IP, based on:
• Conditional probability of P (source port | destination port)
• Conditional probability of P (destination IP | source IP, destination port)
• Correlation engine groups packets and the heuristics between events into the
architecture of a correlation graph.
• In correlation graphs, nodes (packets) denote an event and an undirected edge
denotes the correlation strength between nodes.
• Correlation strength between packets can be calculated using heuristic functions.
Coordinated Scan Detection: GrIDS Using Rule-Based Machine Learning
• Rule Sets
• Rule sets are defined by users to specify details about graph construction
• Rules are independent of other rules
• Connection data is applied to all rules
• Rule sets contain pre-conditions to filter out data which is not relevant to the rule set, e.g. port ID, source IP, etc.
• If the data passes through the pre-conditions, it can be added to the graph space
• Graph Aggregation
• When network activity crosses outside of department boundaries, the graphs
are passed up for further analysis
• A collection of hosts belonging to the same department can be reduced to
single node representing the whole department
• In reduction, graph attributes are kept but some sub-graph topology may be
lost.
Horizontal Scan Detection: Threshold Random Walk
• Threshold Random Walk (TRW) algorithm observes whether each connection attempt from a remote source succeeds or fails and decides whether the source is a scanner (see the sketch after this list)
• A successful connection drives the Random Walk upwards
• A failed connection drives the Random Walk downwards
• Benign remote sources have more precise knowledge about the targeted hosts
• Their successful connection rate is higher than that of scanners.
• Tracks successful and failed connection attempts to:
• new addresses
• old ports at new addresses
• old ports at old addresses
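A minimal sketch of the sequential hypothesis test underlying Threshold Random Walk (the success probabilities and thresholds are illustrative assumptions):

def trw_update(likelihood_ratio, success,
               theta_benign=0.8, theta_scanner=0.2,
               upper=100.0, lower=0.01):
    """Update the running likelihood ratio for one observed connection attempt.

    theta_benign / theta_scanner are assumed success probabilities for benign
    hosts and scanners; crossing `upper` flags the source as a scanner,
    crossing `lower` declares it benign.
    """
    if success:
        likelihood_ratio *= theta_scanner / theta_benign              # successes lower scanner evidence
    else:
        likelihood_ratio *= (1 - theta_scanner) / (1 - theta_benign)  # failures raise scanner evidence
    if likelihood_ratio >= upper:
        return likelihood_ratio, "scanner"
    if likelihood_ratio <= lower:
        return likelihood_ratio, "benign"
    return likelihood_ratio, "undecided"

lr, verdict = 1.0, "undecided"
for outcome in [False, False, False, False, False]:   # five failed connection attempts
    lr, verdict = trw_update(lr, outcome)
print(lr, verdict)   # enough failures push the source over the scanner threshold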
Scan Detection: Expert-Knowledge Rule-Based Data-Mining Method
• Features of a destination IP and port accessed by source IP (4):
• Average number of distinct destination IPs
• Average number of destination ports on destination IPs
• Features of source IPs describing the behavior of the source IP (6)
• Ratio of distinct destination IPs that the source IP attempted to connect to which did not provide any service on the destination ports to any source.
• Features of individual destination ports (4)
• Ratio of distinct destination IPs that the source IP attempted to connect to which did not provide any service on the destination ports to any source.
• A rule-based learning classification algorithm RIPPER is used for classification.
• RIPPER is efficient and effective in dealing with imbalanced and nonlinear data.
• Model performed better than TRW with faster speed of detection.
Scan Detection: RIPPER Algorithm
• RIPPER (Repeated Incremental Pruning to Produce Error Reduction) is a Rule-
based classification algorithm.
• RIPPER derives a set of rules from the training set.
• Works well on datasets with imbalanced class distributions.
• Works well with noisy datasets as it uses a validation set to prevent model
overfitting.
• RIPPER principle:
• Among the records given, it identifies the majority class (which appears the
most)
• Takes this class as the default class.
• If there are 100 records and 80 belong to Class A and 20 to Class B, then Class A will be the default class.
• For the other class, it tries to learn/derive various rules to detect that class.
• Consider all the classes that are available and then arrange them on the basis of
their frequency in a particular order (say increasing).
• C1,C2,C3,......,Cn
• C1 - least frequent
• Cn - most frequent
• The class with the maximum frequency (Cn) is taken as the default class.
• Rule Derivation:
• In the first instance, it derives rules for those records which belong to class C1.
• Records belonging to C1 will be considered as positive examples (+ve) and
other classes will be considered as negative examples (-ve).
• Sequential Covering Algorithm is used to generate the rules that discriminate
between +ve and -ve examples.
• Next, derive rules for C2 distinguishing it from the other classes.
• This process is repeated until left with Cn (default class).
• RIPPER extracts rules from the minority class to the majority class.
• Rule Growing in RIPPER Algorithm:
• RIPPER uses a general-to-specific strategy for growing rules.
• It starts from an empty rule and keeps adding the best conjunct to the rule antecedent.
• Conjuncts are evaluated using FOIL’s information gain, and the best conjunct is chosen.
• Stopping criterion for adding conjuncts: when the rule starts covering negative (-ve) examples.
• The new rule is pruned based on its performance on the validation set.
• To identify whether a particular rule should be pruned or not, following metric is
used:
• (P-N)/(P+N)
• P = number of positive examples in the validation set covered by the rule.
• N = number of negative examples in the validation set covered by the rule.
• Whenever a conjunct is added or removed, the value of the metric is calculated for
original rule (before adding/removing) and new rule (after adding/removing).
• If the value of the new rule is better than the original rule then add/remove the
conjunct else the conjunct will not be added/removed.
• Consider a rule:
• ABCD ---> Y ,where A,B,C,D are conjuncts and Y is the class.
• First it will remove the conjunct D and measure the metric value.
• If the quality of the metric is improved the conjunct D is removed.
• If the quality does not improve then the pruning is checked for CD, BCD and
so on.
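A minimal sketch of this pruning decision using the (P-N)/(P+N) metric on a small, made-up validation set (the rule representation as attribute-value pairs is an illustrative assumption):

def prune_metric(rule, validation_set):
    """(P - N) / (P + N) over the validation examples covered by the rule."""
    covered = [label for features, label in validation_set
               if all(features.get(attr) == val for attr, val in rule)]
    p = sum(1 for label in covered if label == "+")
    n = len(covered) - p
    return (p - n) / (p + n) if covered else float("-inf")

# Rule A,B,C,D -> Y, with each conjunct as an (attribute, value) pair
rule = [("A", 1), ("B", 1), ("C", 1), ("D", 1)]
validation = [({"A": 1, "B": 1, "C": 1, "D": 1}, "+"),
              ({"A": 1, "B": 1, "C": 1, "D": 0}, "+"),
              ({"A": 1, "B": 1, "C": 1, "D": 1}, "-")]

# Try pruning the last conjunct (D): keep the shorter rule only if the metric improves
full, pruned = prune_metric(rule, validation), prune_metric(rule[:-1], validation)
print(full, pruned, "prune D" if pruned > full else "keep D")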
Horizontal and Vertical Scan Detection on Large Networks: Logistic Regression
• Uses traffic traces grouped into events according to destination IP and ports.
• Six features for each event are used for analysis:
• percentage of traces that appear to have a payload
• percentage of flows with fewer than three packets
• ratio of flag combinations with an ACK flag set to all flows
• average number of source ports per destination IP address
• ratio of the number of unique destination IP addresses to the number of
traces
• ratio of traces with a backscatter-related flag combination such as SYN-ACK
to all traces
• Logistic regression model calculates the probability of an event containing a scan
using the above 6 features
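A minimal scikit-learn sketch of such a logistic regression model over the six event features (the feature values below are synthetic placeholders):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Six features per event: payload %, short-flow %, ACK-flag ratio, avg source ports per dst IP,
# unique-dst-IP/trace ratio, backscatter-flag ratio (synthetic example values)
X = np.array([[0.9, 0.1, 0.8, 1.2, 0.05, 0.0],    # looks like normal traffic
              [0.0, 0.95, 0.05, 1.0, 0.9, 0.0],   # looks like a horizontal scan
              [0.8, 0.2, 0.7, 1.1, 0.1, 0.0],
              [0.05, 0.9, 0.1, 1.0, 0.85, 0.02]])
y = np.array([0, 1, 0, 1])                         # 1 = event contains a scan

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[0.1, 0.9, 0.1, 1.0, 0.8, 0.0]])[:, 1])  # P(scan) for a new event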
Data Sets
Example: Practical Learning
• Problem Statement
• Build a network intrusion detector, a predictive model capable of
distinguishing between bad connections, called intrusions or attacks, and
good normal connections.
• Attacks Detected: Attacks categorised into four main categories:
• #DOS: denial-of-service, e.g. SYN Flood;
• #R2L: unauthorized access from a remote machine, e.g. Guessing password;
• #U2R: unauthorized access to local superuser (root) privileges, e.g. “buffer
overflow” attacks
• #probing: surveillance and other probing, e.g. port scanning.
• Dataset Used: KDD Cup 1999 dataset
• Reference
https://www.geeksforgeeks.org/intrusion-detection-system-using-machine-learning-
algorithms/
CS-13
Intrusion Detection – II
Boosting Algorithms
What is Boosting?
• Freund and Schapire developed Boosting in 1997.
• Boosting is an ensemble modelling technique suitable for binary classification
problems.
• Boosting algorithms improve the prediction power by converting a number of weak
learners to a strong learner.
• Basic principle of Boosting algorithms:
• build the first model on the training dataset
• build a second model to rectify the errors present in the first model.
• Continue the process till the errors are minimized, and the dataset is predicted
correctly.
Types of Boosting Algorithm
• There are 3 types of Boosting algorithms:
• GradientBoost
• Xtreme GradientBoost
• AdaBoost
• All Boosting algorithms work in a similar manner
• Combine multiple weak learners to reach the final output, i.e. a strong learner.
GradientBoost Algorithm
GradientBoost
• Builds a final model from the sum of several weak learning algorithms that are
trained on the same dataset.
• First weak learner is not trained on the dataset.
• First weak learner simply returns the mean of the relevant column.
• Residual for the first weak learner’s output is calculated and used as the output
column or target column for the next weak learner’s training.
• Second weak learner is trained using the same method, and the residual is
computed.
• Computed residual is utilized as an output column for the third weak learner.
• Process continues until the residual approaches zero or a preset number of weak learners is reached, as sketched below.
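A minimal sketch of this residual-fitting loop for regression, using decision stumps as weak learners (the learning rate and number of rounds are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 1.9, 3.2, 4.1, 4.8])

lr, n_rounds = 0.5, 20
prediction = np.full_like(y, y.mean())          # first "weak learner": the mean of the target column
learners = []

for _ in range(n_rounds):
    residual = y - prediction                   # errors of the current ensemble
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    learners.append(stump)
    prediction += lr * stump.predict(X)         # next learner corrects the previous errors

print(np.round(prediction, 2), "remaining residual:", round(abs(y - prediction).sum(), 3))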
eXtreme GradientBoost (XGBoost)
• An extreme variation of the Gradient boosting algorithm.
• Key difference between XGBoost and Gradient Boosting is that XGBoost applies a
regularisation approach.
• Regularisation enables XGBoost to outperform a standard Gradient Boosting
algorithm.
• Works faster
• Has better accuracy
• Works better when the dataset contains both numerical and categorical
variables.
AdaBoost Algorithm
AdaBoost (Adaptive Boosting)
• Works on stagewise addition method where multiple weak learners are used to
create a strong learner.
• Creates binary decision-tree stumps as weak learners
• Influence of a stump on the final classification is known as the alpha parameter
• Value of the alpha parameter is inversely related to the error of the weak learner
• For Gradient Boosting and XGBoost, the alpha parameter calculated is related to the
errors of the weak learner.
• AdaBoost is a supervised learning algorithm
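A minimal sketch of how a stump's alpha is commonly computed from its weighted error, and how sample weights are then updated (a standard AdaBoost formulation, given here for illustration):

import numpy as np

def adaboost_alpha(weights, y_true, y_pred):
    """Alpha for one weak learner: large when its weighted error is small."""
    err = np.sum(weights * (y_true != y_pred)) / np.sum(weights)
    alpha = 0.5 * np.log((1 - err) / err)
    # Misclassified samples get up-weighted for the next weak learner (labels in {-1, +1})
    new_w = weights * np.exp(-alpha * y_true * y_pred)
    return alpha, new_w / new_w.sum()

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])            # one mistake out of five
alpha, new_weights = adaboost_alpha(np.full(5, 0.2), y_true, y_pred)
print(round(alpha, 3), np.round(new_weights, 3))  # misclassified sample gets the largest weight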
01: Auto-Encoder
• Massive growth of network traffic data leads to a large volume of datasets.
• Labelling these datasets to identify intrusion attacks is laborious and error-prone.
• Traditional unsupervised solutions do not consider spatio-temporal correlations in
traffic data.
• A unified Autoencoder based on combining multi-scale convolutional neural
network and long short-term memory (MSCNN-LSTM-AE) for anomaly detection is
an effective solution.
• Model employs Multiscale Convolutional Neural Network Autoencoder (MSCNN-AE)
to analyze the spatial features of the dataset
• Latent-space features learned by MSCNN-AE are fed to a Long Short-Term Memory (LSTM) based autoencoder network to process the temporal features.
• Model further employs two Isolation Forest algorithms as error correction
mechanisms to detect false positives and false negatives to improve detection
accuracy.
• Model was trained using NSL-KDD, UNSW-NB15, and CICDDoS2019 dataset and
outperformed the conventional unsupervised methods.
• To make the model efficient in terms of computation and time, only the encoder part of the AE is utilized, making it work in a non-symmetric fashion.
• Two non-symmetric AutoEncoders, with three hidden layers each, are arranged in a
stacked manner.
• Random Forest is used for classification.
• Experiments were performed for multiclass classification scenarios using KDD Cup
'99 and NSL-KDD datasets.
• Model showed efficiency compared to Deep Belief Network (DBN) used in terms of
detection accuracy and reduced training time.
• Model showed inefficiency for detecting R2L and U2R attacks due to lack of data for
training the model.
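A minimal Keras sketch of autoencoder-based anomaly detection via reconstruction error (a plain dense autoencoder on generic features, not the MSCNN-LSTM-AE described above; the feature count and threshold rule are assumptions):

import numpy as np
from tensorflow import keras

n_features = 20
ae = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(8, activation="relu"),     # encoder (compressed latent space)
    keras.layers.Dense(n_features)                # decoder reconstructs the input
])
ae.compile(optimizer="adam", loss="mse")

X_normal = np.random.rand(1000, n_features)       # placeholder for normal traffic features
ae.fit(X_normal, X_normal, epochs=5, verbose=0)   # train to reconstruct normal traffic only

train_err = np.mean((X_normal - ae.predict(X_normal, verbose=0)) ** 2, axis=1)
threshold = train_err.mean() + 3 * train_err.std()

X_test = np.random.rand(10, n_features)
test_err = np.mean((X_test - ae.predict(X_test, verbose=0)) ** 2, axis=1)
print(test_err > threshold)                       # True = poorly reconstructed, flagged anomalous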
02: Auto-Encoder
• An IDS using Stacked Sparse AutoEncoder (SSAE) and SVM.
• SSAE was used as the feature extraction method and SVM as a classifier.
• Binary-class and multi-class classification problem is considered for conducting
experiments.
• Results showed the proposed model's superior performance compared with different feature selection, ML, and DL methods using the NSL-KDD dataset.
• Model achieves reasonable detection rates for U2R and R2L attacks, but these are still lower than for the other classes of the dataset.
03: Auto-Encoder
• An efficient two-stage model based on Stacked AutoEncoder.
• Initial stage classified the dataset into the attack and normal classes with probability
values.
• Probability scores are used as an additional feature and are input to the final
decision stage for normal and multiclass attack classification.
• Performance of the proposed model was tested using KDD Cup'99 and UNSWNB15
datasets.
• A different methodology was adopted for both datasets to reduce the problems due
to class imbalance of the datasets.
• Down sampling was performed to remove repeated records from KDD Cup’99
• Up sampling of the dataset was performed using SMOTE to balance the distribution
of records in UNSWNB15.
• Pre-processing of the dataset significantly improves the DR efficiency of attack class
with lower training instances.
04: Auto-Encoder
• Uses AE in a multistage model involving a 1D convolution layer and two stacked fully connected layers.
• In the initial unsupervised stage, two AEs were trained separately using Normal and
Attack flows to reconstruct the samples again.
• In the supervised stage, these new reconstructed samples are used to build a new
augmented dataset that is used as input to a 1D-CNN.
• Output of this convolution layer is flattened and fed to fully connected layers, and
lastly, a softmax layer classifies the dataset.
• Experiments performed on the KDD Cup'99, UNSWNB15, and CICIDS2017 datasets
and the model achieves superior performance compared to other DL models.
• Drawbacks are:
• Does not show how the minority classes perform using this model
• Does not provide any information on the characteristics of the attack
01: Recurrent Neural Networks
• RNN-based IDS designed for binary and multi class classification of the NSL- KDD
dataset.
• Model was tested using a different number of hidden nodes and learning rates.
• Results showed that different learning rates and the number of hidden nodes affect
the accuracy of the model.
• Best accuracy was obtained using 80 hidden nodes and a learning rate of
0.1 and 0.5 for binary and multi class scenarios.
• Model performed well compared to ML algorithms and a reduced-sized RNN model.
• Main shortcoming of this model is the increase in computational processing which
results in high model training time and lower detection rate for the R2L and U2R
classes.
02: Recurrent Neural Networks
• An IDS based on RNN using GRU as the main memory together with the multilayer
perceptron and a softmax classifier.
• Model was tested using KDD Cup'99 and NSL-KDD datasets.
• Experimental results showed good detection rates compared with other methodologies.
• Major drawback of their model is lower detection rates for minority attack classes
like U2R and R2L.
• NSL-KDD is considered the benchmark dataset, and the experimental results showed that LSTM and Deep CNN achieved higher accuracy compared with other models.
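A minimal Keras sketch of such a GRU-plus-MLP classifier with a softmax output (layer sizes are illustrative; 41 features and 5 classes reflect the NSL-KDD setup of normal traffic plus four attack categories):

from tensorflow import keras

n_features, n_classes = 41, 5
model = keras.Sequential([
    keras.layers.Input(shape=(1, n_features)),              # each record treated as a length-1 sequence
    keras.layers.GRU(64),                                   # GRU as the main memory unit
    keras.layers.Dense(32, activation="relu"),              # multilayer perceptron
    keras.layers.Dense(n_classes, activation="softmax"),    # softmax classifier
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()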
NTA Mechanisms
• Self-Similarity:
• Use of Industrial Access Control & Security Systems for the analysis of communication and the discovery of attacks.
• Wireless Sensor Networks (WSN):
• Can be used in large systems such as commercial applications, where
security is vital for their applicability.
• Classifies attacks in wireless sensor networks to explore patterns and
possible countermeasures.
• Flow Analysis:
• Flow analysis is used to identify anonymity networks.
• High accuracy in identifying encrypted anonymity networks.
• Three major usages:
• Identification of anonymous networks
• Determination of network traffic within encrypted connections
• Profiling of applications
• User Intention-Based Traffic Dependence Analysis:
• Uses algorithms and frameworks that analyse user actions and network
events on a host according to their credentials.
• Can detect relationships, identify anomalies, and conduct empirical
assessments of the accuracy, security, and efficiency of algorithms.
• Traffic Anomalies Detection Algorithms:
• Detects flow outliers using statistical, similarity and pattern mining
approaches.
• Derives trajectory outliers using both offline and online processing.
• Intrusion Detection System:
• Software applications or devices that observe a system or network for malicious activity.
NTA Solutions
• NTA can be implemented using both traditional algorithms and machine learning
solutions.
• Machine Learning algorithms help in time series and general behaviour analysis of
the network.
• Key functions of NTA are:
• Provides analytics services
• Monitors IoT devices that generate and send a lot of data across the network
• Troubleshoots different security issues
• Enhances end-to-end cloud visibility
NTA Benefits
• Monitor resource utilization and helps manage resources accordingly.
• Provide insights into network operations (uptime, downtime, load etc).
• Account for all entities/devices attached to a network
• Identify and record the relationships between users, devices, and actions on the
network
• Identify underutilized resources which can be decommissioned to save cost.
• Notifies the network team about observed anomalies helping resolution and
downtime avoidance
• One level up security layer on top of intrusion detection systems and intrusion
prevention systems
• Machine learning algorithms for NTA can detect security threats even if they’re
encrypted.
Network Metadata Gathering Techniques
• Flow data (NetFlow, IPFIX, sFlow)
• OSI Layer 2-4 telemetry, such as source, destination, protocol, bytes
sent/received.
• Good start in understanding the basic trends of network traffic.
• Not enough for advanced cyber threat detection within application-layer
context.
• Network Packet Capture Files (PCAPs):
• PCAPs are detailed historical record of what happened on the network
• Requires high storage and data processing requirements.
• Traffic Inspection Technologies
• Extract meaningful Layer 3-7 metadata with an emphasis on Layer 7
application communications.
• Metadata can be used effectively for behavior cyber threat detection, while
only taking a fraction of the full PCAP volume.
• Information structured as time-series events corresponding to network
conversations:
• TCP/UDP/ICMP connections
• HTTP requests and replies
• DHCP leases
• SNMP messages
• SSH connections
• … and others.
Key Network Metadata Items
• Host and server IP address, port number, geo-location information
• DNS and DHCP information mapping devices to IP addresses
• Web page accesses, URL and header information
• Users to systems mapping using Domain Controller log data
• Encrypted web pages
• Encryption type
• Cypher and Hash
• Client/server FQDN (fully qualified domain names)
• Hashes of different objects, such as JavaScript and images
Encrypted Traffic Analysis
• Encrypted Traffic Analysis collects network traffic metadata in a designated format
(IPFIX) using passive probes
• Internet Protocol Flow Information Export (IPFIX) is an accounting
technology that monitors traffic flows through a switch or router.
• IPFIX interprets the traffic to determine the client, server, protocol, and port
that is used.
• IPFIX counts the number of bytes and packets, and sends that data to an
IPFIX collector
• Attributes of the encrypted session between clients and servers are available
regardless of the client’s physical location or whether the server runs in the
cloud or dedicated data center.
• Provides insights about the traffic and allows for the identification of:
• out-of-date SSL certificates
• policy non-compliant certificates
• encryption strength
• old TLS versions that may contain faults or vulnerabilities.
• Machine Learning engine uses this data to perform behaviour analysis and anomaly
detection to identify malware and other threats.
Network Traffic Profiling
• Network traffic profiling detects malicious traffic patterns that might otherwise be
misclassified as benign, such as communications with legitimate sites used as part of
a command-and-control mechanism
• Profiling network traffic is similar to scan characterization.
• A scan, or portscan, is a malicious behaviour in network traffic, and its characterization, including clustering and visualization, can help network administrators detect scan attacks.
• Profiling uses clustering algorithms or other data-mining and machine
learning methods to group similar network connections and search for
dominant behaviours/events.
• Profiling vs. Anomaly Detection:
• Anomaly detection aims to group similar normal data and build a normal
model so that we can identify outliers.
• Profiling focuses on grouping similar network behaviours and finding the
trends that these behaviours follow.
Network Traffic Profiling Categories
• Network profiling can be categorised in two groups:
• Specific applications:
• Require access to a system capable of capturing interactions between hosts
through empirical signatures or statistical analysis.
• Examples: gaming, chatting, P2P, and suspicious traffic in FTP, HTTP, and SMTP.
• Profiling common network behaviours
• Behaviours include communications between hosts and performance of the
hosts.
• Communication between hosts can be patterned using entropy, traffic
volume, feature distributions, and so on.
• Host performances appear in their port utilization to provide service or other
interactions.
• Host IP addresses and the associated port numbers are used for profiling, to
investigate the traffic flows.
Network Traffic Profiling Challenges
• Two major challenges in network profiling:
• huge amount of network traffic flows
• difficulties in detecting patterns in the traffic data
• There could be a large number of association rules to describe the correlation
between traffic flows
• Large number of rules can hamper profiling analysis and pattern recognition.
• Clustering methods along with data-mining techniques are needed to extract the dominant patterns efficiently and effectively.
• Visualization ability can strengthen the role of network traffic profiling.
Network Traffic Profiling Data Collection
• Network traffic data can be collected online or offline.
• Offline profiling is sufficient for some applications, such as traffic classification at
the application level using graphlets.
• In data pre-processing, features are selected according to a profiling objective or
analysis afterward.
• A network profiling algorithm can be:
• Signature-based classification
• Data-mining or Machine-Learning clustering method
• IP blacklist filtering.
• Supervised Machine-Learning and clustering methods are used in the network
traffic profiling or pattern learning process.
ML Algorithms for Network Traffic Profiling
• NETMINE (Association Rules Mining and Classification)
• Auto Focus (Cluster Miner)
• Shared Nearest Neighbour (SNN)
• Auto Class
==================================================================
Poisoning Attack
A poisoning attack happens when an attacker deliberately contaminates the training data
of a machine learning model. This causes the model to make more mistakes when it
processes new data.
Key Points:
What happens? The attacker adds bad data into the training set. This tricks the
model into learning incorrect patterns.
Why is this dangerous? Models trained on user-generated data (like social media or
recommendation systems) are easy to target because anyone can contribute to the
data.
Example:
o Attackers created fake accounts on social media to spread misinformation,
biasing algorithms that decide what content users see.
o In a backdoor attack, a small "trigger" is added to inputs, like a subtle mark in
an image. When the model sees this trigger, it behaves in a specific way
designed by the attacker.
Microsoft’s chatbot Tay was trained to learn conversations from Twitter users. Trolls started
feeding offensive language to Tay. Since the model didn't have proper filters, it quickly started
mimicking harmful content and had to be shut down within 16 hours.
Evasion Attack
An evasion attack targets a trained model by manipulating the input data. Instead of
messing with the training phase, attackers craft their input to trick the model into giving
incorrect results.
Key Points:
What happens? The attacker slightly modifies input data to fool the model into
classifying it incorrectly.
Why is this dangerous? It bypasses security measures like spam filters or malware
detectors.
Examples:
o Spammers hide the spam content inside images to bypass text-based filters.
o Attackers spoof biometric systems by creating fake fingerprints or face scans.
An attacker uses trial and error to find weaknesses in the model. For example, they might
tweak spam emails by adding harmless words that confuse the filter and allow the email to
pass through.
Model Extraction (Stealing) Attack
An attacker repeatedly queries a deployed model and uses the responses to build a functionally similar copy of it.
Key Points:
What happens? The attacker sends many queries to the system, analyzes the output,
and builds a similar model.
Why is this dangerous? The stolen model can:
o Be used by competitors.
o Reveal private or sensitive training data.
Example:
An attacker probes a proprietary stock trading algorithm and uses the stolen model to make
profitable trades, taking advantage of the original owner's intellectual property.
These attacks show how machine learning systems can be exploited at various stages. By
understanding these techniques, we can design better defenses to secure AI and machine
learning models.
======================================================================
Adversarial Examples Generation Methods
• L-BFGS (Limited Memory Broyden–Fletcher–Goldfarb–Shanno)
• FGSM (Fast Gradient Sign Method)
• Iterative Fast Gradient Sign Method
• Black Box Attack Method
• Jacobian-based Saliency Map Attack (JSMA)
• Deep Fool Attack
• Carlini & Wagner Attack
• Generative Adversarial Networks (GANs)
• Zeroth-Order Optimisation Attack (ZOO)
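As one concrete example from this list, a minimal PyTorch sketch of FGSM (the perturbation budget eps and the classifier are placeholders):

import torch

def fgsm(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: perturb x by eps in the direction of the loss gradient's sign."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

# Usage (hypothetical classifier): x is a batch of images in [0, 1], y the true labels
# x_adv = fgsm(classifier, x, y, eps=0.03)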
Black-Box Attack Method
• Adversarial examples transfer well between different models
• An Adversarial example can be designed for a model X but will be effective
against any other model trained on a similar dataset.
• Attackers use the Transferability property of adversarial examples when
they do not have access to complete information about the model.
• Attacker generates adversarial examples using following steps:
• Query the target model with input Xi for i=1…n and record output Yi.
• With the training dataset (Xi, Yi), build another model (substitute model).
• Use a white-box algorithm to generate adversarial examples for the substitute model.
• Most of these Adversarial examples are going to transfer successfully and
become adversarial examples for the target model as well.
• Ref: http://openaccess.thecvf.com/content_ECCV_2018/papers/Arjun_Nitin_Bhagoji_Practical_Black-box_Attacks_ECCV_2018_paper.pdf
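A minimal sketch of the substitute-model steps above (the "target" here is a stand-in local model; in practice it would be a remote black-box prediction API):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Placeholder black-box target; only its predictions are observable to the attacker
_target = LogisticRegression().fit(np.random.rand(200, 4), np.random.randint(0, 2, 200))
def query_target(x):
    return int(_target.predict(x.reshape(1, -1))[0])

# Steps 1-2: label attacker-chosen inputs with the target's outputs, then train a local substitute
X_query = np.random.rand(500, 4)
y_query = np.array([query_target(x) for x in X_query])
substitute = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_query, y_query)

# Step 3: run a white-box attack (e.g. the FGSM sketch above) against `substitute`;
# many of the resulting adversarial examples transfer to the target model as well.
print(substitute.score(X_query, y_query))   # substitute mimics the target on the queried points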
The following illustrates the concept of Adversarial Machine Learning and how it can be used to attack machine learning models.
Adversarial Machine Learning is a type of attack where the goal is to trick a machine learning
model into making incorrect predictions. This is done by adding small, carefully crafted
perturbations to the input data that are designed to be imperceptible to humans but significant
enough to mislead the model.
Here's how the attack breaks down:
Traditional Machine Learning (Training Phase)
Training Data: The model is trained on a dataset of labeled examples (input data and
their corresponding correct labels).
Deep Learning Training: The model learns to extract features from the input data and
make predictions based on those features.
Predictive Model: The trained model is able to make accurate predictions on unseen
data.
Adversarial Attack Phase
Input Data: An attacker takes an original input image.
Noise: The attacker adds carefully crafted noise to the image.
Perturbed Data: The resulting image with added noise is called the "perturbed data."
Evading: The perturbed data is fed into the model, and the model is fooled into making
an incorrect prediction.
Falsified Labels: The attacker can then use the model's incorrect predictions to achieve
their malicious goals.
Why is this dangerous?
Adversarial attacks can have serious consequences, especially in safety-critical applications like
autonomous vehicles or medical diagnosis. For example, an attacker could trick a self-driving
car into misidentifying a stop sign as a speed limit sign, potentially leading to a serious accident.
How can we defend against these attacks?
There are several techniques to defend against adversarial attacks, including:
Adversarial training: Training the model on both clean and adversarial examples to
make it more robust.
Input transformations: Applying transformations to the input data to make it more
resistant to perturbations.
Feature squeezing: Reducing the dimensionality of the input data to make it harder for
attackers to craft effective perturbations.
Detection methods: Developing methods to detect adversarial examples before they are
fed to the model.
It is important to note that this is an ongoing area of research, and new attack and defense
techniques are constantly being developed.
==================================================================
The following illustrates a strategy for protecting machine learning models against adversarial attacks. Here's a breakdown of the approach:
Adversarial Attack:
1. Input Data: An attacker starts with an original input image.
2. Noise: The attacker adds carefully crafted noise to the image, designed to be
imperceptible to humans but significant enough to mislead the model.
3. Perturbed Data: The resulting image with added noise is called the "perturbed data."
4. Evading: The perturbed data is fed into the model, and the model is fooled into
making an incorrect prediction.
5. Falsified Labels: The attacker can then use the model's incorrect predictions to
achieve their malicious goals.
Protection Against Adversarial Attacks:
This strategy follows a three-step approach to enhance the model's resilience against adversarial attacks:
1. Data Enhancing:
Training Data: The model is trained on a dataset of labeled examples (input data and
their corresponding correct labels).
Perturbed Data: The training data is augmented by adding carefully crafted noise
to create "perturbed data." This helps the model learn to recognize and handle
adversarial examples during training.
2. Algorithm Training:
Deep Learning Training: The model is trained on the enhanced dataset, which
includes both original and perturbed data. This helps the model learn to extract
features from the input data and make predictions based on those features, even in the
presence of adversarial noise.
3. Classification:
Predictive Model: The trained model is able to make accurate predictions on unseen
data, even if it contains adversarial noise.
Benefits of this approach:
Enhanced Robustness: By training the model on both clean and adversarial
examples, the model becomes more robust to attacks.
Improved Accuracy: The model is better equipped to handle real-world data, which
may contain some level of noise or perturbations.
Increased Security: This approach helps to protect the model from being exploited by
malicious attackers.
Additional Considerations:
Ongoing Research: Adversarial machine learning is an active area of research, and
new attack and defense techniques are constantly being developed.
Multiple Layers of Defense: It is important to implement a multi-layered defense
strategy, combining different techniques to achieve optimal protection.
By incorporating these defense mechanisms, machine learning models can be made more
resilient to adversarial attacks, ensuring their reliable and secure operation in various
applications.
===================================================================
Defensive Measures for Adversarial Attacks
• Threat Modelling: Formalize the attackers' goals and capabilities with respect to
the target system.
• Attack Simulation: Formalize the optimization problem the attacker tries to solve
according to possible attack strategies.
• Attack impact evaluation
• Countermeasure design
• Noise Detection: For evasion-based attacks
• Information Laundering: Alter the information received by adversaries (for model
stealing attacks)
Defensive Distillation
• Generates a new model whose gradients are much smaller than the original
undefended model.
• If gradients are very small, techniques like FGSM or Iterative FGSM are not useful,
as the attacker would need great distortions of the input image to achieve a
sufficient change in the loss function.
• Defensive distillation introduces a new parameter T, called temperature, to the last
softmax layer of the network:
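A commonly used formulation of the temperature softmax (as given in the defensive distillation literature) is:

F_i(X) = exp(z_i(X) / T) / sum_j exp(z_j(X) / T)

where z_i(X) are the logits of the last layer. Training at a high temperature T produces softer (smoother) probability outputs, and the distilled network then exhibits much smaller input gradients when deployed at T = 1, which is what blunts gradient-based attacks such as FGSM.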