Information Security



Design Principles for Trusted Operating Systems

• Good design principles are always good for security, but several important design principles are particular to security and essential for building a solid, trusted operating system: least privilege, economy of mechanism, open design, complete mediation, permission based, separation of privilege, least common mechanism, and ease of use.
• These design principles led to the development of "trusted" computer systems or "trusted" operating systems.

Support Vector Machines

Introduction
Many learning models make use of the idea that any learning problem can be made easy with the right set of features. The trick, of course, is discovering that "right set of features", which in general is a very difficult thing to do. SVMs are another attempt at a model that does this. The idea behind SVMs is to make use of a (nonlinear) mapping function Φ that transforms data in input space to data in feature space in such a way as to render a problem linearly separable. The SVM then automatically discovers the optimal separating hyperplane (which, when mapped back into input space via Φ⁻¹, can be a complex decision surface). SVMs are rather interesting in that they enjoy both a sound theoretical basis and state-of-the-art success in real-world applications. To illustrate the basic ideas, we will begin with a linear SVM (that is, a model that assumes the data is linearly separable). We will then expand the example to the nonlinear case to demonstrate the role of the mapping function Φ.

Figure 1: Sample data points in ℝ². Blue diamonds are positive examples and red squares are negative examples.

We would like to discover a simple SVM that accurately discriminates the two classes. Since the data is linearly separable, we can use a linear SVM (that is, one whose mapping function Φ() is the identity function). By inspection, it should be obvious that there are three support vectors (see Figure 2).

In what follows we will use vectors augmented with a 1 as a bias input, and for clarity we will differentiate these with an over-tilde. So, if s1 = (1, 0), then s̃1 = (1, 0, 1). Figure 3 shows the SVM architecture, and our task is to find values for the αi such that

α1 Φ(s̃1)·Φ(s̃j) + α2 Φ(s̃2)·Φ(s̃j) + α3 Φ(s̃3)·Φ(s̃j) = yj for each support vector s̃j (with yj = +1 for positive and −1 for negative examples).

Since for now we have let Φ() = I, this reduces to

α1 s̃1·s̃j + α2 s̃2·s̃j + α3 s̃3·s̃j = yj.
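To make this concrete, here is a minimal numeric sketch in Python that solves the reduced system for the αi. Only s1 = (1, 0) is implied by the text above, so the positions and labels of the other two support vectors are assumptions made purely for illustration (Figure 2 is not reproduced here).

import numpy as np

# A minimal sketch of the linear-SVM example above. The three support
# vectors below are assumed for illustration (only s1 = (1, 0) is implied
# by the text; s2 and s3 are hypothetical placements).
s = np.array([[1.0, 0.0],    # s1, negative example (y = -1)
              [3.0, 1.0],    # s2, positive example (y = +1)
              [3.0, -1.0]])  # s3, positive example (y = +1)
y = np.array([-1.0, 1.0, 1.0])

# Augment each support vector with a constant 1 as the bias input: s~i.
s_tilde = np.hstack([s, np.ones((3, 1))])

# With Phi() = I, solve  sum_i alpha_i (s~i . s~j) = y_j  for the alphas.
G = s_tilde @ s_tilde.T          # Gram matrix of dot products
alpha = np.linalg.solve(G, y)

# The separating hyperplane is w~ = sum_i alpha_i s~i; its last component
# is the bias b, and classification is sign(w . x + b).
w_tilde = alpha @ s_tilde
w, b = w_tilde[:2], w_tilde[2]
print("alpha:", alpha, "w:", w, "b:", b)
print("class of (4, 2):", np.sign(w @ np.array([4.0, 2.0]) + b))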
Logistic Regression

Introduction
• Logistic regression is used for binary classification, where we use the sigmoid function, which takes the independent variables as input and produces a probability value between 0 and 1.
• For example, suppose we have two classes, Class 0 and Class 1. If the value of the logistic function for an input is greater than 0.5 (the threshold value), then it belongs to Class 1; otherwise it belongs to Class 0.
• Logistic regression predicts the output of a categorical dependent variable.
• The output can be Yes or No, 0 or 1, True or False, etc. The model gives probabilistic values which lie between 0 and 1.
• In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).

Logistic Regression Equation
• The log odds of the positive class are a linear function of the inputs: log(p / (1 − p)) = b0 + b1x1 + … + bnxn, which is equivalent to p = 1 / (1 + e^−(b0 + b1x1 + … + bnxn)).

Types of Logistic Regression
• Binomial: There can be only two possible types of the dependent variable, such as 0 or 1.
• Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
• Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".

Terminologies
• Odds: the ratio of something occurring to something not occurring.
• Log-odds: the log-odds, also known as the logit function, is the natural logarithm of the odds. In logistic regression, the log odds of the dependent variable are modeled as a linear combination of the independent variables and the intercept.
• Coefficient: the logistic regression model's estimated parameters; they show how the independent and dependent variables relate to one another.
• Intercept: a constant term in the logistic regression model, which represents the log odds when all independent variables are equal to zero.
• Maximum likelihood estimation: the method used to estimate the coefficients of the logistic regression model; it maximizes the likelihood of observing the data given the model.

Loss Function
• Regression – squared loss
• Logistic Regression – Log Loss
• Log Loss = Σ over (x, y) ∈ D of −y·log(y′) − (1 − y)·log(1 − y′), where:
• (x, y) ∈ D is the data set containing many labeled examples, which are (x, y) pairs.
• y is the label in a labeled example. Since this is logistic regression, every value of y must be either 0 or 1.
• y′ is the predicted value (somewhere between 0 and 1), given the set of features in x.
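As a rough sketch of the ideas above, the following Python snippet applies the sigmoid to a single feature with made-up (not fitted) coefficients, thresholds the resulting probabilities at 0.5 to assign Class 0 or Class 1, and scores the predictions with Log Loss.

import numpy as np

def sigmoid(z):
    # squashes the log-odds z into a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, b0, b1):
    # log-odds (logit) are linear in x; the sigmoid converts them to p
    return sigmoid(b0 + b1 * x)

def log_loss(y, y_pred):
    # Log Loss = sum over examples of -y*log(y') - (1-y)*log(1-y')
    return np.sum(-y * np.log(y_pred) - (1 - y) * np.log(1 - y_pred))

x = np.array([0.5, 1.5, 2.5, 3.5])       # one independent variable
y = np.array([0, 0, 1, 1])               # binary labels
b0, b1 = -4.0, 2.0                       # illustrative (not fitted) coefficients

p = predict_proba(x, b0, b1)             # probabilities between 0 and 1
labels = (p > 0.5).astype(int)           # 0.5 threshold: Class 0 or Class 1
print("p:", np.round(p, 3), "classes:", labels, "log loss:", round(log_loss(y, p), 3))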

Evaluation Metrics

Confusion Matrix
• A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier.
• True Positive (TP): correctly predicting the positive class.
• True Negative (TN): correctly predicting the negative class.
• False Positive (FP): incorrectly predicting the positive class.
• False Negative (FN): incorrectly predicting the negative class.

Accuracy
• Accuracy = (TP + TN) / (TP + TN + FP + FN). It is a valid choice of evaluation metric for classification problems which are well balanced, i.e. not skewed and with no class imbalance.
• A good model is one which has high TP and TN rates and low FP and FN rates.
• If you have an imbalanced dataset to work with, it is better to use the confusion matrix as the evaluation criterion for your machine learning model.

Example
• Actual values = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
• Predicted values = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']

Confusion Matrix of Cat and Dog Classification (actual value vs predicted value, with "cat" as the positive class):

                 Predicted: cat   Predicted: dog
Actual: cat      TP = 6           FN = 1
Actual: dog      FP = 2           TN = 11

• True Positive (TP) = 6: the model predicted that an animal is a cat and it actually is.
• True Negative (TN) = 11: the model predicted that an animal is not a cat and it actually is not (it is a dog).
• False Positive (Type 1 Error) (FP) = 2: the model predicted that an animal is a cat but it actually is not (it is a dog).
• False Negative (Type 2 Error) (FN) = 1: the model predicted that an animal is not a cat but it actually is.

Precision
• Precision = TP / (TP + FP): out of all observations predicted as positive, the fraction that are actually positive.

Recall
• It is a measure of actual observations which are predicted correctly, i.e. how many observations of the positive class are actually predicted as positive. It is also known as Sensitivity: Recall = TP / (TP + FN).

F-measure / F1-Score
• The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall: F1 = 2 · Precision · Recall / (Precision + Recall).
• It is a measure of correctness that is achieved in true prediction. In simple words, it tells us how many predictions are actually positive out of all the total positives predicted.

Sensitivity & Specificity
• Sensitivity (True Positive Rate) = TP / (TP + FN); Specificity (True Negative Rate) = TN / (TN + FP).

Multi Class Classification

Macro average
• The average of a metric (precision, recall or F1-score) over all classes.
• Precision (macro avg) = (Precision of A + Precision of B + Precision of C) / 3.

Micro Average
• This metric is calculated by considering all the TP, TN, FP and FN values for each class, adding them up, and then using those sums to compute the metric's micro-average.
• micro avg (precision) = sum(TP) / (sum(TP) + sum(FP)).

Weighted average
• This is simply the average of the metric values for the individual classes, weighted by the support of each class.

Output from Python code: the sketch below shows how the confusion matrix and these per-class and averaged metrics can be computed.
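For instance, the following sketch reproduces the cat/dog example above, taking "cat" as the positive class; the last lines use scikit-learn's confusion_matrix and classification_report, one common way of producing this kind of per-class and averaged output in Python.

# Reproduce the cat/dog example above, treating "cat" as the positive class.
actual    = ['dog','cat','dog','cat','dog','dog','cat','dog','cat','dog',
             'dog','dog','dog','cat','dog','dog','cat','dog','dog','cat']
predicted = ['dog','dog','dog','cat','dog','dog','cat','cat','cat','cat',
             'dog','dog','dog','cat','dog','dog','cat','dog','dog','cat']

tp = sum(a == 'cat' and p == 'cat' for a, p in zip(actual, predicted))  # 6
tn = sum(a == 'dog' and p == 'dog' for a, p in zip(actual, predicted))  # 11
fp = sum(a == 'dog' and p == 'cat' for a, p in zip(actual, predicted))  # 2
fn = sum(a == 'cat' and p == 'dog' for a, p in zip(actual, predicted))  # 1

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)          # also called sensitivity
specificity = tn / (tn + fp)
f1          = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, specificity, f1)

# scikit-learn's classification_report prints the same per-class metrics
# plus the macro and weighted averages discussed above.
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(actual, predicted, labels=['cat', 'dog']))
print(classification_report(actual, predicted))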

ROC
• The ROC curve is a graphical representation of a model's ability to distinguish between classes. It plots the True Positive Rate (Sensitivity) against the False Positive Rate (1 − Specificity) for different classification thresholds.

K-means Clustering

Outline
• Introduction
• K-means Algorithm
• Example
• How K-means partitions?
• K-means Demo
• Relevant Issues
• Application: Cell Nuclei Detection
• Summary

Introduction
• Partitioning clustering approach:
  – a typical clustering analysis approach: iteratively partitioning the training data set to learn a partition of the given data space
  – learning a partition on a data set to produce several non-empty clusters (usually, the number of clusters is given in advance)
  – in principle, the optimal partition is achieved by minimising the sum of squared distances to the "representative object" in each cluster
• Given K, find a partition of K clusters that optimises the chosen partitioning criterion (cost function):
  o global optimum: exhaustively search all partitions
• The K-means algorithm: a heuristic method
  o K-means algorithm (MacQueen '67): each cluster is represented by the centre of the cluster, and the algorithm converges to stable centroids of clusters.
  o The K-means algorithm is the simplest partitioning method for clustering analysis and is widely used in data mining applications.

K-means Algorithm
Given the cluster number K, the K-means algorithm is carried out in three steps after initialisation:
Initialisation: set seed points (randomly)
1) Assign each object to the cluster of the nearest seed point, measured with a specific distance metric.
2) Compute new seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e. the mean point, of the cluster).
3) Go back to Step 1); stop when no more new assignments are made (i.e., membership in each cluster no longer changes).

Example
• Problem: Suppose we have 4 types of medicines and each has two attributes (pH and weight index). Our goal is to group these objects into K = 2 groups of medicine.

Medicine   Weight   pH-Index
A          1        1
B          2        1
C          4        3
D          5        4

• Step 1: Use initial seed points for partitioning (a runnable sketch of the full example follows below).
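A runnable sketch of this worked example is given below (Euclidean distance, K = 2). Taking A and B as the initial seed points is an assumption made for illustration, since the seeds used in the original figures are not shown in the text.

import numpy as np

# Minimal K-means on the medicine data above (Euclidean distance, K = 2).
points = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)  # A, B, C, D
names = ['A', 'B', 'C', 'D']
centroids = points[[0, 1]].copy()   # assumed seeds: A and B

while True:
    # Step 1: assign each object to the cluster of the nearest centroid
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    membership = dists.argmin(axis=1)
    # Step 2: recompute centroids as the mean point of each cluster
    new_centroids = np.array([points[membership == k].mean(axis=0) for k in range(2)])
    # Step 3: stop when the centroids (and hence the assignment) no longer change
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

for k in range(2):
    print(f"cluster {k}: {[n for n, m in zip(names, membership) if m == k]}, centroid {centroids[k]}")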

Example (continued)
• Step 2: Compute new centroids of the current partition. Knowing the members of each cluster, we compute the new centroid of each group based on these new memberships.
• Step 2 (continued): Renew membership based on the new centroids. Compute the distance of all objects to the new centroids and assign each object to the cluster of its nearest centroid.
• Step 3: Repeat the first two steps until convergence. Compute the distance of all objects to the new centroids and renew the memberships; stop when there is no new assignment, i.e. membership in each cluster no longer changes.

How K-means partitions?
• When K centroids are set/fixed, they partition the whole data space into K mutually exclusive subspaces to form a partition.
• A partition amounts to a Voronoi diagram.
• Changing the positions of the centroids leads to a new partitioning.

Exercise
For the medicine data set below, use K-means with the Manhattan distance metric for clustering analysis, setting K = 2 and initialising the seeds as C1 = A and C2 = C (a runnable sketch of this exercise follows below). Answer the following three questions:
1. How many steps are required for convergence?
2. What are the memberships of the two clusters after convergence?
3. What are the centroids of the two clusters after convergence?

Medicine   Weight   pH-Index
A          1        1
B          2        1
C          4        3
D          5        4
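The following sketch runs the exercise as stated: Manhattan (L1) distance for the assignment step, K = 2, and seeds C1 = A, C2 = C, with centroids updated as the mean point of each cluster as in the algorithm above. It prints the number of assignment steps, the memberships and the centroids rather than asserting the answers.

import numpy as np

# K-means for the exercise: Manhattan distance, K = 2, seeds C1 = A, C2 = C.
points = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)  # A, B, C, D
names = ['A', 'B', 'C', 'D']
centroids = points[[0, 2]].copy()     # C1 = A, C2 = C
membership, steps = None, 0

while True:
    steps += 1
    # assignment step with Manhattan distance |x1 - c1| + |x2 - c2|
    dists = np.abs(points[:, None, :] - centroids[None, :, :]).sum(axis=2)
    new_membership = dists.argmin(axis=1)
    if membership is not None and np.array_equal(new_membership, membership):
        break                          # no new assignment: converged
    membership = new_membership
    # update step: centroid = mean point of the cluster
    centroids = np.array([points[membership == k].mean(axis=0) for k in range(2)])

print("assignment steps:", steps)
for k in range(2):
    print(f"C{k+1}: {[n for n, m in zip(names, membership) if m == k]}, centroid {centroids[k]}")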

Relevant Issues
• Efficient in computation:
  – O(tKn), where n is the number of objects, K is the number of clusters, and t is the number of iterations. Normally, K, t << n.
• Local optimum:
  – sensitive to initial seed points
  – converges to a local optimum: maybe an unwanted solution
• Other problems:
  – Need to specify K, the number of clusters, in advance.
  – Unable to handle noisy data and outliers (K-Medoids algorithm).
  – Not suitable for discovering clusters with non-convex shapes.
  – Applicable only when the mean is defined; what about categorical data? (K-Modes algorithm).
  – How to evaluate the K-means performance?

Application
• Colour-Based Image Segmentation Using K-means
• Step 1: Load a colour image of tissue stained with haematoxylin and eosin (H&E).
• Step 2: Convert the image from the RGB colour space to the L*a*b* colour space.
  – Unlike the RGB colour model, L*a*b* colour is designed to approximate human vision.
  – There is a complicated transformation between RGB and L*a*b*: (L*, a*, b*) = T(R, G, B) and (R, G, B) = T′(L*, a*, b*).
• Step 3: Undertake clustering analysis in the (a*, b*) colour space with the K-means algorithm.
  – In the L*a*b* colour space, each pixel has a feature vector of properties: (L*, a*, b*).
  – As in feature selection, the L* feature is discarded, so each pixel has the feature vector (a*, b*).
  – Apply the K-means algorithm to the image in the a*b* feature space, where K = 3 (by applying domain knowledge).
• Step 4: Label every pixel in the image using the results from K-means clustering (indicated by three different grey levels).
• Step 5: Create images that segment the H&E image by colour.
  – Apply the label and the colour information of each pixel to obtain separate colour images corresponding to the three clusters.
• Step 6: Segment the nuclei into a separate image with the L* feature.
  – In cluster 1, there are dark and light blue objects. The dark blue objects correspond to nuclei (with domain knowledge).
  – The L* feature specifies the brightness value of each colour. With a threshold on L*, we obtain an image containing the nuclei only.
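A rough sketch of Steps 2 to 6 using scikit-image and scikit-learn is shown below. The input file name, the index of the nuclei cluster and the L* threshold are illustrative assumptions; the original example chooses them from the displayed images and domain knowledge.

import numpy as np
from skimage import io, color              # scikit-image for I/O and RGB -> L*a*b*
from sklearn.cluster import KMeans

rgb = io.imread('hne_tissue.png')[:, :, :3]          # Step 1: H&E image (assumed path)
lab = color.rgb2lab(rgb)                             # Step 2: RGB -> L*a*b*

h, w, _ = lab.shape
ab = lab[:, :, 1:3].reshape(-1, 2)                   # Step 3: discard L*, keep (a*, b*)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(ab)
label_img = labels.reshape(h, w)                     # Step 4: label every pixel

# Step 5: one image per cluster, keeping only that cluster's pixels
segments = [np.where(label_img[..., None] == k, rgb, 0) for k in range(3)]

nuclei_cluster = 0                                   # assumed: index of the nuclei cluster
L = lab[:, :, 0]
nuclei_only = np.where((label_img == nuclei_cluster) & (L < 50), 255, 0)  # Step 6: threshold on L* (assumed cutoff)
io.imsave('nuclei_mask.png', nuclei_only.astype(np.uint8))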



Summary
• The K-means algorithm is a simple yet popular method for clustering analysis.
• Its performance is determined by the initialisation and an appropriate distance measure.
• There are several variants of K-means to overcome its weaknesses:
  – K-Medoids: resistance to noise and/or outliers
  – K-Modes: extension to categorical data clustering analysis
  – CLARA: extension to deal with large data sets
  – Mixture models (EM algorithm): handling uncertainty of clusters

Hierarchical Clustering
• A hierarchical clustering method works by grouping objects into a tree of clusters.
• Hierarchical clustering methods can be further classified as either agglomerative or divisive, depending on whether the hierarchical decomposition is formed in a bottom-up (merging) or top-down (splitting) fashion.
• Example: a data set has five objects {a, b, c, d, e}.
  – AGNES (Agglomerative Nesting)
  – DIANA (Divisive Analysis)

Agglomerative Hierarchical Clustering
• This bottom-up strategy starts by placing each object in its own cluster and then merges these atomic clusters into larger and larger clusters, until all of the objects are in a single cluster or until certain termination conditions are satisfied. Most hierarchical clustering methods belong to this category.

Divisive Hierarchical Clustering
• This top-down strategy does the reverse of agglomerative hierarchical clustering. It starts with all objects in one cluster and subdivides the cluster into smaller and smaller pieces, until each object forms a cluster on its own or until certain termination conditions are satisfied, such as a desired number of clusters or the diameter of each cluster being within a certain threshold.

AGNES (Agglomerative Nesting)
• Initially, AGNES places each object into a cluster of its own. The clusters are then merged step-by-step according to some criterion. For example, clusters C1 and C2 may be merged if an object in C1 and an object in C2 form the minimum Euclidean distance between any two objects from different clusters.
• This is a single-linkage approach, in that each cluster is represented by all of the objects in the cluster, and the similarity between two clusters is measured by the similarity of the closest pair of data points belonging to different clusters.

Distance between clusters
• Four widely used measures for the distance between clusters are as follows, where |p − p′| is the distance between two objects or points p and p′, mi is the mean of cluster Ci, and ni is the number of objects in Ci:
  – Minimum distance: dmin(Ci, Cj) = min |p − p′| over p ∈ Ci, p′ ∈ Cj
  – Maximum distance: dmax(Ci, Cj) = max |p − p′| over p ∈ Ci, p′ ∈ Cj
  – Mean distance: dmean(Ci, Cj) = |mi − mj|
  – Average distance: davg(Ci, Cj) = (1 / (ni nj)) Σ |p − p′| over p ∈ Ci, p′ ∈ Cj
• Single Linkage Algorithm: When an algorithm uses the minimum distance, dmin(Ci, Cj), to measure the distance between clusters, it is sometimes called a nearest-neighbor clustering algorithm. Moreover, if the clustering process is terminated when the distance between the nearest clusters exceeds an arbitrary threshold, it is called a single-linkage algorithm.
• Complete Linkage Algorithm: When an algorithm uses the maximum distance, dmax(Ci, Cj), to measure the distance between clusters, it is sometimes called a farthest-neighbor clustering algorithm. If the clustering process is terminated when the maximum distance between the nearest clusters exceeds an arbitrary threshold, it is called a complete-linkage algorithm. The distance between two clusters is determined by the most distant nodes in the two clusters.
• The above minimum and maximum measures represent two extremes in measuring the distance between clusters. They tend to be overly sensitive to outliers or noisy data.
• The use of the mean or average distance is a compromise between the minimum and maximum distances and overcomes the outlier-sensitivity problem.
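To make the linkage choices concrete, here is a small agglomerative (AGNES-style) sketch using SciPy; the five 2-D points standing in for the objects {a, b, c, d, e} are made up for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from scipy.spatial.distance import pdist

points = np.array([[0.0, 0.0], [0.5, 0.0], [4.0, 4.0], [4.5, 4.2], [9.0, 0.5]])
labels = ['a', 'b', 'c', 'd', 'e']

dists = pdist(points)                         # pairwise |p - p'| distances
single   = linkage(dists, method='single')    # dmin: nearest-neighbor merging
complete = linkage(dists, method='complete')  # dmax: farthest-neighbor merging
average  = linkage(dists, method='average')   # davg: average distance

# Cut each tree into 2 flat clusters and compare the memberships.
for name, Z in [('single', single), ('complete', complete), ('average', average)]:
    print(name, dict(zip(labels, fcluster(Z, t=2, criterion='maxclust'))))

# dendrogram(single, labels=labels) would draw the merge tree (requires matplotlib).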

Single Link
• Single Link Agglomerative Clustering:
  – Use the maximum similarity of pairs: sim(ci, cj) = max over x ∈ ci, y ∈ cj of sim(x, y).
  – Can result in "straggly" (long and thin) clusters due to the chaining effect.
  – After merging ci and cj, the similarity of the resulting cluster to another cluster, ck, is: sim((ci ∪ cj), ck) = max(sim(ci, ck), sim(cj, ck)).
• Dendrogram – Single link
• Single Link Example

Closest pair of clusters
• There are many variants for defining the closest pair of clusters (a toy sketch of the single-link merge rule follows below):
  – Single-link: distance of the "closest" points.
  – Complete-link: distance of the "furthest" points.
  – Centroid: distance of the centroids (centres of gravity).
  – Average-link: average distance between pairs of elements.
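As a toy illustration of the single-link merge rule above, the sketch below repeatedly merges the most similar pair of clusters and updates similarities with sim((ci ∪ cj), ck) = max(sim(ci, ck), sim(cj, ck)); the pairwise similarity values are made up.

import itertools

# Made-up pairwise similarities between four objects.
pair_sims = {('a', 'b'): 0.9, ('a', 'c'): 0.2, ('a', 'd'): 0.1,
             ('b', 'c'): 0.4, ('b', 'd'): 0.3, ('c', 'd'): 0.8}
clusters = [('a',), ('b',), ('c',), ('d',)]
sim = {frozenset(((x,), (y,))): s for (x, y), s in pair_sims.items()}

while len(clusters) > 2:                                    # stop at 2 clusters, for example
    ci, cj = max(itertools.combinations(clusters, 2),       # closest pair = highest similarity
                 key=lambda pair: sim[frozenset(pair)])
    merged = ci + cj
    clusters = [c for c in clusters if c not in (ci, cj)]
    for ck in clusters:                                     # single-link similarity update
        sim[frozenset((merged, ck))] = max(sim[frozenset((ci, ck))],
                                           sim[frozenset((cj, ck))])
    clusters.append(merged)
    print("merged", ci, "and", cj, "->", clusters)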



