ML Unit 3

The document provides an overview of classification in machine learning, detailing its definition, types (binary, multi-class, and multi-label), and various algorithms such as K-Nearest Neighbor, Logistic Regression, Decision Trees, Naive Bayes, and Support Vector Machines. It outlines the steps involved in building a classification model, including problem identification, data preprocessing, algorithm selection, and evaluation. Additionally, it discusses the strengths and weaknesses of different classification algorithms and their applications in various fields.

Unit-III

Classification
Topics to Be Covered
• What is Classification?
• General approach to Classification
• K-Nearest Neighbor Algorithm
• Logistic regression
• Decision Trees
• Naive Bayes
• Support Vector Machine (SVM)
What is Classification?
• Classification is a supervised machine learning method where the model tries to predict the correct label for a given input.

• In classification, the model is fully trained using the training data, and then it is evaluated on test data before being used to perform prediction on unseen data.

For example, an algorithm can learn to predict whether

🡪 a given email is spam or ham (not spam)

🡪 a tumor is malignant or benign


Classification model

🡪 The target feature is of categorical type.

🡪 The target categorical feature is known as the class.
Classification Terminologies in ML
• Classifier – A classifier is an algorithm that learns from labeled data and assigns a category (or class) to new, unseen data.
• Classification Model – A classification model is the result you get after training a classifier on your dataset. It is the trained system that can predict or classify new data based on what it learned during training.
• Feature – A feature is a single, measurable input that helps describe the data. It is one of the many pieces of information that the model uses to learn and make decisions.
• Binary Classification – A type of classification with two outcomes, e.g. True or False / 1 or 0 / Yes or No.

• Multi-Class Classification – Classification with more than two classes; in multi-class classification each sample is assigned to one and only one label or target.

• Multi-label Classification – A type of classification where each sample is assigned to a set of labels or targets (more than one label per instance), e.g. a model that tags an image with multiple categories like ["beach", "sunset", "vacation"]. A small sketch contrasting these label formats follows the figures below.
Binary Classification
Multi-Class Classification
Multi-label Classification
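To make the three settings concrete, here is a minimal sketch in plain Python (numpy assumed available; the data is invented purely for illustration) of how the labels look in each case:

```python
import numpy as np

# Binary classification: two classes, one label per sample.
y_binary = np.array([0, 1, 1, 0])

# Multi-class classification: more than two classes, still one label per sample.
y_multiclass = np.array([0, 2, 1, 2])

# Multi-label classification: each row is a sample, each column a possible tag
# (e.g. "beach", "sunset", "vacation"); a sample may carry several tags at once.
y_multilabel = np.array([[1, 0, 1],
                         [0, 1, 1],
                         [1, 1, 0]])
```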
Classification Usecases
• Image classification
• Disease prediction
• Win–loss prediction of games
• Prediction of natural calamity such as earthquake, flood, etc.
• Handwriting recognition
• Document Classification
• Spam Filters
Classification Model Steps in ML

1. Problem Identification
2. Identification of Required Data
3. Data Pre-processing
4. Definition of Training Data Set
5. Algorithm Selection
6. Training
7. Evaluation with the Test Data Set
• Problem Identification: Identifying the problem is the first step in the supervised learning model. The problem needs to be a well-formed problem, i.e. a problem with well-defined goals and benefits, which has a long-term impact.
• Identification of Required Data: On the basis of the problem identified above, the required data set that precisely represents the identified problem needs to be identified/evaluated. For example, if the problem is to predict whether a tumour is malignant or benign, then the corresponding patient data sets related to malignant and benign tumours are to be identified.
• Data Pre-processing: This involves cleaning/transforming the data set. This step ensures that all the unnecessary/irrelevant data elements are removed. Data pre-processing refers to the transformations applied to the identified data before feeding it into the algorithm. Because the data is gathered from different sources, it is usually collected in a raw format and is not ready for immediate analysis. This step ensures that the data is ready to be fed into the machine learning algorithm.
• Definition of Training Data Set: Before starting the analysis, the user should decide what kind of data set is to be used as a training set. In the case of signature analysis, for example, the training data set might be a single handwritten alphabet, an entire handwritten word (i.e. a group of alphabets) or an entire line of handwriting (i.e. sentences or a group of words). Thus, a set of 'input meta-objects' and corresponding 'output meta-objects' are also gathered. The training set needs to be representative of the real-world use of the given scenario. Thus, a set of data inputs (X) and corresponding outputs (Y) is gathered either from human experts or from experiments.
• Algorithm Selection: This involves determining the structure of the learning function and the corresponding learning algorithm. This is the most critical step of the supervised learning model. On the basis of various parameters, the best algorithm for a given problem is chosen.
• Training: The learning algorithm identified in the previous step is run on the gathered training set for further fine-tuning. Some supervised learning algorithms require the user to determine specific control parameters (which are given as inputs to the algorithm). These parameters (inputs given to the algorithm) may also be adjusted by optimizing performance on a subset (called a validation set) of the training set.
• Evaluation with the Test Data Set: The trained model is run on the test data set, and its performance is measured here. If a suitable result is not obtained, further tuning of the parameters may be required.
Algorithms for Classification
• k-Nearest Neighbour
• Logistic Regression
• Decision Tree
• Support Vector Machine
• Naive Bayes
• Random Forest
How Does KNN Work?

• Suppose we have the height, weight and T-shirt size of some customers, and we need to predict the T-shirt size of a new customer given only the height and weight information. Data including height, weight and T-shirt size is shown below.

If a customer has a height of 161 cm and a weight of 61 kg, then what would be his T-shirt size?
How Does KNN Work?
• Step 1: Calculate similarity based on a distance function
Another Problem
KNN
• Supervised learning (used for classification).
• Basic Idea: Classifies a new data point based on the majority label of its k-nearest neighbors.
• Lazy Learning: Does not train a model; stores the training data and uses it directly during prediction.
• Inspiration: "Similar things exist in close proximity": neighbors tend to share the same class.
• In kNN algorithm, the unknown and unlabelled data which comes for a prediction
problem is judged on the basis of the training data set elements which are similar to
the unknown element. So, the class label of the unknown element is assigned on
the basis of the class labels of the similar training data set elements
• Here, it is not possible during model testing to know the actual label value of an
unknown data. Therefore, the test data, which is a part of the labelled input data, is
used for this purpose. If the class value predicted for most of the test data elements
matches with the actual class value that they have, then we say that the
classification model possesses a good accuracy.
• Though there are many measures of similarity, the most common approach adopted by kNN to measure similarity between two data elements is Euclidean distance. Considering a very simple data set having two features (say f1 and f2), the Euclidean distance between two data elements d1 and d2 can be measured by

distance(d1, d2) = sqrt((f11 − f12)² + (f21 − f22)²)

• where f11 = value of feature f1 for data element d1
  f12 = value of feature f1 for data element d2
  f21 = value of feature f2 for data element d1
  f22 = value of feature f2 for data element d2
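As a one-line sketch of this distance computation (numpy assumed available; the sample values are invented), following the variable definitions above:

```python
import numpy as np

d1 = np.array([158.0, 58.0])   # (f1, f2) for data element d1
d2 = np.array([161.0, 61.0])   # (f1, f2) for data element d2

# Euclidean distance: sqrt((f11 - f12)^2 + (f21 - f22)^2)
distance = np.sqrt(((d1 - d2) ** 2).sum())
print(distance)  # ~4.24
```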
• How many similar elements should be considered? The answer lies in the value of 'k', which is a user-defined parameter given as an input to the algorithm. In the kNN algorithm, the value of 'k' indicates the number of neighbours that need to be considered. For example, if the value of k is 3, only the three nearest neighbours, i.e. the three training data elements closest to the test data element, are considered. Out of the three data elements, the class which is predominant is considered as the class label to be assigned to the test data. In case the value of k is 1, only the closest training data element is considered. The class label of that data element is directly assigned to the test data element.

But it is often a tricky decision to decide the value of k. The reasons are as follows:

❖ If the value of k is very large (in the extreme case equal to the total number of
records in the training data), the class label of the majority class of the training
data set will be assigned to the test data regardless of the class labels of the
neighbours nearest to the test data.

❖ If the value of k is very small (in the extreme case equal to 1), the class value of
a noisy data or outlier in the training data set which is the nearest neighbour to
the test data will be assigned to the test data.
The best k value is somewhere between these two extremes. A few strategies, highlighted below, are adopted by machine learning practitioners to arrive at a value for k.
• One common practice is to set k equal to the square root of the number of training records.
• An alternative approach is to test several k values on a variety of test data sets and choose the one that delivers the best performance.
• Another interesting approach is to choose a larger value of k, but apply a weighted voting process in which the vote of close neighbours is considered more influential than the vote of distant neighbours.
Why kNN is a Lazy Learner
• kNN is called a lazy learner because:
• It does not abstract or generalize any model from the training data.
• Instead, it stores the data and uses it at prediction time.
• There is no explicit training phase or model built.
How the kNN Algorithm Works
1. Input Required:
   1. A training dataset with input features and labeled output.
   2. A test data point for which the class is to be predicted.
   3. A value of 'k', which defines the number of neighbors to consider.
2. Process (a minimal sketch follows this list):
   1. Calculate the distance (usually Euclidean) between the test point and all points in the training dataset.
   2. Identify the k closest (smallest distance) training points.
   3. Perform majority voting among those k neighbors to assign the class label to the test point.
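Here is a minimal sketch of these three steps in plain Python with numpy; the function and variable names are our own choices, not from the source (scikit-learn's KNeighborsClassifier provides a production-ready version of the same idea):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Step 1: Euclidean distance from the test point to every training point.
    distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Step 2: indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Step 3: majority vote among the k neighbours' labels.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy usage echoing the T-shirt example above: heights (cm) and weights (kg).
X = np.array([[158, 58], [160, 60], [163, 61], [165, 64], [170, 68]])
y = np.array(["M", "M", "M", "L", "L"])
print(knn_predict(X, y, np.array([161, 61]), k=3))  # -> "M"
```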
Strengths of kNN
• Simple to implement and understand.
• No training time required.
• Works well in recommender systems and some classification tasks.
• Adapts naturally to multi-class problems.
Weaknesses of kNN
• No real learning happens – relies completely on training data.
• Slow prediction time, especially with large datasets.
• High memory usage as the entire training data must be stored.
• Performance degrades if irrelevant features or unscaled features are
used.
Applications of kNN
1. Recommender Systems: suggest items (movies, products) based on what similar users liked.

2. Information Retrieval: find documents or articles similar to a given query.

3. Pattern Recognition: handwriting, face, or voice recognition based on the closest match.

4. Medical Diagnosis: classify patient data based on past patient records.
Metrics to Evaluate ML Classification Algorithms
• True positive: The number of positive observations the model correctly predicted as positive.

• False positive: The number of negative observations the model incorrectly predicted as positive.

• True negative: The number of negative observations the model correctly predicted as negative.

• False negative: The number of positive observations the model incorrectly predicted as negative.
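From these four counts, the standard evaluation metrics follow directly. A small sketch with invented example counts (the formulas are the usual definitions of accuracy, precision, recall, and F1):

```python
def classification_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # overall fraction of correct predictions
    precision = tp / (tp + fp)                   # of predicted positives, fraction truly positive
    recall = tp / (tp + fn)                      # of actual positives, fraction found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example counts (invented): 40 TP, 10 FP, 45 TN, 5 FN
print(classification_metrics(40, 10, 45, 5))  # (0.85, 0.8, 0.888..., 0.842...)
```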
Decision tree
• Decision tree learning is one of the most widely adopted algorithms for classification. As the name
indicates, it builds a model in the form of a tree structure

• A decision tree is used for multi-dimensional analysis with multiple classes.

• The goal of decision tree learning is to create a model that predicts the value of the output variable
based on the input variables in the feature vector.

• Every decision node picks a feature to split the data. The branches represent possible values (or value
ranges) of that feature. This process continues until we reach a leaf node, which gives the final prediction.

• The tree terminates at different leaf nodes (or terminal nodes) where each leaf node represents a
possible value for the output variable

• The output variable is determined by following a path that starts at the root and is guided by the
values of the input variables.
Decision tree
• A decision tree is usually represented in the format depicted in Figure 7.8.

• A decision tree consists of three types of nodes:
1. Root Node
2. Branch Node
3. Leaf Node

• Each internal node (represented by boxes) tests an attribute (represented as 'A'/'B' within the boxes). Each branch corresponds to an attribute value (T/F in this case). Each leaf node assigns a classification. The first node is called the 'Root' node, and the terminal nodes are called 'Leaf' nodes. Here, 'A' is the root node (first node), 'B' is a branch node, and 'T' & 'F' are leaf nodes.
Decision Tree Example
A decision tree has 3 types of nodes:
1. Root Node
   • The topmost node of the tree.
   • Represents the first feature used to split the data.

2. Branch Node (Internal Node)
   • Represents decisions (splits) based on feature values.
   • Can have multiple child nodes.

3. Leaf Node (Terminal Node)
   • Represents the final outcome (e.g., a classification result like "Buy" or "Don't Buy").
Decision tree
• There are many implementations of decision tree, the most prominent ones
being C5.0, CART (Classification and Regression Tree), CHAID (Chi-square
Automatic Interaction Detector) and ID3 (Iterative Dichotomiser 3) algorithms.

• In the process of building a decision tree, the algorithm keeps splitting the
dataset into smaller groups (partitions) based on certain feature values.
Now, after a split happens:
→If each partition contains data from only one class (i.e., only “Yes”
or only “No”),
→Then we say the split has resulted in pure partitions.

• Entropy is a measure of the impurity (randomness) of an attribute or feature, adopted by many algorithms such as ID3 and C5.0.

• Let us say S is the sample set of training examples. Then Entropy(S), measuring the impurity of S, is defined as

Entropy(S) = − Σ (i = 1 to c) p_i · log2(p_i)

where c is the number of different class labels and p_i refers to the proportion of values falling into the i-th class label.
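A small sketch of this formula in plain Python (the helper name is our own). For a pure partition the entropy is 0; for a 50/50 binary split it is 1 bit:

```python
import math

def entropy(labels):
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    # Entropy(S) = -sum over classes of p_i * log2(p_i)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy(["Yes", "Yes", "No", "No"]))  # 1.0 (maximally impure)
print(entropy(["Yes", "Yes", "Yes"]))       # 0.0 (pure partition)
```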
Decision Tree
Decision Tree Algorithm
1. Select the best attribute using an Attribute Selection Measure (ASM) to split the records.

2. Make that attribute a decision node and break the dataset into smaller subsets.

3. Start tree building by repeating this process recursively for each child until one of the conditions matches:

• All the tuples belong to the same attribute value.

• There are no more remaining attributes.

• There are no more instances.


1. Start with the full dataset as the root.

2. If all examples belong to the same class, return a leaf node with that class.

3. If the feature set is empty, return a leaf node with the majority class.

4. Otherwise:
   a. Calculate the Entropy of the current dataset.
   b. For each attribute, calculate the Information Gain after splitting on that attribute.
   c. Choose the attribute with the highest Information Gain as the decision node.

5. Split the dataset based on the selected attribute's values.

6. Recursively apply the algorithm on each subset (child node).

7. Stop when:
   – All attributes are used up
   – The data is perfectly classified
   – Or some other stopping condition is met
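As a usage sketch (assuming scikit-learn is available; the four-row dataset is invented for illustration), a decision tree with entropy as its attribute selection measure can be trained like this:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # two binary features per sample
y = [0, 0, 1, 1]                        # class depends only on the first feature

# criterion="entropy" uses the same impurity measure as ID3/C5.0.
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X, y)
print(export_text(clf, feature_names=["f1", "f2"]))  # prints the learned rules
print(clf.predict([[1, 0]]))  # -> [1]
```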
Strengths of decision tree
1. It produces very simple understandable rules. For smaller trees, not
much mathematical and computational knowledge is required to
understand this model.

2. Works well for most of the problems.

3. It can handle both numerical and categorical variables.

4. Can work well both with small and large training data sets.

5. Decision trees provide a definite clue of which features are more useful for classification.
Weaknesses of decision tree
1. Decision tree models are often biased towards features having a larger number of possible values, i.e. levels.

2. This model gets overfitted or underfitted quite easily.

3. Decision trees are prone to errors in classification problems with many classes and relatively small number of
training examples.

4. A decision tree can be computationally expensive to train.

5. Large trees are complex to understand.

Real-World Applications:

• Student performance prediction

• Loan approval systems

• Medical diagnosis

• Customer churn analysis

• Fraud detection
Logistic Regression (Binary/Discrete classification)
• Logistic regression is both a classification and a regression technique, depending on the scenario in which it is used.

• Logistic Regression is a Machine Learning classification algorithm that is used to predict the
probability of a categorical dependent variable.

• In logistic regression, the dependent variable is a binary variable that contains data coded as
1 (yes, success, etc.) or 0 (no, failure, etc.).

• In other words, the logistic regression model predicts P(Y=1) as a function of X.

• In the logistic regression model, a chi-square test is used to measure how well the logistic
regression model fits the data.

• The goal of logistic regression is to predict the likelihood that Y is equal to 1 given certain
values of X.
Logistic regression is a statistical method used when the output variable is categorical (like
Yes/No, True/False, or 1/0). It tells you the probability of a certain event happening.

• It’s used for classification (mostly binary).

• The output (dependent variable Y) is binary (0 or 1).

• The input variables (independent variables X) are continuous or categorical

Instead of drawing a straight line (like in Linear Regression), Logistic Regression draws an S-
shaped curve (called a sigmoid or logistic curve), which fits the probability of Y = 1 given
X.

This curve ensures:

• Output probabilities are always between 0 and 1.

• As X increases, the probability gradually approaches 1.

• As X decreases, the probability gradually approaches 0.
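A minimal sketch of this S-shaped curve in plain Python; the coefficients b0 and b1 are invented for illustration:

```python
import math

def sigmoid(z):
    # The logistic function: always between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, b0=-4.0, b1=0.5):
    # P(Y = 1 | X = x) for a one-feature logistic regression model.
    return sigmoid(b0 + b1 * x)

for x in (0, 5, 8, 12, 20):
    print(x, round(predict_probability(x), 3))  # climbs gradually from ~0 toward 1
```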


Logistic Regression
• For example, we might try to predict whether a small project will succeed or fail on the basis of the number of years of experience of the project manager handling the project.

• To illustrate this, it is convenient to segregate the years of experience of project managers into categories (i.e. 0–8, 9–16, 17–24, 25–32, 33–40). If we compute the mean score on Y (averaging the 0s and 1s) for each category of years of experience, we will get something like the figure below.
Logistic Regression
• An explanation of logistic regression begins with an explanation of the logistic
function, which always takes values between zero and one. The logistic
formulae are stated in terms of the probability that Y = 1, which is referred to as
P. The probability that Y is 0 is 1 − P.
• Probability (P) can also be computed from the regression equation, using the logistic function: P(Y = 1 | X) = 1 / (1 + e^−(b0 + b1·X)).

• So, if we know the regression equation, we could, theoretically, calculate the expected probability that Y = 1 for a given value of X.


• Let us say we have a model that can predict whether a person is male or female on the basis of their height.

• Given a height of 150 cm, we need to predict whether the person is male or female.
Naïve Bayes Classifier
• Naïve Bayes is a probabilistic machine learning algorithm based on
the Bayes Theorem, used in a wide variety of classification tasks.

• It is a classification technique based on Bayes' Theorem with an independence assumption among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

• The Naive Bayes classifier works on the principle of conditional probability, as given by Bayes' theorem.
• Definition: Naive Bayes is a classification algorithm based on Bayes' Theorem, which calculates the probability of a class given a set of features.
• Naivety Assumption: It assumes that all features are independent of each other given the class label. That is why it is called "naive".
• Bayes' Theorem:

P(C|X) = P(X|C) · P(C) / P(X)

• P(C|X) = posterior probability of class C given predictor X
• P(X|C) = likelihood
• P(C) = prior probability of the class
• P(X) = marginal probability of the predictor
• Bayes’ Theorem is used to update prior probabilities into posterior
probabilities using new data.
Key Features:
• Requires very little training data to estimate model parameters.
• Only means and variances of features are needed, not the
entire covariance matrix.
• Assumes all features are equally important and independent.
• Despite this oversimplified assumption, it often performs
surprisingly well in practice.
Naïve Bayes Classifier
• Bayes' Theorem is a simple mathematical formula used for calculating conditional probabilities.
• Conditional probability is a measure of the probability of an event occurring given that another event has occurred.
• The formula is:

P(A|B) = P(B|A) · P(A) / P(B)

This tells us how often A happens given that B happens, written P(A|B) and also called the posterior probability, when we know: how often B happens given that A happens, written P(B|A); how likely A is on its own, written P(A); and how likely B is on its own, written P(B).
• Here in our dataset, we need to classify whether the car is stolen, given the features of the car.
• According to this example, Bayes' theorem can be rewritten as:

P(y|X) = P(X|y) · P(y) / P(X)

• The variable y is the class variable (stolen?), which represents whether the car is stolen given the conditions. The variable X represents the parameters/features.
• X is given as X = (x1, x2, …, xn), where x1, x2, …, xn represent the features, i.e. they can be mapped to Color, Type, and Origin.
• By substituting for X and expanding using the chain rule, we get:

P(y|x1, …, xn) = P(x1|y) · P(x2|y) · … · P(xn|y) · P(y) / (P(x1) · P(x2) · … · P(xn))

• For all entries in the dataset, the denominator does not change; it remains static. Therefore, the denominator can be removed and proportionality can be introduced:

P(y|x1, …, xn) ∝ P(y) · Π (i = 1 to n) P(xi|y)

• In our case, the class variable (y) has only two outcomes, yes or no. There could be cases where the classification is multivariate. Therefore, we have to find the class variable (y) with maximum probability:

y = argmax over y of P(y) · Π (i = 1 to n) P(xi|y)

• Using the above function, we can obtain the class, given the predictors/features.
• Step 1: First construct a frequency table. A frequency table is drawn for each attribute against the target outcome.

• Step 2: Create a likelihood table by finding the probabilities.

• Step 3: Now, use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
Steps 1 & 2

• Frequency and Likelihood tables of 'Color'

• Frequency and Likelihood tables of 'Type'

• Frequency and Likelihood tables of 'Origin'
• As per the equations discussed above, we can calculate the posterior probabilities P(Yes | X) and P(No | X).

• Since 0.144 > 0.048, given the features RED, SUV and Domestic, our example gets classified as 'NO': the car is not stolen.
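The same calculation can be reproduced in code. The ten-row dataset below is the classic "stolen car" example; it is an assumption here (the tables themselves are in the figures), but it is consistent with the 0.048 and 0.144 figures quoted above, where the equal priors cancel:

```python
# Each row: (Color, Type, Origin, Stolen?). Assumed dataset, not from the text.
data = [
    ("Red", "Sports", "Domestic", "Yes"),
    ("Red", "Sports", "Domestic", "No"),
    ("Red", "Sports", "Domestic", "Yes"),
    ("Yellow", "Sports", "Domestic", "No"),
    ("Yellow", "Sports", "Imported", "Yes"),
    ("Yellow", "SUV", "Imported", "No"),
    ("Yellow", "SUV", "Imported", "Yes"),
    ("Yellow", "SUV", "Domestic", "No"),
    ("Red", "SUV", "Imported", "No"),
    ("Red", "Sports", "Imported", "Yes"),
]

def likelihood(feature_index, value, label):
    # P(x_i = value | y = label), read off the frequency table for that class.
    rows = [r for r in data if r[3] == label]
    return sum(r[feature_index] == value for r in rows) / len(rows)

x = ("Red", "SUV", "Domestic")
for label in ("Yes", "No"):
    score = 1.0
    for i, value in enumerate(x):
        score *= likelihood(i, value, label)
    print(label, round(score, 3))  # Yes: 0.048, No: 0.144 -> classified as "NO"
```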
Applications of Naive Bayes Algorithms
• Real-time Prediction

• Multi-class Prediction

• Text classification/ Spam Filtering/ Sentiment Analysis

• Recommendation System
Support Vector Machine (SVM)
• A Support Vector Machine (SVM) is a supervised machine learning algorithm
that can be employed for both classification and regression purposes.

• SVMs are more commonly used in classification problems.

• It uses a non-linear mapping to transform the original training data into a higher dimension.

• SVM is based on the concept of a surface, called a hyperplane, which draws a boundary between data instances plotted in the multi-dimensional feature space.

• The SVM algorithm builds an N-dimensional hyperplane model that assigns future instances into one of the two possible output classes.
Support Vector Machine (SVM)
• In SVM, a model is built to discriminate the data instances belonging to
different classes.

• Let us assume for the sake of simplicity that the data instances are linearly separable. In this case, when mapped in a two-dimensional space, the data instances belonging to different classes fall on different sides of a straight line drawn in the two-dimensional space, as depicted in Figure 7.15a.

• If the same concept is extended to a multidimensional feature space, the straight line dividing data instances belonging to different classes transforms into a hyperplane, as depicted in Figure 7.15b.
Support Vector Machine (SVM)
• An SVM model is a representation of the input instances as points in the feature space,
which are mapped so that an apparent gap between them divides the instances of
the separate classes.

• In other words, the goal of the SVM analysis is to find a plane, or rather a hyperplane,
which separates the instances on the basis of their classes.

• New examples (i.e. new instances) are then mapped into that same space and
predicted to belong to a class on the basis of which side of the gap the new instance
will fall on.

• In summary, in the overall training process, the SVM algorithm analyses input data and
identifies a surface in the multi-dimensional feature space called the hyperplane
Support Vector Machine (SVM)
Hyperplane:

• In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p − 1.

• In two dimensions, a hyperplane is a flat one-dimensional subspace, in other words, a line.

• In three dimensions, a hyperplane is a flat two-dimensional subspace, that is, a plane.

• In p > 3 dimensions, it can be hard to visualize a hyperplane, but the notion of a (p − 1)-dimensional flat subspace still applies.
Hyperplane:

A hyperplane in p dimensions is defined by the equation β0 + β1·X1 + β2·X2 + … + βp·Xp = 0.

A point X = (X1, X2, …, Xp):

• is on the hyperplane if β0 + β1·X1 + … + βp·Xp = 0
• is on one side of the hyperplane if β0 + β1·X1 + … + βp·Xp > 0
• is on the other side of the hyperplane if β0 + β1·X1 + … + βp·Xp < 0
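A tiny sketch of this side test in Python (numpy assumed available; the β values are invented for illustration):

```python
import numpy as np

beta0 = -1.0
beta = np.array([2.0, 3.0])      # coefficients (beta1, beta2)

def f(x):
    # Signed value: 0 exactly on the hyperplane, +/- on either side.
    return beta0 + beta @ x

for point in (np.array([0.2, 0.2]), np.array([1.0, 1.0])):
    value = f(point)
    side = "on" if np.isclose(value, 0) else ("one side" if value > 0 else "other side")
    print(point, round(value, 2), side)  # [0.2 0.2] is on it; [1. 1.] is on one side
```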


Classification using Hyperplane

• An n×p data matrix X consists of n training observations in p-dimensional space: x1 = (x11, …, x1p), …, xn = (xn1, …, xnp).

• These observations fall into two classes, y1, …, yn ∈ {−1, 1}, where −1 represents one class and 1 the other class.

• A test observation x* is a p-vector of observed features.

• Our goal is to develop a classifier based on the training data that will correctly classify the test observation using its feature measurements.
Support Vector Machine (SVM)
• The operation of the SVM algorithm is based on finding the hyperplane that gives the largest minimum distance to the training examples.

• What if the data points are not linearly separable? In that case, how do we find the optimal hyperplane?
Classification using Hyperplane

There are various approaches, such as Logistic Regression; a new approach is based on a separating hyperplane.

Separating Hyperplane: let f(x) = β0 + β1·x1 + … + βp·xp.

If f(x*) is positive, class = 1.
If f(x*) is negative, class = −1.
If f(x*) is far from 0, we are confident about the class.
If f(x*) is near 0, we are less confident about the class.

Figure: three hyperplanes separating the two classes of data.

A classifier that is based on a separating hyperplane leads to a linear decision boundary.
Maximal Margin Classifier
The maximal margin hyperplane, also known as the optimal separating hyperplane, is the basis of a classification technique that tries to find the best boundary (a hyperplane) to separate two classes.

❑ Compute the (perpendicular) distance from each training observation to a given separating hyperplane.
❑ The smallest such distance is the minimal distance from the hyperplane, known as the margin.
❑ Find the minimal distance for all candidate hyperplanes.
❑ The maximal margin hyperplane is the separating hyperplane for which the margin is largest.
❑ Classify a test observation based on which side of the maximal margin hyperplane it lies.
❑ This is known as the maximal margin classifier.
Support Vector Classifier
Observations of the two classes may not be separable by a hyperplane.

Even if a separating hyperplane does exist, there are instances in which such a classifier might not be desirable.

A classifier based on a separating hyperplane will necessarily perfectly classify all of the training observations, which can make it sensitive to individual observations and can produce a tiny margin.
Support Vector Machine (SVM)
• Support Vectors: Support vectors are the data points (representing classes), the critical components in a data set, which are nearest to the identified hyperplane. If the support vectors are removed, the position of the dividing hyperplane will be altered.

• Hyperplane and Margin: For an N-dimensional feature space, a hyperplane is a flat subspace of dimension (N−1) that separates and classifies a set of data. For example, if we consider a two-dimensional feature space (which is nothing but a data set having two features and a class variable), a hyperplane will be a one-dimensional subspace, or a straight line. In the same way, for a three-dimensional feature space (a data set having three features and a class variable), a hyperplane is a two-dimensional subspace, or a simple plane.
• Margin: A margin is the gap between the two lines on the closest class points. It is calculated as the perpendicular distance from the line to the support vectors or closest points. If the margin between the classes is larger, it is considered a good margin; a smaller margin is a bad margin.
SVM Kernels
• The SVM algorithm is implemented in practice using a kernel.
• A kernel transforms an input data space into the required form.
• SVM uses a technique called the kernel trick.
• Here, the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space.
• In other words, you can say that it converts a non-separable problem into a separable problem by adding more dimensions to it.
• It is most useful in non-linear separation problems. The kernel trick helps you to build a more accurate classifier.
• Linear Kernel: A linear kernel can be used as the normal dot product of any two given observations.
• Polynomial Kernel: A polynomial kernel is a more generalized form of the linear kernel. The polynomial kernel can distinguish curved or non-linear input space.
• Radial Basis Function Kernel: The radial basis function (RBF) kernel is a popular kernel function commonly used in support vector machine classification. RBF can map an input space into an infinite-dimensional space.
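A hedged usage sketch (assuming scikit-learn is available) comparing the three kernels on a toy dataset of concentric circles, which is not linearly separable:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: a classic non-linearly-separable problem.
X, y = make_circles(n_samples=200, factor=0.4, noise=0.08, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, round(clf.score(X, y), 2))  # the RBF kernel should fit this data best
```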
Strengths of SVM
• SVM can be used for both classification and regression.
• It is robust, i.e. not much impacted by data with noise or outliers.
• The prediction results using this model are very promising.
Weaknesses of SVM
• SVM is applicable only for binary classification, i.e. when there are only two
classes in the problem domain.
• The SVM model is very complex – almost like a black box when it deals with a
high-dimensional data set. Hence, it is very difficult and close to impossible
to understand the model in such cases.
• It is slow for a large dataset, i.e. a data set with either a large number of
features or a large number of instances.
• It is quite memory-intensive.
Applications of SVM
• Bioinformatics and Medical Diagnosis
• Face Detection
• Text Classification
• Image Recognition
• Speech Recognition
• Banking and Finance
• Intrusion Detection Systems
• Handwriting Recognition
