ML Unit 3
Classification
Topics to Be Covered
• What is Classification?
• General approach to Classification
• K-Nearest Neighbor Algorithm
• Logistic regression
• Decision Trees
• Naive Bayesian
• Support Vector Machine (SVM)
What is Classification?
• Classification is a supervised machine learning method where the model tries to predict the correct label for given input data.
• The target feature is of a categorical type.
General Approach to Classification
1. Problem Identification
2. Identification of Required Data
3. Data Pre-processing
4. Definition of Training Data Set
5. Algorithm Selection
6. Training
7. Evaluation with the Test Data Set
• Problem Identification: Identifying the problem is the first step in the supervised learning model. The problem needs to be well formed, i.e. a problem with well-defined goals and benefits, which has a long-term impact.
• Identification of Required Data: On the basis of the problem identified above, the data set that precisely represents the identified problem needs to be identified. For example, if the problem is to predict whether a tumour is malignant or benign, then the corresponding patient data sets related to malignant and benign tumours are to be identified.
• Data Pre-processing: This step involves cleaning and transforming the identified data before feeding it into the algorithm, and it ensures that all unnecessary or irrelevant data elements are removed. Because the data is gathered from different sources, it is usually collected in a raw format that is not ready for immediate analysis; pre-processing makes the data ready to be fed into the machine learning algorithm.
• Definition of Training Data Set: Before starting the analysis, the user should decide what kind of data set is to be used as the training set. In the case of signature analysis, for example, the training data set might be a single handwritten alphabet, an entire handwritten word (i.e. a group of alphabets), or an entire line of handwriting (i.e. sentences or a group of words). Thus, a set of 'input meta-objects' and corresponding 'output meta-objects' is gathered. The training set needs to be representative of the real-world use of the given scenario; thus, a set of data inputs (X) and corresponding outputs (Y) is gathered either from human experts or from experiments.
• Algorithm Selection: This involves determining the structure of the learning function and the corresponding learning algorithm, and it is the most critical step of the supervised learning model. On the basis of various parameters, the best algorithm for a given problem is chosen.
• Training: The learning algorithm identified in the previous step is run on the gathered training set for further fine-tuning. Some supervised learning algorithms require the user to determine specific control parameters, which are given as inputs to the algorithm. These parameters may also be adjusted by optimizing performance on a subset of the training set (called the validation set).
• Evaluation with the Test Data Set: The test data set is run through the trained model, and its performance is measured here. If a suitable result is not obtained, further training or tuning of parameters may be required. A minimal sketch of this end-to-end workflow follows.
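As an illustration of steps 2-7, here is a minimal sketch in Python. The use of scikit-learn, the breast-cancer data set (standing in for the tumour example above), and kNN as the chosen algorithm are all assumptions made for the sake of the example, not part of the original material.

    # Minimal end-to-end supervised classification workflow (illustrative sketch)
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_breast_cancer(return_X_y=True)            # identification of required data

    X_train, X_test, y_train, y_test = train_test_split(  # definition of training data set
        X, y, test_size=0.2, random_state=42)

    model = KNeighborsClassifier(n_neighbors=5)           # algorithm selection
    model.fit(X_train, y_train)                           # training

    y_pred = model.predict(X_test)                        # evaluation with the test data set
    print("Accuracy:", accuracy_score(y_test, y_pred))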
Algorithms for Classification
• k-Nearest Neighbour (kNN)
• Logistic Regression
• Decision Tree
• Support Vector Machine (SVM)
• Naive Bayes
• Random Forest
How Does kNN Work?
It is often a tricky decision to choose the value of k. The reasons are as follows:
❖ If the value of k is very large (in the extreme case equal to the total number of
records in the training data), the class label of the majority class of the training
data set will be assigned to the test data regardless of the class labels of the
neighbours nearest to the test data.
❖ If the value of k is very small (in the extreme case equal to 1), the class value of a noisy data point or outlier in the training data set that happens to be the nearest neighbour to the test data will be assigned to the test data.
The best k value is somewhere between these two extremes. A few strategies, highlighted below, are adopted by machine learning practitioners to arrive at a value for k.
• One common practice is to set k equal to the square root of the number of
training records.
• An alternative approach is to test several k values on a variety of test data sets
and choose the one that delivers the best performance.
• Another interesting approach is to choose a larger value of k, but apply a weighted voting process in which the votes of close neighbours are considered more influential than the votes of distant neighbours. These strategies are sketched in the code below.
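A short sketch of the three strategies in Python. scikit-learn and the iris data set are assumptions used only to make the example runnable.

    # Three common ways to arrive at k for kNN (illustrative sketch)
    import math
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # Strategy 1: k = square root of the number of training records
    k_sqrt = round(math.sqrt(len(X)))                 # 150 records -> k = 12

    # Strategy 2: test several k values and keep the best performer
    scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                 X, y, cv=5).mean()
              for k in range(1, 21)}
    best_k = max(scores, key=scores.get)

    # Strategy 3: a larger k, but with distance-weighted voting so that
    # close neighbours count more than distant ones
    weighted_knn = KNeighborsClassifier(n_neighbors=15, weights="distance")

    print("sqrt rule:", k_sqrt, "| best by validation:", best_k)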
Why kNN is a Lazy Learner
• kNN is called a lazy learner because:
• It does not abstract or generalize any model from the training data.
• Instead, it stores the data and uses it at prediction time.
• There is no explicit training phase or model built.
How the kNN Algorithm Works
1. Input required:
→ A training dataset with input features and labeled output.
→ A test data point for which the class is to be predicted.
→ A value of 'k', which defines the number of neighbors to consider.
2. Process (written out in the sketch below):
→ Calculate the distance (usually Euclidean) between the test point and all points in the training dataset.
→ Identify the k closest (smallest-distance) training points.
→ Perform majority voting among those k neighbors to assign the class label to the test point.
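A from-scratch sketch of this process in plain Python. The toy points and labels are made up for illustration.

    # kNN classification written out directly (toy data, k = 3)
    import math
    from collections import Counter

    def knn_predict(train_X, train_y, test_point, k=3):
        # Step 1: Euclidean distance from the test point to every training point
        distances = [(math.dist(x, test_point), label)
                     for x, label in zip(train_X, train_y)]
        # Step 2: the k closest training points
        k_nearest = sorted(distances)[:k]
        # Step 3: majority vote among the k neighbours
        votes = Counter(label for _, label in k_nearest)
        return votes.most_common(1)[0][0]

    train_X = [(1, 1), (2, 1), (8, 9), (9, 8)]
    train_y = ["A", "A", "B", "B"]
    print(knn_predict(train_X, train_y, (2, 2)))   # -> 'A'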
Strengths of kNN
• Simple to implement and understand.
• No training time required.
• Works well in recommender systems and some classification tasks.
• Adapts naturally to multi-class problems.
Weaknesses of kNN
• No real learning happens – relies completely on training data.
• Slow prediction time, especially with large datasets.
• High memory usage as the entire training data must be stored.
• Performance degrades if irrelevant features or unscaled features are
used.
Applications of kNN
1. Recommender Systems: suggest items (movies, products) based on what similar users liked.
2. Information Retrieval: find documents or articles similar to a given query.
3. Pattern Recognition: handwriting, face, or voice recognition based on the closest match.
4. Medical Diagnosis: classify patient data based on past patient records.
Metrics to Evaluate ML Classification Algorithms
• True positives (TP): the number of positive observations the model correctly predicted as positive.
• True negatives (TN): the number of negative observations the model correctly predicted as negative.
• False positives (FP): the number of negative observations the model incorrectly predicted as positive.
• False negatives (FN): the number of positive observations the model incorrectly predicted as negative.
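A small sketch computing these counts for a binary classifier. The labels are made-up toy values, and scikit-learn is assumed.

    # Confusion-matrix counts for a binary classifier (toy labels)
    from sklearn.metrics import confusion_matrix, accuracy_score

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
    print("Accuracy:", accuracy_score(y_true, y_pred))   # (TP + TN) / total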
Decision Trees
• The goal of decision tree learning is to create a model that predicts the value of the output variable based on the input variables in the feature vector.
• Every decision node picks a feature to split the data, and the branches represent possible values (or value ranges) of that feature. This process continues until a leaf node is reached, which gives the final prediction.
• The tree terminates at different leaf nodes (or terminal nodes), where each leaf node represents a possible value for the output variable.
• The output variable is determined by following a path that starts at the root and is guided by the values of the input variables.
• A decision tree is usually represented in the format depicted in Figure 7.8.
• In the process of building a decision tree, the algorithm keeps splitting the dataset into smaller groups (partitions) based on certain feature values. After a split happens:
→ if each partition contains data from only one class (i.e., only 'Yes' or only 'No'),
→ then we say the split has resulted in pure partitions.
• Let us say S is the sample set of training examples. Then Entropy(S), measuring the impurity of S, is defined as

Entropy(S) = − Σ_{i=1..c} p_i · log2(p_i)

where c is the number of classes and p_i is the proportion of examples in S that belong to class i.
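A direct translation of this formula into Python (the 9 'Yes' / 5 'No' split is just a familiar worked example):

    # Entropy of a sample set S, straight from the formula above
    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n)
                    for c in Counter(labels).values())

    print(entropy(["Yes"] * 9 + ["No"] * 5))   # ~0.940 bits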
How a Decision Tree Is Built
1. If all examples belong to the same class, return a leaf node with that class.
2. If the feature set is empty, return a leaf node with the majority class.
3. Otherwise, select the attribute that best splits the data (e.g., the one with the highest information gain).
4. Make that attribute a decision node, break the dataset into smaller subsets (one per branch), and repeat the procedure recursively on each subset (see the sketch below).
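In practice the recursion is usually delegated to a library. Here is a minimal sketch with scikit-learn, using an entropy-based split criterion; the iris data set is an assumption chosen to keep the example self-contained:

    # Training an entropy-based decision tree and printing its rules
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
    print(export_text(tree, feature_names=load_iris().feature_names))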
Strengths of Decision Trees
• Can work well with both small and large training data sets.
Weaknesses of Decision Trees
• Decision trees are prone to errors in classification problems with many classes and a relatively small number of training examples.
Real-World Applications:
• Medical diagnosis
• Fraud detection
Logistic Regression (Binary/Discrete classification)
• Logistic regression is both a classification and a regression technique, depending on the scenario in which it is used.
• Logistic Regression is a Machine Learning classification algorithm that is used to predict the
probability of a categorical dependent variable.
• In logistic regression, the dependent variable is a binary variable that contains data coded as
1 (yes, success, etc.) or 0 (no, failure, etc.).
• In the logistic regression model, a chi-square test is used to measure how well the logistic
regression model fits the data.
• The goal of logistic regression is to predict the likelihood that Y is equal to 1 given certain
values of X.
Logistic regression is a statistical method used when the output variable is categorical (like Yes/No, True/False, or 1/0). It tells you the probability of a certain event happening. Instead of drawing a straight line (as in Linear Regression), Logistic Regression draws an S-shaped curve (called a sigmoid or logistic curve), which fits the probability of Y = 1 given X.
• Let us say we have a model that can predict whether a person is
male or female on the basis of their height.
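A minimal sketch of such a model. scikit-learn is assumed, and the heights and labels are made-up illustration data, not real measurements:

    # Sigmoid function and a tiny logistic-regression fit (toy data)
    import math
    from sklearn.linear_model import LogisticRegression

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))    # maps any real z into (0, 1)

    heights = [[150], [155], [160], [175], [180], [185]]   # cm
    labels  = [0, 0, 0, 1, 1, 1]                           # 0 = female, 1 = male

    model = LogisticRegression().fit(heights, labels)
    print(sigmoid(0))                        # 0.5, the decision midpoint
    print(model.predict_proba([[170]]))      # [P(female), P(male)] at 170 cm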
Naive Bayes Classifier
• Naive Bayes applies Bayes' theorem under the 'naive' assumption that the features are conditionally independent given the class. Here x1, x2, …, xn represent the features; in the worked example below they map to Color, Type, and Origin.
• By substituting for X and expanding using the chain rule, we get

P(y | x1, …, xn) = [P(x1 | y) · P(x2 | y) · … · P(xn | y) · P(y)] / [P(x1) · P(x2) · … · P(xn)]

• For all entries in the dataset, the denominator does not change; it remains static. Therefore, the denominator can be removed and proportionality injected:

P(y | x1, …, xn) ∝ P(y) · P(x1 | y) · P(x2 | y) · … · P(xn | y)
• In our case, the class variable (y) has only two outcomes, yes or no, although there could be cases where the classification is multi-class. Either way, we have to find the class variable (y) with the maximum probability:

y = argmax_y P(y) · P(x1 | y) · … · P(xn | y)

• Using the above function, we can obtain the class, given the predictors/features.
• Step 1: First construct a frequency table. A frequency table is drawn for each attribute against the target outcome.
• Since 0.144 > 0.048, given the features Red, SUV, and Domestic, our example gets classified as 'No': the car is not stolen.
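A sketch of this scoring step in Python. The conditional probabilities are taken from the classic stolen-car frequency table that this example usually accompanies; since the slides do not reproduce the table, treat the numbers as an assumption:

    # Naive Bayes scoring for the stolen-car example
    def nb_score(likelihoods, prior=1.0):
        # Equal class priors (0.5 each) cancel in the comparison,
        # so they are omitted; the products match the slide's values.
        score = prior
        for p in likelihoods:
            score *= p                  # multiply P(feature | class)
        return score                    # proportional to P(class | features)

    score_yes = nb_score([0.6, 0.2, 0.4])  # P(Red|Yes)·P(SUV|Yes)·P(Dom|Yes) = 0.048
    score_no  = nb_score([0.4, 0.6, 0.6])  # P(Red|No)·P(SUV|No)·P(Dom|No)   = 0.144

    print("Stolen:", "Yes" if score_yes > score_no else "No")   # -> No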
Applications of Naive Bayes Algorithms
• Real-time Prediction
• Multi-class Prediction
• Recommendation System
Support Vector Machine (SVM)
• A Support Vector Machine (SVM) is a supervised machine learning algorithm
that can be employed for both classification and regression purposes.
• It uses a non-linear mapping to transform the original training data into a higher dimension.
• Let us assume for the sake of simplicity that the data instances are linearly separable. In this case, when mapped in a two-dimensional space, the instances of the two classes can be separated by a straight line.
• In other words, the goal of the SVM analysis is to find a plane, or rather a hyperplane,
which separates the instances on the basis of their classes.
• New examples (i.e. new instances) are then mapped into that same space and
predicted to belong to a class on the basis of which side of the gap the new instance
will fall on.
• In summary, in the overall training process, the SVM algorithm analyses the input data and identifies a surface in the multi-dimensional feature space, called the hyperplane, that separates the classes.
Hyperplane:
In a p-dimensional space, a hyperplane is defined by the equation

b0 + b1·x1 + b2·x2 + … + bp·xp = 0

A point X = (x1, x2, …, xp) is on the hyperplane if it satisfies this equation; points for which the left-hand side is greater than or less than zero lie on either side of it.
Separating Hyperplane
Even if a separating hyperplane does exist, there are instances in which such a classifier might not be desirable. A classifier based on a separating hyperplane will necessarily perfectly classify all of the training observations, which can make it overly sensitive to individual observations and can leave only a tiny margin between the hyperplane and the nearest training points.
• Support Vectors: Support vectors are the data points (representing classes) that lie nearest to the identified hyperplane; they are the critical components of the data set, because removing them would alter the position of the dividing hyperplane.
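A minimal sketch of a linear SVM on toy, linearly separable data. scikit-learn and the specific points are assumptions for illustration:

    # Linear SVM on linearly separable toy data
    from sklearn.svm import SVC

    X = [[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]]
    y = [0, 0, 0, 1, 1, 1]

    clf = SVC(kernel="linear").fit(X, y)
    print("Support vectors:", clf.support_vectors_)   # points nearest the hyperplane
    print("Prediction for (3, 3):", clf.predict([[3, 3]]))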