MLP IA1

Machine learning is a subset of artificial intelligence that enables systems to learn from data and improve over time without explicit programming. It has various applications, advantages such as automation and efficiency, and disadvantages including difficulty in error identification and data acquisition challenges. The document also covers the history of machine learning, the modeling process, supervised and unsupervised learning, and various algorithms used in these processes.

Uploaded by

saurabh
Copyright
© © All Rights Reserved

UNIT 1

1.ALL ABOUT MACHINE LEARNING?


->
- A subset of AI that includes complex statistical techniques that enable
machines to improve at tasks with experience.
- An application of artificial intelligence that gives systems the ability to
automatically learn and improve from experience without being explicitly
programmed.
- An application of artificial intelligence that involves algorithms and data that
automatically analyse and make decisions without human intervention.
- It describes how computers perform tasks on their own based on previous
experience.
- Therefore, in machine learning, artificial intelligence is generated
on the basis of experience.
- Machine learning focuses on the development of computer programs that
can access data and use it to learn for themselves.
- Machine learning is a subfield of artificial intelligence and is closely
related to applied mathematics and statistics.

• WHEN MACHINE LEARNING IS USED:


- Human expertise does not exist (navigating on Mars)
- Humans can’t explain their expertise (speech recognition)
- Models must be customized (personalized medicine)
- Models are based on huge amounts of data (genomics)

• ADVANTAGES OF MACHINE LEARNING:


- Fast, accurate, efficient.
- Automation of most applications.
- Wide range of real-life applications.
- Enhanced cyber security and spam detection.
- No human intervention is needed.
- Handles multi-dimensional data.

• DISADVANTAGES OF MACHINE LEARNING:
- It is very difficult to identify and rectify errors.
- Data acquisition: obtaining enough suitable data is a challenge.
- Interpretation of results requires more time and space.

• APPLICATIONS OF MACHINE LEARNING:


- Analyzing Sales Data: Streamlining the data
- Real-Time Mobile Personalization: Promoting the experience
- Fraud Detection: Detecting pattern changes
- Product Recommendations: Customer personalization.
- Learning Management Systems: Decision-making programs
- Dynamic Pricing: Flexible pricing based on a need or demand
- Natural Language Processing: Speaking with humans
- Occasionally data scientists build a model (an abstraction of reality)
that provides insight into the underlying processes of a phenomenon.
When the goal of a model isn’t prediction but interpretation, it’s
called root cause analysis.

2.HISTORY OF MACHINE LEARNING?


->
NAME | YEAR | INVENTION
Warren Sturgis McCulloch | 1943 | Brain theories; wrote a paper about neurons
Walter Pitts | 1943 | Created a model using an electrical circuit and gave birth to neural networks
Alan Turing | 1950 | Turing test: a computer must be able to convince a human that it is a human and not a computer
Arthur Samuel | 1952 | First machine learning program: the game of checkers
Frank Rosenblatt | 1957 | First neural network for computers
Frank Rosenblatt | 1958 | First neural network called perceptron
Bernard Widrow and Marcian Hoff | 1959 | ADALINE, which could detect binary patterns; MADALINE, which could eliminate echo on phone lines
N/A | 1967 | Nearest neighbour: mapping routes for traveling salesmen
Gerald DeJong | 1981 | Explanation Based Learning: allowed a computer to analyze training data and create a general rule
AT&T | 1985 | First research group of machine learning, Neural Information Processing Systems
Jay Wilson and AT&T | 1992 | Automated speech recognition using hidden Markov model
AT&T | 1992 | Support vector machine
Patrick Haffner | 1996 | First convolutional neural network, rebranded as deep learning
AT&T | 1997 | AdaBoost algorithm, which allowed unstructured data to be handled through decision trees
AT&T | 2001 | Natural language understanding in Interactive Voice Response (IVR) systems
Group of researchers | 2011 | Deep neural networks

3.DIFFERENCE BETWEEN DATA MINING AND MACHINE
LEARNING?
->

4.EXPLAIN THE MODELLING PROCESS OF MACHINE
LEARNING.
->
• Chain or combine multiple techniques:
- When chaining multiple models, the output of the first model becomes
an input for the second model.
- When combining multiple models, train them independently and
combine their results; this is known as ensemble learning.
- A model consists of constructs of information called features or
predictors and a target or response variable.
- The model’s goal is to predict the target variable.
- The best models are those that accurately represent reality,
preferably while staying concise and interpretable.
• Feature engineering and model selection:
- Create possible predictors for the model, which recombines these
features to achieve its predictions.
- You may need to consult an expert to come up with meaningful
features.
- Sometimes a transformation must be applied to an input before it
becomes a good predictor, or multiple inputs must be combined.
- Sometimes modelling techniques have to be used to derive features.
- All this new information is then poured into the model you want to
build.
- One of the biggest mistakes in model construction is the availability
bias:
- The features are only the ones that were easy to obtain, and the
model consequently represents this one-sided “truth.”
- Models suffering from availability bias often fail when they’re
validated.
- Once the initial features are created, a model can be trained on
the data.
• Training the model:
- With the right predictors in place and a modeling technique in mind,
you can progress to model training.
- In this phase, you present to the model the data from which it can
learn.
- The most common modeling techniques have industry-ready
implementations in almost every programming language, including
Python.
- More advanced data science techniques will probably require heavy
mathematical calculations and implementing them with modern computer
science techniques.
- Once a model is trained, it’s time to test whether it can be
generalized to reality: model validation.

• Model validation and selection:
- In data science, many modeling techniques exist, and the challenge is
to find which one is the right one to use.
- A good model has two properties:
(a) good predictive power
(b) generalizes well to data it hasn’t seen.
- To achieve good predictive power:
• error measure:
- Two common error measures in ML are:
- the classification error rate (classification)
- the mean squared error (regression)
- The classification error rate is the share of observations that the
model mislabels; a lower rate is better.
- The MSE measures how big the average error of the
prediction is.
- Squaring each error before averaging has two consequences:
i. a wrong prediction in one direction can’t be cancelled out by
a faulty prediction in the other direction.
ii. bigger errors get even more weight than they
otherwise would.
- Small errors remain small or can even shrink (when smaller than 1),
whereas big errors are enlarged and will definitely draw your attention.
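The two error measures above can be sketched in a few lines of plain Python. This is only an illustration: the function names and the toy labels and values are assumptions made for the example, not part of the notes.

```python
def classification_error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between truth and prediction."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: one of the four predictions is wrong.
ce = classification_error_rate(["spam", "ham", "ham", "spam"],
                               ["spam", "ham", "spam", "spam"])

# Errors of +2 and -2 do not cancel out, and the error of 4 dominates the sum.
mse = mean_squared_error([10.0, 10.0, 10.0], [12.0, 8.0, 14.0])
```

In the second call the squared errors are 4, 4 and 16, so the single large error contributes two thirds of the total.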
• validation strategy:
- Split up: dividing the data into a training set and a holdout (test)
set that is never used to build the model.
- K-folds cross validation: divides the data set into k parts
and uses each part one time as a test data set while using
the others as a training data set.
- Leave-1 out: the same as k-folds but with k equal to the number of
observations, which means leaving one observation out and training on
the rest of the data.
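A minimal sketch of how k-fold cross validation splits a data set is given below. The helper name `k_fold_indices` is hypothetical, and shuffling the data before splitting (usually advisable) is left out to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Six observations in three folds: each observation is tested exactly once.
folds = list(k_fold_indices(6, 3))

# Using as many folds as observations leaves exactly one observation out each time.
leave_one_out = list(k_fold_indices(4, 4))
```

Each pair can then be used to train a model on the `train` indices and measure its error on the `test` indices; averaging the k error values gives the cross-validated estimate.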
- To achieve regularization:
• incur a penalty for every extra variable used to construct the
model.
• L1 regularization asks for a model with as few predictors as
possible, which is important for the model’s robustness.
• L2 regularization aims to keep the variance between the coefficients
of the predictors as small as possible.
- Overlapping variance between predictors makes it hard to make out
the actual impact of each predictor.
- To keep it simple, regularization is mainly used to stop a model from
using too many features and thus prevent over-fitting.
- Validation is extremely important because it determines whether the model
works in real-life conditions.
- Test the models on unseen data and make sure this data is a true
representation of what the model would encounter when applied to fresh
observations by other people.
- Once you have constructed a good model, you can use it to predict the future.
• Applying the trained model to unseen data:
- If you implemented the first three steps successfully, you now have a
performant model that generalizes to unseen data.
- The process of applying the model to new data is called model scoring,
which you implicitly did during validation.
- By now you should trust the model enough to use it for real.
- Model scoring involves two steps:
- i. Prepare a data set that has features exactly as defined by
the model.
- ii. Apply the model to this new data set, which results in a
prediction.

5.ALL ABOUT SUPERVISED LEARNING?


->
- Supervised learning teaches/trains the machine using data that is well labelled.
- After training, the algorithm is able to create an output for an input it has
never seen before without any help from a human.
- It is one of the most commonly used and successful types of
machine learning.

- CLASSIFICATION:
• A classification problem is when the output variable is a category.
• The goal of classification is to predict the class label, which is a
choice from a predefined list of possibilities.
• Binary class / multi-class classification.
• KNN Algorithm (Distance-Based Algorithm, used for both Classification
and Regression):
- KNN is a lazy learning, non-parametric algorithm.
- It uses data with several classes to predict the classification of
the new sample point.
- KNN is non-parametric since it doesn't make any assumptions about
the data being studied.
- It's called a lazy learner because it doesn't perform any training
when you supply the training data.
- Instead, it just stores the data during the training time and
doesn't perform any calculations.
- Most of the computation is done during consultation time.
- It defers the processing of examples till it receives an explicit
request for information.
• When should KNN be used?
- Data should be labeled.
- Data should be noise free.
- Data set should be small.
- If the data set is huge, a proper sample of the
data should be used.
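The lazy-learning behaviour described above can be sketched as follows. This is a minimal illustration assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy points are made up for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is just storing the data; all the work happens at query time.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = ["A", "A", "A", "B", "B", "B"]
prediction = knn_predict(points, labels, query=(2.0, 2.0), k=3)
```

Because every query scans the whole stored data set, the cost grows with the number of training points, which is one reason the notes recommend small data sets or a proper sample.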
• Naïve Bayes:
• It is called naïve because it assumes that the occurrence of
a certain feature is independent of the occurrence of other
features.
• Naïve Bayes assumes conditional independence over the training
dataset.
• The classifier separates data into different classes according to
Bayes’ Theorem, but assumes that all input features in a class are
independent of one another. Hence, the model is called naïve.
• It depends on the principle of Bayes' Theorem.
• Naive Bayes classifiers are a family of classifiers that are quite
similar to the linear models.
• They tend to be even faster in training.
• The reason that naive Bayes models are so efficient is that they
learn parameters by looking at each feature individually and
collect simple per-class statistics from each feature.
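A small sketch of the idea, assuming categorical features and Laplace (add-one) smoothing, neither of which is specified in the notes. The function names and the toy weather data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Collect per-class counts for each feature position independently."""
    class_counts = Counter(labels)
    # feature_counts[(class, feature_index)][value] -> count
    feature_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
    # Number of distinct values per feature, used for Laplace smoothing.
    n_values = [len({row[i] for row in rows}) for i in range(len(rows[0]))]
    return class_counts, feature_counts, n_values


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior score."""
    class_counts, feature_counts, n_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(class)
        for i, value in enumerate(row):
            # Smoothed P(value | class); features are treated as independent.
            seen = feature_counts[(label, i)][value]
            score += math.log((seen + 1) / (count + n_values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, windy) -> play?
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "no"), ("sunny", "no"), ("rainy", "yes")]
labels = ["yes", "yes", "no", "no", "yes", "no"]
model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, ("sunny", "no"))
```

Training is a single pass that only counts values per class and per feature, which is why these models are so fast to fit.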
• Decision Tree Algorithm (Decision-Based Algorithm, used for both
Classification and Regression):
• It is a predictive modeling technique used in classification and
prediction tasks.
• It uses a divide-and-conquer technique to split the search space
into subsets.
• Decision trees are widely used for classification and regression.
• Decision trees learn a hierarchy of if-else questions leading to
a decision.
• Root node: no incoming edges and zero or more outgoing edges.
• Internal node: exactly one incoming edge and two or more
outgoing edges.
• Leaf node: exactly one incoming edge and no outgoing edges.
• ADVANTAGES:
- It is simple to understand, as it follows the same process
that a human follows while making a decision in real life.
- It can be very useful for solving decision-related problems.
- It helps to think about all the possible outcomes for a
problem.
- There is less requirement of data cleaning compared to other
algorithms.
• DISADVANTAGES:
- A decision tree can contain lots of layers, which makes it
complex.
- It may have an overfitting issue.
- With more class labels, the computational complexity of the
decision tree may increase.
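The "hierarchy of if-else questions" can be pictured with a tiny hand-written tree. The sketch below does not learn the tree from data; the features, values and decisions are hypothetical and only show how a prediction walks from the root node to a leaf node.

```python
# A hand-written tree (hypothetical features and values, not learned from data).
# Internal nodes ask an if-else question; leaf nodes hold the decision.
tree = {
    "feature": "outlook", "equals": "sunny",
    "yes": {"feature": "humidity", "equals": "high",
            "yes": {"leaf": "stay in"},
            "no": {"leaf": "play"}},
    "no": {"leaf": "play"},
}


def decide(node, sample):
    """Walk from the root to a leaf, answering one question per internal node."""
    while "leaf" not in node:
        branch = "yes" if sample[node["feature"]] == node["equals"] else "no"
        node = node[branch]
    return node["leaf"]


decision = decide(tree, {"outlook": "sunny", "humidity": "high"})
```

A learning algorithm would choose the questions automatically; the deeper the tree it grows, the more closely it can fit the training data, which is where the overfitting risk comes from.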
- REGRESSION:
• A regression problem is when the output variable is a real value.
• The goal is to predict a continuous number (a float value in programming
terms).
• It allows one to make predictions from data by learning the relationship
between features of your data and some observed, continuous-valued
response.
• The ultimate goal of the regression algorithm is to plot a best-fit line or
a curve through the data.
• Linear Regression is an algorithm that belongs to supervised machine
learning.
• It is a study of the relationship between the variables (independent variable/s
and the dependent (target) variable).
• A real number ‘y’ has to be predicted for the given value of x.
• It tries to apply relations that will predict the outcome of an event.
• The relation is usually a straight line that fits the different data
points as closely as possible.
• It is a concept based on lines, planes and hyperplanes.
• ADVANTAGES:
- Easy to implement: it is computationally simple and does not require
much engineering.
- Scalability: it can be applied to cases where scaling is needed, such as
applications that handle big data.
- Interpretability: easy to interpret and very efficient to train.
- Applicability in real time: can be applied to scenarios where real-time
predictions are important.
• DISADVANTAGES:
- Assumes linearity (a straight-line relationship) between the dependent and
independent variables, which is rarely the case in real-world data.
- It is prone to noise and overfitting.
- It is not a good choice for datasets where the number of
observations is smaller than the number of attributes, as this can lead to
overfitting.
- Sensitive to outliers, hence it is essential to pre-process the
dataset and remove the outliers before applying Linear Regression to
the data.
- It assumes that there is no relationship between the independent
variables (no multicollinearity).
- Any such relationship needs to be removed using dimensionality reduction
techniques before applying Linear Regression.
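For a single independent variable, the best-fit straight line can be computed directly with ordinary least squares. The sketch below assumes that setting; the function name `fit_line` and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
predicted = slope * 4.0 + intercept  # prediction for an unseen x = 4
```

Because the fit minimises squared errors, a single outlier far from the line pulls the slope strongly, which illustrates the sensitivity to outliers listed above.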

7.ALL ABOUT UNSUPERVISED LEARNING?


->
- Unsupervised learning is the training of a machine using information that is
neither classified nor labeled, allowing the algorithm to act on that
information without guidance.
- It mainly deals with unlabelled data to find unknown structures or trends.
- It allows the model to work on its own to discover patterns and information
that was previously undetected.
- Only the input data is known, and no known output data is given to the
algorithm.
- Here the task of the machine is to group unsorted information according to
similarities, patterns, and differences without any prior training on the data.
- Unsupervised algorithms find patterns which were previously unknown.
- Patterns help in categorization or finding associations.
- They can detect anomalies in the data.
- They work on unlabelled data, which makes our work easier.

• Different types of unsupervised learning:


- Clustering:
• A clustering problem is to discover the inherent groupings in the data.
• It can be defined as the task of identifying subgroups in the data such
that data points in the same subgroup (cluster) are very similar while
data points in different clusters are very different.
• In other words, try to find homogeneous subgroups within the data.
• The decision of which similarity measure to use is application-specific.
• Clustering analysis can be done on the basis of features, where we try
to find subgroups of samples based on features.
• Clustering is used in market segmentation.
• It is also used in image segmentation/compression, where we try to group
similar regions together, in document clustering based on topics, etc.
• Clustering is the task of partitioning the dataset into groups, called
clusters.
• The goal is to split up the data in such a way that points within a single
cluster are very similar and points in different clusters are different.
• Clustering is the process of grouping a set of data objects into multiple
groups or clusters.
• Similarly to classification algorithms, clustering algorithms assign each
data point a number indicating which cluster it belongs to.
• TYPES OF CLUSTERING:
• Partitioning Methods – The simplest and most fundamental version
of cluster analysis is partitioning, which organizes the objects of
a set into several exclusive groups or clusters. Ex: K-Means
• Hierarchical Methods – A hierarchical clustering method works by
grouping data objects into a hierarchy or tree of clusters. Ex:
Agglomerative / Divisive Hierarchical Clustering.
• Density Based Methods – used when the clusters formed are
non-spherical. Ex: DBSCAN (Density Based Spatial Clustering of
Applications with Noise).
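As an illustration of a partitioning method, here is a minimal K-Means sketch (Lloyd's algorithm). To keep the result reproducible it simply takes the first k points as initial centroids, which is an assumption of this example; practical implementations usually pick the initial centroids randomly. The sample points are made up.

```python
import math


def k_means(points, k, iterations=10):
    """Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, assignment


# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignment = k_means(points, k=2)
```

The returned `assignment` list gives each point a cluster number, and no labels were needed to produce it.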
• Association:
- An association rule learning problem is where you want to discover
rules that describe large portions of your data, such as people that
buy X also tend to buy Y.

8.ALL ABOUT SEMI-SUPERVISED LEARNING?
->
• Semi-supervised learning is an approach to machine learning that
combines a small amount of labeled data with a large amount of
unlabelled data during training.
• It falls between unsupervised learning and supervised learning.
• Generating labels may not be easy or cheap, and hence, due to limited
resources, labels may be available for only a few observations.
• The acquisition of labeled data for a learning problem often requires a
skilled agent.
• The cost of labelling the complete data set may make full labelling
infeasible.
• It’s a class of machine learning tasks and techniques that also make use
of unlabelled data for training, i.e., a small amount of labeled data + a large
amount of unlabelled data.
• It’s found that unlabelled data, when used in conjunction with a small
amount of labeled data, can produce considerable improvement in learning
accuracy.

9.ALL ABOUT REINFORCEMENT LEARNING?


->
- Reinforcement Learning is an area of machine learning concerned with
how software agents ought to take optimal actions in an environment in
order to achieve their goals.
- In Reinforcement Learning, the learning system (called an agent) observes
the environment, selects and performs actions, and gets rewards in return.
- It must then learn by itself what is the best strategy, called a policy, to
get the most reward over time.
- A policy defines what action the agent should choose when it is in a given
situation.
- It is about taking suitable action to maximize reward in a particular
situation.
- In the absence of a training dataset, it is bound to learn from its
experience.
- For example: self-driving cars, computers learning chess.

10.APPLICATION OF MACHINE LEARNING?


->
• Machine learning in stock market investment:
- The primary objective of investing is to ensure that every person is able
to meet his or her future financial objectives.
- To meet the price increases due to inflation, investments become
important.
• Colgate’s Smart Toothbrush:
- The data collected and analysed will give details about whether the user’s
brushing technique is a viable one or whether the individual has a poor
brushing technique, and all this analysis will be delivered in real time on
the Colgate application.
• How Uber is driven by machine learning:
- Bridging the supply-demand gap:
• Based on historical data, Uber predicts the time and areas of demand.
- Reduction in ETA (Estimated Time of Arrival):
• The time wasted in road traffic is one of the most frustrating problems
in urbanized areas.
- Route optimization:
• Conventional ride-hailing systems require the driver to make assumption-
based route choices.
- AI-based one-click chat:
• Riders tend to message drivers while they wait for the cab.
- Uber Pool:
• During rush hours, it is difficult to make individual cabs available for
everyone.
• But ridesharing solves this problem by matching the riders heading in
the same direction.
• AI in Fashion with Smart Mirror:
- For such smart mirrors, clothing racks are RFID enabled.
- They use gyro-sensors and Bluetooth low-energy chips, allowing the articles
selected by shoppers to automatically show up in the Smart Mirror.
• Banks are using machine learning to detect fraud.

11.WHAT IS Q-FUNCTION?
->
• The evaluation function Q(s, a) is the maximum discounted cumulative reward
that can be achieved starting from state s and applying action a as the first
action.
• The Q value for each state-action transition equals the r value for this
transition plus the V* value for the resulting state discounted by gamma:
Q(s, a) = r(s, a) + γ V*(δ(s, a)), where δ(s, a) is the state that results
from applying action a in state s.
• If the agent learns the Q function instead of the V* function, it will be able
to select optimal actions even when it has no knowledge of the functions r
and δ.
• It need only consider each available action a in its current state s and
choose the action that maximizes Q(s, a).
• One can choose globally optimal action sequences by reacting repeatedly to
the local values of Q for the current state.
• The agent can choose the optimal action without ever conducting a lookahead
search to explicitly consider what state results from the action.
• Q(s, a) summarizes in a single number all the information needed to determine
the discounted cumulative reward that will be gained in the future if action a
is selected in state s.
STEPS:
• Set the Gamma parameter and the environment rewards in matrix R
• Initialize matrix Q to zero
• Select a random initial state
• Set current state = initial state
• Select one among all the possible actions for the current state
• Using this possible action, consider going to the next state
• Get the maximum Q value for this next state based on all the possible actions
• Compute: Q(s, a) = R(s, a) + Gamma * max[Q(next state, all actions)]
• Set current state = next state
• Repeat the above steps from the action selection until current state = goal state.
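The steps listed above can be sketched as a small tabular Q-learning loop. Everything concrete here is an assumption made for illustration: the four-state corridor environment, the reward values in `R`, the convention that an action means "move to state a", `None` marking a move that is not allowed, and the fixed random seed.

```python
import random


def q_learning(R, goal, gamma=0.8, episodes=200, seed=0):
    """Tabular Q learning where action a means 'move to state a'.

    R[s][a] is the reward for moving from s to a, or None if not allowed.
    """
    rng = random.Random(seed)
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]          # initialize matrix Q to zero
    for _ in range(episodes):
        state = rng.randrange(n)               # select a random initial state
        while state != goal:
            actions = [a for a in range(n) if R[state][a] is not None]
            action = rng.choice(actions)       # one of the possible actions
            next_state = action                # deterministic transition
            next_actions = [a for a in range(n)
                            if R[next_state][a] is not None]
            best_next = max(Q[next_state][a] for a in next_actions)
            # Q(s, a) = R(s, a) + Gamma * max[Q(next state, all actions)]
            Q[state][action] = R[state][action] + gamma * best_next
            state = next_state
    return Q


# Hypothetical corridor 0 - 1 - 2 - 3 with state 3 as the goal.
R = [
    [None, 0, None, None],
    [0, None, 0, None],
    [None, 0, None, 100],
    [None, None, 0, 100],
]
Q = q_learning(R, goal=3)

# Greedy choice in state 1: the allowed action with the largest Q value.
best_from_1 = max((a for a in range(4) if R[1][a] is not None),
                  key=lambda a: Q[1][a])
```

After training, the agent only needs to look up the largest Q value in its current row; it never has to simulate what the next state will be.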

12.RELATIONSHIP TO DYNAMIC PROGRAMMING?


->
• Q learning is closely related to dynamic programming approaches used to solve
Markov decision processes.
• These earlier approaches assumed that the agent possesses perfect knowledge
of the functions that define the agent's environment.
• They primarily addressed how to compute the optimal policy using the least
computational effort,
• assuming the environment could be perfectly simulated and no direct
interaction was required.
• The novel aspect of Q learning is that it assumes the agent does not have
this knowledge.
• Its primary concern is the number of real-world actions that the agent must
perform to converge to an acceptable policy, rather than the number of
computational cycles it must expend.
• In many practical domains, such as manufacturing problems, the costs in
time and in dollars of performing actions in the external world dominate
the computational costs.
• Systems that learn by moving about the real environment and observing
the results are typically called online systems, whereas those that learn
solely by simulating actions within an internal model are called offline
systems.

• Dynamic Programming is a mathematical optimization approach typically
used to improve recursive algorithms.
• It basically involves simplifying a large problem into smaller sub-problems.
• Dynamic programming and Q-Learning are both Reinforcement Learning
algorithms developed to maximize a reward in a given environment.
• In contrast to that are classical machine learning methods, such as SVM or
Neural Networks, that have a given set of data and draw conclusions based
on them.
• Dynamic programming can be used to solve reinforcement learning
problems when someone tells us the structure of the MDP.
• Dynamic Programming is all about remembering answers to the sub-problems
you have solved to save time later.
• WHERE DO WE NEED DYNAMIC PROGRAMMING?
• If you are given a problem which can be broken down into smaller
subproblems.
• If the smaller sub-problems can still be broken down into smaller
subproblems and you also find that there are some
overlapping sub-problems.
• In problems wherein we come across optimal substructure &
overlapping subproblems.
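A standard small example of overlapping sub-problems is the Fibonacci sequence, chosen here purely for illustration (it does not appear in the notes). The sketch shows both ways of remembering answers: top-down with a cache, and bottom-up.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """n-th Fibonacci number; each sub-problem is solved once and remembered."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def fib_bottom_up(n):
    """Same recurrence, filled in from the smallest sub-problems upward."""
    previous, current = 0, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous


top_down = fib(30)
bottom_up = fib_bottom_up(30)
```

Without the cache, `fib(30)` would recompute the same small sub-problems over a million times; with it, each value from 0 to 30 is computed once.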

UNIT 2
13.WHAT IS SVM?
->
- A classification technique that has received considerable attention is the
support vector machine (SVM).
- This technique has its roots in statistical learning theory (Vladimir Vapnik,
1992).
- As a classification task, it searches for the optimal hyperplane separating
the tuples of one class from another.
- SVM works well with higher dimensional data and thus avoids the
dimensionality problem.
- Although SVM-based classification is extremely slow, the result is
highly accurate.
- SVM is less prone to overfitting than other methods.
- It also provides a compact model for classification.
- SVMs are supervised learning models that analyze data and are used for
classification and regression analysis.
- Goal: to create the best line or decision boundary that can segregate n-
dimensional space into classes; the best decision boundary is called a
hyperplane.
- Once the hyperplane is found, the class of a new data point can easily be
predicted.
- The extreme points that help in creating the hyperplane are
called support vectors, and hence the algorithm is termed Support
Vector Machine.
- The SVM learns how important each of the training data points is to
represent the decision boundary between the two classes.
- Typically only a subset of the training points matter for defining the
decision boundary.
- To make a prediction for a new point, the distance to each of the support
vectors is measured.
- A classification decision is made based on the distances to the support
vectors and the importance of the support vectors that was learned during
training.
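The prediction step described above can be sketched as a weighted sum of similarities to the support vectors. The sketch assumes a Gaussian (RBF) similarity and uses hand-picked support vectors, weights and bias purely for illustration; a real SVM would learn these values during training.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Similarity that decays with the squared distance between x and z."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(query, support_vectors, weights, bias, gamma=0.5):
    """Weighted sum of similarities to the support vectors, plus a bias.

    Each weight is (importance learned in training) * (class label, +1 or -1).
    """
    return sum(w * rbf_kernel(sv, query, gamma)
               for sv, w in zip(support_vectors, weights)) + bias


# Hypothetical, hand-picked values; a real SVM would learn these from data.
support_vectors = [(1.0, 1.0), (3.0, 3.0)]
weights = [1.0, -1.0]          # first vector belongs to class +1, second to -1
bias = 0.0

score = svm_decision((1.2, 0.9), support_vectors, weights, bias)
predicted_class = 1 if score >= 0 else -1
```

Only the support vectors appear in the sum, so the remaining training points can be discarded once training is finished.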

14.PROS AND CONS OF SVM?
->
• PROS:
- Kernelized SVMs (KSVM) are powerful models and perform well on a variety of
datasets.
- They work well on low-dimensional and high-dimensional data.
- Effective on datasets with multiple features, like financial or medical data.
- Effective in cases where the number of features is greater than the number
of data points.
- Uses a subset of training points in the decision function, called support
vectors, which makes it memory efficient.
- Different kernel functions can be specified for the decision function.
- You can use common kernels, but it's also possible to specify custom
kernels.

• CONS:
- They require careful preprocessing of the data and tuning of the parameters.
- SVM models are hard to inspect;
- it can be difficult to understand why a particular prediction was made,
and it might be tricky to explain the model to a non-expert.
- If the number of features is a lot bigger than the number of data points,
avoiding over-fitting when choosing kernel functions and the regularization
term is crucial.
- SVMs don't directly provide probability estimates. Those are calculated
using an expensive five-fold cross-validation.
- Works best on small sample sets because of its high training time.

15.HYPERPLANE AND MAXIMUM MARGIN HYPERPLANE?
->
• Hyperplane:
- There can be multiple lines/decision boundaries to segregate the classes in
n-dimensional space, but we need to find the best decision boundary that
helps to classify the data points; this best boundary is known as the
hyperplane of SVM.
- The dimensions of the hyperplane depend on the features present in the
dataset, which means for 2 features, the hyperplane will be a straight
line, and for 3 features, the hyperplane will be a 2-dimensional plane.
- SVM always creates a hyperplane that has a maximum margin, which means the
maximum distance between the hyperplane and the nearest data points of each
class.
- A decision boundary is a boundary which is parallel to the hyperplane and
touches the closest data points of the class on one side of the hyperplane.
- The distance between the two decision boundaries of a hyperplane is
called the margin.
- The margin of the hyperplane indicates the error of the classifier, i.e. the
larger the margin, the lower the classification error.
- Classifiers whose hyperplane has a small margin are more
susceptible to model overfitting and tend to classify with weak confidence
on unseen data.
- Thus during the training or learning phase, the approach would be to
search for the hyperplane with maximum margin, and such a hyperplane is
called the maximum margin hyperplane (MMH).
- Also note that the shortest distance from a hyperplane to one of its decision
boundaries is equal to the shortest distance from the hyperplane to the
decision boundary on its other side.
- In other words, the hyperplane is in the middle of its decision boundaries.
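To make the margin concrete, the sketch below assumes the common convention in which the hyperplane is written as w·x + b = 0 and its two decision boundaries as w·x + b = +1 and w·x + b = -1; under that assumption the margin is 2 / ||w||. The weights and the test point are hypothetical.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin_width(w):
    """Distance between the boundaries w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane for 2 features: 3*x1 + 4*x2 - 12 = 0
w, b = (3.0, 4.0), -12.0
d = signed_distance(w, b, (4.0, 5.0))   # (12 + 20 - 12) / 5
m = margin_width(w)                     # 2 / 5
```

Since the margin is 2 / ||w||, searching for the maximum margin hyperplane amounts to searching for the smallest ||w|| that still separates the classes.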

16.TYPES OF SVM?
->
• Linear SVM:
- Used for linearly separable data.
- If a dataset can be classified into two classes by using a single
straight line, then such data is termed linearly separable data, and the
classifier used is called a Linear SVM classifier.
- An SVM which is used to classify data which are linearly separable is
called a linear SVM.
- When the training data are linearly separable,
a linear SVM searches for a hyperplane with the maximum margin.
- This is why a linear SVM is often termed a maximal margin classifier
(MMC).
- A linear SVM classifies data best when it is trained on linearly
separable data.
- A linear SVM can also be used for non-linearly separable data, provided
that the number of such non-separable instances is small.
• Non-linear SVM:
- Non-linear SVM is used for non-linearly separable data.
- If a dataset cannot be classified by using a straight line, then
such data is termed non-linear data, and the classifier used is called a
Non-linear SVM classifier.
- This can be achieved in two major steps:
• Transform the original input data into a higher dimensional space.
• This is feasible because SVM’s performance is decided by the number of
support vectors, not by the dimension of the data.
• Search for the linear decision boundaries to separate the transformed
higher dimensional data, as in a linear SVM.
- To have a non-linear SVM, the trick is to transform non-linear data into
higher dimensional linear data; this transformation is popularly called
non-linear mapping or attribute transformation.

17.NON-LINEAR TO LINEAR TRANSFORMATION:
ISSUES?
->
• The idea of a non-linear mapping followed by a linear decision boundary
looks simple, but in practice it raises several potential problems.
• Mapping: How should the non-linear mapping to a higher dimensional
space be chosen? The φ-transformation works fine for small examples but
fails for realistically sized problems.
• Cost of mapping: For n-dimensional input instances there exist many
different monomials, and together they span a feature space of very high
dimensionality.
• Dimensionality problem: The mapped data may suffer from the curse of
dimensionality often associated with high dimensional data.
• More specifically, computing W.X or Xi.X (in δ(X)) needs n
multiplications and n additions per dot product, for each of the
n-dimensional input instances and support vectors, respectively.
• As the number of input instances and of support vectors can be
enormously large, this is computationally expensive.
• Computational cost: Solving the quadratic constrained optimization
problem in the high dimensional feature space is itself computationally
expensive.
• Fortunately, mathematicians have proposed an elegant solution to the
above problems, consisting of: i. the dual formulation of the
optimization problem and ii. the kernel trick.
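To get a feel for the cost of mapping, the sketch below counts monomials. It assumes the feature space is built from all monomials of degree exactly d in n variables, for which the standard count is the binomial coefficient C(n + d - 1, d); the degree d and the helper name `monomial_count` are introduced here for illustration and are not part of the notes.

```python
import math

def monomial_count(n, d):
    """Number of distinct monomials of degree exactly d in n variables.

    Standard 'combinations with repetition' count: C(n + d - 1, d).
    """
    return math.comb(n + d - 1, d)

# The input sizes and degrees below are arbitrary illustrative choices.
for n, d in [(2, 2), (16, 2), (256, 5)]:
    print(f"n={n:>3}, d={d}: {monomial_count(n, d):,} mapped features")
```

Every dot product in the mapped space costs work proportional to this count, which is why explicit mapping quickly becomes impractical as n or d grows.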

18.EXPLAIN KSVM?
->
- The function of a kernel is to take data as input and transform it into
the required form, that is, to map it from a lower dimension to a higher
dimension so that the data points can be separated linearly.
- The kernel trick is a clever mathematical trick that allows us to learn
a classifier in a higher-dimensional space without actually computing the
new, possibly very large representation.
- It directly computes the distance (more precisely, the inner products)
of the data points in the expanded feature representation, without ever
actually computing the expansion.
- Training data that are not linearly separable can be transformed into a
higher dimensional feature space in which a hyperplane can be found that
separates the transformed data and hence the original data.
- The data on the left in the figure are not linearly separable, so they
are mapped to a 3D space using φ.
- For the mapped data it is then possible to find a decision boundary,
and hence a separating hyperplane, in the 3D space.
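The kernel trick can be checked numerically on a small case. The mapping below, (x1, x2) to (x1^2, sqrt(2)*x1*x2, x2^2), is one common textbook choice of φ from 2D to 3D and is an assumption here, since the notes do not say which φ the figure uses. The sketch shows that the degree-2 polynomial kernel, computed entirely in the original 2D space, gives the same value as the dot product of the explicitly mapped 3D points.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit 2-D to 3-D map (assumed): (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel, evaluated in the original space."""
    return dot(x, z) ** 2

# Two arbitrary 2-D points.
x, z = (1.0, 2.0), (3.0, -1.0)

explicit = dot(phi(x), phi(z))   # map first, then take the dot product in 3-D
via_kernel = poly2_kernel(x, z)  # never leaves 2-D

print(explicit, via_kernel)
```

The two numbers agree up to floating-point rounding, so any algorithm that only needs dot products between mapped points can use the kernel and skip φ altogether.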

19.KERNEL FUNCTIONS?
->
- Different kernel functions have different parameters; these are
sometimes called magic parameters and must be decided a priori.
- Which kernel to use depends on the pattern of the data as well as the
judgment of the user.
- In general, polynomial kernels result in large dot products, while the
Gaussian RBF kernel produces more support vectors than other kernels.
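As a rough sketch, three commonly used kernels can be written as plain functions. The parameter names `degree`, `coef0`, and `gamma` and their default values are assumptions chosen for illustration; they play the role of the parameters that must be fixed before training.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return dot(x, z)

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """K(x, z) = (x . z + coef0) ** degree"""
    return (dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF: K(x, z) = exp(-gamma * ||x - z||^2), always in (0, 1]."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Two arbitrary points with fairly large coordinates.
x, z = (10.0, 20.0), (20.0, 0.0)
print("linear    :", linear_kernel(x, z))
print("polynomial:", polynomial_kernel(x, z))
print("rbf       :", rbf_kernel(x, z))
```

With inputs of this size the polynomial kernel value is already in the millions while the RBF value stays between 0 and 1, which is one reason feature scaling and parameter choice matter when picking a kernel.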

20.CHARACTERISTICS AND APPLICATIONS OF SVM?
->
• CHARACTERISTICS:
- The SVM learning problem can be formulated as a convex optimization
problem, for which efficient algorithms are available to find the global
minimum of the objective function.
- SVM is well suited to classifying both linearly and non-linearly
separable training data efficiently.
- SVM can be applied to categorical data by introducing a suitable
similarity measure.
- Computational complexity is influenced by the number of training
instances, not by the dimension of the data.
- Training is computationally heavy and hence slow, but classification of
test data is very fast and accurate.
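Why classification is cheap once training is done can be sketched with the usual dual-form decision function, δ(X) = Σ αi yi K(Xi, X) + b, where the sum runs over the support vectors only. All numbers below (support vectors, αi values, b, and gamma) are made up for illustration; in practice they come out of training.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical result of training: two support vectors with their
# multipliers (alphas), class labels, and bias term b.
support_vectors = [(1.0, 1.0), (3.0, 3.0)]
alphas = [1.0, 1.0]
sv_labels = [-1, 1]
b = 0.0

def decision_function(x):
    """delta(X) = sum_i alpha_i * y_i * K(X_i, X) + b over support vectors."""
    total = sum(
        a * y * rbf_kernel(sv, x)
        for sv, a, y in zip(support_vectors, alphas, sv_labels)
    )
    return total + b

def predict(x):
    """Class +1 if delta(X) is non-negative, otherwise class -1."""
    return 1 if decision_function(x) >= 0 else -1

print(predict((2.8, 3.1)), predict((0.9, 1.2)))
```

Each prediction needs one kernel evaluation per support vector, so the cost depends on how many support vectors were kept rather than on the size of the full training set.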
• APPLICATIONS:
- Image classification: SVM is widely used in image classification tasks,
such as face recognition and object detection. SVM can be used to
classify images based on their features, such as texture, shape, and
colour.
- Text classification: SVM is also used for text classification tasks,
such as sentiment analysis and spam detection. SVM can learn from a large
dataset of text documents and then classify new documents based on
their features.
- Bioinformatics: SVM has been successfully used in bioinformatics to
classify different types of biological data, such as gene expression data
and protein classification.
- Financial forecasting: SVM can be used to predict stock prices and other
financial market trends. SVM can learn from historical data and then
predict future trends based on the learned patterns.
- Medical diagnosis: SVM can be used to diagnose medical conditions, such
as cancer and diabetes. SVM can learn from patient data and then classify
new patients based on their features.
