AI Notes V
NOTES
Principal Component Analysis
This method was introduced by Karl Pearson. It works on the condition that when data in a higher-dimensional space is mapped to a lower-dimensional space, the variance of the data in the lower-dimensional space should be maximal.
It involves the following steps:
• Construct the covariance matrix of the data.
• Compute the eigenvectors of this matrix.
• Eigenvectors corresponding to the largest eigenvalues are used to reconstruct a large fraction of the variance of the original data.
Hence, we are left with a smaller number of eigenvectors, and there might have been some data loss in the process. However, the most important variance should be retained by the remaining eigenvectors.
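As an illustration of these steps, a minimal NumPy sketch (the function pca and its interface are mine for illustration, not from any library):

import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)            # PCA assumes zero-mean data
    cov = np.cov(X_centered, rowvar=False)     # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigendecomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components             # coordinates in the reduced space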
There are a lot of machine learning problems which are nonlinear, and the use of nonlinear feature mappings can help to produce new features which make prediction problems linear. In this section we will discuss the following idea: transformation of the dataset to a new higher-dimensional (in some cases infinite-dimensional) feature space and the use of PCA in that space in order to produce uncorrelated features. Such a method is called Kernel Principal Component Analysis, or KPCA.
Let $\phi : \mathbb{R}^d \to \mathcal{F}$ be a feature mapping, and let $\Phi$ denote the $n \times M$ matrix whose $i$-th row is $\phi(x_i)^T$. We will consider that the dimensionality of the feature space equals $M$ (possibly very large, or infinite). The Gram matrix of the mapped data is
$$K = \Phi \Phi^T, \qquad K_{ij} = \phi(x_i)^T \phi(x_j).$$
The eigendecomposition of $K$ is given by
$$K u_j = \lambda_j u_j, \qquad j = 1, \dots, n.$$
By the definition of $K$,
$$\Phi \Phi^T u_j = \lambda_j u_j,$$
and therefore, multiplying both sides by $\frac{1}{n} \Phi^T$,
$$\left( \frac{1}{n} \Phi^T \Phi \right) \left( \Phi^T u_j \right) = \frac{\lambda_j}{n} \left( \Phi^T u_j \right),$$
so $v_j = \frac{1}{\sqrt{\lambda_j}} \Phi^T u_j$ is a unit-norm eigenvector of the feature-space covariance matrix $C = \frac{1}{n} \Phi^T \Phi$.
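A quick NumPy check of this relation (a sketch in my own notation; phi here is an explicit toy feature map, which KPCA will later avoid):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                  # 5 samples, 2 input features

# Toy explicit feature map: (x1, x2) -> (x1, x2, x1*x2)
Phi = np.column_stack([X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])

K = Phi @ Phi.T                              # Gram matrix (n x n)
lam, U = np.linalg.eigh(K)                   # eigenpairs of K
lam, U = lam[::-1], U[:, ::-1]               # descending order

C = Phi.T @ Phi / len(X)                     # feature-space covariance (M x M)
v = Phi.T @ U[:, 0] / np.sqrt(lam[0])        # candidate top eigenvector of C

# C v should equal (lambda / n) v
print(np.allclose(C @ v, (lam[0] / len(X)) * v))   # True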
So far, we have assumed that the mapping $\phi$ is known. From the equations above, we can see that the only thing we need for the data transformation is the eigendecomposition of the Gram matrix $K$. The dot products which are its elements can be defined without any explicit definition of $\phi$. A function defining such dot products in some Hilbert space is called a kernel, $k(x, y) = \phi(x)^T \phi(y)$; valid kernels are characterized by Mercer's theorem. There are many different types of kernels; several popular ones are:
1. Linear: $k(x, y) = x^T y$;
2. Gaussian (RBF): $k(x, y) = \exp\left(-\gamma \|x - y\|^2\right)$;
3. Polynomial: $k(x, y) = (x^T y + c)^d$.
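These three kernels are easy to write down directly; a small sketch (the parameter names gamma, c, and degree follow common convention and are chosen here for illustration):

import numpy as np

def linear_kernel(x, y):
    return x @ y

def gaussian_kernel(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def polynomial_kernel(x, y, c=1.0, degree=3):
    return (x @ y + c) ** degree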
Using a kernel function we can write a new equation for the projection of some data item $x$ onto the $j$-th eigenvector:
$$\phi(x)^T v_j = \frac{1}{\sqrt{\lambda_j}} \phi(x)^T \Phi^T u_j = \frac{1}{\sqrt{\lambda_j}} \sum_{i=1}^{n} u_{ji}\, k(x, x_i).$$
So far, we have assumed that the columns of $\Phi$ have zero mean. Using $\mathbf{1}_n$ to denote the $n \times n$ matrix with all entries equal to $1/n$, the Gram matrix of the centered data can be obtained directly from $K$:
$$\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n.$$
Summary: Now we are ready to write the whole sequence of steps to perform KPCA:
1. Calculate $K$, where $K_{ij} = k(x_i, x_j)$.
2. Calculate $\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n$.
3. Find the eigenvectors $u_j$ of $\tilde{K}$ corresponding to nonzero eigenvalues $\lambda_j$ and normalize them: $u_j \leftarrow u_j / \sqrt{\lambda_j}$.
4. Sort the found eigenvectors in descending order of the corresponding eigenvalues.
5. Perform projections onto the given subset of eigenvectors (see the sketch below).
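Putting these steps together, a minimal NumPy sketch of KPCA with the Gaussian kernel (the function kernel_pca and its parameters are mine for illustration; scikit-learn's sklearn.decomposition.KernelPCA offers a ready-made equivalent):

import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    """KPCA with a Gaussian kernel; returns the projected data (n_samples x n_components)."""
    n = X.shape[0]
    # Step 1: Gram matrix K_ij = k(x_i, x_j)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)
    # Step 2: center the Gram matrix
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Steps 3-4: eigendecomposition, descending order, keep the top components
    lam, U = np.linalg.eigh(K_c)
    lam, U = lam[::-1][:n_components], U[:, ::-1][:, :n_components]
    # Normalize so that the feature-space eigenvectors have unit norm
    U = U / np.sqrt(lam)
    # Step 5: projections of the training data onto the eigenvectors
    return K_c @ U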
The method described above requires one to define the number of components, the kernel, and its parameters. It should be noted that the number of nonlinear principal components in the general case is infinite, but since we are computing the eigenvectors of an $n \times n$ matrix $\tilde{K}$, at maximum we can calculate $n$ nonlinear principal components.
SUPPORT VECTOR MACHINES
Support Vector Machines (SVMs) are supervised learning models with associated learning algorithms that analyze data for classification (classification means knowing what belongs to what, e.g. 'apple' belongs to class 'fruit' while 'dog' belongs to class 'animals'; see fig. 1).
In support vector machines, the decision boundary looks somewhat like a line which separates the blue balls from the red.
SVM is a classifier formally defined by a separating hyperplane. A hyperplane is a subspace of one dimension less than its ambient space. The dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates (x, y, z axes) needed to specify any point (like each blue and red point) within it, while an ambient space is the space surrounding a mathematical object.
Therefore the hyperplane of the two-dimensional space below (fig. 2) is a one-dimensional line dividing the red and blue dots.
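As a sketch of how this looks in practice, a linear SVM on a toy two-dimensional dataset with scikit-learn (the points and labels below are invented for illustration):

import numpy as np
from sklearn.svm import SVC

# Toy 2-D points: two loose groups ("red" = 0, "blue" = 1)
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 1.0],
              [5.0, 6.0], [6.0, 5.5], [5.5, 6.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")       # linear kernel: the boundary is a hyperplane
clf.fit(X, y)

# coef_ and intercept_ describe the separating hyperplane w . x + b = 0
print(clf.coef_, clf.intercept_)
print(clf.predict([[2.0, 2.0], [5.0, 5.0]]))   # -> [0 1]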
Introduction to clustering
As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a training dataset. Instead, the models themselves find the hidden patterns and insights in the given data. It can be compared to the learning which takes place in the human brain while learning new things. It can be defined as:
"Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision."
Below are some main reasons which describe the importance of unsupervised learning:
o Unsupervised learning is helpful for finding useful insights from the data.
Once a suitable algorithm is applied, the algorithm divides the data objects into groups according to the similarities and differences between the objects.
Unsupervised learning can be further categorized into types of problems; one of the main types is clustering:
Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between the data objects and categorizes them according to the presence and absence of those commonalities. Popular unsupervised learning algorithms include:
o Hierarchical clustering
o Anomaly detection
o Neural networks
o Principal Component Analysis
o Apriori algorithm
One of the most used clustering algorithms is k-means. It allows one to group the data into k clusters according to the existing similarities among them, where k is given as input to the algorithm. I'll start with a simple example.
Let's imagine we have 5 objects (say 5 people), and for each of them we know two features (height and weight). We want to group them into k=2 clusters.
As you probably already know, I'm using Python libraries to analyze my data. The k-means algorithm is implemented in the scikit-learn package. To use it, you will just need the following line in your script:
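from sklearn.cluster import KMeans

Continuing the example, a minimal sketch (the five height/weight pairs below are invented for illustration):

import numpy as np
from sklearn.cluster import KMeans

# 5 people, two features each: (height in cm, weight in kg)
X = np.array([[185.0, 72.0], [170.0, 56.0], [168.0, 60.0],
              [179.0, 68.0], [182.0, 72.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster index assigned to each person
print(kmeans.cluster_centers_)   # the two centroids (height, weight)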