Unit 3, 4 and 5

PCA aims to maximize the variance of data projected onto a lower-dimensional subspace. It can be used for data compression by identifying the most important features in datasets such as images. The steps involve subtracting the mean, calculating the covariance matrix, computing eigenvectors and eigenvalues to identify the principal components, and reconstructing the data using the selected components. Feature selection techniques such as filter and wrapper methods are used to identify relevant features. Filter methods select features independently of any learning algorithm, based on statistical tests, while wrapper methods evaluate feature subsets using a learning algorithm. Filter methods are faster but may fail to find the best subset, while wrapper methods provide the best subset but are computationally expensive. PCA finds the most accurate representation of the data, but the directions of maximum variance may not separate classes well, so FLD instead projects the data onto a line along which the classes are well separated.


Unit 3

Define PCA. How is the variance maximised in PCA?


PCA can be defined as the orthogonal projection of the data onto a lower-dimensional linear space, known as the principal subspace, such that the variance of the projected data is maximized.
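A quick sketch of the variance-maximisation view (the standard derivation; the notation is assumed, not taken from these notes): for data with sample covariance matrix S, the first principal direction u_1 is the unit vector that maximises the projected variance.

% Variance of the data projected onto a unit direction u_1
\max_{u_1}\; u_1^{T} S\, u_1 \quad \text{subject to} \quad u_1^{T} u_1 = 1
% Introducing a Lagrange multiplier \lambda_1 and setting the derivative to zero gives
S\, u_1 = \lambda_1 u_1
% so u_1 is an eigenvector of S, and the projected variance u_1^{T} S u_1 = \lambda_1
% is largest when u_1 is the eigenvector with the largest eigenvalue.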

Example of PCA for Data Compression:


Consider the example of the offline digits dataset; it contains a large number of handwritten digit images.

Aim: To reduce the number of variables in a dataset while retaining as much of the information as possible.

In the case of image data, PCA can be used to identify the most important features in the images
and compress them into a smaller set of variables.

(1) Preprocess the images by flattening them into a vector of pixel values.

(2) Then, use PCA to identify the most important principal components of the image data.

(3) Finally, reconstruct the images using only the most important principal components, resulting in a compressed version of the dataset with lower dimensionality.

 Consider the original dimensionality, D. We reduce the data to a lower dimensionality, M.

The larger the value of M, the more accurate the reconstructed image.

The smaller the value of M, the greater the degree of compression.
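A minimal sketch of this compression pipeline in Python, assuming scikit-learn is available; the bundled digits dataset (8x8 images, so D = 64) and the choice M = 16 are illustrative, not part of the original notes:

# Sketch: compress handwritten-digit images with PCA and reconstruct them.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()            # 1797 images, each 8x8 = 64 pixels (D = 64)
X = digits.data                   # images already flattened into pixel vectors

M = 16                            # keep only the top M principal components (illustrative)
pca = PCA(n_components=M)
codes = pca.fit_transform(X)      # compressed representation: shape (1797, 16)
X_approx = pca.inverse_transform(codes)   # reconstructed images: shape (1797, 64)

print("retained variance:", pca.explained_variance_ratio_.sum())

Increasing M improves the reconstruction; decreasing it increases the compression, as noted above.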


Explain the steps involved in PCA with an example:
Step 1: Subtract the mean from each of the dimensions.

Subtracting the mean makes the variance and covariance calculations easier by simplifying their equations.

Step 2: Calculate the covariance matrix. It is a symmetric matrix.

Step 3: Calculate the eigenvectors V and eigenvalues D of the covariance matrix.

Eigenvectors are plotted as diagonal dotted lines on the plot (note: they are perpendicular to each other).

 One of the eigenvectors goes through the middle of the points, like drawing a line of best fit.

 The second eigenvector gives us the other, less important, pattern in the data.

Step 4: Reduce dimensionality and form the feature vector.

The eigenvector with the highest eigenvalue is the principal component of the data set.

In our example, the eigenvector with the largest eigenvalue is the one that points down the middle of the data.

Step 5: Derive the new data.
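A compact NumPy sketch of the five steps; the 5x2 toy data matrix and the choice k = 1 are made-up illustrations:

import numpy as np

# Toy data: 5 samples, 2 features (values are illustrative only)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# Step 1: subtract the mean from each dimension
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix (symmetric)
C = np.cov(X_centered, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)        # eigh: suitable for symmetric matrices

# Step 4: sort by eigenvalue and keep the top k components (the feature vector)
order = np.argsort(eigvals)[::-1]
k = 1
W = eigvecs[:, order[:k]]                   # principal component(s)

# Step 5: derive the new (projected) data, and optionally reconstruct
X_new = X_centered @ W                      # reduced representation
X_reconstructed = X_new @ W.T + X.mean(axis=0)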

Feature Selection and its types:


Feature selection is a technique used in machine learning and data mining to identify and
select the most relevant features.

Need for Feature Selection:

1. To improve performance (in terms of speed and simplicity of the model).

2. To visualize the data for model selection.

3. To reduce dimensionality and remove noise.

4. To reduce overfitting.

 Benefits of FS:

1. Removing irrelevant data.


2. Increasing predictive accuracy of learned models.

3. Reducing the cost of the data.

4. Improving learning efficiency, such as reducing storage requirements and computational cost.

 The selection can be represented as a binary array, with each element set to 1 if the corresponding feature is currently selected by the algorithm and 0 if it is not.

 There are a total of 2^M subsets, where M is the number of features in the data set.

Here M = 3, so there are 2^3 = 8 possible subsets (see the enumeration below).
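As a small illustration of the binary-array representation (the three feature names are hypothetical), the 2^3 = 8 subsets can be enumerated as follows:

from itertools import product

features = ["f1", "f2", "f3"]                          # M = 3 hypothetical features
for mask in product([0, 1], repeat=len(features)):     # 2**3 = 8 binary arrays
    subset = [f for f, bit in zip(features, mask) if bit == 1]
    print(mask, subset)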

Types:
1. Filter Method
2. Wrapper Method
(1) Filter Method:
 The selection of features is independent of any machine learning algorithms.
 Features are selected on the basis of their scores in various statistical tests.

 Correlation is the main criterion here.

 Filter methods do not remove multicollinearity.
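A minimal filter-method sketch in Python, scoring each feature by its absolute correlation with the target; the synthetic data frame and the 0.5 threshold are illustrative assumptions:

import numpy as np
import pandas as pd

# Illustrative data: three features and a target (values are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f1": rng.normal(size=100),
    "f2": rng.normal(size=100),
    "f3": rng.normal(size=100),
})
df["target"] = 3 * df["f1"] + 0.1 * rng.normal(size=100)   # only f1 is relevant

# Filter step: score each feature statistically, independent of any learning algorithm
scores = df.drop(columns="target").corrwith(df["target"]).abs()
selected = scores[scores > 0.5].index.tolist()              # keep features above a threshold
print(scores)
print("selected:", selected)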

(2) Wrapper Method:


 (1) A subset of features is created.
(2) Based on the inferences that we draw from the previous model, we decide to add or remove features from the subset.

 These methods are usually computationally very expensive.

Wrapper methods include:

1. Forward Selection:
We start with no features in the model.
In each iteration, we add the feature that best improves the model (see the sketch after this list).
2. Backward Elimination:
We start with all the features.
At each iteration, we remove the least significant feature.
3. Bidirectional Generation:
We perform forward selection and backward elimination concurrently.
4. Random Generation:
It starts the search in a random direction.
The choice of adding or removing a feature is made at random.
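A minimal sketch of forward selection, assuming scikit-learn's breast-cancer dataset, a logistic-regression pipeline as the evaluator, and 5-fold cross-validation as the scoring step; the stopping rule is an illustrative choice:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    # Try adding each remaining feature; keep the one that best improves the model
    trial_scores = {
        f: cross_val_score(model, X[:, selected + [f]], y, cv=5).mean()
        for f in remaining
    }
    f_best, score = max(trial_scores.items(), key=lambda kv: kv[1])
    if score <= best_score:          # stop when no feature improves the CV score
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = score

print("selected feature indices:", selected, "cv accuracy:", round(best_score, 3))

Every candidate feature triggers a full cross-validated model fit, which is why wrapper methods are computationally expensive.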

Comparison of filter and wrapper methods:

Criterion   | Filter                            | Wrapper
Method      | Correlation / statistical tests   | Subset construction and measuring model performance
Time        | Fast                              | Slow
Cost        | Cheap                             | Expensive
Result      | May fail to find the best subset  | Always provides the best feature subset
Overfitting | Less prone                        | May become prone to overfitting

Derive the Fisher Linear Discriminant using an example:


PCA finds the most accurate data representation. However, the direction of maximum variance may be useless for classification.
So FLD is used, which projects the data onto a line such that samples from different classes are well separated.
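A brief sketch of the standard two-class Fisher criterion (the notation below is the usual textbook one, not taken from these notes): project each sample x onto a line, y = w^T x, and choose w to maximise the between-class separation relative to the within-class scatter.

% Fisher criterion for two classes with means m_1, m_2
J(w) = \frac{w^{T} S_B\, w}{w^{T} S_W\, w},
\qquad
S_B = (m_2 - m_1)(m_2 - m_1)^{T},
\qquad
S_W = \sum_{k=1}^{2} \sum_{n \in C_k} (x_n - m_k)(x_n - m_k)^{T}
% Maximising J(w) gives the Fisher linear discriminant direction
w \propto S_W^{-1} (m_2 - m_1)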
Unit 4
