Unit 3, 4 and 5

PCA aims to maximize the variance of data projected onto a lower-dimensional subspace. It can be used for data compression by identifying the most important features in datasets such as images. The steps involve subtracting the mean, calculating the covariance matrix, computing eigenvectors and eigenvalues to identify the principal components, and reconstructing the data using the selected components. Feature selection techniques such as filter and wrapper methods are used to identify relevant features. Filter methods select features independently of any learning algorithm, based on statistical tests, while wrapper methods evaluate feature subsets using a learning algorithm. Filter methods are faster but may fail to find the best subset, while wrapper methods provide the best subset but are computationally expensive. PCA finds the most accurate representation of the data, but the directions of maximum variance may not separate classes well, so FLD instead projects the data onto a line along which the classes are well separated.


Unit 3

Define PCA. How is the variance maximised in PCA?


PCA can be defined as the orthogonal projection of the data onto a lower-dimensional linear space, known as the principal subspace, such that the variance of the projected data is maximized.
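A quick sketch of the variance-maximisation view (the standard derivation; the notation is assumed, not taken from these notes): for data with sample covariance matrix S, the first principal direction u_1 is the unit vector that maximises the projected variance.

% Variance of the data projected onto a unit direction u_1
\max_{u_1}\; u_1^{T} S\, u_1 \quad \text{subject to} \quad u_1^{T} u_1 = 1
% Introducing a Lagrange multiplier \lambda_1 and setting the derivative to zero gives
S\, u_1 = \lambda_1 u_1
% so u_1 is an eigenvector of S, and the projected variance u_1^{T} S u_1 = \lambda_1
% is largest when u_1 is the eigenvector with the largest eigenvalue.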

Example of PCA for Data Compression:


Consider the example of the offline digits dataset; it contains a large number of handwritten digit images.

Aim: To reduce the number of variables in a dataset while retaining as much of the information as possible.

In the case of image data, PCA can be used to identify the most important features in the images
and compress them into a smaller set of variables.

(1) Preprocess the images by flattening them into a vector of pixel values.

(2) Then, use PCA to identify the most important principal components of the image data.

(3) Finally, reconstruct the images using only the most important principal components, resulting in a compressed version of the dataset with lower dimensionality.

 Consider the original dimensionality, D. We reduce the data to a lower dimensionality, M.

The larger the value of M, the more accurate the reconstructed image.

The smaller the value of M, the greater the degree of compression.
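A minimal sketch of this compression pipeline in Python, assuming scikit-learn is available; the bundled digits dataset (8x8 images, so D = 64) and the choice M = 16 are illustrative, not part of the original notes:

# Sketch: compress handwritten-digit images with PCA and reconstruct them.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()            # 1797 images, each 8x8 = 64 pixels (D = 64)
X = digits.data                   # images already flattened into pixel vectors

M = 16                            # keep only the top M principal components (illustrative)
pca = PCA(n_components=M)
codes = pca.fit_transform(X)      # compressed representation: shape (1797, 16)
X_approx = pca.inverse_transform(codes)   # reconstructed images: shape (1797, 64)

print("retained variance:", pca.explained_variance_ratio_.sum())

Increasing M improves the reconstruction; decreasing it increases the compression, as noted above.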


Explain the steps involved in PCA with an example:
Step 1: Subtract the mean from each of the dimensions.

Subtracting the mean makes the variance and covariance calculations easier by simplifying their equations.

Step 2: Calculate the covariance matrix. It is a symmetric matrix.

Step 3: Calculate the eigenvectors V and eigenvalues D of the covariance matrix.

Eigenvectors are plotted as diagonal dotted lines on the plot (note: they are perpendicular to each other).

 One of the eigenvectors goes through the middle of the points, like drawing a line of best fit.

 The second eigenvector gives us the other, less important, pattern in the data.

Step 4: Reduce dimensionality and form the feature vector.

The eigenvector with the highest eigenvalue is the principal component of the data set.

In our example, the eigenvector with the largest eigenvalue is the one that points down the middle of the data.

Step 5: Derive the new data.
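A compact NumPy sketch of the five steps; the 5x2 toy data matrix and the choice k = 1 are made-up illustrations:

import numpy as np

# Toy data: 5 samples, 2 features (values are illustrative only)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# Step 1: subtract the mean from each dimension
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix (symmetric)
C = np.cov(X_centered, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)        # eigh: suitable for symmetric matrices

# Step 4: sort by eigenvalue and keep the top k components (the feature vector)
order = np.argsort(eigvals)[::-1]
k = 1
W = eigvecs[:, order[:k]]                   # principal component(s)

# Step 5: derive the new (projected) data, and optionally reconstruct
X_new = X_centered @ W                      # reduced representation
X_reconstructed = X_new @ W.T + X.mean(axis=0)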

Feature Selection and its types:


Feature selection is a technique used in machine learning and data mining to identify and
select the most relevant features.

Need for Feature Selection:

1. To improve performance (in terms of speed and simplicity of the model).

2. To visualize the data for model selection.

3. To reduce dimensionality and remove noise.

4. To reduce overfitting.

 Benefits of FS:

1. Removing irrelevant data.


2. Increasing predictive accuracy of learned models.

3. Reducing the cost of the data.

4. Improving learning efficiency, such as reducing storage requirements and computational cost.

 The selection can be represented as a binary array, with each element set to 1 if the corresponding feature is currently selected by the algorithm and 0 if it is not.

 There are a total of 2^M subsets, where M is the number of features in the data set.

Here M = 3, so there are 2^3 = 8 possible subsets (see the enumeration below).
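As a small illustration of the binary-array representation (the three feature names are hypothetical), the 2^3 = 8 subsets can be enumerated as follows:

from itertools import product

features = ["f1", "f2", "f3"]                          # M = 3 hypothetical features
for mask in product([0, 1], repeat=len(features)):     # 2**3 = 8 binary arrays
    subset = [f for f, bit in zip(features, mask) if bit == 1]
    print(mask, subset)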

Types:
1. Filter Method
2. Wrapper Method
(1) Filter Method:
 The selection of features is independent of any machine learning algorithms.
 Features are selected on the basis of their scores in various statistical tests.

 Correlation is the main criterion here.

 Filter methods do not remove multicollinearity.
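A minimal filter-method sketch in Python, scoring each feature by its absolute correlation with the target; the synthetic data frame and the 0.5 threshold are illustrative assumptions:

import numpy as np
import pandas as pd

# Illustrative data: three features and a target (values are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f1": rng.normal(size=100),
    "f2": rng.normal(size=100),
    "f3": rng.normal(size=100),
})
df["target"] = 3 * df["f1"] + 0.1 * rng.normal(size=100)   # only f1 is relevant

# Filter step: score each feature statistically, independent of any learning algorithm
scores = df.drop(columns="target").corrwith(df["target"]).abs()
selected = scores[scores > 0.5].index.tolist()              # keep features above a threshold
print(scores)
print("selected:", selected)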

(2) Wrapper Method:


 (1) A subset of features is created.
(2) Based on the inferences that we draw from the previous model, we decide to add or remove features from the subset.

 These methods are usually computationally very expensive.

Wrapper methods include:

1. Forward Selection:
We start with no features in the model.
In each iteration, we add the feature that best improves the model (see the sketch after this list).
2. Backward Elimination:
We start with all the features.
At each iteration, we remove the least significant feature.
3. Bidirectional Generation:
We perform forward selection and backward elimination concurrently.
4. Random Generation:
It starts the search in a random direction.
The choice of adding or removing a feature is made at random.
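A minimal sketch of forward selection, assuming scikit-learn's breast-cancer dataset, a logistic-regression pipeline as the evaluator, and 5-fold cross-validation as the scoring step; the stopping rule is an illustrative choice:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    # Try adding each remaining feature; keep the one that best improves the model
    trial_scores = {
        f: cross_val_score(model, X[:, selected + [f]], y, cv=5).mean()
        for f in remaining
    }
    f_best, score = max(trial_scores.items(), key=lambda kv: kv[1])
    if score <= best_score:          # stop when no feature improves the CV score
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = score

print("selected feature indices:", selected, "cv accuracy:", round(best_score, 3))

Every candidate feature triggers a full cross-validated model fit, which is why wrapper methods are computationally expensive.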

Comparison of filter and wrapper methods:

Criterion   | Filter                            | Wrapper
Method      | Correlation / statistical tests   | Subset construction and measuring model performance
Time        | Fast                              | Slow
Cost        | Cheap                             | Expensive
Result      | May fail to find the best subset  | Always provides the best feature subset
Overfitting | Less prone                        | May become prone to overfitting

Derive the Fisher Linear Discriminant using an example:


PCA finds the most accurate data representation. However, the direction of maximum variance may be useless for classification.
So FLD is used, which projects the data onto a line such that samples from different classes are well separated.
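A brief sketch of the standard two-class Fisher criterion (the notation below is the usual textbook one, not taken from these notes): project each sample x onto a line, y = w^T x, and choose w to maximise the between-class separation relative to the within-class scatter.

% Fisher criterion for two classes with means m_1, m_2
J(w) = \frac{w^{T} S_B\, w}{w^{T} S_W\, w},
\qquad
S_B = (m_2 - m_1)(m_2 - m_1)^{T},
\qquad
S_W = \sum_{k=1}^{2} \sum_{n \in C_k} (x_n - m_k)(x_n - m_k)^{T}
% Maximising J(w) gives the Fisher linear discriminant direction
w \propto S_W^{-1} (m_2 - m_1)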
Unit 4
