Updated Feature Engineering Notes
JSS ACADEMY OF TECHNICAL EDUCATION, BENGALURU
Department of Computer Science and Engineering
Module-2
Topic: Feature Engineering
by
Dr. P B Mallikarjuna
Curse of Dimensionality
• Data sparsity: as dimensionality grows, samples become sparse in the feature space
• Increased computation: time and memory costs grow with the number of features
• Overfitting: models with many features fit noise more easily and generalize poorly
Dimensionality Reduction
Dimensionality reduction is the process of reducing the number of features in a
dataset (Feature Matrix/Data Matrix).
Issues Associated with Features
• Curse of dimensionality
• Misleading features
• Redundant features
These issues motivate the need for Feature Engineering
Feature Engineering : Dimensionality Reduction
• Filter Methods
Correlation Index, PCA, LDA, FLDA, Autoencoders
• Wrapper Methods
SFS, SBS, SFFS, SFBS
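A minimal sketch of the filter idea using a correlation index: each feature is scored against the target independently of any classifier, and the top-scoring features are kept. The dataset below is an illustrative assumption, not from the slides.

```python
import numpy as np

# Toy data: only features 0 and 3 carry signal (assumed for illustration)
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=100)

# Correlation-index filter: rank each feature by |corr(feature, target)|,
# independently of any classifier, and keep the top-k
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                   for j in range(X.shape[1])])
top2 = np.argsort(scores)[::-1][:2]
print(sorted(int(i) for i in top2))   # the two signal-carrying features
```

Because the score ignores the downstream classifier, this runs fast but may keep redundant features — exactly the trade-off against wrapper methods discussed below.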
Filter Method vs. Wrapper Method
Wrapper Algorithm
• Supervised
• The best discriminating feature subset is assumed to carry the most relevant information
• Quadratic complexity: a full run evaluates on the order of n(n+1)/2 feature subsets
• Approximate algorithm
• Wraps around a learning algorithm
• Classifier-sensitive: the selected subset depends on the chosen classifier
• Execution time is longer than for filter methods
• Based on a greedy approach
Wrapper Methods
• Sequential Forward Selection (SFS)
Sequential Forward Selection (SFS)
• Method of inclusion: start with an empty set and greedily add the single best feature at each step
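The greedy inclusion loop can be sketched as follows. The toy data and the correlation-based `score` function are illustrative stand-ins; a real wrapper would score each candidate subset with classifier accuracy (e.g. via cross-validation).

```python
import numpy as np

def sfs(X, y, k, score):
    """Greedy Sequential Forward Selection: start from the empty set and,
    at each step, add the one feature that most improves `score`."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best_f, best_s = None, -np.inf
        for f in remaining:
            s = score(X[:, selected + [f]], y)
            if s > best_s:
                best_f, best_s = f, s
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

# Toy demo: feature 2 is informative, the rest are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X[:, 2] + 0.1 * rng.normal(size=50)

def corr_score(Xs, y):
    # stand-in evaluator; a real wrapper would use classifier accuracy
    return abs(np.corrcoef(Xs.mean(axis=1), y)[0, 1])

print(sfs(X, y, k=2, score=corr_score))   # feature 2 is picked first
```

Note the quadratic cost: selecting k of n features evaluates n + (n-1) + … subsets, which is n(n+1)/2 for a full run.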
SFS - Example
Sequential Backward Selection (SBS)
• Method of exclusion: start with all features and greedily remove the least useful feature at each step
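The mirror-image exclusion loop can be sketched the same way; again the toy data and correlation-based `score` are illustrative stand-ins for a classifier-based evaluator.

```python
import numpy as np

def sbs(X, y, k, score):
    """Greedy Sequential Backward Selection: start from the full feature
    set and repeatedly drop the feature whose removal hurts `score` least."""
    selected = list(range(X.shape[1]))
    while len(selected) > k:
        best_f, best_s = None, -np.inf
        for f in selected:
            subset = [g for g in selected if g != f]
            s = score(X[:, subset], y)
            if s > best_s:
                best_f, best_s = f, s
        selected.remove(best_f)
    return selected

# Toy demo: feature 2 is informative, the rest are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X[:, 2] + 0.1 * rng.normal(size=50)

def corr_score(Xs, y):
    # stand-in evaluator; a real wrapper would use classifier accuracy
    return abs(np.corrcoef(Xs.mean(axis=1), y)[0, 1])

print(sbs(X, y, k=2, score=corr_score))   # the informative feature survives
```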
SBS - Example
Sequential Floating Forward Selection (SFFS)
• SFS with a "floating" backward step: after each inclusion, previously selected features may be dropped if that improves the criterion
Sequential Floating Backward Selection (SFBS)
• SBS with a "floating" forward step: after each exclusion, previously removed features may be re-added if that improves the criterion
Principal Component Analysis (PCA)
• It is a way of identifying patterns in data, and expressing the data in such a
way as to highlight their similarities and differences.
• Since patterns can be hard to find in data of high dimension, where the
luxury of graphical representation is not available, PCA is a powerful
tool for analyzing data
• Once these patterns are found, we can compress the data by reducing the
number of dimensions without much loss of information
Steps in PCA
✓ Subtract the mean from each of the data dimensions in the dataset. The
mean subtracted is the average across each dimension
✓ Compute the covariance matrix of the mean-centred data
✓ Calculate the eigenvectors and eigenvalues of the covariance matrix
✓ Sort the eigenvectors by decreasing eigenvalue and select the top components
✓ Transform the Data: Project the original data onto the selected principal
components.
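The PCA steps can be sketched in NumPy; the small 2-D dataset below is assumed for illustration and is not from the slides.

```python
import numpy as np

# Toy 2-D dataset (illustrative values)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Step 1: subtract the mean of each dimension
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix of the centred data
cov = np.cov(X_centered, rowvar=False)

# Step 3: eigenvalues/eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: sort by eigenvalue (descending) and keep the top-k components
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:1]]      # keep 1 principal component

# Step 5: project the data onto the selected component(s)
X_reduced = X_centered @ components
print(X_reduced.shape)   # (10, 1)
```

Reducing from 2 dimensions to 1 keeps the direction of largest variance, which is exactly the "compress without much loss of information" claim above.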
Example Problem on PCA
Linear Discriminant Analysis (LDA)
• Linear Discriminant Analysis (LDA) is a supervised dimensionality
reduction technique primarily used for classification tasks
• LDA finds the best projection direction that maximizes class separability
• In LDA, the eigenvectors obtained from the scatter matrices are not
necessarily perpendicular (orthogonal) to each other
Steps in LDA
✓ Compute the Mean Vectors
If there are C classes, calculate the mean vector for each class:
μ_c = (1/N_c) Σ_{x ∈ class c} x
where N_c is the number of samples in class c, and x represents the feature vectors
✓ Compute the Scatter Matrices
• Within-Class Scatter Matrix (S_W): measures the variance within each class
S_W = Σ_c Σ_{x ∈ class c} (x − μ_c)(x − μ_c)ᵀ
• Between-Class Scatter Matrix (S_B): measures the variance between different class
means
S_B = Σ_c N_c (μ_c − μ)(μ_c − μ)ᵀ
where μ is the overall mean of all data points
✓ Compute the Eigenvalues and Eigenvectors of S_W⁻¹ S_B; the eigenvectors with
the largest eigenvalues give the most discriminative projection directions
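The LDA steps can be sketched in NumPy for two classes; the toy 2-D data below is an illustrative assumption, not from the slides.

```python
import numpy as np

# Toy 2-D, two-class data (illustrative values)
rng = np.random.default_rng(1)
X1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
X2 = rng.normal(loc=[3.0, 2.0], scale=0.5, size=(20, 2))

# Step 1: class means and the overall mean
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
mu = np.vstack([X1, X2]).mean(axis=0)

# Step 2: within-class (S_W) and between-class (S_B) scatter matrices
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
S_B = (len(X1) * np.outer(mu1 - mu, mu1 - mu)
       + len(X2) * np.outer(mu2 - mu, mu2 - mu))

# Step 3: eigenvectors of S_W^{-1} S_B; the top eigenvector is the
# projection direction that maximizes class separability
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
w = eigvecs[:, np.argmax(eigvals.real)].real

# Projecting onto w separates the two classes along a single dimension
p1, p2 = X1 @ w, X2 @ w
```

Because S_W⁻¹ S_B is generally not symmetric, its eigenvectors need not be orthogonal — the point made on the previous slide.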
Example Problem on LDA
JSS ACADEMY OF TECHNICAL EDUCATION,
BENGALURU
Department of Computer Science and Engineering
Machine Learning
by
Dr. P B Mallikarjuna
Date: 19.08.2023
Matrix Decomposition
• Matrix decomposition is the process of breaking down a matrix into a
product of simpler matrices
✓ QR Decomposition
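As a minimal sketch of QR decomposition using NumPy's built-in routine (the matrix below is an illustrative assumption):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# QR decomposition: A = Q R, with orthonormal columns in Q
# and an upper-triangular R
Q, R = np.linalg.qr(A)

print(np.allclose(A, Q @ R))             # True
print(np.allclose(Q.T @ Q, np.eye(2)))   # True
```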
Singular Value Decomposition (SVD)
• Singular Value Decomposition (SVD) is a fundamental matrix factorization
technique in linear algebra: any m × n matrix A can be written as A = U Σ Vᵀ,
where U and V are orthogonal and Σ is diagonal with non-negative singular values
Steps in SVD
✓ Consider some data (Matrix A)
✓ Compute AᵀA and its eigenvalues and eigenvectors; the eigenvectors form the
columns of V
• The square roots of the eigenvalues of AᵀA (or AAᵀ) give the singular
values in Σ
✓ Obtain the columns of U from the eigenvectors of AAᵀ (equivalently, uᵢ = A vᵢ / σᵢ)
✓ Construct U, Σ, Vᵀ
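The steps can be sketched in NumPy, building the SVD from the eigen-decomposition of AᵀA; the matrix below is an illustrative assumption, not from the slides.

```python
import numpy as np

# Small illustrative matrix
A = np.array([[3.0, 2.0],
              [2.0, 3.0],
              [2.0, -2.0]])

# Eigen-decomposition of A^T A: its eigenvectors form the columns of V,
# and the square roots of its eigenvalues are the singular values
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]        # sort descending
sing_vals = np.sqrt(eigvals[order])
V = V[:, order]

# Columns of U follow from u_i = A v_i / sigma_i
U = (A @ V) / sing_vals

# Reconstruct A = U Sigma V^T as a check
A_rebuilt = U @ np.diag(sing_vals) @ V.T
print(np.allclose(A, A_rebuilt))   # True
```

In practice one would call `np.linalg.svd(A)` directly; the explicit construction above mirrors the slide's steps.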
Example Problem on SVD
To Compute Eigenvalues of AAᵀ