
JSS ACADEMY OF TECHNICAL EDUCATION,

BENGALURU
Department of Computer Science and Engineering

Module-2
Topic: Feature Engineering
by

Dr. P B Mallikarjuna

1
Curse of Dimensionality
• Data Sparsity

• Increased computation

• Overfitting

• Distances lose meaning

• Performance degradation: algorithms, especially those relying on distance measurements like k-nearest neighbors, degrade as the number of dimensions grows

• High-dimensional data is hard to visualize, making exploratory data analysis more difficult

2
Dimensionality Reduction
Dimensionality reduction is the process of reducing the number of features in a
dataset (Feature Matrix/Data Matrix).

• Subspace Methods: Projection of data from a high-dimensional space to a lower-dimensional space

PCA (Principal Component Analysis) – Unsupervised algorithm

LDA (Linear Discriminant Analysis) – Supervised algorithm

• Feature Subset Selection:

Sequential Forward Selection (SFS)
Sequential Backward Selection (SBS)
Sequential Floating Forward Selection (SFFS)
Sequential Floating Backward Selection (SFBS)

3
Issues Associated with Features

• Curse of dimensionality
• Misleading features
• Redundant features

These issues create the need for

Feature Engineering

4
Feature Engineering : Dimensionality Reduction
• Filter Methods
Correlation Index, PCA, LDA, FLDA, Autoencoders

• Wrapper Methods
SFS, SBS, SFFS, SFBS

✓ Wrapper methods evaluate feature subsets based on the performance of a chosen machine learning model

✓ Filter methods evaluate features based on statistical measures, without involving a machine learning model
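
As a rough illustration of the difference (a sketch, assuming scikit-learn is available; the dataset and parameter values are placeholders, not part of the notes):

# Filter vs. wrapper selection with scikit-learn (illustrative sketch)
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, f_classif
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Filter: score each feature with a statistical test (ANOVA F-value), no model involved
filter_sel = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print("Filter keeps features:", filter_sel.get_support(indices=True))

# Wrapper: score feature subsets by the cross-validated accuracy of a chosen model
wrapper_sel = SequentialFeatureSelector(
    KNeighborsClassifier(), n_features_to_select=2, direction="forward", cv=5
).fit(X, y)
print("Wrapper keeps features:", wrapper_sel.get_support(indices=True))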

5
Filter Method vs. Wrapper Method

6
Wrapper Algorithm
• Supervised
• The best discriminating feature is assumed to carry the most useful information for classification
• Quadratic complexity: roughly n(n+1)/2 subset evaluations in the worst case
• Approximate algorithm
• Wraps around the learning algorithm
• Classifier sensitive
• Execution time is longer than filter methods
• It is based on a greedy approach

7
Wrapper Methods
• Sequential Forward Selection (SFS)

• Sequential Backward Selection (SBS)

• Sequential Floating Forward Selection (SFFS)

• Sequential Floating Backward Selection (SFBS)

8
Sequential Forward Selection (SFS)

• Method of inclusion

• Starts with empty set

• At each step, it adds the best feature such that the criterion function is maximized

• Stops when the criterion function fails to improve
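
The SFS procedure above can be written as a minimal sketch, assuming a criterion(subset) function (a hypothetical placeholder, e.g. cross-validated accuracy of a classifier) that returns a score to maximize:

# Sketch of Sequential Forward Selection (criterion is an assumed scoring function)
def sfs(all_features, criterion):
    selected = []                                   # start with the empty set
    remaining = list(all_features)
    best_score = float("-inf")
    while remaining:
        # add the single feature that maximizes the criterion
        score, best_f = max(((criterion(selected + [f]), f) for f in remaining),
                            key=lambda t: t[0])
        if score <= best_score:                     # criterion fails to improve -> stop
            break
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = score
    return selected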

9
SFS - Example

10
Sequential Backward Selection (SBS)

• Method of deduction

• Starts with set of all features

• At each step, it eliminates the worst feature such that the criterion function is maximized

• Stops when the criterion function fails to improve
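
A mirror-image sketch for SBS, under the same assumption of a criterion(subset) scoring function:

# Sketch of Sequential Backward Selection (criterion is an assumed scoring function)
def sbs(all_features, criterion):
    selected = list(all_features)                   # start with the full feature set
    best_score = criterion(selected)
    while len(selected) > 1:
        # eliminate the single feature whose removal maximizes the criterion
        score, worst_f = max(((criterion([g for g in selected if g != f]), f)
                              for f in selected), key=lambda t: t[0])
        if score <= best_score:                     # criterion fails to improve -> stop
            break
        selected.remove(worst_f)
        best_score = score
    return selected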

11
SBS - Example

12
Sequential Floating Forward Selection (SFFS)

• Method of inclusion and deduction

• Starts with empty set

• At each step – a forward walk followed by backward walk(s)

Forward walk:
It adds the best feature such that the criterion function is maximized
Backward walk:
It eliminates the worst feature such that the criterion function is maximized

• Stops when the criterion function fails to improve
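
A sketch of this floating version, again assuming a criterion(subset) scoring function; each forward walk is followed by backward walk(s) that may undo earlier inclusions:

# Sketch of Sequential Floating Forward Selection (criterion is an assumed scoring function)
def sffs(all_features, criterion):
    selected, remaining = [], list(all_features)
    best_score = float("-inf")
    while remaining:
        # forward walk: add the best feature
        score, best_f = max(((criterion(selected + [f]), f) for f in remaining),
                            key=lambda t: t[0])
        if score <= best_score:                     # criterion fails to improve -> stop
            break
        selected.append(best_f); remaining.remove(best_f); best_score = score
        # backward walk(s): drop features as long as removing one improves the criterion
        improved = True
        while improved and len(selected) > 2:
            score, worst_f = max(((criterion([g for g in selected if g != f]), f)
                                  for f in selected), key=lambda t: t[0])
            improved = score > best_score
            if improved:
                selected.remove(worst_f); remaining.append(worst_f); best_score = score
    return selected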

13
Sequential Floating Backward Selection (SFBS)

• Method of deduction and inclusion

• Starts with set of all features (Full set)

• At each step – a backward walk followed by forward walk(s)

Backward walk:
It eliminates the worst feature such that the criterion function is maximized
Forward walk:
It adds the best feature such that the criterion function is maximized

• Stops when the criterion function fails to improve
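
SFBS is the mirror image of SFFS; a sketch under the same criterion(subset) assumption:

# Sketch of Sequential Floating Backward Selection (criterion is an assumed scoring function)
def sfbs(all_features, criterion):
    selected = list(all_features)                   # start with the full set
    removed, best_score = [], criterion(selected)
    while len(selected) > 1:
        # backward walk: eliminate the worst feature
        score, worst_f = max(((criterion([g for g in selected if g != f]), f)
                              for f in selected), key=lambda t: t[0])
        if score <= best_score:                     # criterion fails to improve -> stop
            break
        selected.remove(worst_f); removed.append(worst_f); best_score = score
        # forward walk(s): re-add previously removed features while that improves the criterion
        improved = True
        while improved and removed:
            score, best_f = max(((criterion(selected + [f]), f) for f in removed),
                                key=lambda t: t[0])
            improved = score > best_score
            if improved:
                selected.append(best_f); removed.remove(best_f); best_score = score
    return selected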

14
Principal Component Analysis (PCA)
• It is a way of identifying patterns in data, and expressing the data in such a
way as to highlight their similarities and differences.

• Since patterns in data can be hard to find in data of high dimension, where
the luxury of graphical representation is not available, PCA is a powerful
tool for analyzing data

• Once these patterns are found, the data can be compressed by reducing the number of dimensions without much loss of information

• Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional subspace while preserving the most important information (variance)

• The variance of a dataset represents the spread or distribution of data points


15
Principal Component Analysis (PCA)
• PCA finds the Principal components (Eigen Vectors) that represent the
directions of maximum variance in the data. They are uncorrelated and
ordered (ranked) based on the amount of variance they capture (Eigen
Values)

• Each principal component is a linear combination of the original features

• Principal Components are perpendicular (uncorrelated) to each other

• The eigenvector with the highest eigenvalue is the first Principal Component (PC1) and it captures the highest variance

• The second principal component (PC2) captures the next highest variance, and this continues for all components, ensuring dimensionality reduction while preserving information
16
Steps in PCA
✓ Get some dataset

✓ Subtract the mean from each of the data dimensions in the dataset. The
mean subtracted is the average across each dimension

✓ Compute the covariance matrix: it helps determine how strongly features vary together. In the case of a 3-dimensional dataset, the covariance matrix is a 3 × 3 matrix containing the variances of each dimension on the diagonal and the pairwise covariances cov(x, y), cov(x, z), cov(y, z) off the diagonal

17
Steps in PCA
✓ Calculate the eigenvectors and eigenvalues of the covariance matrix

✓ Sort eigenvectors by eigenvalues. The eigenvector with the highest eigenvalue is the first Principal Component (PC1) and it captures the highest variance

✓ Choosing components and forming a feature vector

✓ Transform the Data: Project the original data onto the selected principal
components.

Transformed Data = Data Matrix (mean-subtracted) × Selected Eigenvectors Matrix
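
A minimal NumPy sketch of these steps (the random data matrix and the number of components k are placeholders):

# Minimal NumPy sketch of the PCA steps above
import numpy as np

X = np.random.rand(100, 5)                 # example data: 100 samples, 5 features
X_centered = X - X.mean(axis=0)            # subtract the mean of each dimension

cov = np.cov(X_centered, rowvar=False)     # covariance matrix (5 x 5)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues/eigenvectors of the symmetric matrix

order = np.argsort(eigvals)[::-1]          # sort eigenvectors by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                      # choose the top-k principal components
W = eigvecs[:, :k]                         # feature vector (selected eigenvectors)
X_transformed = X_centered @ W             # Transformed Data = Data matrix x Eigenvector matrix
print(X_transformed.shape)                 # (100, 2)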

18
Example Problem on PCA

19
Example Problem on PCA

20
Example Problem on PCA

21
Example Problem on PCA

22
Example Problem on PCA

23
Linear Discriminant Analysis (LDA)
• Linear Discriminant Analysis (LDA) is a supervised dimensionality
reduction technique primarily used for classification tasks

• It projects high-dimensional data onto a lower-dimensional space while maximizing class separability

• Maximize the between-class variance

• Minimize the within-class variance

• Improve class separation in a lower-dimensional space

• LDA finds the best projection direction that maximizes class separability

• In LDA, the eigenvectors obtained from the scatter matrices are not
necessarily perpendicular (orthogonal) to each other

24
Steps in LDA
✓ Compute the Mean Vectors
If there are C classes, calculate the mean vector for each class:

  μ_c = (1/N_c) Σ_{x ∈ class c} x

where N_c is the number of samples in class c, and x represents the feature vectors
✓ Compute the Scatter Matrices
• Within-Class Scatter Matrix (S_W): Measures the variance within each class

  S_W = Σ_c Σ_{x ∈ class c} (x − μ_c)(x − μ_c)ᵀ

• Between-Class Scatter Matrix (S_B): Measures the variance between different class means

  S_B = Σ_c N_c (μ_c − μ)(μ_c − μ)ᵀ

where μ is the overall mean of all data points

25
Steps in LDA
✓ Compute the Eigenvalues and Eigenvectors
Solve the eigenvalue problem for S_W⁻¹ S_B:

  S_W⁻¹ S_B v = λ v

where v are the eigenvectors and λ are the eigenvalues

✓ Select the Top 𝑘 Eigenvectors


• Choose the top 𝑘 eigenvectors corresponding to the largest eigenvalues
• These eigenvectors form the transformation matrix 𝑊

✓ Project the Data


Transform the original dataset X using the learned projection matrix W:

  Y = X W

This reduces the dimensionality while preserving class separability
26
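
A minimal NumPy sketch of the LDA steps above on a small two-class toy dataset (the data values, labels, and k = 1 are illustrative assumptions):

# Minimal NumPy sketch of the LDA steps above (two classes, two features)
import numpy as np

X = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0],   # class 0
              [9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0]]) # class 1
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

mu = X.mean(axis=0)                                   # overall mean
d = X.shape[1]
S_W = np.zeros((d, d))                                # within-class scatter
S_B = np.zeros((d, d))                                # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mu_c = Xc.mean(axis=0)
    S_W += (Xc - mu_c).T @ (Xc - mu_c)
    diff = (mu_c - mu).reshape(-1, 1)
    S_B += len(Xc) * diff @ diff.T

eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)   # solve S_W^-1 S_B v = lambda v
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order][:, :1].real                     # top k = 1 eigenvector
X_lda = X @ W                                         # project the data: Y = X W
print(X_lda.shape)                                    # (8, 1)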


Example Problem on LDA

Each data point has two features (x₁, x₂)

27
Example Problem on LDA

28
Example Problem on LDA

29
Example Problem on LDA

30
Example Problem on LDA

31
Example Problem on LDA

32
Example Problem on LDA

33
Example Problem on LDA

34
JSS ACADEMY OF TECHNICAL EDUCATION,
BENGALURU
Department of Computer Science and Engineering

Machine Learning
by

Dr. P B Mallikarjuna

Date: 19.08.2023

35
Matrix Decomposition
• Matrix decomposition is the process of breaking down a matrix into a
product of simpler matrices

• Matrix decomposition is also known as matrix factorization

• This is widely used in numerical analysis, optimization, machine learning, and linear algebra for solving equations, reducing dimensionality, and improving computational efficiency (see the sketch after the list below)
✓ LU Decomposition (Lower-Upper Decomposition)

✓ QR Decomposition

✓ Eigenvalue Decomposition : used in PCA, LDA

✓ Singular Value Decomposition (SVD)
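
The decompositions listed above can be sketched with NumPy/SciPy (assuming SciPy is installed; the matrix A is an arbitrary example):

# Sketch: the four decompositions above on a small example matrix
import numpy as np
from scipy.linalg import lu

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])

P, L, U = lu(A)                      # LU decomposition: A = P @ L @ U
Q, R = np.linalg.qr(A)               # QR decomposition: A = Q @ R
evals, evecs = np.linalg.eig(A)      # eigenvalue decomposition (used in PCA, LDA)
U2, S, Vt = np.linalg.svd(A)         # singular value decomposition: A = U2 @ diag(S) @ Vt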

36
Singular Value Decomposition (SVD)
• Singular Value Decomposition (SVD) is a fundamental matrix factorization
technique in linear algebra

• It is used in many applications such as dimensionality reduction, image compression, machine learning, and signal processing

• It is an unsupervised dimensionality reduction technique

37
Steps in SVD
✓ Consider some data (Matrix A)

✓ Compute AᵀA and AAᵀ

• Compute the eigenvalues and eigenvectors of AᵀA and AAᵀ

• The eigenvectors of AᵀA form the columns of V

• The eigenvectors of AAᵀ form the columns of U

✓ Compute the Singular Values

• The square roots of the eigenvalues of AᵀA (or AAᵀ) give the singular values in Σ

✓ Construct U, Σ, Vᵀ

• Arrange the singular values in descending order in Σ

• Use the corresponding eigenvectors for U and V
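
A NumPy sketch of this procedure (the matrix A is an illustrative example; here the columns of U are obtained as u_i = A v_i / σ_i, which matches the eigenvectors of AAᵀ up to sign and keeps U and V consistent):

# Sketch of SVD via the eigen-decomposition of AᵀA
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 3.0],
              [2.0, -2.0]])

AtA = A.T @ A
eigvals, V = np.linalg.eigh(AtA)              # eigenvectors of AᵀA form the columns of V
order = np.argsort(eigvals)[::-1]             # arrange in descending order
eigvals, V = eigvals[order], V[:, order]

sigma = np.sqrt(np.clip(eigvals, 0, None))    # singular values = sqrt of eigenvalues of AᵀA
Sigma = np.diag(sigma)
U = A @ V / sigma                             # u_i = A v_i / sigma_i (eigenvectors of AAᵀ, consistent signs)

print(np.allclose(A, U @ Sigma @ V.T))        # reconstruction check: A = U Σ Vᵀ -> True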


38
Example Problem on SVD

39
Example Problem on SVD

40
Example Problem on SVD

41
Example Problem on SVD

42
Example Problem on SVD

43
Example Problem on SVD

44
Example Problem on SVD
To compute the eigenvalues of AAᵀ

45
Example Problem on SVD

46
