Dimensionality Reduction (DR)

The document outlines various dimensionality reduction techniques, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and t-Distributed Stochastic Neighbor Embedding (t-SNE), emphasizing their importance in data compression, visualization, and handling high-dimensional data. It discusses the mathematical foundations such as covariance, eigenvalues, and singular value decomposition, along with practical applications and interpretations of these methods. The document serves as a comprehensive guide for understanding and implementing dimensionality reduction in machine learning and data analysis.

Dimensionality Reduction

26/06/2025 Dr Gargi Peeyush Singhal
Outline – Dimensionality Reduction
• Motivation
  • Matrices are sometimes redundant or computationally expensive – we may want to compress data
  • Need for data visualization – we cannot visualize n-dimensional data (n > 3)
  • Untangle information – avoid getting bogged down by too much information; reduces the cost of further experimentation, etc.
  • Curse of dimensionality
• Recap: Covariance, eigenvalues, eigenvectors and eigendecomposition
• Singular Value Decomposition
  • Need
  • Derivation
  • Implementation example
• Principal Component Analysis
  • Introduction, working
  • Interpretation
  • Implementation example
• Linear Discriminant Analysis
  • Introduction, working
  • Derivation
  • Implementation example
• Manifold Learning: Introduction
• t-SNE
  • Introduction
  • Process
  • Objective function
  • Implementation example
Dimensionality Reduction – Focus on extraction rather than selection

Dimensionality Reduction (Feature Treatment)
• Feature Extraction
  • Component / factor based
    • Supervised: LDA
    • Unsupervised: PCA | SVD…
  • Projection based: t-SNE | UMAP | MDS…
• Feature Selection
  • Feature elimination: Recursive Feature Elimination | Filters

Feature Extraction is:
• Primarily used for unsupervised learning
• Used in generic settings – visualization etc.
• Largely independent of the dependent variable

When to use Feature Extraction – when the answer to all of these questions is yes:
1. Do you want to reduce the number of variables, but aren't able to identify variables to completely remove from consideration?
2. Do you want to ensure your variables are independent of one another?
3. Are you comfortable making your independent variables less interpretable?
Recap: Variance, Covariance, Correlation
• Variance and covariance: measures of the "spread" of a set of points around their center of mass (mean)
• Variance: measure of the deviation from the mean for points in one dimension
• Covariance: measure of how much each of the dimensions varies from the mean with respect to the others
• Correlation: also measures how much the dimensions vary from the mean with respect to each other, but is normalized to the band [-1, 1]
• Covariance is measured between two dimensions
• Covariance shows whether there is a relation between two dimensions
• The covariance of a dimension with itself is the variance
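For reference, the standard sample definitions behind these bullets (for dimensions x and y with n points and means x̄, ȳ):
$$\mathrm{Var}(x)=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
$$\mathrm{Cov}(x,y)=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$$
$$\mathrm{Corr}(x,y)=\frac{\mathrm{Cov}(x,y)}{\sigma_x\,\sigma_y}\in[-1,1]$$
Note that Cov(x, x) = Var(x), which is the "covariance of one dimension with itself is the variance" statement above.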
Recap: Eigenvectors and eigenvalues

Eigenvalue (scalar): the value by which the eigenvector is scaled under the transformation.
Eigenvector: does not change direction after the transformation, and is therefore only scaled by it.
One can visualize a transformation matrix by where the basis vectors î and ĵ land after the transformation (a basis-vector transformation).

• Eigenvectors are vectors that do not change direction after a transformation (i.e., multiplication by a matrix); see the blue vectors in the figure.
• Eigenvalues are the values by which those vectors are scaled (stretched or squished); a negative eigenvalue means the direction is opposite to the original vector.
• Note that the transformation matrix here is square – 2 basis vectors in 2 dimensions, 3 basis vectors in 3 dimensions, etc.

Image Source: https://commons.wikimedia.org/wiki/File:Eig
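In symbols: a non-zero vector v is an eigenvector of a square matrix A, with eigenvalue λ, if
$$A\vec{v}=\lambda\vec{v}.$$
For example, the diagonal matrix A = diag(2, 3) scales î by 2 and ĵ by 3, so î and ĵ are eigenvectors with eigenvalues 2 and 3.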
Eigendecomposition
Generally, for a transformation matrix A with eigenvector v and eigenvalue λ, we can write:
$$A\vec{v}=\lambda\vec{v}\;\Rightarrow\;A\vec{v}-\lambda I\vec{v}=\vec{0}\;\Rightarrow\;(A-\lambda I)\vec{v}=\vec{0}$$
(remember λv = λIv).
Since v cannot be the zero vector, the area spanned after the transformation (A − λI) must be zero, i.e. the matrix is not invertible and its determinant vanishes:
$$\det(A-\lambda I)=0$$
For a 2×2 transformation matrix this characteristic equation is a quadratic with two roots (the eigenvalues); substituting each root back into (A − λI)v = 0 gives the corresponding eigenvectors.
Stacking the eigenvectors as the columns of Q and placing the eigenvalues on the diagonal of Λ (Λ is capital λ) gives A Q = Q Λ, and since Q Q⁻¹ = I [always, for invertible Q]:
$$A=Q\,\Lambda\,Q^{-1}$$
[This is called eigendecomposition]
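A minimal numpy sketch of eigendecomposition (illustrative only; the 2×2 matrix below is an assumed example, not the one from the slide):

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])              # assumed example of a square transformation matrix
eigvals, Q = np.linalg.eig(A)           # eigenvalues and eigenvectors (columns of Q)
Lam = np.diag(eigvals)                  # capital Lambda: eigenvalues on the diagonal
A_rebuilt = Q @ Lam @ np.linalg.inv(Q)  # A = Q Λ Q^{-1}
print(np.allclose(A, A_rebuilt))        # True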
Singular Value Decomposition: A = U S Vᵀ – What is SVD?

• Eigendecomposition works only for a square matrix (m×m or n×n) – so what do we do for a non-square matrix (m×n)?
• Nearly all real-world data is non-square [number of columns (num_c) = features, number of rows (num_r) = samples, num_c ≠ num_r].
• A Aᵀ and Aᵀ A are square matrices (and also symmetric, with positive or zero eigenvalues).
• Further, A Aᵀ and Aᵀ A have the same non-zero eigenvalues. They have orthonormal eigenvectors – perpendicular to each other.
• SVD is a method to decompose a non-square matrix: A = U S Vᵀ.
• u_i is an eigenvector of A Aᵀ; the set of u_i forms U, with U Uᵀ = I … an orthogonal matrix.
• v_i is an eigenvector of Aᵀ A; V Vᵀ = I … an orthogonal matrix.
• U and V are called the singular vectors of A; the diagonal entries of S are the singular values.
Special matrix properties: https://jonathan-hui.medium.com/machine-learning-linear-algebra-special-
Singular Value Decomposition: A = U S Vᵀ – How is it derived?

In general, for a symmetric matrix P we have the eigendecomposition P = Q Λ Qᵀ (Λ is capital λ; matrix name changed to P from A to avoid confusion).
A Aᵀ is symmetric, so A Aᵀ = U Λ Uᵀ (since U is the eigenvector set of A Aᵀ). Similarly, Aᵀ A = V Λ Vᵀ.

Let us do a reverse proof; this is less rigorous. Assume
$$A = U S V^{T}.$$
Then
$$A A^{T} = (U S V^{T})(U S V^{T})^{T} = U S V^{T} V S^{T} U^{T} = U S S^{T} U^{T} = U S^{2} U^{T}$$
(using Vᵀ V = I), so
$$(A A^{T})\,U = U S^{2},$$
i.e. the columns of U are eigenvectors of A Aᵀ with eigenvalues S² (so Λ = S²).
Similarly, Aᵀ A = V S² Vᵀ can be proved.

Special matrix properties: https://jonathan-hui.medium.com/machine-learning-linear-algebra-special-
SVD: Geometric interpretation
SVD – How can we use it?
• Full SVD does not reduce dimension at all, but it is good for understanding the components.
• Truncated (or compact) SVD: choose a rank r and keep only the first r singular values and singular vectors.
• Good for reducing dimensions, since A (m×n) is approximated using U_r (m×r), S_r (r×r) and V_rᵀ (r×n), as shown in the sketch below:
$$A \approx U_{r}\, S_{r}\, V_{r}^{T}$$
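A minimal numpy sketch of a rank-r truncated SVD (illustrative; the data matrix and rank are assumed):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))              # assumed data: 100 samples, 20 features
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 5                                        # chosen rank
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]     # rank-r approximation: A ≈ U_r S_r V_r^T
print(np.linalg.norm(A - A_r))               # approximation error shrinks as r grows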
Principal Component Analysis (PCA) – Introduction
• Aim: simplify and understand "independent" features / factors, i.e. explain the variability of the data with a smaller number of features.
• Example: reduce data from 2D to 1D.
Interactive demo: https://setosa.io/ev/principal-component-analysis/
PCA – Process
1. For a dataset, segregate into y (dependent variable) and X (independent variables / feature matrix). We work with X.
2. Take the matrix X and subtract the mean of each column (must do).
3. (Optional) Divide each column by its standard deviation – do this if you think features with higher variance are not more important than features with lesser variance. If you are in doubt, then divide.
4. Let Z be the matrix after step 2 and the optional step 3. Create the covariance matrix of Z.
5. Find the eigenvectors and eigenvalues of the covariance matrix using eigendecomposition; sort by decreasing eigenvalues and order the eigenvectors accordingly.
6. New data: project Z onto the sorted eigenvectors (see the numpy sketch below).

Why the covariance matrix and not any other matrix?
• The covariance matrix is square; the data matrix A is not.
• The covariance matrix is symmetric; A is not. It is also positive semidefinite (its eigenvalues are positive or zero).
• The covariance matrix provides a view of the variances, and we want to understand eigenvalues and eigenvectors in that space.
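A minimal numpy sketch of the steps above (illustrative; the feature matrix is assumed):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))               # assumed feature matrix: 200 samples, 5 features
Z = X - X.mean(axis=0)                      # step 2: subtract the column means
Z = Z / Z.std(axis=0, ddof=1)               # step 3 (optional): scale to unit variance
C = np.cov(Z, rowvar=False)                 # step 4: covariance matrix of Z
eigvals, eigvecs = np.linalg.eigh(C)        # step 5: eigendecomposition (eigh: C is symmetric)
order = np.argsort(eigvals)[::-1]           # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 2                                       # number of principal components to keep
Z_new = Z @ eigvecs[:, :k]                  # step 6: project onto the top-k components
print(eigvals / eigvals.sum())              # fraction of variance explained per component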
PCA – Interpretation
• Principle
  • A linear projection method to reduce the number of parameters.
  • Transforms a set of correlated variables into a new set of uncorrelated variables.
  • Maps the data into a space of lower dimensionality.
  • A form of unsupervised learning.
• Properties
  • It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables.
  • The new axes are orthogonal and represent the directions of maximum variability.
• You can ignore the components of lesser significance. You do lose some information, but if the eigenvalues are small, you don't lose much:
  – n dimensions in the original data
  – calculate n eigenvectors and eigenvalues
  – choose only the first p eigenvectors, based on their eigenvalues
  – the final data set has only p dimensions
[Figure: scree plot of variance explained (%) for components PC1–PC10]
PCA – Usage: Compression, Visualization
[Figure: the same image reconstructed using 144, 60, 6 and 3 dimensions]
Linear Discriminant Analysis (LDA) – Introduction, difference from PCA
• A linear model for classification and dimensionality reduction.
• It requires class assignments for the examples – it is not a fully unsupervised algorithm. Typically used for separating n classes (n > 2).
• LDA projects data from a D-dimensional feature space down to a D′-dimensional space (D > D′) in a way that maximizes the variability between the classes while reducing the variability within the classes. This variability is called scatter.
• It differs from PCA in that LDA focuses on class separation, while PCA focuses on capturing the maximum variability in the data.
• Both rank the new axes in order of importance.
https://sebastianraschka.com/Articles/2014_python_lda.html
LDA – How does it work?
• Two-dimensional (two-class) example:
$$\arg\max_{W} J(W)=\frac{(M_1-M_2)^2}{S_1^2+S_2^2}$$
M1 and M2 are the means of the classes and S1, S2 are the class scatters. The numerator is the between-class scatter while the denominator is the within-class scatter, so to maximize the function we need to maximize the numerator and minimize the denominator (together). A small sklearn sketch follows below.
• Methodology
• Three-dimensional example
StatQuest (YouTube): Linear Discriminant Analysis (LDA) clearly explained.
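A minimal sklearn sketch of LDA used for dimensionality reduction (not the slide's own example; the dataset is assumed):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                 # assumed example: 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)  # at most (n_classes - 1) discriminant axes
X_lda = lda.fit_transform(X, y)                   # supervised: uses the class labels y
print(X_lda.shape)                                # (150, 2)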
LDA – How is it derived?
[Derivation shown as figures in the original slides; see the linked notes.]
http://www.facweb.iitkgp.ac.in/~sudeshna/courses/ml08/lda.pdf
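For reference, a compact sketch of the standard Fisher LDA derivation (standard textbook material, not taken from the slides; notation assumed: class means m₁ and m₂, between-class scatter S_B, within-class scatter S_W):
$$J(\mathbf{w})=\frac{\mathbf{w}^{T}S_{B}\mathbf{w}}{\mathbf{w}^{T}S_{W}\mathbf{w}},\qquad S_{B}=(\mathbf{m}_{1}-\mathbf{m}_{2})(\mathbf{m}_{1}-\mathbf{m}_{2})^{T},\qquad S_{W}=\sum_{c}\sum_{\mathbf{x}\in c}(\mathbf{x}-\mathbf{m}_{c})(\mathbf{x}-\mathbf{m}_{c})^{T}$$
Setting the derivative of J(w) to zero leads to the generalized eigenvalue problem
$$S_{B}\mathbf{w}=\lambda S_{W}\mathbf{w},$$
and in the two-class case the maximizing direction is
$$\mathbf{w}\propto S_{W}^{-1}(\mathbf{m}_{1}-\mathbf{m}_{2}).$$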
Manifold Learning: Introduction
• When there is a non-linear relationship in the data, manifold learning is used.
• Manifold learning is a class of unsupervised estimators that seeks to describe datasets as low-dimensional manifolds embedded in high-dimensional spaces.
https://stats.stackexchange.com/questions/143105/manifold-learning-does-an-embedding-function-need-to-be-well-behaving
t-SNE (t-Distributed Stochastic Neighbor Embedding): Introduction
• An alternative to linear dimension reduction techniques such as PCA.
• t-SNE / SNE are manifold learning techniques.
• t-SNE tries to preserve local / neighborhood structure: the low-dimensional neighborhood should be the same as the original neighborhood.
• t-SNE uses a gradual, iterative approach to find a lower-dimensional representation of the original data while preserving information about local neighborhoods.

The name breaks down as:
• Embedding – typically high-dimensional data represented in a lower-dimensional space;
• Neighbor – a data point that resides close to the data point of interest;
• Stochastic – the use of randomness in the iterative process when searching for a representative embedding;
• t-Distributed – the probability distribution used by the algorithm to calculate similarity scores in the lower-dimensional embedding.
Image: https://towardsdatascience.com/t-sne-machine-learning-algorithm-a-great-tool-for-dimensionality-reduction-in-python-ec01552f1a1e
t-SNE: Process (1/3)
Step 1 – Similarity of data in the higher dimension: find the similarity matrix in the higher dimension.
Step 2 – Random mapping into the lower dimension: find the similarity matrix in the lower dimension.
Step 3 – Iterative approach: make the similarity matrix in the lower dimension closer to the similarity matrix in the higher dimension.

A parameter called perplexity is chosen to define the "relevant" neighborhood – it sets the number of nearest neighbors for which the similarity matrix is calculated. Similarity is inversely proportional to distance. For points outside the perplexity neighborhood, the similarity is assigned zero without calculation.
https://distill.pub/2016/misread-tsne/
t-SNE: Process (2/3)
(Steps: 1 – similarity matrix in the higher dimension; 2 – random mapping into the lower dimension; 3 – iteratively match the two similarity matrices.)

Step 2: t-SNE randomly maps all the points onto a lower-dimensional space and calculates "similarities" between points as described in the process above. One difference, though: this time the algorithm uses the t-distribution instead of the Normal distribution.
t-SNE: Process (3/3)
(Steps: 1 – similarity matrix in the higher dimension; 2 – random mapping into the lower dimension; 3 – iteratively match the two similarity matrices.)

Step 3:
• The goal of the algorithm is now to make the new "similarity" matrix look like the original one by using an iterative approach. With each iteration, points move towards their "closest neighbors" from the original higher-dimensional space and away from the distant ones.
• The new "similarity" matrix gradually begins to look more like the original one. The process continues until the maximum number of iterations is reached or no further improvement can be made.
• In more scientific terms, the above describes the algorithm minimizing the Kullback–Leibler divergence (KL divergence) through gradient descent.
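A minimal sklearn sketch of running t-SNE (illustrative; the dataset and parameters are assumed):

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)                         # assumed example: 64-dimensional digits
tsne = TSNE(n_components=2, perplexity=30, random_state=0)  # perplexity sets the neighborhood size
X_2d = tsne.fit_transform(X)                                # iterative embedding into 2 dimensions
print(X_2d.shape)                                           # (1797, 2)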
t-SNE – Objective function
[Presented as figures in the original slides.]
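For reference, the standard t-SNE objective (van der Maaten and Hinton, 2008), with the bandwidths σᵢ set by the chosen perplexity:

High-dimensional similarities (Gaussian kernels, symmetrized):
$$p_{j|i}=\frac{\exp\!\left(-\lVert x_i-x_j\rVert^{2}/2\sigma_i^{2}\right)}{\sum_{k\neq i}\exp\!\left(-\lVert x_i-x_k\rVert^{2}/2\sigma_i^{2}\right)},\qquad p_{ij}=\frac{p_{j|i}+p_{i|j}}{2N}$$
Low-dimensional similarities (Student-t with one degree of freedom):
$$q_{ij}=\frac{\left(1+\lVert y_i-y_j\rVert^{2}\right)^{-1}}{\sum_{k\neq l}\left(1+\lVert y_k-y_l\rVert^{2}\right)^{-1}}$$
Objective minimized by gradient descent (the KL divergence mentioned on the previous slide):
$$C=\mathrm{KL}(P\,\Vert\,Q)=\sum_{i\neq j}p_{ij}\log\frac{p_{ij}}{q_{ij}}$$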
UMAP: Uniform Manifold Approximation and Projection
• UMAP is a much faster algorithm and works with large datasets.
• UMAP also clusters similar samples in the output.
• The goal of UMAP is to create a low-dimensional graph of the data that preserves the high-dimensional clusters and their relationships to each other.
• It also works based on similarity scores.
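A minimal sketch using the third-party umap-learn package (assumed to be installed; dataset and parameters are illustrative):

import umap                                      # third-party package: umap-learn (assumed installed)
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)              # assumed example dataset
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
X_2d = reducer.fit_transform(X)                  # graph-based embedding into 2 dimensions
print(X_2d.shape)                                # (1797, 2)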
UMAP vs t-SNE
[Comparison shown as a figure in the original slides.]
Thank you

