Dimensionality Reduction (DR)
26/06/2025 | Dr Gargi | Peeyush Singhal
Outline – Dimensionality Reduction

Motivation
• Matrices are sometimes redundant or computationally expensive – we may want to compress data
• Need for data visualization – we cannot visualize n-dimensional data (n > 3)
• Untangle information – avoid being bogged down by too much information; reduces the cost of further experimentation
• Curse of dimensionality

Topics
• Recap: covariance, eigenvalues, eigenvectors and eigen decomposition
• Singular Value Decomposition
  • Need
  • Derivation
  • Implementation example
• Principal Component Analysis
  • Introduction, working
  • Interpretation
  • Implementation example
• Linear Discriminant Analysis
  • Introduction, working
  • Derivation
  • Implementation example
• Manifold Learning: Introduction
• t-SNE
  • Introduction
  • Process
  • Objective function
  • Implementation example
Dimensionality Reduction – Focus on extraction rather than selection

Feature Extraction (component / factor based)
• Supervised: LDA
• Unsupervised: PCA | SVD…
Recap: Covariance, Eigenvalues, Eigenvectors and Eigen Decomposition
• Correlation:
  • Measures how much each of the dimensions varies from the mean with respect to the others; it is the covariance normalized into the band [-1, 1] (a quick NumPy sketch follows).
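A quick NumPy sketch on made-up toy data, contrasting the covariance matrix with its normalized counterpart, the correlation matrix:

```python
import numpy as np

# Toy data: 100 samples, 3 features (values invented for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 1] = 2.0 * X[:, 0] + 0.5 * X[:, 1]   # make feature 1 correlate with feature 0

cov = np.cov(X, rowvar=False)        # covariance: scale depends on the features' units
corr = np.corrcoef(X, rowvar=False)  # correlation: covariance normalized into [-1, 1]

print(np.round(cov, 2))
print(np.round(corr, 2))
```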
• Eigenvalues and eigenvectors: for a square matrix A, an eigenvector v and its eigenvalue λ satisfy Av = λv, i.e. (A − λI)v = 0.
• For a non-trivial v this means the area spanned after the transformation (A − λI) is zero, i.e. the determinant of (A − λI) is zero (the matrix is not invertible), since v itself cannot be zero:
  det(A − λI) = 0
• Specifically, collect the eigenvectors as columns of a transformation matrix Q and the eigenvalues on the diagonal of Λ (Λ is capital λ). Then Q⁻¹Q = I [always], and QᵀQ = I when Q is orthogonal. AQ = QΛ then means
  A = QΛQ⁻¹   or   Λ = Q⁻¹AQ
  [This is called eigen decomposition]
• For a 2×2 matrix, det(A − λI) = 0 is a quadratic in λ, i.e. it has two roots (eigenvalues) λ₁ and λ₂; putting the roots back into (A − λI)v = 0 gives the eigenvectors v₁ and v₂ (a minimal NumPy check follows).
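A minimal NumPy check of the eigen decomposition A = QΛQ⁻¹ on a small symmetric matrix (the values are chosen only for illustration):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])          # symmetric 2x2 example

eigvals, Q = np.linalg.eig(A)       # columns of Q are the eigenvectors
Lam = np.diag(eigvals)              # capital-lambda diagonal matrix

# Check A v = lambda v for each eigenpair
for lam, v in zip(eigvals, Q.T):
    assert np.allclose(A @ v, lam * v)

# Eigen decomposition: A = Q Lam Q^-1 (here Q^-1 = Q^T since A is symmetric)
assert np.allclose(A, Q @ Lam @ np.linalg.inv(Q))
```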
Singular Value Decomposition : A = USVᵀ

What is SVD?
• Any m×n matrix A can be factored as A = USVᵀ, where U and V are orthogonal (UᵀU = I, VᵀV = I) and S is a diagonal matrix of non-negative singular values.
• Then,
  AᵀA = (USVᵀ)ᵀ(USVᵀ) = VSᵀUᵀUSVᵀ = VS²Vᵀ
  or (AᵀA)V = VS²
  i.e. the columns of V are the eigenvectors of AᵀA, and the squared singular values S² are its eigenvalues.
• Similarly, AAᵀ = US²Uᵀ can be proved, so the columns of U are the eigenvectors of AAᵀ (a short NumPy check follows below).
Special matrix properties: https://jonathan-hui.medium.com/machine-learning-linear-algebra-special-
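A short NumPy sketch on an arbitrary toy matrix confirming the relation above: the squared singular values of A are the eigenvalues of AᵀA, and A = USVᵀ reconstructs the matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))                  # arbitrary 5x3 toy matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Eigenvalues of A^T A (ascending) equal the squared singular values
eigvals = np.linalg.eigvalsh(A.T @ A)
assert np.allclose(np.sort(s**2), eigvals)

# Reconstruction: A = U S V^T
assert np.allclose(A, U @ np.diag(s) @ Vt)
```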
SVD : Geometric interpretation
• Geometrically, A = USVᵀ says that any linear transformation can be seen as a rotation/reflection (Vᵀ), followed by an axis-aligned scaling by the singular values (S), followed by another rotation/reflection (U).
SVD – How can we use it?
• Full SVD does not reduce the dimension at all, but it is good for understanding the components.
• Truncated (or compact) SVD: let us think about a rank r < min(m, n). Keeping only the first r singular values and vectors gives the rank-r approximation
  A ≈ Aᵣ = UᵣSᵣVᵣᵀ
  where Uᵣ is m×r, Sᵣ is r×r and Vᵣᵀ is r×n; the r columns of UᵣSᵣ give a compressed representation of the data (a minimal sketch follows).
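A possible NumPy sketch of truncated SVD; the data and the chosen rank r are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(100, 20))               # toy data: 100 samples, 20 features
r = 5                                        # chosen rank

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]  # rank-r approximation (100 x 20)
Z   = U[:, :r] * s[:r]                       # compressed representation (100 x r)

print(A_r.shape, Z.shape)
print("approximation error:", np.linalg.norm(A - A_r))
```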
Principal Component Analysis (PCA) - Introduction
• Aim: to simplify | understand "independent" features / factors | explain the variability of the data with fewer features, e.g. reduce data from 2D to 1D.
https://setosa.io/ev/principal-component-analysis/
PCA - Process
1. For a dataset, segregate into y (dependent variable) and X (independent variables / feature matrix). We work with X.
2. Take the matrix X and subtract the mean of each column (must do).
3. (Optional) Divide by the standard deviation – if you think that features with higher variance are not more important than features with lesser variance. If you are in doubt, then divide.
4. Let's say the matrix after step 2 and optional step 3 is Z. Then create the covariance matrix of Z, C ∝ ZᵀZ.
5. Find the eigenvectors and eigenvalues of C using eigen decomposition, C = PDP⁻¹. Sort based on decreasing eigenvalue in D; call the sorted eigenvector matrix P*.
6. New data: Z* = ZP* (keep only the leading columns to reduce the dimension); a minimal NumPy sketch of these steps follows below.

Why the covariance matrix and not any other matrix?
• ZᵀZ is square; Z itself generally is not.
• ZᵀZ is symmetric; Z is not. It is also positive semidefinite (its eigenvalues are positive or zero).
• The covariance matrix provides a view of the variances, and we want to understand the eigenvalues and eigenvectors in that space.
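The sketch below implements steps 2–6 on toy data; np.cov applies a 1/(n−1) factor, which scales the eigenvalues but does not change the eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))                  # toy feature matrix (step 1: keep only X)

Z = X - X.mean(axis=0)                         # step 2: centre each column (must do)
Z = Z / Z.std(axis=0)                          # step 3 (optional): scale to unit variance

C = np.cov(Z, rowvar=False)                    # step 4: covariance matrix of Z
eigvals, P = np.linalg.eigh(C)                 # step 5: eigen decomposition (C is symmetric)
order = np.argsort(eigvals)[::-1]              # sort by decreasing eigenvalue
eigvals, P = eigvals[order], P[:, order]

k = 2                                          # keep the top-k components
Z_new = Z @ P[:, :k]                           # step 6: projected data
print(Z_new.shape)                             # (200, 2)
```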
PCA Interpretation
• Principle
  • Linear projection method to reduce the number of parameters
  • Transfer a set of correlated variables into a new set of uncorrelated variables
  • Map the data into a space of lower dimensionality
• We can ignore the components of lesser significance (a short scikit-learn sketch of this follows).
• Properties
[Figure: scree plot of the variance (%) explained by PC1–PC10]
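A short scikit-learn sketch on toy data showing how the explained-variance ratio, the quantity behind the scree plot, reveals which components can be ignored:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 10))     # toy data with 10 features
X[:, 0] *= 5.0                     # give one direction much larger variance

pca = PCA().fit(X)
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i}: {100 * ratio:.1f}% of variance")
```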
Linear Discriminant Analysis (LDA) – Introduction, difference from PCA
• A linear model for classification and dimensionality reduction.
• It requires the class assignment of examples – it is a supervised, not an unsupervised, algorithm. Typically used for separating n classes (n > 2).
• LDA projects data from a D-dimensional feature space down to a D′-dimensional space (D > D′) in a way that maximizes the variability between the classes while reducing the variability within the classes. The variability is called scatter.
https://sebastianraschka.com/Articles/2014_python_lda.html
LDA – How does it work?
• 2-dimension example:
  arg max J(W) = (M₁ − M₂)² / (S₁² + S₂²)
  where M₁ and M₂ are the class means and S₁ and S₂ are the class scatters. The numerator is the between-class scatter while the denominator is the within-class scatter, so to maximize the function we need to maximize the numerator and minimize the denominator (together); a minimal two-class sketch follows below.
• Methodology
• 3-dimension example
StatQuest (YouTube): Linear Discriminant Analysis (LDA) clearly explained.
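A minimal NumPy sketch of the two-class case above: the direction that maximizes J(W) is w ∝ S_W⁻¹(M₁ − M₂), where S_W = S₁ + S₂ is the within-class scatter. The two classes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))   # toy class 1
X2 = rng.normal(loc=[3.0, 2.0], scale=1.0, size=(50, 2))   # toy class 2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)                  # class means M1, M2
S1 = (X1 - m1).T @ (X1 - m1)                               # class scatter matrices
S2 = (X2 - m2).T @ (X2 - m2)
Sw = S1 + S2                                               # within-class scatter

w = np.linalg.solve(Sw, m1 - m2)                           # Fisher direction
w /= np.linalg.norm(w)

# 1-D projections of the two classes along w are well separated
print((X1 @ w).mean(), (X2 @ w).mean())
```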
LDA – How is it derived?
http://www.facweb.iitkgp.ac.in/~sudeshna/courses/ml08/lda.pdf
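A compact sketch of the standard Fisher derivation (following the cited notes), written with the between-class and within-class scatter matrices S_B and S_W:

```latex
J(\mathbf{w}) = \frac{\mathbf{w}^{\top} S_B \,\mathbf{w}}{\mathbf{w}^{\top} S_W \,\mathbf{w}},
\qquad
S_B = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^{\top},
\qquad
S_W = \sum_{c}\sum_{\mathbf{x} \in c} (\mathbf{x} - \mathbf{m}_c)(\mathbf{x} - \mathbf{m}_c)^{\top}
```

Setting the derivative of J with respect to w to zero gives a generalized eigenvalue problem:

```latex
S_B \mathbf{w} = \lambda\, S_W \mathbf{w}
\;\;\Longrightarrow\;\;
\mathbf{w} \propto S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2) \quad \text{(two-class case)}
```

For n classes, the projection directions are the leading eigenvectors of S_W⁻¹S_B.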
Manifold Learning : Introduction
• When there is a non-linear relationship in the data, manifold learning is used.
• Manifold learning is a class of unsupervised estimators that seek to describe datasets as low-dimensional manifolds embedded in high-dimensional spaces.
https://stats.stackexchange.com/questions/143105/manifold-learning-does-an-embedding-function-need-to-be-well-behaving
t-SNE (t-Distributed Stochastic Neighbor Embedding) : Introduction
• An alternative to linear dimension-reduction techniques like PCA.
• t-SNE / SNE are manifold learning techniques.
• t-SNE tries to preserve the local / neighborhood structure: the low-dimensional neighborhood should be the same as the original neighborhood.
• t-SNE is a technique that uses a gradual, iterative approach to find a lower-dimensional representation of the original data while preserving information about local neighborhoods.
• Embedding – typically high-dimensional data represented in a lower-dimensional space.
• Neighbor – a data point that resides close to the data point of interest.
• Stochastic – the use of randomness in the iterative process when searching for a representative embedding.
• t-Distributed – the probability distribution used by the algorithm to calculate similarity scores in the lower-dimensional embedding.

Image: https://towardsdatascience.com/t-sne-machine-learning-algorithm-a-great-tool-for-dimensionality-reduction-in-python-ec01552f1a1e
t-SNE : Process (1/3)
1. Similarity of data in the higher dimension: find the similarity matrix in the higher dimension.
2. Random mapping into lower-dimensional data: find the similarity matrix in the lower dimension.
3. Iterative approach to make the similarity matrix in the lower dimension closer to the similarity matrix in the higher dimension.

A parameter called perplexity is chosen to define the "relevant" neighborhood – it sets the number of nearest neighbors for which the similarity matrix is calculated. Similarity decreases with distance. For points outside the perplexity neighborhood, the similarity is assigned zero without calculation. (A minimal scikit-learn sketch follows below.)
https://distill.pub/2016/misread-tsne/
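A minimal scikit-learn sketch of the whole process on toy data, with the perplexity parameter made explicit; the parameter values are illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(6)
# Toy data: two well-separated clusters in 10 dimensions
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 10)),
               rng.normal(5.0, 1.0, size=(100, 10))])

X_2d = TSNE(n_components=2, perplexity=30, init="pca",
            random_state=0).fit_transform(X)
print(X_2d.shape)   # (200, 2)
```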
t-SNE : Process (2/3)
t-SNE : Process (3/3)
t-SNE – Objective function (2/2)
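For reference, the standard t-SNE objective minimizes the Kullback–Leibler divergence between the high-dimensional similarities p_ij and the Student-t based low-dimensional similarities q_ij:

```latex
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i} \sum_{j \ne i} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \ne l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```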
UMAP : Uniform Manifold Approximation and Projection
• UMAP is a much faster algorithm than t-SNE and works with large datasets.
• UMAP also clusters similar samples in the output.
• The goal of UMAP is to create a low-dimensional graph of the data that preserves the high-dimensional clusters and their relationships to each other.
• It also works based on similarity scores (a minimal usage sketch follows below).
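A minimal usage sketch, assuming the third-party umap-learn package is installed (pip install umap-learn); the data and parameter values are illustrative.

```python
import numpy as np
import umap  # from the umap-learn package

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 20)),
               rng.normal(4.0, 1.0, size=(200, 20))])   # toy clustered data

reducer = umap.UMAP(n_neighbors=15,   # size of the local neighborhood
                    min_dist=0.1,     # how tightly points pack in the embedding
                    n_components=2,
                    random_state=42)
X_2d = reducer.fit_transform(X)
print(X_2d.shape)   # (400, 2)
```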
UMAP vs. t-SNE
Thank you