Unit 4 Dimensionality Reduction
Dimensionality Reduction
Syllabus
• Dimensionality Reduction:
Singular Value Decomposition
Principal Component Analysis
Linear Discriminant Analysis
• Dimensionality reduction is a process or technique for reducing the number of
dimensions -- or features -- in a data set.
• The goal of dimensionality reduction is to decrease the
data set's complexity by reducing the number of
features while keeping the most important properties of
the original data.
What is Dimensionality Reduction?
• The number of input features, variables, or columns present
in a given dataset is known as dimensionality, and the process
to reduce these features is called dimensionality reduction.
• 1. Filter Method
• 2. Wrapper Method
• Forward Selection
• Backward Selection
• Bi-directional Elimination
• 3. Embedded Method
• LASSO
• Elastic Net
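As a concrete illustration of the wrapper and embedded ideas above, here is a minimal scikit-learn sketch on a synthetic regression problem (the dataset, the alpha value, and the number of selected features are illustrative assumptions, not part of the slides):

```python
# Minimal sketch: embedded (LASSO) and wrapper (RFE) feature selection
# with scikit-learn on a synthetic regression problem.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel, RFE
from sklearn.linear_model import Lasso, LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

# Embedded method: LASSO drives uninformative coefficients to zero, and
# SelectFromModel keeps only the features with non-zero weights.
lasso_selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
print("LASSO keeps features:", np.where(lasso_selector.get_support())[0])

# Wrapper method: recursive feature elimination repeatedly refits the model
# and drops the weakest feature until the requested number remains.
rfe = RFE(LinearRegression(), n_features_to_select=3).fit(X, y)
print("RFE keeps features:  ", np.where(rfe.support_)[0])
```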
Feature Extraction
• Feature extraction is the process of transforming the
space containing many dimensions into space with
fewer dimensions.
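A minimal NumPy sketch of feature extraction via singular value decomposition (the SVD route listed in the syllabus); the data here is random and 5-dimensional purely for illustration:

```python
# Minimal sketch of feature extraction: project 5-dimensional data onto the
# top-2 right singular vectors obtained from the SVD of the centered data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 original features
Xc = X - X.mean(axis=0)                # center each feature

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:2].T              # new 2-dimensional feature space

print(X.shape, "->", X_reduced.shape)  # (100, 5) -> (100, 2)
```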
• 0.5674 x1 = -0.6154 y1
• Divide both sides by 0.5674.
• You will get: x1 = -1.0845 y1
• Setting y1 = 1, the pair (x1, y1) becomes (-1.0845, 1). This is the initial eigenvector; it
still needs to be normalized to get the final value.
• To normalize, take the square root of the sum of the squares of the eigenvector's
components, and call this value 'x'.
• Finally, divide each component of the eigenvector by 'x' to get the final (unit-length) eigenvector.
The eigenvector above is generated for the eigenvalue 0.490.
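The normalization step can be checked numerically. The sketch below uses only the value (-1.0845, 1) quoted above; the covariance matrix that the coefficients 0.5674 and -0.6154 come from is not reproduced on this slide:

```python
# Normalizing the initial eigenvector (x1, y1) = (-1.0845, 1) from the slide.
import numpy as np

v = np.array([-1.0845, 1.0])     # initial (unnormalized) eigenvector
x = np.sqrt(np.sum(v ** 2))      # 'x' in the slide: sqrt of the sum of squares
v_final = v / x                  # final, unit-length eigenvector

print(x)                         # approx 1.4752
print(v_final)                   # approx [-0.7352  0.6779]
```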
Describe the algorithm with an example:
• Consider a 2-D dataset with two classes:
• C1 = X1 = (x1, x2) = {(4,1), (2,4), (2,3), (3,6), (4,4)}
• C2 = X2 = (x1, x2) = {(9,10), (6,8), (9,5), (8,7), (10,8)}
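A small sketch that loads this two-class dataset and computes the class means, which is the starting point of the LDA calculation (the names C1 and C2 follow the slide):

```python
# The two-class dataset from the slide, and the class means that LDA starts from.
import numpy as np

C1 = np.array([(4, 1), (2, 4), (2, 3), (3, 6), (4, 4)], dtype=float)
C2 = np.array([(9, 10), (6, 8), (9, 5), (8, 7), (10, 8)], dtype=float)

m1 = C1.mean(axis=0)   # [3.0, 3.6]
m2 = C2.mean(axis=0)   # [8.4, 7.6]
print(m1, m2)
```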
PCA
Theory – Algorithms – steps explained
Steps / Functions to perform PCA
• Subtract mean.
• Calculate the covariance matrix.
• Calculate eigenvectors and eigenvalues.
• Select principal components.
• Reduce the data dimension.
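A minimal NumPy sketch of the five steps above on a small random dataset (the data, the choice of k = 2, and the use of eigh are illustrative assumptions):

```python
# Minimal NumPy sketch of the five PCA steps listed above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                     # 50 samples, 4 features

# 1. Subtract the mean of each feature.
Xc = X - X.mean(axis=0)

# 2. Calculate the covariance matrix (columns treated as variables).
cov = np.cov(Xc, rowvar=False)

# 3. Calculate eigenvalues and eigenvectors of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)           # eigh: the covariance matrix is symmetric

# 4. Select principal components: sort by eigenvalue, keep the top k.
order = np.argsort(eigvals)[::-1]
k = 2
components = eigvecs[:, order[:k]]

# 5. Reduce the data dimension by projecting onto the selected components.
X_reduced = Xc @ components
print(X_reduced.shape)                           # (50, 2)
```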
• Principal component analysis is a form of multivariate statistical analysis and is one method of
studying the correlation or covariance structure in a set of measurements on m variables
for n observations.
• Reducing the number of variables of a data set naturally comes at the expense of accuracy,
but the trick in dimensionality reduction is to trade a little accuracy for simplicity: smaller
data sets are easier to explore and visualize, and machine learning algorithms can analyze
the data much faster without extraneous variables to process.
• So, to sum up, the idea of PCA is simple: reduce the number of variables of a data set
while preserving as much information as possible.
• What do the covariances that we have as entries of the matrix tell us
about the correlations between the variables?
• It is actually the sign of the covariance that matters: if it is positive, the two variables
increase or decrease together (they are correlated); if it is negative, one increases when
the other decreases (they are inversely correlated).
• Now that we know that the covariance matrix is no more than a table
that summarizes the correlations between all the possible pairs of
variables, let's move to the next step.
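A tiny demonstration of the sign argument: two synthetic variables that move with and against a third produce positive and negative covariance entries respectively (the data is made up for illustration):

```python
# The sign of a covariance entry tells us how two variables move together.
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = 2 * a + rng.normal(scale=0.1, size=500)    # increases with a  -> positive covariance
c = -2 * a + rng.normal(scale=0.1, size=500)   # decreases as a grows -> negative covariance

cov = np.cov(np.vstack([a, b, c]))             # rows are treated as variables
print(np.round(cov, 2))                        # cov[0, 1] > 0, cov[0, 2] < 0
```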
Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from
the covariance matrix in order to determine the principal components of the data.
Principal components are new variables that are constructed as linear combinations or
mixtures of the initial variables.
These combinations are done in such a way that the new variables (i.e., principal components)
are uncorrelated and most of the information within the initial variables is squeezed or
compressed into the first components.
So, the idea is that 10-dimensional data gives you 10 principal components, but PCA tries to put
the maximum possible information in the first component, then the maximum remaining information
in the second, and so on, until most of the information is concentrated in the first few components.
• As there are as many principal components as there are variables in the data, principal components are
constructed in such a manner that the first principal component accounts for the largest possible
variance in the data set.
• Organizing information in principal components this way allows you to reduce dimensionality without
losing much information, by discarding the components with low information and treating the
remaining components as your new variables.
• An important thing to realize here is that the principal components are less interpretable and don't have
any real meaning, since they are constructed as linear combinations of the initial variables.
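A short sketch of how the variance is distributed across components in practice, using scikit-learn's PCA on the Iris data (the dataset and the 95% threshold are illustrative choices, not from the slides):

```python
# How much variance each principal component carries, and how many components
# are enough to keep (for example) 95% of it, using scikit-learn's PCA.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                        # 150 samples, 4 features
pca = PCA().fit(X)

print(pca.explained_variance_ratio_)        # decreasing: PC1 carries the most variance
cum = np.cumsum(pca.explained_variance_ratio_)
print("components for 95% variance:", np.searchsorted(cum, 0.95) + 1)
```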
Characteristic Polynomial and Characteristic Equation
Eigenvalues and Eigenvectors
2 x 2 Example: Compute Eigenvalues

A = [ 1  -2 ]    so    A - λI = [ 1-λ   -2  ]
    [ 3  -4 ]                   [  3   -4-λ ]

det(A - λI) = (1 - λ)(-4 - λ) + 6 = λ² + 3λ + 2
Set λ² + 3λ + 2 to 0, giving the eigenvalues λ = -1 and λ = -2.
Example: Find the eigenvalues of

A = [ 1  2  3 ]
    [ 0  4  2 ]
    [ 0  0  7 ]

A - λIn = [ 1-λ   2     3  ]
          [  0   4-λ    2  ]
          [  0    0    7-λ ]

det(A - λIn) = 0  =>  (1 - λ)(4 - λ)(7 - λ) = 0
λ = 1, 4, 7
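These eigenvalues can be verified numerically; a quick NumPy check of both worked examples:

```python
# Checking the two worked examples: the 2x2 matrix has eigenvalues -1 and -2,
# and the upper-triangular 3x3 matrix has its diagonal entries 1, 4, 7 as eigenvalues.
import numpy as np

A2 = np.array([[1, -2],
               [3, -4]], dtype=float)
A3 = np.array([[1, 2, 3],
               [0, 4, 2],
               [0, 0, 7]], dtype=float)

print(np.sort(np.linalg.eigvals(A2)))   # [-2. -1.]
print(np.sort(np.linalg.eigvals(A3)))   # [1. 4. 7.]
```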
Example 3: Eigenvalues and Eigenvectors
Find the eigenvalues and eigenvectors of the matrix
A = [ 5  4  2 ]
    [ 4  5  2 ]
    [ 2  2  2 ]
Solution: The matrix A - λI3 is obtained by subtracting λ from the diagonal elements of A. Thus

A - λI3 = [ 5-λ   4     2  ]
          [  4   5-λ    2  ]
          [  2    2    2-λ ]
The characteristic polynomial of A is |A - λI3|. Using row and column operations to simplify
determinants, we get |A - λI3| = (1 - λ)²(10 - λ), so the eigenvalues of A are λ1 = 10 and λ2 = 1.
Alternate Solution
Solve any two of the equations.
• λ2 = 1
Let λ = 1 in (A - λI3)x = 0. We get
(A - 1·I3)x = 0:
[ 4  4  2 ] [ x1 ]   [ 0 ]
[ 4  4  2 ] [ x2 ] = [ 0 ]
[ 2  2  1 ] [ x3 ]   [ 0 ]
The solution to this system of equations can be shown to be x1 = -s - t, x2 = s, and x3 = 2t, where s and
t are scalars. Thus the eigenspace of λ2 = 1 is the space of vectors of the form
[ -s - t ]
[    s   ]
[   2t   ]
Separating the parameters s and t, we can write
[ -s - t ]       [ -1 ]       [ -1 ]
[    s   ]  =  s [  1 ]  +  t [  0 ]
[   2t   ]       [  0 ]       [  2 ]
Thus the eigenspace of λ = 1 is a two-dimensional subspace of R3 with basis
[ -1 ]   [ -1 ]
[  1 ] , [  0 ]
[  0 ]   [  2 ]
If an eigenvalue occurs as a k-times repeated root of the characteristic equation, we say that it is of
multiplicity k. Thus λ = 10 has multiplicity 1, while λ = 1 has multiplicity 2 in this example.
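A quick NumPy check of Example 3, confirming the eigenvalues and their multiplicities:

```python
# Checking Example 3: the symmetric matrix A has eigenvalue 10 with multiplicity 1
# and eigenvalue 1 with multiplicity 2 (a 2-dimensional eigenspace).
import numpy as np

A = np.array([[5, 4, 2],
              [4, 5, 2],
              [2, 2, 2]], dtype=float)

eigvals, eigvecs = np.linalg.eigh(A)        # A is symmetric, so eigh applies
print(np.round(eigvals, 6))                 # [ 1.  1. 10.]

# The two eigenvector columns paired with eigenvalue 1 span the same plane as the
# basis vectors (-1, 1, 0) and (-1, 0, 2) found above.
```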
Linear Discriminant Analysis (LDA)
Data representation vs. Data Classification
Difference between PCA and LDA
• Sw = S1 + S2 (the within-class scatter matrix is the sum of the per-class scatter matrices; see the sketch below)
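A minimal NumPy sketch of this computation on the two-class dataset given earlier, forming S1, S2, and Sw = S1 + S2, and then the discriminant direction w proportional to Sw^-1 (m1 - m2). The formula for w is the standard two-class Fisher criterion, stated here as an assumption since the slide only shows Sw:

```python
# Within-class scatter and LDA direction for the two-class dataset from the slides.
import numpy as np

C1 = np.array([(4, 1), (2, 4), (2, 3), (3, 6), (4, 4)], dtype=float)
C2 = np.array([(9, 10), (6, 8), (9, 5), (8, 7), (10, 8)], dtype=float)

m1, m2 = C1.mean(axis=0), C2.mean(axis=0)          # class means

S1 = (C1 - m1).T @ (C1 - m1)                       # scatter matrix of class 1
S2 = (C2 - m2).T @ (C2 - m2)                       # scatter matrix of class 2
Sw = S1 + S2                                       # within-class scatter

w = np.linalg.solve(Sw, m1 - m2)                   # LDA projection direction
print(Sw)
print(w / np.linalg.norm(w))                       # unit-length direction
```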