3.2 PCA
Correlation: Measures how strongly two variables are related to each other; when one changes, the other tends to change as well.
Orthogonal: Two variables are orthogonal when they are uncorrelated, i.e., the correlation between the pair of variables is zero.
Covariance Matrix: A matrix containing the covariance between each pair of variables is called the covariance matrix.
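A minimal NumPy sketch of these three terms (the variables x, y, and z below are made up purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.1, size=100)  # y moves with x: strongly correlated
z = rng.normal(size=100)                     # generated independently of x

print(np.corrcoef(x, y)[0, 1])      # near 1: strong correlation
print(np.corrcoef(x, z)[0, 1])      # near 0: the pair is (approximately) orthogonal
print(np.cov(np.stack([x, y, z])))  # 3x3 covariance matrix over the three variables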
Procedure for performing principal component analysis
1. Getting the dataset
2. Representing the data in a structure
3. Standardizing the data
4. Calculating the covariance matrix
5. Calculating the eigenvectors and eigenvalues and sorting the eigenvectors
6. Computing the principal components
7. Reducing the dimensions of the dataset (removing the less important features from the new dataset)
Each of these steps is described in turn below.
1. Getting the dataset
First, we take the input dataset to which PCA will be applied.
2. Representing the data in a structure
Consider a dataset which has 4 features and a total of 5 training examples.
Here, each row corresponds to a data item and each column corresponds to a feature.
The number of columns gives the dimensionality of the dataset.
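As a concrete sketch, such a structure can be held in a NumPy array; the values below are assumed purely for illustration:

import numpy as np

# 5 training examples (rows), each with 4 features (columns); values are made up.
X = np.array([[2.5, 2.4, 0.5, 1.0],
              [0.5, 0.7, 1.1, 0.9],
              [2.2, 2.9, 0.4, 1.2],
              [1.9, 2.2, 0.6, 0.8],
              [3.1, 3.0, 0.2, 1.1]])
print(X.shape)  # (5, 4): the number of columns, 4, is the dimensionality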
3. Standardization of the data
We standardize the data so that each feature has zero mean and unit variance; otherwise, features measured on larger scales would dominate the covariance and bias the principal components.
4. Calculating the covariance matrix
PCA uses the covariance matrix to identify the correlations and dependencies among the features in a dataset.
A covariance matrix expresses the correlation between the different variables in the dataset.
It is essential to identify heavily dependent variables, because they contain biased and redundant information that reduces the overall performance of the model.
E.g., for a 2-dimensional dataset with variables a and b, the covariance matrix is the 2×2 matrix

Cov = | cov(a,a)  cov(a,b) |
      | cov(b,a)  cov(b,b) |
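A minimal sketch of steps 3 and 4, reusing the assumed X array from step 2 (standardize each column, then compute the covariance matrix of the standardized data):

import numpy as np

X = np.array([[2.5, 2.4, 0.5, 1.0], [0.5, 0.7, 1.1, 0.9],
              [2.2, 2.9, 0.4, 1.2], [1.9, 2.2, 0.6, 0.8],
              [3.1, 3.0, 0.2, 1.1]])       # same assumed data as in step 2

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # each feature: zero mean, unit variance
C = np.cov(Z, rowvar=False)                # 4x4 covariance matrix over the features
print(np.round(C, 2))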
5. Calculating the eigenvectors and eigenvalues
Eigenvectors: PCA uses the covariance matrix to understand where in the data there is the most variance. Since more variance in the data denotes more information about the data, the eigenvectors of the covariance matrix are used to identify and compute the principal components.
Eigenvalues: the scalars associated with the respective eigenvectors; the larger an eigenvalue, the more variance lies along its eigenvector.
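Continuing the same sketch, the eigendecomposition can be computed with np.linalg.eigh (which assumes a symmetric matrix, as a covariance matrix always is), and the eigenvectors sorted by decreasing eigenvalue:

import numpy as np

X = np.array([[2.5, 2.4, 0.5, 1.0], [0.5, 0.7, 1.1, 0.9],
              [2.2, 2.9, 0.4, 1.2], [1.9, 2.2, 0.6, 0.8],
              [3.1, 3.0, 0.2, 1.1]])
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # step 3
C = np.cov(Z, rowvar=False)                # step 4

eigvals, eigvecs = np.linalg.eigh(C)       # step 5: eigendecomposition
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals / eigvals.sum())             # fraction of total variance per eigenvector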
6. Computing the Principal Components
The eigenvectors are arranged in decreasing order of their eigenvalues; the eigenvector with the highest eigenvalue is the most significant and thus forms the first principal component.
7. Reducing the dimensions of the dataset
The last step in performing PCA is to re-express the original data in terms of the final principal components, which represent the maximum and most significant information of the dataset, and to drop the less important components.
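Putting steps 3 through 7 together, here is a minimal end-to-end sketch; the pca helper and the example data are assumptions for illustration, not a fixed API:

import numpy as np

def pca(X, k):
    # Steps 3-4: standardize the data, then compute its covariance matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    C = np.cov(Z, rowvar=False)
    # Step 5: eigendecomposition, with eigenvectors sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    # Steps 6-7: keep only the top-k principal components and project the data.
    W = eigvecs[:, order[:k]]
    return Z @ W

X = np.array([[2.5, 2.4, 0.5, 1.0], [0.5, 0.7, 1.1, 0.9],
              [2.2, 2.9, 0.4, 1.2], [1.9, 2.2, 0.6, 0.8],
              [3.1, 3.0, 0.2, 1.1]])
print(pca(X, k=2).shape)  # (5, 2): 4 original features reduced to 2

In practice, a library implementation such as sklearn.decomposition.PCA would normally be used; the sketch above only mirrors the steps listed in this section.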
Advantages of PCA
Easy to compute.
Speeds up other machine learning algorithms.
Counteracts the issues of high-dimensional data.
Disadvantages of PCA
The principal components are linear combinations of the original features, which makes them harder to interpret.
Some information is lost when the less significant components are discarded.
The data must be standardized beforehand, since PCA is sensitive to the scale of the features.