
Things to Remember - Principal Component Analysis (PCA)

What is PCA?

Let’s say that you want to predict what the gross domestic product (GDP) of India will be for 2019. You
have lots of information available: GDP for the first quarter of 2019, GDP for the entirety of 2018, 2017,
and so on. You have many publicly available economic indicators, like the unemployment rate, inflation
rate, and so on. You have Census data from 2012 estimating how many Indians work in each industry,
and Indian statistical data updating those estimates between censuses. You could gather stock
price data and the number of IPOs occurring in a year. Even this overwhelming number of variables
only scratches the surface.

With so many variables at hand, it would be difficult to decide which ones to focus on. In
technical terms, you need to reduce the dimension of your feature space. Reducing the dimension of
the feature space is called “dimensionality reduction.”

Principal component analysis is a technique for dimensionality reduction: it combines the input variables in
a specific way so that you can drop the “least important” variables while still retaining the most valuable
parts of all of them. As an added benefit, the “new” variables produced by PCA are all independent of one
another. This is a benefit because the assumptions of a linear model require our independent variables
to be independent of one another. If we decide to fit a linear regression model with these “new”
variables (see “principal component regression”), this assumption will necessarily be satisfied.
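The independence of the “new” variables can be checked directly. Here is a minimal sketch (assuming scikit-learn is installed; the feature names and data are made up for illustration) that reduces three correlated columns to two uncorrelated principal components:

```python
# Sketch: reduce correlated features to uncorrelated principal components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x,
                     x + 0.1 * rng.normal(size=200),  # nearly a copy of x
                     rng.normal(size=200)])           # unrelated noise

X_std = StandardScaler().fit_transform(X)  # standardize first
pca = PCA(n_components=2)
Z = pca.fit_transform(X_std)               # the "new" variables

cov = np.cov(Z, rowvar=False)              # off-diagonal entries are ~0
print(np.round(cov, 6))
```

The near-zero off-diagonal covariance confirms the components are uncorrelated, which is what makes them safe inputs for a linear model.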

When should I use PCA?

1. Do you want to reduce the number of variables, but aren’t able to identify variables to
completely remove from consideration?
2. Do you want to ensure your variables are independent of one another?
3. Are you comfortable making your independent variables less interpretable?

If you answered “yes” to all three questions, then PCA is a good method to use. If you answered “no” to
question 3, you should not use PCA.
Steps for PCA

1. Begin by standardizing the data. Subtract each dimension’s mean to shift the data points to the
origin, i.e. center the data on the origin (and divide by the standard deviation if the attributes
are on different scales).
2. Generate the covariance matrix (or correlation matrix) for all the dimensions.
3. Perform eigendecomposition, that is, compute the eigenvectors, which are the principal
components, and the corresponding eigenvalues, which are the magnitudes of the variance each
component captures.
4. Sort the eigenpairs in descending order of eigenvalue and select the top ones. The eigenvector
with the largest eigenvalue is the first principal component, which captures the maximum
variance in the original data.
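The four steps above can be sketched directly in NumPy. This is an illustrative implementation under simple assumptions (dense data, no zero-variance columns), not a replacement for a library routine:

```python
# Minimal NumPy sketch of the four PCA steps above.
import numpy as np

def pca(X, k):
    # 1. Standardize: center each column and scale to unit variance.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized data.
    C = np.cov(Xc, rowvar=False)
    # 3. Eigendecomposition (eigh, since C is symmetric).
    eigvals, eigvecs = np.linalg.eigh(C)
    # 4. Sort eigenpairs by descending eigenvalue and keep the top k.
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]
    explained = eigvals[order[:k]]
    return Xc @ components, explained  # scores and their variances

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z, explained = pca(X, 2)
```

The returned scores `Z` are the “new” variables; their sample covariance is diagonal, which is the independence property discussed earlier.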

Performance issues with PCA

1. PCA’s effectiveness depends on the scales of the attributes. If attributes have different scales,
PCA will favor the variable with the highest raw variance rather than weighting attributes by
their correlations.
2. Changing the scales of the variables can therefore change the PCA result.
3. Interpreting the components can become challenging in the presence of discrete data.
4. Skewed data with a long, thick tail can reduce the effectiveness of PCA (related to
point 1).
5. PCA assumes a linear relationship between attributes. It is ineffective when the relationships
are nonlinear.
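Points 1 and 2 are easy to demonstrate. In this sketch (the height/income features are invented for illustration), the same data yields a different leading component depending on whether it is standardized, because unstandardized PCA simply chases the column with the largest raw variance:

```python
# Sketch: scale sensitivity of PCA (points 1 and 2 above).
import numpy as np

rng = np.random.default_rng(1)
height_m = rng.normal(1.7, 0.1, 500)                  # small numeric scale
income = 20000 * height_m + rng.normal(0, 5000, 500)  # large numeric scale
X = np.column_stack([height_m, income])

def first_component(A):
    # Leading eigenvector of the covariance matrix of centered data.
    Ac = A - A.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Ac, rowvar=False))
    return vecs[:, np.argmax(vals)]

raw = first_component(X)                     # dominated by income's variance
scaled = first_component(X / X.std(axis=0))  # both attributes contribute
print(np.round(np.abs(raw), 3), np.round(np.abs(scaled), 3))
```

Without scaling, the first component points almost entirely along the income axis; after scaling, both attributes load equally, reflecting their actual correlation.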
