Basic Theory
- Through the picture above we can see the same two camels, but from different perspectives (different information axes) we receive information in different directions and may draw different conclusions. This is the intuitive picture behind the core idea of PCA, made precise below.
2.2. Concept:
- Principal component analysis (PCA) is a method frequently used when statistical analysts face data sets with many dimensions (big data). Its goal is to reduce the dimensionality of the data without losing the information needed to build models. PCA is a statistical algorithm that uses an orthogonal transformation to map a data set from a high-dimensional space to a new space with far fewer dimensions (often 2 or 3) while best preserving the variability of the data.
Each of the A original dimensions of the data X carries some degree of importance, so no direction can simply be omitted. Instead, a transformation is needed that rotates the dimensions of X until B of the new dimensions receive the largest share of the variance. Since the total variance of X is constant, the remaining (A − B) dimensions then carry very little importance, and the data can be represented on the new basis with the least possible "loss" in a space with fewer dimensions.
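As an illustration of this rotation, the following is a minimal sketch of PCA with NumPy. The data matrix `X`, the number of retained dimensions `B`, and all variable names are hypothetical, chosen only for this example:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: N = 200 points in A = 3 dimensions.
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.1]])

# Center the data around its expectation (mean vector).
X_centered = X - X.mean(axis=0)

# Covariance matrix S of the centered data (N - 1 in the denominator).
S = X_centered.T @ X_centered / (X.shape[0] - 1)

# Eigendecomposition of S; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)

# Keep the B directions with the largest variance (largest eigenvalues).
B = 2
components = eigvecs[:, ::-1][:, :B]   # columns = principal directions

# Rotate (project) the data onto the new B-dimensional basis.
X_reduced = X_centered @ components
print(X_reduced.shape)                 # (200, 2)
```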
2.3. Characteristics:
- Helps reduce the dimensionality of the data.
- Instead of keeping the coordinate axes of the old space, PCA builds a new space with fewer dimensions that represents the data as well as the old one, i.e., it preserves the variability of the data along each new dimension.
- The coordinate axes of the new space are linear combinations of those of the old space, so semantically PCA builds new features from the observed features; the good thing is that these new features still represent the original data well (see the sketch after this list).
- In the new space, latent associations in the data can be discovered that would be harder to detect in the old space, or would not be evident at all.
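For instance, each new feature is just a weighted sum of the original features. A tiny sketch with made-up weights (the direction `u1` and the observation `x` are hypothetical, with `u1` roughly unit length):

```python
import numpy as np

# Hypothetical principal direction in the old 3-D feature space.
u1 = np.array([0.94, 0.33, 0.08])
x  = np.array([2.0, 1.0, 3.0])   # one observation in old coordinates

z1 = u1 @ x                      # new feature = linear combination of old ones
print(z1)                        # 2.45
```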
2.4. Mathematical basis:
- Expectation (mean): simply the average of all the values. Given N values x_1, x_2, …, x_N:

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
- Variance: the average of the squared distance from each point to the expectation. The smaller the variance, the closer the data points are to the expectation and the more similar they are to one another; the larger the variance, the more spread out the data is.

$$\sigma^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$$
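A quick numeric check of these two formulas (the sample values below are made up for illustration):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
N = len(x)

mean = x.sum() / N                          # expectation: (1/N) * sum(x_i)
var = ((x - mean) ** 2).sum() / (N - 1)     # variance with N - 1, as above

print(mean)                                 # 5.0
print(var)                                  # 4.5714...
print(np.var(x, ddof=1))                    # same result via NumPy's ddof=1
```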
- Covariance: a measure of how two random variables vary together (as distinct from variance, which measures the variation of a single variable). If the two variables tend to vary together (that is, when one is above its expected value, the other tends to be above its expected value too), the covariance is positive. If, on the other hand, one tends to be above its expected value when the other is below its expected value, the covariance is negative. If the two variables are independent of each other, the covariance is 0.
$$\mathrm{COV}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})$$
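A small sketch of this formula in code (the two sequences are made-up values; note the denominator here is N, matching the formula above):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.0, 4.0, 6.0, 8.0])   # varies together with X
N = len(X)

cov_xy = ((X - X.mean()) * (Y - Y.mean())).sum() / N
print(cov_xy)                        # 2.5 > 0: X and Y vary together
```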
- Covariance matrix:
Given N data points represented by column vectors x_1, x_2, …, x_N, the expectation vector and covariance matrix of the entire data set are defined as:

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i; \qquad S = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T$$
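These definitions translate directly into code. A minimal sketch, with hypothetical data points stacked as the rows of a matrix:

```python
import numpy as np

# Hypothetical data: each row is one data point x_i (N = 5 points, 2 dims).
X = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [3.0, 1.0],
              [1.0, 2.0],
              [4.0, 2.0]])
N = X.shape[0]

x_bar = X.sum(axis=0) / N            # expectation vector
D = X - x_bar                        # centered data
S = D.T @ D / (N - 1)                # covariance matrix

print(S)
print(np.cov(X, rowvar=False))       # matches NumPy's built-in result
```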
To find the first principal component, we look for a unit vector u_1 that maximizes the variance of the data projected onto it, which equals $u_1^T S u_1$. That is, we maximize $u_1^T S u_1$ subject to the condition $u_1^T u_1 = 1$.
Using the Lagrange multiplier method of multivariable calculus, we form the Lagrange function

$$L(u_1, \lambda_1) = u_1^T S u_1 + \lambda_1 (1 - u_1^T u_1)$$

Setting the derivative with respect to u_1 to zero gives $S u_1 = \lambda_1 u_1$, i.e., $\lambda_1$ is an eigenvalue of S and $u_1$ is the eigenvector of S corresponding to that eigenvalue. Moreover, the variance attained is $u_1^T S u_1 = \lambda_1 u_1^T u_1 = \lambda_1$.
In short, the maximum value of the variance equals the largest eigenvalue $\lambda_1$, achieved when $u_1$ is chosen as the corresponding eigenvector.
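This conclusion can be checked numerically: the unit vector maximizing $u_1^T S u_1$ is the top eigenvector of S, and the maximum equals its eigenvalue. A sketch, reusing a small symmetric covariance matrix like the one above:

```python
import numpy as np

S = np.array([[2.5, 0.5],
              [0.5, 0.6]])               # a symmetric covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
lam1 = eigvals[-1]                       # largest eigenvalue
u1 = eigvecs[:, -1]                      # corresponding unit eigenvector

print(u1 @ S @ u1, lam1)                 # equal: max variance = lam1

# No other unit vector does better:
rng = np.random.default_rng(1)
u = rng.normal(size=(2,))
u /= np.linalg.norm(u)
print(u @ S @ u <= lam1 + 1e-12)         # True
```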