Program: M.C.A.
Course Code: MCAS9220
Course Name: Data Science Fundamentals
Principal Components Analysis (PCA)
• An exploratory technique used to reduce the
dimensionality of the data set to 2D or 3D
• Can be used to:
– Reduce number of dimensions in data
– Find patterns in high-dimensional data
– Visualize data of high dimensionality
• Example applications:
– Face recognition
– Image compression
– Gene expression analysis
2
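To make the "visualize data of high dimensionality" use concrete, here is a minimal sketch (not from the slides) using scikit-learn's PCA; the data matrix, its shape, and all variable names are made up for illustration:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))        # hypothetical data: 100 observations, 50 attributes

    pca = PCA(n_components=2)             # keep only the first two principal components
    X_2d = pca.fit_transform(X)           # shape (100, 2), suitable for a 2D scatter plot

    print(pca.explained_variance_ratio_)  # fraction of total variance captured by PC1 and PC2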
Principal Components Analysis Ideas (PCA)
3
Principal Component Analysis
• See online tutorials such as
http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
[Figure: scatter plot of data points in the (X1, X2) plane with the two principal directions Y1 and Y2 overlaid.]
• Note: Y1 is the first eigenvector, Y2 is the second; Y2 is ignorable.
• Key observation: the variance along Y1 is the largest.
4
Principal Component Analysis: one attribute first
• Question: how much spread is in the data along the axis? (distance to the mean)
• Variance = (Standard deviation)^2
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Temperature (example data): 42, 40, 24, 30, 15, 18, 15, 30, 15, 30, 35, 30, 40, 30
5
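As a quick numerical check of the variance formula, a short numpy sketch on the Temperature values above (ddof=1 gives the n − 1 denominator used on the slide):

    import numpy as np

    temperature = np.array([42, 40, 24, 30, 15, 18, 15, 30, 15, 30, 35, 30, 40, 30])

    # Sample variance with the (n - 1) denominator, matching the formula above
    s2 = temperature.var(ddof=1)
    print(s2)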
Now consider two dimensions
• Covariance: measures how X and Y vary together
  – cov(X,Y) = 0: no linear relationship between X and Y
  – cov(X,Y) > 0: X and Y move in the same direction
  – cov(X,Y) < 0: X and Y move in opposite directions
$\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
Example data (X = Temperature, Y = Humidity):
X   Y
40  90
40  90
40  90
30  90
15  70
15  70
15  70
30  90
15  70
30  70
30  70
30  90
6
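A corresponding sketch for the covariance of the two attributes, using the Temperature/Humidity rows listed above (np.cov uses the n − 1 denominator by default):

    import numpy as np

    temperature = np.array([40, 40, 40, 30, 15, 15, 15, 30, 15, 30, 30, 30])
    humidity    = np.array([90, 90, 90, 90, 70, 70, 70, 90, 70, 70, 70, 90])

    # Sample covariance, matching the formula above
    cov_xy = np.cov(temperature, humidity)[0, 1]
    print(cov_xy)  # positive: the two attributes tend to move in the same direction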
More than two attributes: covariance matrix
• Contains covariance values between all possible pairs of dimensions (= attributes):
$C_{n \times n} = (c_{ij} \mid c_{ij} = \mathrm{cov}(\mathrm{Dim}_i, \mathrm{Dim}_j))$
• Example of the eigenvalue relation used on the next slide ($Ax = \lambda x$):
$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}$
8
Eigenvalues & eigenvectors
• $Ax = \lambda x \iff (A - \lambda I)x = 0$
• How to calculate x and $\lambda$:
  – Calculate $\det(A - \lambda I)$; this yields a polynomial of degree n
  – Determine the roots of $\det(A - \lambda I) = 0$; the roots are the eigenvalues $\lambda$
  – Solve $(A - \lambda I)x = 0$ for each $\lambda$ to obtain the eigenvectors x
9
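A small sketch checking this procedure numerically, using the 2x2 example matrix from the previous slide; note that np.linalg.eig returns unit-length eigenvectors, so (3, 2) appears only up to a scale factor:

    import numpy as np

    A = np.array([[2.0, 3.0],
                  [2.0, 1.0]])

    # Eigenvalues are the roots of det(A - lambda*I) = 0
    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)         # 4 and -1 for this matrix
    print(eigenvectors[:, 0])  # column j is the (unit-length) eigenvector for eigenvalues[j]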
Principal components
• 1st principal component (PC1)
  – The eigenvector whose eigenvalue has the largest absolute value; the data have the largest variance along this eigenvector, i.e. it is the direction of greatest variation
• 2nd principal component (PC2)
  – the direction with the maximum variation left in the data, orthogonal to the 1st PC
• In general, only a few directions capture most of the variability in the data.
10
Steps of PCA
• Let $\bar{X}$ be the mean vector (taking the mean of all rows)
• Adjust the original data by the mean: $X' = X - \bar{X}$
• Compute the covariance matrix C of the adjusted X
• Find the eigenvectors and eigenvalues of C
  – For matrix C, an eigenvector is a (column) vector e having the same direction as Ce, i.e. $Ce = \lambda e$; $\lambda$ is called an eigenvalue of C
  – $Ce = \lambda e \iff (C - \lambda I)e = 0$
  – Most data mining packages do this for you.
11
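A minimal numpy sketch of these steps on a hypothetical data matrix X (rows = observations, columns = attributes); as the slide notes, in practice a data mining package does this for you:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))              # hypothetical data: 50 rows, 3 attributes

    X_mean = X.mean(axis=0)                   # mean vector (mean of all rows)
    X_adj = X - X_mean                        # adjust the original data by the mean
    C = np.cov(X_adj, rowvar=False)           # covariance matrix of the adjusted data

    # Eigenvalues/eigenvectors of C (eigh is appropriate since C is symmetric): Ce = lambda * e
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    print(eigenvalues)                        # one eigenvalue per attribute
    print(eigenvectors)                       # columns are the corresponding eigenvectors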
Eigenvalues
• Calculate the eigenvalues $\lambda$ and eigenvectors x of the covariance matrix:
  – Eigenvalues $\lambda_j$ are used to calculate the percentage of total variance $V_j$ captured by each component j:
$V_j = 100 \cdot \frac{\lambda_j}{\sum_{x=1}^{n} \lambda_x}, \qquad \sum_{x=1}^{n} \lambda_x = n$
12
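Given the eigenvalues, the percentage of total variance per component is a one-liner; the eigenvalue list below is made up for illustration:

    import numpy as np

    eigenvalues = np.array([4.5, 2.0, 1.0, 0.5])   # hypothetical eigenvalues of a covariance matrix

    # V_j = 100 * lambda_j / sum of all eigenvalues
    V = 100 * eigenvalues / eigenvalues.sum()
    print(V)  # percentages of total variance; they sum to 100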
Principal components - Variance
[Figure: scree plot showing the variance (%) captured by each component PC1 through PC10.]
13
Transformed Data
• Eigenvalue $\lambda_j$ corresponds to the variance along component j
• Thus, sort the eigenvectors by $\lambda_j$
• Take the first p eigenvectors $e_i$, where p is the number of top eigenvalues
• These are the directions with the largest variances
$\begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{ip} \end{pmatrix} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_p \end{pmatrix} \begin{pmatrix} x_{i1} - \bar{x}_1 \\ x_{i2} - \bar{x}_2 \\ \vdots \\ x_{in} - \bar{x}_n \end{pmatrix}$ (each $e_j$ written as a row vector)
14
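A sketch of this transformation in numpy: sort the eigenpairs by eigenvalue, keep the top p eigenvectors, and project the mean-adjusted data onto them (the data matrix and p are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))                             # hypothetical data
    X_adj = X - X.mean(axis=0)                               # mean-adjusted data
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_adj, rowvar=False))

    order = np.argsort(eigenvalues)[::-1]                    # sort by eigenvalue, largest first
    p = 2                                                    # number of top components to keep
    E_p = eigenvectors[:, order[:p]]                         # first p eigenvectors as columns

    Y = X_adj @ E_p                                          # y_ij = e_j . (x_i - mean); shape (50, p)
    print(Y.shape)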
An Example
Mean1 = 24.1, Mean2 = 53.8

X1   X2   X1'    X2'
19   63   -5.1    9.25
39   74   14.9   20.25
30   87    5.9   33.25
30   23    5.9  -30.75
15   35   -9.1  -18.75
15   43   -9.1  -10.75
15   32   -9.1  -21.75

[Figure: scatter plots of the original data (X1 vs. X2) and of the mean-adjusted data (X1' vs. X2').]
15
Covariance Matrix
• $C = \begin{pmatrix} 75 & 106 \\ 106 & 482 \end{pmatrix}$
16
If we only keep one dimension: e2
• We keep the dimension of $e_2 = (0.21, -0.98)$
• We can obtain the final data as
$y_i = \begin{pmatrix} 0.21 & -0.98 \end{pmatrix} \begin{pmatrix} x_{i1} \\ x_{i2} \end{pmatrix} = 0.21\, x_{i1} - 0.98\, x_{i2}$
  (applied to the mean-adjusted values X1', X2' from the example)
• Resulting $y_i$: -10.14, -16.72, -31.35, 31.374, 16.464, 8.624, 19.404, -17.63
[Figure: the projected values $y_i$ plotted along the e2 axis.]
17
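A small check of this projection, plugging the mean-adjusted rows (X1', X2') from the example table into e2; this reproduces the yi values listed above for those rows:

    import numpy as np

    # Mean-adjusted data (X1', X2') from the example table
    X_adj = np.array([[-5.1,   9.25],
                      [14.9,  20.25],
                      [ 5.9,  33.25],
                      [ 5.9, -30.75],
                      [-9.1, -18.75],
                      [-9.1, -10.75],
                      [-9.1, -21.75]])

    e2 = np.array([0.21, -0.98])   # the eigenvector kept on the slide
    y = X_adj @ e2                 # y_i = 0.21 * x_i1 - 0.98 * x_i2
    print(y)                       # approx. -10.14, -16.72, -31.35, 31.37, 16.46, 8.62, 19.40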
PCA → Original Data
• Retrieving the old data (e.g. in data compression):
  – RetrievedRowData = (RowFeatureVector^T x FinalData) + OriginalMean
  – Yields the original data using the chosen components
21
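A row-oriented numpy sketch of this reconstruction, reusing a few mean-adjusted rows from the example and the single kept eigenvector e2; with only one of two components kept, the retrieved data are only an approximation of the original:

    import numpy as np

    X_adj = np.array([[-5.1, 9.25], [14.9, 20.25], [5.9, 33.25]])  # a few mean-adjusted example rows
    original_mean = np.array([24.1, 53.8])                         # OriginalMean from the example
    E_p = np.array([[0.21], [-0.98]])                              # kept eigenvector(s) as columns

    final_data = X_adj @ E_p                        # FinalData: data expressed in the kept components
    retrieved = final_data @ E_p.T + original_mean  # (RowFeatureVector^T x FinalData) + OriginalMean
    print(retrieved)                                # approximately the original rows (exact only if all components are kept)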
Principal components
• General properties of principal components:
– summary variables
– linear combinations of the original variables
– uncorrelated with each other
– capture as much of the original variance as
possible
22
Applications – Gene expression analysis
• Reference: Raychaudhuri et al. (2000)
• Purpose: Determine core set of conditions for useful
gene comparison
• Dimensions: conditions, observations: genes
• Yeast sporulation dataset (7 conditions, 6118 genes)
• Result: Two components capture most of variability (90%)
• Issues: uneven data intervals, data dependencies
• PCA is common prior to clustering
• Crisp clustering questioned: genes may correlate with multiple
clusters
• Alternative: determination of gene’s closest neighbours
23
Two Way (Angle) Data Analysis
[Figure: two views of the gene expression matrix — one with genes (10^3–10^4) as rows and conditions (10^1–10^2) as columns, and one with samples (10^1–10^2) as rows and genes (10^3–10^4) as columns.]
25
PCA on all Genes
Leukemia data, precursor B and T
Plot of 34 patients, dimension of 8973 genes
reduced to 2
26
PCA on 100 top significant genes
Leukemia data, precursor B and T
Plot of 34 patients, dimension of 100 genes
reduced to 2
27
PCA of genes (Leukemia data)
Plot of 8973 genes, dimension of 34 patients reduced to 2
28