Visualization 9: Dimensionality Reduction

The document discusses dimensionality reduction (DR) techniques in data visualization, highlighting the importance of reducing the complexity of high-dimensional data for better interpretability and computational efficiency. It covers methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and manifold learning, along with their applications in fields like multimedia, bioinformatics, and finance. The goal of DR is to extract relevant information from data while simplifying the representation for analysis and visualization.

EECE 5642

Data Visualization

Dimensionality Reduction

Y. Raymond Fu
Professor
Electrical and Computer Engineering (ECE), COE
Khoury College of Computer Science (KCCS)
Northeastern University
Attribute Dimensions and Orders

• Dimensions
– 1D: scalar
– 2D: two-dimensional vector
– 3D: three-dimensional vector
– >3D: multi-dimensional vector
• Orders
– scalars
– vectors (1st order)
– matrices (2nd order)
– tensors (higher order)

2
Data Table

www.many-eyes.com
3 Courtesy of Prof. Hanspeter Pfister, Harvard University.
Univariate Data Representations
Matlab Box Plot

Courtesy of Prof. Hanspeter Pfister, Harvard University.


4 Original figures were from the slides of Stasko
Bivariate Data Representations

Courtesy of Prof. Hanspeter Pfister, Harvard University.


5 Original figures were from the slides of Stasko
Trivariate Data Representations

Courtesy of Prof. Hanspeter Pfister, Harvard University.


6 Original figures were from the slides of Stasko
Multi-Dimensional Data

Courtesy of Prof. Hanspeter Pfister, Harvard University.


7 Original figures were from the slides of Stasko
Multi-Dimensional Data Visualization

8 https://www.youtube.com/watch?v=wvsE8jm1GzE
What if the dimension of the data is 4, 5, 6, or even more?

A world of high-dimensional measurements!

Dimensionality Reduction (DR)

9
High Dimensional Data
• Multimedia
– High-resolution images
– High-resolution videos
– Data from multiple sensors
• Bioinformatics
– Expressions of genes
– Neurons
• Social networks
– Tweets/likes/friendships
– Other interactions
• Weather and climate
– Multiple measurements (e.g., temperature)
– Time series data
• Finance
– Stock markets
– Time series data

10
Motivation and Goal of DR
• Reduce the degrees of freedom in measurements
– Replace a large set of measured variables with a small set of more
“condensed” variables
– Simpler models are more robust on small datasets
• Reduce the computational load
– By reducing the dimensionality of the data, the computational burden
(time and space) can be greatly decreased
• Visualization
– “Looking at the data”: more interpretable; simpler explanations
– Make sense of the data before processing

Goal
• Extract information hidden in the data
– Detect variables relevant for a specific task and how variables interact
with each other; reformulate the data with fewer variables

11 Samuel Kaski, Jaakko Peltonen: Dimensionality Reduction for Data Visualization [Applications Corner]. IEEE Signal Process. Mag. 28(2): 100-104 (2011)
Motivation and Goal of DR

This is easier to interpret … … than this

12 Courtesy of Prof. Jaakko Peltonen, Aalto University.


Feature Selection vs. Feature Extraction
Given a data set X consisting of n samples, where each sample has dimension d.
• Feature Selection
– Choose k important features (k < d), ignoring the remaining d – k.
– Example: microarray data analysis

• Feature Extraction
– Transform the original data set X from the d-dimensional space to a
k-dimensional space (k < d).
– A general linear formulation: Y = P^T X, where X ∈ R^d, Y ∈ R^k, and P is a
d × k projection matrix (see the sketch below).

13
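A minimal Python sketch (not from the slides; the data and variable names are illustrative) contrasting the two ideas: selection keeps k of the original columns, while extraction forms k new derived features via Y = P^T X.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 20, 3          # n samples, d original features, k kept/extracted
X = rng.normal(size=(n, d))   # data matrix, one sample per row

# Feature selection: keep the k columns with the largest variance.
var = X.var(axis=0)
selected = np.argsort(var)[-k:]
X_sel = X[:, selected]        # shape (n, k); original features preserved

# Feature extraction: project onto a d x k matrix P, i.e. Y = X P (row form of Y = P^T X).
P, _ = np.linalg.qr(rng.normal(size=(d, k)))  # any orthonormal basis; PCA would pick it from the data
X_ext = X @ P                 # shape (n, k); new derived features

print(X_sel.shape, X_ext.shape)
```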
Statistics & Linear Algebra Background
• Given a set of n data points {x_k} in R^d
– The mean is E{x} = (1/n) Σ_k x_k
– The variance is Var{x} = E{(x − E{x})^2}
– The covariance between two variables x and y is Cov{x, y} = E{(x − E{x})(y − E{y})}
(a NumPy sketch follows below)
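A small NumPy sketch (assumed, not part of the lecture) computing the sample mean, per-dimension variance, and covariance matrix of a data set.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))        # 200 samples in R^3, one per row

mean = X.mean(axis=0)                # E{x}
var = X.var(axis=0)                  # Var{x}, per dimension
cov = np.cov(X, rowvar=False)        # d x d covariance matrix (unbiased, divides by n-1)

print(mean.shape, var.shape, cov.shape)   # (3,) (3,) (3, 3)
```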

14 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


Eigenvectors
• For a linear transform Y = AX, suppose there exists a nonzero vector e and a
scalar λ such that A e = λ e.
• Then e = [e1, e2, e3]^T is an eigenvector of A, and λ is the eigenvalue
associated with this eigenvector.
• Under the transform A, e is only scaled; its direction does not change.
• Example (a numerical sketch is given below)
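A numerical sketch of the example (assumed values, not the one on the original slide): the eigenvectors of a 2 x 2 matrix are only scaled by the transform.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are eigenvectors
e, lam = eigvecs[:, 0], eigvals[0]

# A e equals lambda * e: the transform only scales the eigenvector.
print(np.allclose(A @ e, lam * e))    # True
print(eigvals)                        # e.g. [3. 1.]
```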

Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


15 http://en.wikipedia.org/wiki/Eigenvalue,_eigenvector_and_eigenspace
Dimensionality Reduction Methods
• Linear Methods
– Principal Component Analysis (PCA), M.A. Turk & A.P. Pentland
– Multidimensional Scaling (MDS), T.F. Cox and M.A.A. Cox
– Locality Preserving Projections (LPP), X.F. He, S.C. Yan, Y.X. Hu
– Locality Pursuit Embedding (LPE), W.L. Min, K. Lu, and X.F. He.
– Locally Embedded Analysis (LEA), Y. Fu and T.S. Huang
• Nonlinear Methods
– Locally Linear Embedding (LLE), S.T. Roweis & L.K. Saul
– Laplacian Eigenmaps, M. Belkin & P. Niyogi
– Isomap, J.B. Tenenbaum, V. de Silva, and J.C. Langford
– Hessian LLE, D.L. Donoho & C.E. Grimes
– Semidefinite Embedding (SDE), K.Q. Weinberger & L.K. Saul
• Fisher Graph Methods
– Linear Discriminant Analysis (LDA), R.A. Fisher
– Marginal Fisher Analysis (MFA), S.C. Yan, et al.
– Local Discriminant Embedding (LDE), H.-T. Chen, et al.
– Discriminant Simplex Analysis (DSA), Y. Fu and T.S. Huang
– Correlation Embedding Analysis (CEA), Y. Fu and T.S. Huang

16
Parametric vs. Nonparametric Learning
• Parametric Model
– Use a parameterized family of probability distributions to describe the nature
of a set of data (Moghaddam & Pentland, 1997).
– The data distribution is empirically assumed or estimated.
– Learning is conducted by measuring a set of fixed parameters, such as mean
and variance.
– Effective for large samples, but degrades for complicated data distributions.
• Nonparametric Model
– Distribution free.
– Learning is conducted by measuring the pair-wise data relationship in both
global and local manners.
– Effective and robust due to the reliance on fewer assumptions and parameters.
– Works for cases with small sample size, high dimensionality, and complicated
data distributions.

17
Parametric Model
• Principal Component Analysis (PCA) and Linear Discriminant
Analysis (LDA)
• PCA captures the “principal” variations in the data
• It is computed by finding the eigenvectors of the covariance matrix of the data
• Geometrically, PCA finds the directions of largest variation in the underlying data
• Can be applied in data compression, pattern recognition, etc.
• Find a line going through the data mean and along the direction of maximum
variation of the data.
• Assuming zero-mean data, the line is represented as y = w^T x, where w is the
basis vector and w^T w = 1 (see the sketch below).
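A minimal PCA sketch (assumed, not the lecture's code): center the data, form the covariance matrix, and take its leading eigenvectors as the projection basis W.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))            # 500 samples in R^5, one per row

Xc = X - X.mean(axis=0)                  # center the data (zero-mean assumption)
C = np.cov(Xc, rowvar=False)             # 5 x 5 covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)     # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
W = eigvecs[:, order[:2]]                # top-2 principal directions (columns), W^T W = I

Y = Xc @ W                               # project: y = W^T x for each sample
print(Y.shape)                           # (500, 2)
```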
18 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
Principal Component Analysis

19 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


Principal Component Analysis

20 Courtesy of Prof. Jaakko Peltonen, Aalto University.


Principal Component Analysis

• OptDigits Dataset
– The data set contains 5,620 instances of digitized handwritten digits in
the range 0 to 9.
– Each digit is a vector in R^64: 8 × 8 = 64 pixels (a PCA sketch on a digits
subset follows below).

21
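A hedged sketch of PCA on handwritten digits. Note that scikit-learn's load_digits() provides a 1,797-sample subset of the UCI OptDigits data (8 x 8 images, 64-D vectors), not the full 5,620-instance set described above.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()                   # digits.data has shape (1797, 64)
Y = PCA(n_components=2).fit_transform(digits.data)

print(Y.shape)                           # (1797, 2): each digit mapped to 2-D for plotting
```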
Principal Component Analysis

22 Courtesy of Prof. Jaakko Peltonen, Aalto University.


Principal Component Analysis

Eigenvectors visualized as “Eigenfaces”
23 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
Linear Discriminant Analysis

24 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


Linear Discriminant Analysis

• Unlike PCA, LDA finds a discriminant subspace by including class-label
information in the subspace modeling (supervised learning).
– Compute the within-class scatter
– Compute the between-class scatter
– Maximize the between-class scatter while minimizing the within-class scatter
(see the sketch below)
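A minimal sketch (assumed implementation, synthetic data) of the recipe above: build the within-class and between-class scatter matrices and solve the generalized eigenproblem S_b w = λ S_w w.

```python
import numpy as np
from scipy.linalg import eigh

def lda(X, y, k):
    classes = np.unique(y)
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))                     # within-class scatter
    Sb = np.zeros((d, d))                     # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Generalized symmetric eigenproblem; keep the k largest eigenvalues.
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(d))   # small ridge keeps S_w invertible
    return vecs[:, np.argsort(vals)[::-1][:k]]     # d x k projection matrix

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 50)
W = lda(X, y, k=2)
print((X @ W).shape)                          # (150, 2)
```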
25 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
LDA Definition

26 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


LDA Two-Class Case

27 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


LDA Multiple-Class Case

28 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


Different Subspace Base Vectors
• Different subspace basis vectors give different projection directions
• A subspace basis vector w forms a Fisherface

29
PCA vs. LDA

Digits data after PCA Digits data after LDA
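A hedged sketch reproducing the spirit of this side-by-side comparison with scikit-learn's digits subset; the exact data and plots on the slide may differ.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
import matplotlib.pyplot as plt

X, y = load_digits(return_X_y=True)
Y_pca = PCA(n_components=2).fit_transform(X)
Y_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)   # uses the labels

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, Y, title in zip(axes, (Y_pca, Y_lda), ("Digits after PCA", "Digits after LDA")):
    ax.scatter(Y[:, 0], Y[:, 1], c=y, s=5, cmap="tab10")
    ax.set_title(title)
plt.show()
```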

30 Courtesy of Prof. Jaakko Peltonen, Aalto University.


PCA vs. LDA

PCA LDA
31 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
PCA vs. LDA

PCA LDA
32 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
PCA vs. LDA

• PCA performs worse under this condition
• LDA (FLD, Fisher Linear Discriminant) provides a better low-dimensional
representation.

33 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


When LDA Fails

• LDA fails in the right figure (v1 is the projected direction). Think about why.

34 Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.


Criteria of Nonparametric Model
• Manifold Learning: effective to model sample distributions
• Fisher Graph: effective to classify different classes
• Similarity Metric: effective to measure sample distances
• High-Order Data Structure: effective to describe intrinsic data structures
35
Manifold
• “A manifold is an abstract mathematical space in which every
point has a neighborhood which resembles Euclidean space,
but in which the global structure may be more complicated.”
(Wikipedia)
• “A manifold is a topological space that is locally Euclidean.”
(MathWorld)
• E.g., a 2D map of the 3D Earth is a manifold.
• A manifold representation can be obtained by projecting the original data
to a low-dimensional representation via subspace learning.
• The manifold criterion can provide more effective ways to model
the data distribution than conventional learning methods
based on the Gaussian distribution.

36
Manifold

37 http://en.wikipedia.org/wiki/Manifold
Manifold Learning
Swiss Roll

Dimensionality
Reduction

38 Courtesy of Sam T. Roweis and Lawrence K. Saul, Science 2000


Locally Linear Embedding

http://www.cs.toronto.edu/~roweis/lle/
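A hedged sketch of LLE unrolling the Swiss roll using scikit-learn (not the original MATLAB code from the URL above); the neighborhood size is an illustrative choice.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)   # points in R^3
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)            # 2-D embedding that preserves local linear structure

print(X.shape, Y.shape)             # (1500, 3) (1500, 2)
```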
39
LEA for Pose Manifold

Linear embedding and subspace projection of 400 rotating teapot images. The number of nearest neighbors is k = 6.

40
Yun Fu, et al., “Locally Adaptive Subspace and Similarity Metric Learning for Visual Clustering and Retrieval”, CVIU, Vol. 110, No. 3, pp. 390-402, 2008.
LEA for Expression Manifold

Manifold visualization of 1,965 Frey’s face images by LEA using k = 6 nearest neighbors.

41
Yun Fu, et al., “Locally Adaptive Subspace and Similarity Metric Learning for Visual Clustering and Retrieval”, CVIU, Vol. 110, No. 3, pp. 390-402, 2008.
LEA for Emotion State Manifold

Manifold visualization for 11,627 AAI sequence images of a male subject using LLE algorithm. (a) A video frame
snapshot and the 3D face tracking result. The yellow mesh visualizes the geometric motion of the face. (b) Manifold
visualization with k=5 nearest neighbors. (c) k=8 nearest neighbors. (d) k=15 nearest neighbors and labeling results.
42
Yun Fu, et al., “Locally Adaptive Subspace and Similarity Metric Learning for Visual Clustering and Retrieval”, CVIU, Vol. 110, No. 3, pp. 390-402, 2008.
LEA for Head Pose Manifold

43
Fisher Graph
• Graph Embedding (S. Yan, IEEE TPAMI, 2007)
– G = {X, W} is an undirected weighted graph.
– W measures the similarity between each pair of vertices.
– Laplacian matrix: L = D − W, where D is diagonal with D_ii = Σ_j W_ij.
– Most manifold learning methods can be reformulated as

y* = arg min_{y^T B y = d} Σ_{i≠j} ||y_i − y_j||^2 W_ij = arg min_{y^T B y = d} y^T L y,

where d is a constant and B is the constraint matrix (a Laplacian Eigenmaps sketch follows).

(Figure: between-locality graph and within-locality graph.) Courtesy of Shuicheng Yan
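A minimal sketch (assumed) instantiating the graph-embedding formulation above as Laplacian Eigenmaps: W is a k-NN similarity graph, L = D − W, and the embedding solves L y = λ D y (i.e., constraint matrix B = D). Parameter values are illustrative.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))                 # 200 samples in R^10

# Build a symmetric k-NN graph with heat-kernel weights.
k, sigma = 8, 1.0
D2 = cdist(X, X, "sqeuclidean")
W = np.zeros_like(D2)
knn = np.argsort(D2, axis=1)[:, 1:k + 1]       # skip self (column 0)
for i, nbrs in enumerate(knn):
    W[i, nbrs] = np.exp(-D2[i, nbrs] / (2 * sigma**2))
W = np.maximum(W, W.T)                         # symmetrize

D = np.diag(W.sum(axis=1))
L = D - W                                      # graph Laplacian

# Smallest nontrivial generalized eigenvectors give the 2-D embedding.
vals, vecs = eigh(L, D)
Y = vecs[:, 1:3]                               # drop the constant eigenvector
print(Y.shape)                                 # (200, 2)
```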


Discriminant Simplex Analysis

Y. Fu, et al., IEEE Transactions on Information Forensics and Security, 2008.
Similarity Metric
• Single-Sample Metric
– Euclidean Distance and Pearson Correlation Coefficient.

• Multi-Sample Metric
– k-Nearest-Neighbor Simplex
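A small sketch (assumed) of the two single-sample metrics named above, computed between two feature vectors with NumPy.

```python
import numpy as np

rng = np.random.default_rng(5)
x, y = rng.normal(size=64), rng.normal(size=64)

euclidean = np.linalg.norm(x - y)     # Euclidean distance
pcc = np.corrcoef(x, y)[0, 1]         # Pearson correlation coefficient in [-1, 1]

print(euclidean, pcc)
```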

Correlation Embedding Analysis

• Objective function: based on the correlation distance and the Fisher graph.

Y. Fu, et al., IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
High-Order Data Structure
– m-th order tensors
– Representation: X ∈ R^{d1 × d2 × ⋯ × dm}
– Projections are defined mode by mode, giving one subspace per tensor mode
(m different subspaces in total).
– Here, tensor means multilinear representation.

(Illustration: a 1st-order tensor is a vector, a 2nd-order tensor is a matrix,
and higher-order tensors generalize both.)

Y. Fu, et al., IEEE Transactions on Circuits and Systems for Video Technology, 2009.
Correlation Tensor Analysis
Given two m-th order tensors A and B of the same size, the Pearson Correlation
Coefficient (PCC) is the correlation of their corresponding entries:
PCC(A, B) = Σ_i (a_i − ā)(b_i − b̄) / ( sqrt(Σ_i (a_i − ā)^2) · sqrt(Σ_i (b_i − b̄)^2) ),
where a_i and b_i run over all entries and ā, b̄ are their means.
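A small sketch (assumed) applying the PCC above to two same-sized tensors by correlating their corresponding entries.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 5, 6))                  # two 3rd-order tensors of equal shape
B = A + 0.3 * rng.normal(size=(4, 5, 6))

a, b = A.ravel(), B.ravel()                     # compare corresponding entries
pcc = np.corrcoef(a, b)[0, 1]
print(pcc)                                      # close to 1 for strongly correlated tensors
```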

CTA objective function: combines the correlation distance with a Fisher graph,
optimized in a multilinear representation over m different subspaces (one per
tensor mode).
Y. Fu, et al., IEEE Transactions on Image Processing, 2008.
Manifold with Noise Effect
Robust Manifold by Low-Rank Recovery
• Challenge: real-world ATR data are large scale, unbalanced in dynamic sampling,
and easily affected by noises and outliers, which are difficult to represent.
• Goal: automated, real-time, and robust description of the ATR data space under
uncertainty.
• Low-rank matrix recovery can deal with noises and outliers for data
reconstruction.
Stabilized Manifold Learning

(Figure panels: Raw Data, Existing Method, New Method; LLE under noise and outliers;
voting for outlier detection.)


Stabilized Manifold Learning
Large Scale Manifold Learning
– Graph-based methods require spectral decomposition of n x n matrices,
where n denotes the number of samples.
– The storage and computational costs are O(n^2) and O(n^3) respectively, so it is
almost intractable to apply these methods to large-scale scenarios.
– Neighborhood search is also a challenge at large scale.
Large Scale Manifold Learning

Graph-oriented clustering vs. K-means clustering


Robust Matching of Sub-Manifolds
– A robust visual representation must be insensitive to duration in the case of
dynamics or time series, such as action/activity videos.
– A generalized manifold can be considered as a union of sub-manifolds with
different durations that characterize different instances with similar structures,
such as different individuals performing the same action, instead of a single
continuous manifold as conventionally regarded.
– Robust matching of these sub-manifolds can be achieved through both low-rank
matrix recovery and simplex synchronization.
Applications
– Chemical data visualization
– DR algorithm: multidimensional scaling (MDS)

Seung-Hee Bae, Jong Youl Choi, Judy Qiu, Geoffrey Fox: Dimension reduction and visualization of large high-dimensional data via interpolation. HPDC 2010: 203-214
58
Applications
– Biology data visualization
– DR algorithm: principal component analysis (PCA)

Andreas Lehrmann, Michael Huber, Aydin Can Polatkan, Albert Pritzkau, Kay Nieselt: Visualizing dimensionality reduction of systems biology data. Data Min. Knowl. Discov. 27(1): 146-165 (2013)
59
Applications
– Biology data visualization
– DR algorithm: locally linear embedding (LLE)

Andreas Lehrmann, Michael Huber, Aydin Can Polatkan, Albert Pritzkau, Kay Nieselt: Visualizing dimensionality reduction of systems biology data. Data Min. Knowl. Discov. 27(1): 146-165 (2013)
60
Applications
– Bioinformatics
– DR algorithm: multidimensional scaling (MDS)

Adam Hughes, Yang Ruan, Saliya Ekanayake, Seung-Hee Bae, Qunfeng Dong, Mina Rho, Judy Qiu, Geoffrey Fox: Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets. BMC Bioinformatics 13(S-2): S9 (2012)
61
Applications
– Metagenomic data visualization
– DR algorithm: stochastic neighbor embedding (SNE)

C. C. Laczny, N. Pinel, N. Vlassis, P. Wilmes: Alignment-free Visualization of Metagenomic Data by Nonlinear Dimension Reduction. Scientific Reports, 4 (2014)
62
Applications
– Neuroscience
– DR algorithm: multiple algorithms

J. P. Cunningham and B. M. Yu: Dimensionality reduction for large-scale neural recordings. Nature Neuroscience (2014), doi:10.1038/nn.3776
63
Applications
– Semantic visualization in data mining
– DR algorithm: spherical semantic embedding (SSE)

Tuan M. V. Le, Hady Wirawan Lauw: Semantic visualization for spherical representation. KDD (2014): 1007-1016
64
Applications
– Visualization of machine learning datasets
– DR algorithm: stochastic neighbor embedding (SNE)

Zhirong Yang, Jaakko Peltonen, Samuel Kaski: Scalable Optimization of Neighbor Embedding for Visualization. ICML (2) 2013: 127-135
65
Transfer Learning in Dimension Reduction

• We are facing huge amounts of unlabeled data nowadays
• Only a few databases are labeled
• The problem of inconsistency between training and test data
• Transfer learning can help: train in one domain and test in another
• Knowledge is better utilized
66
Recent Advances: Transfer Learning in DR
Motivation:
• We are facing huge amounts of unlabeled data nowadays
• Only a few databases are labeled
• Knowledge is better utilized
Basic Idea:
Given two data sets A and B, use the knowledge learned from A to
help the learning task for B.

67
Recent Advances: Transfer Learning in DR
Object Recognition and Face Recognition

Learning Framework
Ming Shao, Carlos Castillo, Zhenghong Gu, Yun Fu: Low-Rank Transfer Subspace Learning. ICDM (2012): 1104-1109. 68
Recent Advances: Robust Subspace Discovery
Low-rank matrix recovery
(Figure: clean vs. noisy images from Twitter; the observation is decomposed into a
low-rank part plus a sparse part.)

Subspace Learning
– Find a low-dimensional projection with specific properties.
– Unsupervised (e.g., PCA, LPP) / supervised (e.g., LDA).

Subspace Clustering
– Discover the underlying subspaces in the data set, and correct errors.
– Sparse subspace clustering (SSC), low-rank representation (LRR).
Sheng Li, Yun Fu: Robust Subspace Discovery through Supervised Low-Rank Constraints. SDM 2014: 163-171 69
Recent Advances: Robust Subspace Discovery

Learning Framework
Sheng Li, Yun Fu: Robust Subspace Discovery through Supervised Low-Rank Constraints. SDM 2014: 163-171 70
Self-Taught Low-Rank Coding for Visual Learning

Self-taught Learning (Raina et al., 2007)
– Transferring knowledge from an auxiliary domain with minimum restrictions.
– A special type of transfer learning.

Our Motivations and Contributions
– Learn effective feature representations for the target domain.
– A high-quality dictionary bridges the auxiliary domain and the target domain.
– The low-rank constraint characterizes the structure information.
– The first general self-taught learning framework is developed, including
supervised and unsupervised learning tasks.

Objective Function
Self-Taught Low-Rank Coding for Visual Learning
Application I: Subspace Clustering

Application II: Image Classification
Summary
• The motivation of using dimensionality reduction (DR) for
visualization
• DR mainly includes feature selection and feature extraction.
• Two basic linear DR methods: PCA and LDA.
• Nonlinear DR methods: LLE, SNE, etc.
• Applications of DR methods.
• Recent advances in DR.

73
