Visualization 9: Dimensionality Reduction
Data Visualization
Dimensionality Reduction
Y. Raymond Fu
Professor
Electrical and Computer Engineering (ECE), COE
Khoury College of Computer Sciences (KCCS)
Northeastern University
Attribute Dimensions and Orders
• Dimensions
– 1D: scalar
– 2D: two-dimensional vector
– 3D: three-dimensional vector
– >3D: multi-dimensional vector
• Orders
– scalars
– vectors (1st order)
– matrices (2nd order)
– tensors (higher order)
Data Table
www.many-eyes.com
Courtesy of Prof. Hanspeter Pfister, Harvard University.
Univariate Data Representations
MATLAB box plot
https://www.youtube.com/watch?v=wvsE8jm1GzE
What if the dimension of the data is 4, 5, 6, or even more?
High Dimensional Data
• Multimedia
– High-resolution images
– High-resolution videos
– Data from multiple sensors
• Bioinformatics
– Expressions of genes
– Neurons
• Social networks
– Tweets/likes/friendships
– Other interactions
• Weather and climate
– Multiple measurements (e.g., temperature)
– Time series data
• Finance
– Stock markets
– Time series data
Motivation and Goal of DR
• Reduce the degree of freedom in measurements
Replace a large set of measured variables with a small set of more
“condensed” variables
Simpler models are more robust on small datasets
• Reduce the computational load
By reducing the dimensionality of data, the computational burden
(time and space) could be greatly decreased.
• Visualization
“Looking at the data”—more interpretable; simpler explanations
Make sense of the data before processing
Goal
• Extract information hidden in the data
Detect variables relevant for a specific task and how variables interact with each other. Reformulate the data with fewer variables.
Samuel Kaski, Jaakko Peltonen: Dimensionality Reduction for Data Visualization [Applications Corner]. IEEE Signal Process. Mag. 28(2): 100-104 (2011)
Motivation and Goal of DR
• Feature Extraction
Transform the original data set X from the d-dimensional space to a k-dimensional space (k < d).
A general problem: Y = PᵀX, where X ∈ ℝᵈ and Y ∈ ℝᵏ (see the sketch below).
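A minimal NumPy sketch of this projection, assuming the data are stored column-wise and using a random orthonormal P purely as a placeholder for whatever basis a DR method would learn:

```python
# Feature extraction as a linear projection Y = P^T X.
# X holds n samples as columns (d x n); P (d x k) is any learned basis.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 64, 2, 500            # original dim, reduced dim, number of samples
X = rng.normal(size=(d, n))     # placeholder data; replace with real measurements

# P is a random orthonormal basis purely for illustration;
# PCA, LDA, etc. differ only in how P is chosen.
P, _ = np.linalg.qr(rng.normal(size=(d, k)))

Y = P.T @ X                     # k x n: each column is a reduced representation
print(Y.shape)                  # (2, 500)
```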
Statistics & Linear Algebra Background
• Given a set of n data points {xₖ} in ℝᵈ
– The mean is E{x} ≈ (1/n) Σₖ xₖ
Parametric vs. Nonparametric Learning
• Parametric Model
– Use a parameterized family of probability distributions to describe the nature
of a set of data (Moghaddam & Pentland, 1997).
– The data distribution is empirically assumed or estimated.
– Learning is conducted by measuring a set of fixed parameters, such as mean
and variance.
– Effective for large samples, but degrades for complicated data distributions.
• Nonparametric Model
– Distribution free.
– Learning is conducted by measuring the pair-wise data relationship in both
global and local manners.
– Effective and robust due to the reliance on fewer assumptions and parameters.
– Works for cases with small samples, high dimensionality, and complicated data
distributions.
Parametric Model
• Principal Component Analysis (PCA) and Linear Discriminant
Analysis (LDA)
• PCA captures the "principal" variations in the data
• It is computed by finding the eigenvectors of the covariance matrix of the data
• Geometrically, PCA finds the directions of largest variation in the underlying data
• Can be applied in data compression, pattern recognition, etc.
• Find a line going through the data mean and along the direction of maximum variation of the data.
• Assuming zero-mean data, the line is represented as y = wᵀx, where w is the basis vector with wᵀw = 1 (see the sketch below).
Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
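A hedged NumPy sketch of this computation: center the data, form the covariance matrix, and keep the top-k eigenvectors as the maximum-variance directions (the function name and interface are illustrative, not from the slides):

```python
import numpy as np

def pca(X, k):
    """X: n x d array (rows are samples). Returns (mean, W) with W of shape d x k."""
    mu = X.mean(axis=0)
    Xc = X - mu                          # center: PCA assumes zero-mean data
    C = np.cov(Xc, rowvar=False)         # d x d covariance matrix
    vals, vecs = np.linalg.eigh(C)       # ascending eigenvalues for symmetric C
    order = np.argsort(vals)[::-1][:k]   # keep the k largest-variance directions
    return mu, vecs[:, order]

# Usage: project onto the first principal component, y = w^T (x - mean)
X = np.random.default_rng(1).normal(size=(200, 5))
mu, W = pca(X, k=1)
y = (X - mu) @ W
```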
Principal Component Analysis
• OptDigits Dataset
The data set contains 5,620 instances of digitized handwritten digits in the range 0-9.
Each digit is a vector in ℝ⁶⁴: 8 × 8 = 64 pixels.
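An illustrative way to reproduce this kind of visualization, assuming scikit-learn is available; its bundled digits set (1,797 samples) stands in here for the full 5,620-instance OptDigits data:

```python
# Project the 8x8 digit images to 2D with PCA and scatter-plot them by class.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()                      # X: (1797, 64), y: digit labels 0-9
Y = PCA(n_components=2).fit_transform(digits.data)

plt.scatter(Y[:, 0], Y[:, 1], c=digits.target, cmap="tab10", s=8)
plt.colorbar(label="digit")
plt.title("Handwritten digits projected to 2D by PCA")
plt.show()
```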
Principal Component Analysis
Eigenvectors visualized as eigenfaces
Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
Linear Discriminant Analysis
• Instead of PCA, LDA finds a discriminant subspace by including class label information in the subspace modeling (supervised learning).
– Compute the within-class scatter
– Compute the between-class scatter
– Maximize the between-class scatter while minimizing the within-class scatter (see the sketch below)
Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
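A sketch of the scatter-based computation outlined above, using NumPy and a pseudo-inverse for numerical stability (the helper name and interface are assumptions, not from the slides):

```python
# Within-class scatter Sw, between-class scatter Sb, and a projection W that
# maximizes between-class relative to within-class scatter (Sw^-1 Sb eigenvectors).
import numpy as np

def lda(X, y, k):
    """X: n x d (rows are samples), y: class labels. Returns a d x k projection."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)            # within-class scatter
        diff = (mc - mu).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)          # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1][:k]      # leading discriminant directions
    return vecs[:, order].real
```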
LDA Definition
PCA vs. LDA
PCA LDA
Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
PCA vs. LDA
PCA LDA
Courtesy of Prof. Zhu Li, Hong Kong Polytechnic University.
PCA vs. LDA
• PCA performs worse under this condition.
• LDA (FLD, Fisher Linear Discriminant) provides a better low-dimensional representation.
Manifold
http://en.wikipedia.org/wiki/Manifold
Manifold Learning
Swiss Roll: dimensionality reduction unrolls the 3D manifold into a 2D embedding (see the sketch below).
http://www.cs.toronto.edu/~roweis/lle/
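A hedged sketch of unrolling the Swiss roll with locally linear embedding, using scikit-learn's implementation rather than the original LLE code linked above; the neighbor count of 12 is an arbitrary illustrative choice:

```python
# Embed the 3D Swiss roll into 2D with LLE and color points by their roll position.
import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1500, random_state=0)   # 3D manifold data
Y = LocallyLinearEmbedding(n_neighbors=12, n_components=2).fit_transform(X)

plt.scatter(Y[:, 0], Y[:, 1], c=color, cmap="Spectral", s=8)
plt.title("Swiss roll unrolled to 2D by LLE")
plt.show()
```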
LEA for Pose Manifold
Linear embedding and subspace projection of 400 rotating teapot images. The number of nearest neighbors is k = 6.
Yun Fu, et al. "Locally Adaptive Subspace and Similarity Metric Learning for Visual Clustering and Retrieval", CVIU, Vol. 110, No. 3, pp. 390-402, 2008.
LEA for Expression Manifold
Manifold visualization of 1,965 Frey’s face images by LEA using k = 6 nearest neighbors.
Yun Fu, et al. "Locally Adaptive Subspace and Similarity Metric Learning for Visual Clustering and Retrieval", CVIU, Vol. 110, No. 3, pp. 390-402, 2008.
LEA for Emotion State Manifold
Manifold visualization for 11,627 AAI sequence images of a male subject using the LLE algorithm. (a) A video frame snapshot and the 3D face tracking result; the yellow mesh visualizes the geometric motion of the face. (b) Manifold visualization with k = 5 nearest neighbors. (c) k = 8 nearest neighbors. (d) k = 15 nearest neighbors and labeling results.
Yun Fu, et al. "Locally Adaptive Subspace and Similarity Metric Learning for Visual Clustering and Retrieval", CVIU, Vol. 110, No. 3, pp. 390-402, 2008.
LEA for Head Pose Manifold
Fisher Graph
• Graph Embedding (S. Yan, IEEE TPAMI, 2007)
– G = {X, W} is an undirected weighted graph.
– W measures the similarity between each pair of vertices.
– Laplacian matrix: L = D − W, where D is the diagonal degree matrix with Dᵢᵢ = Σⱼ Wᵢⱼ (see the sketch below).
Y. Fu, et al., IEEE Transactions on Information Forensics and Security, 2008.
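A minimal sketch of the graph ingredients above: a symmetrized k-NN similarity matrix W (Gaussian weights are an assumed choice here, not specified on the slide) and the Laplacian L = D − W:

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_graph_laplacian(X, k=6, sigma=1.0):
    """X: n x d data matrix. Returns (W, L) for a symmetrized k-NN graph."""
    D2 = cdist(X, X, "sqeuclidean")
    W = np.zeros_like(D2)
    for i in range(len(X)):
        nbrs = np.argsort(D2[i])[1:k + 1]            # skip the point itself
        W[i, nbrs] = np.exp(-D2[i, nbrs] / (2 * sigma**2))
    W = np.maximum(W, W.T)                           # symmetrize: undirected graph
    L = np.diag(W.sum(axis=1)) - W                   # Laplacian L = D - W
    return W, L
```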
Similarity Metric
• Single-Sample Metric
– Euclidean Distance and Pearson Correlation Coefficient.
• Multi-Sample Metric
– k-Nearest-Neighbor Simplex
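A quick sketch of the two single-sample metrics named above, computed for a pair of feature vectors:

```python
# Euclidean distance and Pearson correlation coefficient between two vectors.
import numpy as np

def euclidean(x, y):
    return np.linalg.norm(x - y)

def pearson(x, y):
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

x, y = np.random.default_rng(2).normal(size=(2, 64))
print(euclidean(x, y), pearson(x, y))
```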
Correlation Embedding Analysis
Objective Function
Y. Fu, et al., IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
High-Order Data Structure
– m-th order tensors
– Here, tensor means a multilinear representation.
Illustration: vector, matrix, and tensor as first-, second-, and higher-order data.
Y. Fu, et al., IEEE Transactions on Circuits and Systems for Video Technology, 2009.
Correlation Tensor Analysis
Given two m-th order tensors, the Pearson Correlation Coefficient (PCC) is defined over m different subspaces.
Y. Fu, et al., IEEE Transactions on Image Processing, 2008.
Manifold with Noise Effect
Robust Manifold by Low-Rank Recovery
Real-world ATR data are large scale, unbalanced in dynamic sampling, and easily affected by noise and outliers, which are difficult to represent.
Automated, real-time, and robust description of the ATR data space under uncertainty.
Seung-Hee Bae, Jong Youl Choi, Judy Qiu, Geoffrey Fox: Dimension reduction and visualization of large high-dimensional data via interpolation. HPDC 2010: 203-214
Applications
Biology data visualization
DR algorithm: principal component analysis (PCA)
Andreas Lehrmann, Michael Huber, Aydin Can Polatkan, Albert Pritzkau, Kay Nieselt: Visualizing dimensionality reduction of systems biology data. Data Min. Knowl. Discov. 27(1): 146-165 (2013)
Applications
Biology data visualization
DR algorithm: locally linear embedding (LLE)
Andreas Lehrmann, Michael Huber, Aydin Can Polatkan, Albert Pritzkau, Kay Nieselt: Visualizing dimensionality reduction of systems biology data. Data Min. Knowl. Discov. 27(1): 146-165 (2013)
Applications
Bioinformatics
DR algorithm: multidimensional scaling (MDS)
Adam Hughes, Yang Ruan, Saliya Ekanayake, Seung-Hee Bae, Qunfeng Dong, Mina Rho, Judy Qiu, Geoffrey Fox: Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets. BMC Bioinformatics 13(S-2): S9 (2012)
Applications
Metagenomic data visualization
DR algorithm: stochastic neighbor embedding (SNE)
CC Laczny, N Pinel, N Vlassis, P Wilmes: Alignment-free Visualization of Metagenomic Data by Nonlinear Dimension Reduction. Scientific Reports, 4 (2014).
Applications
Neuroscience
DR algorithm: multiple algorithms
J. P. Cunningham and B. M. Yu: Dimensionality reduction for large-scale neural recordings. Nature Neuroscience, (2014), doi:10.1038/nn.3776.
Applications
Semantic visualization in data mining
DR algorithm: spherical semantic embedding (SSE).
Tuan M. V. Le, Hady Wirawan Lauw: Semantic visualization for spherical representation. KDD (2014): 1007-1016.
Applications
Visualization of machine learning datasets
DR algorithm: stochastic neighbor embedding (SNE)
Zhirong Yang, Jaakko Peltonen, Samuel Kaski: Scalable Optimization of Neighbor Embedding for Visualization. ICML (2) 2013: 127-135
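For reference, a hedged illustration of SNE-style visualization using scikit-learn's t-SNE variant (not the cited papers' exact methods) on a small machine-learning dataset:

```python
# Embed a machine-learning dataset into 2D with t-SNE and color points by class.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()
Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(digits.data)

plt.scatter(Y[:, 0], Y[:, 1], c=digits.target, cmap="tab10", s=8)
plt.title("Digits embedded in 2D by t-SNE")
plt.show()
```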
Transfer Learning in Dimension Reduction
Recent Advances: Transfer Learning in DR
Learning framework spanning object recognition and face recognition tasks.
Ming Shao, Carlos Castillo, Zhenghong Gu, Yun Fu: Low-Rank Transfer Subspace Learning. ICDM (2012): 1104-1109.
Recent Advances: Robust Subspace Discovery
Low-rank matrix recovery
Learning Framework
Sheng Li, Yun Fu: Robust Subspace Discovery through Supervised Low-Rank Constraints. SDM 2014: 163-171
Self-Taught Low-Rank Coding for Visual Learning