
Focus Article

Partial least squares regression and projection on latent structure regression (PLS Regression)

Hervé Abdi∗

Partial least squares (PLS) regression (a.k.a. projection on latent structures) is a


recent technique that combines features from and generalizes principal component
analysis (PCA) and multiple linear regression. Its goal is to predict a set of
dependent variables from a set of independent variables or predictors. This
prediction is achieved by extracting from the predictors a set of orthogonal
factors called latent variables which have the best predictive power. These latent
variables can be used to create displays akin to PCA displays. The quality of the
prediction obtained from a PLS regression model is evaluated with cross-validation
techniques such as the bootstrap and jackknife. There are two main variants of
PLS regression: The most common one separates the roles of dependent and
independent variables; the second one—used mostly to analyze brain imaging
data—gives the same roles to dependent and independent variables. © 2010 John
Wiley & Sons, Inc. WIREs Comp Stat 2010 2 97–106

PLS is an acronym which originally stood for partial least squares regression, but, recently, some authors have preferred to develop this acronym as projection to latent structures. In any case, PLS regression combines features from and generalizes principal component analysis (PCA) and multiple linear regression. Its goal is to analyze or predict a set of dependent variables from a set of independent variables or predictors. This prediction is achieved by extracting from the predictors a set of orthogonal factors called latent variables which have the best predictive power.

PLS regression is particularly useful when we need to predict a set of dependent variables from a (very) large set of independent variables (i.e., predictors). It originated in the social sciences (specifically economics, from the seminal work of Herman Wold, see Ref 1) but became popular first in chemometrics (i.e., computational chemistry), due in part to Herman's son Svante,2 and in sensory evaluation.3 But PLS regression is also becoming a tool of choice in the social sciences as a multivariate technique for nonexperimental (e.g., Refs 4–6) and experimental data alike (e.g., neuroimaging, see Refs 7–11). It was first presented as an algorithm akin to the power method (used for computing eigenvectors) but was rapidly interpreted in a statistical framework (see Refs 12–17).

Recent developments, including extensions to multiple table analysis, are explored in Ref 18 and in the volume edited by Esposito Vinzi et al. (Ref 19).

∗Correspondence to: herve@utdallas.edu
School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX 75080–3021, USA
DOI: 10.1002/wics.51

PREREQUISITE NOTIONS AND NOTATIONS

The I observations described by K dependent variables are stored in an I × K matrix denoted Y, and the values of J predictors collected on these I observations are collected in an I × J matrix X.

GOAL OF PLS REGRESSION: PREDICT Y FROM X

The goal of PLS regression is to predict Y from X and to describe their common structure. When Y is a vector and X is a full rank matrix, this goal could be accomplished using ordinary multiple regression.


When the number of predictors is large compared to the number of observations, X is likely to be singular and the regression approach is no longer feasible (i.e., because of multicollinearity). This data configuration has been recently often called the 'small N large P problem.' It is characteristic of recent data analysis domains such as, e.g., bio-informatics, brain imaging, chemometrics, data mining, and genomics.

Principal component regression
Several approaches have been developed to cope with the multicollinearity problem. For example, one approach is to eliminate some predictors (e.g., using stepwise methods, see Ref 20); another one is to use ridge regression.21 One method closely related to PLS regression is called principal component regression (PCR): it performs a principal component analysis (PCA) of the X matrix and then uses the principal components of X as the independent variables of a multiple regression model predicting Y. Technically, in PCA, X is decomposed using its singular value decomposition (see Refs 22, 23 for more details) as:

X = RΔV^T    (1)

with:

R^T R = V^T V = I,    (2)

where R and V are the matrices of the left and right singular vectors, and Δ is a diagonal matrix with the singular values as diagonal elements. The singular vectors are ordered according to their corresponding singular value, which is the square root of the variance (i.e., eigenvalue) of X explained by the singular vectors. The columns of V are called the loadings. The columns of G = RΔ are called the factor scores or principal components of X, or simply scores or components. The matrix R of the left singular vectors of X (or the matrix G of the principal components) is then used to predict Y using standard multiple linear regression. This approach works well because the orthogonality of the singular vectors eliminates the multicollinearity problem. But the problem of choosing an optimum subset of predictors remains. A possible strategy is to keep only a few of the first components. But these components were originally chosen to explain X rather than Y, and so nothing guarantees that the principal components, which 'explain' X optimally, will be relevant for the prediction of Y.
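To make the PCR step concrete, here is a minimal sketch in Python with NumPy (the function name, the centering choices, and the use of a least-squares solver are illustrative additions by the editor, not part of the original text):

import numpy as np

def pcr(X, Y, n_components):
    """Principal component regression sketch: regress Y on the first
    principal components of the column-centered X (cf. Eqs 1 and 2)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # SVD of the centered predictors: Xc = R diag(delta) V^T  (Eq. 1)
    R, delta, Vt = np.linalg.svd(Xc, full_matrices=False)
    G_k = R[:, :n_components] * delta[:n_components]   # factor scores G = R Delta (kept columns)
    V_k = Vt.T[:, :n_components]                        # kept loadings
    # Ordinary least squares of Yc on the orthogonal scores (no multicollinearity left)
    B = np.linalg.lstsq(G_k, Yc, rcond=None)[0]
    def predict(X_new):
        # project new observations onto the kept loadings, then apply the regression
        return (X_new - x_mean) @ V_k @ B + y_mean
    return predict

As the text notes, the kept components maximize the variance of X that they explain, not their relevance for predicting Y.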
Simultaneous decomposition of predictors and dependent variables
So, PCA decomposes X in order to obtain components which best explain X. By contrast, PLS regression finds components from X that best predict Y. Specifically, PLS regression searches for a set of components (called latent vectors) that performs a simultaneous decomposition of X and Y with the constraint that these components explain as much as possible of the covariance between X and Y. This step generalizes PCA. It is followed by a regression step where the latent vectors obtained from X are used to predict Y.

PLS regression decomposes both X and Y as a product of a common set of orthogonal factors and a set of specific loadings. So, the independent variables are decomposed as:

X = TP^T with T^T T = I,    (3)

with I being the identity matrix (some variations of the technique do not require T to have unit norms; these variations differ mostly by the choice of the normalization, and they do not differ in their final prediction, but the differences in normalization may make comparisons between different implementations of the technique delicate). By analogy with PCA, T is called the score matrix, and P the loading matrix (in PLS regression the loadings are not orthogonal). Likewise, Y is estimated as:

Ŷ = TBC^T,    (4)

where B is a diagonal matrix with the 'regression weights' as diagonal elements and C is the 'weight matrix' of the dependent variables (see below for more details on the regression weights and the weight matrix). The columns of T are the latent vectors. When their number is equal to the rank of X, they perform an exact decomposition of X. Note, however, that the latent vectors provide only an estimate of Y (i.e., in general Ŷ is not equal to Y).

PLS REGRESSION AND COVARIANCE

The latent vectors could be chosen in a lot of different ways. In fact, in the previous formulation, any set of orthogonal vectors spanning the column space of X could be used to play the role of T. In order to specify T, additional conditions are required. For PLS regression this amounts to finding two sets of weights, denoted w and c, in order to create (respectively) a linear combination of the columns of X and a linear combination of the columns of Y such that these two linear combinations have maximum covariance. Specifically, the goal is to obtain a first pair of vectors:

t = Xw and u = Yc    (5)

with the constraints that w^T w = 1, t^T t = 1, and t^T u is maximal.


When the first latent vector is found, it is subtracted from both X and Y and the procedure is re-iterated until X becomes a null matrix (see the algorithm section for more details).

NIPALS: A PLS ALGORITHM

The properties of PLS regression can be analyzed from a sketch of the original algorithm (called nipals). The first step is to create two matrices: E = X and F = Y. These matrices are then column centered and normalized (i.e., transformed into Z-scores). The sums of squares of these matrices are denoted SS_X and SS_Y. Before starting the iteration process, the vector u is initialized with random values. The nipals algorithm then performs the following steps (in what follows the symbol ∝ means 'to normalize the result of the operation'):

Step 1. w ∝ E^T u (estimate X weights).
Step 2. t ∝ Ew (estimate X factor scores).
Step 3. c ∝ F^T t (estimate Y weights).
Step 4. u = Fc (estimate Y scores).

If t has not converged, then go to Step 1; if t has converged, then compute the value of b, which is used to predict Y from t, as b = t^T u, and compute the factor loadings for X as p = E^T t. Now subtract (i.e., partial out) the effect of t from both E and F as follows: E = E − tp^T and F = F − btc^T. This subtraction is called a deflation of the matrices E and F. The vectors t, u, w, c, and p are then stored in the corresponding matrices, and the scalar b is stored as a diagonal element of B. The sum of squares of X (respectively Y) explained by the latent vector is computed as p^T p (respectively b²), and the proportion of variance explained is obtained by dividing the explained sum of squares by the corresponding total sum of squares (i.e., SS_X and SS_Y).

If E is a null matrix, then the whole set of latent vectors has been found; otherwise the procedure can be re-iterated from Step 1 on.
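A compact NumPy transcription of this sketch follows (an illustrative reading of the steps above written by the editor, not the author's code; the convergence tolerance, the iteration cap, and the random initialization of u are arbitrary choices of this sketch):

import numpy as np

def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=1000):
    """Sketch of the NIPALS steps described above (illustrative only).
    X and Y are assumed to be already centered and normalized (Z-scores)."""
    E, F = np.array(X, dtype=float), np.array(Y, dtype=float)
    n = E.shape[0]
    T, U, W, C, P, b = [], [], [], [], [], []
    for _ in range(n_components):
        u = np.random.rand(n)                       # random start for u
        t_old = np.zeros(n)
        for _ in range(max_iter):
            w = E.T @ u; w /= np.linalg.norm(w)     # Step 1: estimate X weights
            t = E @ w;   t /= np.linalg.norm(t)     # Step 2: estimate X factor scores
            c = F.T @ t; c /= np.linalg.norm(c)     # Step 3: estimate Y weights
            u = F @ c                               # Step 4: estimate Y scores
            if np.linalg.norm(t - t_old) < tol:     # t has converged
                break
            t_old = t
        b_l = float(t @ u)                          # regression weight b = t'u
        p = E.T @ t                                 # factor loadings for X
        E = E - np.outer(t, p)                      # deflation of E
        F = F - b_l * np.outer(t, c)                # deflation of F
        T.append(t); U.append(u); W.append(w); C.append(c); P.append(p); b.append(b_l)
    return (np.column_stack(T), np.column_stack(U), np.column_stack(W),
            np.column_stack(C), np.column_stack(P), np.diag(b))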
PLS REGRESSION AND THE SINGULAR VALUE DECOMPOSITION

The nipals algorithm is obviously similar to the power method (for a description, see, e.g., Ref 24) which finds eigenvectors. So PLS regression is likely to be closely related to the eigen- and singular value decompositions (see Refs 22, 23 for an introduction to these notions), and this is indeed the case. For example, if we start from Step 1 of the algorithm, which computes w ∝ E^T u, and substitute the rightmost term iteratively, we find the following series of equations:

w ∝ E^T u ∝ E^T Fc ∝ E^T FF^T t ∝ E^T FF^T Ew.    (6)

This shows that the weight vector w is the first left singular vector of the matrix

S = E^T F.    (7)

Similarly, the first weight vector c is the first right singular vector of S. The same argument shows that the first vectors t and u are the first eigenvectors of EE^T FF^T and FF^T EE^T, respectively. This last observation is important from a computational point of view because it shows that the weight vectors can also be obtained from matrices of size I by I.25 This is useful when the number of variables is much larger than the number of observations (e.g., as in the 'small N, large P problem').
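This connection is easy to verify numerically. The short check below is an illustration added by the editor (it reuses the nipals_pls sketch defined earlier, and the comparison uses absolute values because singular vectors are only defined up to sign):

import numpy as np

# Assumes nipals_pls (the sketch above) is in scope.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 6))
Y = rng.standard_normal((20, 3))
X = (X - X.mean(0)) / X.std(0)                    # Z-scores, as assumed by NIPALS
Y = (Y - Y.mean(0)) / Y.std(0)

T, U, W, C, P, B = nipals_pls(X, Y, n_components=1, tol=1e-12, max_iter=10000)

S = X.T @ Y                                       # S = E'F for the first latent vector
W_svd, delta, Ct_svd = np.linalg.svd(S, full_matrices=False)

print(np.allclose(np.abs(W[:, 0]), np.abs(W_svd[:, 0]), atol=1e-5))   # w: first left singular vector of S
print(np.allclose(np.abs(C[:, 0]), np.abs(Ct_svd[0]), atol=1e-5))     # c: first right singular vector of S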
regression).
PREDICTION OF THE DEPENDENT VARIABLES

The dependent variables are predicted using the multivariate regression formula as:

Ŷ = TBC^T = XB_PLS with B_PLS = (P^T)^+ BC^T    (8)

(where (P^T)^+ is the Moore–Penrose pseudo-inverse of P^T, see Ref 26). This last equation assumes that both X and Y have been standardized prior to the prediction. In order to predict a nonstandardized matrix Y from a nonstandardized matrix X, we use B*_PLS, which is obtained by reintroducing the original units into B_PLS and adding a first row corresponding to the intercept (when using the original units, X needs to be augmented with a first column of 1s, as in multiple regression).

If all the latent variables of X are used, this regression is equivalent to PCR. When only a subset of the latent variables is used, the prediction of Y is optimal for this number of predictors.

The interpretation of the latent variables is often facilitated by examining graphs akin to PCA graphs (e.g., by plotting observations in a t1 × t2 space, see Figure 1).
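In code, Eq. (8) amounts to a single pseudo-inverse. The sketch below is illustrative (it reuses the matrices returned by the nipals_pls sketch above) and uses np.linalg.pinv for the Moore–Penrose pseudo-inverse:

import numpy as np

def pls_regression_weights(P, B, C):
    """B_PLS = (P^T)^+ B C^T (Eq. 8): maps the standardized X to the predicted Y."""
    return np.linalg.pinv(P.T) @ B @ C.T

# Usage with the output of nipals_pls (X and Y standardized):
#   T, U, W, C, P, B = nipals_pls(X, Y, n_components=L)
#   Y_hat = X @ pls_regression_weights(P, B, C)    # equals T B C' when X = T P'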


STATISTICAL INFERENCE: EVALUATING THE QUALITY OF THE PREDICTION

Fixed effect model
The quality of the prediction obtained from PLS regression described so far corresponds to a fixed effect model (i.e., the set of observations is considered as the population of interest, and the conclusions of the analysis are restricted to this set). In this case, the analysis is descriptive and the amount of variance (of X and Y) explained by a latent vector indicates its importance for the set of data under scrutiny. In this context, latent variables are worth considering if their interpretation is meaningful within the research context.

For a fixed effect model, the overall quality of a PLS regression model using L latent variables is evaluated by first computing the predicted matrix of dependent variables, denoted Ŷ^[L], and then measuring the similarity between Ŷ^[L] and Y. Several coefficients are available for the task. The squared coefficient of correlation is sometimes used, as well as its matrix-specific cousin the RV coefficient.27 The most popular coefficient, however, is the residual sum of squares, abbreviated as RESS. It is computed as:

RESS = ‖Y − Ŷ^[L]‖²,    (9)

(where ‖Y‖ is the norm of Y, i.e., the square root of the sum of squares of the elements of Y). The smaller the value of RESS, the better the prediction, with a value of 0 indicating perfect prediction. For a fixed effect model, the larger L (i.e., the number of latent variables used), the better the prediction.

[FIGURE 1 | PLS regression. (a) Projection of the wines and the predictors on the first two latent vectors (respectively matrices T and W); the axes are the first two latent variables (LV1 and LV2), the predictors are Price, Sugar, Alcohol, and Acidity, and the points are the five wines. (b) Circle of correlation showing the correlation between the original dependent variables (matrix Y: Hedonic, Meat, Dessert) and the latent vectors (matrix T).]

Random effect model
In most applications, however, the set of observations is a sample from some population of interest. In this context, the goal is to predict the value of the dependent variables for new observations originating from the same population as the sample. This corresponds to a random model. In this case, the amount of variance explained by a latent variable indicates its importance in the prediction of Y. In this context, a latent variable is relevant only if it improves the prediction of Y for new observations. And this, in turn, opens the problem of which and how many latent variables should be kept in the PLS regression model in order to achieve optimal generalization (i.e., optimal prediction for new observations). In order to estimate the generalization capacity of PLS regression, standard parametric approaches cannot be used, and therefore the performance of a PLS regression model is evaluated with computer-based resampling techniques such as the bootstrap and cross-validation techniques where the data are separated into a learning set (to build the model) and a testing set (to test the model). A popular example of this last approach is the jackknife (sometimes called the 'leave-one-out' approach).


In the jackknife,28,29 each observation is, in turn, dropped from the data set; the remaining observations constitute the learning set and are used to build a PLS regression model that is applied to predict the left-out observation, which then constitutes the testing set. With this procedure, each observation is predicted according to a random effect model. These predicted observations are then stored in a matrix denoted Ỹ.

For a random effect model, the overall quality of a PLS regression model using L latent variables is evaluated by using L variables to compute—according to the random model—the matrix denoted Ỹ^[L], which stores the predicted values of the observations for the dependent variables. The quality of the prediction is then evaluated as the similarity between Ỹ^[L] and Y. As for the fixed effect model, this can be done with the squared coefficient of correlation (sometimes called, in this context, the 'cross-validated r'30) as well as the RV coefficient. By analogy with the RESS coefficient, one can also use the predicted residual sum of squares, abbreviated PRESS. It is computed as:

PRESS = ‖Y − Ỹ^[L]‖².    (10)

The smaller the value of PRESS, the better the prediction for a random effect model, with a value of 0 indicating perfect prediction.

How many latent variables?
By contrast with the fixed effect model, the quality of prediction for a random model does not always increase with the number of latent variables used in the model. Typically, the quality first increases and then decreases. If the quality of the prediction decreases when the number of latent variables increases, this indicates that the model is overfitting the data (i.e., the information useful to fit the observations from the learning set is not useful to fit new observations). Therefore, for a random model, it is critical to determine the optimal number of latent variables to keep for building the model. A straightforward approach is to stop adding latent variables as soon as the PRESS stops decreasing. A more elaborate approach (see, e.g., Ref 16) starts by computing, for the ℓth latent variable, the ratio Q²_ℓ defined as:

Q²_ℓ = 1 − PRESS_ℓ / RESS_{ℓ−1},    (11)

with PRESS_ℓ (resp. RESS_{ℓ−1}) being the value of PRESS (resp. RESS) for the ℓth (resp. ℓ − 1th) latent variable [where RESS_0 = K × (I − 1)]. A latent variable is kept if its value of Q²_ℓ is larger than some arbitrary value generally set equal to (1 − 0.95²) = 0.0975 (an alternative set of values sets the threshold to .05 when I ≤ 100 and to 0 when I > 100, see Refs 16, 31). Obviously, the choice of the threshold is important from a theoretical point of view, but, from a practical point of view, the values indicated above seem satisfactory.
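A leave-one-out sketch of PRESS_ℓ and Q²_ℓ along these lines (illustrative; it reuses the nipals_pls and pls_regression_weights sketches above, and it standardizes the data only once, whereas a stricter implementation would re-standardize within each leave-one-out fold):

import numpy as np

def press_and_q2(X, Y, max_components):
    """Leave-one-out PRESS and Q2 per number of latent variables (sketch).
    Assumes X and Y are already standardized, so that RESS_0 = K (I - 1)."""
    I, K = Y.shape
    press = np.zeros(max_components)
    ress = np.zeros(max_components + 1)
    ress[0] = K * (I - 1)                            # RESS_0 = K x (I - 1)
    for L in range(1, max_components + 1):
        # fixed-effect residual sum of squares with L latent variables
        _, _, _, C, P, B = nipals_pls(X, Y, L)
        ress[L] = np.sum((Y - X @ pls_regression_weights(P, B, C)) ** 2)
        # jackknife: drop each observation in turn and predict it
        for i in range(I):
            keep = np.arange(I) != i
            _, _, _, C, P, B = nipals_pls(X[keep], Y[keep], L)
            y_pred = X[i] @ pls_regression_weights(P, B, C)
            press[L - 1] += np.sum((Y[i] - y_pred) ** 2)
    q2 = 1.0 - press / ress[:-1]                     # Q2_l = 1 - PRESS_l / RESS_(l-1)
    return press, ress[1:], q2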
illustrated in Figure 1a and b which display in (a) the
Bootstrap confidence intervals for the dependent variables
When the number of latent variables of the model has been decided, confidence intervals for the predicted values can be derived using the bootstrap.32 When using the bootstrap, a large number of samples is obtained by drawing, for each sample, observations with replacement from the learning set. Each sample provides a value of B_PLS which is used to estimate the values of the observations in the testing set. The distribution of the values of these observations is then used to estimate the sampling distribution and to derive confidence intervals.
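A sketch of this bootstrap scheme (illustrative; the percentile intervals, the number of bootstrap samples, and the function name are choices made here, not prescriptions from the text; it reuses the nipals_pls and pls_regression_weights sketches above):

import numpy as np

def bootstrap_prediction_intervals(X_learn, Y_learn, X_test, n_latent,
                                   n_boot=1000, alpha=0.05, seed=0):
    """Resample the learning set with replacement, refit PLS each time,
    and collect the predictions for the testing set (percentile intervals)."""
    rng = np.random.default_rng(seed)
    I = X_learn.shape[0]
    preds = np.empty((n_boot, X_test.shape[0], Y_learn.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, I, size=I)             # draw observations with replacement
        _, _, _, C, P, B = nipals_pls(X_learn[idx], Y_learn[idx], n_latent)
        preds[b] = X_test @ pls_regression_weights(P, B, C)
    lower = np.percentile(preds, 100 * alpha / 2, axis=0)
    upper = np.percentile(preds, 100 * (1 - alpha / 2), axis=0)
    return lower, upper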
A SMALL EXAMPLE

We want to predict the subjective evaluation of a set of five wines. The dependent variables that we want to predict for each wine are its likeability, and how well it goes with meat or dessert (as rated by a panel of experts, see Table 1). The predictors are the price, sugar, alcohol, and acidity content of each wine (see Table 2).

The different matrices created by PLS regression are given in Tables 3–13. From Table 9, one can find that two latent vectors explain 98% of the variance of X and 85% of Y. This suggests that these two dimensions should be kept for the final solution as a fixed effect model. The examination of the two-dimensional regression coefficients (i.e., B_PLS, see Table 10) shows that sugar is mainly responsible for choosing a dessert wine, and that price is negatively correlated with the perceived quality of the wine (at least in this example . . . ), whereas alcohol is positively correlated with it. Looking at the latent vectors shows that t1 expresses price and t2 reflects sugar content. This interpretation is confirmed and illustrated in Figure 1a and b, which display in (a) the projections on the latent vectors of the wines (matrix T) and the predictors (matrix W), and in (b) the correlation between the original dependent variables and the projection of the wines on the latent vectors.

From Table 9, we find that PRESS reaches its minimum value for a model including only the first latent variable and that Q² is larger than .0975 only for the first latent variable.


So, both PRESS and Q² suggest that a model including only the first latent variable is optimal for generalization to new observations. Consequently, we decided to keep one latent variable for the random PLS regression model. Tables 12 and 13 display the predicted matrices Ŷ and Ỹ when the prediction uses one latent vector.

TABLE 1  The Matrix Y of the Dependent Variables
Wine   Hedonic   Goes with Meat   Goes with Dessert
1      14        7                8
2      10        7                6
3      8         5                5
4      2         4                7
5      6         2                4

TABLE 2  The X Matrix of Predictors
Wine   Price   Sugar   Alcohol   Acidity
1      7       7       13        7
2      4       3       14        7
3      10      5       12        5
4      16      7       11        3
5      13      3       10        3

TABLE 3  The Matrix T
Wine   t1        t2        t3
1      0.4538    −0.4662   0.5716
2      0.5399    0.4940    −0.4631
3      0         0         0
4      −0.4304   −0.5327   −0.5301
5      −0.5633   0.5049    0.4217

TABLE 4  The Matrix U
Wine   u1        u2        u3
1      1.9451    −0.7611   0.6191
2      0.9347    0.5305    −0.5388
3      −0.2327   0.6084    0.0823
4      −0.9158   −1.1575   −0.6139
5      −1.7313   0.7797    0.4513

TABLE 5  The Matrix P
          p1        p2        p3
Price     −1.8706   −0.6845   −0.1796
Sugar     0.0468    −1.9977   0.0829
Alcohol   1.9547    0.0283    −0.4224
Acidity   1.9874    0.0556    0.2170

SYMMETRIC PLS REGRESSION: BPLS REGRESSION

Interestingly, two different, but closely related, techniques exist under the name of PLS regression. The technique described so far originated from the work of Wold and Martens. In this version of PLS regression, the latent variables are computed from a succession of singular value decompositions followed by deflations of both X and Y. The goal of the analysis is to predict Y from X and therefore the roles of X and Y are asymmetric. As a consequence, the latent variables computed to predict Y from X are different from the latent variables computed to predict X from Y.

A related technique, also called PLS regression, originated from the work of Bookstein (Ref 44; see also Ref 33 for early related ideas, and Refs 8 and 45 for later applications). To distinguish this version of PLS regression from the previous one, we will call it BPLS regression.

This technique is particularly popular for the analysis of brain imaging data (probably because it requires much less computational time, which is critical taking into account the very large size of brain imaging data sets). Just like standard PLS regression (cf. Eqs (6) and (7)), BPLS regression starts with the matrix

S = X^T Y.    (12)

The matrix S is then decomposed using its singular value decomposition as:

S = WΔC^T with W^T W = C^T C = I,    (13)

(where W and C are the matrices of the left and right singular vectors of S, and Δ is the diagonal matrix of the singular values, cf. Eq. (1)). In BPLS regression, the latent variables for X and Y are obtained as (cf. Eq. (5)):

T = XW and U = YC.    (14)
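Eqs (12)–(14) translate directly into a few lines (an illustrative sketch; X and Y are assumed to be centered or standardized, as in the asymmetric case):

import numpy as np

def bpls_latent_variables(X, Y):
    """Symmetric (Bookstein) PLS: a single SVD of S = X'Y gives all latent variables."""
    S = X.T @ Y                                              # Eq. (12)
    W, delta, Ct = np.linalg.svd(S, full_matrices=False)     # S = W diag(delta) C'  (Eq. 13)
    C = Ct.T
    T = X @ W                                                # Eq. (14): latent variables for X
    U = Y @ C                                                #           latent variables for Y
    return T, U, W, C, delta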
Because BPLS regression uses a single singular value decomposition to compute the latent variables, they will be identical if the roles of X and Y are reversed: BPLS regression treats X and Y symmetrically. So, while standard PLS regression is akin to multiple regression, BPLS regression is akin to correlation or canonical correlation.34


BPLS regression, however, differs from canonical correlation because BPLS regression extracts the variance common to X and Y, whereas canonical correlation seeks linear combinations of X and Y having the largest correlation. In fact, the name partial least squares covariance analysis or canonical covariance analysis would probably be more appropriate for BPLS regression.

TABLE 6  The Matrix W
          w1        w2        w3
Price     −0.5137   −0.3379   −0.3492
Sugar     0.2010    −0.9400   0.1612
Alcohol   0.5705    −0.0188   −0.8211
Acidity   0.6085    0.0429    0.4218

TABLE 7  The Matrix C
                    c1       c2        c3
Hedonic             0.6093   0.0518    0.9672
Goes with meat      0.7024   −0.2684   −0.2181
Goes with dessert   0.3680   −0.9619   −0.1301

TABLE 8  The b Vector
b1       b2       b3
2.7568   1.6272   1.1191

TABLE 9  Variance of X and Y Explained by the Latent Vectors, RESS, PRESS, and Q2
Latent Vector   % Explained Variance for X   Cumulative % for X   % Explained Variance for Y   Cumulative % for Y   RESS    PRESS    Q2
1               70                           70                   63                           63                   32.11   95.11    7.93
2               28                           98                   22                           85                   25.00   254.86   −280
3               2                            100                  10                           95                   1.25    101.56   −202.89

TABLE 10  The Matrix B_PLS When Two Latent Vectors Are Used
          Hedonic   Goes with meat   Goes with dessert
Price     −0.2662   −0.2498          0.0121
Sugar     0.0616    0.3197           0.7900
Alcohol   0.2969    0.3679           0.2568
Acidity   0.3011    0.3699           0.2506

TABLE 11  The Matrix B*_PLS When Two Latent Vectors Are Used
            Hedonic   Goes with meat   Goes with dessert
Intercept   −3.2809   −3.3770          −1.3909
Price       −0.2559   −0.1129          0.0063
Sugar       0.1418    0.3401           0.6227
Alcohol     0.8080    0.4856           0.2713
Acidity     0.6870    0.3957           0.1919

TABLE 12  The Matrix Ŷ When One Latent Vector Is Used
Wine   Hedonic   Goes with meat   Goes with dessert
1      11.4088   6.8641           6.7278
2      12.0556   7.2178           6.8659
3      8.0000    5.0000           6.0000
4      4.7670    3.2320           5.3097
5      3.7686    2.6860           5.0965

TABLE 13  The Matrix Ỹ When One Latent Vector Is Used
Wine   Hedonic   Goes with meat   Goes with dessert
1      8.5877    5.7044           5.5293
2      12.7531   7.0394           7.6005
3      8.0000    5.0000           6.2500
4      6.8500    3.1670           4.4250
5      3.9871    4.1910           6.5748

Varieties of BPLS regression
BPLS regression exists in three main varieties, one of which is specific to brain imaging. The first variety of BPLS regression is used to analyze experimental results; it is called behavior BPLS regression if the Y matrix consists of measures, or task BPLS regression if the Y matrix consists of contrasts or describes the experimental conditions with dummy coding.

The second variety is called mean centered task BPLS regression and is closely related to barycentric discriminant analysis (e.g., discriminant correspondence analysis, see Ref 35).


Like discriminant analysis, this approach is suited for data in which the observations originate from groups defined a priori, but, unlike discriminant analysis, it can be used for small N, large P problems. The X matrix contains the deviations of the observations to the average vector of all the observations, and the Y matrix uses a dummy code to identify the group to which each observation belongs (i.e., Y has as many columns as there are groups, with a value of 1 at the intersection of the ith row and the kth column indicating that the ith row belongs to the kth group, whereas a value of 0 indicates that it does not). With this coding scheme, the S matrix contains the group barycenters and the BPLS regression analysis of this matrix is equivalent to a PCA of the matrix of the barycenters (which is the first step of barycentric discriminant analysis).

The third variety, which is specific to brain imaging, is called seed PLS regression. It is used to study patterns of connectivity between brain regions. Here, the columns of a matrix of brain measurements (where rows are scans and columns are voxels) are partitioned into two sets: a small one called the seed and a larger one representing the rest of the brain. In this context, the S matrix contains the correlations between the columns of the seed and the rest of the brain. The analysis of the S matrix reveals the pattern of connectivity between the seed and the rest of the brain.

RELATIONSHIP WITH OTHER TECHNIQUES

PLS regression is obviously related to canonical correlation (see Ref 34), statis, and multiple factor analysis (see Refs 36, 37 for an introduction to these techniques). These relationships are explored in detail in Refs 16, 38–40, and in the volume edited by Esposito Vinzi et al.19 The main original goal of PLS regression is to preserve the asymmetry of the relationship between predictors and dependent variables, whereas these other techniques treat them symmetrically.

By contrast, BPLS regression is a symmetric technique and therefore is closely related to canonical correlation, but BPLS regression seeks to extract the variance common to X and Y, whereas canonical correlation seeks linear combinations of X and Y having the largest correlation (some connections between BPLS regression and other multivariate techniques relevant for brain imaging are explored in Refs 41–43). The relationships between BPLS regression and statis or multiple factor analysis have not been analyzed formally, but these techniques are likely to provide similar conclusions.

SOFTWARE

PLS regression necessitates sophisticated computations and therefore its application depends on the availability of software. For chemistry, two main programs are used: the first one, called simca-p, was developed originally by Wold; the second one, called the Unscrambler, was first developed by Martens. For brain imaging, spm, which is one of the most widely used programs in this field, has recently (2002) integrated a PLS regression module. Outside these domains, several standard commercial statistical packages (e.g., SAS, SPSS, Statistica) include PLS regression. The public domain R language also includes PLS regression. A dedicated public domain program called Smartpls is also available.

In addition, interested readers can download a set of matlab programs from the author's home page (www.utdallas.edu/∼herve). Also, a public domain set of matlab programs is available from the home page of the N-Way project (www.models.kvl.dk/source/nwaytoolbox/) along with tutorials and examples. Staying with matlab, the statistical toolbox includes a PLS regression routine.

For brain imaging (a domain where the Bookstein approach is, by far, the most popular PLS regression approach), a special toolbox written in matlab (by McIntosh, Chau, Lobaugh, and Chen) is freely available from www.rotman-baycrest.on.ca:8080. And, finally, a commercial matlab toolbox has also been developed by Eigenresearch.

REFERENCES
1. Wold H. Estimation of principal components and related models by iterative least squares. In: Krishnaiah PR, ed. Multivariate Analysis. New York: Academic Press; 1966, 391–420.
2. Wold S. Personal memories of the early PLS development. Chemometrics Intell Lab Syst 2001, 58:83–84.
3. Martens H, Naes T. Multivariate Calibration. London: John Wiley & Sons; 1989.


4. Fornell C, Lorange P, Roos J. The cooperative venture formation process: a latent variable structural modeling approach. Manage Sci 1990, 36:1246–1255.
5. Hulland J. Use of partial least squares in strategic management research: a review of four recent studies. Strategic Manage J 1999, 20:195–204.
6. Graham JL, Evenko LI, Rajan MN. An empirical comparison of Soviet and American business negotiations. J Int Business Studies 1992, 5:387–418.
7. Worsley KJ. An overview and some new developments in the statistical analysis of PET and fMRI data. Hum Brain Mapp 1997, 5:254–258.
8. McIntosh AR, Lobaugh NJ. Partial least squares analysis of neuroimaging data: applications and advances. Neuroimage 2004, 23:250–263.
9. Giessing C, Fink GR, Rösler F, Thiel CM. fMRI data predict individual differences of behavioral effects of nicotine: a partial least square analysis. J Cogn Neurosci 2007, 19:658–670.
10. Kovacevic N, McIntosh R. Groupwise independent component decomposition of EEG data and partial least square analysis. NeuroImage 2007, 35:1103–1112.
11. Wang JY, Bakhadirov K, Devous MD Sr, Abdi H, McColl R, et al. Diffusion tensor tractography of traumatic diffuse axonal injury. Arch Neurol 2008, 65:619–626.
12. Burnham AJ, Viveros R, MacGregor JF. Frameworks for latent variable multivariate regression. J Chemom 1996, 10:31–45.
13. Garthwaite P. An interpretation of partial least squares. J Am Stat Assoc 1994, 89:122–127.
14. Höskuldsson A. Weighting schemes in multivariate data analysis. J Chemom 2001, 15:371–396.
15. Phatak A, de Jong S. The geometry of partial least squares. J Chemom 1997, 11:311–338.
16. Tenenhaus M. La régression PLS. Paris: Technip; 1998.
17. Ter Braak CJF, de Jong S. The objective function of partial least squares regression. J Chemom 1998, 12:41–54.
18. Höskuldsson A. Modelling procedures for directed network of data blocks. Chemometrics Intell Lab Syst 2009, 97:3–10.
19. Esposito Vinzi V, Chin WW, Henseler J, Wang H, eds. Handbook of Partial Least Squares: Concepts, Methods and Applications in Marketing and Related Fields. New York: Springer Verlag; 2009.
20. Draper NR, Smith H. Applied Regression Analysis. 3rd ed. New York: John Wiley & Sons; 1998.
21. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 1970, 12:55–67.
22. Abdi H. Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007a, 907–912.
23. Abdi H. Eigen-decomposition: eigenvalues and eigenvectors. In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007b, 304–308.
24. Abdi H, Valentin D, Edelman B. Neural Networks. Thousand Oaks, CA: Sage; 1999.
25. Rännar S, Lindgren F, Geladi P, Wold S. A PLS kernel algorithm for data sets with many variables and fewer objects. Part 1: theory and algorithms. J Chemom 1994, 8:111–125.
26. Abdi H. Linear algebra for neural networks. In: Smelser NJ, Baltes PB, eds. International Encyclopedia of the Social and Behavioral Sciences. Oxford, UK: Elsevier; 2001.
27. Abdi H. RV coefficient and congruence coefficient. In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007c, 849–853.
28. Quenouille M. Notes on bias and estimation. Biometrika 1956, 43:353–360.
29. Efron B. The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia, PA: SIAM; 1982.
30. Wakeling IN, Morris J. A test for significance for partial least squares regression. J Chemometrics 1993, 7:291–304.
31. Wold S. PLS for multivariate linear modelling. In: van de Waterbeemd H, ed. QSAR: Chemometric Methods in Molecular Design. Methods and Principles in Medicinal Chemistry, Vol. 2. Weinheim, Germany: Verlag Chemie; 1995.
32. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman and Hall; 1993.
33. Tucker LR. Determination of parameters of a functional relation by factor analysis. Psychometrika 1958, 23:19–23.
34. Gittins R. Canonical Analysis: A Review with Applications in Ecology. New York: Springer; 1985.
35. Abdi H. Discriminant correspondence analysis. In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007d, 270–275.
36. Abdi H, Valentin D. STATIS. In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007a, 955–962.
37. Abdi H, Valentin D. Multiple factor analysis. In: Salkind NJ, ed. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage; 2007b, 651–657.
38. Pagès J, Tenenhaus M. Multiple factor analysis combined with PLS regression path modeling. Application to the analysis of relationships between physicochemical variables, sensory profiles and hedonic judgments. Chemometrics Intell Lab Syst 2001, 58:261–273.
39. Abdi H. Multivariate analysis. In: Lewis-Beck M, Bryman A, Futing T, eds. Encyclopedia for Research Methods for the Social Sciences. Thousand Oaks, CA: Sage; 2003b, 699–702.


40. Rosipal R, Krämer N. Overview and recent advances in partial least squares. In: Saunders C, Grobelnik M, Gunn J, Shawe-Taylor J, eds. Subspace, Latent Structure and Feature Selection: Statistical and Optimization Perspectives Workshop (SLSFS 2005). New York: Springer-Verlag; 2006, 34–51.
41. Kherif F, Poline JB, Flandin G, Benali H, Simon O, et al. Multivariate model specification for fMRI data. NeuroImage 2002, 16:1068–1083.
42. Friston K, Büchel C. Functional integration. In: Frackowiak RSJ, Friston KJ, Frith CD, Dolan RJ, Price CJ, et al., eds. Human Brain Function. New York: Elsevier; 2004, 999–1019.
43. Lazar NA. The Statistical Analysis of Functional MRI Data. New York: Springer; 2008.
44. Bookstein FL, Streissguth AP, Sampson PD, Connor PD, Barr HM. Corpus callosum shape and neuropsychological deficits in adult males with heavy fetal alcohol exposure. NeuroImage 2002, 15:233–251.
45. Bookstein FL. Partial least squares: a dose-response model for measurement in the behavioral and brain sciences. [Revised] Psycoloquy [an electronic journal] 1994, 5:(23).

FURTHER READING
Abdi H. PLS regression. In: Lewis-Beck M, Bryman A, Futing T, eds. Encyclopedia for Research Methods for the Social Sciences. Thousand Oaks, CA: Sage; 2003a, 792–795.
Abdi H, Williams LJ. Principal component analysis. WIREs Comp Stat.
Escofier B, Pagès J. Analyses Factorielles Multiples. Paris: Dunod; 1988.
Frank IE, Friedman JH. A statistical view of chemometrics regression tools. Technometrics 1993, 35:109–148.
Helland IS. PLS and statistical models. Scand J Stat 1990, 17:97–114.
Höskuldsson A. PLS methods. J Chemom 1988, 2:211–228.
Geladi P, Kowalski B. Partial least squares regression: a tutorial. Anal Chim Acta 1986, 35:1–17.
Martens H, Martens M. Multivariate Analysis of Quality: An Introduction. London: John Wiley & Sons; 2001.
McIntosh AR, Bookstein FL, Haxby JV, Grady CL. Spatial pattern analysis of functional brain images using partial least squares. Neuroimage 1996, 3:143–157.

