BIOINFORMATICS Vol. 17 no. 6 2001 Pages 520–525

Missing value estimation methods for DNA microarrays
Olga Troyanskaya 1, Michael Cantor 1, Gavin Sherlock 2, Pat Brown 3, Trevor Hastie 4, Robert Tibshirani 4, David Botstein 2 and Russ B. Altman 1,∗
1 Stanford Medical Informatics, 2 Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA, 3 Department of Biochemistry, Stanford University School of Medicine, and Howard Hughes Medical Institute, Stanford, CA, USA and 4 Departments of Statistics and Health Research and Policy, Stanford University, Stanford, CA, USA

Downloaded from http://bioinformatics.oxfordjournals.org/ at Georgetown University on September 27, 2014

Received on November 13, 2000; revised on February 22, 2001; accepted on February 26, 2001

ABSTRACT
Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are therefore needed to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data.
Results: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1–20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
Availability: The software is available at http://smi-web.stanford.edu/projects/helix/pubs/impute/
Contact: russ.altman@stanford.edu

∗ To whom correspondence should be addressed.
© Oxford University Press 2001

INTRODUCTION
DNA microarray technology allows for the monitoring of expression levels of thousands of genes under a variety of conditions (DeRisi et al., 1997; Spellman et al., 1998). Microarrays have been used to study a variety of biological processes, from differential gene expression in human tumors (Perou et al., 2000) to yeast sporulation (Chu et al., 1998). Various analysis techniques have been developed, aimed primarily at identifying regulatory patterns or similarities in expression under similar conditions. Commonly used analysis methods include clustering techniques (Eisen et al., 1998; Tamayo et al., 1999), techniques based on partitioning of data (Heyer et al., 1999; Tamayo et al., 1999), as well as various supervised learning algorithms (Alter et al., 2000; Brown et al., 2000; Golub et al., 1999; Raychaudhuri et al., 2000; Hastie et al., 2000).

The data from microarray experiments usually take the form of large matrices of expression levels of genes (rows) under different experimental conditions (columns), frequently with some values missing. Missing values occur for diverse reasons, including insufficient resolution, image corruption, or simply dust or scratches on the slide. Missing data may also occur systematically as a result of the robotic methods used to create the arrays. Our informal analysis of the distribution of missing data in real samples shows a combination of all of these, with none dominating. Such suspicious data are usually manually flagged and excluded from subsequent analysis (Alizadeh et al., 2000). Many analysis methods, such as principal component analysis or singular value decomposition, require complete matrices (Alter et al., 2000; Raychaudhuri et al., 2000). Of course, one solution to the missing data problem is to repeat the experiment. This strategy can be expensive, but has been used in
validation of microarray analysis algorithms (Butte et al., 2001). Missing log2-transformed data are often replaced by zeros (Alizadeh et al., 2000) or, less often, by an average expression over the row, or 'row average'. This approach is not optimal, since these methods do not take into consideration the correlation structure of the data. Thus, many analysis techniques, including hierarchical clustering, k-means clustering, and self-organizing maps, may benefit from using more accurately estimated missing values.

There is not a large published literature concerning missing value estimation for microarray data, but much work has been devoted to similar problems in other fields. The question has been studied in the contexts of non-response issues in sample surveys and missing data in experiments (Little and Rubin, 1987). Common methods include filling in least squares estimates, iterative analysis of variance methods (Yates, 1933), randomized inference methods, and likelihood-based approaches (Wilkinson, 1958). An algorithm similar to nearest neighbors was used to handle missing values in CART-like algorithms (Loh and Vanichsetakul, 1988). The most commonly applied statistical techniques for dealing with missing data are model-based approaches. We have tried to minimize the influence of specific modeling assumptions in our methods.

In this work, we describe and evaluate three methods of estimation for missing values in DNA microarrays. We compare our KNN- and SVD-based methods to the row average method, which is likely the most sophisticated estimation technique currently employed for microarray missing data estimation.

SYSTEM AND METHODS
Experimental methods
We implemented and evaluated three data imputation methods: a method based on the K Nearest Neighbors (KNN) algorithm, a Singular Value Decomposition based method, and simple row (gene) average.

Three microarray data sets were used. The first is a study in the yeast Saccharomyces cerevisiae focusing on identification of cell-cycle regulated genes (Spellman et al., 1998). The second is an exploration of temporal gene expression during the metabolic shift from fermentation to respiration in Saccharomyces cerevisiae (DeRisi et al., 1997). The third is a study of response to environmental changes in yeast (Gasch et al., 2000). Two of the data sets were time-series data (DeRisi et al., 1997; Spellman et al., 1998) and one contained a non-time series subset of experiments from Gasch et al. (2000). In addition, one of the time-series data sets contained less apparent noise (Botstein, personal communication) than the other. We refer to these data sets by their characteristics: time series, noisy time series, and non-time series.

Each data set was pre-processed for the evaluation by removing rows and columns containing missing expression values, yielding 'complete' matrices. The methods were then evaluated over each data set as follows. Between 1 and 20% of the data were deleted at random to create test data sets. Each method was then used to recover the introduced missing values for each data set, and the estimated values were compared to those in the original data set. The metric used to assess the accuracy of estimation (henceforth referred to as normalized RMS error) was calculated as the Root Mean Squared (RMS) difference between the imputed matrix and the original matrix, divided by the average data value in the complete data set. This normalization allowed for comparison of estimation accuracy between different data sets.

We examined different parameter sets for the KNN- and SVD-based algorithms. For KNN, the number of neighboring genes used for estimation was varied. For SVD, different numbers of principal components, here termed 'eigengenes' in the sense of Alter et al. (2000), were used. Thus the experimental design allowed us to assess the accuracy of each method under different conditions (type of data, fraction of data missing) and to determine optimal parameters.

KNNimpute algorithm
The KNN-based method selects genes with expression profiles similar to the gene of interest to impute missing values. Consider a gene A that has one missing value in experiment 1. The method finds the K other genes that have a value present in experiment 1 and whose expression is most similar to A in experiments 2–N (where N is the total number of experiments). A weighted average of the values in experiment 1 from these K closest genes is then used as the estimate for the missing value in gene A. In the weighted average, the contribution of each gene is weighted by the similarity of its expression to that of gene A.

After examining a number of metrics for gene similarity (Pearson correlation, Euclidean distance, variance minimization), we determined that Euclidean distance was a sufficiently accurate norm. This finding is somewhat surprising, given that the Euclidean distance measure is often sensitive to outliers, which could be present in microarray data. However, we found that log-transforming the data seems to sufficiently reduce the effect of outliers on gene similarity determination.

SVDimpute algorithm
In this method, we employ singular value decomposition (1) to obtain a set of mutually orthogonal expression patterns that can be linearly combined to approximate the expression of all genes in the data set. These patterns, which in this case are identical to the principal components of the gene expression matrix, are further referred to as eigengenes (Alter et al., 2000; Anderson, 1984; Golub and Van Loan, 1996).

$$A_{m\times n} = U_{m\times m}\,\Sigma_{m\times n}\,V_{n\times n}^{T} \qquad (1)$$

Matrix $V^T$ now contains eigengenes, whose contribution to the expression in the eigenspace is quantified by the corresponding eigenvalues on the diagonal of matrix $\Sigma$. We then identify the most significant eigengenes by sorting the eigengenes based on their corresponding eigenvalue. Alter et al. (2000) have shown that several significant eigengenes are sufficient to describe most of the expression data. However, the exact fraction of eigengenes best for estimation needs to be determined empirically.

[Figure 1 (plot): normalized RMS error versus number of genes used as neighbors; one curve each for 1%, 5%, 10%, 15% and 20% of entries missing.]
Fig. 1. Effect of number of nearest neighbors used for KNN-based estimation on noisy time series data. Different curves correspond to experiments performed for data sets with different percent of entries missing.

Once the $k$ most significant eigengenes from $V^T$ are selected, we estimate a missing value $j$ in gene $i$ in two steps. First, gene $i$ is regressed against the $k$ eigengenes. Second, the regression coefficients are used to reconstruct value $j$ as a linear combination of the $k$ eigengenes. The $j$th value of gene $i$ and the $j$th values of the $k$ eigengenes are not used in determining these regression coefficients.

It should be noted that SVD can only be performed on complete matrices. We therefore initially substitute the row average for all missing values in matrix $A$, obtaining $A'$. We then use an expectation maximization method to arrive at the final estimate, as follows. Each missing value in $A'$ is estimated using the above algorithm, and the procedure is then repeated on the newly obtained matrix until the total change in the matrix falls below the empirically determined threshold of 0.01.
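The SVDimpute procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' released software. The following are assumptions, because the text does not specify them:

- the function name `svd_impute`;
- the `max_iter` safety cap;
- reading "total change in the matrix" as the sum of absolute changes.

Genes are rows, arrays are columns, and `np.nan` marks missing values. The eigengenes are taken as the first k rows of V^T returned by `numpy.linalg.svd`, which orders singular values from largest to smallest.

```python
import numpy as np


def svd_impute(data, k, tol=0.01, max_iter=100):
    """Sketch of SVD-based imputation in the style of SVDimpute.

    data: 2-D array, genes in rows and arrays in columns, np.nan = missing.
    k: number of most significant eigengenes (rows of V^T) to use.
    tol: stop once the summed absolute change in the matrix falls below
         this value (assumed reading of the 0.01 threshold).
    max_iter: safety cap on iterations (an assumption, not from the paper).
    """
    A = np.array(data, dtype=float)
    missing = np.isnan(A)
    n = A.shape[1]

    # Initial estimate A': replace every missing value with its row average.
    row_means = np.nanmean(A, axis=1)
    filled = np.where(missing, row_means[:, None], A)

    for _ in range(max_iter):
        # Rows of V^T are the eigengenes, ordered by decreasing singular
        # value, so the first k are the most significant ones.
        _, _, vt = np.linalg.svd(filled, full_matrices=False)
        eigengenes = vt[:k]

        updated = filled.copy()
        for i, j in zip(*np.nonzero(missing)):
            # Regress gene i on the k eigengenes with column j left out, so
            # neither gene i's value at j nor the eigengene values at j
            # enter the coefficient fit.
            keep = np.arange(n) != j
            coef, *_ = np.linalg.lstsq(eigengenes[:, keep].T,
                                       filled[i, keep], rcond=None)
            # Reconstruct value j as a linear combination of the eigengenes.
            updated[i, j] = coef @ eigengenes[:, j]

        change = np.abs(updated - filled).sum()
        filled = updated
        if change < tol:
            break
    return filled
```

Consider a hypothetical genes × arrays array `expr` that contains `np.nan` where values are missing. A call such as `svd_impute(expr, k=max(1, round(0.2 * expr.shape[1])))` would use roughly the 20% of eigengenes that the Results section reports as most accurate. Each missing value gets its own least-squares fit, and every pass recomputes a full SVD, so this sketch favors clarity over speed.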

RESULTS AND DISCUSSION
KNNimpute
Performance of the KNN-based method was assessed over different data sets (both types of data and percent of data missing) and over different values of K (Figure 1). The method is very accurate. Depending on the type of data and the fraction of values missing, the estimated values deviate on average by only 6–26% from the true values. Notably, this method accurately estimates missing values for genes that are expressed in small clusters. Other methods, such as row average and SVD, are likely to be less accurate on such clusters because the clusters themselves do not contribute significantly to the global parameters upon which these methods rely. When errors for individual values are considered, KNN-based estimation on a noisy time series data set with 10% of entries missing estimates approximately 88% of the values with a normalized RMS error under 0.25 (Figure 2). Under low apparent noise levels in time series data, as many as 94% of values are estimated within 0.25 of the original value.

Fig. 2. Distribution of errors for KNN-based estimation on a noisy time-series data set. Individual errors from estimation with K = 15 at 10% of data missing are displayed in a histogram. Most of the normalized RMS errors are under 0.25.

[Figure 3 (plot): normalized RMS error versus number of arrays in data set, for KNN and SVD.]
Fig. 3. Effect of reduction of array number on KNN- and SVD-based estimation. On a time series data set, estimation was performed on matrices with successively lower numbers of columns. The SVD algorithm could not be applied to matrices with fewer than eight columns.

[Figure 4 (plot): normalized error versus percent eigengenes used (30, 20, 10, 5); one curve each for 1%, 5%, 10%, 15% and 20% of entries missing.]
Fig. 4. Performance of SVD-based imputation with different fractions of eigengenes used for estimation. Normalized RMS error was assessed for a non-time course microarray (the most challenging estimation) with 5–30% of eigengenes used. Different curves correspond to various percentages of data missing from the data set.

Although a smaller percentage of missing data makes data imputation more precise, the algorithm is robust to increasing the percent of values missing. Accuracy decreases by at most 10% with 20% of the data missing (Figure 1). In addition, the method is relatively insensitive to the exact value of K within the range of 10–20 neighbors (Figure 1). Performance declines when fewer neighbors are used for estimation, primarily due to overemphasis of a few dominant expression patterns. However, when the same gene is present twice on the arrays, the method appropriately gives a very strong weight to that gene in the estimation. The deterioration in performance at larger values of K (above 20) may be explained as follows. First, the inclusion of expression patterns that are significantly different from the gene of interest can decrease accuracy because the 'neighborhood' has become too large and not sufficiently relevant to the
estimation problem. In fact, optimal selection of K likely depends on the average cluster size for the given data set. Second, there may be significant noise present in microarray data. As K increases, the contribution of noise to the estimate overwhelms the contribution of the signal, leading to a decrease in accuracy.

To assess the variance in RMS error over repeated estimations, we performed 60 additional runs of missing value removal and subsequent estimation on one of the time series data sets, keeping the same file and the same percent of values removed. At 5% values missing and K = 123, the average RMS error was 0.203, with a variance of 0.001. Thus, our evaluation method appears to be reliable.

Although microarray experiments typically involve a large number of arrays, experimenters sometimes need to analyze data sets with small numbers of experiments (columns in the matrix). KNNimpute can accurately estimate data for matrices with as few as six columns (Figure 3). We do not recommend using this method on matrices with fewer than four columns.

SVDimpute
To determine the optimal parameter set for SVDimpute, the method was evaluated using the most significant 5, 10, 20, and 30% of the eigengenes for estimation (Figure 4). The most accurate estimation is achieved when approximately 20% of the eigengenes are used for estimation. For KNNimpute, the error curve appears relatively flat between 10 and 20 neighbors. In contrast, performance of the SVD-based method deteriorates sharply as the number of eigengenes used is changed.

Although SVD-based estimation provides significantly higher accuracy than row average on all data sets, its performance is sensitive to the type of data being analyzed. SVDimpute yields the best results on time-series data with a low noise level (Figures 5 and 6). Under such conditions the method performs better than KNNimpute if the right number of eigengenes is used for estimation (Figure 6). This likely reflects the signal-processing nature of the SVD-based method. In time-series data, expression is dominated by the combined effect of strong patterns of regulation over time. SVD is ideally suited to estimating the expression of an individual gene in terms of these constituent patterns. In contrast, the KNN-based method performs better on both noisy time series data and non-time series data. SVD-based estimation is essentially a linear regression method in a lower-dimensional space. Its deterioration in performance is therefore not surprising for non-time series data, where a clear expression pattern is often not present. SVDimpute is slightly less robust to noise than KNNimpute. This is most likely because expression patterns for smaller groups of genes sometimes cannot be sufficiently represented in the dominant eigengenes used for estimation.

Row average
Estimation by row (gene) average is an improvement upon replacing missing values with zeros. However, it yielded drastically lower accuracy than either KNN- or SVD-based estimation (Figure 5). As expected, the method performs most poorly on non-time series data (normalized RMS error of 0.40 and more), but error on the other data sets was also significantly higher than for both of the other methods. This is not surprising, since row averaging assumes that the expression of a gene in one experiment is similar to its expression in a different experiment, which is often not true. In contrast to SVD and KNN, row average does not take advantage of the rich information provided by the expression patterns of other genes (or even duplicate runs of the same gene) in the data set.
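The contrast drawn above between row averaging and neighbor-based estimation can be illustrated with a small Python/NumPy sketch. This is not the authors' C++ implementation. The following are assumptions for illustration:

- The function names are hypothetical.
- Similarity is the Euclidean distance over the columns observed in both genes.
- Neighbors are weighted by inverse distance. The text says only that contributions are weighted by similarity.
- A row-average fallback is used when no neighbor is available.
- The normalized RMS error is computed over the deleted entries only and divided by the mean of the complete matrix. This is one reading of the metric defined in System and Methods.

```python
import numpy as np


def row_average_impute(data):
    """Replace each missing value (np.nan) with the mean of its row's observed values."""
    A = np.array(data, dtype=float)
    row_means = np.nanmean(A, axis=1)
    return np.where(np.isnan(A), row_means[:, None], A)


def knn_impute(data, k=15):
    """Sketch of weighted K-nearest-neighbor imputation (KNNimpute-style).

    k defaults to 15, inside the 10-20 range reported as a good choice.
    """
    A = np.array(data, dtype=float)
    missing = np.isnan(A)
    out = A.copy()
    for i, j in zip(*np.nonzero(missing)):
        dists, values = [], []
        for g in range(A.shape[0]):
            # Candidate neighbors must have a value in the column being estimated.
            if g == i or missing[g, j]:
                continue
            shared = ~missing[i] & ~missing[g]
            if not shared.any():
                continue
            # Euclidean distance over the columns observed in both genes.
            dists.append(np.linalg.norm(A[i, shared] - A[g, shared]))
            values.append(A[g, j])
        if not dists:
            # Assumed fallback when no usable neighbor exists.
            out[i, j] = np.nanmean(A[i])
            continue
        nearest = np.argsort(dists)[:k]
        d = np.asarray(dists)[nearest]
        v = np.asarray(values)[nearest]
        # Inverse-distance weights (an assumption): closer genes count more.
        w = 1.0 / (d + 1e-12)
        out[i, j] = np.dot(w, v) / w.sum()
    return out


def normalized_rms_error(original, imputed, mask):
    """RMS error over the deleted entries, divided by the mean of the complete matrix."""
    diff = imputed[mask] - original[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.mean(original)
```

Consider a toy matrix in which gene 0 is [1, 2, 3, 4] with its second value deleted. Two other genes are near-duplicates ([1.1, 2.1, 3.1, 4.1] and [0.9, 1.9, 2.9, 3.9]), and the remaining genes are dissimilar. With `k=2`, `knn_impute` recovers approximately 2.0. By contrast, `row_average_impute` returns 8/3, the mean of 1, 3 and 4, because it ignores the duplicate-gene information.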


[Figure 5 (plot): normalized RMS error versus percent of entries missing; curves for row average, SVDimpute, KNNimpute and filled with zeros.]
Fig. 5. Comparison of the performance of KNN-, SVD- and row average-based estimation on a noisy time series data set. The same data set (with identical entries missing) was used to assess the accuracy of each method, and normalized RMS error was plotted as a function of the fraction of values missing in the data.

[Figure 6 (plot): normalized RMS error versus percent of entries missing; KNN and SVD curves for the time series, noisy time series and non-time series data sets.]
Fig. 6. Performance of KNNimpute and SVDimpute methods on different types of data as a function of entries missing. The best performance of each method was plotted. The three sets of curves represent the three data sets (top: non-time series; middle: noisy time series; bottom: time series).

Although an in-depth study was not performed on column average, some experiments were performed with this method and it did not yield satisfactory performance (results not shown).

Performance
For a matrix of $m$ rows (genes) and $n$ columns (experiments), the computational complexity of the KNNimpute method is approximately $O(m^2 n)$, assuming $m \gg k$ and fewer than 20% of the values missing. The computational complexity of a full SVD calculation is $O(n^2 m)$. However, SVDimpute repeats the SVD within an expectation–maximization algorithm, which brings the complexity to $O(n^2 m i)$, where $i$ is the number of iterations performed before the threshold value is reached. The row average algorithm is the fastest, with a computational complexity of $O(nm)$. The KNNimpute method, implemented in C++, takes 3.23 min on a Pentium III 500 MHz computer to estimate missing values for a data set with 6153 genes and 14 experiments, with 10% of the entries missing.

CONCLUSIONS
KNN- and SVD-based methods provide fast and accurate ways of estimating missing values for microarray data. Both methods far surpass the currently accepted solutions (filling missing values with zeros or row average) by taking advantage of the correlation structure of the data to estimate missing expression values. Based on the results of our study, we recommend the KNN-based method for imputation of missing values.

Both KNN and SVD methods are robust to increasing the fraction of data missing. However, KNN-based imputation shows less deterioration in performance as the percentage of missing entries increases. In addition, the KNNimpute method is more robust than SVD to the type of data for which estimation is performed, performing better on non-time series or noisy data. KNNimpute is also less sensitive to the exact parameters used (number of nearest neighbors), whereas the SVD-based method shows a sharp deterioration in performance when a non-optimal fraction of eigengenes is used. From the biological standpoint, KNNimpute has the advantage of providing accurate estimation for missing values in genes that belong to small, tight expression clusters. Missing points for such genes could be estimated poorly by SVD-based estimation if their expression pattern is not similar to any of the eigengenes used for regression.

KNN-based imputation provides a robust and sensitive approach to estimating missing data for microarrays. However, it is important to exercise caution when drawing critical biological conclusions from data that are partially imputed. The goal of this method is to provide an accurate way of estimating missing values in order to minimally bias the performance of microarray analysis methods. Nevertheless, estimated data should be flagged where possible, and their influence on the discovery of biological results should be assessed in order to avoid drawing unwarranted conclusions.

ACKNOWLEDGEMENTS
We would like to thank Soumya Raychaudhuri and Joshua Stuart for thoughtful comments on the manuscript and discussions, and Orly Alter and Mike Liang for helpful suggestions. O.T. is supported by a Howard Hughes Medical Institute predoctoral fellowship and by a Stanford Graduate Fellowship. M.C. is supported by NIH training grant LM-07033. T.H. is partially supported by NSF grant DMS-9803645 and NIH grant ROI-CA-72028-01. R.T. is supported by the NIH grant 2 R01 CA72028, and NSF grant DMS-9971405. D.B. is partially supported by CA 77097 from the NCI. R.B.A. is supported by NIH-GM61374, NIH-LM06244, NSF DBI-9600637, SUN Microsystems and a grant from the Burroughs-Wellcome Foundation.


REFERENCES
Alizadeh,A.A., Eisen,M.B., Davis,R.E., Ma,C., Lossos,I.S., Rosenwald,A., Boldrick,J.C., Sabet,H., Tran,T., Yu,X., Powell,J.I., Yang,L., Marti,G.E., Moore,T., Hudson,Jr,J., Lu,L., Lewis,D.B., Tibshirani,R., Sherlock,G., Chan,W.C., Greiner,T.C., Weisenburger,D.D., Armitage,J.O., Warnke,R. and Staudt,L.M., et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503–511.
Alter,O., Brown,P.O. and Botstein,D. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl Acad. Sci. USA, 97, 10101–10106.
Anderson,T.W. (1984) An Introduction to Multivariate Statistical Analysis. Wiley, New York.
Brown,M.P., Grundy,W.N., Lin,D., Cristianini,N., Sugnet,C.W., Furey,T.S., Ares,Jr.,M. and Haussler,D. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl Acad. Sci. USA, 97, 262–267.
Butte,A.J., Ye,J., et al. (2001) Determining significant fold differences in gene expression analysis. Pac. Symp. Biocomput., 6, 6–17.
Chu,S., DeRisi,J., Eisen,M., Mulholland,J., Botstein,D., Brown,P.O. and Herskowitz,I. (1998) The transcriptional program of sporulation in budding yeast. Science, 282, 699–705.
DeRisi,J.L., Iyer,V.R. and Brown,P.O. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278, 680–686.
Eisen,M.B., Spellman,P.T., Brown,P.O. and Botstein,D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA, 95, 14863–14868.
Gasch,A.P., Spellman,P.T., Kao,C.M., Carmel-Harel,O., Eisen,M.B., Storz,G., Botstein,D. and Brown,P.O. (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell, in press.
Golub,G.H. and Van Loan,C.F. (1996) Matrix Computations. Johns Hopkins University Press, Baltimore, MD.
Golub,T.R., Slonim,D.K., Tamayo,P., Huard,C., Gaasenbeek,M., Mesirov,J.P., Coller,H., Loh,M.L., Downing,J.R., Caligiuri,M.A., Bloomfield,C.D. and Lander,E.S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.
Hastie,T., Tibshirani,R., Eisen,M., Alizadeh,A., Levy,R., Staudt,L., Chan,W., Botstein,D. and Brown,P.P. (2000) 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol., 1, research0003.1–research0003.21.
Heyer,L.J., Kruglyak,S. and Yooseph,S. (1999) Exploring expression data: identification and analysis of coexpressed genes. Genome Res., 9, 1106–1115.
Little,R.J.A. and Rubin,D.B. (1987) Statistical Analysis with Missing Data. Wiley, New York.
Loh,W. and Vanichsetakul,N. (1988) Tree-structured classification via generalized discriminant analysis. J. Am. Stat. Assoc., 83, 715–725.
Perou,C.M., Sorlie,T., Eisen,M.B., van de Rijn,M., Jeffrey,S.S., Rees,C.A., Pollack,J.R., Ross,D.T., Johnsen,H., Akslen,L.A., Fluge,O., Pergamenschikov,A., Williams,C., Zhu,S.X., Lonning,P.E., Borresen-Dale,A.L., Brown,P.O. and Botstein,D. (2000) Molecular portraits of human breast tumours. Nature, 406, 747–752.
Raychaudhuri,S., Stuart,J.M. and Altman,R.B. (2000) Principal components analysis to summarize microarray experiments: application to sporulation time series. Pac. Symp. Biocomput., 455–466.
Spellman,P.T., Sherlock,G., Zhang,M.Q., Iyer,V.R., Anders,K., Eisen,M.B., Brown,P.O., Botstein,D. and Futcher,B. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9, 3273–3297.
Tamayo,P., Slonim,D., Mesirov,J., Zhu,Q., Kitareewan,S., Dmitrovsky,E., Lander,E.S. and Golub,T.R. (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl Acad. Sci. USA, 96, 2907–2912.
Wilkinson,G.N. (1958) Estimation of missing values for the analysis of incomplete data. Biometrics, 14, 257–286.
Yates,F. (1933) The analysis of replicated experiments when the field results are incomplete. Emp. J. Exp. Agric., 1, 129–142.
