
BIOINFORMATICS Vol. 17 no. 6 2001, Pages 520–525

Missing value estimation methods for DNA microarrays

Olga Troyanskaya 1, Michael Cantor 1, Gavin Sherlock 2, Pat Brown 3, Trevor Hastie 4, Robert Tibshirani 4, David Botstein 2 and Russ B. Altman 1,∗

1 Stanford Medical Informatics, 2 Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA, 3 Department of Biochemistry, Stanford University School of Medicine, and Howard Hughes Medical Institute, Stanford, CA, USA and 4 Departments of Statistics and Health Research and Policy, Stanford University, Stanford, CA, USA

Downloaded from http://bioinformatics.oxfordjournals.org/ at Georgetown University on September 27, 2014

Received on November 13, 2000; revised on February 22, 2001; accepted on February 26, 2001

ABSTRACT
Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data.
Results: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1–20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
Availability: The software is available at http://smi-web.stanford.edu/projects/helix/pubs/impute/
Contact: russ.altman@stanford.edu

∗ To whom correspondence should be addressed.

© Oxford University Press 2001

INTRODUCTION
DNA microarray technology allows for the monitoring of expression levels of thousands of genes under a variety of conditions (DeRisi et al., 1997; Spellman et al., 1998). Microarrays have been used to study a variety of biological processes, from differential gene expression in human tumors (Perou et al., 2000) to yeast sporulation (Chu et al., 1998). Various analysis techniques have been developed, aimed primarily at identifying regulatory patterns or similarities in expression under similar conditions. Commonly used analysis methods include clustering techniques (Eisen et al., 1998; Tamayo et al., 1999), techniques based on partitioning of data (Heyer et al., 1999; Tamayo et al., 1999), as well as various supervised learning algorithms (Alter et al., 2000; Brown et al., 2000; Golub et al., 1999; Raychaudhuri et al., 2000; Hastie et al., 2000).
The data from microarray experiments is usually in the form of large matrices of expression levels of genes (rows) under different experimental conditions (columns) and frequently with some values missing. Missing values occur for diverse reasons, including insufficient resolution, image corruption, or simply due to dust or scratches on the slide. Missing data may also occur systematically as a result of the robotic methods used to create them. Our informal analysis of the distribution of missing data in real samples shows a combination of all of these, but none dominating. Such suspicious data is usually manually flagged and excluded from subsequent analysis (Alizadeh et al., 2000). Many analysis methods, such as principal components analysis or singular value decomposition, require complete matrices (Alter et al., 2000; Raychaudhuri et al., 2000). Of course, one solution to the missing data problem is to repeat the experiment. This strategy can be expensive, but has been used in

validation of microarray analysis algorithms (Butte et al., 2001). Missing log2-transformed data are often replaced by zeros (Alizadeh et al., 2000) or, less often, by an average expression over the row, or ‘row average’. These approaches are not optimal, since they do not take into consideration the correlation structure of the data. Thus, many analysis techniques, as well as other analysis methods such as hierarchical clustering, k-means clustering, and self-organizing maps, may benefit from using more accurately estimated missing values.
There is not a large published literature concerning missing value estimation for microarray data, but much work has been devoted to similar problems in other fields. The question has been studied in contexts of non-response issues in sample surveys and missing data in experiments (Little and Rubin, 1987). Common methods include filling in least squares estimates, iterative analysis of variance methods (Yates, 1933), randomized inference methods, and likelihood-based approaches (Wilkinson, 1958). An algorithm similar to nearest neighbors was used to handle missing values in CART-like algorithms (Loh and Vanichsetakul, 1988). Most commonly applied statistical techniques for dealing with missing data are model-based approaches. We have tried to minimize the influence of specific modeling assumptions in our methods.
In this work, we describe and evaluate three methods of estimation for missing values in DNA microarrays. We compare our KNN- and SVD-based methods to the row average method, which is likely the most sophisticated estimation technique currently employed for microarray missing data estimation.

SYSTEM AND METHODS

Experimental methods
We implemented and evaluated three data imputation methods: a method based on the K Nearest Neighbors (KNN) algorithm, a Singular Value Decomposition based method, and simple row (gene) average.
Three microarray data sets were used: a study in yeast Saccharomyces cerevisiae focusing on identification of cell-cycle regulated genes (Spellman et al., 1998), an exploration of temporal gene expression during the metabolic shift from fermentation to respiration in Saccharomyces cerevisiae (DeRisi et al., 1997), and a study of response to environmental changes in yeast (Gasch et al., 2000). Two of the datasets were time-series data (DeRisi et al., 1997; Spellman et al., 1998) and one contained a non-time series subset of experiments from Gasch et al. (2000). In addition, one of the time-series data sets contained less apparent noise (Botstein, personal communication) than the other. We refer to those data sets by their characteristics: time series, noisy time series, and non-time series.
Each data set was pre-processed for the evaluation by removing rows and columns containing missing expression values, yielding ‘complete’ matrices. The methods were then evaluated over each dataset as follows. Between 1 and 20% of the data were deleted at random to create test data sets. Each method was then used to recover the introduced missing values for each data set, and the estimated values were compared to those in the original data set. The metric used to assess the accuracy of estimation (henceforth referred to as normalized RMS error) was calculated as the Root Mean Squared (RMS) difference between the imputed matrix and the original matrix, divided by the average data value in the complete data set. This normalization allowed for comparison of estimation accuracy between different data sets.
We examined different parameter sets for the KNN- and SVD-based algorithms. For KNN, the number of neighboring genes optimal for estimation was varied, whereas for SVD, different numbers of principal components, here termed ‘eigengenes’ in the sense of Alter et al. (2000), were used. Thus the experimental design allowed us to assess the accuracy of each method under different conditions (type of data, fraction of data missing) and determine optimal parameters.

KNNimpute algorithm
The KNN-based method selects genes with expression profiles similar to the gene of interest to impute missing values. If we consider gene A that has one missing value in experiment 1, this method would find K other genes, which have a value present in experiment 1, with expression most similar to A in experiments 2–N (where N is the total number of experiments). A weighted average of values in experiment 1 from the K closest genes is then used as an estimate for the missing value in gene A. In the weighted average, the contribution of each gene is weighted by similarity of its expression to that of gene A.
After examining a number of metrics for gene similarity (Pearson correlation, Euclidean distance, variance minimization), we determined that Euclidean distance was a sufficiently accurate norm. This finding is somewhat surprising, given that the Euclidean distance measure is often sensitive to outliers, which could be present in microarray data. However, we found that log-transforming the data seems to sufficiently reduce the effect of outliers on gene similarity determination.

SVDimpute algorithm
In this method, we employ singular value decomposition (1) to obtain a set of mutually orthogonal expression patterns that can be linearly combined to approximate the expression of all genes in the data set. These patterns, which in this case are identical to the principal components of the gene expression matrix, are further referred to


as eigengenes (Alter et al., 2000; Anderson, 1984; Golub and Van Loan, 1996).

$$A_{m \times n} = U_{m \times m} \, \Sigma_{m \times n} \, V^{T}_{n \times n}. \qquad (1)$$

Matrix $V^{T}$ now contains eigengenes, whose contribution to the expression in the eigenspace is quantified by corresponding eigenvalues on the diagonal of matrix $\Sigma$. We then identify the most significant eigengenes by sorting the eigengenes based on their corresponding eigenvalue. Although it has been shown by Alter et al. (2000) that several significant eigengenes are sufficient to describe most of the expression data, the exact fraction of eigengenes best for estimation needs to be determined empirically.

[Figure 1 plot: normalized RMS error (y-axis) against number of genes used as neighbors (x-axis), with one curve each for 1%, 5%, 10%, 15%, and 20% of entries missing.]
Fig. 1. Effect of number of nearest neighbors used for KNN-based estimation on noisy time series data. Different curves correspond to experiments performed for data sets with different percent of entries missing.

Once the k most significant eigengenes from $V^{T}$ are selected, we estimate a missing value j in gene i by first regressing this gene against the k eigengenes and then using the coefficients of the regression to reconstruct j from a linear combination of the k eigengenes. The jth value of gene i and the jth values of the k eigengenes are not used in determining these regression coefficients.
It should be noted that SVD can only be performed on complete matrices; therefore we originally substitute row average for all missing values in matrix A, obtaining A′. We then utilize an expectation maximization method to arrive at the final estimate, as follows. Each missing value in A′ is estimated using the above algorithm, and then the procedure is repeated on the newly obtained matrix, until the total change in the matrix falls below the empirically determined threshold of 0.01.
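
The SVDimpute procedure described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than the authors' implementation: the function name `svd_impute`, the use of NaN to mark missing entries, the `max_iter` safeguard, and the choice of the summed squared change as the "total change in the matrix" are all assumptions made for the sketch. It also assumes every gene has at least one observed value.

```python
import numpy as np

def svd_impute(a, k, tol=0.01, max_iter=100):
    """Sketch of SVD-based imputation on a genes x experiments matrix.

    Missing entries are marked with NaN (an assumption of this sketch).
    """
    a = np.asarray(a, dtype=float)
    missing = np.isnan(a)
    # Initial fill: row (gene) average over the observed entries, giving A'.
    row_means = np.nanmean(a, axis=1)
    filled = np.where(missing, row_means[:, None], a)
    rows_with_gaps = np.flatnonzero(missing.any(axis=1))

    for _ in range(max_iter):
        # Rows of vt are the eigengenes, sorted by decreasing singular value.
        _, _, vt = np.linalg.svd(filled, full_matrices=False)
        eigengenes = vt[:k]  # shape (k, n_experiments)
        updated = filled.copy()
        for i in rows_with_gaps:
            obs = ~missing[i]
            # Regress gene i on the k eigengenes using observed columns only,
            # so the missing positions do not influence the coefficients.
            coef, *_ = np.linalg.lstsq(eigengenes[:, obs].T, a[i, obs], rcond=None)
            # Reconstruct the missing positions from the same linear combination.
            updated[i, missing[i]] = coef @ eigengenes[:, missing[i]]
        # Assumed definition of "total change": sum of squared differences.
        change = np.sum((updated - filled) ** 2)
        filled = updated
        if change < tol:
            break
    return filled
```

Here `k` is an absolute number of eigengenes; to follow the fractions discussed in the text, one could pass something like `k = max(1, round(0.2 * min(a.shape)))`, which is again only one possible reading.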

RESULTS AND DISCUSSION

KNNimpute
Performance of the KNN-based method was assessed over different data sets (both types of data and percent of data missing) and over different values of K (Figure 1). The method is very accurate, with the estimated values showing only 6–26% average deviation from the true values, depending on the type of data and fraction of values missing. Notably, this method is successful in accurate estimation of missing values for genes that are expressed in small clusters. Other methods, such as row average and SVD, are likely to be more inaccurate on such clusters because the clusters themselves do not contribute significantly to the global parameters upon which these methods rely. When errors for individual values are considered, approximately 88% of the values are estimated with normalized RMS error under 0.25, with KNN-based estimation for a noisy time series data set with 10% entries missing (Figure 2). Under low apparent noise levels in time series data, as many as 94% of values are estimated within 0.25 of the original value.

Fig. 2. Distribution of errors for KNN-based estimation on a noisy time-series data set. Individual errors from estimation with K = 15 at 10% of data missing are displayed in a histogram (count of errors in range against normalized RMS error range). Most of the normalized RMS errors are under 0.25.

Although a smaller percentage of missing data makes data imputation more precise, the algorithm is robust to increasing the percent of values missing, with a maximum of 10% decrease in accuracy with 20% of the data missing (Figure 1). In addition, the method is relatively insensitive to the exact value of K within the range of 10–20 neighbors (Figure 1). Performance declines when a lower number of neighbors is used for estimation, primarily due to overemphasis of a few dominant expression patterns. However, when the same gene is present twice on the arrays, the method appropriately gives a very strong weight to that gene in the estimation. The deterioration in performance at larger values of K (above 20) may be explained as follows. First, the inclusion of expression patterns that are significantly different from the gene of interest can decrease accuracy because the ‘neighborhood’ has become too large and not sufficiently relevant to the


estimation problem. In fact, optimal selection of K likely depends on the average cluster size for the given data set. Second, there may be significant noise present in microarray data. As K increases, the contribution of noise to the estimate overwhelms the contribution of the signal, leading to a decrease in accuracy.
To assess the variance in RMS error over repeated estimations for the same file with the same percent of missing values removed, we performed 60 additional runs of missing value removal and subsequent estimation on one of the time series data sets. At 5% values missing and K = 123, the average RMS error was 0.203, with variance of 0.001. Thus, our evaluation method appears to be reliable.
Although microarray experiments typically involve a large number of arrays, sometimes experimenters need to analyze data sets with small numbers of experiments (columns in the matrix). KNNimpute can accurately estimate data for matrices with as few as six columns (Figure 3). We do not recommend using this method on matrices with fewer than four columns.

[Figure 3 plot: normalized RMS error (y-axis) against number of arrays in data set (x-axis, 6 to 14), with one curve each for KNN and SVD.]
Fig. 3. Effect of reduction of array number on KNN- and SVD-based estimation. On a time series data set, estimation was performed on matrices with successively lower number of columns. The SVD algorithm could not be applied to matrices with fewer than eight columns.

SVDimpute
To determine the optimal parameter set for SVDimpute, the method was evaluated using the most significant 5, 10, 20, and 30% of the eigengenes for estimation (Figure 4). The most accurate estimation is achieved when approximately 20% of the eigengenes are used for estimation. In contrast with KNNimpute, where the error curve appears relatively flat between 10 and 20 neighbors, performance of the SVD-based method deteriorates sharply as the number of eigengenes used is changed.

[Figure 4 plot: normalized error (y-axis) against percent eigengenes used (x-axis: 30, 20, 10, 5), with one curve each for 1%, 5%, 10%, 15%, and 20% of entries missing.]
Fig. 4. Performance of SVD-based imputation with different fractions of eigengenes used for estimation. Normalized RMS error was assessed for a non-time course microarray (most challenging estimation) with 5–30% eigengenes used. Different color curves correspond to various percents of data missing from the data set.

Although SVD-based estimation provides significantly higher accuracy than row average on all data sets, its performance is sensitive to the type of data being analyzed. SVDimpute yields best results on time-series data with low noise level (Figures 5 and 6). Under such conditions the method performs better than KNNimpute if the right number of eigengenes is used for estimation (Figure 6). This likely reflects the signal-processing nature of the SVD-based method. When the expression data is dominated by the combined effect of strong patterns of regulation over time (as in time-series data), SVD is ideally suited to estimating expression of an individual gene in terms of these constituent patterns. In contrast, the KNN-based method exhibits higher performance for both noisy time series data and non-time series data. As SVD-based estimation is essentially a linear regression method in lower-dimensional space, this deterioration in performance is not surprising for non-time series data, where a clear expression pattern is often not present. The slightly lower sensitivity to noise compared to KNNimpute is most likely due to the fact that expression patterns for smaller groups of genes can sometimes not be sufficiently represented in the dominant eigengenes used for estimation.

Row average
Estimation by row (gene) average, although an improvement upon replacing missing values with zeros, yielded drastically lower accuracy than either KNN- or SVD-based estimation (Figure 5). As expected, the method performs most poorly on non-time series data (normalized RMS error of 0.40 and more), but error on other data sets was also significantly higher than both of the other methods. This is not surprising, since this row averaging assumes that the expression of a gene in one of the experiments is similar to its expression in a different experiment, which is often not true. In contrast to SVD and KNN, row average does not take advantage of the rich information provided by the expression patterns of other genes (or even duplicate runs of the same gene) in the data set.
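
The contrast between row average and KNN-based estimation, and the normalized RMS error used to compare them, can be illustrated with the following Python/NumPy sketch. It is not the authors' C++ implementation. The function names, the NaN convention for missing entries, the inverse-distance weighting (the text says only that neighbors are weighted by similarity), the restriction of the distance to experiments observed in both genes, and the choice to compute the error over the deleted entries only are all assumptions of this sketch.

```python
import numpy as np

def row_average_impute(a):
    """Replace each NaN with the mean of the observed values in its row (gene)."""
    a = np.asarray(a, dtype=float)
    means = np.nanmean(a, axis=1)
    return np.where(np.isnan(a), means[:, None], a)

def knn_impute(a, k=15, eps=1e-12):
    """Sketch of weighted K-nearest-neighbor imputation over genes (rows)."""
    a = np.asarray(a, dtype=float)
    missing = np.isnan(a)
    out = a.copy()
    for i, j in zip(*np.nonzero(missing)):
        # Candidate neighbors: other genes with a value present in experiment j.
        cand = np.flatnonzero(~missing[:, j])
        # Compare only on experiments observed in both genes (assumption).
        shared = ~missing[i] & ~missing[cand]
        counts = shared.sum(axis=1)
        usable = counts > 0
        cand, shared, counts = cand[usable], shared[usable], counts[usable]
        if cand.size == 0:
            continue  # nothing to borrow from; the entry stays missing
        diff = np.where(shared, a[cand] - a[i], 0.0)
        # Euclidean distance, scaled by the number of shared experiments.
        dist = np.sqrt((diff ** 2).sum(axis=1) / counts)
        nearest = np.argsort(dist)[:k]
        # Assumed similarity weighting: inverse distance, so a duplicate
        # gene (distance near zero) dominates the estimate.
        weights = 1.0 / (dist[nearest] + eps)
        out[i, j] = np.sum(weights * a[cand[nearest], j]) / np.sum(weights)
    return out

def normalized_rms_error(imputed, truth, mask):
    """RMS difference on the deleted entries, divided by the mean data value."""
    diff = imputed[mask] - truth[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.mean(truth)
```

A typical use under these assumptions is to start from a complete matrix `truth`, set a random 1–20% of its entries to NaN, run each imputation function on the result, and compare the values returned by `normalized_rms_error` with `mask` set to the deleted positions.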


Although an in-depth study was not performed on column average, some experiments were performed with this method and it does not yield satisfactory performance (results not shown).

[Figure 5 plot: normalized RMS error (y-axis) against percent of entries missing (x-axis, 0 to 20), with one curve each for row average, SVDimpute, KNNimpute, and filled with zeros.]
Fig. 5. Comparison of KNN, SVD, and row average based estimations’ performance on a noisy time series data set. The same data set (with identical entries missing) was used to assess the accuracy of each method, and normalized RMS error was plotted as a function of fraction of values missing in the data.

[Figure 6 plot: normalized RMS error (y-axis) against percent of entries missing (x-axis, 0 to 20), with KNN and SVD curves for the time series, noisy time series, and non-time series data sets.]
Fig. 6. Performance of KNNimpute and SVDimpute methods on different types of data as a function of entries missing. Best performance of each of the methods was plotted. Three sets of curves represent three data sets (non-time series, top; noisy time series, middle; time series, bottom).

Performance
For a matrix of m rows (genes) and n columns (experiments), the computational complexity of the KNNimpute method is approximately O(m²n), assuming m ≫ k and fewer than 20% of the values missing. The computational complexity of a full SVD calculation is O(n²m). However, SVDimpute utilizes an expectation–maximization algorithm, thus bringing the complexity to O(n²mi), where i is the number of iterations performed before the threshold value is reached. The row average algorithm is the fastest, with computational complexity of O(nm). The KNNimpute method, implemented in C++, takes 3.23 min on a Pentium III 500 MHz computer to estimate missing values for a data set with 6153 genes and 14 experiments, with 10% of the entries missing.

CONCLUSIONS
KNN- and SVD-based methods provide fast and accurate ways of estimating missing values for microarray data. Both methods far surpass the currently accepted solutions (filling missing values with zeros or row average) by taking advantage of the correlation structure of the data to estimate missing expression values. Based on the results of our study, we recommend the KNN-based method for imputation of missing values.
Although both KNN and SVD methods are robust to increasing the fraction of data missing, KNN-based imputation shows less deterioration in performance with increasing percent of missing entries. In addition, the KNNimpute method is more robust than SVD to the type of data for which estimation is performed, performing better on non-time series or noisy data. KNNimpute is also less sensitive to the exact parameters used (number of nearest neighbors), whereas the SVD-based method shows sharp deterioration in performance when a non-optimal fraction of eigengenes is used. From the biological standpoint, KNNimpute has the advantage of providing accurate estimation for missing values in genes that belong to small tight expression clusters. Missing points for such genes could be estimated poorly by SVD-based estimation if their expression pattern is not similar to any of the eigengenes used for regression.
KNN-based imputation provides for a robust and sensitive approach to estimating missing data for microarrays. However, it is important to exercise caution when drawing critical biological conclusions from data that is partially imputed. The goal of this method is to provide an accurate way of estimating missing values in order to minimally bias the performance of microarray analysis methods. However, estimated data should be flagged where possible and its significance on the discovery of biological results should be assessed in order to avoid drawing unwarranted conclusions.

ACKNOWLEDGEMENTS
We would like to thank Soumya Raychaudhuri and Joshua Stuart for thoughtful comments on the manuscript and discussions, and Orly Alter and Mike Liang for helpful suggestions. O.T. is supported by a Howard Hughes Medical Institute predoctoral fellowship and by a Stanford Graduate Fellowship. M.C. is supported by NIH training grant LM-07033. T.H. is partially supported by NSF grant DMS-9803645 and NIH grant ROI-CA-72028-01. R.T. is supported by the NIH grant 2 R01 CA72028, and NSF grant DMS-9971405. D.B. is partially supported by CA 77097 from the NCI. R.B.A. is supported by NIH-GM61374, NIH-LM06244, NSF DBI-9600637, SUN Microsystems and a grant from the Burroughs-Wellcome Foundation.


REFERENCES

Alizadeh,A.A., Eisen,M.B., Davis,R.E., Ma,C., Lossos,I.S., Rosenwald,A., Boldrick,J.C., Sabet,H., Tran,T., Yu,X., Powell,J.I., Yang,L., Marti,G.E., Moore,T., Hudson,Jr,J., Lu,L., Lewis,D.B., Tibshirani,R., Sherlock,G., Chan,W.C., Greiner,T.C., Weisenburger,D.D., Armitage,J.O., Warnke,R. and Staudt,L.M., et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503–511.
Alter,O., Brown,P.O. and Botstein,D. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl Acad. Sci. USA, 97, 10101–10106.
Anderson,T.W. (1984) An Introduction to Multivariate Statistical Analysis. Wiley, New York.
Brown,M.P., Grundy,W.N., Lin,D., Cristianini,N., Sugnet,C.W., Furey,T.S., Ares,Jr.,M. and Haussler,D. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl Acad. Sci. USA, 97, 262–267.
Butte,A.J. and Ye,J., et al. (2001) Determining significant fold differences in gene expression analysis. Pac. Symp. Biocomput., 6, 6–17.
Chu,S., DeRisi,J., Eisen,M., Mulholland,J., Botstein,D., Brown,P.O. and Herskowitz,I. (1998) The transcriptional program of sporulation in budding yeast. Science, 282, 699–705.
DeRisi,J.L., Iyer,V.R. and Brown,P.O. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278, 680–686.
Eisen,M.B., Spellman,P.T., Brown,P.O. and Botstein,D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA, 95, 14863–14868.
Gasch,A.P., Spellman,P.T., Kao,C.M., Carmel-Harel,O., Eisen,M.B., Storz,G., Botstein,D. and Brown,P.O. (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell., in press.
Golub,G.H. and Van Loan,C.F. (1996) Matrix Computations. Johns Hopkins University Press, Baltimore, MD.
Golub,T.R., Slonim,D.K., Tamayo,P., Huard,C., Gaasenbeek,M., Mesirov,J.P., Coller,H., Loh,M.L., Downing,J.R., Caligiuri,M.A., Bloomfield,C.D. and Lander,E.S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.
Hastie,T., Tibshirani,R., Eisen,M., Alizadeh,A., Levy,R., Staudt,L., Chan,W., Botstein,D. and Brown,P.P. (2000) ‘Gene shaving’ as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol., 1, research0003.1–research0003.21.
Heyer,L.J., Kruglyak,S. and Yooseph,S. (1999) Exploring expression data: identification and analysis of coexpressed genes. Genome Res., 9, 1106–1115.
Little,R.J.A. and Rubin,D.B. (1987) Statistical Analysis with Missing Data. Wiley, New York.
Loh,W. and Vanichsetakul,N. (1988) Tree-structured classification via generalized discriminant analysis. J. Am. Stat. Assoc., 83, 715–725.
Perou,C.M., Sorlie,T., Eisen,M.B., van de Rijn,M., Jeffrey,S.S., Rees,C.A., Pollack,J.R., Ross,D.T., Johnsen,H., Akslen,L.A., Fluge,O., Pergamenschikov,A., Williams,C., Zhu,S.X., Lonning,P.E., Borresen-Dale,A.L., Brown,P.O. and Botstein,D. (2000) Molecular portraits of human breast tumours. Nature, 406, 747–752.
Raychaudhuri,S., Stuart,J.M. and Altman,R.B. (2000) Principal components analysis to summarize microarray experiments: application to sporulation time series. Pac. Symp. Biocomput., 455–466.
Spellman,P.T., Sherlock,G., Zhang,M.Q., Iyer,V.R., Anders,K., Eisen,M.B., Brown,P.O., Botstein,D. and Futcher,B. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9, 3273–3297.
Tamayo,P., Slonim,D., Mesirov,J., Zhu,Q., Kitareewan,S., Dmitrovsky,E., Lander,E.S. and Golub,T.R. (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl Acad. Sci. USA, 96, 2907–2912.
Wilkinson,G.N. (1958) Estimation of missing values for the analysis of incomplete data. Biometrics, 14, 257–286.
Yates,Y. (1933) The analysis of replicated experiments when the field results are incomplete. Emp. J. Exp. Agric., 1, 129–142.

No ratings yet
Jin-Xing Liu - 2013 - Pmid23815087
10 pages
ChemPhysChem - 2018 - Mayerhöfer - Beer S Law Why Absorbance Depends Almost Linearly On Concentration
No ratings yet
ChemPhysChem - 2018 - Mayerhöfer - Beer S Law Why Absorbance Depends Almost Linearly On Concentration
5 pages
JDS 612 PDF
No ratings yet
JDS 612 PDF
18 pages
Introduction To Data Science With R Programming
No ratings yet
Introduction To Data Science With R Programming
12 pages
Tolerances and Fits: Min Max
No ratings yet
Tolerances and Fits: Min Max
24 pages
Platias2020 Greece
No ratings yet
Platias2020 Greece
10 pages
Grade 6 DLL MATH 6 Q4 Week 3
100% (1)
Grade 6 DLL MATH 6 Q4 Week 3
8 pages
Wigner 1939
No ratings yet
Wigner 1939
56 pages
Unit 2 Notes - Docx-3
No ratings yet
Unit 2 Notes - Docx-3
14 pages
Missing Data Analysis: University College London, 2015
No ratings yet
Missing Data Analysis: University College London, 2015
37 pages
Multivariate Exploratory
No ratings yet
Multivariate Exploratory
13 pages
Gsimp: A Gibbs Sampler Based Left-Censored Missing Value Imputation Approach For Metabolomics Studies
No ratings yet
Gsimp: A Gibbs Sampler Based Left-Censored Missing Value Imputation Approach For Metabolomics Studies
24 pages
Quarter 1-Module 5: Mathematics
100% (1)
Quarter 1-Module 5: Mathematics
14 pages
Journal of Statistical Software: Reviewer: Abdolvahab Khademi University of Massachusetts
No ratings yet
Journal of Statistical Software: Reviewer: Abdolvahab Khademi University of Massachusetts
4 pages
Analysis of Microarray Gene Expression Data - M. Lee (Kluwer
No ratings yet
Analysis of Microarray Gene Expression Data - M. Lee (Kluwer
398 pages
Communications in Computer and Information Science 298
No ratings yet
Communications in Computer and Information Science 298
614 pages
M Akaba 2019
No ratings yet
M Akaba 2019
7 pages
Ijctt V3i2p104
No ratings yet
Ijctt V3i2p104
5 pages
CO2 Ged102 pg.193
No ratings yet
CO2 Ged102 pg.193
3 pages
MMPBSA Python Manual
No ratings yet
MMPBSA Python Manual
17 pages
Electronic System Assistance For Grade 10 Mathematics of Rizal National Science High School
No ratings yet
Electronic System Assistance For Grade 10 Mathematics of Rizal National Science High School
15 pages
Missing Data Imputation Using Singular Value Decomposition
No ratings yet
Missing Data Imputation Using Singular Value Decomposition
6 pages
ISAT 600 Progress Report 2
No ratings yet
ISAT 600 Progress Report 2
6 pages
OMBC106 Research Methodology
No ratings yet
OMBC106 Research Methodology
13 pages
III-Day 37
No ratings yet
III-Day 37
3 pages
FEWS NET Matrix Example
No ratings yet
FEWS NET Matrix Example
10 pages
Roles of Imputation Methods For Filling The Missing Values: A Review
No ratings yet
Roles of Imputation Methods For Filling The Missing Values: A Review
9 pages
Ajol File Journals - 716 - Articles - 249533 - Submission - Proof - 249533 8452 596659 1 10 20230619
No ratings yet
Ajol File Journals - 716 - Articles - 249533 - Submission - Proof - 249533 8452 596659 1 10 20230619
21 pages
Project Planning and Approval Worksheet
100% (2)
Project Planning and Approval Worksheet
8 pages
Unit 1 Lesson 1-5
No ratings yet
Unit 1 Lesson 1-5
24 pages
Extra LPR281 Questions
No ratings yet
Extra LPR281 Questions
4 pages
AP Physics 1 Study Guide
No ratings yet
AP Physics 1 Study Guide
29 pages
Form 1 Term 2 Mathematics SOW 2024
No ratings yet
Form 1 Term 2 Mathematics SOW 2024
4 pages
Trading Strategies Market Colour Ravi Kashyap 2018
No ratings yet
Trading Strategies Market Colour Ravi Kashyap 2018
26 pages
Daily Lesson Log
No ratings yet
Daily Lesson Log
6 pages
19-10-2024 SR - Super60 Nucleus&Sterling-bt Jee-Main Rptm-11&14 Final Key
No ratings yet
19-10-2024 SR - Super60 Nucleus&Sterling-bt Jee-Main Rptm-11&14 Final Key
1 page
CH2114
No ratings yet
CH2114
2 pages
First Term MTH
No ratings yet
First Term MTH
2 pages
TMP 9 AA7
No ratings yet
TMP 9 AA7
12 pages
Model Systems in Biology: History, Philosophy, and Practical Concerns
From Everand
Model Systems in Biology: History, Philosophy, and Practical Concerns
Georg F. Striedter
No ratings yet
Biostatistics and Research Methodology
From Everand
Biostatistics and Research Methodology
Dr. G. Nageswara Rao
5/5 (5)
Advanced Mathematical Applications in Data Science
From Everand
Advanced Mathematical Applications in Data Science
Biswadip Basu Mallik
No ratings yet
Business Statistics I Essentials
From Everand
Business Statistics I Essentials
Louise Clark
5/5 (5)
An Introduction to Statistical Genetic Data Analysis
From Everand
An Introduction to Statistical Genetic Data Analysis
Melinda C. Mills
No ratings yet
Introduction to Bioinformatics Using Action Labs
From Everand
Introduction to Bioinformatics Using Action Labs
Jean-Louis Lassez
5/5 (1)
Bioinformatics Unveiled
From Everand
Bioinformatics Unveiled
Joan Melody
No ratings yet
Biostatistics Explored Through R Software: An Overview
From Everand
Biostatistics Explored Through R Software: An Overview
Vinaitheerthan Renganathan
3.5/5 (2)
Smart Business Problems and Analytical Hints in Cancer Research
From Everand
Smart Business Problems and Analytical Hints in Cancer Research
Zemelak Goraga
No ratings yet
Técnicas Estadísticas para la Ciencia de Datos a través de R. Aprendizaje Supervisado: Análisis Discriminante, Árboles de Decisión, Redes Neuronales y Modelos Lineales Generalizados
From Everand
Técnicas Estadísticas para la Ciencia de Datos a través de R. Aprendizaje Supervisado: Análisis Discriminante, Árboles de Decisión, Redes Neuronales y Modelos Lineales Generalizados
César Pérez López
No ratings yet
Introduction To Non Parametric Methods Through R Software
From Everand
Introduction To Non Parametric Methods Through R Software
Editor IJSMI
No ratings yet
Overview Of Bayesian Approach To Statistical Methods: Software
From Everand
Overview Of Bayesian Approach To Statistical Methods: Software
Vinaitheerthan Renganathan
No ratings yet
Statistical Classification: Fundamentals and Applications
From Everand
Statistical Classification: Fundamentals and Applications
Fouad Sabry
No ratings yet
Pattern Recognition: Fundamentals and Applications
From Everand
Pattern Recognition: Fundamentals and Applications
Fouad Sabry
No ratings yet



as eigengenes (Alter et al., 2000; Anderson, 1984; Golub

Normalized RMS error


in determining these regression coefficients.

arrive at the final estimate, as follows. Each missing value

the total change in the matrix falls below the empirically


Fig. 4. Performance of SVD-based imputation with different


REFERENCES Caligiuri,M.A., Bloomfield,C.D. and Lander,E.S. (1999) Molec-
