Abstract
In microarray data analysis, biclustering simultaneously identifies a maximum group of genes that show highly correlated expression patterns across a maximum group of experimental conditions (samples). This paper introduces BicFinder, a heuristic algorithm for extracting biclusters from microarray data (the BicFinder software is available at http://www.info.univ-angers.fr/pub/hao/BicFinder.html). BicFinder relies on a new evaluation function, the Average Correspondence Similarity Index (ACSI), to assess the coherence of a given bicluster, and uses a directed acyclic graph to construct its biclusters. Its performance is evaluated on synthetic data and on three DNA microarray datasets. We assess biological significance with a gene annotation web tool and show that the algorithm produces biologically relevant biclusters. Experimental results show that BicFinder identifies coherent and overlapping biclusters.
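To make the notion of a bicluster concrete, the following minimal sketch scores a submatrix (a subset of genes restricted to a subset of conditions) by the average Pearson correlation between its gene profiles. This is only an illustrative stand-in for a coherence measure: the abstract does not define ACSI, so the function `mean_pairwise_correlation`, the toy matrix `expression`, and the chosen index sets are all assumptions made for this example and are not part of BicFinder.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no trend information
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(matrix, genes, conditions):
    """Average Pearson correlation over all gene pairs of a bicluster.

    Illustrative coherence score only; this is NOT the ACSI of the paper.
    """
    # Restrict each selected gene (row) to the selected conditions (columns).
    profiles = [[matrix[g][c] for c in conditions] for g in genes]
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(pearson(p, q) for p, q in pairs) / len(pairs)


# Hypothetical toy expression matrix: rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 4.0, 6.0, 1.0],
    [3.0, 5.0, 7.0, 4.0],
    [8.0, 1.0, 5.0, 2.0],
]

# Genes 0-2 rise together over conditions 0-2, so this submatrix is coherent.
coherent = mean_pairwise_correlation(expression, genes=[0, 1, 2], conditions=[0, 1, 2])
# The full matrix mixes unrelated trends and scores lower.
whole = mean_pairwise_correlation(expression, genes=[0, 1, 2, 3], conditions=[0, 1, 2, 3])
```

In this toy case the three-gene, three-condition submatrix scores higher than the full matrix, which is the kind of local pattern a biclustering algorithm searches for. How BicFinder scores and assembles such submatrices is specified by ACSI and the directed acyclic graph construction described in the paper itself.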
Cite this article
Ayadi, W., Elloumi, M. & Hao, JK. BicFinder: a biclustering algorithm for microarray data analysis. Knowl Inf Syst 30, 341–358 (2012). https://doi.org/10.1007/s10115-011-0383-7