Gene ontology based quantitative index to select functionally diverse genes

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Among the large number of gene selection algorithms available in the literature, the rough set based maximum relevance-maximum significance (RSMRMS) algorithm has been shown to be successful for selecting a set of relevant and significant genes from microarray data. However, analyzing the functional diversity of a gene set is essential both to understand the role of genes in a particular disease and to evaluate the effectiveness of a gene selection algorithm. In this regard, a gene ontology based quantitative index, termed the degree of functional diversity (DoFD), is proposed to quantify the functional diversity of a set of genes selected by any gene selection algorithm. Moreover, a new gene selection algorithm is presented that judiciously integrates the merits of both DoFD and RSMRMS to select relevant and significant genes that are also functionally diverse. The performance of the proposed gene selection algorithm, along with a comparison with other gene selection methods, is studied using the proposed DoFD and the predictive accuracy of the K-nearest neighbor rule and the support vector machine on six cancer microarray data sets and one arthritis microarray data set. An important finding is that the proposed gene ontology based quantitative index can accurately evaluate the functional diversity of a set of genes. The proposed gene selection algorithm is also shown to be effective for selecting relevant, significant, and functionally diverse genes from microarray data.

Acknowledgments

The work was done when one of the authors, S. Paul, was a Senior Research Fellow of the Council of Scientific and Industrial Research, Government of India.

Author information

Corresponding author

Correspondence to Sushmita Paul.

About this article

Cite this article

Paul, S., Maji, P. Gene ontology based quantitative index to select functionally diverse genes. Int. J. Mach. Learn. & Cyber. 5, 245–262 (2014). https://doi.org/10.1007/s13042-012-0133-5

  • DOI: https://doi.org/10.1007/s13042-012-0133-5
