Abstract
We examine a new approach to building decision trees that uses a geometric splitting criterion based on the properties of a family of metrics on the space of partitions of a finite set. The criterion can be adapted to the characteristics of the data sets and to the needs of the users. It yields decision trees that are smaller and have fewer leaves than trees built with standard methods, while achieving comparable or better accuracy.
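The abstract does not give the formulas, so the following is only a minimal sketch under stated assumptions. It takes Daróczy's generalized entropy of order β as the entropy of a partition and defines the conditional entropy H_β(π | σ) as the sum, over the blocks C of σ, of (|C|/|S|)^β times the entropy of π restricted to C. The distance between partitions is assumed to be d_β(π, σ) = H_β(π | σ) + H_β(σ | π). The split rule (choose the attribute whose partition lies closest to the class partition) is an assumed, unnormalized de Mántaras-style criterion, not necessarily the paper's exact one. The helper names `beta_entropy`, `conditional_beta_entropy`, `partition_distance`, and `best_split` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def beta_entropy(block_sizes, beta):
    """Generalized entropy of order beta for a partition with the given block sizes.

    Assumed Daroczy-style form, normalized so a two-block uniform partition scores 1:
        H_beta = (1 - sum(p_i ** beta)) / (1 - 2 ** (1 - beta)),
    with the base-2 Shannon entropy used for the limiting case beta == 1.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    n = sum(block_sizes)
    probs = [size / n for size in block_sizes if size > 0]
    if beta == 1:
        return -sum(p * log2(p) for p in probs)
    return (1 - sum(p ** beta for p in probs)) / (1 - 2 ** (1 - beta))


def conditional_beta_entropy(pi, sigma, beta):
    """H_beta(pi | sigma): each element's block in pi and sigma is given by its label."""
    n = len(pi)
    blocks = defaultdict(list)  # block of sigma -> pi-labels of its elements
    for pi_label, sigma_label in zip(pi, sigma):
        blocks[sigma_label].append(pi_label)
    total = 0.0
    for members in blocks.values():
        weight = (len(members) / n) ** beta  # (|C| / |S|) ** beta
        total += weight * beta_entropy(list(Counter(members).values()), beta)
    return total


def partition_distance(pi, sigma, beta):
    """Assumed metric d_beta(pi, sigma) = H_beta(pi | sigma) + H_beta(sigma | pi)."""
    return conditional_beta_entropy(pi, sigma, beta) + conditional_beta_entropy(sigma, pi, beta)


def best_split(rows, class_labels, attributes, beta):
    """Hypothetical split rule: pick the attribute whose partition is closest to the classes."""
    scores = {}
    for attr in attributes:
        attr_partition = [row[attr] for row in rows]
        scores[attr] = partition_distance(class_labels, attr_partition, beta)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rows = [
        {"outlook": "sunny", "windy": "no"},
        {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain", "windy": "no"},
        {"outlook": "rain", "windy": "yes"},
    ]
    labels = ["no", "no", "yes", "yes"]
    attr, scores = best_split(rows, labels, ["outlook", "windy"], beta=2.0)
    print(attr, scores)  # outlook {'outlook': 0.0, 'windy': 1.0}
```

In this sketch the parameter beta controls how strongly impure blocks are penalized: beta = 1 gives the sum of the two Shannon conditional entropies (the unnormalized de Mántaras-style distance), while beta = 2 gives twice the Gini impurity within each block. This is one plausible reading of the abstract's claim that the criterion can be adapted to the data and to the users' needs.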
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Simovici, D.A., Jaroszewicz, S. (2006). Generalized Conditional Entropy and a Metric Splitting Criterion for Decision Trees. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_7
DOI: https://doi.org/10.1007/11731139_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science, Computer Science (R0)