Abstract
In the Knowledge Discovery in Databases (KDD) field, the human comprehensibility of models is as important as accuracy optimization. To address this problem, many methods have been proposed to simplify decision trees and improve their understandability. Among the different classes of methods are strategies that reduce the database a priori, either through feature selection or case selection. In parallel, many efficient selection algorithms have been developed to reduce the storage requirements of case-based learning algorithms; their original aim, however, is not tree simplification. Surprisingly, as far as we know, few works have attempted to exploit this wealth of efficient algorithms in favor of knowledge discovery. This is the aim of this paper. Through extensive experiments and discussions, we analyze the contribution of state-of-the-art reduction techniques to tree simplification. Moreover, we propose an original mixed procedure that addresses the selection problem by jointly removing features and instances. We show that in some cases this algorithm markedly improves on the performance of standard post-pruning, which is used to combat overfitting.
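The sketch below is only an illustration of the general idea, not the authors' procedure: it combines a simple feature-selection step with an ENN-style instance-editing step before growing a tree, then compares the resulting tree size against a tree post-pruned by cost-complexity pruning. The scikit-learn components, the dataset, and the parameter values (k, n_neighbors, ccp_alpha) are all placeholder assumptions standing in for the state-of-the-art reduction techniques studied in the paper.

```python
# Hedged sketch: joint feature + instance reduction before tree induction,
# compared against post-pruning on the full data. Assumes scikit-learn.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# 1) Feature selection: keep the k features most informative about the class.
selector = SelectKBest(mutual_info_classif, k=10)
X_red = selector.fit_transform(X, y)

# 2) Instance selection: drop instances misclassified by a k-NN rule
#    (an editing heuristic in the spirit of Wilson's ENN).
knn_pred = cross_val_predict(KNeighborsClassifier(n_neighbors=3), X_red, y, cv=5)
keep = knn_pred == y
X_red, y_red = X_red[keep], y[keep]

# Tree grown on the jointly reduced data, without post-pruning.
tree_reduced = DecisionTreeClassifier(random_state=0).fit(X_red, y_red)

# Baseline: tree grown on the full data, then post-pruned via
# cost-complexity pruning (Breiman et al.-style pruning).
tree_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

print("nodes, reduced data :", tree_reduced.tree_.node_count)
print("nodes, post-pruned  :", tree_pruned.tree_.node_count)
```

Comparing node counts (and, in practice, held-out accuracy) gives a quick sense of whether a priori dataset reduction alone can yield trees as compact as those obtained by post-pruning.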