Abstract
In the Knowledge Discovery in Databases (KDD) field, the human comprehensibility of models is as important as accuracy optimization. To address this problem, many methods have been proposed to simplify decision trees and improve their understandability. Among the different classes of methods are strategies that reduce the database a priori, either through feature selection or case selection. In parallel, many efficient selection algorithms have been developed to reduce the storage requirements of case-based learning algorithms; their original aim, however, is not tree simplification. Surprisingly, as far as we know, few works have attempted to exploit this wealth of efficient algorithms in favor of knowledge discovery. This is the aim of this paper. Through extensive experiments and discussions, we analyze the contribution of state-of-the-art reduction techniques to tree simplification. Moreover, we propose an original mixed procedure that addresses the selection problem by jointly removing features and instances. We show that in some cases this algorithm markedly improves on the performance of standard post-pruning, which is used to combat overfitting.
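The sketch below is only an illustration of the general idea, not the authors' procedure: it combines a simple feature-selection step with an ENN-style instance-editing step before growing a tree, then compares the resulting tree size against a tree post-pruned by cost-complexity pruning. The scikit-learn components, the dataset, and the parameter values (k, n_neighbors, ccp_alpha) are all placeholder assumptions standing in for the state-of-the-art reduction techniques studied in the paper.

```python
# Hedged sketch: joint feature + instance reduction before tree induction,
# compared against post-pruning on the full data. Assumes scikit-learn.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# 1) Feature selection: keep the k features most informative about the class.
selector = SelectKBest(mutual_info_classif, k=10)
X_red = selector.fit_transform(X, y)

# 2) Instance selection: drop instances misclassified by a k-NN rule
#    (an editing heuristic in the spirit of Wilson's ENN).
knn_pred = cross_val_predict(KNeighborsClassifier(n_neighbors=3), X_red, y, cv=5)
keep = knn_pred == y
X_red, y_red = X_red[keep], y[keep]

# Tree grown on the jointly reduced data, without post-pruning.
tree_reduced = DecisionTreeClassifier(random_state=0).fit(X_red, y_red)

# Baseline: tree grown on the full data, then post-pruned via
# cost-complexity pruning (Breiman et al.-style pruning).
tree_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

print("nodes, reduced data :", tree_reduced.tree_.node_count)
print("nodes, post-pruned  :", tree_pruned.tree_.node_count)
```

Comparing node counts (and, in practice, held-out accuracy) gives a quick sense of whether a priori dataset reduction alone can yield trees as compact as those obtained by post-pruning.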