Abstract
Knowledge discovery is the process of identifying useful patterns in large data sets. Two families of approaches can be used for knowledge discovery: clustering, when the classes of the domain objects are not known; and inductive learning algorithms, when the classes are known and the goal is to construct a domain model that can identify new, unseen objects. Clustering algorithms have also been proposed for analyzing data when the classes are known. However, to our knowledge, inductive learning methods are used only for prediction, not to analyze the available data. We propose a methodology, FTree, that uses a decision tree to analyze the available data, identifying both patterns and some important aspects of the domain (at least of the part of the domain represented by the data at hand), such as similarity between classes, separability, characterization of classes, and even possible errors in the data.
Acknowledgements
The authors thank Susana Puig for helpful comments and suggestions, and the Taller Jeroni de Moragas. This research is partially funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 689176 (SYSMICS project), by the projects RASO (TIN2015-71799-C2-1-P) and RPREF (CSIC Intramural 201650E044), and by the grants 2014-SGR-118 and 2014-SGR-788 from the Generalitat de Catalunya.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Armengol, E., García-Cerdaña, À., Dellunde, P. (2017). Experiences Using Decision Trees for Knowledge Discovery. In: Torra, V., Dahlbom, A., Narukawa, Y. (eds) Fuzzy Sets, Rough Sets, Multisets and Clustering. Studies in Computational Intelligence, vol 671. Springer, Cham. https://doi.org/10.1007/978-3-319-47557-8_11
DOI: https://doi.org/10.1007/978-3-319-47557-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47556-1
Online ISBN: 978-3-319-47557-8
eBook Packages: Engineering, Engineering (R0)