Abstract
Real-life data sets typically contain many missing attribute values. Rule induction therefore requires preprocessing, in which missing attribute values are replaced by appropriate values. The rule induction method used in our research is based on rough set theory.
In this paper we present our results on a new approach to missing attribute values, called closest fit. The main idea is to search the set of all cases, considered as vectors of attribute values, for the case that is most similar to a given case with missing attribute values. The closest case may be sought in two ways: within the given concept only, or within the set of all cases. These two methods are compared with a special case of the closest fit principle: replacing missing attribute values by the most common value from the concept. All algorithms were implemented in the system OOMIS. Our experiments were performed on preterm birth data sets collected at the Duke University Medical Center.
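The three replacement strategies named above can be illustrated with a minimal Python sketch. It is not the OOMIS implementation. The abstract does not define the similarity measure, so the `distance` function below is an assumption (a count of attributes that differ or are unknown), and the function names, the use of `None` for a missing value, and the toy cases are all hypothetical.

```python
from collections import Counter

# Toy data (hypothetical): each case is a vector of symbolic attribute
# values, None marks a missing value, labels[i] is the concept of case i.
cases = [
    ["high", "yes", None],
    ["high", "yes", "a"],
    ["low", "no", "b"],
    ["low", "yes", "b"],
]
labels = ["preterm", "fullterm", "preterm", "preterm"]


def distance(a, b):
    """Assumed measure: number of attributes that differ or are missing."""
    return sum(1 for x, y in zip(a, b) if x is None or y is None or x != y)


def closest_fit(cases, labels, concept_only):
    """Fill each missing value from the closest case that knows that value.

    concept_only=True searches only cases of the same concept;
    concept_only=False searches the set of all cases.
    """
    filled = []
    for i, case in enumerate(cases):
        new_case = list(case)
        for j, value in enumerate(case):
            if value is not None:
                continue
            candidates = [
                k for k, other in enumerate(cases)
                if k != i
                and other[j] is not None
                and (not concept_only or labels[k] == labels[i])
            ]
            if candidates:  # leave the value missing if no donor exists
                best = min(candidates, key=lambda k: distance(case, cases[k]))
                new_case[j] = cases[best][j]
        filled.append(new_case)
    return filled


def most_common_in_concept(cases, labels):
    """Fill each missing value with the most common known value in the concept."""
    filled = []
    for i, case in enumerate(cases):
        new_case = list(case)
        for j, value in enumerate(case):
            if value is None:
                known = [
                    c[j] for c, lab in zip(cases, labels)
                    if lab == labels[i] and c[j] is not None
                ]
                if known:
                    new_case[j] = Counter(known).most_common(1)[0][0]
        filled.append(new_case)
    return filled


global_fit = closest_fit(cases, labels, concept_only=False)
concept_fit = closest_fit(cases, labels, concept_only=True)
common_fit = most_common_in_concept(cases, labels)
```

On this toy data the global search fills the missing value of the first case from the second case, which matches it on both known attributes but belongs to a different concept. The concept-restricted search must take the value from the fourth case instead. Ties are broken by taking the first closest case, which is another assumption of this sketch.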
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grzymała-Busse, J.W., Grzymała-Busse, W.J., Goodwin, L.K. (1999). A Closest Fit Approach to Missing Attribute Values in Preterm Birth Data. In: Zhong, N., Skowron, A., Ohsuga, S. (eds) New Directions in Rough Sets, Data Mining, and Granular-Soft Computing. RSFDGrC 1999. Lecture Notes in Computer Science, vol 1711. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48061-7_49
DOI: https://doi.org/10.1007/978-3-540-48061-7_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66645-5
Online ISBN: 978-3-540-48061-7
eBook Packages: Springer Book Archive