Abstract
Because the volume of information is growing rapidly, there is strong interest in efficient distributed computing systems, including Grids, public-resource computing systems, P2P systems, and cloud computing. In this paper we take a detailed look at the problem of modeling and optimizing network computing systems for parallel decision tree induction methods. First, we present a comprehensive discussion of these induction methods, with a special focus on their parallel versions. Next, we propose a generic optimization model of a network computing system that can be used for a distributed implementation of parallel decision trees. To illustrate our work, we provide results of numerical experiments showing that the distributed approach enables a significant improvement in system throughput.
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Walkowiak, K., Woźniak, M. (2009). Decision Tree Induction Methods for Distributed Environment. In: Cyran, K.A., Kozielski, S., Peters, J.F., Stańczyk, U., Wakulicz-Deja, A. (eds) Man-Machine Interactions. Advances in Intelligent and Soft Computing, vol 59. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00563-3_20
DOI: https://doi.org/10.1007/978-3-642-00563-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00562-6
Online ISBN: 978-3-642-00563-3
eBook Packages: Engineering, Engineering (R0)