Abstract
Due to its simplicity and strong performance, Random Forest has attracted considerable interest from the research community. In Random Forest, the splitting attribute at each node of a decision tree is selected from a predefined number of randomly chosen attributes (a subset of the entire attribute set). The size of this attribute subset (the subspace) is one of the most important factors influencing the behaviour of Random Forest. In this paper, we propose a new technique that dynamically determines the subspace size based on the size of the current data segment relative to the entire data set. To assess the effect of the proposed technique, we conduct experiments on five widely used data sets from the UCI Machine Learning Repository. The experimental results indicate that the proposed technique can improve the ensemble accuracy of Random Forest.
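The abstract does not spell out the exact formula for the dynamic subspace size, so the Python sketch below is only one plausible interpretation: the number of candidate splitting attributes sampled at a node is scaled by the ratio of records reaching that node to the size of the full training set. The function names, bounds, and scaling direction are illustrative assumptions, not the paper's actual method.

```python
# A minimal sketch (not the paper's exact method) of dynamic subspace sizing:
# the number of candidate splitting attributes sampled at a node is scaled by
# the ratio of records reaching that node to the size of the full training set.
# The bounds and the scaling rule below are illustrative assumptions only.
import math
import random


def dynamic_subspace_size(n_node_records, n_total_records, n_attributes):
    """Return how many attributes to consider at this node (illustrative)."""
    ratio = n_node_records / n_total_records        # relative size of the data segment
    hi = max(2, int(math.sqrt(n_attributes)))       # classic Random Forest default cap
    lo = 1                                          # always test at least one attribute
    # Assumption: larger segments draw more candidate attributes, smaller ones fewer.
    size = lo + round(ratio * (hi - lo))
    return max(lo, min(size, n_attributes))


def sample_subspace(attributes, n_node_records, n_total_records, rng=random):
    """Randomly draw the candidate splitting attributes for one tree node."""
    k = dynamic_subspace_size(n_node_records, n_total_records, len(attributes))
    return rng.sample(attributes, k)


if __name__ == "__main__":
    # Example: 20 attributes, a node holding 150 of the 1000 training records.
    attrs = ["A{}".format(i) for i in range(20)]
    print(sample_subspace(attrs, n_node_records=150, n_total_records=1000))
```

In a standard Random Forest the `hi` value would be used at every node; in this sketch deeper nodes with smaller data segments draw fewer candidate attributes, which is one way the relative segment size could drive the subspace size.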
Cite this paper
Adnan, M.N., Islam, M.Z.: Effects of dynamic subspacing in random forest. In: Cong, G., Peng, W.-C., Zhang, W., Li, C., Sun, A. (eds.) Advanced Data Mining and Applications (ADMA 2017). LNCS, vol. 10604. Springer, Cham (2017). doi:10.1007/978-3-319-69179-4_21