Abstract
This paper asks at what level of class imbalance one-class classifiers outperform two-class classifiers in credit scoring problems, where class imbalance, referred to as the low-default portfolio problem, is a serious issue. The question is answered by comparing the performance of a variety of recognised one-class and two-class classifiers on a selection of credit scoring datasets as the class imbalance is manipulated. We also include random oversampling, as this is one of the most common approaches to addressing class imbalance. Based on our study we conclude that the performance of the two-class classifiers deteriorates in proportion to the level of class imbalance. The two-class classifiers outperform the one-class classifiers at class imbalance levels down to 15% (i.e. the ratio of minority class to majority class is 15:85). The one-class classifiers, whose performance remains unchanged throughout, are preferred when the minority class constitutes approximately 2% or less of the data. For imbalance levels between 2% and 15% the results are not as conclusive. These results show that one-class classifiers could potentially be used as a solution to the low-default portfolio problem experienced in the credit scoring domain.
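The two data manipulations mentioned in the abstract, varying the level of class imbalance and random oversampling, can be illustrated with a minimal sketch. This is not the authors' code: the function names `set_minority_fraction` and `random_oversample`, the toy "good"/"bad" records, and the choice to subsample the minority class while keeping the majority class fixed are all assumptions made for illustration only.

```python
import random


def set_minority_fraction(majority, minority, fraction, rng):
    """Subsample the minority class so it makes up `fraction` of the result.

    Hypothetical helper: the majority class is kept whole and only the
    minority class is reduced.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be in (0, 1)")
    # Solve k / (k + len(majority)) = fraction for k, then round.
    k = round(fraction * len(majority) / (1 - fraction))
    if k > len(minority):
        raise ValueError("not enough minority examples for this fraction")
    return list(majority), rng.sample(minority, k)


def random_oversample(majority, minority, rng):
    """Duplicate randomly chosen minority examples until the classes match in size."""
    if not minority:
        raise ValueError("cannot oversample an empty minority class")
    shortfall = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(shortfall)]
    return list(majority), list(minority) + extra


# Toy data standing in for non-defaulters (majority) and defaulters (minority).
good = [("good", i) for i in range(850)]
bad = [("bad", i) for i in range(300)]

rng = random.Random(0)
for fraction in (0.15, 0.02):
    maj, mino = set_minority_fraction(good, bad, fraction, rng)
    bal_maj, bal_min = random_oversample(maj, mino, rng)
    print(fraction, len(maj), len(mino), len(bal_min))
```

In a setup like this, a two-class classifier would be trained on both classes (with or without oversampling), while a one-class classifier would be trained on the majority class alone, which is why its performance need not depend on how many minority examples are available.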
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kennedy, K., Mac Namee, B., Delany, S.J. (2010). Learning without Default: A Study of One-Class Classification and the Low-Default Portfolio Problem. In: Coyle, L., Freyne, J. (eds) Artificial Intelligence and Cognitive Science. AICS 2009. Lecture Notes in Computer Science, vol 6206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17080-5_20
DOI: https://doi.org/10.1007/978-3-642-17080-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17079-9
Online ISBN: 978-3-642-17080-5
eBook Packages: Computer Science, Computer Science (R0)