Experimental Comparison of Financial Distress Prediction Models Using Imbalanced Data Sets
Authors: Seyed Behrooz Razavi Ghomi 1, Alireza Mehrazin 2, Mohammad Reza Shoorvarzi 3, Abolghasem Masih Abadi 4
1 - Department of Accounting, Neyshabur Branch, Islamic Azad University, Neyshabur, Iran
2 - Department of Accounting, Neyshabur Branch, Islamic Azad University, Neyshabur, Iran
3 - Department of Accounting, Neyshabur Branch, Islamic Azad University, Neyshabur, Iran
4 - Department of Accounting, Sabzevar Branch, Islamic Azad University, Sabzevar, Iran
Keywords: Tuning parameters, Financial ratios, Imbalanced data sets, Financial distress prediction models, Grid search optimization
Abstract:
From a machine learning perspective, predicting financial distress is challenging because the class distribution is extremely imbalanced. This study compares the performance of financial distress prediction models on imbalanced data sets with different class proportions. It uses data from the year preceding financial distress, covering 760 firm-year observations over the period 2007-2017. The classifiers compared are logistic regression, linear discriminant analysis, an artificial neural network, the least squares support vector machine with four kernel functions, random forest, and the k-nearest neighbors (k-NN) algorithm. Performance is measured by the area under the curve (AUC), and the Friedman and Nemenyi tests are used to determine the models' average ranks and whether the differences between their AUCs are significant. The models' optimal parameters were selected by combining grid search optimization with cross-validation. The results show that random forest performed best on the balanced data set and on the less imbalanced data sets. On the more heavily imbalanced data sets, the best performance belonged to the least squares support vector machine with sigmoid, radial, and linear kernel functions; the performance of the k-NN algorithm did not differ significantly from that of the other models, and the performance of the artificial neural network was average to adequate. The two linear models, logistic regression and linear discriminant analysis, performed worse than the nonlinear models.
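The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the AUC values are invented for illustration, the function names (`average_ranks`, `friedman_statistic`, `nemenyi_cd`) are hypothetical, and the Friedman statistic uses the basic formula without a tie correction. The value q = 2.343 is the commonly tabulated Nemenyi critical value for three models at the 0.05 level and is supplied here as an assumption.

```python
import math

def average_ranks(auc_table):
    """Average rank of each model over all data sets.

    auc_table: one row per data set, one AUC per model in each row.
    Rank 1 goes to the highest AUC; tied models share the mean rank.
    """
    k = len(auc_table[0])
    totals = [0.0] * k
    for row in auc_table:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            # Extend j over the run of models tied with position i.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
            for m in range(i, j + 1):
                totals[order[m]] += shared
            i = j + 1
    return [t / len(auc_table) for t in totals]

def friedman_statistic(auc_table):
    """Friedman chi-square statistic (no tie correction), k - 1 degrees of freedom."""
    n, k = len(auc_table), len(auc_table[0])
    ranks = average_ranks(auc_table)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)

def nemenyi_cd(k, n, q_alpha):
    """Nemenyi critical difference between two average ranks."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

# Hypothetical AUCs: rows are data sets with different class proportions,
# columns are three models. These numbers are NOT results from the study.
auc = [
    [0.90, 0.85, 0.80],
    [0.88, 0.86, 0.79],
    [0.84, 0.87, 0.78],
    [0.80, 0.86, 0.77],
]

ranks = average_ranks(auc)
chi2 = friedman_statistic(auc)
cd = nemenyi_cd(k=3, n=4, q_alpha=2.343)  # assumed q for k = 3, alpha = 0.05
print(ranks, chi2, round(cd, 3))
```

Two models would be judged significantly different under the Nemenyi test only if their average ranks differ by more than the critical difference. With so few data sets the critical difference is large, which is one reason a model can rank well on average without being significantly better than the others.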