Solving Imbalanced Data Distribution Problem in Bankruptcy Prediction by Cost-Sensitive Learning Method
Subject Areas: Financial engineering
Seyed Behrooz Razavi 1 *, Ebrahim Abbasi 2
1 - Department of Accounting, Faculty of Accounting, Sanabad Golbahar Institute of Higher Educational, Golbahar, Iran
2 - Department of Management, Faculty of Social Sciences and Economics, Alzahra University, Tehran, Iran
Keywords: Bankruptcy prediction, Financial ratios, Grid search optimization, Imbalanced datasets, Cost-sensitive learning
Abstract:
This study adds a cost-sensitive learning technique to bankruptcy prediction models trained on imbalanced data, with the aim of reducing type I error and increasing the geometric mean of overall accuracy, and thereby lowering the cost to stakeholders of misclassifying bankrupt companies. To this end, the type I error, type II error, and geometric mean of overall accuracy of cost-sensitive bankruptcy models were compared with those of bankruptcy prediction models trained on highly imbalanced datasets. The statistical sample comprised 1,200 company-year observations from 2001 to 2020, of which 90% were healthy companies and 10% were bankrupt companies. Hypothesis tests showed that adding cost-sensitive learning to the bankruptcy prediction models led to a significant decrease in type I error, a significant increase in type II error, and a significant increase in the geometric mean of accuracy of the imbalanced data-based models at the 95% confidence level. Moreover, as the misclassification cost of bankrupt companies increased, type I error trended downward, while type II error and the geometric mean of accuracy trended upward.
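The trade-off described in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas or name the specific cost-sensitive technique, so the following are assumptions: type I error is taken as the share of bankrupt companies predicted healthy, type II error as the share of healthy companies predicted bankrupt, and the geometric mean as the square root of (1 - type I) times (1 - type II). Cost sensitivity is shown through threshold shifting, one common approach, which may differ from the method used in the study. The function names, scores, and cost values are hypothetical and are not the study's data or results.

```python
import math


def error_rates(y_true, y_pred):
    """Return (type_i, type_ii, g_mean); label 1 = bankrupt, 0 = healthy.

    Assumed definitions (not stated in the abstract):
      type I  = bankrupt companies predicted healthy / all bankrupt
      type II = healthy companies predicted bankrupt / all healthy
      g_mean  = sqrt((1 - type I) * (1 - type II))
    """
    bankrupt = [p for t, p in zip(y_true, y_pred) if t == 1]
    healthy = [p for t, p in zip(y_true, y_pred) if t == 0]
    type_i = bankrupt.count(0) / len(bankrupt)
    type_ii = healthy.count(1) / len(healthy)
    g_mean = math.sqrt((1 - type_i) * (1 - type_ii))
    return type_i, type_ii, g_mean


def cost_sensitive_predict(scores, cost_bankrupt, cost_healthy=1.0):
    """Threshold shifting: flag a company as bankrupt when its predicted
    bankruptcy probability reaches cost_healthy / (cost_healthy + cost_bankrupt).
    A higher cost for missing a bankrupt company lowers the threshold."""
    threshold = cost_healthy / (cost_healthy + cost_bankrupt)
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical toy data: 3 bankrupt and 9 healthy companies with
# made-up predicted bankruptcy probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.60, 0.40, 0.25, 0.35, 0.15, 0.10, 0.08, 0.05, 0.05, 0.03, 0.02, 0.02]

for cost in (1, 2, 4):
    y_pred = cost_sensitive_predict(scores, cost_bankrupt=cost)
    type_i, type_ii, g_mean = error_rates(y_true, y_pred)
    print(f"cost={cost}: type I={type_i:.3f}, type II={type_ii:.3f}, G-mean={g_mean:.3f}")
```

In this toy example, raising the cost assigned to a missed bankruptcy lowers the decision threshold, so fewer bankrupt companies are missed while more healthy companies are flagged.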