Hybrid Model Based on Genetic Algorithm, Bayesian Optimization and Machine Learning for Predicting Credit Status of Legal Clients
Subject Areas: Corporate governance
Pardis Fooladi 1, Mohsen Amini Khozani 2, Zohreh Hajiha 3, Shadi Shahverdiani 4
1 - Ph.D. Candidate, Department of Financial Management, Shahr-e-Qods Branch, Islamic Azad University, Tehran, Iran
2 - Assistant Prof., Department of Financial Management, Shahr-e-Qods Branch, Islamic Azad University, Tehran, Iran
3 - Prof., Department of Accounting, South Tehran Branch, Islamic Azad University, Tehran, Iran
4 - Assistant Prof., Department of Financial Management, Shahr-e-Qods Branch, Islamic Azad University, Tehran, Iran
Keywords: Credit scoring, hybrid model, genetic algorithm, Bayesian optimization, XGBoost machine learning model
Abstract:
This article presents a novel hybrid model for credit scoring of banking customers that combines a genetic algorithm, Bayesian optimization, and the XGBoost machine learning model. The model aims to improve the accuracy and efficiency of credit risk assessment and to reduce the costs associated with prediction errors. The study uses real-world data from banking customers. After preprocessing, which included normalization and handling of missing data, a genetic algorithm was used to select the feature subset, and Bayesian optimization was then applied to tune the XGBoost hyperparameters. The results indicate that the proposed model outperforms conventional credit rating methods. The hybrid model achieved an accuracy of 79.3% and classified both creditworthy and non-creditworthy customers well, particularly in high-risk categories. Statistical analyses and performance comparisons with existing methods confirm the positive effect of feature selection and hyperparameter tuning. The model can serve as a practical tool for banks and financial institutions to mitigate credit risk and improve customer management.
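The feature-selection step described in the abstract can be illustrated with a minimal sketch of a genetic algorithm over binary feature masks. This is not the authors' implementation: the function `ga_select_features`, its parameters, and the operators used (tournament selection, one-point crossover, bit-flip mutation, elitism) are assumptions chosen for illustration. The fitness function `toy_fitness` is a hypothetical stand-in; in a pipeline like the one the abstract describes, it would presumably be the cross-validated score of an XGBoost classifier trained on the selected columns. Hyperparameter tuning by Bayesian optimization is not shown.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def ga_select_features(
    n_features: int,                 # must be at least 2 for one-point crossover
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, List[float]]:
    """Return the best mask found and the best fitness after each generation."""
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(0, 1) for _ in range(n_features))
        for _ in range(pop_size)
    ]
    best = max(population, key=fitness)
    history = [fitness(best)]

    def tournament(pop: List[Mask]) -> Mask:
        # Binary tournament: draw two candidates, keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_pop = [best]  # elitism: the best mask always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation, applied independently to each position.
            child = tuple(
                bit ^ 1 if rng.random() < mutation_rate else bit
                for bit in child
            )
            next_pop.append(child)
        population = next_pop
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Hypothetical stand-in for a cross-validated model score: rewards three
# "informative" features and penalizes every other selected feature.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    gain = sum(bit for i, bit in enumerate(mask) if i in INFORMATIVE)
    cost = 0.25 * sum(bit for i, bit in enumerate(mask) if i not in INFORMATIVE)
    return gain - cost


best_mask, best_history = ga_select_features(8, toy_fitness, seed=1)
print("selected feature indices:", [i for i, bit in enumerate(best_mask) if bit])
print("best fitness per generation:", best_history)
```

Because the best mask is carried over unchanged into each new generation, the recorded best fitness never decreases. Replacing `toy_fitness` with a model-based score makes each evaluation far more expensive, so caching scores per mask would be a sensible addition in practice.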