A New Multi-Stage Feature Selection and Classification Approach: Bank Customer Credit Risk Scoring
Subject Areas: Data Mining
1 - Islamic Azad University
Keywords: Genetic algorithm, Clustering, Credit risk prediction, Filtering method, Hybrid feature selection
Abstract:
A great deal of customer information is stored in bank databases, and these databases can be used to assess credit risk. Feature selection is a well-known technique for reducing the dimensionality of such databases. In this paper, a multi-stage feature selection approach is proposed to reduce the dimensionality of the database of an Iranian bank containing 50 features. The first stage removes correlated features. The second stage selects the important features with a genetic algorithm. The third stage weights the variables using different filtering methods. The fourth stage selects features through a clustering algorithm. Finally, the selected features are fed into the K-nearest neighbor (K-NN) and decision tree (DT) classification algorithms. The aim of the paper is to predict the likelihood of risk for each customer based on an effective and optimal subset of the features available for the customers.
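The first stage of the approach removes correlated features. The abstract does not say which correlation measure or cut-off is used, so the following is only a minimal sketch assuming Pearson correlation and a hypothetical threshold of 0.9. Features are scanned in order, and a feature is kept only if its absolute correlation with every feature already kept stays below the threshold. The helper names `pearson` and `drop_correlated` are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns: List[List[float]], threshold: float = 0.9) -> List[int]:
    """Return the indices of the features kept by greedy correlation filtering.

    Assumption: the 0.9 threshold and the left-to-right scan order are
    illustrative choices, not values reported in the paper.
    """
    kept: List[int] = []
    for j, col in enumerate(columns):
        # Keep feature j only if it is not strongly correlated with any kept feature.
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


if __name__ == "__main__":
    cols = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
    print(drop_correlated(cols))  # [0, 2]: column 1 is an exact multiple of column 0
```

Because the scan is greedy, which member of a correlated pair survives depends on column order; a variant could instead keep the member that is more strongly related to the target.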
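In the second stage, a genetic algorithm selects important features. The abstract does not give the encoding, operators, or fitness function, so this sketch assumes common textbook choices: one bit per feature (1 = keep), binary tournament selection, one-point crossover, bit-flip mutation, and elitism. The `fitness` callable and all parameter defaults are hypothetical. In a credit-scoring setting, fitness would typically be the validation accuracy of a classifier trained on the masked features, possibly minus a penalty on subset size.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # one gene per feature: 1 = keep, 0 = drop


def ga_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 40,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Search for a feature mask that maximises `fitness` (hypothetical settings)."""
    rng = random.Random(seed)
    pop: List[Mask] = [
        tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)
    ]

    def tournament() -> Mask:
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children: List[Mask] = [best]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate and n_features > 1:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation: each gene flips independently.
            child = tuple(1 - g if rng.random() < mutation_rate else g for g in child)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    useful = {0, 2}  # toy assumption: only features 0 and 2 carry signal

    def toy_fitness(mask: Mask) -> float:
        gain = sum(mask[i] for i in useful)
        cost = 0.5 * sum(g for i, g in enumerate(mask) if i not in useful)
        return gain - cost

    print(ga_select(6, toy_fitness, seed=1))
```

Elitism guarantees that the best fitness never decreases from one generation to the next, which is why a fixed generation budget is a reasonable stopping rule for a sketch like this.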
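The final stage feeds the selected features into K-NN and decision tree classifiers. The sketch below covers only K-NN and assumes Euclidean distance with simple majority voting; the abstract specifies neither the distance nor the value of k. `knn_predict` and its `features` parameter are hypothetical names. The toy data illustrate why selection matters: when a noisy third column is dropped, the query's nearest neighbours change and so does the prediction.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(
    train_X: List[Sequence[float]],
    train_y: List[int],
    x: Sequence[float],
    k: int = 3,
    features: Sequence[int] = (),
) -> int:
    """Predict the label of `x` by majority vote among its k nearest neighbours.

    `features` lists the column indices to use (for example, the subset kept
    by earlier selection stages); an empty sequence means all columns.
    """
    cols = list(features) or list(range(len(x)))

    def dist(a: Sequence[float]) -> float:
        # Euclidean distance restricted to the selected columns.
        return math.sqrt(sum((a[j] - x[j]) ** 2 for j in cols))

    # Sort training rows by distance to x and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common breaks ties by first occurrence, i.e. in favour of the closest neighbour.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0, 0, 9], [0, 1, -9], [1, 0, 5], [5, 5, 0], [5, 6, 3], [6, 5, -4]]
    y = [0, 0, 0, 1, 1, 1]
    query = [0.2, 0.2, -4]
    print(knn_predict(X, y, query, k=3))                   # all columns: 1
    print(knn_predict(X, y, query, k=3, features=(0, 1)))  # columns 0 and 1 only: 0
```

In practice the features would usually be standardised first, since Euclidean K-NN is sensitive to differences in scale between columns.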