Using Clustering and Genetic Algorithm Techniques in Optimizing Decision Trees for Credit Scoring of Bank Customers
Subject Areas : Futurology
Mahmood Alborzi 1 , Mohammad Khanbabaei 2 , M. E. Mohammad Pourzarandi 3
1 - Assistant Professor and faculty member, Islamic Azad University, Science and Research Branch
2 - Member of the Young Researchers Club, Islamic Azad University, Science and Research Branch (corresponding author)
3 - Associate Professor, Islamic Azad University, Central Tehran Branch
Keywords: Credit Scoring, Classification, Genetic algorithm, Decision Trees, Feature Selection, Clustering
Abstract :
The decision tree technique, one of the data mining techniques, is used in credit scoring of bank customers to classify them for the purpose of offering credit facilities. The main problems with decision trees are their complexity, excessive size, lack of flexibility, and low classification accuracy. The purpose of this paper is to propose a compound model for optimizing decision trees using the genetic algorithm technique. It appears that a genetic algorithm can choose appropriate features and build decision trees in a way that reduces their complexity and increases their flexibility. In the proposed compound model, the credit data is first divided into two clusters by the simple k-means clustering technique. In the next step, the important credit scoring features in the data set are selected using a genetic algorithm and five feature selection algorithms based on the filter, wrapper, and embedded approaches. Subsequently, five decision trees based on the C4.5 algorithm are constructed in each cluster, each with a set of the selected features. The best decision trees in each cluster are selected and combined, according to the optimality criteria described in this paper, to construct the final decision tree. The WEKA machine learning tool and the GATree software were used for this purpose. The results show that building decision trees with the proposed compound model leads to higher classification accuracy than the other algorithms considered in this paper. However, the algorithmic complexity of the proposed compound model is greater than that of some of the classification algorithms compared in this paper.
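The feature selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, which relied on WEKA and GATree; it is a minimal genetic algorithm over feature bitmasks, written under several assumptions. The nearest-centroid classifier stands in for the C4.5 tree only to keep the example short, the fitness weighting (`penalty`), the population settings, and the toy data with its three columns are all hypothetical, and fitness is measured on the training data rather than on a held-out set.

```python
import random


def centroid_accuracy(rows, labels, mask):
    """Training accuracy of a nearest-centroid rule on the masked features.

    Stand-in for a C4.5 tree (an assumption made to keep the sketch short).
    """
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(r[c] for r in members) / len(members) for c in cols]
    correct = 0
    for r, y in zip(rows, labels):
        pred = min(
            centroids,
            key=lambda k: sum((r[c] - m) ** 2 for c, m in zip(cols, centroids[k])),
        )
        correct += pred == y
    return correct / len(rows)


def fitness(rows, labels, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (hypothetical weighting)."""
    return centroid_accuracy(rows, labels, mask) - penalty * sum(mask)


def select_features(rows, labels, pop_size=20, generations=30,
                    mutation_rate=0.1, seed=0):
    """Search for a feature bitmask with a simple elitist genetic algorithm.

    Assumes at least two features and pop_size of at least four.
    """
    rng = random.Random(seed)
    n = len(rows[0])

    def score(mask):
        return fitness(rows, labels, mask)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_gen = [population[0][:], population[1][:]]  # elitism: keep the two best
        while len(next_gen) < pop_size:
            a, b = rng.sample(population[: pop_size // 2], 2)  # parents from top half
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation, applied independently to each gene
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return best, score(best)


# Toy data (hypothetical): columns are income, debt ratio, and an uninformative value.
rows = [
    [5.0, 0.2, 3.0], [6.0, 0.1, 1.0], [5.5, 0.3, 2.0],
    [1.0, 0.8, 3.0], [1.5, 0.9, 1.0], [0.5, 0.7, 2.0],
]
labels = [1, 1, 1, 0, 0, 0]
best_mask, best_score = select_features(rows, labels)
print(best_mask, best_score)
```

In the compound model, a search of this kind would be run within each of the two clusters, and the resulting feature sets would then be used to build the candidate decision trees. The per-feature penalty is one simple way to favour smaller feature sets, which relates to the paper's stated goal of reducing tree complexity.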