Improved Automatic Clustering Using a Multi-Objective Evolutionary Algorithm With New Validity Measure and Application to Credit Scoring
Subject areas: Operations Research
Majid Mohammadi Rad 1 , Mahdi Afzali 2
1 - Department of Computer and Information Technology, Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
2 - Faculty of Computer Engineering, Islamic Azad University, Zanjan Branch, Zanjan, Iran
Keywords: Data mining, Clustering, Evolution Algorithm, Credit Score, Clustering Validity Measure
Abstract:
In data mining, clustering is an important technique for separating unlabeled data into groups without supervision. This paper attempts to improve and optimize the application of heuristic clustering methods, namely the Genetic Algorithm, Particle Swarm Optimization (PSO), Artificial Bee Colony, Harmony Search, and Differential Evolution, to the unlabeled data of an Iranian bank for credit scoring. A survey was also used to measure clustering validity, which led to a new validity index. Finally, the results were compared to identify the best algorithm and validity measure.
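The abstract does not define the new validity index, so the following is only an illustrative sketch of how a clustering validity measure can score a candidate partition. It uses the standard Davies-Bouldin index as a stand-in, not the paper's measure. The function name `davies_bouldin` and the toy data are assumptions made for this example. An evolutionary algorithm could use such a score as a fitness value when comparing candidate clusterings.

```python
import numpy as np

def davies_bouldin(points, labels):
    """Davies-Bouldin index (stand-in example, not the paper's new index).

    Lower values indicate more compact, better separated clusters.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("need at least two clusters")
    centroids = np.array([points[labels == k].mean(axis=0) for k in ids])
    # Scatter: mean distance of each cluster's members to its own centroid.
    scatter = np.array([
        np.linalg.norm(points[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    # Separation: pairwise distances between cluster centroids.
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)  # ignore each cluster's comparison with itself
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    # For each cluster take its worst (largest) ratio, then average.
    return ratios.max(axis=1).mean()

# Hypothetical toy data: two well-separated pairs of points.
pts = [[0, 0], [0, 2], [10, 0], [10, 2]]
good = [0, 0, 1, 1]   # groups nearby points together
bad = [0, 1, 0, 1]    # mixes the two groups
print(davies_bouldin(pts, good), davies_bouldin(pts, bad))
```

On this toy data the partition that groups nearby points together receives the lower (better) score, which is the behavior a validity index is meant to capture.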