Presenting a Hybrid Model Based on Machine Learning for the Classification of Common Customers of the Banking and Insurance Industries
Subject Areas: Business Management
Hamidreza Amirhassankhani 1, Abbas Toloie Eshlaghy 2 *, Reza Radfar 3, Alireza Pourebrahimi 4
1 - Ph.D. Candidate, Information Technology Management Group, UAE Branch, Islamic Azad University, Dubai, UAE
2 - Professor, Department of Industrial Management, Science and Research Unit, Islamic Azad University, Tehran, Iran
3 - Professor, Department of Industrial Management, Science and Research Unit, Islamic Azad University, Tehran, Iran
4 - Assistant Professor, Department of Management, Karaj Branch, Islamic Azad University, Karaj, Iran
Keywords: Genetic Algorithm, Classification, Support Vector Machine, Insurance, Bank
Abstract:
Global competition, dynamic markets, and rapidly shrinking innovation and technology cycles have all imposed significant challenges on the financial, banking, and insurance industries, and the need for data analysis to improve decision-making in these organizations has become increasingly important. The data stored in these organizations' databases are therefore valuable sources of the information and knowledge needed for organizational decisions. The present research focuses on the common customers of the banking and insurance industries. Its purpose is to provide a methodology for predicting the performance of new customers from the behavior of previous customers. To this end, a hybrid model based on a support vector machine and a genetic algorithm is used: the support vector machine models the relationship between customer performance and customers' identity information, and the genetic algorithm tunes and optimizes the parameters of the support vector machine. Classifying customers with the proposed model achieved a high accuracy of 99%.
- Introduction
In this research, the researchers aim to present an efficient model based on a support vector machine and a genetic algorithm for classifying and predicting the performance of new common customers of the banking and insurance industries. The purpose is to enable investment holdings that are common shareholders of banks and insurance companies to tailor their decisions to individual customers as closely as possible, to make diverse and efficient decisions that match their customers' characteristics, to strengthen interactions with customers, to better meet customer needs, and to improve customer satisfaction and loyalty. Accordingly, these holdings can achieve significant results in each of these areas by strengthening their databases and the communication links of their information companies, by increasing the accuracy with which initial customer information is entered and registered, and by relying on machine learning methods.
- Literature Review
Among recent studies on customer classification in the banking industry is that of Jamshidi et al. (2019), who presented a multi-objective approach based on an adaptive neuro-fuzzy inference system for detecting money laundering in banks and currency exchanges. Magomedov et al. (2018), Dorofeev et al. (2018), and Plaksiy et al. (2018) used machine learning methods based on artificial intelligence to design and monitor anti-money laundering systems. In their review papers, Leite et al. (2019) and Tiwari et al. (2020) compiled a rich collection of research that applies machine learning and artificial intelligence to money laundering and other banking crimes.
- Methodology
In this study, the researchers model the classification of common customers of the banking and insurance industries using a hybrid method: a support vector machine whose parameters are optimized by a genetic algorithm. First, the independent and dependent variables are determined: the customers' identity information forms the independent variables, and the class to which each customer belongs is the dependent variable. Next, the customer set is randomly divided into training and testing data, with 90 percent of the records used in the training phase and the remaining 10 percent in the testing phase.
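The random 90/10 split described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `split_customers`, the fixed seed, and the toy records are hypothetical choices made for the example.

```python
import random


def split_customers(records, train_fraction=0.9, seed=42):
    """Randomly split records into a training subset and a testing subset."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    rng = random.Random(seed)  # local, seeded RNG so the split is reproducible
    shuffled = list(records)   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: (identity features, customer class) pairs.
customers = [((i % 7, i % 3), i % 2) for i in range(100)]
train, test = split_customers(customers)
print(len(train), len(test))  # 90 10
```

Fixing the seed makes the split reproducible, which matters when the same partition has to be reused to compare different parameter settings of the classifier.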
- Results
Accuracy, recall, and precision are used to evaluate how well the methods predict the class of common insurance and bank customers in this research. The most important of these criteria for judging the efficiency of a classification technique is accuracy, which measures the overall correctness of a classifier: the percentage of all test records that the classifier assigns to the correct class. The results show that the support vector machine tuned by the genetic algorithm correctly classified 99.98% of the test data. Given the favorable values of all three criteria (accuracy, recall, and precision) for this hybrid method, it can efficiently classify common bank and insurance customers.
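The three criteria can be computed directly from the predicted and actual classes of the test records. The sketch below is illustrative only: the function name `classification_metrics`, the choice of a single class of interest for precision and recall (with several classes these are usually computed per class and then averaged), and the labels "good" and "bad" are assumptions, not details taken from the study.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall); precision/recall refer to `positive`."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    # Accuracy: share of all test records whose predicted class is correct.
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Precision: of the records predicted as `positive`, how many truly are.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the truly `positive` records, how many were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Usage with hypothetical class labels for five test customers.
y_true = ["good", "good", "bad", "bad", "good"]
y_pred = ["good", "bad", "bad", "good", "good"]
acc, prec, rec = classification_metrics(y_true, y_pred, positive="good")
print(acc, round(prec, 3), round(rec, 3))  # 0.6 0.667 0.667
```

Because accuracy counts every correct prediction, the classification error of a model is simply 1 minus its accuracy.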
- Discussion
In this research, the researchers implemented a support vector machine for classifying common customers of the banking and insurance industries and examined the results. After the training process, in which the genetic algorithm found the optimal parameters of the support vector machine, the method was evaluated in the testing phase on 6060 customers whose information had not been given to the support vector machine during training. Comparing the output of the support vector machine with the customers' actual classes shows that the outputs fit the real data closely. The classification error of the proposed model is 0.0003, which means that the accuracy of the support vector machine is about 99.97 percent (1 - 0.0003 = 0.9997), an acceptable level of accuracy.
Today, most organizations collect and store data rapidly. Yet despite this large volume of data, organizations generally lack the knowledge needed for decision-making. Conventional reporting tools can give users information from which to draw conclusions about the data and the logical relationships within it, but when the volume of data is huge, even experienced and professional users cannot detect useful patterns in it. Machine learning techniques are therefore increasingly used to meet organizations' need to discover knowledge in large volumes of data. Data mining is the process of extracting information and knowledge and discovering hidden patterns in very large databases. Telecommunication companies, banks, insurance companies, advertising companies, and any company with a large database can use data mining to improve its decision-making processes.
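The hybrid procedure described in this discussion (a genetic algorithm searching for support vector machine parameters, followed by a single evaluation on customers held out from training) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the paper does not specify the kernel, the SVM solver, or the genetic operators, so here an RBF-kernel SVM is trained with the kernel Pegasos sub-gradient method, each chromosome encodes log10 of the regularization strength `lam` and of the kernel width `gamma`, and fitness is accuracy on a validation slice of the training data. The synthetic two-dimensional points and all names and ranges (`BOUNDS`, `genetic_search`, and so on) are hypothetical.

```python
import math
import random

# Hypothetical search ranges for the two genes: log10(lam) and log10(gamma).
BOUNDS = [(-4.0, 0.0), (-2.0, 2.0)]


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def fit_kernel_svm(X, y, lam, gamma, iters=300, seed=0):
    """Train an RBF-kernel SVM (labels +1/-1) with kernel Pegasos."""
    rng = random.Random(seed)  # fixed seed keeps the GA fitness deterministic
    m = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(m)] for i in range(m)]
    alpha = [0] * m  # how often each example violated the margin
    for t in range(1, iters + 1):
        i = rng.randrange(m)
        score = sum(alpha[j] * y[j] * K[j][i] for j in range(m))
        if y[i] * score / (lam * t) < 1:  # hinge loss active: update
            alpha[i] += 1

    def predict(x):
        s = sum(alpha[j] * y[j] * rbf(X[j], x, gamma) for j in range(m) if alpha[j])
        return 1 if s >= 0 else -1

    return predict


def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def clip(value, k):
    lo, hi = BOUNDS[k]
    return min(max(value, lo), hi)


def genetic_search(fitness, pop_size=8, generations=6, mutation_sd=0.3, seed=1):
    """Real-coded GA: truncation selection, blend crossover, Gaussian mutation, elitism."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: the best chromosome survives unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            w = rng.random()
            children.append([clip(w * a + (1 - w) * b + rng.gauss(0, mutation_sd), k)
                             for k, (a, b) in enumerate(zip(p1, p2))])
        pop = children
    return max(pop, key=fitness), history


# Synthetic, hypothetical data: class +1 inside a circle, -1 outside.
data_rng = random.Random(7)
points = [(data_rng.uniform(-2, 2), data_rng.uniform(-2, 2)) for _ in range(120)]
labels = [1 if px * px + py * py < 1.5 else -1 for px, py in points]
X_tr, y_tr = points[:80], labels[:80]          # used to fit the SVM
X_val, y_val = points[80:108], labels[80:108]  # used for GA fitness
X_te, y_te = points[108:], labels[108:]        # held out until the end (10%)

cache = {}


def fitness(genes):
    key = tuple(genes)
    if key not in cache:
        model = fit_kernel_svm(X_tr, y_tr, 10 ** genes[0], 10 ** genes[1])
        cache[key] = accuracy(model, X_val, y_val)
    return cache[key]


best, history = genetic_search(fitness)
final = fit_kernel_svm(X_tr, y_tr, 10 ** best[0], 10 ** best[1])
test_acc = accuracy(final, X_te, y_te)
print("best [log10(lam), log10(gamma)]:", [round(g, 2) for g in best])
print("best validation accuracy per generation:", history)
print("test accuracy:", test_acc)
```

Elitism (carrying the best chromosome into the next generation unchanged) guarantees that the best validation accuracy never decreases from one generation to the next, and the test slice is touched only once, after the search, so it plays the same role as the unseen test customers in this study. In practice a library SVM, such as scikit-learn's `SVC` with its `C` and `gamma` parameters, would normally replace the hand-written solver.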
Data mining helps organizations move beyond the level of raw data to higher levels of knowledge by uncovering previously unknown patterns. The extracted patterns can take many forms: relationships between the features and characteristics of the system (such as the type of demand and the type of customer), future predictions based on system characteristics, if-then rules between system variables, and classifications and clusterings of similar objects and records in a system.
- References
Abdou, H., Pointon, J., & El-Masry, A. (2008). Neural nets versus conventional techniques in credit scoring in Egyptian banking. Expert Systems with Applications, 35(3), 1275-1292. doi:10.1016/j.eswa.2007.08.030
Bhattacharyya, S., Jha, S., Tharakunnel, K., & Westland, J. C. (2011). Data mining for credit card fraud: A comparative study. Decision Support Systems, 50(3), 602-613. doi:10.1016/j.dss.2010.08.008
Boyacioglu, M. A., Kara, Y., & Baykan, Ö. K. (2009). Predicting bank financial failures using neural networks, support vector machines and multivariate statistical methods: A comparative analysis in the sample of savings deposit insurance fund (SDIF) transferred banks in Turkey. Expert Systems with Applications, 36(2), 3355-3366. doi:10.1016/j.eswa.2008.01.003
Chen, F. L., & Li, F. C. (2010). Combination of feature selection approaches with SVM in credit scoring. Expert Systems with Applications, 37(7), 4902-4909. doi:10.1016/j.eswa.2009.12.025
Chu, B. H., Tsai, M. S., & Ho, C. S. (2007). Toward a hybrid data mining model for customer retention. Knowledge-Based Systems, 20(8), 703-718. doi:10.1016/j.knosys.2006.10.003
Dorofeev, D., Khrestina, M., Usubaliev, T., Dobrotvorskiy, A., & Filatov, S. (2018, May). Application of machine analysis algorithms to automate implementation of tasks of combating criminal money laundering. In International Conference on Digital Transformation and Global Society (pp. 375-385). Springer, Cham.
Duman, E., & Ozcelik, M. H. (2011). Detecting credit card fraud by genetic algorithm and scatter search. Expert Systems with Applications, 38(10), 13057-13063. doi:10.1016/j.eswa.2011.04.110
Huang, C. L., Chen, M. C., & Wang, C. J. (2007). Credit scoring with a data mining approach based on support vector machines. Expert Systems with Applications, 33(4), 847-856. doi:10.1016/j.eswa.2006.07.007
Huang, Y. M., Hung, C. M., & Jiau, H. C. (2006). Evaluation of neural networks and data mining methods on a credit assessment task for class imbalance problem. Nonlinear Analysis: Real World Applications, 7(4), 720-747. doi:10.1016/j.nonrwa.2005.04.006
Jamshidi, M. B., Gorjiankhanzad, M., Lalbakhsh, A., & Roshani, S. (2019, May). A novel multiobjective approach for detecting money laundering with a neuro-fuzzy technique. In 2019 IEEE 16th International Conference on Networking, Sensing and Control (ICNSC) (pp. 454-458). IEEE. doi:10.1109/ICNSC.2019.8743234
Kirkos, E., Spathis, C., & Manolopoulos, Y. (2007). Data mining techniques for the detection of fraudulent financial statements. Expert Systems with Applications, 32(4), 995-1003. doi:10.1016/j.eswa.2006.02.016
Lee, B., Cho, H., Chae, M., & Shim, S. (2010). Empirical analysis of online auction fraud: Credit card phantom transactions. Expert Systems with Applications, 37(4), 2991-2999. doi:10.1016/j.eswa.2009.09.034
Lee, T. S., Chiu, C. C., Chou, Y. C., & Lu, C. J. (2006). Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Computational Statistics & Data Analysis, 50(4), 1113-1130. doi:10.1016/j.csda.2004.11.006
Lin, C. S., Tzeng, G. H., & Chin, Y. C. (2011). Combined rough set theory and flow network graph to predict customer churn in credit card accounts. Expert Systems with Applications, 38(1), 8-15. doi:10.1016/j.eswa.2010.05.039
Lin, S. W., Shiue, Y. R., Chen, S. C., & Cheng, H. M. (2009). Applying enhanced data mining approaches in predicting bank performance: A case of Taiwanese commercial banks. Expert Systems with Applications, 36(9), 11543-11551. doi:10.1016/j.eswa.2009.03.029
Luo, S. T., Cheng, B. W., & Hsieh, C. H. (2009). Prediction model building with clustering-launched classification and support vector machines in credit scoring. Expert Systems with Applications, 36(4), 7562-7566. doi:10.1016/j.eswa.2008.09.028
Magomedov, G. S., Dobrotvorsky, A. S., Khrestina, M. P., Pavelyev, S. A., & Yusubaliev, T. R. (2018). Application of Artificial Intelligence Technologies for the Monitoring of Transactions in AML-Systems Using the Example of the Developed Classification Algorithm. Int. J. Eng. Technol., 7, 76-79.
Nie, G., Rowe, W., Zhang, L., Tian, Y., & Shi, Y. (2011). Credit card churn forecasting by logistic regression and decision tree. Expert Systems with Applications, 38(12), 15273-15285. doi:10.1016/j.eswa.2011.06.028
Paasch, C. A. (2008). Credit card fraud detection using artificial neural networks tuned by genetic algorithms. Hong Kong University of Science and Technology (Hong Kong), 1-1112.
Plaksiy, K., Nikiforov, A., & Miloslavskaya, N. (2018, August). Applying big data technologies to detect cases of money laundering and counter financing of terrorism. In 2018 6th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW) (pp. 70-77). IEEE. doi:10.1109/W-FiCloud.2018.00017
Quah, J. T., & Sriganesh, M. (2008). Real-time credit card fraud detection using computational intelligence. Expert Systems with Applications, 35(4), 1721-1732. doi:10.1016/j.eswa.2007.08.093
Sánchez, D., Vila, M. A., Cerda, L., & Serrano, J. M. (2009). Association rules applied to credit card fraud detection. Expert Systems with Applications, 36(2), 3630-3640. doi:10.1016/j.eswa.2008.02.001
Sobreira Leite, G., Bessa Albuquerque, A., & Rogerio Pinheiro, P. (2019). Application of technological solutions in the fight against money laundering: A systematic literature review. Applied Sciences, 9(22), 1-29. doi:10.3390/app9224800
Šušteršič, M., Mramor, D., & Zupan, J. (2009). Consumer credit scoring models with limited data. Expert Systems with Applications, 36(3), 4736-4744. doi:10.1016/j.eswa.2008.06.016
Tiwari, M., Gepp, A., & Kumar, K. (2020). A review of money laundering literature: The state of research in key areas. Pacific Accounting Review, 32(2), 271-303. doi:10.1108/PAR-06-2019-0065
Xie, Y., Li, X., Ngai, E. W. T., & Ying, W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3), 5445-5449. doi:10.1016/j.eswa.2008.06.121
Yap, B. W., Ong, S. H., & Husain, N. H. M. (2011). Using data mining to improve assessment of credit worthiness via credit scoring models. Expert Systems with Applications, 38(10), 13274-13283. doi:10.1016/j.eswa.2011.04.147
Zhao, H., Sinha, A. P., & Ge, W. (2009). Effects of feature construction on classification performance: An empirical study in bank failure prediction. Expert Systems with Applications, 36(2), 2633-2644. doi:10.1016/j.eswa.2008.01.053