Risk Classification of Imbalanced Data for Car Insurance Companies: Machine Learning Approaches
Subject Areas: International Journal of Mathematical Modelling & Computations
Farzan Khamesian 1, Maryam Esna-Ashari 2, Eric Dei Ofosu-Hene 3, Farbod Khanizadeh 4
1 - Insurance Research Center, Tehran, Iran
2 - Insurance Research Center, Tehran, Iran
3 - Department of Accounting and Finance, Faculty of Business and Law, De Montfort University, Leicester, UK
4 - Insurance Research Center, Tehran, Iran
Keywords: Classification, Machine Learning, Supervised Learning, Imbalanced Data, Claim Risk
Abstract:
This paper presents a mechanism that helps insurance companies identify the most effective features for classifying the risk of their customers in third-party liability (TPL) car insurance. In practice, underwriting relies on expert experience, and the industry lacks a systematic method for categorizing policyholders by risk level. We analyzed 13,388 observations from an insurance claim dataset of bodily injury reports provided by an Iranian insurance company. The main challenge is that the dataset is imbalanced. We therefore apply logistic regression and random forest classifiers to several resampled versions of the original data in order to improve model performance. The results indicate that the random forest combined with hybrid resampling is the best classifier, and that victim age, premium, car age, and insured age are the most important factors for claim prediction.
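The abstract does not specify which hybrid resampling scheme was used, so the following is only a minimal sketch under a stated assumption: the simplest hybrid, which randomly undersamples the majority (no-claim) class and randomly oversamples the minority (claim) class with replacement until both reach a common size. The function name `hybrid_resample`, its default "halfway" target, and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import Counter


def hybrid_resample(X, y, target=None, seed=0):
    """Hypothetical hybrid resampler for a binary dataset.

    Assumption: random undersampling of the majority class plus random
    oversampling (with replacement) of the minority class, meeting at
    `target` rows per class. The paper's exact scheme may differ.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, n_maj), (minority, n_min) = counts.most_common()
    if target is None:
        # Default (an assumption): meet halfway between the two class sizes.
        target = (n_maj + n_min) // 2
    if not n_min <= target <= n_maj:
        raise ValueError("target must lie between the minority and majority sizes")

    maj_idx = [i for i, label in enumerate(y) if label == majority]
    min_idx = [i for i, label in enumerate(y) if label == minority]

    # Undersample the majority class without replacement.
    keep_maj = rng.sample(maj_idx, target)
    # Keep every original minority row, then add random duplicates.
    keep_min = min_idx + [rng.choice(min_idx) for _ in range(target - n_min)]

    idx = keep_maj + keep_min
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


if __name__ == "__main__":
    # Hypothetical toy data: 90 non-claims (label 0) and 10 claims (label 1).
    X = [[i] for i in range(100)]
    y = [1 if i < 10 else 0 for i in range(100)]
    X_bal, y_bal = hybrid_resample(X, y)
    print(sorted(Counter(y_bal).items()))  # [(0, 50), (1, 50)]
```

Resampling of this kind should be applied only to the training split; the test split keeps the real, imbalanced class distribution so that the reported performance of logistic regression or random forest reflects actual claim frequencies. Many published hybrid methods replace the random duplication step with synthetic minority generation such as SMOTE.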