    • List of Articles Mahnaz Manteqipour

      • Open Access Article

        1 - Designing a hybrid model for classification of imbalanced data in the field of third party insurance
        Mahnaz Manteqipour, Parisa Rahimkhani
        Compulsory civil liability insurance of motor vehicle owners against third parties makes up the major part of the Iranian insurance industry's portfolio. Understanding the behavior of this line of insurance therefore helps insurers provide better services to their customers. Predicting the claim rate of a policy from the features recorded for it is one of the industry's problems that data mining techniques can address. Insurance is designed around the law of large numbers: a sufficiently large number of policies is issued, only a small share of them lead to claims, and the cost of those claims is covered from the total premiums collected. The insurance industry therefore faces imbalanced data, and this imbalance causes many challenges in classification. In the third-party insurance data set used in this research, each policy has 14 features and the imbalance ratio is 1 to 0.0092, which is considered severely imbalanced.

        Method: This research addresses the classification of severely imbalanced data in the field of third-party insurance. To overcome the imbalance problem, two hybrid models with different architectures are designed on top of five base models: Gaussian Bayes, support vector machine, logistic regression, decision tree, and nearest neighbor. The first hybrid model samples randomly from the whole data set and applies a resampling method before classification. The second selects samples from each label separately and applies a classification model to all of the selected data. The results of the two models are compared.

        Results: The results show that the proposed hybrid models predict the occurrence or non-occurrence of traffic accidents better than other data mining algorithms. Common measures such as precision and recall show that the second hybrid model performs better than the first. In the ensemble phase, the number of base models that must agree under simple voting is a hyperparameter that can be adjusted to match the company's strategy. Using a decision tree to combine the base models also gives better results than simple voting over them.

        Discussion: In further research on imbalanced-data classification, more sophisticated resampling algorithms could be applied and their results compared.
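The two ideas in the abstract that are easiest to make concrete are sampling from each label separately and treating the number of agreeing base models as a voting hyperparameter. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names `sample_per_label` and `vote`, the parameter `min_votes`, and the synthetic policy counts (10000 without a claim, 92 with a claim, chosen to match the 1 to 0.0092 ratio) are all hypothetical, and the base-model predictions are hard-coded rather than produced by trained classifiers.

```python
import random
from collections import Counter


def sample_per_label(rows, labels, n_per_label, rng):
    """Draw up to n_per_label rows from each label separately (hypothetical helper)."""
    by_label = {}
    for row, label in zip(rows, labels):
        by_label.setdefault(label, []).append(row)
    picked_rows, picked_labels = [], []
    for label in sorted(by_label):
        group = by_label[label]
        chosen = rng.sample(group, min(n_per_label, len(group)))
        picked_rows.extend(chosen)
        picked_labels.extend([label] * len(chosen))
    return picked_rows, picked_labels


def vote(predictions, min_votes):
    """Return 1 (claim) if at least min_votes base models predict 1, else 0."""
    return 1 if sum(predictions) >= min_votes else 0


# Synthetic data set (assumption): policy ids only, no real features.
n_no_claim, n_claim = 10000, 92
rows = list(range(n_no_claim + n_claim))
labels = [0] * n_no_claim + [1] * n_claim

# Share of policies with a claim implied by a 1 to 0.0092 ratio.
minority_share = n_claim / (n_no_claim + n_claim)

# Sampling each label separately yields a balanced training subset.
rng = random.Random(0)
picked_rows, picked_labels = sample_per_label(rows, labels, n_claim, rng)
label_counts = Counter(picked_labels)

# Hard-coded outputs of five base models for one policy (assumption).
base_predictions = [1, 0, 0, 1, 0]
lenient = vote(base_predictions, min_votes=2)  # flags the policy
strict = vote(base_predictions, min_votes=3)   # does not flag it
```

Lowering `min_votes` flags more policies as likely claims, which tends to favor recall, while raising it tends to favor precision. That trade-off is one way to read the abstract's remark that the voting threshold can follow the company's strategy.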