Predicting Annual Health Insurance Costs Using Machine Learning

Subject areas: Applications of artificial intelligence and information technology

1 - Master's student in Artificial Intelligence, Imam Hossein University, Tehran, Iran
2 - PhD student in Artificial Intelligence, University of Zanjan, Zanjan, Iran

Received: 1402/12/19 | Accepted: 1403/05/13 | Published: 1403/05/13

Keywords: health insurance, medical costs, classification, machine learning

Abstract:

Health insurance is one way to reduce the costs imposed on members of society. Studying claims and diseases helps stakeholders set policy in this area more easily. Insurance rates are affected by certain medical factors, and accurate estimation of individual health care and treatment costs is important for a range of stakeholders and health agencies. By predicting medical costs, both the insured and the insurer can anticipate the future to some extent and make better-informed decisions. The goals of this article are to predict whether an individual's spending on treatment will be low, medium, or high, and to identify the factors that affect health insurance costs. The study uses US Census Bureau data comprising 1338 samples with the features age, gender, body mass index (BMI), smoking status, number of dependents, region, and annual cost. In the proposed method, the dataset is first analyzed to obtain an overall view of it and to identify the factors that influence treatment costs. Then, through preprocessing and categorizing costs as low, medium, or high, the data are converted into a form suitable for classification. Next, classification algorithms are used to learn the category of each sample, and the best algorithm is selected by evaluating them. Finally, parameter tuning improves the performance of the selected algorithm, and a model for predicting the annual cost level is created. Analysis of the dataset showed that smoking, increasing age, and being overweight affect treatment costs. The classification results show that the random forest algorithm predicts low, medium, and high treatment spending with 91% accuracy.
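
The pipeline summarized in the abstract (categorizing costs into three classes, training a classifier, then tuning its parameters) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The data are synthetic stand-ins generated at random, the column names (age, sex, bmi, children, smoker, region, charges) are assumed, the tertile-based cut points are an assumption because the abstract does not state the thresholds, and the parameter grid is hypothetical. The sketch relies on pandas and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the dataset (ASSUMPTION: not the real data).
rng = np.random.default_rng(0)
n = 1338
df = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "sex": rng.choice(["female", "male"], n),
    "bmi": rng.normal(30, 6, n).round(1),
    "children": rng.integers(0, 6, n),
    "smoker": rng.choice(["yes", "no"], n, p=[0.2, 0.8]),
    "region": rng.choice(
        ["northeast", "northwest", "southeast", "southwest"], n
    ),
})
# Made-up cost formula so that smoking, age, and BMI drive the cost.
df["charges"] = (
    250 * df["age"]
    + 300 * df["bmi"]
    + 20000 * (df["smoker"] == "yes")
    + rng.normal(0, 2000, n)
)

# Step 1: categorize the annual cost into low / medium / high.
# ASSUMPTION: tertiles are used as cut points.
y = pd.qcut(df["charges"], q=3, labels=["low", "medium", "high"]).astype(str)

# Step 2: one-hot encode the categorical features.
X = pd.get_dummies(df.drop(columns="charges"), drop_first=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 3: train a random forest and tune it with a small grid search.
# The grid values are hypothetical.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, None]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Step 4: evaluate the tuned model on the held-out split.
accuracy = accuracy_score(y_test, search.best_estimator_.predict(X_test))
print("best parameters:", search.best_params_)
print("test accuracy:", round(accuracy, 3))
```

With real data, the synthetic frame would be replaced by the actual dataset. The accuracy printed for the synthetic data says nothing about the 91% figure reported in the abstract.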

