Diabetes detection via machine learning using four implemented spanning tree algorithms
Subject area: Data mining
Yas Ghiasi 1, Mehdi Seifbarghy 2, Davar Pishva 3
1 - Department of Industrial Engineering, Faculty of Engineering, Alzahra University, Tehran, Iran
2 - Department of Industrial Engineering, Faculty of Engineering, Alzahra University, Tehran, Iran
3 - Faculty of Sustainability and Tourism, Ritsumeikan Asia Pacific University (APU), Beppu, Oita, Japan
Keywords: Diabetes, Data mining, Machine Learning, Multi-criteria decision-making, MCDM, Tree-based algorithms
Abstract:
This paper presents an accurate and efficient diabetes detection scheme based on machine learning, applying data mining and pattern matching to the diagnosis process. It implements and evaluates four machine learning classification algorithms, namely Decision Tree, Random Forest, XGBoost and LightGBM (LGBM), and then uses multi-criteria decision-making (MCDM) methods to select the one that best meets its objective. The results show that the Random Forest algorithm outperformed the other algorithms, achieving higher accuracy. The paper also examines which features have the greatest effect on diabetes detection. Diabetes is one of the deadliest, most disabling and most costly diseases today; its rates are rising alarmingly, and it is difficult to diagnose because many of its signs and symptoms are vague. An approach of this kind can therefore help doctors improve the accuracy of their diagnosis and treatment plans. To that end, the paper uses data mining as a tool to gather and analyze existing diabetes data and to support doctors in the diagnosis and treatment process. Its main contribution is the application of this approach to an essential field and the accuracy of its pattern recognition across several analytical approaches.
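The abstract does not name the MCDM method used to pick the best classifier. As a minimal sketch, assuming TOPSIS (one widely used MCDM method) as a stand-in, and assuming accuracy, precision, recall, F1 and training time as the criteria with equal weights, the selection step could look like the pure-Python code below. The helper names `metrics_from_confusion` and `topsis`, the anonymised labels `model_A` to `model_D`, and every number in the usage section are hypothetical placeholders, not the paper's data or results.

```python
import math


def metrics_from_confusion(tp, fp, fn, tn):
    """Return [accuracy, precision, recall, F1] from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return [accuracy, precision, recall, f1]


def topsis(matrix, weights, benefit):
    """TOPSIS closeness coefficient in [0, 1] for each alternative (row); higher is better.

    matrix:  rows are alternatives (classifiers), columns are criteria
    weights: one weight per criterion (assumed to sum to 1)
    benefit: True where larger is better (e.g. accuracy), False for costs (e.g. time)
    """
    n_crit = len(weights)
    # Step 1: vector-normalise each criterion column, then apply its weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Step 2: ideal best and ideal worst value of every criterion.
    columns = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # Step 3: Euclidean distance to both ideals, then relative closeness to the best.
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst) if d_best + d_worst else 0.0)
    return scores


if __name__ == "__main__":
    # Hypothetical placeholder inputs: (tp, fp, fn, tn) on a test split plus training
    # time in seconds for four anonymised classifiers. Not results from the paper.
    candidates = {
        "model_A": ((50, 20, 30, 130), 0.02),
        "model_B": ((60, 15, 20, 135), 0.90),
        "model_C": ((58, 17, 22, 133), 0.60),
        "model_D": ((57, 16, 23, 134), 0.30),
    }
    matrix = [metrics_from_confusion(*cm) + [t] for cm, t in candidates.values()]
    weights = [0.2] * 5  # equal weights: an assumption
    benefit = [True, True, True, True, False]  # training time is a cost criterion
    scores = topsis(matrix, weights, benefit)
    for name, s in sorted(zip(candidates, scores), key=lambda p: -p[1]):
        print(f"{name}: {s:.3f}")
```

TOPSIS ranks each classifier by how close it lies to an "ideal" alternative that has the best observed value on every criterion and how far it lies from the worst one. Swapping in another MCDM method, such as AHP or simple additive weighting (SAW), would only change the `topsis` function; the metric matrix stays the same.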