Comparing the Performance of Data Mining Models in Rainfall Prediction Using a Classification Approach (Case Study: Hamedan Airport Synoptic Meteorological Station)

Salehi Sarbijan, Morteza; Dezfoulian, Hamid Reza

doi:10.30495/wsrcj.2024.75668.11411

Article code: WSRCJ-2310-11411 (R2) Pages: 113-126

Article type: Research

Subject areas: Computer applications in water and soil problems

Morteza Salehi Sarbijan ¹*, Hamid Reza Dezfoulian ²

1 - Assistant Professor, Department of Mechanical Engineering, Faculty of Engineering, University of Zabol, Zabol, Iran.
2 - Assistant Professor, Department of Industrial Engineering, Faculty of Engineering, Bu-Ali Sina University, Hamedan, Iran.

Received: 1402/07/20 Accepted: 1402/10/27 Published: 1402/10/01 (Solar Hijri calendar)

Keywords: Artificial neural network, K-nearest neighbors model, Support vector machine, Rainfall prediction, Decision tree models

Abstract:

Background and Aim: Rainfall is a complex natural phenomenon and one of the most important components of the water cycle, and it plays a major role in characterizing the climate of any region. Knowing the amount and trend of precipitation, one of the key meteorological variables, is necessary both for effective management and more precise planning in the agricultural, economic, and social sectors and for studies of runoff, droughts, groundwater status, and floods. In urban areas, rainfall prediction also strongly affects traffic control, sewage flow, and construction activities.

Method: The aim of this study is to compare the accuracy of decision-tree classification models (CHAID, the C5 decision tree, Naive Bayes (NB), QUEST, and random forest), k-nearest neighbors (KNN), support vector machine (SVM), and artificial neural network (ANN) in predicting the occurrence of rainfall, using data from a 50-year period at the Hamedan Airport synoptic station. 80% of the data were used to train the models and 20% to validate them, and the model results were compared using the confusion matrix, the ROC curve, and the AUC index. To build the classification variable, the days of the year were divided, based on the rainfall data, into two classes: days with rainfall (y) and days without rainfall (n). The data were preprocessed with Automatic Data Preprocessing (ADP), after which PCA was used to reduce the dimensionality of the variables.

Findings: PCA reduced the variables to 5 dimensions. Approximately 80% of the days in the available data are without rainfall and 20% are with rainfall. The results showed that the KNN model, with an accuracy of 91.9% on the training data, and the SVM model, with 89.13% on the test data, performed best among the data mining models. The AUC index was 0.97 for the KNN model on the training data and 0.94 for the SVM algorithm on the test data. According to the receiver operating characteristic (ROC) curve for the Hamedan rainfall data, the KNN model also performs better than the other models.
Considering the sensitivity index in the confusion matrix, the KNN and SVM models were better at predicting the non-occurrence of rainfall on the training data. Considering the specificity index, the RT and KNN models gave better results in predicting the occurrence of rainfall.

Conclusion: On the training data, the accuracy of the RT, C5, ANN, SVM, BN, KNN, CHAID, and QUEST models was 86.82, 89.78, 89.55, 89.96, 88.06, 91.9, 88.29, and 87.46%, respectively. On the test data, it was 83.2, 87.9, 88.12, 89.13, 87.12, 88.19, 86.93, and 86.76%, respectively. The AUC index on the training data for the RT, C5, ANN, SVM, BN, KNN, CHAID, and QUEST models was 0.94, 0.92, 0.94, 0.94, 0.93, 0.97, 0.93, and 0.89, respectively, and on the test data it was 0.89, 0.89, 0.93, 0.94, 0.92, 0.90, 0.92, and 0.88. Thus, judged by accuracy and the AUC index, the KNN model was the most effective on the training data and the SVM model on the test data for predicting rainfall.
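
The confusion-matrix criteria named in the abstract (accuracy, sensitivity, specificity) can be illustrated with a minimal Python sketch. The counts are hypothetical, chosen only to mimic a roughly 80/20 split between dry and rainy days; they are not the study's results. Treating the no-rain class (n) as the positive class, so that sensitivity describes dry days and specificity describes rainy days, is an assumption based on how the abstract pairs the two indices with the two classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 confusion matrix with the no-rain class (n) as positive (assumption)."""
    tp: int  # dry days predicted dry
    fn: int  # dry days predicted rainy
    fp: int  # rainy days predicted dry
    tn: int  # rainy days predicted rainy

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def accuracy(self) -> float:
        """Share of all days that were classified correctly."""
        return (self.tp + self.tn) / self.total

    def sensitivity(self) -> float:
        """Share of positive-class (dry) days that were recognised."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of negative-class (rainy) days that were recognised."""
        return self.tn / (self.tn + self.fp)

    def majority_baseline(self) -> float:
        """Accuracy of always predicting the more frequent class."""
        positives = self.tp + self.fn
        negatives = self.fp + self.tn
        return max(positives, negatives) / self.total


# Hypothetical counts for 1000 test days (800 dry, 200 rainy).
cm = ConfusionMatrix(tp=760, fn=40, fp=70, tn=130)
print(f"accuracy    = {cm.accuracy():.3f}")
print(f"sensitivity = {cm.sensitivity():.3f}")
print(f"specificity = {cm.specificity():.3f}")
print(f"baseline    = {cm.majority_baseline():.3f}")
```

With about 80% dry days, a rule that always predicts "n" already reaches about 80% accuracy, so sensitivity, specificity, and AUC say more about a model than accuracy alone.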

English abstract:

Background and Aim: Rainfall is a complex natural phenomenon and one of the most crucial components of the water cycle, playing a significant role in assessing the climatic characteristics of each region. Understanding the amount and trends of rainfall changes is essential for effective management and more precise planning in the agricultural, economic, and social sectors, as well as for studies related to runoff, droughts, groundwater status, and floods. Additionally, rainfall prediction in urban areas has a significant impact on traffic control, sewage flow, and construction activities.

Method: The objective of this study is to compare the accuracy of classification models, including Chi-squared Automatic Interaction Detector (CHAID), C5 decision tree, Naive Bayes (NB), QUEST tree, Random Forest, k-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Artificial Neural Network (ANN), in predicting rainfall occurrence using 50 years of data from the synoptic station at Hamedan Airport. In this study, 80% of the data is used for training the models and 20% for model validation, and the results of the model runs are compared using the confusion matrix, the Receiver Operating Characteristic (ROC) curve, and the Area Under the Curve (AUC) index. To create the classification variable, the days of the year are categorized, based on the rainfall data, into two classes: days with rainfall (y) and days without rainfall (n). Data preprocessing is performed using Automatic Data Preprocessing (ADP); Principal Component Analysis (PCA) is then employed to reduce the dimensions of the variables.

Results: In this study, the PCA method reduces the variables to 5 dimensions. Approximately 80% of the available data corresponds to rainless days and 20% to rainy days.
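
The labelling step and the 80/20 split described in the Method can be sketched in Python as follows. The 0 mm threshold, the stratification by class, the fixed random seed, and the synthetic rainfall amounts are all assumptions made for illustration; the abstract does not state how days were assigned to the two sets.

```python
import random
from typing import List, Tuple


def label_day(rain_mm: float, threshold_mm: float = 0.0) -> str:
    """Return 'y' for a day with rainfall above the threshold, else 'n'."""
    return "y" if rain_mm > threshold_mm else "n"


def stratified_split(labels: List[str], train_frac: float = 0.8,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split indices into train/test, keeping each class's share in both sets."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


# Synthetic daily rainfall (mm): 80 dry days and 20 wet days, illustration only.
rain_mm = [0.0] * 80 + [2.5] * 20
labels = [label_day(r) for r in rain_mm]
train_idx, test_idx = stratified_split(labels)

print("train days:", len(train_idx), "test days:", len(test_idx))
print("rainy days in test set:", sum(labels[i] == "y" for i in test_idx))
```

Stratifying keeps the roughly 80/20 class balance in both subsets, which matters when one class is four times as frequent as the other; a plain random or chronological split would be an equally valid reading of the abstract.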
The research results indicate that the KNN model, with an accuracy of 91.9% on the training data, and the SVM model, with 89.13% on the test data, exhibit the best performance among the data mining models. The AUC index is 0.967 for the KNN model on the training data and 0.935 for the SVM algorithm on the test data. According to the ROC curve for the Hamedan rainfall data, the KNN model outperforms the other models. Considering the sensitivity index in the confusion matrix, the KNN and SVM models perform better in predicting non-rainfall occurrence on the training data. According to the specificity index, the RT and KNN models show better results in predicting rainfall occurrence.

Conclusion: For the RT, C5, ANN, SVM, BN, KNN, CHAID, and QUEST models, the accuracy on the training data was 86.82%, 89.78%, 89.55%, 89.96%, 88.06%, 91.9%, 88.29%, and 87.46%, respectively. On the test data, the accuracy of these models was 83.82%, 87.9%, 88.12%, 89.13%, 87.12%, 88.19%, 86.93%, and 86.76%, respectively. The AUC index on the training data for the RT, C5, ANN, SVM, BN, KNN, CHAID, and QUEST models was 0.94, 0.92, 0.94, 0.94, 0.93, 0.97, 0.93, and 0.89, respectively, and on the test data it was 0.89, 0.89, 0.93, 0.94, 0.92, 0.90, 0.92, and 0.88. Judged by accuracy and the AUC index, the KNN model was the most effective on the training data and the SVM model on the test data for rainfall prediction.
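
The AUC index reported in the abstract can be read as the probability that a randomly chosen positive day receives a higher score than a randomly chosen negative day, with ties counted as one half. A minimal Python sketch of that pairwise definition follows; the scores and labels are made up for illustration, and treating "y" as the positive class here is an assumption.

```python
from typing import Sequence


def auc_pairwise(scores: Sequence[float], labels: Sequence[str],
                 positive: str = "y") -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical predicted rain probabilities for eight days.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.4, 0.1]
labels = ["y", "y", "y", "n", "n", "n", "n", "n"]
print(f"AUC = {auc_pairwise(scores, labels):.3f}")
```

The double loop compares every positive day with every negative day, which is fine for a small illustration; for a 50-year daily record a rank-based computation over the sorted scores gives the same value with far fewer comparisons.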
