Ensemble concept drift detection in data stream mining: A Review
Subject area: Journal of Artificial Intelligence in Electrical Engineering
Hadi Tarazodar 1, Karamollah Bagherifard 2, Samad Nejatian 3, Hamid Parvin 4, Seyedeh Razieh Malekhosseini 5
1 - Department of Computer Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
2 - Department of Computer Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
3 - Islamic Azad University, Yasooj Branch
4 - Department of Computer Engineering, College of Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
5 - Department of Computer Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran
Keywords: Data stream mining, concept drift, concept drift detection, ensemble learning
Abstract:
In the data-driven era, machine learning plays a vital role in analyzing and processing big data. One of the fundamental challenges in this area is managing concept drift in data streams, where the changing distribution of the data reduces the accuracy of learning models and makes them ineffective at predicting the future. Traditional classifiers cannot be expected to learn patterns from non-stationary data distributions; for any real-time use, the classifier must detect concept drift and adapt over time. Compared with concept drift detection for a single data stream, the challenges of ensemble concept drift detection arise from three aspects: first, the training data becomes more complex; second, the underlying distribution becomes more complex; and third, the correlation between data streams becomes more complex. In this article, we provide a comprehensive review of ensemble concept drift detectors in data stream mining and survey their techniques, key points, advantages, and limitations.
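The detect-and-adapt loop described above can be made concrete with a small example. The following is a minimal, hypothetical Python sketch of an error-rate drift monitor in the spirit of DDM-style detectors covered by this review; the class name ErrorRateDriftDetector, the 2 and 3 standard-deviation warning and drift thresholds, and the synthetic error stream are illustrative assumptions, not the method of any specific paper surveyed here.

import random


class ErrorRateDriftDetector:
    """Illustrative sketch: tracks the online error rate p and its standard
    deviation s, and compares p + s against the best (lowest) p_min + s_min
    observed so far: warning at +2*s_min, drift at +3*s_min."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0                      # running error rate
        self.s = 0.0                      # standard deviation of the error rate
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """error is 1 for a misclassified instance, 0 otherwise."""
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = (self.p * (1.0 - self.p) / self.n) ** 0.5
        if self.n < self.min_samples:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:   # remember best state
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + 3 * self.s_min:
            self.reset()                                # adapt: start over
            return "drift"
        if self.p + self.s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    random.seed(0)
    detector = ErrorRateDriftDetector()
    # Synthetic error stream: a 10% error rate that jumps to 40% at t = 1000,
    # simulating a concept drift that degrades the classifier.
    for t in range(2000):
        error_prob = 0.1 if t < 1000 else 0.4
        state = detector.update(1 if random.random() < error_prob else 0)
        if state == "drift":
            print(f"drift detected at t = {t}: retrain or replace the learner")

In an ensemble setting, each base learner would typically carry a monitor of this kind, and the drift or warning signals would be used to reset, reweight, or replace individual ensemble members.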