A Novel and Intelligent Ensemble Framework for Real-Time Detection of and Adaptation to Concept Drift in Data Streams Using Incremental Decision Trees
Subject Areas: Computer Engineering and IT
هادی ترازودار 1, کرم الله باقری فرد 2, صمد نجاتیان 3, حمید پروین 4, سیه راضیه ملک حسینی 5
Keywords: Concept Drift, Data Stream, Incremental Decision Tree, Hoeffding, Ensemble Learning
Abstract:
Learning from real-time data has attracted growing attention over the past decade. In online learning, changes in the data distribution, known as concept drift, reduce the accuracy of learning models and can render them ineffective for future predictions. This research designs and develops a novel ensemble incremental decision tree algorithm that detects concept drift and automatically adapts to changes in the data distribution. To this end, a new ensemble incremental decision tree architecture is presented that uses an adaptive probabilistic sampling strategy to continuously monitor patterns of change in the data and performs structural updates to the decision tree automatically and in real time. Unlike traditional methods, which react to changes only after they occur, this approach uses an active monitoring mechanism that tracks changes in the model's error function, enabling early detection of concept drift. As a result, the proposed model maintains high accuracy even in streaming scenarios with irregular changes. Extensive experiments were conducted on the dataset, and the results show that the proposed method outperforms existing methods on several evaluation criteria, including accuracy, recall, and precision.
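The abstract describes early drift detection by tracking the model's error function but does not give that mechanism's exact form. Purely as an illustration, the sketch below pairs two standard building blocks it builds on: the Hoeffding bound, which Hoeffding-tree learners use to decide whether the observed gap between the two best split attributes is statistically reliable, and an error-rate monitor in the style of the Drift Detection Method (DDM) of Gama et al. (2004). This is not the authors' detector. The names `hoeffding_bound` and `DDMStyleDetector`, the 0/1 error encoding, and the defaults (30 warm-up examples, warning at 2 and drift at 3 standard deviations, as in DDM) are assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability 1 - delta, the true mean of a variable whose range is
    `value_range` lies within epsilon of its observed mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class DDMStyleDetector:
    """Hypothetical DDM-style drift detector on a stream of 0/1 errors.

    Illustration only; not the detector proposed in the paper.
    """

    def __init__(self, min_instances: int = 30,
                 warning_level: float = 2.0, drift_level: float = 3.0) -> None:
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        # Forget all statistics, e.g. after a drift has been signalled.
        self.n = 0
        self.p = 0.0          # running error rate
        self.s = 0.0          # binomial standard deviation of p
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: int) -> str:
        """Feed 1 for a misclassified example, 0 for a correct one.

        Returns "stable", "warning" or "drift".
        """
        self.n += 1
        self.p += (error - self.p) / self.n           # incremental mean
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"                           # too few examples yet
        # Remember the point where p + s was lowest (best model state so far).
        if self.p + self.s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if self.p + self.s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"
```

In an ensemble, each member could feed its own per-example error indicator to `update`; one common (here assumed) policy is to start growing a replacement tree on `"warning"` and swap it in on `"drift"`. Note that, as in DDM, a model with a perfectly zero error rate will flag drift on its first subsequent mistake, because its minimum standard deviation is zero.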