An Efficient Approach for Multi-Label Streaming Feature Selection
Subject Areas : Machine learning
Azar Rafie 1, Parham Moradi 2 *
1 - Department of Computer Engineering, Shahrekord Branch, Islamic Azad University, Shahrekord, Iran
2 - School of Engineering, RMIT University, Melbourne, Australia
Keywords: Streaming multi-label data, feature selection, mutual information, redundancy, relevancy
Abstract :
With the rapid growth of multi-label streaming data, efficient feature selection has become a critical challenge. Traditional methods often struggle to handle the dynamic nature of continuously arriving data. This paper introduces OSM-MI, a novel online feature selection method designed for multi-label streaming datasets. OSM-MI uses mutual information to select features dynamically, minimizing redundancy and maximizing relevance. The method is compared with existing algorithms, including OM-NRS, OMGFS, and MUCO, on several datasets, among them Yeast, Medical, Scene, and Enron. Experimental results show that OSM-MI outperforms the other methods in accuracy, precision, and efficiency, while also maintaining lower execution times. The Wilcoxon test confirms that the differences are statistically significant, demonstrating OSM-MI's robustness for real-time multi-label classification. This work provides an efficient, scalable solution for feature selection in streaming environments.
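The general idea of selecting streaming features by weighing relevance against redundancy with mutual information can be sketched as follows. This is a minimal illustration under stated assumptions, not the OSM-MI algorithm itself: the function names `mutual_information` and `stream_select`, the averaging of scores, and the `threshold` acceptance rule are all hypothetical choices made for the example, and features and labels are assumed to be discrete.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def stream_select(feature_stream, labels, threshold=0.0):
    """Hypothetical acceptance rule (an assumption, not the paper's criterion).

    A newly arrived feature is kept when its mean MI with the labels
    (relevance) exceeds its mean MI with the already selected features
    (redundancy) by more than `threshold`.
    """
    selected = []  # list of (name, values) pairs kept so far
    for name, values in feature_stream:
        relevance = sum(mutual_information(values, lab) for lab in labels) / len(labels)
        if selected:
            redundancy = sum(mutual_information(values, s) for _, s in selected) / len(selected)
        else:
            redundancy = 0.0
        if relevance - redundancy > threshold:
            selected.append((name, values))
    return [name for name, _ in selected]


# Toy data: two label columns and four features arriving one at a time.
labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
stream = [
    ("f1", [0, 0, 1, 1]),  # informative about the first label
    ("f2", [0, 0, 1, 1]),  # duplicate of f1, so fully redundant
    ("f3", [0, 1, 0, 1]),  # informative about the second label
    ("f4", [0, 0, 0, 0]),  # constant, carries no information
]
print(stream_select(stream, labels))  # ['f1', 'f3']
```

In this toy run the duplicate feature is rejected because its redundancy with `f1` outweighs its relevance, and the constant feature is rejected because its relevance is zero. A real streaming method would also need to handle continuous features and to revisit earlier selections as new features arrive.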