A feature selection method on gene expression microarray data for cancer classification
Subject Areas: Machine learning
Farshad Kiyoumarsi 1, Parham Kiyoumarsi 2, Behzad Zamani 3, محمد کرباسیون 4
1 -
2 - Isfahan University
3 - Islamic Azad University, Shahrekord Branch, Iran
4 - Islamic Azad University, Shahrekord Branch
Keywords: Feature selection, gene expression, microarray, cancer classification
Abstract:
In medical data extraction, the number of genes (the feature dimension) is often much larger than the number of samples. To address this issue, a feature selection algorithm is needed to select gene subsets that correlate strongly with the phenotype, so that subsequent analyses remain accurate. This research presents a new three-stage hybrid gene feature selection method that combines a variance filter, extremely randomized trees, and the whale optimization algorithm. First, a variance filter reduces the dimension of the gene feature space. Next, extremely randomized trees further reduce the gene feature set. Finally, the whale optimization algorithm selects the optimal gene feature subset. We evaluated the proposed method with the K-nearest neighbors (KNN) classifier on four published gene expression profile datasets and compared it with other gene selection algorithms. The results demonstrate that the proposed method has significant advantages over the compared algorithms on various evaluation indicators.
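The three stages described above can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no parameters, so the variance threshold, the number of genes kept after the tree stage, the sigmoid-style binarization of whale positions, the KNN-based fitness with a subset-size penalty, and the synthetic data are all assumptions, and the names `knn_fitness`, `binary_woa`, and `three_stage_select` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def to_mask(position):
    # sigmoid(p) > 0.5 is equivalent to p > 0, so threshold the raw position
    return position > 0.0


def knn_fitness(mask, X, y, alpha=0.99):
    """Assumed fitness (lower is better): KNN error plus a small subset-size penalty."""
    if not mask.any():
        return 1.0  # an empty gene subset cannot classify anything
    knn = KNeighborsClassifier(n_neighbors=5)
    acc = cross_val_score(knn, X[:, mask], y, cv=3).mean()
    return alpha * (1.0 - acc) + (1.0 - alpha) * mask.mean()


def binary_woa(X, y, n_whales=8, n_iter=15, seed=0):
    """Whale optimization on continuous positions that are thresholded into gene masks."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_whales, X.shape[1]))
    best_pos, best_fit = pos[0].copy(), np.inf
    for t in range(n_iter):
        # evaluate every whale and remember the best position seen so far
        for i in range(n_whales):
            fit = knn_fitness(to_mask(pos[i]), X, y)
            if fit < best_fit:
                best_pos, best_fit = pos[i].copy(), fit
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # encircle the best whale when |A| < 1, otherwise explore near a random whale
                target = best_pos if abs(A) < 1 else pos[rng.integers(n_whales)]
                pos[i] = target - A * np.abs(C * target - pos[i])
            else:
                # spiral (bubble-net) move around the best whale
                l = rng.uniform(-1.0, 1.0)
                pos[i] = np.abs(best_pos - pos[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best_pos
        pos = np.clip(pos, -6.0, 6.0)
    return to_mask(best_pos), best_fit


def three_stage_select(X, y, var_threshold=0.5, n_tree_genes=30, seed=0):
    # Stage 1: drop genes whose variance is at or below the (assumed) threshold
    idx = np.flatnonzero(VarianceThreshold(threshold=var_threshold).fit(X).get_support())
    # Stage 2: rank the survivors by extremely randomized trees importance, keep the top ones
    trees = ExtraTreesClassifier(n_estimators=200, random_state=seed).fit(X[:, idx], y)
    idx = idx[np.argsort(trees.feature_importances_)[::-1][:n_tree_genes]]
    # Stage 3: whale optimization searches for a subset of the remaining genes
    mask, fit = binary_woa(X[:, idx], y, seed=seed)
    return idx[mask], fit


# Synthetic stand-in for a microarray: 60 samples, 500 "genes", 10 informative.
X, y = make_classification(n_samples=60, n_features=500, n_informative=10,
                           n_redundant=0, shuffle=False, random_state=0)
X[:, 300:] *= 0.1  # mimic near-constant genes that the variance filter should drop
selected, fitness = three_stage_select(X, y)
print(len(selected), round(fitness, 3))
```

Running the whale search only on the genes that survive the two cheaper stages keeps the number of KNN evaluations small, which is the usual motivation for a filter-then-wrapper design. On real data the cross-validation used inside the fitness should be kept separate from the data used for the final evaluation, to avoid an optimistic accuracy estimate.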