The Application of the CatBoost Algorithm for Malware Category Detection
Subject Areas : Multimedia Processing, Communications Systems, Intelligent Systems
1 - Assistant Professor, Department of Computer Engineering, Larestan Branch, Islamic Azad University, Larestan, Iran
Keywords: Android Operating System, Malware Category, Machine Learning, CatBoost
Abstract :
Introduction: Android is the most widely used mobile operating system. Its popularity, open-source nature, and extensive app ecosystem have made it a target for malware. While many studies focus on detecting Android malware, fewer have explored classifying malware by attack type. This classification is essential for understanding malicious patterns. This paper proposes using the CatBoost algorithm to detect categories of Android malware, employing a hybrid analysis that combines static and dynamic features. The Kronodroid dataset, which contains 14 distinct malware category labels, serves as the benchmark for evaluation.
Method: A novel feature selection method called VFCE is introduced, in which four feature selection techniques (variance, F-test, Chi-Square, and Extra Tree) are applied sequentially to the dataset. Selecting 50, 100, and 200 features with each of the four individual techniques and with VFCE yielded a total of 15 distinct feature sets. The dataset was divided into training (70%) and testing (30%) subsets. I evaluated the performance of eight classification algorithms across these feature sets: CatBoost, Random Forest, Decision Tree, Support Vector Machine, Logistic Regression, Multi-Layer Perceptron, Bagging, and K-Nearest Neighbors.
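The sequential selection and the 70/30 split described above can be illustrated with a minimal scikit-learn sketch. It is not the paper's implementation: the abstract does not state how many features each stage keeps, so the intermediate sizes (`n_after_f`, `n_after_chi2`), the min-max scaling step, the stratified split, and the synthetic data are all assumptions made for illustration, and `build_sequential_selector` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import (
    SelectFromModel,
    SelectKBest,
    VarianceThreshold,
    chi2,
    f_classif,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def build_sequential_selector(n_after_f, n_after_chi2, n_final, seed=0):
    """Chain variance, F-test, chi-square, and extra-trees selection.

    The per-stage sizes are illustrative assumptions, not values from the paper.
    """
    return Pipeline([
        # chi2 needs non-negative inputs, so features are scaled to [0, 1] first.
        ("scale", MinMaxScaler()),
        # Stage 1: drop features that are constant in the training data.
        ("variance", VarianceThreshold(threshold=0.0)),
        # Stage 2: keep the features with the highest ANOVA F-scores.
        ("f_test", SelectKBest(f_classif, k=n_after_f)),
        # Stage 3: keep the features with the highest chi-square scores.
        ("chi2", SelectKBest(chi2, k=n_after_chi2)),
        # Stage 4: keep the top features by extra-trees importance.
        # threshold=-inf makes max_features the only selection criterion.
        ("extra_trees", SelectFromModel(
            ExtraTreesClassifier(n_estimators=50, random_state=seed),
            max_features=n_final,
            threshold=-np.inf,
        )),
    ])


# Synthetic stand-in for the real dataset: 40 features plus one constant column.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=10, n_classes=3, random_state=0
)
X = np.hstack([X, np.ones((X.shape[0], 1))])

# 70% training / 30% testing, as in the abstract (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

selector = build_sequential_selector(n_after_f=30, n_after_chi2=20, n_final=10)
X_train_sel = selector.fit_transform(X_train, y_train)  # fit on training data only
X_test_sel = selector.transform(X_test)
print(X_train_sel.shape, X_test_sel.shape)
```

In this sketch the selector is fitted on the training subset only, a common precaution against leaking test information into feature selection; the abstract does not say whether the paper selects features before or after splitting. Any classifier with a `fit`/`predict` interface, such as `CatBoostClassifier` from the `catboost` package, could then be trained on `X_train_sel`.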
Results: The VFCE feature selection method produced better results when selecting 100 features, especially in combination with the CatBoost algorithm. I compared the performance of the algorithms, including CatBoost, using the proposed feature selection method. The algorithms were assessed on several metrics: accuracy, precision, recall, F-measure, false positive rate (FPR), root mean squared error (RMSE), training time, and testing time. CatBoost outperformed the other algorithms and previous studies on this dataset, achieving an accuracy of 93.28%, precision of 93.32%, recall of 93.28%, F-measure of 93.19%, and a rapid testing time of 0.07 seconds.
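For reference, the classification metrics listed above can be computed for a multi-class problem roughly as follows. This is a sketch under assumptions: the abstract does not state the averaging scheme, so support-weighted averaging of the per-class precision, recall, F-measure, and FPR is assumed here, and RMSE is assumed to be taken over integer-encoded class labels. `multiclass_report` is a hypothetical helper, and the labels are toy data rather than results from the paper.

```python
import math
from collections import Counter


def multiclass_report(y_true, y_pred):
    """Accuracy, support-weighted precision/recall/F-measure/FPR, and RMSE."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    support = Counter(y_true)  # number of true samples per class
    report = {"precision": 0.0, "recall": 0.0, "f_measure": 0.0, "fpr": 0.0}

    for c in sorted(set(y_true) | set(y_pred)):
        # One-vs-rest confusion counts for class c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = n - tp - fp - fn

        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (
            2 * precision * recall / (precision + recall)
            if precision + recall else 0.0
        )
        fpr = fp / (fp + tn) if fp + tn else 0.0

        weight = support[c] / n  # weight each class by its share of samples
        report["precision"] += weight * precision
        report["recall"] += weight * recall
        report["f_measure"] += weight * f_measure
        report["fpr"] += weight * fpr

    report["accuracy"] = sum(t == p for t, p in pairs) / n
    # RMSE over integer class labels (an assumption about how the paper uses it).
    report["rmse"] = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n)
    return report


# Toy labels for three classes; not data from the paper.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]
report = multiclass_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

Under support-weighted averaging, recall is algebraically equal to accuracy, which would be consistent with the identical 93.28% figures reported for both, although the abstract does not confirm the averaging used.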
Discussion: This study examines the use of the CatBoost algorithm for Android malware category detection. The proposed VFCE feature selection method enhances both accuracy and speed. The CatBoost algorithm, in conjunction with the proposed feature selection method, improves accuracy, precision, recall, F-measure, RMSE, and testing time.