Improve Spam Detection in the Internet Using Feature Selection based on the Metaheuristic Algorithms
Subject Areas : Evolutionary Computing
Abdulbaghi Ghaderzadeh 1 , Sahar Hosseinpanahi 2 , Sarkhel Taher Kareem 3
1 - Department of Computer Engineering, Sanandaj Branch, Islamic Azad University, Sanandaj, Iran
2 - Department of Computer Engineering, Sanandaj Branch, Islamic Azad University, Sanandaj, Iran
3 - Computer Department, College of Science, University of Halabja, Halabja, Iraq.
Sulaimani Polytechnic University, Technical College of Informatics, Computer Networks Department, Sulaimani, Iraq.
Keywords: Spam detection, metaheuristic algorithms, Emperor Penguin Optimizer (EPO), Feature Selection
Abstract :
Spam remains a major challenge for email. Spam is a type of email sent over the network for malicious purposes; it plays an important role in information theft and can include fake links that trick users. Machine learning and data mining techniques, such as artificial neural networks, are among the most widely applied methods for detecting spam. A multilayer artificial neural network needs the most important features selected as its inputs in order to reduce output error and detect spam accurately. The proposed method performs feature selection with a swarm intelligence algorithm: a binary version of the Emperor Penguin Optimizer (EPO) selects the most appropriate features, and the selected features are then used for learning and classification in the spam detection process. Experiments in the MATLAB environment on the Spambase dataset show that the spam detection error decreases by about 14.61% as the population increases and, in the best case, by about 43.85% as the feature space increases. The experiments also show that the proposed method has a lower spam detection error than the other methods compared: a multilayer artificial neural network, a recursive neural network, a support vector machine, a Bayesian network, and the whale optimization algorithm. In particular, its spam detection error is about 23.57% lower than that of the whale optimization algorithm. Empirical results obtained through simulations on the Spambase dataset show that the approach outperforms the other existing methods in terms of precision.
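The general idea of binary, swarm-based feature selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and it is not the EPO update rule: the position update below is a generic "move toward the best solution plus noise" placeholder, the S-shaped (sigmoid) transfer function is one common way to binarize continuous positions, and a nearest-centroid classifier stands in for the multilayer neural network. All function names (`to_mask`, `centroid_error`, `select_features`) and the toy data are hypothetical.

```python
import math
import random


def sigmoid(x):
    """S-shaped transfer function mapping a real value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_mask(position, rng):
    """Turn a continuous position into a 0/1 feature mask (1 = feature kept)."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in position]


def centroid_error(X, y, mask):
    """Stand-in fitness: training error of a nearest-centroid classifier
    restricted to the selected features (assumption: the paper uses a
    multilayer neural network here instead)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty feature set is treated as the worst case
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def sq_dist(x, c):
        return sum((x[j] - c[k]) ** 2 for k, j in enumerate(cols))

    wrong = sum(
        1 for x, t in zip(X, y)
        if min(centroids, key=lambda lab: sq_dist(x, centroids[lab])) != t
    )
    return wrong / len(X)


def select_features(X, y, pop_size=10, iterations=30, seed=0):
    """Generic binary swarm search; NOT the Emperor Penguin Optimizer update."""
    rng = random.Random(seed)
    n = len(X[0])
    swarm = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(pop_size)]
    best_pos, best_mask, best_err = None, None, float("inf")
    for _ in range(iterations):
        for pos in swarm:
            mask = to_mask(pos, rng)
            err = centroid_error(X, y, mask)
            if err < best_err:
                best_pos, best_mask, best_err = list(pos), mask, err
        # Placeholder update: drift toward the best position with Gaussian
        # noise, clamped to [-6, 6] to keep the sigmoid well behaved.
        swarm = [
            [max(-6.0, min(6.0, p + rng.random() * (b - p) + rng.gauss(0.0, 0.3)))
             for p, b in zip(pos, best_pos)]
            for pos in swarm
        ]
    return best_mask, best_err


# Toy data (hypothetical): feature 0 separates the classes, the rest is noise.
X = [[0.0, 5.0, 1.0], [0.1, -3.0, 2.0], [1.0, 4.0, 1.5], [0.9, -4.0, 2.2]]
y = [0, 0, 1, 1]
mask, err = select_features(X, y)
print("selected mask:", mask, "error:", err)
```

In a wrapper scheme of this kind the classifier's error on the selected features serves as the fitness of each candidate mask, which is why a larger population or more iterations can lower the final error at the cost of more classifier evaluations.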