A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Subject Areas : Data Mining
Samira Amjad 1, Farhad Soleimanian Gharehchopogh 2
1 - Department of Computer Engineering, Maragheh Branch, Islamic Azad University, Maragheh, Iran.
2 - Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran.
Keywords: Feature Selection, Email Spam Detection, Scatter Searching Algorithm, K-Nearest Neighbors
Abstract :
Because cyberspace and the Internet pervade users' lives, they bring not only business opportunities and time savings but also threats to hardware and software, such as information theft and system intrusion. Security is the top priority in preventing cyber-attacks, and because virtual environments are not monitored, users must first be able to detect the type of attack. Today, email is the starting point of many Internet attacks. Hackers and intruders use spam email as a way to penetrate computer systems. Junk email can contain viruses, malware, and malicious code. Therefore, security tools should detect the type of each email so that users can avoid opening suspicious messages. In this paper, a new model for email spam detection is proposed, based on a hybrid of the Scatter Searching Algorithm (SSA) and K-Nearest Neighbors (KNN). Results on the Spambase dataset show that the proposed model is more accurate with Feature Selection (FS); in the best case, its accuracy is 94.54% with 500 iterations and 57 features. The comparison also shows that the proposed model is more accurate than the evolutionary algorithms and data mining methods it was compared with, such as C4.5.
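The general idea of pairing a scatter-search-style feature search with a KNN classifier can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary feature mask as the solution encoding, a quality-only reference set (full scatter search also maintains diverse solutions), uniform combination, a one-bit-flip improvement step, and a small synthetic dataset in place of Spambase. All function names and parameter values are hypothetical.

```python
import random

def make_toy_data(n, seed):
    """Synthetic stand-in for a spam dataset: 2 informative + 3 noisy features."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [label + rng.gauss(0, 0.3), label + rng.gauss(0, 0.3)]
        noise = [rng.gauss(0, 3.0) for _ in range(3)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def knn_predict(train_X, train_y, x, mask, k=3):
    """Majority vote of the k nearest rows, using only features where mask is 1."""
    dists = []
    for row, label in zip(train_X, train_y):
        d = sum((a - b) ** 2 for a, b, m in zip(row, x, mask) if m)
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])
    votes = sum(label for _, label in dists[:k])
    return 1 if votes * 2 > k else 0

def accuracy(mask, train_X, train_y, val_X, val_y, k=3):
    """Fitness of a feature mask: KNN accuracy on held-out rows."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    hits = sum(knn_predict(train_X, train_y, x, mask, k) == y
               for x, y in zip(val_X, val_y))
    return hits / len(val_y)

def combine(a, b, rng):
    """Uniform combination: each bit is taken from one of the two parents."""
    return [rng.choice(pair) for pair in zip(a, b)]

def improve(mask, fitness, rng):
    """Flip one random bit and keep the change only if fitness does not drop."""
    i = rng.randrange(len(mask))
    cand = mask[:]
    cand[i] ^= 1
    return cand if fitness(cand) >= fitness(mask) else mask

def scatter_search_fs(fitness, n_features, ref_size=4, iterations=20, seed=0):
    """Simplified scatter search over binary feature masks."""
    rng = random.Random(seed)
    # Initial pool: the all-features mask (baseline) plus random masks.
    pool = [[1] * n_features] + [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(3 * ref_size - 1)
    ]
    ref = sorted(pool, key=fitness, reverse=True)[:ref_size]
    for _ in range(iterations):
        children = []
        for i in range(len(ref)):
            for j in range(i + 1, len(ref)):
                child = combine(ref[i], ref[j], rng)
                children.append(improve(child, fitness, rng))
        # Reference set update: keep the best masks seen so far.
        ref = sorted(ref + children, key=fitness, reverse=True)[:ref_size]
    return ref[0], fitness(ref[0])

train_X, train_y = make_toy_data(60, seed=1)
val_X, val_y = make_toy_data(40, seed=2)
fitness = lambda mask: accuracy(mask, train_X, train_y, val_X, val_y)
best_mask, best_acc = scatter_search_fs(fitness, n_features=5)
baseline_acc = fitness([1] * 5)
print(best_mask, best_acc, baseline_acc)
```

Because the all-features mask is placed in the initial pool and the reference set only ever keeps the best masks found, the selected subset can never score below the no-selection baseline on the held-out rows. A real study would evaluate the final mask on a separate test split, since here the same rows guide the search and report the score.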