Phishing URL detection model based on machine learning
Subject Areas: network security
پریسا دانشجو 1, Saeed Ahmadi 2
1 -
2 - Computer Department, College of Engineering, Islamic Azad University, Tehran West Branch, Tehran, Iran
Keywords: phishing, URL address, deep learning, channel layer, multi-head self-attention
Abstract:
Phishing attacks are a persistent threat to Internet security. One of the most common forms is URL-based phishing, in which attackers disguise fake URLs as legitimate ones to trick users into clicking on them. Machine learning techniques have shown promise for identifying phishing URLs, but their effectiveness varies with the approach used.
Objectives: This research proposes using two machine learning methods, convolutional neural networks (CNN) and multi-head self-attention (MHSA), to identify phishing URLs, and evaluates the effectiveness of this approach against other methods and models.
Research method: A dataset of URLs was collected and labeled as phishing or legitimate. Several models built with different machine learning methods, including CNN and MHSA, were evaluated on classifying these URLs using accuracy, precision, recall, and F1 score.
Results: The combination of CNN and MHSA outperforms the individual models and reaches 98.3% accuracy, a significant improvement in identifying phishing URLs over existing modern methods.
Conclusion: Combining CNN and MHSA is an effective approach to detecting phishing URLs. It outperforms existing modern methods and provides more accurate and reliable detection. The results show the potential of hybrid methods for improving the accuracy and reliability of machine-learning-based phishing URL detection.
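The four evaluation criteria named in the abstract can be illustrated with a minimal sketch. It assumes binary labels in which 1 marks a phishing URL and 0 a legitimate one; the `evaluate` function, the `BinaryMetrics` container, and the toy labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    `positive` is the label of the phishing class (assumed to be 1).
    Ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; 1 = phishing, 0 = legitimate.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(evaluate(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters for this task because phishing datasets are often imbalanced, so a high accuracy alone may hide missed phishing URLs or false alarms on legitimate ones.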