Adversarial Attacks on a Text Sentiment Analysis Model
Subject Areas: Multimedia Processing, Communications Systems, Intelligent Systems
Sahar Mokarrami Sefidab 1, Seyed Abolghasem Mirroshandel 2, Hamidreza Ahmadifar 3, Mahdi Mokarrami 4
1 - Faculty of Engineering, University of Guilan, Rasht, Iran
2 - Assistant Professor, Faculty of Engineering, University of Guilan, Rasht, Iran
3 - Assistant Professor, Faculty of Engineering, University of Guilan, Rasht, Iran
4 - Payam-e Noor University of Guilan, Rasht, Iran
Keywords: Adversarial Examples, Sentiment Analysis, Loss Function Gradient, Natural Language Processing, Text Attacks
Abstract:
Background and Purpose: Recent research has shown that deep learning models, despite their high accuracy, can be made vulnerable through small manipulations of their input samples. Such manipulation produces new samples, called adversarial examples, that are so similar to the originals that humans cannot distinguish them from the original data, and therefore cannot remove them from the dataset before the model makes its predictions in order to prevent model errors. Various lines of research have addressed generating malicious samples and injecting them into models; among these, generating text samples is particularly difficult because of the discrete nature of text. In this research, we aimed to reach the highest level of vulnerability with a method that requires the least manipulation of the input data, and in our experiments the proposed method reduced the accuracy of CNN and LSTM models to below 10%.

Methods: To craft malicious samples, we first use a Taylor expansion of the loss to select, from the vocabulary, a candidate replacement word that increases the error of the classification prediction. Then, considering the importance of each word together with the estimated cost of its candidate replacement, we derive an order in which words should be substituted. Finally, we replace words in that order until the output of the model changes (see the sketch below).

Results: Evaluation of the presented method on two sentiment analysis models, an LSTM and a CNN, shows that it is highly effective, reducing the accuracy of both models to below 10% with only a small number of replacements; this indicates the success of the proposed method compared with some other similar methods.

Conclusion: Since much of the attention of science and industry is focused on building systems with deep learning methods, the security of these systems is also important, and so is increasing the robustness of models against adversarial examples. In this research, we presented a method that produces textual adversarial samples with the least amount of manipulation. In the future, different methods of generating natural text could be used to produce samples that, in addition to being visually similar to the original sample, are also coherent in terms of content.
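The Methods description amounts to a first-order (Taylor-expansion) estimate of how much the classification loss would grow if each word were swapped for another vocabulary word, followed by greedy substitution in order of that estimated impact until the predicted label flips. The sketch below illustrates this idea in PyTorch; it is a minimal illustration under stated assumptions, not the paper's actual implementation. In particular, forward_from_embeddings, the embedding_matrix handle, and the id-based model(...) call are assumed interfaces.

```python
# Minimal sketch of a gradient-based (first-order Taylor) word-substitution attack.
# Assumes a PyTorch sentiment classifier; the interfaces used here are illustrative.
import torch
import torch.nn.functional as F

def craft_adversarial(model, token_ids, label, embedding_matrix, max_swaps=5):
    """Greedily replace words, most influential first, until the prediction flips.

    token_ids: LongTensor (seq_len,); label: 0-dim LongTensor (true class);
    embedding_matrix: FloatTensor (vocab_size, emb_dim).
    """
    token_ids = token_ids.clone()
    for _ in range(max_swaps):
        # Forward pass with gradients taken w.r.t. the input embeddings.
        model.zero_grad()
        embeds = embedding_matrix[token_ids].detach().requires_grad_(True)
        logits = model.forward_from_embeddings(embeds.unsqueeze(0))  # assumed hook
        loss = F.cross_entropy(logits, label.unsqueeze(0))
        loss.backward()
        grad = embeds.grad                                   # (seq_len, emb_dim)

        with torch.no_grad():
            # First-order estimate of the loss change for swapping position i
            # to vocabulary word w:  delta[w, i] ~= (E[w] - E[token_i]) . grad_i
            delta = embedding_matrix @ grad.T - (embeds * grad).sum(dim=1)

            # "Importance" of each position = best achievable loss increase there;
            # substitute at the most important position first.
            best_gain, best_word = delta.max(dim=0)          # per position
            pos = int(best_gain.argmax())
            token_ids[pos] = int(best_word[pos])

            # Stop as soon as the model's output label changes.
            new_pred = model(token_ids.unsqueeze(0)).argmax(dim=1)  # assumed id input
            if new_pred.item() != label.item():
                break
    return token_ids
```

A full implementation would additionally avoid re-editing positions it has already changed and could restrict candidate replacements (e.g., to semantically close words) to keep the adversarial sample similar to the original; the sketch omits these details.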