The opinion mining of Digikala reviews by semi-supervised support vector machine
Subject Areas : Multimedia Processing, Communications Systems, Intelligent Systems
Zohre Karimi 1 * , Hadis Haghiri 2
1 - Assistant Professor, Department of Engineering, Damghan University, Damghan, Iran
2 - B.A, Computer Engineering, Damghan University, Damghan, Iran
Keywords: Opinion Mining, Sentiment Analysis, Semi-Supervised Learning, Semi-Supervised Support Vector Machine, Digikala Opinions
Abstract :
Introduction: The widespread use of the internet and social media platforms has produced an explosion of digital data, including users' opinions about services and products. These opinions help businesses and organizations understand the needs and preferences of their customers. Supervised machine learning models are effective at analyzing such opinions, but they need a sufficient amount of labeled training data, and labeling requires considerable time and resources. Semi-supervised learning addresses this problem by using both labeled and unlabeled data to improve model performance.

Method: This paper proposes a semi-supervised approach to analyzing users' opinions written in Persian. In the training phase, the method exploits the abundant unlabeled data in addition to a small number of labeled examples. It builds on the support vector machine (SVM), which related research has shown to be effective for opinion mining. The method first extracts emotional words from comments using sentiment lexicons and then computes term frequency-inverse document frequency (TF-IDF) vectors. A semi-supervised SVM is then applied to these vectors to estimate the polarity of sentiments.

Results: The proposed method was evaluated on the Digikala comments dataset and compared with the supervised SVM algorithm and the semi-supervised self-training method, for different numbers of labeled examples, in terms of accuracy, precision, recall, and F1. The results indicate that the proposed method outperforms both the supervised SVM and self-training. The experiments also investigate the effect of the amount of unlabeled data.

Discussion: One advantage of the proposed method is that it can estimate the polarity of opinions that were not seen during training, which is not possible in some graph-based methods. Furthermore, it is not subject to the training error on labeled data that affects self-training methods. In conclusion, the proposed semi-supervised method provides an efficient solution for analyzing users' opinions in Persian. Businesses and organizations can use it to gain insight into their customers' opinions and improve their products and services accordingly.
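
The feature-extraction steps named in the Method paragraph (lexicon filtering followed by TF-IDF weighting) and the kind of objective a semi-supervised SVM minimizes can be illustrated with a small sketch. This is not the authors' implementation. The English toy lexicon, the whitespace tokenizer, the particular TF-IDF variant, and the names `sentiment_tokens`, `tfidf_vectors`, and `s3vm_objective` are all assumptions made for illustration; the objective shown is the common textbook linear S3VM form (hinge loss on labeled points plus a "hat" loss on unlabeled points), which may differ from the formulation used in the paper.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (assumption); a real system would load a
# Persian sentiment lexicon instead.
LEXICON = {"good", "great", "bad", "poor", "excellent", "slow"}


def sentiment_tokens(text, lexicon=LEXICON):
    """Keep only the whitespace-separated tokens found in the lexicon."""
    return [tok for tok in text.lower().split() if tok in lexicon]


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized docs.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of docs in which each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty docs
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


def s3vm_objective(w, b, labeled, unlabeled, c_lab=1.0, c_unl=0.5):
    """Value of a textbook linear S3VM objective (assumed formulation).

    labeled:   list of (x, y) pairs with y in {-1, +1}
    unlabeled: list of feature vectors x
    """
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    margin_term = 0.5 * sum(wi * wi for wi in w)
    # Hinge loss penalizes labeled points on the wrong side of the margin.
    hinge = sum(max(0.0, 1.0 - y * score(x)) for x, y in labeled)
    # Hat loss penalizes unlabeled points that fall inside the margin.
    hat = sum(max(0.0, 1.0 - abs(score(x))) for x in unlabeled)
    return margin_term + c_lab * hinge + c_unl * hat


reviews = [
    "great phone and excellent camera",
    "bad battery and slow delivery",
    "good price but poor packaging",
]
docs = [sentiment_tokens(r) for r in reviews]
vocab, vectors = tfidf_vectors(docs)
```

In this sketch the unlabeled term is what lets unlabeled reviews influence the decision boundary: it pushes the boundary away from regions where unlabeled vectors are dense. A full method would also need an optimizer for this non-convex objective, which is omitted here.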