MSDSA: Imbalanced Data Sentiment Analysis using Manifold Smoothness Satisfied Data
Subject Areas : Journal of Computer & Robotics
Shima Rashidi 1, Jarar Tanha 2 *, Arash Sharifi 3, Mehdi HoseinZadeh 4
1 - Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran.
2 - Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran.
3 - Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran.
4 - Pattern Recognition and Machine Learning Lab, Gachon University, Seongnam, Republic of Korea.
Keywords: Twitter Sentiment Analysis, Manifold Smoothness, SMOTE, XGBoost, BERT
Abstract :
This paper proposes a new approach to sentiment analysis on imbalanced data. The main goal of sentiment analysis is to identify the attitudes and preferences that users express in their reviews, and the area has received growing attention in recent years. The proposed method has three steps. First, we learn a discriminative representation of tweets by fine-tuning the BERT model in a supervised manner with a proposed loss function based on manifold smoothness. The goal is a representation in which the local neighbors of each sample share its class label. Second, we over-sample the minority class in the new representation. To do this, we modify the SMOTE algorithm so that only synthetic samples that satisfy the manifold smoothness condition are added to the generated sample set. Third, we train an XGBoost classifier on the combined original and over-sampled data as the final predictor. We evaluate the proposed model on the SemEval-2017 Task 4 dataset. Extensive experiments show the effectiveness of the proposed method.
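The second step can be illustrated with a minimal sketch. The abstract does not give the exact smoothness criterion, so the code below assumes one: a synthetic sample is kept only if at least a fraction `tau` of its `k` nearest original samples carry the minority label. The function name `smooth_smote` and the parameters `k`, `tau`, and `max_tries` are hypothetical and chosen for illustration; the input `X` stands in for the fine-tuned BERT embeddings.

```python
import numpy as np

def smooth_smote(X, y, minority_label, n_new, k=5, tau=0.6, seed=0, max_tries=10000):
    """SMOTE-style over-sampling with an assumed neighborhood-purity filter."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")

    # k nearest minority neighbors of every minority sample (self excluded).
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k_min = min(k, len(X_min) - 1)
    nbrs = np.argsort(d_min, axis=1)[:, :k_min]

    accepted, tries = [], 0
    while len(accepted) < n_new and tries < max_tries:
        tries += 1
        # Standard SMOTE step: interpolate between a sample and a neighbor.
        i = rng.integers(len(X_min))
        j = nbrs[i, rng.integers(k_min)]
        cand = X_min[i] + rng.random() * (X_min[j] - X_min[i])

        # Assumed smoothness check: the candidate's k nearest ORIGINAL
        # samples must be mostly minority-class.
        nearest = np.argsort(np.linalg.norm(X - cand, axis=1))[:k]
        if np.mean(y[nearest] == minority_label) >= tau:
            accepted.append(cand)

    # May return fewer than n_new rows if max_tries is exhausted.
    return np.array(accepted).reshape(-1, X.shape[1])
```

In the third step, the returned samples would be stacked with the original data, with the minority label assigned to every new row, before training the final classifier. Setting `tau=0.0` disables the filter and reduces the sketch to plain SMOTE interpolation.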