Analysis of Users’ Opinions about Reasons for Divorce
Subject Areas : Software Engineering and Information Systems
Fatemeh Eghrari Solout 1 * , Mehdi Hosseinzadeh 2
1 - Department of Computer Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran.
2 - Department of Computer Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran
Keywords: Divorce, social networks, users' comments, comment mining, content analysis
Abstract :
Comment mining is one of the most important topics in knowledge discovery. Opinion mining is a tool for evaluating the opinions of people who comment on a specific issue in order to obtain useful results. It is a subset of data mining and can be improved using data mining algorithms. One important part of opinion mining is sentiment analysis in social networks, which today contain billions of users' comments about different issues. Previous research in this area has used various methods to analyze Persian comments. In these studies, preprocessing, which arranges the data set in a standard form for analysis, is one of the most important parts, and the number of hashtags selected for analysis is limited. To detect positive and negative comments, knowledge extraction or neural network techniques have been used. The current research presents a method that can analyze any hashtag for each group of users, with no limitation in this regard. The type of hashtag, the number of likes, the type of user and the type of positive and negative sentences can be analyzed by this method. The results of simulation and comparison on a divorce data set show that the proposed method has acceptable performance.
1. Vijay.B.Raut et al, 2014. “Survey on Opinion Mining and Summarization of User Reviews on Web”, International Journal of Computer Science and Information Technologies (IJCSIT), Vol 5(2). 1026-1030.
2. G.Angulakshmi, 2014. An Analysis on Opinion Mining: Techniques and Tools, International Journal of Advanced Research in Computer and Communication Engineering Vol. 3, Issue 7,
3. Raisa Varghese, Jayasree, 2013. “A Survey on Sentiment Analysis and Opinion Mining”, International Journal of Research in Engineering and Technology (IJRET), Vol 2 Issue 11 Nov
4. Martin Mikula and Kristína Machová, 2015. “Classification of opinions in conversational content”, IEEE 13th International Symposium on Applied Machine Intelligence and Informatics, January 22-24,
5. Sagar Bhuta and Uchit Doshi, 2014. "A review of techniques for sentiment analysis of twitter data", Issues and Challenges in Intelligent Computing Techniques (ICICT), International Conference on, pp. 583-591
6. Xiuzhen Zhang, Zhixin Zhou, Mingfang Wu, 2014. “Positive, Negative, or Mixed? Mining Blogs for Opinions”, Melbourne Australia.
7. Jianxin Li, 2014. Opinion Mining and Sentiment Analysis in Social Networks: A Retweeting Structure-Aware Approach, Utility and Cloud Computing (UCC), IEEE/ACM 7th International Conference on
8. Peiman Barnaghi, 2016. Opinion Mining and Sentiment Polarity on Twitter and Correlation between Events and Sentiment, Big Data Computing Service and Applications (Big Data Service), IEEE Second International Conference on
9. R. Feldman, 2013. "Techniques and applications for sentiment analysis", Communications of the ACM, vol. 56, pp. 82-89.
10. R. Piryani, (2015) Analytical mapping of opinion mining and sentiment analysis research during 20, Information Processing and Management, Elsevier Ltd. All rights reserved
11. Wang, H., & Wang, W. (2014). Product weakness finder: An opinion-aware system through sentiment analysis. Industrial Management & Data Systems, 114 (8), 1301–1320
12. Weichselbraun, A., Gindl, S., & Scharl, A. (2014). Enriching semantic knowledge bases for opinion mining in big data applications. Knowledge-Based Systems, 69 , 78–85
13. P. D. Turney, (2002) "Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews," in Proceedings of the 40th annual meeting on association for computational linguistics, , pp. 417- 424.
14. Khatibi, T., Sepehri, V. & Hamidpour, B. (2008). Efficacy of chi square method to select the parameters in the Persian text comment mining, Presented at the National Conference on Electrical Engineering, Computer and Information Technology, Hamedan, Sama educational and cultural center.
15. www.rcisp.com
16. Peikari, N. & Yaghoubi, S. (2015). The sentiment analysis in Twitter social network using the text mining techniques, International Conference on Web Researching.
17. Banitalebi, A. (2016). Opinion mining of social networks with the use of machine learning algorithms, International Conference on Computer Engineering and Information Technology.
The Analysis of Users’ Opinions about Reasons for Divorce
Abstract
With the dramatic expansion of the internet and social networks and their increasing use, a massive volume of users' emotional comments on different subject matters has become available. Studying and analyzing such a volume of comments poses many problems, and the use of new scientific techniques is unavoidable. Comments spread over social networks can be considered an important source of information in the decision-making process. Opinion mining, a relatively new area of natural language processing, attempts to extract knowledge from the text of comments. The aim is to study a massive volume of emotional comments regarding an entity (a phenomenon, a product, etc.) by means of a model, and to present the user with a brief summary of the expressed sentiment. To achieve this goal, statistical techniques, data mining and natural language processing are used. In the current study, users' comments regarding the reasons for divorce were investigated to identify what users consider the reasons for divorce to be.
Key words: Social network, Content analysis, Comment mining, Divorce, Users’ comments
1. Introduction
Social media has created many opportunities for people to express their opinions, but serious problems arise when they come to express their beliefs. Opinion mining, also called belief mining, is a kind of natural language processing that can track people’s mood regarding a particular product through their reviews. In the current study, a method is presented that uses users’ comments about divorce to identify the reasons for divorce in users’ opinion and to determine how strongly each factor affects divorce.
Belief mining is an area of natural language processing and text analysis whose aim is to discover and extract subjective information from textual sources. Its tasks fall into three classes. The first is generally referred to as sentiment analysis, and its aim is to determine the polarity of a given source text (e.g. distinguishing negative, neutral and positive beliefs). The second consists of identifying the degree of objectivity or subjectivity of a text (i.e. factual data vs. opinions); this task is sometimes called “belief extraction”. The aim of the third is to discover or summarize explicit opinions about selected features of the evaluated products; some authors also refer to this task as sentiment analysis. All three classes of belief mining can benefit greatly from the additional data that social networks provide.
The current research presents a method for investigating users’ opinions on divorce. The proposed method consists of several stages and uses tools such as hash tags, labeling and likes. In this method, users are divided into different categories, and the reasons for divorce are determined based on users’ opinions. The remainder of the paper is organized as follows: main concepts of opinion mining, previous works, the proposed method, data analysis and conclusion.
2. Opinion mining
Opinion mining, or sentiment analysis, is a process that receives, extracts and categorizes comments, emotions and feelings regarding a subject matter; the subject matter can be anything, such as a product or a person [1].
Opinions on a subject matter can be expressed directly or comparatively. A direct opinion states a belief about the subject matter, while a comparative opinion compares things. In addition, expressions of feeling can be categorized at the sentence level, by positive and negative features, and so on [2].
Feelings result from a person's relationship with the environment and comprise complex, coordinated sets of response components. These responses may include physical, emotional and/or practical expressions. In interacting with the environment, a person faces a series of problems in life. Sentiments have different intensities, and as personal experiences of the environment are formed, they can be positive or negative [3].
2-1 Main phases of social networks opinion mining
Opinion mining is a process that involves many technological areas, but two main phases can generally be distinguished. The first phase is the pre-processing of documents. The output of this phase can take two different forms [4]:
· Based on the document
· Based on the concept
In the document-based representation, what matters is finding a better way of representing documents, for instance converting them into an intermediate, semi-structured format, indexing them, or using any other representation that makes working with documents more efficient. Every entity in this representation is ultimately still a document. In the concept-based representation, the representation is enriched: the concepts and meanings present in the document, the relationships among them, and any other extractable conceptual information are extracted from the opinions [5].
In this type of representation we no longer deal with documents as entities, but with the concepts extracted from them. The second phase is to extract knowledge from these intermediate representations. The knowledge extraction method differs according to the representation used. The document-based representation is used for grouping, classifying, visualizing and so on [6].
2-2 Ranking opinions
Opinions should be assigned to classes such as sport, art, emotion and politics, and when a new opinion is recorded, its class should be determined. Opinions are ranked (classified) using the available ranking methods. After ranking, the efficiency of the ranker should be tested: each recorded opinion is labeled, and these labels are compared with the real labels of the given class. The ratio of correctly ranked documents to the total number of documents is called accuracy, and this measure is used to compare rankers [7]. Two further measures are:
· Precision: the fraction of retrieved documents that are relevant to the query.
· Recall: the fraction of relevant documents that are retrieved.
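The three measures above can be illustrated with a minimal sketch. The function name `evaluate`, the class names and the toy labels are assumptions made for illustration; they are not data from this study.

```python
def evaluate(predicted, actual, target_class):
    """Return (accuracy, precision, recall) with respect to one class of interest."""
    pairs = list(zip(predicted, actual))
    correct = sum(1 for p, a in pairs if p == a)
    # True positives: assigned to the target class and really belonging to it
    tp = sum(1 for p, a in pairs if p == target_class and a == target_class)
    # False positives: assigned to the target class but belonging elsewhere
    fp = sum(1 for p, a in pairs if p == target_class and a != target_class)
    # False negatives: belonging to the target class but assigned elsewhere
    fn = sum(1 for p, a in pairs if p != target_class and a == target_class)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels, invented for illustration only
predicted = ["emotional", "sport", "emotional", "political", "emotional"]
actual = ["emotional", "sport", "political", "political", "sport"]
accuracy, precision, recall = evaluate(predicted, actual, "emotional")
print(accuracy, precision, recall)
```

In this toy case the ranker labels three opinions as emotional but only one of them truly is, so precision is low even though the single truly emotional opinion is found.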
2-3 Comments categorizing
Comment categorizing means labeling each comment with a concept and putting similar comments in the same category. To put it differently, each comment is assigned to a specific category based on the semantically similar words it contains. One important category is that of emotional comments in social networks [8].
The measures used in opinion mining are as follows:
· Orientation: an opinion the author expresses regarding a certain subject matter which can be negative, positive or neutral.
· Aggregation: the relationship between two mentioned elements in terms of how similar they are.
· Subjectivity: it determines if the represented opinion is subjective or objective.
· Length: the length of a comment (short, medium, long, etc.)
2-4 Categorizing based on polarity
It determines whether a sentence labeled as subjective is positive, negative or neutral. Categorizing at the sentence level differs from categorizing at the document level because a sentence contains fewer words than a document. Recent studies show that the efficiency of sentence-level polarity categorization can be increased by using learning algorithms with two kinds of features: polarity information and linguistic features.
Linguistic features include part of speech, word combination depth, statement type and domain. Both supervised and semi-supervised learning techniques can be applied to this categorization [9].
3. Previous works
By level, opinion mining is divided into two general categories: document- or sentence-level opinion mining and feature-based opinion mining. In the first category, a comment is evaluated as a whole and a positive or negative emotion is assigned to it. The final report in this category expresses users’ overall satisfaction with an entity. In the second category, specific features of an entity are considered. For example, if the comments are about a mobile phone, its features include the battery, the screen and the body. The final report in this category expresses users’ satisfaction with each feature. The current study presents a strategy for opinion mining at the sentence level.
3-1 Dictionary-based approach
The first work on dictionary-based opinion mining was proposed by Turney. In the first step of this approach, comments are labeled using a part-of-speech (POS) tagger, and five special patterns (table 1) are considered as candidate patterns for expressing opinions. In the second step, the degree of dependence between two terms is measured using pointwise mutual information (PMI), which is expressed as follows:
PMI (term1, term2) = log2 [ Pr(term1 ^ term2) / ( Pr(term1) Pr(term2) ) ]
In the above formula, Pr(term1 ^ term2) is the probability that the two terms occur together, and Pr(term1) Pr(term2) is the probability that they co-occur if they are statistically independent. The semantic orientation (SO) of a phrase is then computed with the following formula:
SO (phrase) = PMI (phrase, ”excellent”) - PMI (phrase, ”poor”)
In the third step, the average SO score of all phrases in a document determines whether the orientation of the document is positive or negative.
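The second and third steps can be sketched as follows, with probabilities estimated from raw occurrence counts. The function names, the count dictionary and all numbers are hypothetical and chosen only so the arithmetic is easy to follow. The sketch applies no smoothing, so a zero co-occurrence count would have to be handled before taking the logarithm.

```python
import math


def pmi(joint_count, count_a, count_b, total):
    """PMI(a, b) = log2( Pr(a ^ b) / (Pr(a) Pr(b)) ), estimated from counts."""
    p_joint = joint_count / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log2(p_joint / (p_a * p_b))


def semantic_orientation(counts, total):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    pmi_excellent = pmi(counts["with_excellent"], counts["phrase"], counts["excellent"], total)
    pmi_poor = pmi(counts["with_poor"], counts["phrase"], counts["poor"], total)
    return pmi_excellent - pmi_poor


def document_orientation(so_scores):
    """Average the SO scores of a document's phrases and read off the sign."""
    average = sum(so_scores) / len(so_scores)
    return "positive" if average > 0 else "negative"


# Hypothetical counts over 1000 text windows, invented for illustration only
toy_counts = {"phrase": 20, "excellent": 50, "poor": 50,
              "with_excellent": 8, "with_poor": 2}
so = semantic_orientation(toy_counts, total=1000)
print(round(so, 3), document_orientation([so, -0.5]))
```

With these counts the phrase co-occurs with "excellent" eight times as often as independence would predict and with "poor" twice as often, so its SO is log2(8) - log2(2) = 2.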
In [11], attempts were made to discover domain-independent syntactic features of opinions by labeling words. To do this, syntactic tags such as noun, adjective and verb were taken into account. The main idea of this approach is that some of the features used by a POS tagger are domain independent and some are domain dependent. Dictionary-based approaches focus on syntactic patterns or on the words themselves. The approaches mentioned so far focus mainly on syntactic selection, whereas [11] uses a set of words and phrases together with their semantic orientation. In addition, to determine the final orientation of words, this approach combines phrases with a collection of intensifiers such as “very” and negators such as “not”. In the approach proposed in this article, labeling is employed and performed twice.
Other works that use a syntactic tagger as their basis include [12]. In all of these works, fixed patterns are used to discover opinions. Table 1 shows the syntactic word patterns that indicate an opinion expression in a text; for example, its first row says that if the first word is an adjective and the next word is a noun, these two words probably contain an opinion expression. Table 2 defines the tags used by the syntactic tagger [13].
Table 1. Words pattern
S.NO | First Word | Second Word | Third Word |
1 | JJ | NN or NNS | Anything |
2 | RB, RBR, or RBS | JJ | Not NN nor NNS |
3 | JJ | JJ | Not NN nor NNS |
4 | NN or NNS | JJ | Not NN nor NNS |
5 | RB, RBR, or RBS | VB, VBD, VBN, or VBG | Anything |
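As an illustration, the patterns of table 1 can be applied to a sequence of (word, tag) pairs as sketched below. The input is assumed to have been tagged already; the function name `extract_phrases` and the sample sentence are hypothetical.

```python
NOUN = {"NN", "NNS"}
ADJECTIVE = {"JJ"}
ADVERB = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

# Each pattern: (tags allowed for the first word, tags allowed for the
# second word, True if the third word must NOT be a noun)
PATTERNS = [
    (ADJECTIVE, NOUN, False),
    (ADVERB, ADJECTIVE, True),
    (ADJECTIVE, ADJECTIVE, True),
    (NOUN, ADJECTIVE, True),
    (ADVERB, VERB, False),
]


def extract_phrases(tagged):
    """Return the two-word phrases whose tags match one of the patterns."""
    phrases = []
    for i in range(len(tagged) - 1):
        (word1, tag1), (word2, tag2) = tagged[i], tagged[i + 1]
        # The third word may be missing at the end of the text
        tag3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        for first, second, third_not_noun in PATTERNS:
            blocked = third_not_noun and tag3 in NOUN
            if tag1 in first and tag2 in second and not blocked:
                phrases.append(f"{word1} {word2}")
                break
    return phrases


# Hypothetical pre-tagged input, invented for illustration only
tagged = [("good", "JJ"), ("phone", "NN"), ("but", "CC"),
          ("really", "RB"), ("bad", "JJ"), ("battery", "NN")]
phrases = extract_phrases(tagged)
print(phrases)
```

Here "really bad" is skipped because the following word is a noun (pattern 2), while "bad battery" is picked up by pattern 1.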
Table 2. The used signs
S.NO | Tag | Description |
1 | NNP | Noun, proper, singular |
2 | NNPS | Noun, proper, plural |
3 | RB | Adverb |
4 | JJ | Adjective or numeral, ordinal |
5 | JJR | Adjective, comparative |
6 | NN | Noun, common, singular |
7 | RBR | Adverb, comparative |
8 | VB | Verb, base form |
9 | VBD | Verb, past tense |
10 | WDT | WH-determiner |
11 | CC | Conjunction, coordinating |
12 | CD | Numeral, cardinal |
13 | DT | Determiner |
In the approach proposed in this article, a large data set of 10 million words and tags is used for labeling.
3-2 Works performed in Persian language
Among the large volume of articles published on opinion mining in the last decade, few deal with non-English languages, and only a limited number deal with Persian. In one such work, the Chi-square method was used to select suitable features for the categorizing algorithm, which was a decision tree.
The final report of that work states that using the Chi-square method for feature selection does not improve opinion mining in Persian. Subsequently, an unsupervised LDA-based approach called LDASA was introduced. To create the linguistic resources and the bank of sensitive phrases, a translated English vocabulary bank was used. Finally, the work was evaluated on three data sets (mobile phones, hotels and digital cameras) [14].
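For readers unfamiliar with Chi-square feature selection, the sketch below scores how strongly a term is associated with a class, using the standard 2x2 contingency formulation. This is an assumption about the general technique, not a reproduction of the cited work; the function names and the document counts are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 table of document counts.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


def top_features(term_tables, k):
    """Keep the k terms with the highest Chi-square score."""
    ranked = sorted(term_tables, key=lambda t: chi_square(*term_tables[t]), reverse=True)
    return ranked[:k]


# Hypothetical document counts, invented for illustration only
term_tables = {
    "term_a": (30, 10, 20, 40),  # occurs mostly inside the class
    "term_b": (10, 10, 10, 10),  # independent of the class
}
score_a = chi_square(*term_tables["term_a"])
selected = top_features(term_tables, k=1)
print(round(score_a, 2), selected)
```

A term that is spread evenly across classes scores zero and is dropped first, which is the intuition behind using the score as a feature filter.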
4. Proposed method
The proposed method for analyzing opinions regarding divorce consists of several stages, which are explained below.
4-1 Pre-processing
At this stage, the data set of users’ comments is first filtered, and the following cases are deleted:
· Posts which are not in Persian
· Posts in which inappropriate or unethical words are used
· Posts that contain no opinion
· Posts unrelated to divorce
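The four filters above can be sketched as a single predicate. The source does not specify how each check was implemented, so everything here is an assumption: the script test uses the Unicode Arabic block (which Persian shares with Arabic, so it cannot tell the two apart), the blocklist is a caller-supplied placeholder, "contains an opinion" is approximated by a minimum number of non-hash-tag words, and topical relevance is a simple substring match.

```python
import re

# Characters of the Unicode Arabic block, which contains the Persian alphabet
PERSIAN_LETTER = re.compile(r"[\u0600-\u06FF]")
LATIN_LETTER = re.compile(r"[A-Za-z]")


def is_persian(text):
    """Rough script test: some Persian-script letters, at least as many as Latin ones."""
    persian = len(PERSIAN_LETTER.findall(text))
    latin = len(LATIN_LETTER.findall(text))
    return persian > 0 and persian >= latin


def keep_post(text, blocklist, topic_terms, min_words=3):
    """Return True if the post passes all four filters (hypothetical heuristics)."""
    if not is_persian(text):
        return False
    words = text.split()
    if any(word in blocklist for word in words):
        return False
    # Assumption: a post made only of hash tags or very few words carries no opinion
    plain_words = [word for word in words if not word.startswith("#")]
    if len(plain_words) < min_words:
        return False
    return any(term in text for term in topic_terms)


# "طلاق" is the Persian word for divorce; the sample posts are invented
topic_terms = {"طلاق"}
posts = [
    "من پارسال طلاق گرفتم و راضی هستم",
    "I got divorced last year",
    "#طلاق",
    "امروز هوا خیلی خوب است",
]
kept = [post for post in posts if keep_post(post, set(), topic_terms)]
print(len(kept))
```

Only the first sample post survives: the second is not in Persian, the third contains no text beyond a hash tag, and the fourth does not mention divorce.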
Next, the main hash tags for content analysis are separated. The hash tags are grouped so that each group contains similar hash tags, and the common key words and phrases for divorce content analysis are built from these groups, as shown in table 3.
Table 3. Hash tags grouping
Hash tag group | Phrases |
Divorce | I got divorced, I divorced, separation, divorce |
Court | # Justice Department # Enghelab Court # Family Court # Court of Appeals # Public Prosecutor's Office # Tribunal # Property # Legal |
Consultation | # Consultation # Lawyer # Consultation # Law # Legal counsel # Uncontested Divorce # Divorce # Proxy # Free legal advice # Family legal advice # institution # Legal Institution # attorney-at-law # Family attorney # Criminal lawyer # Legal Lawyer # Tehran Lawyer # Legal file |
Lawyer | # Lawyer, Proxy, legitimacy, civil rights, Family attorney, attorney at law, Free Lawyer |
Betrayal | Betrayal, traitor, disloyal, nerveless |
Separation | Isolation, left alone, living alone |
In analyzing opinions on divorce, the posts that use the above hash tags are separated.
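A minimal sketch of this separation step is given below. It uses a small English subset of the phrases in table 3 and plain substring matching; the dictionary layout, the function name `groups_for` and the sample posts are assumptions for illustration, and the study itself worked on Persian posts.

```python
# Subset of table 3, lower-cased for matching (illustrative only)
HASHTAG_GROUPS = {
    "Divorce": ["divorce", "separation", "i got divorced"],
    "Court": ["family court", "court of appeals"],
    "Lawyer": ["lawyer", "family attorney"],
    "Betrayal": ["betrayal", "traitor", "disloyal"],
}


def groups_for(post):
    """Return the names of all hash tag groups whose phrases occur in the post."""
    text = post.lower()
    return [group for group, phrases in HASHTAG_GROUPS.items()
            if any(phrase in text for phrase in phrases)]


# Hypothetical posts, invented for illustration only
posts = [
    "#Divorce after his betrayal, my family attorney helped",
    "Nice weather today",
]
# Keep only the posts that match at least one group
separated = {post: groups_for(post) for post in posts if groups_for(post)}
print(separated)
```

A post can fall into several groups at once, which is why the function returns a list rather than a single group name.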
4-2 Users categorizing
Users who have used the above hash tags are separated and divided into three categories: lawyers, consultants and people involved in divorce. Each category has its own measures, and users are categorized based on these measures.
4-3 Posts categorizing
The posts related to the hash tags and the posts associated with each group of users are separated; the information for each group can be seen in figure 1.
Figure 1. Content analysis based on user type
Figure 2. The number of filtered users’ follows
Figure 3. Hash tags content analysis based on users
4-4 Labeling stage
In the current research, the reasons for divorce were analyzed and labeled based on users’ opinions. The tags presented in table 4 were used for labeling and were applied to the comments left by the selected users.
Table 4. Divorce tags
Divorce tag name |
Financial problem |
Bad tempered |
Unemployment |
Sexual problem |
Betrayal |
Capricious |
Addiction |
Apathetic |
After labeling, the opinions were categorized by the above tags, i.e. each tag covers a set of opinions, and each opinion can be one of the reasons for divorce. Next, it must be determined whether each opinion is positive or negative. A positive sentence is one that, from the user’s perspective, confirms a subject matter, whereas a negative sentence rejects it. The result of categorizing the labeled opinions can be seen in table 5.
Table 5. The number of opinions for each tag
Divorce tag name | Number of opinions |
Financial problem | 3246 |
Bad tempered | 2578 |
Unemployment | 2045 |
Sexual problem | 2162 |
Betrayal | 3854 |
Capricious | 2908 |
Addiction | 2875 |
Apathetic | 1366 |
5. Data analysis
To detect positive and negative sentences, the opinions were labeled a second time. The tag collection of the Research Center of Intelligent Processing, which includes 10 million words and different tags, was used for labeling. Labeling was performed with positive and negative tags [15].
After this labeling, the opinions were counted to determine the percentage of positive and negative sentences for each divorce reason. Moreover, using the initial data set, the number of likes received by the positive and negative comments was determined. The result of this investigation shows the reasons for divorce in users’ opinion, as can be seen in table 6.
Table 6. The number of likes for positive and negative sentences of each tag
Divorce tag name | Number of opinions | Positive opinions | Likes on positive opinions | Negative opinions | Likes on negative opinions |
Financial problem | 3246 | 2041 | 9532 | 1753 | 7245 |
Bad tempered | 2578 | 1057 | 7965 | 982 | 6421 |
Unemployment | 2045 | 1164 | 8695 | 851 | 6308 |
Sexual problem | 2162 | 1425 | 9337 | 635 | 7561 |
Betrayal | 3854 | 2745 | 10267 | 843 | 8435 |
Capricious | 2908 | 2078 | 9397 | 742 | 6718 |
Addiction | 2875 | 1578 | 5463 | 953 | 6512 |
Apathetic | 1366 | 523 | 6452 | 451 | 5178 |
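The counts in table 6 can be turned into per-reason shares and like ratios with a short sketch such as the following. The derived measures are assumptions about how the table might be read, not the authors' stated procedure. The positive share is computed over positive plus negative opinions, because these two columns do not always add up to the number of opinions.

```python
# Rows of table 6: (opinions, positive opinions, likes on positive,
#                   negative opinions, likes on negative)
TABLE_6 = {
    "Financial problem": (3246, 2041, 9532, 1753, 7245),
    "Bad tempered": (2578, 1057, 7965, 982, 6421),
    "Unemployment": (2045, 1164, 8695, 851, 6308),
    "Sexual problem": (2162, 1425, 9337, 635, 7561),
    "Betrayal": (3854, 2745, 10267, 843, 8435),
    "Capricious": (2908, 2078, 9397, 742, 6718),
    "Addiction": (2875, 1578, 5463, 953, 6512),
    "Apathetic": (1366, 523, 6452, 451, 5178),
}


def summarize(row):
    """Derive the positive share and the average likes per opinion for one tag."""
    _opinions, positive, positive_likes, negative, negative_likes = row
    return {
        "positive_share": positive / (positive + negative),
        "likes_per_positive": positive_likes / positive,
        "likes_per_negative": negative_likes / negative,
    }


summary = {tag: summarize(row) for tag, row in TABLE_6.items()}
# Order the reasons by the number of opinions that confirm them
ranked = sorted(TABLE_6, key=lambda tag: TABLE_6[tag][1], reverse=True)
for tag in ranked:
    print(f"{tag}: {summary[tag]['positive_share']:.1%} positive")
```

Ordering by the number of confirming opinions is only one possible ranking; ordering by likes or by positive share would weight the reasons differently.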
The results of the reasons for divorce can be seen in figure 4.
Figure 4. Reasons for divorce in users’ opinion
6. Conclusion
Users’ opinions on the reasons for divorce were investigated in this article. A multi-stage method was used for content analysis, based on hash tags, users, positive and negative sentences, and the number of likes; applying this method to the full set of opinions reveals different reasons for divorce. In addition, the number of likes given by other users to each reason indicates how much support that reason receives.
7. References
1) G. Qiu, B. Liu, J. Bu, and C. Chen, "Opinion word expansion and target extraction through double propagation," Comput. Linguist., vol. 37, no. 1, pp. 9-27, Mar. 2011
2) Liu, Bing. “Sentiment Analysis and Opinion Mining” in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2010,
3) B. Liu, "Sentiment analysis and opinion mining," Synthesis Lectures on Human Language Technologies, pp. 1-167, 2012
4) G. Qiu, B. Liu, J. Bu, and C. Chen, "Opinion word expansion and target extraction through double propagation," Comput. Linguist., vol. 37, no. 1, pp. 9-27, Mar. 2011
5) H. Wang, D. Can, A. Kazemzadeh, F. Bar, and S. Narayanan, "A system for real-time twitter sentiment analysis of 2012 u. s. presidential election cycle," in Proceedings of the ACL 2012 System Demonstrations. Association for Computational Linguistics, 2012, pp. 115-120
6) Ding, X. & Liu, B., 'The Utility of Linguistic Rules in Opinion Mining', in Proceedings of the SIGIR 2007, SIGIR, 2007, Amsterdam, The Netherlands.
7) Utz, S., & Beukeboom, C. J. (2011). The role of Social Network Sites in Romantic Relationships: Effects on jealousy and relationship happiness. Journal of Computer-Mediated Communication, 16(4), 511–527.
8) Kunpeng Zhang, Yu Cheng, Yusheng Xie, Daniel Honbo, Ankit Agrawal, Diana Palsetia , Kathy Lee , Wei-keng Liao , Alok Choudhary, SES: Sentiment Elicitation System for Social Media Data, Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops, p.129-136, December 11-11, 2011.
9) R. Xia and C. Zong, "A POS-based Ensemble Model for Cross-domain Sentiment Classification," in IJCNLP, 2011, pp. 614-622.
10) AGARWAL, XIE, B., VOVSHA, I., RAMBOW, O., AND PASSONNEAU, R. Sentiment analysis of twitter data. In Proceedings of the ACL 2011 Workshop on Languages in Social Media (2011).
11) M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, "Lexicon-based methods for sentiment analysis," Computational linguistics, vol. 37, pp. 267-307, 2011.
12) K. Jain and Y. Pandey, "Analysis and Implementation of Sentiment Classification Using Lexical POS Markers," International Journal, vol. 2, 2013
13) P. D. Turney, "Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews," in Proceedings of the 40th annual meeting on association for computational linguistics, 2002, pp. 417- 424.
14) T. Khatibi, M. Sepehri, & B. Hamid Pour, “Investigating the effect of Chi square method for selecting features in opinion mining of Persian text”, presented in the Second National Conference on Electrical Engineering, Computer and Information Technology, Hamedan, Sama Educational and Cultural Center, Hamedan 2008,
15) www.rcisp.com